In this video, we will take a look at an easy-to-use, graphical way to build machine learning models and pipelines. SPSS Modeler Flows is part of Watson Studio and was inspired by another product, IBM SPSS Modeler. We'll discuss that product in a later unit.

Let's have a look again at the overview of different tool categories. Modeler flows include some data management capabilities, as well as tools for data preparation, visualization, and model building.

All flows are created using a drag-and-drop editor. They consist of “nodes” of various types, and data “flows” from one node to the next according to their connections. The sample Modeler flow shown here includes two data source nodes, shown in purple on the left; Type, Aggregate, Filter, Merge, Filler, and Partition nodes in the middle; and two model-building nodes, shown as pentagons. Once the flow is executed and the models are built, upside-down pentagons called “model nuggets” are created. You can use them to see information about the models and to get predictions for new data. Finally, the three green square nodes on the right provide model evaluation information in the form of tables and charts.
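For readers curious about what “data flowing from node to node” means underneath the graphical editor, here is a minimal plain-Python analogy, not how Modeler is implemented: each node is a function that takes a list of records and returns a new list, and running the flow applies the connected nodes in order. The node functions and the field names `Na`, `K`, and `Na_to_K` are hypothetical, chosen only for illustration; the derive step mirrors the kind of ratio field described for the sample flow.

```python
import random
from typing import Callable, Dict, List

# A record is one row of data; a node is any function from rows to rows.
# All node names and field names here are hypothetical, for illustration only.
Record = Dict[str, object]
Node = Callable[[List[Record]], List[Record]]


def filter_node(records: List[Record]) -> List[Record]:
    """Drop any record that has a missing (None) value."""
    return [r for r in records if all(v is not None for v in r.values())]


def derive_node(records: List[Record]) -> List[Record]:
    """Add a derived field: the ratio of two numeric predictors."""
    return [{**r, "Na_to_K": r["Na"] / r["K"]} for r in records]


def make_partition_node(train_fraction: float, seed: int = 0) -> Node:
    """Build a node that tags each record as 'train' or 'test'."""
    def partition_node(records: List[Record]) -> List[Record]:
        rng = random.Random(seed)  # fixed seed so the split is repeatable
        return [{**r, "Partition": "train" if rng.random() < train_fraction else "test"}
                for r in records]
    return partition_node


def run_flow(source: List[Record], nodes: List[Node]) -> List[Record]:
    """Let the data 'flow' through the connected nodes, in order."""
    data = source
    for node in nodes:
        data = node(data)
    return data


source = [
    {"Age": 23, "Na": 0.79, "K": 0.03},
    {"Age": 47, "Na": 0.74, "K": None},  # removed by filter_node
    {"Age": 61, "Na": 0.56, "K": 0.03},
]
result = run_flow(source, [filter_node, derive_node, make_partition_node(0.7)])
print(len(result))  # 2
```

Because each node returns a new list instead of modifying its input, the source data stays unchanged, which loosely matches how a flow can be re-run from its source nodes.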
You can build your SPSS Modeler flows by dragging different types of nodes from the part of the screen on the left, called the “palette,” to the “canvas,” the main part of the screen. Each flow starts with one or more data sources, located in the “Import” group, and can include some or all of the other types of nodes.

Watson Studio provides some sample flows to help new users. In the Drug Study example shown here, we are using a small artificial data set. The target variable is a categorical field, “Drug,” that has five categories, and there are several predictor variables. This flow creates a new “derived” field by dividing the values of one predictor by the values of another, and at the end it builds a small neural network model and a decision tree model.

When a user clicks the “Run” button on the top panel, denoted by a triangle, the flow is executed and the models are built. This is reflected in the new pentagon nodes, called “model nuggets,” that appear under each model node. If you click the three dots in the upper-right corner of one of those nodes and select “View Model,” you will see various types of model information. By connecting new data sources to a model nugget, you can get predictions on new data.

The first window in the model viewer shows model accuracy and related measures, such as precision and recall.
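To make accuracy, precision, and recall concrete, the sketch below computes them by hand from a confusion matrix, the table of counts of (actual, predicted) category pairs. It is a generic illustration of the standard definitions, not a reproduction of Modeler's viewer; the labels and predictions are made up, and reporting 0.0 when a denominator is empty is a convention assumed here.

```python
from collections import Counter
from typing import List, Tuple


def confusion_matrix(actual: List[str], predicted: List[str]) -> Counter:
    """Count each (actual, predicted) pair; pairs with equal labels are correct."""
    return Counter(zip(actual, predicted))


def accuracy(actual: List[str], predicted: List[str]) -> float:
    """Share of cases where the prediction matches the observed value."""
    return sum(a == p for a, p in zip(actual, predicted)) / len(actual)


def precision_recall(actual: List[str], predicted: List[str], label: str) -> Tuple[float, float]:
    """Precision and recall for one category, read off the confusion matrix."""
    cm = confusion_matrix(actual, predicted)
    true_pos = cm[(label, label)]
    predicted_as_label = sum(n for (a, p), n in cm.items() if p == label)
    actually_label = sum(n for (a, p), n in cm.items() if a == label)
    # Convention used here: report 0.0 when the denominator is empty.
    precision = true_pos / predicted_as_label if predicted_as_label else 0.0
    recall = true_pos / actually_label if actually_label else 0.0
    return precision, recall


# Hypothetical labels and predictions, not output from the video.
actual = ["drugA", "drugB", "drugY", "drugY", "drugA"]
predicted = ["drugA", "drugY", "drugY", "drugY", "drugA"]
print(accuracy(actual, predicted))                   # 0.8
print(precision_recall(actual, predicted, "drugY"))  # (0.6666666666666666, 1.0)
```

Precision asks “of the cases predicted as this category, how many really were?”, while recall asks “of the cases really in this category, how many did the model find?”. With perfect predictions, as in the toy example, every value would be 1.0.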
This toy data set allowed us to get perfect accuracy, which is normally not the case with real-life data. The Confusion Matrix view shows how the model's predictions on the training data matched the observed target values. Once again, in this toy example all cases were classified correctly.

We can also look at Model Information, which displays a table with more details about the model. Feature Importance displays a diagram that indicates the relative predictive strength of the various model inputs.

Finally, the Network Diagram gives a visual representation of the neural network model we built. On the left is the input layer, with one unit for each continuous predictor and one unit for each category of the categorical predictors, plus a bias unit, which is usually present in every layer of a neural network except the output layer. In the middle, we see a “hidden layer” with 7 units, or neurons, plus a bias unit. On the right is the output layer, with 5 units corresponding to the five target categories. Controls on the right and bottom of the diagram enable some interactive exploration of the model. The colors of the connections between units indicate the values of the weights on those connections.

We can also look at the decision tree model, built using the C5.0 algorithm.
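The network diagram's layout can be mirrored in a few lines of code. The following is a minimal forward-pass sketch, assuming a hypothetical case with two continuous predictors and three categorical predictors (2, 3, and 2 categories), 7 hidden units, and 5 output units. The weights are random placeholders, not a trained model, and the tanh hidden activation and softmax output are common textbook choices assumed here; the video does not say which activation functions Modeler uses.

```python
import math
import random
from typing import Callable, List, Tuple

Matrix = List[List[float]]


def one_hot(value: str, categories: List[str]) -> List[float]:
    """One input unit per category: 1.0 for the observed category, else 0.0."""
    return [1.0 if value == c else 0.0 for c in categories]


def dense(inputs: List[float], weights: Matrix, biases: List[float],
          activation: Callable[[float], float]) -> List[float]:
    """One layer: weights[j][i] links input i to unit j; biases[j] is the bias unit's weight."""
    return [activation(sum(w * x for w, x in zip(row, inputs)) + b)
            for row, b in zip(weights, biases)]


def softmax(scores: List[float]) -> List[float]:
    """Turn output scores into probabilities that sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def random_layer(n_out: int, n_in: int, rng: random.Random) -> Tuple[Matrix, List[float]]:
    """Untrained placeholder weights; a real model learns these from data."""
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
    biases = [rng.uniform(-0.5, 0.5) for _ in range(n_out)]
    return weights, biases


# Hypothetical case: two continuous predictors (already scaled to 0..1)
# and three categorical predictors with 2, 3, and 2 categories.
x = ([0.35, 0.60]
     + one_hot("M", ["F", "M"])
     + one_hot("LOW", ["HIGH", "LOW", "NORMAL"])
     + one_hot("HIGH", ["HIGH", "NORMAL"]))
n_in, n_hidden, n_out = len(x), 7, 5  # 9 input units, 7 hidden, 5 output

rng = random.Random(42)
W1, b1 = random_layer(n_hidden, n_in, rng)
W2, b2 = random_layer(n_out, n_hidden, rng)

hidden = dense(x, W1, b1, math.tanh)                 # hidden layer
probs = softmax(dense(hidden, W2, b2, lambda s: s))  # output layer
targets = ["drugA", "drugB", "drugC", "drugX", "drugY"]  # hypothetical labels
prediction = targets[probs.index(max(probs))]

n_weights = (n_in + 1) * n_hidden + (n_hidden + 1) * n_out  # +1 = bias unit
print(n_in, n_weights)  # 9 110
```

The weight count shows why the diagram has so many connections even for a small network: every input unit, including the bias unit, connects to every hidden unit, and every hidden unit, including its bias unit, connects to every output unit.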
A Model Information table and a Feature Importance chart appear as before. Additionally, a Top Decision Rules table is displayed. Decision tree models are popular because their structure makes it easy to explain predictions or extract decision rules. The tree diagram is also displayed.

On the left side of the canvas, we see part of the model palette that can be used in flows. At the top are the “Auto Classifier” and “Auto Numeric” nodes, which can be used for categorical and continuous targets, respectively. These nodes build several kinds of models and pick the best one based on a chosen criterion. Later, we will talk about the AutoAI feature of Watson Studio. AutoAI takes this capability to the next level by automatically finding not only the best model but an entire data pipeline, including various data transformations.

In this video, you've learned how Modeler Flows in Watson Studio can help analysts create powerful machine learning pipelines using a graphical interface, without the need to write any code. This feature was based on IBM SPSS Modeler. Next, after you complete a lab that gives you hands-on experience with this technology, we will take a look at two other IBM products that can be used for data science: IBM SPSS Modeler and IBM SPSS Statistics.
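The idea behind the Auto Classifier node (fit several candidate models, then keep the one that scores best) can be sketched outside the graphical interface. In this minimal sketch, assuming made-up records and two hypothetical candidates, one candidate is written as if/else rules, the form that a decision tree's rules take, and the other is a trivial baseline. Accuracy is used as the ranking criterion only as an assumption; Modeler lets you choose among several criteria, and this is not its actual selection logic.

```python
from typing import Callable, Dict, List, Tuple

Record = Dict[str, object]
Model = Callable[[Record], str]


def accuracy(model: Model, records: List[Record], target: str) -> float:
    """Share of records whose prediction matches the target field."""
    return sum(model(r) == r[target] for r in records) / len(records)


def auto_classify(candidates: Dict[str, Model], records: List[Record],
                  target: str) -> Tuple[str, Dict[str, float]]:
    """Score every candidate on the same records and return the best one's name."""
    scores = {name: accuracy(model, records, target) for name, model in candidates.items()}
    best = max(scores, key=scores.get)  # ties go to the first candidate listed
    return best, scores


def decision_rules(r: Record) -> str:
    """A hypothetical model written as if/else rules, the form a tree's rules take."""
    if r["ratio"] > 15:
        return "drugY"
    return "drugX" if r["bp"] == "NORMAL" else "drugA"


def always_drug_y(r: Record) -> str:
    """A trivial baseline that always predicts one category."""
    return "drugY"


# Hypothetical held-out records; in practice, score candidates on test data.
records = [
    {"ratio": 25.0, "bp": "HIGH", "Drug": "drugY"},
    {"ratio": 10.0, "bp": "NORMAL", "Drug": "drugX"},
    {"ratio": 8.0, "bp": "HIGH", "Drug": "drugA"},
    {"ratio": 18.0, "bp": "LOW", "Drug": "drugY"},
]
best, scores = auto_classify(
    {"decision_rules": decision_rules, "always_drugY": always_drug_y}, records, "Drug")
print(best, scores)  # decision_rules {'decision_rules': 1.0, 'always_drugY': 0.5}
```

Scoring all candidates on the same held-out records keeps the comparison fair. Scoring them on their own training data would favor whichever model memorized that data best.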