1
00:00:07,540 --> 00:00:15,010
In earlier sections we saw how IBM SPSS Modeler and Watson Studio Modeler flows allow you

2
00:00:15,010 --> 00:00:21,001
to graphically create a stream or flow that includes data transformation steps and machine

3
00:00:21,001 --> 00:00:23,180
learning models.

4
00:00:23,180 --> 00:00:29,040
Such sequences of steps are called data pipelines or ML pipelines.

5
00:00:29,040 --> 00:00:34,110
This section examines a feature of Watson Studio that helps to automate the creation

6
00:00:34,110 --> 00:00:36,520
of machine learning pipelines.

7
00:00:36,520 --> 00:00:42,430
This allows data scientists to produce results much faster and to focus on more creative

8
00:00:42,430 --> 00:00:44,290
work.

9
00:00:44,290 --> 00:00:48,090
There is currently a shortage of qualified data scientists.

10
00:00:48,090 --> 00:00:53,220
Many operations that a data scientist typically performs are repetitive and time-consuming.

11
00:00:53,220 --> 00:01:00,760
Therefore, automating some of that repetitive work will help free up both new and experienced

12
00:01:00,760 --> 00:01:06,100
data scientists to do the important work that they are trained to do.

13
00:01:06,100 --> 00:01:13,400
The AutoAI system was developed by IBM Research experts in collaboration with IBM Distinguished

14
00:01:13,400 --> 00:01:18,830
Engineer and two-time Kaggle Grandmaster Jean-Francois Puget.

15
00:01:18,830 --> 00:01:24,060
It provides a graphical interface to create and deploy machine learning models with

16
00:01:24,060 --> 00:01:26,750
real-time visualizations.

17
00:01:26,750 --> 00:01:32,140
AutoAI automatically performs typical machine learning steps, such as:

18
00:01:32,140 --> 00:01:34,690
Data preparation
Model selection

19
00:01:34,690 --> 00:01:38,890
Feature engineering
Hyper-parameter optimization

20
00:01:38,890 --> 00:01:43,460
Users can view the progress on the graphical interface.

21
00:01:43,460 --> 00:01:48,530
This example shows the training of a model to predict whether or not a customer is likely

22
00:01:48,530 --> 00:01:52,140
to buy a tent from an outdoor equipment store.

23
00:01:52,140 --> 00:01:54,560
We start with structured data.

24
00:01:54,560 --> 00:01:59,549
In this historical data, there are four feature, or “predictor,” columns:

25
00:01:59,549 --> 00:02:03,790
GENDER: The customer’s gender
AGE: The customer’s age

26
00:02:03,790 --> 00:02:09,080
MARITAL_STATUS: “Married”, “Single”, or “Unspecified”

27
00:02:09,080 --> 00:02:12,760
and PROFESSION: The general category of the customer’s

28
00:02:12,760 --> 00:02:18,150
profession, such as “Hospitality” or “Sales”, or simply “Other.”

29
00:02:18,150 --> 00:02:23,469
The model will learn to predict the value for the IS_TENT column; that is, whether or

30
00:02:23,469 --> 00:02:26,260
not the customer bought a tent.

31
00:02:26,260 --> 00:02:32,840
After we choose IS_TENT as the column to predict, AutoAI analyzes the data and determines that

32
00:02:32,840 --> 00:02:40,499
the IS_TENT column contains True/False information, making this data suitable for a binary classification

33
00:02:40,499 --> 00:02:41,579
model.

34
00:02:41,579 --> 00:02:48,190
The default metric for binary classification is ROC AUC.

35
00:02:48,190 --> 00:02:53,819
After we click Run experiment, an infographic shows the process of building the pipelines

36
00:02:53,819 --> 00:02:55,859
as the model trains.

37
00:02:55,859 --> 00:03:00,809
Once the pipeline creation is complete, we can view and compare the ranked pipelines

38
00:03:00,809 --> 00:03:02,739
in a leaderboard.

39
00:03:02,739 --> 00:03:08,709
The pipelines for the sample binary classification model are quite uniform because of the underlying

40
00:03:08,709 --> 00:03:10,519
sample data.

41
00:03:10,519 --> 00:03:16,749
To see pipelines in action, re-run the experiment as a regression experiment to predict purchase

42
00:03:16,749 --> 00:03:17,999
amount.

43
00:03:17,999 --> 00:03:22,709
That experiment gives better variation in the resulting pipelines.

44
00:03:22,709 --> 00:03:27,859
After clicking “Pipeline comparison,” we can see how the pipelines differ on various

45
00:03:27,859 --> 00:03:30,269
measures of model quality.

46
00:03:30,269 --> 00:03:35,720
The pipelines can be saved as Machine Learning assets in the Watson Studio project.

47
00:03:35,720 --> 00:03:39,349
Then they can be deployed and tested.

48
00:03:39,349 --> 00:03:45,449
Currently AutoAI is available only for classification and regression models; there is a plan to

49
00:03:45,449 --> 00:03:48,760
add time series model support in the future.

50
00:03:48,760 --> 00:03:54,690
In this unit, you have learned how AutoAI automates typical data science tasks and helps

51
00:03:54,690 --> 00:04:00,999
get better performing data pipelines more quickly, while also simplifying pipeline deployment

52
00:04:00,999 --> 00:04:04,709
into production in Watson Machine Learning.

53
00:04:04,709 --> 00:04:09,969
In the next section, we will discuss Watson OpenScale, which helps to ensure that your

54
00:04:09,969 --> 00:04:13,640
models are fair, explainable, and up to date.