1
00:00:07,540 --> 00:00:15,010
In earlier sections we saw how IBM SPSS Modeler
and Watson Studio Modeler flows allow you

2
00:00:15,010 --> 00:00:21,001
to graphically create a stream or flow that
includes data transformation steps and machine

3
00:00:21,001 --> 00:00:23,180
learning models.

4
00:00:23,180 --> 00:00:29,040
Such sequences of steps are called data pipelines
or ML pipelines.

5
00:00:29,040 --> 00:00:34,110
This section examines a feature of Watson
Studio that helps to automate the creation

6
00:00:34,110 --> 00:00:36,520
of machine learning pipelines.

7
00:00:36,520 --> 00:00:42,430
This allows data scientists to produce results
much faster and to focus on more creative

8
00:00:42,430 --> 00:00:44,290
work.

9
00:00:44,290 --> 00:00:48,090
There is currently a shortage of qualified
data scientists.

10
00:00:48,090 --> 00:00:53,220
Many operations that a data scientist typically
performs are repetitive and time-consuming.

11
00:00:53,220 --> 00:01:00,760
Therefore, automating some of that repetitive
work will help free up both new and experienced

12
00:01:00,760 --> 00:01:06,100
data scientists to do the important work that
they are trained to do.

13
00:01:06,100 --> 00:01:13,400
The AutoAI system was developed by IBM Research
experts in collaboration with IBM Distinguished

14
00:01:13,400 --> 00:01:18,830
Engineer and two-time Kaggle Grandmaster Jean-Francois
Puget.

15
00:01:18,830 --> 00:01:24,060
It provides a graphical interface to create
and deploy machine learning models with

16
00:01:24,060 --> 00:01:26,750
real-time visualizations.

17
00:01:26,750 --> 00:01:32,140
AutoAI automatically performs typical machine
learning steps, such as:

18
00:01:32,140 --> 00:01:34,690
Data preparation
Model selection

19
00:01:34,690 --> 00:01:38,890
Feature engineering
Hyperparameter optimization

20
00:01:38,890 --> 00:01:43,460
Users can view the progress on the graphical
interface.

21
00:01:43,460 --> 00:01:48,530
This example shows the training of a model
to predict whether or not a customer is likely

22
00:01:48,530 --> 00:01:52,140
to buy a tent from an outdoor equipment store.

23
00:01:52,140 --> 00:01:54,560
We start with structured data.

24
00:01:54,560 --> 00:01:59,549
In this historical data, there are four feature,
or “predictor,” columns:

25
00:01:59,549 --> 00:02:03,790
GENDER: The customer’s gender
AGE: The customer’s age

26
00:02:03,790 --> 00:02:09,080
MARITAL_STATUS: “Married”, “Single”,
or “Unspecified”

27
00:02:09,080 --> 00:02:12,760
and
PROFESSION: The general category of the customer’s

28
00:02:12,760 --> 00:02:18,150
profession, such as “Hospitality” or “Sales”,
or simply “Other.”

29
00:02:18,150 --> 00:02:23,469
The model will learn to predict the value
for the IS_TENT column; that is, whether or

30
00:02:23,469 --> 00:02:26,260
not the customer bought a tent.

31
00:02:26,260 --> 00:02:32,840
After we choose IS_TENT as the column to predict,
AutoAI analyzes the data and determines that

32
00:02:32,840 --> 00:02:40,499
the IS_TENT column contains True/False information,
making this data suitable for a binary classification

33
00:02:40,499 --> 00:02:41,579
model.

34
00:02:41,579 --> 00:02:48,190
The default metric for binary classification
is ROC AUC.

35
00:02:48,190 --> 00:02:53,819
After we click Run experiment, an infographic
shows the process of building the pipelines

36
00:02:53,819 --> 00:02:55,859
as the model trains.

37
00:02:55,859 --> 00:03:00,809
Once the pipeline creation is complete, we
can view and compare the ranked pipelines

38
00:03:00,809 --> 00:03:02,739
in a leaderboard.

39
00:03:02,739 --> 00:03:08,709
The pipelines for the sample binary classification
model are quite uniform because of the underlying

40
00:03:08,709 --> 00:03:10,519
sample data.

41
00:03:10,519 --> 00:03:16,749
To see pipelines in action, re-run the experiment
as a regression experiment to predict purchase

42
00:03:16,749 --> 00:03:17,999
amount.

43
00:03:17,999 --> 00:03:22,709
That experiment gives better variation in
the resulting pipelines.

44
00:03:22,709 --> 00:03:27,859
After clicking “Pipeline comparison,”
we can see how the pipelines differ on various

45
00:03:27,859 --> 00:03:30,269
measures of model quality.

46
00:03:30,269 --> 00:03:35,720
The pipelines can be saved as Machine Learning
assets in the Watson Studio project.

47
00:03:35,720 --> 00:03:39,349
Then they can be deployed and tested.

48
00:03:39,349 --> 00:03:45,449
Currently, AutoAI is available only for classification
and regression models; there is a plan to

49
00:03:45,449 --> 00:03:48,760
add time series model support in the future.

50
00:03:48,760 --> 00:03:54,690
In this unit, you have learned how AutoAI
automates typical data science tasks and helps

51
00:03:54,690 --> 00:04:00,999
get better-performing data pipelines more
quickly, while also simplifying pipeline deployment

52
00:04:00,999 --> 00:04:04,709
into production in Watson Machine Learning.

53
00:04:04,709 --> 00:04:09,969
In the next section, we will discuss Watson
OpenScale, which helps to ensure that your

54
00:04:09,969 --> 00:04:13,640
models are fair, explainable, and up to date.