1
00:00:07,680 --> 00:00:13,410
In this video, we will take a look at an easy-to-use,
graphical way to build machine learning models

2
00:00:13,410 --> 00:00:15,260
and pipelines.

3
00:00:15,260 --> 00:00:21,960
SPSS Modeler Flows, a part of Watson Studio,
was inspired by another product, IBM

4
00:00:21,960 --> 00:00:23,880
SPSS Modeler.

5
00:00:23,880 --> 00:00:26,760
We'll discuss that product in a later unit.

6
00:00:26,760 --> 00:00:31,420
Let’s have a look again at the overview
of different tool categories.

7
00:00:31,420 --> 00:00:37,539
Modeler flows include some data management
capabilities, as well as tools for data preparation,

8
00:00:37,539 --> 00:00:40,540
visualization, and model building.

9
00:00:40,540 --> 00:00:45,410
All flows are created using a drag-and-drop
editor and consist of “nodes” of various

10
00:00:45,410 --> 00:00:51,470
types, with data “flowing” from one node
to the next according to their connections.

11
00:00:51,470 --> 00:00:57,110
A sample Modeler flow shown here includes
two data source nodes shown in purple on the

12
00:00:57,110 --> 00:01:04,180
left; type, aggregate, filter, merge, filler,
and partition nodes in the middle; and two model

13
00:01:04,180 --> 00:01:06,969
building nodes shown as pentagons.

14
00:01:06,969 --> 00:01:12,549
Once a flow is executed and the models are
built, the upside-down pentagon “model nuggets”

15
00:01:12,549 --> 00:01:13,979
are created.

16
00:01:13,979 --> 00:01:18,330
They can be used to see information about
the models and to get predictions for new

17
00:01:18,330 --> 00:01:19,820
data.

18
00:01:19,820 --> 00:01:25,040
And the three green square nodes on the right
provide model evaluation information in the

19
00:01:25,040 --> 00:01:28,400
form of tables and charts.

20
00:01:28,400 --> 00:01:34,010
You can build your SPSS Modeler flows by dragging
different types of nodes from the “palette,” the

21
00:01:34,010 --> 00:01:39,830
part of the screen on the left, to the
“canvas,” the main part of the screen.

22
00:01:39,830 --> 00:01:44,900
Each flow starts with one or more data sources
located in the “Import” group, and can

23
00:01:44,900 --> 00:01:48,480
include some or all other types of nodes.

24
00:01:48,480 --> 00:01:52,710
Watson Studio provides some sample flows to
help new users.

25
00:01:52,710 --> 00:01:58,610
In the Drug Study example shown here, we are
using a small artificial data set.

26
00:01:58,610 --> 00:02:04,000
The target variable is a categorical field,
“Drug,” that has five categories, and

27
00:02:04,000 --> 00:02:06,790
there are several predictor variables.

28
00:02:06,790 --> 00:02:12,040
This flow creates a new “derived” field
by dividing the values of one of the predictors

29
00:02:12,040 --> 00:02:18,430
by values of another one, and at the end builds
a small neural network model and a decision

30
00:02:18,430 --> 00:02:19,450
tree model.

31
00:02:19,450 --> 00:02:24,569
When a user clicks the “Run” button on
the top panel, denoted by a triangle, the

32
00:02:24,569 --> 00:02:27,659
flow is executed and the models are built.

33
00:02:27,659 --> 00:02:32,849
This is reflected in the new pentagon nodes,
called “model nuggets,” that display under

34
00:02:32,849 --> 00:02:34,939
each model node.

35
00:02:34,939 --> 00:02:39,769
If you click on the three dots in the upper
right corner of one of those nodes and select

36
00:02:39,769 --> 00:02:44,310
“View Model,” you will see various types
of model information.

37
00:02:44,310 --> 00:02:50,519
By connecting new data sources to the model
nugget, you can get predictions on new data.

38
00:02:50,519 --> 00:02:55,629
The first window in the model viewer shows
model accuracy and related measures, such

39
00:02:55,629 --> 00:02:58,219
as precision and recall.

40
00:02:58,219 --> 00:03:04,239
This toy data example enabled us to get perfect
accuracy, which is normally not the case with

41
00:03:04,239 --> 00:03:06,110
real-life data.

42
00:03:06,110 --> 00:03:11,180
The Confusion Matrix view shows how model
predictions on the training data matched the

43
00:03:11,180 --> 00:03:13,950
observed target values.

44
00:03:13,950 --> 00:03:18,819
Once again, in this toy example all cases
were classified correctly.

45
00:03:18,819 --> 00:03:24,239
We can also look at Model Information, which
displays a table that tells us more about

46
00:03:24,239 --> 00:03:26,379
the details of the model.

47
00:03:26,379 --> 00:03:31,599
Feature Importance displays a diagram that
indicates the relative predictive strength

48
00:03:31,599 --> 00:03:33,790
of various model inputs.

49
00:03:33,790 --> 00:03:39,249
Finally, the Network Diagram gives a visual
representation of the neural network model

50
00:03:39,249 --> 00:03:41,169
we built.

51
00:03:41,169 --> 00:03:46,069
On the left is the input layer, with units
corresponding to each continuous predictor

52
00:03:46,069 --> 00:03:52,809
and each category of the categorical predictors,
plus a bias unit that is usually present in

53
00:03:52,809 --> 00:03:55,549
each layer of a neural network.

54
00:03:55,549 --> 00:04:02,199
In the middle, we see a “hidden layer”
with 7 units, or neurons, and a bias unit.

55
00:04:02,199 --> 00:04:09,299
On the right is the output layer with 5 units
corresponding to the five target categories.

56
00:04:09,299 --> 00:04:14,909
Controls on the right and bottom of the diagram
enable some interactive exploration of the

57
00:04:14,909 --> 00:04:15,909
model.

58
00:04:15,909 --> 00:04:21,010
The colors of the connections between units
indicate the values of the weights on those

59
00:04:21,010 --> 00:04:22,290
connections.

60
00:04:22,290 --> 00:04:27,430
We can also look at the decision tree model
built using the C5.0 algorithm.

61
00:04:27,430 --> 00:04:32,460
A Model Information table and Feature Importance
chart appear as before.

62
00:04:32,460 --> 00:04:37,720
Additionally, a Top Decision Rules table is
displayed.

63
00:04:37,720 --> 00:04:42,220
Decision tree models are popular because they
have a special structure that makes it easy

64
00:04:42,220 --> 00:04:46,319
to explain predictions or extract decision
rules.

65
00:04:46,319 --> 00:04:50,430
The tree diagram is also displayed.

66
00:04:50,430 --> 00:04:55,229
On the left side of the canvas, we see a part
of the model palette that can be used in the

67
00:04:55,229 --> 00:04:56,229
flows.

68
00:04:56,229 --> 00:05:01,990
At the top are “Auto Classifier” and “Auto
Numeric” nodes that can be used for categorical

69
00:05:01,990 --> 00:05:05,380
and continuous targets, respectively.

70
00:05:05,380 --> 00:05:11,689
Those nodes will build several kinds of models
and pick the best one based on a certain criterion.

71
00:05:11,689 --> 00:05:18,740
Later, we will talk about the AutoAI feature
of Watson Studio; AutoAI takes this capability

72
00:05:18,740 --> 00:05:24,669
to the next level by automatically finding
not only the best model, but an entire data

73
00:05:24,669 --> 00:05:29,139
pipeline, which includes various data transformations.

74
00:05:29,139 --> 00:05:34,069
In this video, you've learned how Modeler
Flows in Watson Studio can help analysts to

75
00:05:34,069 --> 00:05:39,509
create powerful machine learning pipelines
using a graphical interface without the need

76
00:05:39,509 --> 00:05:41,699
to write any code.

77
00:05:41,699 --> 00:05:45,410
This feature was based on IBM SPSS Modeler.

78
00:05:45,410 --> 00:05:51,560
Next, after completing a lab to give you hands-on
experience with this powerful technology,

79
00:05:51,560 --> 00:05:58,629
we will take a look at two other IBM products
that can be used for Data Science: IBM SPSS

80
00:05:58,629 --> 00:06:01,840
Modeler and IBM SPSS Statistics.