1 00:00:07,620 --> 00:00:12,590 Since we’ve previously covered open source tools for data science, let’s look at the 2 00:00:12,590 --> 00:00:17,110 commercial options you’ll find in many enterprise projects. 3 00:00:17,110 --> 00:00:20,890 Take another look at the overview of different tool categories. 4 00:00:20,890 --> 00:00:26,230 Since cloud products are a newer species, they follow the trend of having multiple tasks 5 00:00:26,230 --> 00:00:28,710 integrated in tools. 6 00:00:28,710 --> 00:00:33,399 This especially holds true for the tasks marked green in the diagram. 7 00:00:33,399 --> 00:00:38,450 Let’s start with the fully integrated visual tools category. 8 00:00:38,450 --> 00:00:43,910 Since these tools introduce a component where large scale execution of data science workflows 9 00:00:43,910 --> 00:00:50,379 happens in compute clusters, we’ve changed the title here and added the word “Platform.” 10 00:00:50,379 --> 00:00:55,739 These clusters are composed of multiple server machines, transparently for the user, in the 11 00:00:55,739 --> 00:00:57,159 background. 12 00:00:57,159 --> 00:01:02,780 Watson Studio, together with Watson OpenScale, covers the complete development life cycle 13 00:01:02,780 --> 00:01:08,320 for all data science, machine learning, and AI tasks. 14 00:01:08,320 --> 00:01:12,030 Another example is Microsoft Azure Machine Learning. 15 00:01:12,030 --> 00:01:16,900 This is also a fully cloud-hosted offering supporting the complete development life cycle 16 00:01:16,900 --> 00:01:22,030 of all data science, machine learning, and AI tasks. 17 00:01:22,030 --> 00:01:28,130 And finally, another example is H2O Driverless AI, which we’ve already introduced in the 18 00:01:28,130 --> 00:01:29,670 last video. 19 00:01:29,670 --> 00:01:34,920 Although it is a product that you download and install, one-click deployment is available 20 00:01:34,920 --> 00:01:37,910 for the common cloud service providers. 
21 00:01:37,910 --> 00:01:42,670 Since operations and maintenance are not done by the cloud provider, as is the case with 22 00:01:42,670 --> 00:01:48,540 Watson Studio, OpenScale, and Azure Machine Learning, this delivery model should not be 23 00:01:48,540 --> 00:01:56,780 confused with Platform or Software as a Service -- PaaS or SaaS. 24 00:01:56,780 --> 00:02:03,400 In data management, with some exceptions, there are SaaS versions of existing open source 25 00:02:03,400 --> 00:02:04,400 and commercial tools. 26 00:02:04,400 --> 00:02:09,410 Remember, SaaS stands for “software as a service.” 27 00:02:09,410 --> 00:02:14,950 It means that the cloud provider operates the tool for you in the cloud. 28 00:02:14,950 --> 00:02:20,900 As an example, the cloud provider operates the product by backing up your data and configuration 29 00:02:20,900 --> 00:02:23,220 and installing updates. 30 00:02:23,220 --> 00:02:29,290 As mentioned, there is proprietary tooling, which is only available as a cloud product. 31 00:02:29,290 --> 00:02:32,969 Sometimes it’s only available from a single cloud provider. 32 00:02:32,969 --> 00:02:40,239 One example of such a service is Amazon Web Services DynamoDB, a NoSQL database that allows 33 00:02:40,239 --> 00:02:46,590 storage and retrieval of data in a key-value or a document store format. 34 00:02:46,590 --> 00:02:51,400 The most prominent document data structure is JSON (pronounced “jay-sun”). 35 00:02:51,400 --> 00:02:56,540 Another flavour of such a service is Cloudant, which is a database-as-a-service offering. 36 00:02:56,540 --> 00:03:02,010 But, under the hood it is based on the open source Apache CouchDB. 37 00:03:02,010 --> 00:03:08,439 It has an advantage: although complex operational tasks like updating, backup, restore, and 38 00:03:08,439 --> 00:03:14,079 scaling are done by the cloud provider, under the hood this offering is compatible with 39 00:03:14,079 --> 00:03:15,209 CouchDB. 
40 00:03:15,209 --> 00:03:21,260 Therefore, the application can be migrated to another CouchDB server without changing 41 00:03:21,260 --> 00:03:22,510 the application. 42 00:03:22,510 --> 00:03:28,049 And IBM offers Db2 as a service as well. 43 00:03:28,049 --> 00:03:32,720 This is an example of a commercial database made available as a software-as-a-service 44 00:03:32,720 --> 00:03:39,299 offering in the cloud, taking operational tasks away from the user. 45 00:03:39,299 --> 00:03:44,200 When it comes to commercial data integration tools, we talk not only about “extract, 46 00:03:44,200 --> 00:03:51,230 transform, and load,” or “ETL” tools, but also about “extract, load, and transform,” 47 00:03:51,230 --> 00:03:53,980 or “ELT,” tools. 48 00:03:53,980 --> 00:03:59,669 This means the transformation steps are not done by a data integration team but are pushed 49 00:03:59,669 --> 00:04:04,409 towards the domain of the data scientist or data engineer. 50 00:04:04,409 --> 00:04:10,180 Two widely used commercial data integration tools are Informatica Cloud Data Integration 51 00:04:10,180 --> 00:04:13,059 and IBM’s Data Refinery. 52 00:04:13,059 --> 00:04:19,079 Data Refinery enables transformation of large amounts of raw data into consumable, quality 53 00:04:19,079 --> 00:04:23,130 information in a spreadsheet-like user interface. 54 00:04:23,130 --> 00:04:26,940 Data Refinery is part of IBM Watson Studio. 55 00:04:26,940 --> 00:04:32,930 The market for cloud data visualization tools is huge, and every major cloud vendor has 56 00:04:32,930 --> 00:04:33,930 one. 57 00:04:33,930 --> 00:04:39,450 An example of a smaller company’s cloud-based data visualization tool is DataMeer. 58 00:04:39,450 --> 00:04:46,449 IBM offers its famous Cognos Business Intelligence suite as a cloud solution as well. 59 00:04:46,449 --> 00:04:52,729 IBM Data Refinery also offers data exploration and visualization functionality in Watson 60 00:04:52,729 --> 00:04:53,759 Studio. 
61 00:04:53,759 --> 00:05:00,330 Again, these are just some examples of a rapidly changing and growing commercial ecosystem 62 00:05:00,330 --> 00:05:05,970 among a huge number of established and emerging vendors. 63 00:05:05,970 --> 00:05:11,750 In Watson Studio, an abundance of different visualizations can be used to better understand 64 00:05:11,750 --> 00:05:12,750 data. 65 00:05:12,750 --> 00:05:18,280 For example, this 3D bar chart enables you to visualize a target value on the vertical 66 00:05:18,280 --> 00:05:24,060 dimension, which is dependent on two other values on the horizontal dimensions. 67 00:05:24,060 --> 00:05:28,930 Coloring enables you to visualize a third dimension. 68 00:05:28,930 --> 00:05:35,470 Hierarchical edge bundling enables you to visualize correlations and affiliations between 69 00:05:35,470 --> 00:05:36,470 entities. 70 00:05:36,470 --> 00:05:42,409 If sufficient, a classic bar chart can do the job as well, whereas a 2D scatter plot 71 00:05:42,409 --> 00:05:50,729 with a heat map shows two dependent data fields, one on the y axis and one as color intensity. 72 00:05:50,729 --> 00:05:57,080 A tree map shows distribution of subsets within a set, the famous pie chart does the same 73 00:05:57,080 --> 00:06:03,610 but in a non-hierarchical manner, and finally, a word cloud pops out significant terms in 74 00:06:03,610 --> 00:06:05,870 a document corpus. 75 00:06:05,870 --> 00:06:10,310 Model building can be done using a service such as Watson Machine Learning. 76 00:06:10,310 --> 00:06:16,379 Watson Machine Learning can train and build models using various open source libraries. 77 00:06:16,379 --> 00:06:21,500 Google has a similar service on their cloud called AI Platform Training. 78 00:06:21,500 --> 00:06:26,949 Nearly every cloud provider has a solution for this task. 
79 00:06:26,949 --> 00:06:32,199 Model deployment in commercial software is usually tightly integrated with the model building 80 00:06:32,199 --> 00:06:33,250 process. 81 00:06:33,250 --> 00:06:38,879 Here is an example of the SPSS Collaboration and Deployment Services, which can be used 82 00:06:38,879 --> 00:06:44,639 to deploy any type of asset created by the SPSS software tools suite. 83 00:06:44,639 --> 00:06:47,240 The same holds for other vendors. 84 00:06:47,240 --> 00:06:52,280 In addition, commercial software can export models in an open format. 85 00:06:52,280 --> 00:06:58,670 As an example, SPSS Modeler supports exporting models as Predictive Model Markup Language, 86 00:06:58,670 --> 00:07:05,970 or “PMML,” which can be read by numerous other commercial and open software packages. 87 00:07:05,970 --> 00:07:11,340 Watson Machine Learning can also be used to deploy a model and make it available to consumers 88 00:07:11,340 --> 00:07:13,950 using a REST interface. 89 00:07:13,950 --> 00:07:19,479 Amazon SageMaker Model Monitor is an example of a cloud tool that continuously monitors 90 00:07:19,479 --> 00:07:23,340 deployed machine learning and deep learning models. 91 00:07:23,340 --> 00:07:27,820 Again, every major cloud provider has similar tooling. 92 00:07:27,820 --> 00:07:31,310 This is also the case for Watson OpenScale. 93 00:07:31,310 --> 00:07:33,289 OpenScale and Watson Studio… 94 00:07:33,289 --> 00:07:35,520 …unify the landscape. 95 00:07:35,520 --> 00:07:40,139 Everything marked in green can be done using Watson Studio and Watson OpenScale. 96 00:07:40,139 --> 00:07:44,020 We’ll cover OpenScale in a later video. 97 00:07:44,020 --> 00:07:49,039 You’ve learned how the most common tasks in data science are supported by commercial 98 00:07:49,039 --> 00:07:51,340 cloud tools. 99 00:07:51,340 --> 00:07:56,650 Integration provides us the ability to use the same tools for multiple tasks. 
100 00:07:56,650 --> 00:08:03,319 In the next videos, we’ll look at packages, APIs, datasets, and models for data science.