Most organizations have huge amounts of data stored in many forms in various locations. Finding relevant data quickly and connecting disparate data sources can be challenging and time-consuming. Watson Knowledge Catalog unites all information assets into a single metadata-rich catalog, based on Watson's understanding of the relationships between assets and how they're being used and socialized among users in existing projects. Let's have a look at the overview of the tool categories we've previously discussed. Watson Knowledge Catalog corresponds to the Data Asset Management, Code Asset Management, Data Management, and Data Integration and Transformation categories. Watson Knowledge Catalog is a data catalog that is integrated with an enterprise data governance platform. It also merges in the analytics capabilities of Watson Studio. The data catalog helps data scientists easily find, prepare, understand, and use the data they need. Watson Knowledge Catalog protects data from misuse and enables the sharing of assets with automated, dynamic masking of sensitive data elements. Data-profile visualizations, built-in charts, and statistics help users understand data assets. Seamless integration with Watson Studio helps data citizens put their data to work across a suite of powerful data science, AI, machine-learning, and deep-learning tools. Together with Watson Studio, it supports building, training, and deploying models.
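To make the idea of automated, dynamic masking of sensitive data elements more concrete, here is a minimal Python sketch. It is a hypothetical illustration, not how Watson Knowledge Catalog implements its policies: the classification labels (`Person Name`, `Gender`), the `apply_masking` function, and the choice to replace every character with `X` are all assumptions. "Dynamic" is modeled here as masking applied when rows are read, so the stored values are never changed.

```python
# Hypothetical policy: column classifications whose values must be masked.
SENSITIVE_CLASSES = {"Person Name", "Gender"}


def mask_value(value):
    """Replace every character of the value with 'X' (one simple masking choice)."""
    return "X" * len(str(value))


def apply_masking(rows, column_classes, sensitive=SENSITIVE_CLASSES):
    """Return masked copies of rows; the original rows are left untouched."""
    # Columns whose inferred classification falls under the policy.
    masked_columns = {col for col, cls in column_classes.items() if cls in sensitive}
    return [
        {col: mask_value(val) if col in masked_columns else val for col, val in row.items()}
        for row in rows
    ]


rows = [
    {"first_name": "Ana", "last_name": "Silva", "gender": "F", "city": "Porto"},
    {"first_name": "Ben", "last_name": "Okafor", "gender": "M", "city": "Lagos"},
]
# Classifications as a profiler might infer them (hypothetical labels).
column_classes = {"first_name": "Person Name", "last_name": "Person Name",
                  "gender": "Gender", "city": "City"}

for row in apply_masking(rows, column_classes):
    print(row)
# {'first_name': 'XXX', 'last_name': 'XXXXX', 'gender': 'X', 'city': 'Porto'}
# {'first_name': 'XXX', 'last_name': 'XXXXXX', 'gender': 'X', 'city': 'Lagos'}
```

Driving the mask from column classifications rather than hard-coded column names means a new data set is covered as soon as its columns are classified.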
Users can interactively discover, cleanse, and prepare data with the built-in Data Refinery. Connections to more than 30 IBM and third-party data sources let you catalog and use your data in its original locations. IBM Watson Knowledge Catalog has various deployment options on IBM Cloud™ and can run anywhere with IBM Cloud Pak™ for Data. The latter is a fully integrated data and AI platform built on the Red Hat® OpenShift® Container Platform. It can be deployed easily into any public or private cloud or other enterprise platforms. A catalog contains metadata about the contents of assets and how to access them, as well as a set of collaborators who need to use the assets for data analysis. The metadata is stored in an encrypted IBM Cloud Object Storage instance. Any data that you want to store in the cloud can be uploaded to the cloud object storage of your choice; you then specify that object storage when you create the catalog. This split between where the data's metadata is stored and the actual location of the data is important. It means that you can keep your data wherever it is. You don't need to move it into the catalog, because the catalog only contains metadata. You can have the data in on-premises data repositories, in other IBM cloud services like Cloudant or Db2 on Cloud, in non-IBM cloud services like Amazon or Azure, in streaming data services, or even in dark data sources like PDFs. Included in the metadata is how to access the data asset.
In other words, the metadata includes the location and credentials. That means that anyone who is a member of the catalog and has sufficient permissions can get to the data without knowing the credentials or having to create their own connection to the data. Since the new catalog is empty, let's take a look at an existing catalog. On the Browse Assets tab, you can see "recommendations", "highly rated assets", and "recently created assets", as well as a list of all the assets. You can type a search term to find assets, and you can filter by asset type, such as Data Asset or Notebook, or by tags that were assigned to the asset when it was added to the catalog. When you view an asset, you get a preview of the data and other information, like a description, ratings, tags, where the source is located, and any classifications. On the Access tab, those with permission can add members to view this particular asset. The Review tab shows reviews and lets you contribute a review. When assets are added to a catalog with Data Policies enabled, Watson Knowledge Catalog automatically profiles and classifies the content of the asset based on the values in its columns. The Profile tab contains more detailed information about the inferred classifications. You can see the other possible classifications for each column and the confidence scores for those alternatives.
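The split between metadata and data, and the search and filter behavior described above, can be sketched in a few lines of Python. This is not the Watson Knowledge Catalog API: the `CatalogEntry` class, the `search` function, the example locations, and the connection IDs are all hypothetical. Each entry holds only metadata (name, type, description, tags, source location, and a reference to stored credentials), never the data itself.

```python
from dataclasses import dataclass, field


@dataclass
class CatalogEntry:
    """Metadata about one asset; the data itself stays at its source location."""
    name: str
    asset_type: str                           # e.g. "Data Asset" or "Notebook"
    description: str = ""
    tags: set = field(default_factory=set)    # tags assigned when the asset was added
    location: str = ""                        # hypothetical URI of the source data
    connection_id: str = ""                   # hypothetical reference to stored credentials


def search(entries, term="", asset_type=None, tags=None):
    """Return entries matching a search term, an optional asset type, and all given tags."""
    term = term.lower()
    results = []
    for entry in entries:
        text = f"{entry.name} {entry.description}".lower()
        if term and term not in text:
            continue
        if asset_type is not None and entry.asset_type != asset_type:
            continue
        if tags and not set(tags) <= entry.tags:
            continue
        results.append(entry)
    return results


# Hypothetical example catalog: each entry points at data that stays where it is.
catalog = [
    CatalogEntry("customer_churn", "Data Asset", "Monthly churn by customer",
                 {"churn", "customers"}, "s3://example-bucket/churn.csv", "conn-aws-01"),
    CatalogEntry("churn_model", "Notebook", "Churn prediction notebook", {"churn"}),
    CatalogEntry("sales_2023", "Data Asset", "Quarterly sales figures",
                 {"sales"}, "db2://SALES/Q_SALES", "conn-db2-02"),
]

print([e.name for e in search(catalog, term="churn", asset_type="Data Asset")])
# ['customer_churn']
print([e.name for e in search(catalog, tags={"churn"})])
# ['customer_churn', 'churn_model']
```

Because an entry stores a connection reference instead of credentials or rows, a member with sufficient permissions can reach the data through the catalog without ever seeing the credentials.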
On the Lineage tab, you'll see the various events that Watson Knowledge Catalog has captured during the lifecycle of this data asset, allowing you to trace what's happened to the asset since it was created. On the Access Control tab, you can see the current list of catalog members. You can also add members, which is similar to adding collaborators in a project. Most catalog members will likely have the editor role. The viewer role is intentionally restricted, and only a select few users will have the admin role. Watson Knowledge Catalog includes capabilities to automatically mask sensitive data according to your organization's governance policies. For example, you can see in the diagram that the first name, last name, and gender data in the data set have been masked. You've learned how IBM Watson Knowledge Catalog can help organizations manage their many data and other assets. In the next video, we'll look at Data Refinery, a powerful tool for analyzing and preparing data.