1
00:00:06,260 --> 00:00:09,750
Let us begin with the fundamentals of visualization.

2
00:00:09,750 --> 00:00:13,175
What you see on the right side is a picture graph or

3
00:00:13,175 --> 00:00:17,550
a bar chart prepared by students in a kindergarten class,

4
00:00:17,550 --> 00:00:19,200
four-year-olds at the St.

5
00:00:19,200 --> 00:00:21,240
Luke Catholic school in Mississauga.

6
00:00:21,240 --> 00:00:23,130
This bar chart presents

7
00:00:23,130 --> 00:00:27,180
the frequency of the transportation modes

8
00:00:27,180 --> 00:00:28,810
the kids used to come to school.

9
00:00:28,810 --> 00:00:30,990
Some were dropped off at school by car,

10
00:00:30,990 --> 00:00:34,110
others came by bus, and two students walked to school.

11
00:00:34,110 --> 00:00:37,465
In the brave, big world of big data,

12
00:00:37,465 --> 00:00:39,950
we can see that children are being trained at

13
00:00:39,950 --> 00:00:43,175
the earliest possible age to work with data and data science.

14
00:00:43,175 --> 00:00:45,560
The most important thing to realize is

15
00:00:45,560 --> 00:00:48,230
that the type of visualization

16
00:00:48,230 --> 00:00:50,720
you will use depends upon the type of

17
00:00:50,720 --> 00:00:53,525
variables you're trying to analyze.

18
00:00:53,525 --> 00:00:55,265
For instance, if you're working with

19
00:00:55,265 --> 00:00:57,830
categorical variables such as gender,

20
00:00:57,830 --> 00:00:59,840
you have to rely on

21
00:00:59,840 --> 00:01:02,900
a different type of charting tool than if you

22
00:01:02,900 --> 00:01:05,239
were working with continuous variables

23
00:01:05,239 --> 00:01:08,200
such as age and income.

24
00:01:08,200 --> 00:01:10,750
In one case, you may use bar charts;

25
00:01:10,750 --> 00:01:12,155
in another, you will be

26
00:01:12,155 --> 00:01:14,270
required to use scatter plots.
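To make the point about variable types concrete, here is a minimal Matplotlib sketch. It assumes a small, made-up dataset: only the two walkers come from the lecture, while the car and bus counts and the age and income values are hypothetical. The categorical variable goes on a bar chart and the two continuous variables go on a scatter plot.

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen so the sketch runs without a display
import matplotlib.pyplot as plt

# Categorical variable: mode of transport to school.
# Only "Walk": 2 is taken from the lecture; the other counts are invented.
transport_counts = {"Car": 9, "Bus": 6, "Walk": 2}

# Continuous variables: hypothetical ages (years) and incomes (thousands).
ages = [23, 31, 38, 45, 52, 60]
incomes = [32, 48, 55, 71, 80, 76]

fig, (ax_bar, ax_scatter) = plt.subplots(1, 2, figsize=(10, 4))

# Categorical variable -> bar chart of frequencies per category.
ax_bar.bar(list(transport_counts.keys()), list(transport_counts.values()))
ax_bar.set_xlabel("Mode of transport")
ax_bar.set_ylabel("Number of students")
ax_bar.set_title("Categorical: bar chart")

# Two continuous variables -> scatter plot, one point per observation.
ax_scatter.scatter(ages, incomes)
ax_scatter.set_xlabel("Age (years)")
ax_scatter.set_ylabel("Income (thousands)")
ax_scatter.set_title("Continuous: scatter plot")

fig.tight_layout()
fig.savefig("chart_types.png")
```

The file name `chart_types.png` is arbitrary; in a notebook you would typically drop the `matplotlib.use("Agg")` line and let the figure display inline.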
27
00:01:14,270 --> 00:01:17,180
I would like to draw your attention to

28
00:01:17,180 --> 00:01:19,790
the Extreme Presentation Method

29
00:01:19,790 --> 00:01:22,835
developed by Dr. Andrew Abela.

30
00:01:22,835 --> 00:01:26,600
Essentially, it lays out the possible ways of depicting

31
00:01:26,600 --> 00:01:28,370
data based on what kind of

32
00:01:28,370 --> 00:01:30,695
variables you have at your disposal.

33
00:01:30,695 --> 00:01:32,765
This graphic,

34
00:01:32,765 --> 00:01:34,820
developed by Dr. Abela,

35
00:01:34,820 --> 00:01:36,545
shows that if you are interested in

36
00:01:36,545 --> 00:01:38,869
comparing variables or demonstrating

37
00:01:38,869 --> 00:01:41,300
their distribution, their composition, or

38
00:01:41,300 --> 00:01:44,365
the relationship between two or more variables,

39
00:01:44,365 --> 00:01:46,790
you have to rely on a specific type of graph.

40
00:01:46,790 --> 00:01:50,150
If you're comparing items with only a few categories,

41
00:01:50,150 --> 00:01:53,300
you can use bar charts or column charts.

42
00:01:53,300 --> 00:01:56,690
If you are comparing behavior over time and you

43
00:01:56,690 --> 00:02:01,040
have time periods running into several months,

44
00:02:01,040 --> 00:02:02,880
you may have to use a line chart.

45
00:02:02,880 --> 00:02:04,985
If there are not that many time periods,

46
00:02:04,985 --> 00:02:08,080
then you can use column charts and other approaches.

47
00:02:08,080 --> 00:02:10,100
If you're trying to depict the relationship

48
00:02:10,100 --> 00:02:12,630
between two continuous variables,

49
00:02:12,630 --> 00:02:14,335
your choice is a scatter plot.

50
00:02:14,335 --> 00:02:18,110
A bubble chart depicts two variables on the x and

51
00:02:18,110 --> 00:02:20,720
y axes, and a third variable

52
00:02:20,720 --> 00:02:22,610
is depicted by the size of the circle.

53
00:02:22,610 --> 00:02:24,680
So you can essentially show three variables.
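The bubble chart described above can be sketched with Matplotlib's `scatter`, whose `s` argument sets each marker's area in points squared. The data and the scaling factor of 80 are hypothetical, chosen only so the bubbles are visible; they do not come from the lecture.

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen so the sketch runs without a display
import matplotlib.pyplot as plt

# Hypothetical data: three variables per observation.
ages = [23, 31, 38, 45, 52, 60]          # x axis
incomes = [32, 48, 55, 71, 80, 76]       # y axis
household_sizes = [1, 2, 4, 3, 5, 2]     # third variable, shown as bubble size

# Scale the third variable into marker areas (points^2).
# The factor 80 is an arbitrary choice for readability.
bubble_areas = [size * 80 for size in household_sizes]

fig, ax = plt.subplots(figsize=(6, 4))
bubbles = ax.scatter(ages, incomes, s=bubble_areas, alpha=0.5)
ax.set_xlabel("Age (years)")
ax.set_ylabel("Income (thousands)")
ax.set_title("Bubble chart: size shows household size")

fig.tight_layout()
fig.savefig("bubble_chart.png")
```

Because `s` is an area rather than a radius, a household of four is drawn with four times the area of a household of one, which is usually how readers judge bubble sizes.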
54
00:02:24,680 --> 00:02:26,060
If you're interested in depicting

55
00:02:26,060 --> 00:02:27,995
the distribution of a dataset,

56
00:02:27,995 --> 00:02:29,675
you can use a histogram,

57
00:02:29,675 --> 00:02:31,600
which could be a bar-chart type of

58
00:02:31,600 --> 00:02:34,060
histogram or a line histogram.

59
00:02:34,060 --> 00:02:35,540
The distribution could also be

60
00:02:35,540 --> 00:02:37,365
shown through scatter plots.

61
00:02:37,365 --> 00:02:39,105
If you would like to show composition,

62
00:02:39,105 --> 00:02:40,490
then if the data is static,

63
00:02:40,490 --> 00:02:41,840
you can use pie charts.

64
00:02:41,840 --> 00:02:44,180
If you're showing composition that

65
00:02:44,180 --> 00:02:46,700
changes over time for a few periods,

66
00:02:46,700 --> 00:02:49,070
you can use stacked columns.

67
00:02:49,070 --> 00:02:51,065
If you have several periods,

68
00:02:51,065 --> 00:02:54,245
you can use stacked area charts.

69
00:02:54,245 --> 00:02:56,225
For hands-on work in Python,

70
00:02:56,225 --> 00:02:58,610
we will be using the Seaborn library and

71
00:02:58,610 --> 00:03:00,350
the Matplotlib library to

72
00:03:00,350 --> 00:03:02,755
create visualizations in the labs.

73
00:03:02,755 --> 00:03:05,390
We will learn how to use different functions within

74
00:03:05,390 --> 00:03:08,850
these libraries to create different kinds of charts.