1 00:00:00,025 --> 00:00:10,025 [MUSIC] 2 00:00:13,020 --> 00:00:19,010 In this video, I will introduce the teaching ratings data. 3 00:00:19,010 --> 00:00:23,670 We have been working with teaching ratings data from the University of Texas. 4 00:00:23,670 --> 00:00:27,580 The underlying question is whether students' teaching evaluations 5 00:00:27,580 --> 00:00:31,500 are influenced by the looks of individual instructors. 6 00:00:31,500 --> 00:00:36,400 Or you can ask if their teaching evaluations differ by gender, 7 00:00:36,400 --> 00:00:41,430 or if good-looking instructors get higher teaching evaluations. 8 00:00:43,750 --> 00:00:49,220 I obtained this data from Professor Daniel Hamermesh, who has written 9 00:00:49,220 --> 00:00:55,250 a paper about how beauty may impact an instructor's teaching evaluation. 10 00:00:55,250 --> 00:00:58,480 In fact, he has written an amazing book called Beauty Pays, 11 00:00:58,480 --> 00:01:00,780 in which he answers questions like these. 12 00:01:00,780 --> 00:01:05,251 For example, do you think good-looking employees get higher pay or 13 00:01:05,251 --> 00:01:07,000 faster promotions? 14 00:01:07,000 --> 00:01:11,900 Do you think good-looking instructors get higher teaching evaluations? 15 00:01:11,900 --> 00:01:13,560 The data comes from the University of Texas. 16 00:01:13,560 --> 00:01:15,470 It's a survey. 17 00:01:15,470 --> 00:01:20,115 The data was obtained from 463 courses. 18 00:01:21,990 --> 00:01:26,110 The data is first referenced in the book 19 00:01:26,110 --> 00:01:31,520 Getting Started with Data Science: Making Sense of Data with Analytics, in Chapter 4. 20 00:01:31,520 --> 00:01:36,050 There are variables that essentially define the attributes of instructors and 21 00:01:36,050 --> 00:01:37,600 the characteristics of the courses. 22 00:01:37,600 --> 00:01:42,030 Some variables are continuous; others are dichotomous or categorical variables. 
23 00:01:42,030 --> 00:01:46,350 The two primary variables of interest 24 00:01:46,350 --> 00:01:50,710 are the beauty score, which is basically the physical appearance 25 00:01:50,710 --> 00:01:55,140 of an instructor, which was rated by a panel of six students. 26 00:01:55,140 --> 00:01:59,190 And I think that they have normalized the beauty 27 00:01:59,190 --> 00:02:03,830 score such that it has a mean of zero and a variance of one, or 28 00:02:03,830 --> 00:02:08,259 a standard deviation of one, the same as a z-transformation. 29 00:02:08,259 --> 00:02:13,850 The dependent variable, the variable of interest, is evaluation, 30 00:02:13,850 --> 00:02:18,710 which is basically the teaching evaluation on a scale of 1 to 5, 31 00:02:18,710 --> 00:02:21,150 1 being very unsatisfactory, and 32 00:02:21,150 --> 00:02:25,830 if the student found the course to be excellent, then it's 5. 33 00:02:25,830 --> 00:02:29,210 And then there are other dichotomous or binary or 34 00:02:29,210 --> 00:02:32,460 categorical variables, such as minority, 35 00:02:32,460 --> 00:02:37,555 if the instructor was non-Caucasian. Age is a continuous variable, 36 00:02:37,555 --> 00:02:42,590 the professor's or instructor's age. Gender is male or female. 37 00:02:42,590 --> 00:02:45,640 Native stands for native English speaker. 38 00:02:45,640 --> 00:02:48,970 If the instructor was a native speaker of the English language, it is 1; 0 otherwise. 39 00:02:48,970 --> 00:02:53,930 If the professor was tenured, it is 1; 0 otherwise. 40 00:02:53,930 --> 00:02:58,150 I have produced some descriptive statistics for your reference. 41 00:02:58,150 --> 00:03:02,170 For instance, for the continuous variables such as age, beauty, 42 00:03:02,170 --> 00:03:06,230 teaching evaluation, the number of students who are enrolled in the course, 43 00:03:06,230 --> 00:03:09,388 and the number of students who completed the teaching evaluations, 
44 00:03:09,388 --> 00:03:13,530 I produced descriptive statistics such as the minimum, the maximum, the mean, 45 00:03:13,530 --> 00:03:14,944 and the standard deviation. 46 00:03:14,944 --> 00:03:19,242 For categorical variables such as gender, female yes or no, 47 00:03:19,242 --> 00:03:24,463 visible minority yes or no, and the person being a tenured professor or otherwise, 48 00:03:24,463 --> 00:03:27,541 I produced the frequency distributions and 49 00:03:27,541 --> 00:03:32,210 the percentage of individuals falling in one category or the other. 50 00:03:32,210 --> 00:03:37,585 Notice that the mean teaching evaluation score is 3.99, with a standard deviation of 51 00:03:37,585 --> 00:03:42,560 0.55. Let's see, if I were to produce a histogram 52 00:03:42,560 --> 00:03:47,730 of this variable, teaching evaluation, what will it look like with the raw data? 53 00:03:47,730 --> 00:03:52,060 And if I were to use the normal distribution and feed it the two parameters, 54 00:03:52,060 --> 00:03:56,640 that is, the mean and the standard deviation, what will the same 55 00:03:56,640 --> 00:04:00,260 distribution look like using a normal distribution? 56 00:04:00,260 --> 00:04:05,630 I am presenting here the distribution of the raw data on the left side and 57 00:04:05,630 --> 00:04:08,740 the presentation of the same data with the same parameters of the mean and 58 00:04:08,740 --> 00:04:11,220 standard deviation using a normal distribution. 59 00:04:11,220 --> 00:04:14,080 You can see that the data is not exactly following a bell curve. 60 00:04:14,080 --> 00:04:16,332 This is the raw data and it seldom does. 61 00:04:16,332 --> 00:04:19,615 But then the theoretical distribution looks like this. 62 00:04:19,615 --> 00:04:23,853 Essentially, the same data set with a mean of 3.998 and its standard deviation 63 00:04:23,853 --> 00:04:27,634 is presented here, and then a normal distribution drawn from these two 64 00:04:27,634 --> 00:04:30,453 parameters appears, so it's much smoother. 
65 00:04:30,453 --> 00:04:31,007 Theoretical distributions are much smoother than the raw data. 66 00:04:31,007 --> 00:04:31,507 [MUSIC]
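
The two ideas from the lecture, the z-transformation of the beauty score and the comparison of a raw histogram with a normal distribution built from the same mean and standard deviation, can be sketched in a few lines of Python. This is only an illustrative sketch, not the code used to produce the slides. The function names z_scores and histogram_vs_normal are hypothetical, and the scores below are synthetic values drawn to resemble the reported mean (3.998), standard deviation (0.55), and number of courses (463); they are not the actual teaching ratings data.

```python
import random
from statistics import NormalDist, mean, stdev


def z_scores(values):
    """Standardize values so they have mean 0 and sample standard deviation 1."""
    m = mean(values)
    s = stdev(values)
    return [(v - m) / s for v in values]


def histogram_vs_normal(values, bins=8):
    """Return (left, right, observed, expected) for each equal-width bin.

    'observed' is the raw count in the bin; 'expected' is the count a normal
    distribution with the same mean and standard deviation would place there.
    """
    m, s = mean(values), stdev(values)
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins
    counts = [0] * bins
    for v in values:
        # The maximum value would fall just past the last bin, so clamp it.
        idx = min(int((v - lo) / width), bins - 1)
        counts[idx] += 1
    dist = NormalDist(m, s)
    n = len(values)
    rows = []
    for i, observed in enumerate(counts):
        left = lo + i * width
        right = left + width
        expected = n * (dist.cdf(right) - dist.cdf(left))
        rows.append((left, right, observed, expected))
    return rows


# Synthetic stand-in for the evaluation variable (NOT the real data):
# 463 draws around 3.998 with spread 0.55, clipped to the 1-to-5 scale.
random.seed(0)
scores = [min(5.0, max(1.0, random.gauss(3.998, 0.55))) for _ in range(463)]

# Print raw counts next to the smoother counts implied by the normal curve.
for left, right, observed, expected in histogram_vs_normal(scores):
    print(f"{left:4.2f}-{right:4.2f}  raw={observed:3d}  normal={expected:6.1f}")
```

The raw counts jump around from bin to bin, while the counts implied by the normal curve change smoothly, which is the contrast described in the video. Calling z_scores on any continuous variable gives the kind of mean-zero, unit-variance score the lecture attributes to the beauty variable.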