Let me illustrate how to obtain the probability of getting a high or low teaching evaluation score from our dataset. First, an important concept is the standardization of a variable, which rescales the data so that it has a mean of zero and a standard deviation of one. I use the equation shown here, Z = (X - μ) / σ: standardization takes a variable X, subtracts the average value μ from it, and then divides the result by the standard deviation σ. So if the teaching evaluation score of an instructor, on a scale of one to five, is 4.5, we subtract the average teaching evaluation of 3.998 from it and divide by the standard deviation, which is 0.554, resulting in a Z score of 0.906. If we were to display the standardized data as a histogram, you would see that it has a mean around zero, and the spread is shown on a scale where the X axis varies from minus three to two. If you do not have access to a computer with statistical software, you can still compute probabilities using a standard normal table, which you can find in statistics textbooks or download online.
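The standardization step described above can be sketched in a few lines of Python. This is a minimal illustration, not the lecture's own code: the list of scores is hypothetical, and it assumes the sample standard deviation (`statistics.stdev`), since the lecture does not say whether the sample or population version was used.

```python
import statistics

def z_score(x, mu, sigma):
    """Standardize x: subtract the mean, then divide by the standard deviation."""
    return (x - mu) / sigma

# Summary statistics quoted in the lecture for the teaching evaluation scores.
MU = 3.998
SIGMA = 0.554

z = z_score(4.5, MU, SIGMA)
print(round(z, 3))  # 0.906

# Standardizing a whole sample (hypothetical scores, for illustration only).
scores = [3.2, 4.1, 4.5, 3.8, 4.9, 3.5]
m = statistics.mean(scores)
s = statistics.stdev(scores)  # assumption: sample standard deviation
standardized = [z_score(x, m, s) for x in scores]

# After standardizing, the mean is (numerically) 0 and the standard deviation is 1.
print(statistics.mean(standardized), statistics.stdev(standardized))
```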
A copy of a standard normal table is on the right. Notice that the normal distribution graph to the left is shaded gray in some parts. That gray area represents the probability of getting a value less than or equal to some value z. To determine the probability of a teaching evaluation score higher or lower than 4.5, we first need to standardize the variable. Let's say we have a dataset where the average teaching evaluation is 3.998 and the standard deviation is 0.554, and we are interested in determining the probability of getting a teaching evaluation score of 4.5 or less. We can determine this from the table that I showed in the last slide. Standardizing 4.5 gives a Z score of 0.906, but the table is only accurate to two decimal places, so 0.906 effectively becomes 0.91. Looking up 0.91 in the table gives 0.8186; hence, the probability of obtaining a teaching evaluation score of 4.5 or less is 0.8186.
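To check a table lookup like the one above, the standard normal CDF can be computed from Python's standard library using the identity Φ(z) = ½(1 + erf(z/√2)). This sketch uses `math.erf` as a stand-in for the printed table; it is an illustration, not the procedure from the lecture. The helper name `standard_normal_cdf` is hypothetical.

```python
import math

def standard_normal_cdf(z):
    """P(Z <= z) for a standard normal Z, computed with the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

# Standardize 4.5 using the mean and standard deviation from the lecture.
z = (4.5 - 3.998) / 0.554
z_table = round(z, 2)  # the table only resolves two decimals, so 0.906 -> 0.91

p = standard_normal_cdf(z_table)
print(z_table, f"{p:.4f}")  # 0.91 0.8186

# Reproduce the table row for 0.9: the columns are the second decimal, 0.00 to 0.09.
row = [standard_normal_cdf(0.9 + col / 100) for col in range(10)]
print(" ".join(f"{v:.4f}" for v in row))
```

Reading the table works the same way: find the row for 0.9, then the column for 0.01, which corresponds to `row[1]` here.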
If you look at this graphic, you will see that I have shaded the area under the curve in gray. That area depicts the probability of an instructor receiving a teaching evaluation of less than or equal to 4.5. Computed from the unrounded Z score of 0.906, rather than from the table value for 0.91, that probability is 0.8176, or 81.76 percent. Now, what is the probability of receiving a teaching evaluation score greater than 4.5? You can see from the next graphic that this probability is the complement of the one we saw earlier: the probability of obtaining a teaching evaluation score greater than 4.5 is 18.24 percent, which is the area shaded in gray. This is because the total area under the normal distribution curve is equal to one, so one minus 0.8176 gives you the area for evaluation scores greater than 4.5. Let me illustrate this example of getting a teaching evaluation score greater than 4.5 in Python, where we use the norm.cdf function from the scipy.stats package. After finding the mean and standard deviation, we plug them into the function along with an X value of 4.5.
This gives us the area to the left, which is the area for scores of 4.5 or less. Because we want the area to the right of 4.5, that is, the probability of a score greater than 4.5, we subtract that value from one, as indicated here.
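The scipy.stats steps just described might look like the following. This is a sketch, not the code on the slide: it hard-codes the mean and standard deviation quoted in the lecture instead of computing them from the dataset, and it adds a `norm.sf` call and a rounded-table comparison that the lecture does not show.

```python
from scipy.stats import norm

# Summary statistics quoted in the lecture (assumed here, not computed from data).
mean = 3.998
std = 0.554
x = 4.5

# norm.cdf gives P(X <= x) for a normal distribution with this loc (mean) and scale (std).
p_less = norm.cdf(x, loc=mean, scale=std)

# The total area under the curve is 1, so the right tail is the complement.
p_greater = 1 - p_less

# norm.sf (the survival function) computes 1 - cdf directly.
p_greater_sf = norm.sf(x, loc=mean, scale=std)

# For comparison: the table lookup uses z rounded to two decimals (0.91).
z = (x - mean) / std
p_table = norm.cdf(round(z, 2))

print(f"{p_less:.4f} {p_greater:.4f} {p_greater_sf:.4f} {p_table:.4f}")
# 0.8176 0.1824 0.1824 0.8186
```

The last value shows why the table gives 0.8186 while software gives 0.8176: the table rounds z from 0.906 up to 0.91 before the lookup.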