One needs to understand the theory behind hypothesis testing and how you decide whether or not to reject a null hypothesis. There are rules of thumb. At the 5 percent significance level, for a two-tailed test one can use 1.96 as the critical value for either the Z or the T statistic (for T, this holds approximately in large samples), rejecting the null hypothesis when the absolute value of the statistic exceeds it. For a one-tailed test, the threshold is an absolute value of 1.64. But what does this mean, and how do you get to these values of 1.64 and 1.96? There is some theory to it. It involves statistical distributions, and now is a good time to learn about those.

Now imagine that the mean values of two variables are the same; that is, we assume that the difference between the two means is essentially zero. Let's say we have the mean of variable A and the mean of variable B, and we assume that they are equal, that is, mu_A equals mu_B. If that were the case, the difference between the two would be zero. So the alternative hypothesis could be that the mean difference is not equal to zero.
That is, you would say that mu_A is not equal to mu_B. Alternatively, the difference could be greater than zero, that is, mu_A is greater than mu_B, or the difference could be less than zero, where mu_A is less than mu_B. In each of these three circumstances, the rejection region, that is, how you reject the null hypothesis, means something different. For this, we revert to the normal distribution curve.

Imagine that you're conducting a Z-test using a normal distribution. The picture would look very similar for a two-tailed test using the T-distribution. But let's assume that we are working with the normal distribution and that the rejection region lies to the left and to the right of the curve, in the tails of the normal distribution curve you saw earlier. Let's say our alternative hypothesis is that the mean difference is not equal to zero. It could be greater than or less than zero, but it's not equal to zero. We call this a two-tailed test because we are not assuming that the difference is specifically greater than or less than zero.
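The three situations described above can be written compactly as pairs of null and alternative hypotheses about the mean difference. This is one common way to write them; some texts state every null as an equality, mu_A - mu_B = 0, even for the one-tailed cases.

$$
\begin{aligned}
&\text{Two-tailed:}   && H_0:\ \mu_A - \mu_B = 0   && H_1:\ \mu_A - \mu_B \neq 0 \\
&\text{Right-tailed:} && H_0:\ \mu_A - \mu_B \le 0 && H_1:\ \mu_A - \mu_B > 0 \\
&\text{Left-tailed:}  && H_0:\ \mu_A - \mu_B \ge 0 && H_1:\ \mu_A - \mu_B < 0
\end{aligned}
$$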
We then have to define the rejection region in both tails, that is, the left tail and the right tail of the normal distribution. Remember that we use only 5 percent of the area under the normal curve (the significance level, alpha = 0.05) to define the rejection region. For a two-tailed test, that 5 percent is split in half: one half goes into the left tail and the other half into the right tail, so 2.5 percent of the area under the curve lies in each tail. Graphically, this is the same normal distribution curve we saw earlier, with 2.5 percent in the left tail and the other 2.5 percent in the right tail. If the test statistic is greater than 1.96 or less than -1.96, that is, if its absolute value is greater than 1.96, it falls in the rejection region and you can reject the null at the 5 percent significance level. The null would be that the difference of means equals zero; in common parlance, rejecting it means we conclude that the two means are not the same.
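To see where 1.96 comes from, the two-tailed rule above can be sketched with Python's standard-library `statistics.NormalDist`: put alpha / 2 in each tail and take the standard normal quantile at 1 - alpha / 2. This is a minimal sketch assuming a Z statistic whose null distribution is the standard normal; for a T-test with few observations, the critical value would instead come from the t-distribution with the appropriate degrees of freedom and would be larger. The function name `two_tailed_test` and the example statistics are hypothetical, for illustration only.

```python
from statistics import NormalDist


def two_tailed_test(z_stat: float, alpha: float = 0.05) -> tuple[float, bool]:
    """Return the critical value and whether H0 (mean difference = 0) is rejected."""
    # Split alpha evenly between the two tails: alpha / 2 in each.
    critical = NormalDist().inv_cdf(1 - alpha / 2)
    # Reject when the statistic lands in either tail, i.e. |z| > critical.
    return critical, abs(z_stat) > critical


# Hypothetical test statistics, for illustration only.
for z in (2.3, -2.3, 1.5):
    critical, reject = two_tailed_test(z)
    print(f"z = {z:+.2f}, critical = ±{critical:.2f}, reject H0: {reject}")
```

With alpha = 0.05 the quantile is about 1.95996, which rounds to the 1.96 rule of thumb; statistics of +2.3 and -2.3 are both rejected, while 1.5 is not.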
Now let us consider the situation where we're testing whether the difference of means is less than zero; here we are only interested in the left tail. Our alternative hypothesis is that the difference of means is less than zero. In this case, the entire 5 percent rejection region is in the left tail, and for this one-tailed test, if we get a T-statistic of -1.64 or less, we would reject the null that the mean difference is greater than or equal to zero in favor of the alternative that the difference is less than zero. The exact opposite is the right-tailed test, where the alternative hypothesis is that the mean difference is greater than zero. If you get a T-statistic greater than 1.64 for a right-tailed test, you reject the null in favor of the alternative that the mean difference is greater than zero.
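The one-tailed rules can be sketched the same way: because the whole 5 percent sits in a single tail, the critical value is the standard normal quantile at 1 - alpha, about 1.645, which the lecture rounds to 1.64. As before, this is a minimal sketch assuming a Z statistic with a standard normal null distribution (for small-sample T-tests, use t-distribution quantiles instead), and the function name `one_tailed_test`, its `tail` parameter, and the example statistics are hypothetical.

```python
from statistics import NormalDist


def one_tailed_test(z_stat: float, tail: str, alpha: float = 0.05) -> tuple[float, bool]:
    """Return the critical value and whether H0 is rejected for a one-tailed test.

    tail="left":  H1 is mean difference < 0; reject when z < -critical.
    tail="right": H1 is mean difference > 0; reject when z > critical.
    """
    # The whole alpha sits in a single tail, so use the (1 - alpha) quantile.
    critical = NormalDist().inv_cdf(1 - alpha)
    if tail == "left":
        return critical, z_stat < -critical
    if tail == "right":
        return critical, z_stat > critical
    raise ValueError("tail must be 'left' or 'right'")


# Hypothetical test statistics, for illustration only.
for z, tail in ((-1.8, "left"), (-1.8, "right"), (1.8, "right"), (1.2, "right")):
    critical, reject = one_tailed_test(z, tail)
    print(f"{tail}-tailed, z = {z:+.2f}, critical = {critical:.3f}, reject H0: {reject}")
```

Note that the same statistic of -1.8 leads to rejection in a left-tailed test but not in a right-tailed one: the direction of the alternative decides which tail holds the rejection region. Using strict inequalities makes no practical difference, since a continuous statistic lands exactly on the critical value with probability zero.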