1 00:00:00,025 --> 00:00:06,970 [MUSIC] 2 00:00:06,970 --> 00:00:11,555 In this video we will illustrate how to use regression analysis in place of 3 00:00:11,555 --> 00:00:12,740 a t-test. 4 00:00:12,740 --> 00:00:17,380 We will begin with a question: is there a statistically 5 00:00:17,380 --> 00:00:22,110 significant difference in teaching evaluation scores for men and women? 6 00:00:22,110 --> 00:00:26,841 When we compute the averages using the teaching evaluation data set, 7 00:00:26,841 --> 00:00:31,571 we find that the teaching evaluation score for women is around 3.9, and for 8 00:00:31,571 --> 00:00:33,448 men, it's around 4.06. 9 00:00:33,448 --> 00:00:37,589 The question is, is this difference, even though it's small, 10 00:00:37,589 --> 00:00:39,563 statistically significant? 11 00:00:39,563 --> 00:00:44,299 We can run a t-test using Python and compute the statistical significance of 12 00:00:44,299 --> 00:00:45,180 the difference. 13 00:00:45,180 --> 00:00:49,740 Here, our conclusion is that the difference in teaching evaluation scores between men 14 00:00:49,740 --> 00:00:53,010 and women is statistically significant. 15 00:00:53,010 --> 00:00:56,635 What if we were to do the same thing with a regression model? 16 00:00:56,635 --> 00:00:59,223 We will do the linear regression in Python. 17 00:00:59,223 --> 00:01:01,830 We will be using the statsmodels library. 18 00:01:01,830 --> 00:01:06,360 We will create a list for the independent variable, that is the female variable, 19 00:01:06,360 --> 00:01:09,160 which has been turned into a binary variable, 20 00:01:09,160 --> 00:01:12,800 where 1 is female and 0 is male. 21 00:01:12,800 --> 00:01:15,951 We will also create a list for the dependent variable, 22 00:01:15,951 --> 00:01:17,674 the teaching evaluation score.
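The t-test mentioned earlier in this segment could be sketched as follows. The transcript does not show the code, so the use of `scipy.stats.ttest_ind`, the variable names, and the synthetic scores (including the spread of 0.55) are assumptions for illustration only; this is not the course's data set or notebook.

```python
import numpy as np
from scipy import stats

# Synthetic stand-in for the teaching evaluation data (hypothetical values,
# not the real data set). The group means are centred near the averages
# quoted in the video; the spread and sample sizes are made up.
rng = np.random.default_rng(0)
female_scores = rng.normal(loc=3.90, scale=0.55, size=200)
male_scores = rng.normal(loc=4.06, scale=0.55, size=260)

# Two-sample t-test assuming equal variances (the pooled t-test).
t_stat, p_value = stats.ttest_ind(female_scores, male_scores, equal_var=True)

print(f"mean (female) = {female_scores.mean():.3f}")
print(f"mean (male)   = {male_scores.mean():.3f}")
print(f"t = {t_stat:.3f}, p = {p_value:.4f}")
```

A p-value below the chosen threshold (0.05 in this video) would be read as a statistically significant difference between the two group means.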
23 00:01:17,674 --> 00:01:22,212 We will manually add the constant, beta zero, then we will fit the model, 24 00:01:22,212 --> 00:01:26,010 make predictions, and print out the model summary. 25 00:01:26,010 --> 00:01:30,624 The model summary will print out a table like this. 26 00:01:31,800 --> 00:01:36,324 But for the t-test we are only interested in this part of the table, 27 00:01:36,324 --> 00:01:40,700 which prints out the coefficient, standard error, t-statistic, and p-value. 28 00:01:40,700 --> 00:01:45,666 We can see the t-statistic for the female variable is negative 3.25, 29 00:01:45,666 --> 00:01:48,242 and the p-value is less than 0.05. 30 00:01:48,242 --> 00:01:53,220 That means that there is a statistically significant difference in mean values for male and 31 00:01:53,220 --> 00:01:54,710 female instructors. 32 00:01:54,710 --> 00:01:59,415 The coefficient means that, on average, the evaluation score is about 0.17 points lower 33 00:01:59,415 --> 00:02:00,421 for female instructors. 34 00:02:00,421 --> 00:02:03,928 We can see that the results from the regression model, and 35 00:02:03,928 --> 00:02:06,842 the conclusion, are identical to those from a t-test. 36 00:02:06,842 --> 00:02:07,342 [MUSIC]