1 00:00:00,025 --> 00:00:06,610 [MUSIC] 2 00:00:06,610 --> 00:00:09,300 Let me introduce you to the normal distribution, 3 00:00:09,300 --> 00:00:14,660 which is one of the most commonly used distributions in statistical analysis, 4 00:00:14,660 --> 00:00:17,880 and even in everyday conversations. 5 00:00:17,880 --> 00:00:22,920 A large body of academic, scholarly and professional work rests 6 00:00:22,920 --> 00:00:27,890 on the assumption that the underlying data follows a normal distribution. 7 00:00:27,890 --> 00:00:32,350 The defining characteristic of the normal distribution is this bell-shaped curve, 8 00:00:32,350 --> 00:00:36,570 which you're familiar with from your textbooks. 9 00:00:36,570 --> 00:00:41,710 Mathematically, the normal distribution is represented by this equation. 10 00:00:41,710 --> 00:00:46,640 We can say that the normal distribution relies on three inputs, and 11 00:00:46,640 --> 00:00:50,020 f stands for function, so a function of x, mu and sigma. 12 00:00:50,020 --> 00:00:55,880 X is your data; x is a random variable and it can attain any reasonable value. 13 00:00:55,880 --> 00:00:57,740 Mu stands for the mean. 14 00:00:57,740 --> 00:01:00,670 And sigma is the standard deviation. 15 00:01:00,670 --> 00:01:03,910 And the mathematical formulation is right here, 16 00:01:03,910 --> 00:01:07,490 which is 1 divided by sigma times 17 00:01:07,490 --> 00:01:13,580 the square root of 2 times pi, where pi is approximately 3.142, or roughly 22/7. 18 00:01:13,580 --> 00:01:15,420 And then you have the exponential here. 19 00:01:15,420 --> 00:01:19,560 Do not forget this minus sign, so it is the exponential of this quantity, 20 00:01:19,560 --> 00:01:24,240 which is minus, and in the numerator, x minus the mean, or 21 00:01:24,240 --> 00:01:30,160 x minus mu, the whole squared, divided by 2 times sigma squared. 22 00:01:30,160 --> 00:01:31,754 So let me explain. 23 00:01:31,754 --> 00:01:35,144 1 divided by the standard deviation times 24 00:01:35,144 --> 00:01:40,240 the square root of 2 times pi; 2 is known and pi is known. 
25 00:01:40,240 --> 00:01:43,030 So is the value of the exponential constant e, and 26 00:01:43,030 --> 00:01:46,650 what is not known is sigma, which you obtain from the data, 27 00:01:46,650 --> 00:01:51,270 that is, the standard deviation, and the mean, which also comes from the data. 28 00:01:51,270 --> 00:01:54,230 So you have the mean and the standard deviation, and 29 00:01:54,230 --> 00:01:57,790 x is the random variable whose mean and standard deviation you're using. 30 00:01:57,790 --> 00:02:01,560 You put this all together and then you get the normal distribution, 31 00:02:01,560 --> 00:02:04,640 the bell-shaped curve that you saw earlier. 32 00:02:04,640 --> 00:02:07,202 Let me also introduce you to the standard normal. 33 00:02:07,202 --> 00:02:13,085 The standard normal is when we say that x is a variable that has a mean of 0 and 34 00:02:13,085 --> 00:02:15,310 a standard deviation of 1. 35 00:02:15,310 --> 00:02:18,907 So what do a mean of 0 and a standard deviation of 1 look like? 36 00:02:18,907 --> 00:02:23,889 If you replace mu with 0 and the standard deviation, or sigma, with 1, 37 00:02:23,889 --> 00:02:28,555 the equation reduces to this expression, which is 1 divided by the square root of 2 times pi. 38 00:02:28,555 --> 00:02:32,700 Notice the sigma here, which I have grayed out a bit so that it doesn't interfere. 39 00:02:32,700 --> 00:02:37,690 Sigma is 1, and 1 times anything leaves it unchanged, so I've removed this. 40 00:02:37,690 --> 00:02:42,360 And then e to the power minus (x - mu) squared, but because mu is 0, 41 00:02:42,360 --> 00:02:47,280 x - 0 is x, so it is x squared divided by 2 times sigma squared. Sigma, 42 00:02:47,280 --> 00:02:50,800 remember, is 1, and the square of 1 is 1, so the denominator is 2 times 1. 43 00:02:50,800 --> 00:02:53,633 So I am removing sigma because it's 1, and 44 00:02:53,633 --> 00:02:57,268 anything multiplied by 1 stays the same. 45 00:02:57,268 --> 00:03:01,220 So how do I generate this normal density, or bell curve? 
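For reference, the two formulas read aloud so far can be written out as below. This is a reconstruction from the spoken description, since the slide itself is not part of the transcript; the conditional notation f(x | mu, sigma) is an assumed way of writing "function of x, mu and sigma". The first line is the general normal density, and the second is what remains after setting mu to 0 and sigma to 1, where the leading factor is 1 over the square root of 2 pi.

```latex
% General normal density with mean \mu and standard deviation \sigma
f(x \mid \mu, \sigma) = \frac{1}{\sigma \sqrt{2\pi}}
    \exp\!\left( -\frac{(x - \mu)^2}{2\sigma^2} \right)

% Standard normal density: substitute \mu = 0 and \sigma = 1
f(x \mid 0, 1) = \frac{1}{\sqrt{2\pi}}
    \exp\!\left( -\frac{x^2}{2} \right)
```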
46 00:03:01,220 --> 00:03:05,320 Let's assume that the underlying variable has a mean of 0 and 47 00:03:05,320 --> 00:03:09,340 a standard deviation of 1, and x varies between -4 and 4. 48 00:03:09,340 --> 00:03:15,770 So the mean is 0, the minimum value is -4, and the maximum value is 4. 49 00:03:15,770 --> 00:03:20,570 And I would substitute these values from -4 to 4 into this equation. 50 00:03:20,570 --> 00:03:24,250 This is the only thing that's changing; the x here is the only quantity that is 51 00:03:24,250 --> 00:03:29,590 changing, so let's see if this can generate the standard normal curve. 52 00:03:29,590 --> 00:03:34,608 >> Let us do this in Python. We'll use the matplotlib library, which 53 00:03:34,608 --> 00:03:39,536 you are already familiar with, for the graphics, the NumPy library, 54 00:03:39,536 --> 00:03:44,209 as well as the norm.pdf function in the scipy.stats module. 55 00:03:44,209 --> 00:03:47,954 In this example, I have used increments of 0.1. 56 00:03:47,954 --> 00:03:52,182 This will generate the standard normal curve that you hear about in statistics. 57 00:03:52,182 --> 00:03:52,682 [MUSIC]
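The code shown on screen is not part of the transcript, so what follows is only a minimal sketch of the steps described: x runs from -4 to 4 in increments of 0.1, the density comes from norm.pdf in scipy.stats, and matplotlib draws the curve. The function names, the use of np.linspace to build the grid, and the output file name standard_normal.png are assumptions made for illustration, not the lecturer's actual code.

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import norm


def standard_normal_curve(lo=-4.0, hi=4.0, step=0.1):
    """Return x values from lo to hi (inclusive) and the standard normal density at each."""
    # Derive the number of points from the step so that the endpoint hi is included.
    n_points = int(round((hi - lo) / step)) + 1
    x = np.linspace(lo, hi, n_points)
    # loc is the mean (0) and scale is the standard deviation (1).
    y = norm.pdf(x, loc=0, scale=1)
    return x, y


def manual_standard_normal(x):
    """Evaluate the standard normal formula directly: exp(-x^2 / 2) / sqrt(2 * pi)."""
    return np.exp(-(x ** 2) / 2) / np.sqrt(2 * np.pi)


def plot_curve(x, y, path="standard_normal.png"):
    """Draw the bell curve and save it to a file (the file name is hypothetical)."""
    fig, ax = plt.subplots()
    ax.plot(x, y)
    ax.set_xlabel("x")
    ax.set_ylabel("density")
    ax.set_title("Standard normal curve (mean 0, standard deviation 1)")
    fig.savefig(path)  # in an interactive session, plt.show() could be used instead
    plt.close(fig)


if __name__ == "__main__":
    x, y = standard_normal_curve()
    # The library result should agree with substituting x into the formula by hand.
    print(np.allclose(y, manual_standard_normal(x)))
    plot_curve(x, y)
```

The grid is built with np.linspace rather than np.arange so that the upper endpoint 4 is included; either would produce the same bell shape. The manual_standard_normal helper is there only to show that norm.pdf with mean 0 and standard deviation 1 matches substituting each x into the formula.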