1 00:00:00,025 --> 00:00:08,117 [MUSIC] 2 00:00:08,117 --> 00:00:13,947 Now, let us visit some basic definitions about probability, 3 00:00:13,947 --> 00:00:19,890 as it relates to the most commonly used concepts in statistics. 4 00:00:19,890 --> 00:00:23,790 Essentially, probability is a measure between zero and 5 00:00:23,790 --> 00:00:28,610 one for the likelihood that something or some event might occur. 6 00:00:28,610 --> 00:00:33,270 For instance, you may hear that the chance of the stock market 7 00:00:33,270 --> 00:00:37,930 rising above some point, or falling below some point, is x%, 8 00:00:37,930 --> 00:00:42,370 or you may hear that the chance of rain is 45% tonight. 9 00:00:42,370 --> 00:00:47,410 These are all coming from this very concept of probability. 10 00:00:47,410 --> 00:00:51,497 Essentially, probability is a measure between zero and one, so 11 00:00:51,497 --> 00:00:56,329 45% would be 0.45. The discussion about probability is not complete 12 00:00:56,329 --> 00:00:59,345 without a discussion about random variables. 13 00:00:59,345 --> 00:01:04,319 Essentially, a random variable is a quantity whose possible values 14 00:01:04,319 --> 00:01:09,680 depend in some clearly defined way on a set of random events. 15 00:01:09,680 --> 00:01:14,160 It's a function that maps outcomes, that is, points in a probability space, to values. 16 00:01:14,160 --> 00:01:19,197 So, the probability space is essentially all possible outcomes. If you roll a die, 17 00:01:19,197 --> 00:01:23,943 it can have one out of six outcomes, so that's the probability space there. 18 00:01:23,943 --> 00:01:28,751 If you roll two dice, you can have one out of 36 outcomes, where each outcome could 19 00:01:28,751 --> 00:01:30,712 be considered a random outcome. 
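To make the idea of "all possible outcomes" concrete, here is a minimal Python sketch that lists the outcomes for one die and for two dice. The names `FACES`, `one_die`, and `two_dice` are illustrative choices, not something from the lecture, and the two dice are assumed to be distinguishable (for example, one black and one white).

```python
from itertools import product

# Faces of a standard six-sided die.
FACES = range(1, 7)

# All outcomes for one die, and for two distinguishable dice as (first, second) pairs.
one_die = list(FACES)
two_dice = list(product(FACES, repeat=2))

print(len(one_die))    # 6
print(len(two_dice))   # 36
print(two_dice[:3])    # [(1, 1), (1, 2), (1, 3)]
```

Each pair such as (1, 2) is one point in the set of outcomes; a random variable like "sum of the two dice" assigns a number to each such point.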
20 00:01:30,712 --> 00:01:35,080 And a probability distribution is a theoretical model that depicts 21 00:01:35,080 --> 00:01:39,136 the possible values any random variable may assume along with 22 00:01:39,136 --> 00:01:41,563 the probability of each value occurring. 23 00:01:41,563 --> 00:01:44,764 We'll define this more with examples using two dice. 24 00:01:44,764 --> 00:01:50,139 So, consider two dice. A die has six faces, and if you roll two dice, 25 00:01:50,139 --> 00:01:53,890 you get one out of 36 discrete outcomes. 26 00:01:53,890 --> 00:01:58,976 So, if you were to roll two dice and 27 00:01:58,976 --> 00:02:04,173 both dice show one, the sum will be 1 + 1 = 2, and 28 00:02:04,173 --> 00:02:11,163 there's only one way of getting that, so the probability is one out of 36. 29 00:02:11,163 --> 00:02:16,077 So, here we have two dice, one black and one white, and if you were to look at 30 00:02:16,077 --> 00:02:21,390 the possibility of getting one on black and two on white, that's one outcome. 31 00:02:21,390 --> 00:02:25,194 So, that's 1 + 2 = 3, or you can have two on black and one on white, 32 00:02:25,194 --> 00:02:26,780 that's 2 + 1 = 3 again. 33 00:02:26,780 --> 00:02:33,070 So, there are two ways of getting three by rolling two dice, so 34 00:02:33,070 --> 00:02:39,050 the probability is two out of the 36 possible outcomes that are mapped out here. 35 00:02:39,050 --> 00:02:41,796 So, if you think about the sum of two dice being two, 36 00:02:41,796 --> 00:02:44,120 there's only one possibility out of 36. 37 00:02:44,120 --> 00:02:47,955 Getting a three, you have two possibilities; getting a four, 38 00:02:47,955 --> 00:02:52,785 you have three possibilities; getting a five, you have four possibilities; and 39 00:02:52,785 --> 00:02:53,720 so on and so forth. 
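The counting described above (one way to get a sum of two, two ways to get three, and so on) can be checked with a short Python sketch. This assumes fair dice, so all 36 pairs are equally likely; the names `ways` and `pmf` are illustrative and not from the lecture.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Count how many of the 36 equally likely (black, white) pairs give each sum.
ways = Counter(black + white for black, white in product(range(1, 7), repeat=2))

# Probability of each sum, assuming fair dice: number of ways divided by 36.
pmf = {total: Fraction(count, 36) for total, count in sorted(ways.items())}

# Print each sum, the number of ways to get it, and its probability.
for total, p in pmf.items():
    print(total, ways[total], round(float(p), 3))
```

Using `Fraction` keeps the probabilities exact, so they add up to exactly 1 with no rounding drift.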
40 00:02:53,720 --> 00:02:58,843 The most frequent number to have as a sum of two 41 00:02:58,843 --> 00:03:04,910 dice is seven, and the probability is six out of 36, which is 0.167. 42 00:03:04,910 --> 00:03:08,574 And if you were to sum these probabilities up, and 43 00:03:08,574 --> 00:03:12,800 that is 0.028 + 0.056, you get about 0.083. 44 00:03:12,800 --> 00:03:16,876 So, if you sum up all these, they all sum up to one, and 45 00:03:16,876 --> 00:03:21,886 the probability of getting greater than six is 0.583, and 46 00:03:21,886 --> 00:03:25,530 getting less than or equal to six is 0.417. 47 00:03:25,530 --> 00:03:29,732 You sum these up, it's 1: 0.028 + 0.972 is 1, 48 00:03:29,732 --> 00:03:34,820 this plus this is 1, this plus this is 1, and obviously 1 + 0 is 1. 49 00:03:34,820 --> 00:03:37,933 The probability of getting 12, that is, 50 00:03:37,933 --> 00:03:43,161 both dice show six, is 1 out of 36 possible outcomes, which is 0.028. 51 00:03:43,161 --> 00:03:48,078 So, the probability sums up to 1, and the probability of getting more than 12 52 00:03:48,078 --> 00:03:52,490 is obviously 0 because the two dice can produce at most this number. 53 00:03:52,490 --> 00:03:56,970 So, it's a nice way of looking at the way a probability distribution 54 00:03:56,970 --> 00:04:01,590 is created by rolling two dice. 55 00:04:01,590 --> 00:04:07,110 And if I were to look at the probability of, say, getting some number or 56 00:04:07,110 --> 00:04:11,040 less than some number, that's called the cumulative distribution function. 57 00:04:11,040 --> 00:04:15,810 If you were to simply chart these probability 58 00:04:15,810 --> 00:04:20,459 outcomes in a chart, you get this graph here. 59 00:04:20,459 --> 00:04:24,980 So, here we have age as our variable and I created a histogram of age, and 60 00:04:24,980 --> 00:04:30,560 then using the mean value of 48.37, which is the average age, and 61 00:04:30,560 --> 00:04:33,020 a standard deviation of 9.8 years, 
62 00:04:33,020 --> 00:04:39,829 I can fit a theoretical normal distribution with these two parameters. 63 00:04:39,829 --> 00:04:40,329 [MUSIC]
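As a rough illustration of fitting a theoretical normal distribution with the two quoted parameters, the sketch below uses Python's standard-library `statistics.NormalDist`. The mean (48.37) and standard deviation (9.8) come from the transcript; the underlying age data is not available here, so the evaluation ages and the name `age_model` are hypothetical choices made only for illustration.

```python
from statistics import NormalDist

# Parameters quoted in the transcript: mean age 48.37, standard deviation 9.8 years.
age_model = NormalDist(mu=48.37, sigma=9.8)

# Height of the fitted bell curve at a few ages (hypothetical evaluation points).
for age in (30, 40, 48.37, 60):
    print(age, round(age_model.pdf(age), 4))

# Cumulative distribution function: probability of an age of 40 or less
# under the fitted model (not under the actual data).
p_at_most_40 = age_model.cdf(40)
print(round(p_at_most_40, 3))
```

The `cdf` call plays the same role for the continuous age variable that "getting some number or less" plays for the sum of two dice.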