1 00:00:07,910 --> 00:00:12,510 Welcome back. So, we've already gone over the accumulation pattern, 2 00:00:12,510 --> 00:00:15,090 where you iterate over a sequence and update 3 00:00:15,090 --> 00:00:19,480 an accumulator variable as you iterate through every item in that sequence. 4 00:00:19,480 --> 00:00:23,460 In this lesson, we're going to go over the dictionary accumulation pattern, 5 00:00:23,460 --> 00:00:26,460 which is the same idea except our accumulator variable is 6 00:00:26,460 --> 00:00:30,075 going to be a dictionary that has multiple key value pairs. 7 00:00:30,075 --> 00:00:34,125 So, let's start out with the standard accumulation pattern example. 8 00:00:34,125 --> 00:00:35,730 So, on line one here, 9 00:00:35,730 --> 00:00:40,625 we open a file named scarlet.txt and we open it to read it. 10 00:00:40,625 --> 00:00:44,045 Then we say, txt is f.read(). 11 00:00:44,045 --> 00:00:47,360 So, in other words, txt is a variable which is 12 00:00:47,360 --> 00:00:51,695 a string and it's the contents of the file scarlet.txt. 13 00:00:51,695 --> 00:00:56,990 So, suppose scarlet.txt represents the text of the book, The Scarlet Letter. 14 00:00:56,990 --> 00:01:04,550 So, let's say that we want to keep track of how many t's are in scarlet.txt. 15 00:01:04,550 --> 00:01:07,575 The way that we do that, is with the accumulation pattern. 16 00:01:07,575 --> 00:01:11,305 So, here, t_count is our accumulator variable. 17 00:01:11,305 --> 00:01:13,835 We initialize it to be 0 at first, 18 00:01:13,835 --> 00:01:16,715 to say that we've seen 0 t's so far, 19 00:01:16,715 --> 00:01:22,840 and then we iterate through every character in our text file and we say, 20 00:01:22,840 --> 00:01:26,130 if that character is the letter t, 21 00:01:26,130 --> 00:01:29,625 then we say, t_count equals t_count plus one. 22 00:01:29,625 --> 00:01:34,190 In other words, we just saw one more character t. 
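The standard accumulation pattern described so far might look roughly like the sketch below. It is a reconstruction from the narration, not the exact code on screen. To keep it runnable without the file, `txt` is set to a short stand-in string (an assumption); in the lesson it comes from reading scarlet.txt.

```python
# Stand-in for the file contents (assumption). In the lesson, txt comes from:
#   f = open('scarlet.txt', 'r')
#   txt = f.read()
txt = "the scarlet letter"

t_count = 0  # accumulator: we have seen 0 t's so far
for c in txt:
    if c == 't':
        t_count = t_count + 1  # we just saw one more t
print("t: " + str(t_count))
```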
So, 23 00:01:34,190 --> 00:01:37,060 by the time we're on line eight and we're done with our for loop, 24 00:01:37,060 --> 00:01:39,340 if we print out the value of t_count, 25 00:01:39,340 --> 00:01:44,270 then we should have the number of t's in scarlet.txt. 26 00:01:44,270 --> 00:01:47,305 So, if I run my code, 27 00:01:47,305 --> 00:01:49,190 then I'll see that here, 28 00:01:49,190 --> 00:01:56,650 there are 17,584 occurrences of the character t in scarlet.txt. 29 00:01:56,650 --> 00:01:59,920 So, that's great, that's the standard accumulation pattern, 30 00:01:59,920 --> 00:02:05,315 but let's suppose that we wanted to count more letters than just the letter t. So, 31 00:02:05,315 --> 00:02:10,855 suppose that we also wanted to keep track of how many s's are in scarlet.txt. 32 00:02:10,855 --> 00:02:14,210 Well, we could do that in almost the same way. 33 00:02:14,210 --> 00:02:18,270 So, here, we open up scarlet.txt again, 34 00:02:18,760 --> 00:02:22,200 and then we read it, 35 00:02:22,490 --> 00:02:25,705 except now on lines four and five, 36 00:02:25,705 --> 00:02:28,375 we create two different accumulator variables. 37 00:02:28,375 --> 00:02:32,780 We have t_count to keep track of the number of t's and we initialize that to 38 00:02:32,780 --> 00:02:38,255 zero and then we have s_count to keep track of the number of s's. 39 00:02:38,255 --> 00:02:43,460 Like before, we still iterate over every character in txt, 40 00:02:43,460 --> 00:02:46,745 but now we say, if that character is a t, 41 00:02:46,745 --> 00:02:49,585 then t_count is t_count plus one. 42 00:02:49,585 --> 00:02:51,390 If the character is an s, 43 00:02:51,390 --> 00:02:53,805 then we instead update s_count. 44 00:02:53,805 --> 00:02:56,385 So, by the end of our for loop, 45 00:02:56,385 --> 00:03:01,770 t_count is going to be the number of t's and s_count is going to be the number of s's. 
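A sketch of the two-accumulator version just described, again reconstructed from the narration. The short `txt` string is a stand-in (an assumption) for the contents of scarlet.txt so that the example runs on its own.

```python
# Stand-in for the contents of scarlet.txt (assumption)
txt = "the scarlet letter"

t_count = 0  # accumulator for the letter t
s_count = 0  # accumulator for the letter s
for c in txt:
    if c == 't':
        t_count = t_count + 1
    elif c == 's':
        s_count = s_count + 1
print("t: " + str(t_count))
print("s: " + str(s_count))
```

Every additional letter to count would need another variable and another branch, which is the repetition the lesson goes on to remove.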
46 00:03:01,770 --> 00:03:06,200 So, you could imagine doing this for any number of characters, 47 00:03:06,200 --> 00:03:08,840 but for every character that we would want to accumulate, 48 00:03:08,840 --> 00:03:12,710 we would have to create a new accumulator variable here. 49 00:03:12,710 --> 00:03:14,600 So, we might have a_count, 50 00:03:14,600 --> 00:03:18,960 then b_count, then c_count and so on. 51 00:03:18,960 --> 00:03:23,510 You can imagine that if we wanted to count every character in the alphabet, 52 00:03:23,510 --> 00:03:29,755 initializing 26 accumulator variables might be just a little bit lengthy codewise. 53 00:03:29,755 --> 00:03:32,400 So, one alternative way to do this, 54 00:03:32,400 --> 00:03:35,440 and it's going to seem a little bit weird at first, 55 00:03:35,440 --> 00:03:37,750 but I'm going to get to why we want to do this, 56 00:03:37,750 --> 00:03:40,839 is by instead using a dictionary. 57 00:03:40,839 --> 00:03:42,910 So, like the code before, 58 00:03:42,910 --> 00:03:46,955 we open up the file scarlet.txt, 59 00:03:46,955 --> 00:03:49,340 and then we read it in. 60 00:03:49,340 --> 00:03:54,100 Now, I'm going to have one accumulator variable which is a dictionary. 61 00:03:54,100 --> 00:03:58,045 So, I say, x equals an empty dictionary. 62 00:03:58,045 --> 00:03:59,770 Inside of that dictionary, 63 00:03:59,770 --> 00:04:02,260 we're going to have multiple key value pairs. 64 00:04:02,260 --> 00:04:05,920 So, if we wanted to still only count t's and s's, 65 00:04:05,920 --> 00:04:08,710 then rather than saying t_count equals zero, 66 00:04:08,710 --> 00:04:11,975 I'm going to say x sub t equals zero, 67 00:04:11,975 --> 00:04:14,145 and rather than saying s_count equals zero, 68 00:04:14,145 --> 00:04:17,520 I'm going to say x sub s equals zero. 69 00:04:17,520 --> 00:04:23,170 Now, these are just different key value pairs in this same dictionary x. 
70 00:04:23,170 --> 00:04:24,925 So, now what we can do, 71 00:04:24,925 --> 00:04:28,885 is we can loop through every character in our file once again, 72 00:04:28,885 --> 00:04:30,970 and again we say, 73 00:04:30,970 --> 00:04:32,890 if that character is a t, 74 00:04:32,890 --> 00:04:39,270 then our dictionary x sub t equals its previous value plus one. 75 00:04:39,270 --> 00:04:41,840 If that character is an s, 76 00:04:41,840 --> 00:04:46,705 then x sub s gets incremented by one instead. 77 00:04:46,705 --> 00:04:49,765 Again, by the time our for loop is done, 78 00:04:49,765 --> 00:04:54,550 then we're going to have x sub t as the number of t's in our dictionary, 79 00:04:54,550 --> 00:04:59,960 and x sub s is going to be the number of s's in scarlet.txt. 80 00:05:00,630 --> 00:05:03,075 So, when we run our code, 81 00:05:03,075 --> 00:05:07,630 then we can see the number of t's and the number of s's. 82 00:05:07,630 --> 00:05:11,600 So, now, I'm going to make one really small change to our code. 83 00:05:11,600 --> 00:05:14,630 So, here, on line nine, 84 00:05:14,630 --> 00:05:20,915 this statement is inside of if c equals the character t. So, 85 00:05:20,915 --> 00:05:22,280 what we can do is, 86 00:05:22,280 --> 00:05:25,290 we can replace x sub t here. 87 00:05:25,290 --> 00:05:31,130 So, we can replace the hard coded t with the variable c. We know that this is going to be 88 00:05:31,130 --> 00:05:36,930 the same because here we only run this code if c is the character t. So, 89 00:05:36,930 --> 00:05:43,470 we can say, x sub c equals x sub c plus one, and here, 90 00:05:43,470 --> 00:05:49,560 we can say x sub c equals x sub c plus one, 91 00:05:49,560 --> 00:05:55,800 because this line is inside of this elif c equals equals the character s. 
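The dictionary version with the hard-coded keys replaced by `c` could be sketched as follows. This is a reconstruction from the narration, with a short stand-in string (an assumption) in place of the contents of scarlet.txt.

```python
# Stand-in for the contents of scarlet.txt (assumption)
txt = "the scarlet letter"

x = {}      # one accumulator: a dictionary
x['t'] = 0  # replaces t_count = 0
x['s'] = 0  # replaces s_count = 0
for c in txt:
    if c == 't':
        x[c] = x[c] + 1  # same as x['t'] = x['t'] + 1, since c is 't' here
    elif c == 's':
        x[c] = x[c] + 1  # same as x['s'] = x['s'] + 1, since c is 's' here
print("t: " + str(x['t']))
print("s: " + str(x['s']))
```

Both branches now contain the identical statement, which hints that the `if`/`elif` tests on specific letters may not be needed at all.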
So, 92 00:05:55,800 --> 00:05:59,300 in other words, what we're going to just do in the next piece of code is, 93 00:05:59,300 --> 00:06:01,910 we're going to replace the hard coded s and 94 00:06:01,910 --> 00:06:08,330 hard coded t with the value of the variable c. So, 95 00:06:08,330 --> 00:06:12,270 if we do that, we get something that looks like this. 96 00:06:12,270 --> 00:06:15,960 So, we say, if the character c is t, 97 00:06:15,960 --> 00:06:18,840 then say x sub c equals its 98 00:06:18,840 --> 00:06:21,480 previous value plus one. 99 00:06:21,480 --> 00:06:24,765 Now, because again, this is inside of an if statement, 100 00:06:24,765 --> 00:06:27,325 we know that c is going to be t here, 101 00:06:27,325 --> 00:06:31,670 but we'll get to why we actually want to make this change in a little bit. 102 00:06:31,670 --> 00:06:33,865 Same thing with this elif. 103 00:06:33,865 --> 00:06:37,150 So, we know that c is going to be the character s here, 104 00:06:37,150 --> 00:06:40,370 but we just replaced the hard coded s with 105 00:06:40,370 --> 00:06:43,820 the value of the variable c. If we run our code, 106 00:06:43,820 --> 00:06:47,550 we're going to get the exact same result as before. 107 00:06:47,900 --> 00:06:53,090 So, now I want to go into why we actually wanted to replace that 108 00:06:53,090 --> 00:06:58,340 hard coded t and hard-coded s with the value of the variable c. So, 109 00:06:58,340 --> 00:07:01,040 let's suppose that rather than just counting the number of 110 00:07:01,040 --> 00:07:04,670 t's and the number of s's in scarlet.txt, 111 00:07:04,670 --> 00:07:07,610 we wanted to count the number of every single character. 112 00:07:07,610 --> 00:07:09,860 So, the number of a's, b's, 113 00:07:09,860 --> 00:07:12,540 c's, s's and t's, 114 00:07:12,540 --> 00:07:15,165 spaces, exclamation points and so on. 115 00:07:15,165 --> 00:07:22,670 So, we could do that by replacing line four with a whole bunch of accumulator variables. 
116 00:07:22,670 --> 00:07:27,200 So, a_count, b_count, c_count, exclamation point_count, etc., 117 00:07:27,200 --> 00:07:29,450 but then our code would get really long 118 00:07:29,450 --> 00:07:32,240 and really repetitive because we would need to initialize 119 00:07:32,240 --> 00:07:38,210 a separate accumulator variable for every single character that might be in scarlet.txt. 120 00:07:38,210 --> 00:07:40,670 Instead, what we're going to do is, 121 00:07:40,670 --> 00:07:42,500 use the dictionary accumulation pattern. 122 00:07:42,500 --> 00:07:46,565 So, we're going to have one accumulator variable which is a dictionary. 123 00:07:46,565 --> 00:07:48,520 So, on line four, 124 00:07:48,520 --> 00:07:51,165 we say, x equals an empty dictionary. 125 00:07:51,165 --> 00:07:58,400 Then like before, we loop through every character in txt and what we do is, 126 00:07:58,400 --> 00:08:01,160 we have an if statement to say, 127 00:08:01,160 --> 00:08:05,445 if the character c is not in our accumulator dictionary. 128 00:08:05,445 --> 00:08:07,890 So, if c is not in x. 129 00:08:07,890 --> 00:08:12,770 So, in other words, if we haven't encountered this new character c yet, 130 00:08:12,770 --> 00:08:17,335 then we initialize x sub c to be zero. 131 00:08:17,335 --> 00:08:19,065 What that means is that, 132 00:08:19,065 --> 00:08:21,379 the first time we see the character a, 133 00:08:21,379 --> 00:08:28,045 then we're going to initialize x sub a to be zero. 134 00:08:28,045 --> 00:08:31,115 The first time we see the character t, 135 00:08:31,115 --> 00:08:39,020 then we initialize x sub t to be zero and so on. 136 00:08:39,020 --> 00:08:41,785 Now, here on line 11, 137 00:08:41,785 --> 00:08:44,130 still inside of this for loop, 138 00:08:44,130 --> 00:08:49,125 we say x sub c equals x sub c plus one. 139 00:08:49,125 --> 00:08:55,250 In other words, we add one to its previous value for whatever this character c is. 
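The full dictionary accumulation pattern described here can be sketched as below. It is a reconstruction from the narration, and the short `txt` string is a stand-in (an assumption) for the contents of scarlet.txt.

```python
# Stand-in for the contents of scarlet.txt (assumption)
txt = "the scarlet letter"

x = {}  # accumulator dictionary: character -> count seen so far
for c in txt:
    if c not in x:
        x[c] = 0     # first time we see this character
    x[c] = x[c] + 1  # we just saw one more of this character
print("t: " + str(x['t']))
print("s: " + str(x['s']))
```

Note that the increment sits outside the `if`, so it runs for every character, whether or not the key was just created.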
140 00:08:55,250 --> 00:08:56,900 So, in other words, 141 00:08:56,900 --> 00:09:00,200 if the character c is the letter a, 142 00:09:00,200 --> 00:09:04,205 then we say x sub a equals x sub a plus one, 143 00:09:04,205 --> 00:09:06,415 and that says that we saw one more a. 144 00:09:06,415 --> 00:09:08,605 If the character is the letter t, 145 00:09:08,605 --> 00:09:13,745 then we say x sub t equals its previous value plus one and so on. 146 00:09:13,745 --> 00:09:16,310 So, here on line 13, 147 00:09:16,310 --> 00:09:24,760 we just print out the number of characters t and the number of characters s on line 14. 148 00:09:24,910 --> 00:09:28,300 So, what we should find is that we actually get 149 00:09:28,300 --> 00:09:31,225 the exact same value for t and s as before. 150 00:09:31,225 --> 00:09:33,380 So, if I run my code, 151 00:09:33,630 --> 00:09:38,245 then we will see that we get the correct number of t's and s's, 152 00:09:38,245 --> 00:09:42,865 but what's great here is that we have more than just t's and s's collected. 153 00:09:42,865 --> 00:09:45,730 We can print out the number of any character we want. 154 00:09:45,730 --> 00:09:51,060 So, I can print out the number of a's just by saying, 155 00:09:51,060 --> 00:09:58,090 number of a's is x sub a and the number of b's is x sub b. 156 00:09:58,100 --> 00:10:01,450 You'll see that our dictionary is keeping track of 157 00:10:01,450 --> 00:10:06,455 every single letter that might occur in scarlet.txt. 158 00:10:06,455 --> 00:10:08,800 So, to illustrate how this works, 159 00:10:08,800 --> 00:10:11,420 I'm going to go with just a slightly simpler example. 160 00:10:11,420 --> 00:10:16,565 So, rather than assigning txt to be the value in scarlet.txt, 161 00:10:16,565 --> 00:10:17,760 I'm just going to say, 162 00:10:17,760 --> 00:10:24,230 txt equals the string MICHIGAN and I am going to leave the rest of the code the same, 163 00:10:24,230 --> 00:10:28,745 except, I'm going to comment out lines 12 through 15. 
164 00:10:28,745 --> 00:10:32,405 Now, I'm going to run my code in CodeLens. 165 00:10:32,405 --> 00:10:37,120 So, you can see line one initializes txt to be MICHIGAN. 166 00:10:37,120 --> 00:10:42,935 Line three initializes our accumulator variable x to start out as an empty dictionary, 167 00:10:42,935 --> 00:10:48,895 and then we're going to loop through every character inside of our string MICHIGAN. 168 00:10:48,895 --> 00:10:54,320 So, c is first assigned to the character M because that's the first character, 169 00:10:54,320 --> 00:10:57,665 and we say, is c in our dictionary already? 170 00:10:57,665 --> 00:10:59,000 In our case, it's not. 171 00:10:59,000 --> 00:11:04,120 So, we assign x sub c or x sub M to be zero. 172 00:11:04,120 --> 00:11:10,040 Now on line 10, we immediately increment the value associated with x sub M. So, 173 00:11:10,040 --> 00:11:13,145 we say, we've seen one M so far. 174 00:11:13,145 --> 00:11:17,080 With the character I, then we see that I is not in our dictionary. 175 00:11:17,080 --> 00:11:19,680 So, we initialize x sub I to be 176 00:11:19,680 --> 00:11:24,510 zero and we set its value to its previous value plus one. 177 00:11:24,510 --> 00:11:26,915 With C, it's not in our dictionary, 178 00:11:26,915 --> 00:11:30,400 so we initialize it to zero and then increment it. 179 00:11:30,400 --> 00:11:36,105 With H, we set it to zero and then increment it. 180 00:11:36,105 --> 00:11:39,650 So, you can see that our dictionary keeps building 181 00:11:39,650 --> 00:11:43,375 up the number of times that we've seen every given letter. 182 00:11:43,375 --> 00:11:45,870 So, at this point, we've seen M, 183 00:11:45,870 --> 00:11:48,605 I, C, H and so far, 184 00:11:48,605 --> 00:11:51,920 none of these characters were in our dictionary and so our dictionary was 185 00:11:51,920 --> 00:11:55,915 just adding a new key value pair for every letter that we saw. 
186 00:11:55,915 --> 00:12:00,560 Now, the next character is going to be this letter I. 187 00:12:00,560 --> 00:12:04,085 So, you can see, c is the character I. 188 00:12:04,085 --> 00:12:06,140 Now, the key point here is that, 189 00:12:06,140 --> 00:12:08,510 I is already in our dictionary. 190 00:12:08,510 --> 00:12:13,785 So, here, this if is not going to execute, 191 00:12:13,785 --> 00:12:16,560 because I is in our dictionary, and so, 192 00:12:16,560 --> 00:12:19,825 rather than setting x sub I to be zero, 193 00:12:19,825 --> 00:12:22,720 we just increment it on line 10. 194 00:12:22,720 --> 00:12:26,370 So, you'll see, x sub I go from one to two, 195 00:12:26,370 --> 00:12:29,085 to say that we've seen two I's so far. 196 00:12:29,085 --> 00:12:33,315 In the case of G, it's not in our dictionary, 197 00:12:33,315 --> 00:12:38,420 so we add a new key value pair and increment its value and so on. 198 00:12:38,420 --> 00:12:40,775 So, by the time our for loop is done, 199 00:12:40,775 --> 00:12:42,845 then we end up with the dictionary, 200 00:12:42,845 --> 00:12:47,120 where every key is a character in our text txt 201 00:12:47,120 --> 00:12:51,235 and every value is the number of times that we've actually seen that letter. 202 00:12:51,235 --> 00:12:54,705 So, we'll see dictionary accumulation a lot and 203 00:12:54,705 --> 00:12:58,085 really I find that it just takes a lot of practice to get used to. 204 00:12:58,085 --> 00:13:00,365 It's a little bit counterintuitive at first, 205 00:13:00,365 --> 00:13:02,450 but we're going to go through more examples with 206 00:13:02,450 --> 00:13:04,520 dictionary accumulation and I think it's 207 00:13:04,520 --> 00:13:07,160 going to make more and more sense with more practice. 208 00:13:07,160 --> 00:13:10,230 That's all for now, until next time.
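The MICHIGAN walkthrough can be reproduced with a small sketch like the one below. The per-step `print` is an addition for illustration, standing in for what the step-through visualization shows; it is not part of the code described in the lesson.

```python
txt = "MICHIGAN"

x = {}  # accumulator dictionary: character -> count seen so far
for c in txt:
    if c not in x:
        x[c] = 0     # new key for a character we have not seen yet
    x[c] = x[c] + 1  # increment, whether or not the key is new
    print(c, x)      # illustrative trace of the dictionary after each step
```

After the loop, `x` maps each distinct character to its count, with `I` at 2 and every other letter at 1. As a side note beyond the lesson, Python's built-in `dict.get` allows the body to be written as `x[c] = x.get(c, 0) + 1`, which should behave the same way.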