1
00:00:07,910 --> 00:00:12,510
Welcome back. So, we've already gone over the accumulation pattern,

2
00:00:12,510 --> 00:00:15,090
where you iterate over a sequence and update

3
00:00:15,090 --> 00:00:19,480
an accumulator variable as you iterate through every item in that sequence.

4
00:00:19,480 --> 00:00:23,460
In this lesson, we're going to go over the dictionary accumulation pattern,

5
00:00:23,460 --> 00:00:26,460
which is the same idea except our accumulator variable is

6
00:00:26,460 --> 00:00:30,075
going to be a dictionary that has multiple key value pairs.

7
00:00:30,075 --> 00:00:34,125
So, let's start out with the standard accumulation pattern example.

8
00:00:34,125 --> 00:00:35,730
So, on line one here,

9
00:00:35,730 --> 00:00:40,625
we open a file named scarlet.txt and we open it to read it.

10
00:00:40,625 --> 00:00:44,045
Then we say, txt is f.read.

11
00:00:44,045 --> 00:00:47,360
So, in other words, txt is a variable which is

12
00:00:47,360 --> 00:00:51,695
a string and it's the contents of the file scarlet.txt.

13
00:00:51,695 --> 00:00:56,990
So, suppose scarlet.txt represents the text of the book, The Scarlet Letter.

14
00:00:56,990 --> 00:01:04,550
So, let's say that we want to keep track of how many t's are in scarlet.txt.

15
00:01:04,550 --> 00:01:07,575
The way that we do that, is with the accumulation pattern.

16
00:01:07,575 --> 00:01:11,305
So, here, t_count is our accumulator variable.

17
00:01:11,305 --> 00:01:13,835
We initialize it to be 0 at first,

18
00:01:13,835 --> 00:01:16,715
to say that we've seen 0 t's so far,

19
00:01:16,715 --> 00:01:22,840
and then we iterate through every character in our text file and we say,

20
00:01:22,840 --> 00:01:26,130
if that character is the letter t,

21
00:01:26,130 --> 00:01:29,625
then say, t_count equals t_count plus one.

22
00:01:29,625 --> 00:01:34,190
In other words, we just saw one more character t. So,

23
00:01:34,190 --> 00:01:37,060
by the time we're on line eight and we're done with our for loop,

24
00:01:37,060 --> 00:01:39,340
if we print out the value of t_count,

25
00:01:39,340 --> 00:01:44,270
then we should have the number of t's in scarlet.txt.

26
00:01:44,270 --> 00:01:47,305
So, if I run my code,

27
00:01:47,305 --> 00:01:49,190
then I'll see that here,

28
00:01:49,190 --> 00:01:56,650
there are 17,584 occurrences of the character t in scarlet.txt.
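
A minimal sketch of the loop just described. The file scarlet.txt is not available here, so a short stand-in string is assumed for txt; in the lesson, txt comes from f.read().

```python
# Stand-in for the file contents (assumed sample text, not from the lesson).
txt = "the scarlet letter"

t_count = 0  # accumulator: number of t's seen so far
for c in txt:
    if c == 't':
        t_count = t_count + 1  # we just saw one more t

print("t: " + str(t_count) + " occurrences")
```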

29
00:01:56,650 --> 00:01:59,920
So, that's great, that's the standard accumulation pattern,

30
00:01:59,920 --> 00:02:05,315
but let's suppose that we wanted to count more letters than just the letter t. So,

31
00:02:05,315 --> 00:02:10,855
suppose that we also wanted to keep track of how many s's are in scarlet.txt?

32
00:02:10,855 --> 00:02:14,210
Well, we could do that in almost the same way.

33
00:02:14,210 --> 00:02:18,270
So, here, we open up scarlet.txt again,

34
00:02:18,760 --> 00:02:22,200
and then we read it,

35
00:02:22,490 --> 00:02:25,705
except now on lines four and five,

36
00:02:25,705 --> 00:02:28,375
we create two different accumulator variables.

37
00:02:28,375 --> 00:02:32,780
We have t_count to keep track of the number of t's and we initialize that to

38
00:02:32,780 --> 00:02:38,255
zero and then we have s_count to keep track of the number of s's.

39
00:02:38,255 --> 00:02:43,460
Like before, we still iterate over every character in txt,

40
00:02:43,460 --> 00:02:46,745
but now we say, if that character is a t,

41
00:02:46,745 --> 00:02:49,585
then t_count is t_count plus one.

42
00:02:49,585 --> 00:02:51,390
If the character is an s,

43
00:02:51,390 --> 00:02:53,805
then we instead update s_count.

44
00:02:53,805 --> 00:02:56,385
So, by the end of our for loop,

45
00:02:56,385 --> 00:03:01,770
t_count is going to be the number of t's and s_count is going to be the number of s's.
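
Roughly, the two-accumulator version looks like the sketch below. The sample string is again an assumed stand-in for the contents of scarlet.txt.

```python
# Stand-in for the file contents (assumed sample text).
txt = "the scarlet letter"

t_count = 0  # number of t's seen so far
s_count = 0  # number of s's seen so far
for c in txt:
    if c == 't':
        t_count = t_count + 1
    elif c == 's':
        s_count = s_count + 1

print("t: " + str(t_count) + " occurrences")
print("s: " + str(s_count) + " occurrences")
```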

46
00:03:01,770 --> 00:03:06,200
So, you could imagine doing this for any number of characters,

47
00:03:06,200 --> 00:03:08,840
but for every character that we would want to accumulate,

48
00:03:08,840 --> 00:03:12,710
we would have to create a new accumulator variable here.

49
00:03:12,710 --> 00:03:14,600
So, we might have a_count,

50
00:03:14,600 --> 00:03:18,960
then b_count, then c_count and so on.

51
00:03:18,960 --> 00:03:23,510
You can imagine that if we wanted to count every character in the alphabet,

52
00:03:23,510 --> 00:03:29,755
initializing 26 accumulator variables might be just a little bit lengthy codewise.

53
00:03:29,755 --> 00:03:32,400
So, one alternative way to do this,

54
00:03:32,400 --> 00:03:35,440
and it's going to seem a little bit weird at first,

55
00:03:35,440 --> 00:03:37,750
but I'm going to get to why we want to do this,

56
00:03:37,750 --> 00:03:40,839
is by instead using a dictionary.

57
00:03:40,839 --> 00:03:42,910
So, like the code before,

58
00:03:42,910 --> 00:03:46,955
we open up the file scarlet.txt,

59
00:03:46,955 --> 00:03:49,340
and then we read it in.

60
00:03:49,340 --> 00:03:54,100
Now, I'm going to have one accumulator variable which is a dictionary.

61
00:03:54,100 --> 00:03:58,045
So, I say, x equals an empty dictionary.

62
00:03:58,045 --> 00:03:59,770
Inside of that dictionary,

63
00:03:59,770 --> 00:04:02,260
we're going to have multiple key value pairs.

64
00:04:02,260 --> 00:04:05,920
So, if we wanted to still only count t's and s's,

65
00:04:05,920 --> 00:04:08,710
then rather than saying t_count equals zero,

66
00:04:08,710 --> 00:04:11,975
I'm going to say x sub t equals zero,

67
00:04:11,975 --> 00:04:14,145
and rather than saying s_count equals zero,

68
00:04:14,145 --> 00:04:17,520
I'm going to say x sub s equals zero.

69
00:04:17,520 --> 00:04:23,170
Now, these are just different key value pairs in this same dictionary x.

70
00:04:23,170 --> 00:04:24,925
So, now what we can do,

71
00:04:24,925 --> 00:04:28,885
is we can loop through every character in our file once again,

72
00:04:28,885 --> 00:04:30,970
and again we say,

73
00:04:30,970 --> 00:04:32,890
if that character is a t,

74
00:04:32,890 --> 00:04:39,270
then our dictionary x sub t equals its previous value plus one.

75
00:04:39,270 --> 00:04:41,840
If that character is an s,

76
00:04:41,840 --> 00:04:46,705
then x sub s gets incremented by one instead.

77
00:04:46,705 --> 00:04:49,765
Again, by the time our for loop is done,

78
00:04:49,765 --> 00:04:54,550
then we're going to have x sub t in our dictionary as the number of t's,

79
00:04:54,550 --> 00:04:59,960
and x sub s is going to be the number of s's in scarlet.txt.
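
Here is a sketch of the same counting with one dictionary in place of two separate variables. "x sub t" in the narration corresponds to x['t'] in Python; the sample string is an assumed stand-in for scarlet.txt.

```python
# Stand-in for the file contents (assumed sample text).
txt = "the scarlet letter"

x = {}       # one accumulator: a dictionary of counts
x['t'] = 0   # replaces t_count = 0
x['s'] = 0   # replaces s_count = 0
for c in txt:
    if c == 't':
        x['t'] = x['t'] + 1
    elif c == 's':
        x['s'] = x['s'] + 1

print("t: " + str(x['t']) + " occurrences")
print("s: " + str(x['s']) + " occurrences")
```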

80
00:05:00,630 --> 00:05:03,075
So, when we run our code,

81
00:05:03,075 --> 00:05:07,630
then we can see the number of t's and the number of s's.

82
00:05:07,630 --> 00:05:11,600
So, now, I'm going to make one really small change to our code.

83
00:05:11,600 --> 00:05:14,630
So, here, on line nine,

84
00:05:14,630 --> 00:05:20,915
this statement is inside of if c equals the character t. So,

85
00:05:20,915 --> 00:05:22,280
what we can do is,

86
00:05:22,280 --> 00:05:25,290
we can replace x sub t here.

87
00:05:25,290 --> 00:05:31,130
So, we can replace the hard coded t with the variable c. We know that this is going to be

88
00:05:31,130 --> 00:05:36,930
the same because here we only run this code if c is the character t. So,

89
00:05:36,930 --> 00:05:43,470
we can say, x sub c equals x sub c plus one, and here,

90
00:05:43,470 --> 00:05:49,560
we can say x sub c equals x sub c plus one,

91
00:05:49,560 --> 00:05:55,800
because this line is inside of this elif c equals equals the character s. So,

92
00:05:55,800 --> 00:05:59,300
in other words, what we're going to just do in the next piece of code is,

93
00:05:59,300 --> 00:06:01,910
we're going to replace the hard coded s and

94
00:06:01,910 --> 00:06:08,330
hard coded t with the value of the variable c. So,

95
00:06:08,330 --> 00:06:12,270
if we do that, we get something that looks like this.

96
00:06:12,270 --> 00:06:15,960
So, we say, if the character c is t,

97
00:06:15,960 --> 00:06:18,840
then say x sub c equals its

98
00:06:18,840 --> 00:06:21,480
previous value plus one.

99
00:06:21,480 --> 00:06:24,765
Now, because again, this is inside of an if statement,

100
00:06:24,765 --> 00:06:27,325
we know that c is going to be t here,

101
00:06:27,325 --> 00:06:31,670
but we'll get to why we actually want to make this change in a little bit.

102
00:06:31,670 --> 00:06:33,865
Same thing with this elif.

103
00:06:33,865 --> 00:06:37,150
So, we know that c is going to be the character s here,

104
00:06:37,150 --> 00:06:40,370
but we just replaced the hard coded s with

105
00:06:40,370 --> 00:06:43,820
the value of the variable c. If we run our code,

106
00:06:43,820 --> 00:06:47,550
we're going to get the exact same result as before.
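
The small change can be sketched like this: inside each branch the hard coded key is swapped for the variable c, which holds the same character at that point. The sample string is an assumed stand-in for scarlet.txt.

```python
# Stand-in for the file contents (assumed sample text).
txt = "the scarlet letter"

x = {}
x['t'] = 0
x['s'] = 0
for c in txt:
    if c == 't':
        x[c] = x[c] + 1   # c is 't' here, so this updates x['t']
    elif c == 's':
        x[c] = x[c] + 1   # c is 's' here, so this updates x['s']

print("t: " + str(x['t']) + " occurrences")
print("s: " + str(x['s']) + " occurrences")
```

Both branches now contain the identical statement, which is what makes the next step possible.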

107
00:06:47,900 --> 00:06:53,090
So, now I want to go into why we actually wanted to replace that

108
00:06:53,090 --> 00:06:58,340
hard coded t and hard-coded s with the value of the variable c. So,

109
00:06:58,340 --> 00:07:01,040
let's suppose that rather than just counting the number of

110
00:07:01,040 --> 00:07:04,670
t's and the number of s's in scarlet.txt,

111
00:07:04,670 --> 00:07:07,610
we wanted to count the number of every single character.

112
00:07:07,610 --> 00:07:09,860
So, the number of a's, b's,

113
00:07:09,860 --> 00:07:12,540
c's, s's and t's,

114
00:07:12,540 --> 00:07:15,165
spaces, exclamation points and so on.

115
00:07:15,165 --> 00:07:22,670
So, we could do that by replacing line four with a whole bunch of accumulator variables.

116
00:07:22,670 --> 00:07:27,200
So, a_count, b_count, c_count, exclamation point_count, etc.,

117
00:07:27,200 --> 00:07:29,450
but then our code would get really long

118
00:07:29,450 --> 00:07:32,240
and really repetitive because we would need to initialize

119
00:07:32,240 --> 00:07:38,210
a separate accumulator variable for every single character that might be in scarlet.txt.

120
00:07:38,210 --> 00:07:40,670
Instead, what we're going to do is,

121
00:07:40,670 --> 00:07:42,500
the dictionary accumulation pattern.

122
00:07:42,500 --> 00:07:46,565
So, we're going to have one accumulator variable which is a dictionary.

123
00:07:46,565 --> 00:07:48,520
So, on line four,

124
00:07:48,520 --> 00:07:51,165
we say, x equals an empty dictionary.

125
00:07:51,165 --> 00:07:58,400
Then like before, we loop through every character in txt and what we do is,

126
00:07:58,400 --> 00:08:01,160
we have an if statement to say,

127
00:08:01,160 --> 00:08:05,445
if the character c is not in our accumulator dictionary.

128
00:08:05,445 --> 00:08:07,890
So, if c is not in x.

129
00:08:07,890 --> 00:08:12,770
So, in other words, if we haven't encountered this new character c yet,

130
00:08:12,770 --> 00:08:17,335
then we initialize x sub c to be zero.

131
00:08:17,335 --> 00:08:19,065
What that means is that,

132
00:08:19,065 --> 00:08:21,379
the first time we see the character a,

133
00:08:21,379 --> 00:08:28,045
then we're going to initialize x sub a to be zero.

134
00:08:28,045 --> 00:08:31,115
The first time we see the character t,

135
00:08:31,115 --> 00:08:39,020
then we initialize x sub t to be zero and so on.

136
00:08:39,020 --> 00:08:41,785
Now, here on line 11,

137
00:08:41,785 --> 00:08:44,130
still inside of this for loop,

138
00:08:44,130 --> 00:08:49,125
we say x sub c equals x sub c plus one.

139
00:08:49,125 --> 00:08:55,250
In other words, we add one to its previous value for whatever this character c is.

140
00:08:55,250 --> 00:08:56,900
So, in other words,

141
00:08:56,900 --> 00:09:00,200
if the character c is the letter a,

142
00:09:00,200 --> 00:09:04,205
then we say x sub a equals x sub a plus one,

143
00:09:04,205 --> 00:09:06,415
and that says that we saw one more a.

144
00:09:06,415 --> 00:09:08,605
If the character is the letter t,

145
00:09:08,605 --> 00:09:13,745
then we say x sub t equals its previous value plus one and so on.

146
00:09:13,745 --> 00:09:16,310
So, here on line 13,

147
00:09:16,310 --> 00:09:24,760
we just print out the number of characters t and the number of characters s on line 14.

148
00:09:24,910 --> 00:09:28,300
So, what we should find is that we actually get

149
00:09:28,300 --> 00:09:31,225
the exact same value for t and s as before.

150
00:09:31,225 --> 00:09:33,380
So, if I run my code,

151
00:09:33,630 --> 00:09:38,245
then we will see that we get the correct number of t's and s's,

152
00:09:38,245 --> 00:09:42,865
but what's great here is that we have more than just t's and s's collected.

153
00:09:42,865 --> 00:09:45,730
We can print out the number of any character we want.

154
00:09:45,730 --> 00:09:51,060
So, I can print out the number of a's just by saying,

155
00:09:51,060 --> 00:09:58,090
number of a's is x sub a and the number of b's is x sub b.

156
00:09:58,100 --> 00:10:01,450
You'll see that our dictionary is keeping track of

157
00:10:01,450 --> 00:10:06,455
every single letter that might occur in scarlet.txt.
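
A minimal sketch of the full dictionary accumulation pattern, using an assumed stand-in string in place of scarlet.txt. One caveat with the stand-in: a key exists only once its character has been seen, so looking up a character that never occurs (such as b in this sample) with x['b'] would raise a KeyError. The sketch uses x.get('b', 0) for that lookup, which is a choice made here and not something shown in the lesson.

```python
# Stand-in for the file contents (assumed sample text).
txt = "the scarlet letter"

x = {}  # accumulator dictionary: character -> count
for c in txt:
    if c not in x:
        x[c] = 0        # first time we see c, start its count at zero
    x[c] = x[c] + 1     # then add one for this occurrence

print("t: " + str(x['t']) + " occurrences")
print("s: " + str(x['s']) + " occurrences")
print("a: " + str(x['a']) + " occurrences")
# 'b' never appears in the sample, so fall back to 0 instead of raising KeyError.
print("b: " + str(x.get('b', 0)) + " occurrences")
```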

158
00:10:06,455 --> 00:10:08,800
So, to illustrate how this works,

159
00:10:08,800 --> 00:10:11,420
I'm going to go with just a slightly simpler example.

160
00:10:11,420 --> 00:10:16,565
So, rather than assigning txt to be the value in scarlet.txt,

161
00:10:16,565 --> 00:10:17,760
I'm just going to say,

162
00:10:17,760 --> 00:10:24,230
txt equals the string MICHIGAN and I am going to leave the rest of the code the same,

163
00:10:24,230 --> 00:10:28,745
except, I'm going to comment out lines 12 through 15.

164
00:10:28,745 --> 00:10:32,405
Now, I'm going to run my code in CodeLens.

165
00:10:32,405 --> 00:10:37,120
So, you can see line one initializes txt to be MICHIGAN.

166
00:10:37,120 --> 00:10:42,935
Line three initializes our accumulator variable x to start out as an empty dictionary,

167
00:10:42,935 --> 00:10:48,895
and then we're going to loop through every character inside of our string MICHIGAN.

168
00:10:48,895 --> 00:10:54,320
So, c is first assigned to the character M because that's the first character,

169
00:10:54,320 --> 00:10:57,665
and we say, is c in our dictionary already?

170
00:10:57,665 --> 00:10:59,000
In our case, it's not.

171
00:10:59,000 --> 00:11:04,120
So, we assign x sub c or x sub M to be zero.

172
00:11:04,120 --> 00:11:10,040
Now on line 10, we immediately increment the value associated with x sub M. So,

173
00:11:10,040 --> 00:11:13,145
we say, we've seen one M so far.

174
00:11:13,145 --> 00:11:17,080
With the character I, then we see that I is not in our dictionary.

175
00:11:17,080 --> 00:11:19,680
So, we initialize x sub I to be

176
00:11:19,680 --> 00:11:24,510
zero and we set its value to its previous value plus one.

177
00:11:24,510 --> 00:11:26,915
With C, it's not in our dictionary,

178
00:11:26,915 --> 00:11:30,400
so we initialize it to zero and then increment it.

179
00:11:30,400 --> 00:11:36,105
With H, we set it to zero and then increment it.

180
00:11:36,105 --> 00:11:39,650
So, you can see that our dictionary keeps building

181
00:11:39,650 --> 00:11:43,375
up the number of times that we've seen every given letter.

182
00:11:43,375 --> 00:11:45,870
So, at this point, we've seen M,

183
00:11:45,870 --> 00:11:48,605
I, C, H and so far,

184
00:11:48,605 --> 00:11:51,920
none of these characters were in our dictionary and so our dictionary was

185
00:11:51,920 --> 00:11:55,915
just adding a new key value pair for every letter that we saw.

186
00:11:55,915 --> 00:12:00,560
Now, the next character is going to be this letter I.

187
00:12:00,560 --> 00:12:04,085
So, you can see, c is the character I.

188
00:12:04,085 --> 00:12:06,140
Now, the key point here is that,

189
00:12:06,140 --> 00:12:08,510
I is already in our dictionary.

190
00:12:08,510 --> 00:12:13,785
So, here, this if is not going to execute,

191
00:12:13,785 --> 00:12:16,560
because I is in our dictionary, and so,

192
00:12:16,560 --> 00:12:19,825
rather than setting x sub I to be zero,

193
00:12:19,825 --> 00:12:22,720
we just increment it on line 10.

194
00:12:22,720 --> 00:12:26,370
So, you'll see, x sub I go from one to two,

195
00:12:26,370 --> 00:12:29,085
to say that we've seen two I's so far.

196
00:12:29,085 --> 00:12:33,315
In the case of G, it's not in our dictionary,

197
00:12:33,315 --> 00:12:38,420
so we add a new key value pair and increment its value and so on.

198
00:12:38,420 --> 00:12:40,775
So, by the time our for loop is done,

199
00:12:40,775 --> 00:12:42,845
then we end up with the dictionary,

200
00:12:42,845 --> 00:12:47,120
where every key is a character in our text, txt,

201
00:12:47,120 --> 00:12:51,235
and every value is the number of times that we've actually seen that letter.
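
The MICHIGAN walkthrough can be reproduced with the short sketch below. The line positions will not match the line numbers mentioned in the video, and the final print is added here only to show the finished dictionary.

```python
txt = "MICHIGAN"

x = {}  # accumulator dictionary: character -> count
for c in txt:
    if c not in x:
        x[c] = 0        # new character: add a key with a count of zero
    x[c] = x[c] + 1     # increment the count for this character

# I appears twice; every other letter appears once.
print(x)
```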

202
00:12:51,235 --> 00:12:54,705
So, we'll see dictionary accumulation a lot and

203
00:12:54,705 --> 00:12:58,085
really I find that it just takes a lot of practice to get used to.

204
00:12:58,085 --> 00:13:00,365
It's a little bit counterintuitive at first,

205
00:13:00,365 --> 00:13:02,450
but we're going to go through more examples with

206
00:13:02,450 --> 00:13:04,520
dictionary accumulation and I think it's

207
00:13:04,520 --> 00:13:07,160
going to make more and more sense with more practice.

208
00:13:07,160 --> 00:13:10,230
That's all for now, until next time.