1
00:00:08,390 --> 00:00:11,790
OpenCV supports
reading of images in

2
00:00:11,790 --> 00:00:15,465
most file formats such as
JPEG, PNG, and TIFF.

3
00:00:15,465 --> 00:00:18,240
Most image and
video analysis requires

4
00:00:18,240 --> 00:00:20,955
converting images into
grayscale first.

5
00:00:20,955 --> 00:00:23,190
This simplifies the
image and reduces

6
00:00:23,190 --> 00:00:26,070
noise while allowing
for improved analysis.

7
00:00:26,070 --> 00:00:28,260
Let's write some code
that reads in an image

8
00:00:28,260 --> 00:00:30,180
of a person, Floyd Mayweather,

9
00:00:30,180 --> 00:00:32,460
and converts it into grayscale.

10
00:00:32,460 --> 00:00:36,480
So first we're going to import
the OpenCV package, cv2.

11
00:00:36,480 --> 00:00:39,630
So import cv2 as cv and then

12
00:00:39,630 --> 00:00:42,710
we'll load the Floyd
dot JPEG image.

13
00:00:42,710 --> 00:00:45,440
So img equals cv dot imread of

14
00:00:45,440 --> 00:00:48,150
the read-only Floyd
dot JPEG file, and we'll

15
00:00:48,150 --> 00:00:51,590
convert it to grayscale
using cvtColor.

16
00:00:51,590 --> 00:00:55,610
So gray equals cv dot cvtColor.

17
00:00:55,610 --> 00:00:59,880
This is a function
in OpenCV; we pass in

18
00:00:59,880 --> 00:01:04,830
the image and then cv
dot COLOR_BGR2GRAY.

19
00:01:04,830 --> 00:01:06,710
Now, before we get to

20
00:01:06,710 --> 00:01:08,690
the result let's
talk about the docs.

21
00:01:08,690 --> 00:01:10,010
Just like Tesseract,

22
00:01:10,010 --> 00:01:12,410
OpenCV is an external
package written

23
00:01:12,410 --> 00:01:16,415
in C plus plus and the docs
for Python are really poor.

24
00:01:16,415 --> 00:01:18,500
This is unfortunately
quite common

25
00:01:18,500 --> 00:01:20,780
when Python is being
used as a wrapper.

26
00:01:20,780 --> 00:01:22,370
Thankfully, the web docs for

27
00:01:22,370 --> 00:01:24,515
OpenCV are actually pretty good.

28
00:01:24,515 --> 00:01:26,975
So hit the website
at docs dot opencv

29
00:01:26,975 --> 00:01:28,340
dot org when you want to learn

30
00:01:28,340 --> 00:01:30,340
more about a particular function.

31
00:01:30,340 --> 00:01:32,960
In this case, cvtColor

32
00:01:32,960 --> 00:01:35,720
converts from one color
space to another,

33
00:01:35,720 --> 00:01:38,225
and we're converting
our image to gray scale.

34
00:01:38,225 --> 00:01:39,530
Of course, we already know

35
00:01:39,530 --> 00:01:41,360
at least two different
ways of doing this

36
00:01:41,360 --> 00:01:45,005
using binarization and PIL
color space conversions.

37
00:01:45,005 --> 00:01:48,440
So let's inspect this object
that has been returned.

38
00:01:48,440 --> 00:01:51,515
So import inspect, this is
in the standard library

39
00:01:51,515 --> 00:01:56,245
inspect dot getmro
and then type of gray.
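
A short sketch of that inspection step. The small zero-filled array here is an assumed stand-in for the grayscale image that cvtColor returned.

```python
import inspect
import numpy as np

# Assumed stand-in for the grayscale image returned by OpenCV.
gray = np.zeros((2, 3), dtype=np.uint8)

# getmro lists the class of the object followed by its base classes.
mro = inspect.getmro(type(gray))
print(mro)
```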

40
00:01:56,245 --> 00:02:00,130
So we see that this is
a type of ndarray which is

41
00:02:00,130 --> 00:02:02,050
a fundamental list
type coming from

42
00:02:02,050 --> 00:02:04,240
the numerical Python project.

43
00:02:04,240 --> 00:02:05,830
That's a bit surprising.

44
00:02:05,830 --> 00:02:07,810
Up until this point we've
been used to working

45
00:02:07,810 --> 00:02:10,060
with these PIL dot Image objects.

46
00:02:10,060 --> 00:02:12,610
OpenCV however,
wants to represent

47
00:02:12,610 --> 00:02:15,700
an image as a two dimensional
sequence of bytes,

48
00:02:15,700 --> 00:02:19,045
and the ndarray which stands
for an n dimensional array,

49
00:02:19,045 --> 00:02:20,905
is the ideal way to do this.

50
00:02:20,905 --> 00:02:25,060
Let's look at the contents
of the array, so just gray.

51
00:02:25,060 --> 00:02:28,465
So the array shown here
is a list of lists

52
00:02:28,465 --> 00:02:31,540
where the inner lists are
filled with integers.

53
00:02:31,540 --> 00:02:33,700
The dtype equals uint8

54
00:02:33,700 --> 00:02:36,220
definition indicates
that each of the items

55
00:02:36,220 --> 00:02:39,430
in the array is
an eight bit unsigned integer

56
00:02:39,430 --> 00:02:42,440
which is very common for
black and white images.

57
00:02:42,440 --> 00:02:46,840
So this is a pixel by pixel
definition of the image.

58
00:02:46,840 --> 00:02:49,255
The display package however,

59
00:02:49,255 --> 00:02:52,190
doesn't know what to
do with this image.

60
00:02:52,190 --> 00:02:53,510
So let's convert it into

61
00:02:53,510 --> 00:02:56,420
a PIL object to render
it into the browser.

62
00:02:56,420 --> 00:02:58,950
So from PIL import Image,

63
00:02:58,950 --> 00:03:00,980
PIL can take an
array of data with

64
00:03:00,980 --> 00:03:04,460
a given color format and
convert this into a PIL object,

65
00:03:04,460 --> 00:03:06,515
and this is perfect
for our situation

66
00:03:06,515 --> 00:03:07,955
as the PIL color mode,

67
00:03:07,955 --> 00:03:12,230
L, is just an array of luminance
values in unsigned integers.

68
00:03:12,230 --> 00:03:16,465
So the new image equals
Image dot fromarray,

69
00:03:16,465 --> 00:03:18,230
gray and then we tell it

70
00:03:18,230 --> 00:03:21,475
luminance values and
then display image.
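
A minimal sketch of handing an array to PIL. The array values are made up for illustration. The lecture passes "L" explicitly; this sketch leaves the mode out, because Pillow infers "L" for a two-dimensional uint8 array.

```python
import numpy as np
from PIL import Image

# Hypothetical grayscale data: 2 rows by 3 columns of 8-bit luminance values.
gray = np.array([[0, 128, 255],
                 [255, 128, 0]], dtype=np.uint8)

# Build a PIL image from the array; the mode is inferred from shape and dtype.
new_image = Image.fromarray(gray)

# PIL reports size as (width, height), the reverse of the array's (rows, columns).
print(new_image.mode, new_image.size)
```

In a Jupyter notebook, calling display on new_image would then render it in the browser.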

71
00:03:21,475 --> 00:03:25,055
Let's talk a bit more
about images for a minute.

72
00:03:25,055 --> 00:03:27,575
Numpy arrays are
multi-dimensional.

73
00:03:27,575 --> 00:03:28,980
For instance, we can define

74
00:03:28,980 --> 00:03:30,770
an array in a single dimension.

75
00:03:30,770 --> 00:03:32,885
So import numpy as

76
00:03:32,885 --> 00:03:36,365
np and then single dim
equals np dot array

77
00:03:36,365 --> 00:03:37,460
and we just pass into

78
00:03:37,460 --> 00:03:41,020
this a list of all of
the integer values.

79
00:03:41,020 --> 00:03:43,490
In an image, this is analogous to

80
00:03:43,490 --> 00:03:47,000
a single row of five
pixels each in gray scale.

81
00:03:47,000 --> 00:03:49,490
But actually all imaging
libraries tend

82
00:03:49,490 --> 00:03:51,935
to expect at least
two dimensions,

83
00:03:51,935 --> 00:03:53,450
a width and a height,

84
00:03:53,450 --> 00:03:55,370
which together form a matrix.

85
00:03:55,370 --> 00:03:57,500
So if we put
the single dim inside of

86
00:03:57,500 --> 00:04:00,470
another array this would
be a two-dimensional array

87
00:04:00,470 --> 00:04:02,405
with the equivalent of

88
00:04:02,405 --> 00:04:04,550
one element in
the height direction

89
00:04:04,550 --> 00:04:06,170
and five in the width.

90
00:04:06,170 --> 00:04:10,355
So we can go double dim
equals np dot array,

91
00:04:10,355 --> 00:04:13,070
and here we're just taking
the single-dimension array and

92
00:04:13,070 --> 00:04:16,280
putting it inside another bracket,

93
00:04:16,280 --> 00:04:18,545
and so double dim.

94
00:04:18,545 --> 00:04:20,870
So this should look
pretty familiar.

95
00:04:20,870 --> 00:04:23,945
It's a lot like the list
of lists we saw above.

96
00:04:23,945 --> 00:04:26,600
Let's see what this new
two-dimensional array looks

97
00:04:26,600 --> 00:04:29,335
like if we actually
display it to the screen.

98
00:04:29,335 --> 00:04:31,680
So display and then
we have to convert to

99
00:04:31,680 --> 00:04:34,100
PIL, so Image dot
fromarray, we pass

100
00:04:34,100 --> 00:04:36,140
in the double dim and say that

101
00:04:36,140 --> 00:04:37,400
it's luminance, and this

102
00:04:37,400 --> 00:04:39,245
should display it to the screen.

103
00:04:39,245 --> 00:04:42,845
So that's pretty unexciting,
it's just a little line.

104
00:04:42,845 --> 00:04:45,140
It's actually five
pixels in a row to be

105
00:04:45,140 --> 00:04:47,975
exact, of different
levels of black.

106
00:04:47,975 --> 00:04:51,530
The numpy library has a nice
attribute called shape that

107
00:04:51,530 --> 00:04:55,205
allows us to see how big
an array is in each dimension.

108
00:04:55,205 --> 00:04:57,740
The shape attribute
returns a tuple that

109
00:04:57,740 --> 00:05:00,575
shows the height of the image
by the width of the image.

110
00:05:00,575 --> 00:05:03,620
So double dim dot shape.
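
A small sketch of the one-dimensional and two-dimensional arrays described above. The five pixel values are assumptions, since the transcript doesn't say which integers were used.

```python
import numpy as np

# Five grayscale pixel values (assumed for illustration).
single_dim = np.array([25, 50, 25, 10, 10])

# Wrapping the row in another list adds a second dimension: 1 row, 5 columns.
double_dim = np.array([single_dim])

# shape is (rows, columns), that is (height, width).
print(single_dim.shape, double_dim.shape)
```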

111
00:05:03,620 --> 00:05:05,750
Let's take a look at the shape of

112
00:05:05,750 --> 00:05:08,720
our initial image that we
loaded into the img variable.

113
00:05:08,720 --> 00:05:11,270
So img dot shape.

114
00:05:11,270 --> 00:05:13,775
This image has three dimensions,

115
00:05:13,775 --> 00:05:16,115
that's because it's
got a width, a height,

116
00:05:16,115 --> 00:05:18,110
and what's called a color depth.

117
00:05:18,110 --> 00:05:20,090
In this case the color is

118
00:05:20,090 --> 00:05:22,955
represented as an
array of three values.

119
00:05:22,955 --> 00:05:26,015
Let's take a look at the
color of the first pixel.

120
00:05:26,015 --> 00:05:28,250
So first pixel
is equal to img

121
00:05:28,250 --> 00:05:31,375
sub zero for the row,
sub zero for the column,

122
00:05:31,375 --> 00:05:33,425
and so the first pixel.
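
A sketch of indexing the first pixel of a color image. The synthetic array and its size are assumptions standing in for the loaded photo.

```python
import numpy as np

# Assumed stand-in for a color image: 4 rows, 6 columns, 3 channels per pixel.
img = np.zeros((4, 6, 3), dtype=np.uint8)
img[0, 0] = (255, 0, 0)  # with OpenCV's BGR order this would be pure blue

# The first index selects the row, the second the column.
first_pixel = img[0][0]

print(img.shape)
print(first_pixel)
```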

123
00:05:33,425 --> 00:05:36,380
Here we see that the color
value is provided in

124
00:05:36,380 --> 00:05:39,320
full color, one unsigned integer per
channel; OpenCV orders these blue, green, red.

125
00:05:39,320 --> 00:05:41,660
This means that each
color can have one of

126
00:05:41,660 --> 00:05:44,240
256 different values and

127
00:05:44,240 --> 00:05:46,190
that the total number
of unique colors

128
00:05:46,190 --> 00:05:49,400
that can be represented
by this data is 256 by

129
00:05:49,400 --> 00:05:54,805
256 by 256 which is
roughly 16 million colors.

130
00:05:54,805 --> 00:05:57,230
We call this 24-bit color

131
00:05:57,230 --> 00:06:00,140
which is eight plus
eight plus eight, 24.

132
00:06:00,140 --> 00:06:02,090
If you find yourself shopping for

133
00:06:02,090 --> 00:06:03,740
a television you
might notice that

134
00:06:03,740 --> 00:06:06,110
some expensive models
are advertised as having

135
00:06:06,110 --> 00:06:08,825
10 bit or even 12-bit panels.

136
00:06:08,825 --> 00:06:10,760
These are televisions
where each of

137
00:06:10,760 --> 00:06:12,920
the red green and blue
color channels are

138
00:06:12,920 --> 00:06:15,955
represented by 10 or 12 bits
instead of eight.

139
00:06:15,955 --> 00:06:17,310
For 10 bit panels,

140
00:06:17,310 --> 00:06:19,940
this means that they're capable of
over one billion colors,

141
00:06:19,940 --> 00:06:22,400
and 12-bit panels are

142
00:06:22,400 --> 00:06:25,865
capable of over 68
billion colors.
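
The color counts quoted here follow from simple arithmetic: each channel has 2 to the power of its bit depth possible values, and there are three channels. A quick check, with a helper name chosen just for this sketch:

```python
def total_colors(bits_per_channel, channels=3):
    """Number of distinct colors for a given per-channel bit depth."""
    values_per_channel = 2 ** bits_per_channel
    return values_per_channel ** channels

colors_8bit = total_colors(8)    # 256 * 256 * 256
colors_10bit = total_colors(10)  # 1024 * 1024 * 1024
colors_12bit = total_colors(12)  # 4096 * 4096 * 4096

print(colors_8bit, colors_10bit, colors_12bit)
```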

143
00:06:25,865 --> 00:06:28,490
We're not going to talk
much more about color in

144
00:06:28,490 --> 00:06:30,760
this course but
it's a fun subject.

145
00:06:30,760 --> 00:06:33,680
Instead, let's go back to
this array representation of

146
00:06:33,680 --> 00:06:34,790
images because we can do

147
00:06:34,790 --> 00:06:37,220
some interesting
things with this.

148
00:06:37,220 --> 00:06:39,500
One of the most common
things we can do with

149
00:06:39,500 --> 00:06:41,345
an ndarray is to reshape it,

150
00:06:41,345 --> 00:06:43,730
to change the number of
rows and columns that are

151
00:06:43,730 --> 00:06:45,440
represented here so that we

152
00:06:45,440 --> 00:06:47,495
could do different kinds
of operations.

153
00:06:47,495 --> 00:06:49,760
Here's our original
two-dimensional image.

154
00:06:49,760 --> 00:06:53,335
So let's print original image
and then print out gray.

155
00:06:53,335 --> 00:06:55,340
If we wanted to represent that as

156
00:06:55,340 --> 00:06:57,800
a one-dimensional image
we just call reshape,

157
00:06:57,800 --> 00:07:00,830
so print new image
and reshape takes

158
00:07:00,830 --> 00:07:02,510
the image as the first parameter

159
00:07:02,510 --> 00:07:04,370
and a new shape as the second.
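
A minimal sketch of that reshape call. The small array is an assumed stand-in for the grayscale image.

```python
import numpy as np

# Assumed stand-in for the grayscale image: 3 rows by 4 columns.
gray = np.arange(12, dtype=np.uint8).reshape(3, 4)

# New shape: one row, with as many columns as there are pixels in total.
image1d = np.reshape(gray, (1, gray.shape[0] * gray.shape[1]))

print(gray.shape, image1d.shape)
```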

160
00:07:04,370 --> 00:07:08,360
So let's say image one d
equals np dot reshape we

161
00:07:08,360 --> 00:07:12,530
send in our original array gray
and then the new shape

162
00:07:12,530 --> 00:07:15,200
will be one, and then

163
00:07:15,200 --> 00:07:18,260
gray dot shape sub zero times
gray dot shape sub

164
00:07:18,260 --> 00:07:20,450
one; multiply them
together and you get

165
00:07:20,450 --> 00:07:23,060
the total number
of pixels and then

166
00:07:23,060 --> 00:07:27,320
print image one d. So why are we

167
00:07:27,320 --> 00:07:29,810
talking about these nested
arrays of bytes when we're

168
00:07:29,810 --> 00:07:32,665
supposed to be talking
about OpenCV as a library?

169
00:07:32,665 --> 00:07:34,880
Well, I wanted to
show you that often

170
00:07:34,880 --> 00:07:37,625
libraries work on
the same kind of principles.

171
00:07:37,625 --> 00:07:40,370
In this case, images
are stored as arrays of

172
00:07:40,370 --> 00:07:41,930
bytes. Libraries aren't always

173
00:07:41,930 --> 00:07:45,155
representing the data in
the same way in their APIs,

174
00:07:45,155 --> 00:07:47,240
but by exploring
a bit you can learn

175
00:07:47,240 --> 00:07:49,130
how the internal
representation of

176
00:07:49,130 --> 00:07:50,780
data is stored and build

177
00:07:50,780 --> 00:07:53,425
routines to convert
between formats.

178
00:07:53,425 --> 00:07:55,970
For instance, remember
the last lecture

179
00:07:55,970 --> 00:07:57,080
when we wanted to look for

180
00:07:57,080 --> 00:07:58,760
gaps in an image so we could

181
00:07:58,760 --> 00:08:00,875
draw lines to feed into kraken.

182
00:08:00,875 --> 00:08:03,170
Well, we used PIL to do this, using

183
00:08:03,170 --> 00:08:06,050
getpixel to look at
the individual pixels and see

184
00:08:06,050 --> 00:08:07,955
what the luminosity values were,

185
00:08:07,955 --> 00:08:10,290
then ImageDraw
dot rectangle to

186
00:08:10,290 --> 00:08:12,895
actually fill in
a black bar separator.

187
00:08:12,895 --> 00:08:14,930
This was
a nice high-level API.

188
00:08:14,930 --> 00:08:16,430
It let us write some routines to

189
00:08:16,430 --> 00:08:17,930
do the work we wanted
without having to

190
00:08:17,930 --> 00:08:19,175
understand too much about

191
00:08:19,175 --> 00:08:20,810
how the images were
actually being

192
00:08:20,810 --> 00:08:24,715
stored but computationally
it was very slow.

193
00:08:24,715 --> 00:08:27,350
Instead, we could
write the code to

194
00:08:27,350 --> 00:08:30,365
do this using matrix
features within numpy.

195
00:08:30,365 --> 00:08:33,920
Let's take a look,
so import cv2 as

196
00:08:33,920 --> 00:08:37,835
cv; we'll load the two-column
image as well, so img

197
00:08:37,835 --> 00:08:40,160
equals cv dot imread of the read-

198
00:08:40,160 --> 00:08:43,040
only two col, and we'll
convert it to grayscale

199
00:08:43,040 --> 00:08:46,820
using cvtColor,
so gray equals cv dot

200
00:08:46,820 --> 00:08:51,935
cvtColor of img and
cv dot COLOR_BGR2GRAY.

201
00:08:51,935 --> 00:08:55,490
Now, remember how slicing
on a list works.

202
00:08:55,490 --> 00:08:58,100
If you have a list
of numbers such as

203
00:08:58,100 --> 00:09:01,100
a equals zero through
five and then you

204
00:09:01,100 --> 00:09:04,610
go a sub two colon
four that's going to

205
00:09:04,610 --> 00:09:06,050
return a list of numbers at

206
00:09:06,050 --> 00:09:08,585
positions two and three,
stopping just before four,

207
00:09:08,585 --> 00:09:12,170
and don't forget that lists
start indexing at zero.

208
00:09:12,170 --> 00:09:15,440
If we have a two-dimensional
array we can slice out

209
00:09:15,440 --> 00:09:18,425
a smaller piece of that
using the format a sub

210
00:09:18,425 --> 00:09:20,855
two colon four for
the first dimension

211
00:09:20,855 --> 00:09:23,420
and then one colon three for

212
00:09:23,420 --> 00:09:26,150
the second dimension and you
could think of this as first

213
00:09:26,150 --> 00:09:28,010
slicing along the row dimension

214
00:09:28,010 --> 00:09:30,025
then in the column's dimension.

215
00:09:30,025 --> 00:09:32,480
So in this example that
would be a matrix of rows

216
00:09:32,480 --> 00:09:35,360
two and three and
columns one and two.
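
A small sketch of both kinds of slicing. The list and the 6 by 6 grid are made up for illustration; note that the end index of a slice is not included.

```python
import numpy as np

# Slicing a plain list: positions 2 and 3, the end index 4 is excluded.
a = [0, 1, 2, 3, 4, 5]
sub_list = a[2:4]

# A 6x6 grid where the value at row r, column c is 6*r + c.
grid = np.arange(36).reshape(6, 6)

# Rows 2 and 3, columns 1 and 2.
window = grid[2:4, 1:3]

print(sub_list)
print(window)
```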

217
00:09:35,360 --> 00:09:36,860
Here's a look at our image

218
00:09:36,860 --> 00:09:41,120
so gray two colon four
and one colon three.

219
00:09:41,120 --> 00:09:43,495
So we see that it's all white.

220
00:09:43,495 --> 00:09:45,470
We can use this technique as

221
00:09:45,470 --> 00:09:48,650
a window and move it
around our big image.

222
00:09:48,650 --> 00:09:51,770
Finally, the numpy
library has lots of

223
00:09:51,770 --> 00:09:55,280
matrix functions which are
generally very fast to run,

224
00:09:55,280 --> 00:09:57,590
one that we might want to
consider in this case is

225
00:09:57,590 --> 00:10:00,140
count_nonzero, which just returns

226
00:10:00,140 --> 00:10:02,960
the number of entries in
the matrix which are not zero,

227
00:10:02,960 --> 00:10:06,200
so np dot count_nonzero, we can

228
00:10:06,200 --> 00:10:08,550
say gray and give it two colon

229
00:10:08,550 --> 00:10:11,060
four and one colon three
and so this is going

230
00:10:11,060 --> 00:10:13,790
to crop out essentially
a piece of the image

231
00:10:13,790 --> 00:10:19,220
as a matrix and send it
to np dot count_nonzero.
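
A minimal sketch of counting non-zero pixels inside a window. The mostly white array with one black pixel is an assumption for illustration, not the lecture's two-column image.

```python
import numpy as np

# Assumed stand-in image: all white (255) except one black pixel.
gray = np.full((6, 6), 255, dtype=np.uint8)
gray[3, 1] = 0

# Crop rows 2-3 and columns 1-2, then count the entries that are not zero.
window = gray[2:4, 1:3]
count = np.count_nonzero(window)

print(count)
```

A window that is entirely white gives a count equal to its number of pixels, so a lower count signals that something dark falls inside it.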

232
00:10:19,220 --> 00:10:21,590
Okay, the last benefit of

233
00:10:21,590 --> 00:10:23,390
going to this low-level approach

234
00:10:23,390 --> 00:10:27,200
to images is that we can change
pixels very fast as well.

235
00:10:27,200 --> 00:10:29,825
Previously we were
drawing rectangles and

236
00:10:29,825 --> 00:10:32,390
setting a fill and a line width,

237
00:10:32,390 --> 00:10:33,770
this is nice if you want to do

238
00:10:33,770 --> 00:10:35,690
something like change
the color of the fill

239
00:10:35,690 --> 00:10:39,420
from the line or draw
complex shapes like polygons,

240
00:10:39,420 --> 00:10:41,690
but we really just wanted a line

241
00:10:41,690 --> 00:10:44,030
here and that's
really easy to do,

242
00:10:44,030 --> 00:10:46,100
all we have to do is
change a number of

243
00:10:46,100 --> 00:10:50,045
luminosity values
from 255 to zero.

244
00:10:50,045 --> 00:10:54,000
Here's an example, let's
create a big white matrix.

245
00:10:54,000 --> 00:10:56,090
So white matrix equals
and we'll use

246
00:10:56,090 --> 00:10:58,840
np dot full; we want a 12 by

247
00:10:58,840 --> 00:11:01,970
12 matrix, so we'll pass
that in as the tuple

248
00:11:01,970 --> 00:11:05,240
for the shape of it, 255 because

249
00:11:05,240 --> 00:11:07,250
we want everything to
be white and we'll

250
00:11:07,250 --> 00:11:09,440
set the dtype to be np dot

251
00:11:09,440 --> 00:11:11,145
uint8, so that means

252
00:11:11,145 --> 00:11:14,905
one byte per pixel
to describe it.

253
00:11:14,905 --> 00:11:16,470
Now let's display that.

254
00:11:16,470 --> 00:11:18,060
So remember we have
to do Image dot

255
00:11:18,060 --> 00:11:20,039
fromarray to convert to PIL,

256
00:11:20,039 --> 00:11:22,770
passing the white matrix and telling it

257
00:11:22,770 --> 00:11:24,810
that it's luminosity values,

258
00:11:24,810 --> 00:11:27,810
and let's print out
the white matrix as well.

259
00:11:27,810 --> 00:11:30,295
So this looks pretty boring,

260
00:11:30,295 --> 00:11:31,820
it's just a giant white square

261
00:11:31,820 --> 00:11:33,575
which we actually can't see

262
00:11:33,575 --> 00:11:35,600
but if we want we can easily

263
00:11:35,600 --> 00:11:37,795
color a column to be black.

264
00:11:37,795 --> 00:11:41,880
So if we go white matrix sub
and here I just

265
00:11:41,880 --> 00:11:45,110
want all row values so I just use

266
00:11:45,110 --> 00:11:48,530
a colon comma six so
I want the sixth,

267
00:11:48,530 --> 00:11:50,960
well in this case
the seventh column,

268
00:11:50,960 --> 00:11:53,450
and I want to set
this to be np dot

269
00:11:53,450 --> 00:11:56,495
full, and I just want to pass in

270
00:11:56,495 --> 00:11:59,450
a matrix of one by 12 in

271
00:11:59,450 --> 00:12:03,500
shape, filled with zero, that's
also dtype uint8,

272
00:12:03,500 --> 00:12:05,180
and I want to display this

273
00:12:05,180 --> 00:12:07,100
to the screen, so Image dot

274
00:12:07,100 --> 00:12:10,130
fromarray of white matrix,
which now won't just be

275
00:12:10,130 --> 00:12:14,370
white and then print
out the white matrix.
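
A sketch of the white matrix with one black column. The lecture fills the column by assigning another np.full array; assigning the scalar 0 directly, as done here, is a simplification with the same effect.

```python
import numpy as np

# A 12x12 image where every pixel is white (255), one byte per pixel.
white_matrix = np.full((12, 12), 255, dtype=np.uint8)

# All rows, column index 6 (the seventh column): set to black.
white_matrix[:, 6] = 0

print(white_matrix)
```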

276
00:12:17,630 --> 00:12:21,710
That's essentially what
we wanted to do. So

277
00:12:21,710 --> 00:12:23,915
why should we do it
this way when it seems

278
00:12:23,915 --> 00:12:26,090
so much more low level?

279
00:12:26,090 --> 00:12:29,360
Really, the answer is speed.
This paradigm of using

280
00:12:29,360 --> 00:12:31,490
matrices to store and manipulate

281
00:12:31,490 --> 00:12:33,980
bytes of data for
images is much closer

282
00:12:33,980 --> 00:12:35,480
to how low-level API and

283
00:12:35,480 --> 00:12:37,520
hardware developers think about

284
00:12:37,520 --> 00:12:39,970
storing files and
bytes in memory.

285
00:12:39,970 --> 00:12:42,180
How much faster is this?

286
00:12:42,180 --> 00:12:44,055
Well, that's up to
you to discover.

287
00:12:44,055 --> 00:12:46,040
There is an optional
assignment for this week

288
00:12:46,040 --> 00:12:47,990
to convert our old code over into

289
00:12:47,990 --> 00:12:49,490
this new format to compare

290
00:12:49,490 --> 00:12:50,840
both the readability and

291
00:12:50,840 --> 00:12:53,940
the speed of the two
different approaches.