1 00:00:08,390 --> 00:00:11,790 OpenCV supports reading of images in 2 00:00:11,790 --> 00:00:15,465 most file formats such as JPEGs, PNGs, and TIFF. 3 00:00:15,465 --> 00:00:18,240 Most image and video analysis requires 4 00:00:18,240 --> 00:00:20,955 converting images into gray scale first. 5 00:00:20,955 --> 00:00:23,190 This simplifies the image and reduces 6 00:00:23,190 --> 00:00:26,070 noise while allowing for improved analysis. 7 00:00:26,070 --> 00:00:28,260 Let's write some code that reads in an image 8 00:00:28,260 --> 00:00:30,180 of a person, Floyd Mayweather, 9 00:00:30,180 --> 00:00:32,460 and converts it into gray scale. 10 00:00:32,460 --> 00:00:36,480 So first we're going to import the OpenCV package, cv2. 11 00:00:36,480 --> 00:00:39,630 So import cv2 as cv and then 12 00:00:39,630 --> 00:00:42,710 we'll load the floyd dot jpg image. 13 00:00:42,710 --> 00:00:45,440 So img equals cv dot imread of 14 00:00:45,440 --> 00:00:48,150 readonly slash floyd dot jpg and we'll 15 00:00:48,150 --> 00:00:51,590 convert it to gray scale using cvtColor. 16 00:00:51,590 --> 00:00:55,610 So gray equals cv dot cvtColor. 17 00:00:55,610 --> 00:00:59,880 This is a function in OpenCV, passing 18 00:00:59,880 --> 00:01:04,830 the image and then cv dot COLOR_BGR2GRAY. 19 00:01:04,830 --> 00:01:06,710 Now, before we get to 20 00:01:06,710 --> 00:01:08,690 the result let's talk about the docs. 21 00:01:08,690 --> 00:01:10,010 Just like Tesseract, 22 00:01:10,010 --> 00:01:12,410 OpenCV is an external package written 23 00:01:12,410 --> 00:01:16,415 in C plus plus and the docs for Python are really poor. 24 00:01:16,415 --> 00:01:18,500 This is unfortunately quite common 25 00:01:18,500 --> 00:01:20,780 when Python is being used as a wrapper. 26 00:01:20,780 --> 00:01:22,370 Thankfully, the web docs for 27 00:01:22,370 --> 00:01:24,515 OpenCV are actually pretty good.
28 00:01:24,515 --> 00:01:26,975 So hit the website at docs dot opencv 29 00:01:26,975 --> 00:01:28,340 dot org when you want to learn 30 00:01:28,340 --> 00:01:30,340 more about a particular function. 31 00:01:30,340 --> 00:01:32,960 In this case, cvtColor 32 00:01:32,960 --> 00:01:35,720 converts from one color space to another, 33 00:01:35,720 --> 00:01:38,225 and we're converting our image to gray scale. 34 00:01:38,225 --> 00:01:39,530 Of course, we already know 35 00:01:39,530 --> 00:01:41,360 at least two different ways of doing this 36 00:01:41,360 --> 00:01:45,005 using binarization and PIL color space conversions. 37 00:01:45,005 --> 00:01:48,440 So let's inspect this object that has been returned. 38 00:01:48,440 --> 00:01:51,515 So import inspect, this is in the standard library, 39 00:01:51,515 --> 00:01:56,245 then inspect dot getmro and then type of gray. 40 00:01:56,245 --> 00:02:00,130 So we see that the type is ndarray, which is 41 00:02:00,130 --> 00:02:02,050 a fundamental list type coming from 42 00:02:02,050 --> 00:02:04,240 the numerical Python project, NumPy. 43 00:02:04,240 --> 00:02:05,830 That's a bit surprising. 44 00:02:05,830 --> 00:02:07,810 Up until this point we've been used to working 45 00:02:07,810 --> 00:02:10,060 with these PIL dot Image objects. 46 00:02:10,060 --> 00:02:12,610 OpenCV, however, wants to represent 47 00:02:12,610 --> 00:02:15,700 an image as a two dimensional sequence of bytes, 48 00:02:15,700 --> 00:02:19,045 and the ndarray, which stands for n dimensional array, 49 00:02:19,045 --> 00:02:20,905 is the ideal way to do this. 50 00:02:20,905 --> 00:02:25,060 Let's look at the contents of the array, so just gray. 51 00:02:25,060 --> 00:02:28,465 So the array shown here is a list of lists 52 00:02:28,465 --> 00:02:31,540 where the inner lists are filled with integers.
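The type inspection described above can be tried without any image file. This is a small sketch, assuming `numpy` is installed; the array `gray` here is a hand-made stand-in for the grayscale image, not the lecture's actual data.

```python
import inspect
import numpy as np

# Stand-in for the grayscale image: a small 2-D array of 8-bit values.
gray = np.array([[0, 128, 255],
                 [255, 128, 0]], dtype=np.uint8)

# getmro lists the class and its base classes in method resolution order.
mro = inspect.getmro(type(gray))
print(mro)

# The repr shows nested rows of integers followed by the dtype.
print(repr(gray))
```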
53 00:02:31,540 --> 00:02:33,700 The dtype equals uint8 54 00:02:33,700 --> 00:02:36,220 definition indicates that each of the items 55 00:02:36,220 --> 00:02:39,430 in the array is an eight bit unsigned integer 56 00:02:39,430 --> 00:02:42,440 which is very common for black and white images. 57 00:02:42,440 --> 00:02:46,840 So this is a pixel by pixel definition of the image. 58 00:02:46,840 --> 00:02:49,255 The display package, however, 59 00:02:49,255 --> 00:02:52,190 doesn't know what to do with this image. 60 00:02:52,190 --> 00:02:53,510 So let's convert it into 61 00:02:53,510 --> 00:02:56,420 a PIL object to render it in the browser. 62 00:02:56,420 --> 00:02:58,950 So from PIL import Image. 63 00:02:58,950 --> 00:03:00,980 PIL can take an array of data with 64 00:03:00,980 --> 00:03:04,460 a given color format and convert this into a PIL object, 65 00:03:04,460 --> 00:03:06,515 and this is perfect for our situation 66 00:03:06,515 --> 00:03:07,955 as the PIL color mode, 67 00:03:07,955 --> 00:03:12,230 L, is just an array of luminance values in unsigned integers. 68 00:03:12,230 --> 00:03:16,465 So the new image equals Image dot fromarray, 69 00:03:16,465 --> 00:03:18,230 gray, and then we tell it L for 70 00:03:18,230 --> 00:03:21,475 luminance values, and then display image. 71 00:03:21,475 --> 00:03:25,055 Let's talk a bit more about images for a minute. 72 00:03:25,055 --> 00:03:27,575 NumPy arrays are multi-dimensional. 73 00:03:27,575 --> 00:03:28,980 For instance, we can define 74 00:03:28,980 --> 00:03:30,770 an array in a single dimension. 75 00:03:30,770 --> 00:03:32,885 So import numpy as 76 00:03:32,885 --> 00:03:36,365 np and then single dim equals np dot array 77 00:03:36,365 --> 00:03:37,460 and we just pass into 78 00:03:37,460 --> 00:03:41,020 this a list of integer values. 79 00:03:41,020 --> 00:03:43,490 In an image, this is analogous to 80 00:03:43,490 --> 00:03:47,000 a single row of five pixels, each in gray scale.
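The array-to-PIL conversion mentioned a moment ago can be sketched as below, assuming `numpy` and Pillow are installed. The lecture passes the mode `L` explicitly; this sketch instead relies on Pillow inferring that mode from a two-dimensional `uint8` array, and the pixel values are made up for illustration.

```python
import numpy as np
from PIL import Image

# Stand-in grayscale data: 2 rows by 3 columns of 8-bit luminance values.
gray = np.array([[0, 64, 128],
                 [192, 255, 32]], dtype=np.uint8)

# For a 2-D uint8 array, fromarray infers mode "L" (8-bit luminance).
pil_image = Image.fromarray(gray)

# PIL reports size as (width, height), the reverse of NumPy's (rows, columns).
print(pil_image.mode, pil_image.size)
```

In a Jupyter notebook, passing `pil_image` to `display` would then render it inline.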
81 00:03:47,000 --> 00:03:49,490 But actually all imaging libraries tend 82 00:03:49,490 --> 00:03:51,935 to expect at least two dimensions, 83 00:03:51,935 --> 00:03:53,450 a width and a height, 84 00:03:53,450 --> 00:03:55,370 in other words, a matrix. 85 00:03:55,370 --> 00:03:57,500 So if we put the single dim inside of 86 00:03:57,500 --> 00:04:00,470 another array this would be a two-dimensional array 87 00:04:00,470 --> 00:04:02,405 with the equivalent of 88 00:04:02,405 --> 00:04:04,550 one element in the height direction 89 00:04:04,550 --> 00:04:06,170 and five in the width. 90 00:04:06,170 --> 00:04:10,355 So we can go double dim equals np dot array, 91 00:04:10,355 --> 00:04:13,070 and here we're just taking the single dimension array and 92 00:04:13,070 --> 00:04:16,280 putting it inside another pair of brackets, 93 00:04:16,280 --> 00:04:18,545 and so double dim. 94 00:04:18,545 --> 00:04:20,870 So this should look pretty familiar. 95 00:04:20,870 --> 00:04:23,945 It's a lot like the list of lists we saw above. 96 00:04:23,945 --> 00:04:26,600 Let's see what this new two-dimensional array looks 97 00:04:26,600 --> 00:04:29,335 like if we actually display it to the screen. 98 00:04:29,335 --> 00:04:31,680 So display, and then we have to convert to 99 00:04:31,680 --> 00:04:34,100 PIL, so Image dot fromarray, we pass 100 00:04:34,100 --> 00:04:36,140 in the double dim and say that 101 00:04:36,140 --> 00:04:37,400 it's luminance, and this 102 00:04:37,400 --> 00:04:39,245 should display it to the screen. 103 00:04:39,245 --> 00:04:42,845 So that's pretty unexciting, it's just a little line. 104 00:04:42,845 --> 00:04:45,140 It's actually five pixels in a row, to be 105 00:04:45,140 --> 00:04:47,975 exact, of different levels of black. 106 00:04:47,975 --> 00:04:51,530 The NumPy library has a nice attribute called shape that 107 00:04:51,530 --> 00:04:55,205 allows us to see how big an array is in each dimension.
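The one-dimensional and two-dimensional arrays described above might look like this. It is a sketch assuming `numpy` is installed; the five pixel values are illustrative, since the narration does not spell out the exact numbers.

```python
import numpy as np

# Illustrative pixel values; the narration does not give the exact numbers.
single_dim = np.array([25, 50, 25, 10, 10], dtype=np.uint8)

# Wrapping the 1-D array in another list adds a second dimension.
double_dim = np.array([single_dim])

# shape is a tuple with one entry per dimension: (rows, columns) for 2-D data.
print(single_dim.shape)
print(double_dim.shape)
```

The second shape has two entries, one row and five columns, which is the one-pixel-high strip the lecture displays.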
108 00:04:55,205 --> 00:04:57,740 The shape attribute returns a tuple that 109 00:04:57,740 --> 00:05:00,575 shows the height of the image by the width of the image. 110 00:05:00,575 --> 00:05:03,620 So double dim dot shape. 111 00:05:03,620 --> 00:05:05,750 Let's take a look at the shape of 112 00:05:05,750 --> 00:05:08,720 our initial image that we loaded into the img variable. 113 00:05:08,720 --> 00:05:11,270 So img dot shape. 114 00:05:11,270 --> 00:05:13,775 This image has three dimensions, 115 00:05:13,775 --> 00:05:16,115 that's because it's got a width, a height, 116 00:05:16,115 --> 00:05:18,110 and what's called a color depth. 117 00:05:18,110 --> 00:05:20,090 In this case the color is 118 00:05:20,090 --> 00:05:22,955 represented as an array of three values. 119 00:05:22,955 --> 00:05:26,015 Let's take a look at the color of the first pixel. 120 00:05:26,015 --> 00:05:28,250 So the first pixel is equal to img 121 00:05:28,250 --> 00:05:31,375 sub zero on the height and sub zero on the width, 122 00:05:31,375 --> 00:05:33,425 and so the first pixel. 123 00:05:33,425 --> 00:05:36,380 Here we see that the color value is provided in 124 00:05:36,380 --> 00:05:39,320 full RGB using eight bit unsigned integers. 125 00:05:39,320 --> 00:05:41,660 This means that each color channel can have one of 126 00:05:41,660 --> 00:05:44,240 256 different values and 127 00:05:44,240 --> 00:05:46,190 that the total number of unique colors 128 00:05:46,190 --> 00:05:49,400 that can be represented by this data is 256 by 129 00:05:49,400 --> 00:05:54,805 256 by 256 which is roughly 16 million colors. 130 00:05:54,805 --> 00:05:57,230 We call this 24-bit color 131 00:05:57,230 --> 00:06:00,140 which is eight plus eight plus eight, 24.
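The shape, first-pixel lookup, and color-count arithmetic above can be checked with a small sketch, assuming `numpy` is installed. The 4 by 6 image and its pixel value are hypothetical. One detail worth knowing when reading real pixel values is that OpenCV orders the three channels blue, green, red.

```python
import numpy as np

# Hypothetical 4-row by 6-column color image; OpenCV stores channels as B, G, R.
img = np.zeros((4, 6, 3), dtype=np.uint8)
img[0, 0] = (255, 128, 0)

height, width, channels = img.shape
first_pixel = img[0][0]          # row 0, column 0: an array of three values

levels_per_channel = 2 ** 8      # 8 bits per channel gives 256 levels
total_colors = levels_per_channel ** channels
bits_per_pixel = 8 * channels

print(first_pixel, total_colors, bits_per_pixel)
```

The total comes to 16,777,216, which is the "roughly 16 million" quoted in the lecture.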
132 00:06:00,140 --> 00:06:02,090 If you find yourself shopping for 133 00:06:02,090 --> 00:06:03,740 a television you might notice that 134 00:06:03,740 --> 00:06:06,110 some expensive models are advertised as having 135 00:06:06,110 --> 00:06:08,825 10-bit or even 12-bit panels. 136 00:06:08,825 --> 00:06:10,760 These are televisions where each of 137 00:06:10,760 --> 00:06:12,920 the red, green, and blue color channels is 138 00:06:12,920 --> 00:06:15,955 represented by 10 or 12 bits instead of eight. 139 00:06:15,955 --> 00:06:17,310 For 10-bit panels, 140 00:06:17,310 --> 00:06:19,940 this means they are capable of over one billion 141 00:06:19,940 --> 00:06:22,400 colors, and 12-bit panels are 142 00:06:22,400 --> 00:06:25,865 capable of over 68 billion colors. 143 00:06:25,865 --> 00:06:28,490 We're not going to talk much more about color in 144 00:06:28,490 --> 00:06:30,760 this course but it's a fun subject. 145 00:06:30,760 --> 00:06:33,680 Instead, let's go back to this array representation of 146 00:06:33,680 --> 00:06:34,790 images because we can do 147 00:06:34,790 --> 00:06:37,220 some interesting things with this. 148 00:06:37,220 --> 00:06:39,500 One of the most common things we can do with 149 00:06:39,500 --> 00:06:41,345 an ndarray is to reshape it, 150 00:06:41,345 --> 00:06:43,730 to change the number of rows and columns that are 151 00:06:43,730 --> 00:06:45,440 represented here so that we 152 00:06:45,440 --> 00:06:47,495 could do different kinds of operations. 153 00:06:47,495 --> 00:06:49,760 Here's our original two-dimensional image. 154 00:06:49,760 --> 00:06:53,335 So let's print original image and then print out gray.
155 00:06:53,335 --> 00:06:55,340 If we wanted to represent that as 156 00:06:55,340 --> 00:06:57,800 a one-dimensional image we just call reshape, 157 00:06:57,800 --> 00:07:00,830 so print new image, and reshape takes 158 00:07:00,830 --> 00:07:02,510 the image as the first parameter 159 00:07:02,510 --> 00:07:04,370 and a new shape as the second. 160 00:07:04,370 --> 00:07:08,360 So let's say image one d equals np dot reshape, we 161 00:07:08,360 --> 00:07:12,530 send in our original array gray, and then the new shape 162 00:07:12,530 --> 00:07:15,200 will be one and then 163 00:07:15,200 --> 00:07:18,260 gray dot shape sub zero times gray dot shape sub 164 00:07:18,260 --> 00:07:20,450 one, multiply them together and you get 165 00:07:20,450 --> 00:07:23,060 the total number of pixels, and then 166 00:07:23,060 --> 00:07:27,320 print image one d. So why are we 167 00:07:27,320 --> 00:07:29,810 talking about these nested arrays of bytes when we're 168 00:07:29,810 --> 00:07:32,665 supposed to be talking about OpenCV as a library? 169 00:07:32,665 --> 00:07:34,880 Well, I wanted to show you that often 170 00:07:34,880 --> 00:07:37,625 libraries work on the same kind of principles. 171 00:07:37,625 --> 00:07:40,370 In this case, images are stored as arrays of 172 00:07:40,370 --> 00:07:41,930 bytes. The libraries may not be 173 00:07:41,930 --> 00:07:45,155 representing the data in the same way in their APIs, 174 00:07:45,155 --> 00:07:47,240 but by exploring a bit you can learn 175 00:07:47,240 --> 00:07:49,130 how the internal representation of 176 00:07:49,130 --> 00:07:50,780 data is stored and build 177 00:07:50,780 --> 00:07:53,425 routines to convert between formats. 178 00:07:53,425 --> 00:07:55,970 For instance, remember the last lecture 179 00:07:55,970 --> 00:07:57,080 when we wanted to look for 180 00:07:57,080 --> 00:07:58,760 gaps in an image so we could 181 00:07:58,760 --> 00:08:00,875 draw lines to feed into Kraken.
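Returning briefly to the reshape call described above, the idea can be sketched as follows, assuming `numpy` is installed. The 3 by 4 array stands in for the grayscale image, and the name `image1d` is simply a spelling of the spoken "image one d".

```python
import numpy as np

# Stand-in 3x4 grayscale image whose values are 0 through 11.
gray = np.arange(12, dtype=np.uint8).reshape(3, 4)

# The new shape is one row by (rows * columns), the total number of pixels.
rows, cols = gray.shape
image1d = np.reshape(gray, (1, rows * cols))

print(gray.shape, image1d.shape)
```

Reshaping keeps every pixel value and its order; only the way the data is divided into rows changes.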
182 00:08:00,875 --> 00:08:03,170 Well, we used PIL to do this, using 183 00:08:03,170 --> 00:08:06,050 getpixel to look at the individual pixels and see 184 00:08:06,050 --> 00:08:07,955 what the luminosity values were, 185 00:08:07,955 --> 00:08:10,290 then ImageDraw dot rectangle to 186 00:08:10,290 --> 00:08:12,895 actually fill in a black bar separator, 187 00:08:12,895 --> 00:08:14,930 and this was a nice high-level API. 188 00:08:14,930 --> 00:08:16,430 It let us write some routines to 189 00:08:16,430 --> 00:08:17,930 do the work we wanted without having to 190 00:08:17,930 --> 00:08:19,175 understand too much about 191 00:08:19,175 --> 00:08:20,810 how the images were actually being 192 00:08:20,810 --> 00:08:24,715 stored, but computationally it was very slow. 193 00:08:24,715 --> 00:08:27,350 Instead, we could write the code to 194 00:08:27,350 --> 00:08:30,365 do this using matrix features within NumPy. 195 00:08:30,365 --> 00:08:33,920 Let's take a look, so import cv2 as 196 00:08:33,920 --> 00:08:37,835 cv, and we'll load the two column image as well, so img 197 00:08:37,835 --> 00:08:40,160 equals cv dot imread of readonly 198 00:08:40,160 --> 00:08:43,040 slash two col, and we'll convert it to gray-scale 199 00:08:43,040 --> 00:08:46,820 using cvtColor, so gray equals cv dot 200 00:08:46,820 --> 00:08:51,935 cvtColor, passing img and cv dot COLOR_BGR2GRAY. 201 00:08:51,935 --> 00:08:55,490 Now, remember how slicing on a list works. 202 00:08:55,490 --> 00:08:58,100 If you have a list of numbers such as 203 00:08:58,100 --> 00:09:01,100 a equals zero through five and then you 204 00:09:01,100 --> 00:09:04,610 go a sub two colon four, that's going to 205 00:09:04,610 --> 00:09:06,050 return a list of the numbers at 206 00:09:06,050 --> 00:09:08,585 positions two and three, because the end index four is excluded, 207 00:09:08,585 --> 00:09:12,170 and don't forget that lists start indexing at zero.
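The list-slicing reminder above is quick to verify in plain Python. Note that the end index of a slice is excluded, so `a[2:4]` covers positions two and three only.

```python
# A list of the numbers zero through five.
a = [0, 1, 2, 3, 4, 5]

# Start index 2 is included, end index 4 is excluded.
middle = a[2:4]
print(middle)  # → [2, 3]
```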
208 00:09:12,170 --> 00:09:15,440 If we have a two-dimensional array we can slice out 209 00:09:15,440 --> 00:09:18,425 a smaller piece of that using the format a sub 210 00:09:18,425 --> 00:09:20,855 two colon four for the first dimension 211 00:09:20,855 --> 00:09:23,420 and then one colon three for 212 00:09:23,420 --> 00:09:26,150 the second dimension, and you could think of this as first 213 00:09:26,150 --> 00:09:28,010 slicing along the row dimension, 214 00:09:28,010 --> 00:09:30,025 then along the column dimension. 215 00:09:30,025 --> 00:09:32,480 So in this example that would be a matrix of rows 216 00:09:32,480 --> 00:09:35,360 two and three and columns one and two. 217 00:09:35,360 --> 00:09:36,860 Here's a look at our image, 218 00:09:36,860 --> 00:09:41,120 so gray sub two colon four, comma, one colon three. 219 00:09:41,120 --> 00:09:43,495 So we see that it's all white. 220 00:09:43,495 --> 00:09:45,470 We can use this technique as 221 00:09:45,470 --> 00:09:48,650 a window and move it around our big image. 222 00:09:48,650 --> 00:09:51,770 Finally, the NumPy library has lots of 223 00:09:51,770 --> 00:09:55,280 matrix functions which are generally very fast to run. 224 00:09:55,280 --> 00:09:57,590 One that we might want to consider in this case is 225 00:09:57,590 --> 00:10:00,140 count_nonzero, which just returns 226 00:10:00,140 --> 00:10:02,960 the number of entries in the matrix which are not zero, 227 00:10:02,960 --> 00:10:06,200 so np dot count_nonzero, we can 228 00:10:06,200 --> 00:10:08,550 say gray and give it two colon 229 00:10:08,550 --> 00:10:11,060 four and one colon three, and so this is going 230 00:10:11,060 --> 00:10:13,790 to crop out essentially a piece of the image 231 00:10:13,790 --> 00:10:19,220 as a matrix and send it to np dot count_nonzero.
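The two-dimensional slice and the `np.count_nonzero` call described above can be sketched like this, assuming `numpy` is installed. The 6 by 5 array is a made-up stand-in for the page image, with one black pixel added so the count is not trivially the window size.

```python
import numpy as np

# All-white stand-in image, 6 rows by 5 columns, with one black pixel.
gray = np.full((6, 5), 255, dtype=np.uint8)
gray[3, 2] = 0

# Rows 2 and 3, columns 1 and 2: end indices are excluded, as with lists.
window = gray[2:4, 1:3]

# Count the entries in the window that are not zero (that is, not black).
non_zero = np.count_nonzero(window)
print(window.shape, non_zero)
```

Three of the four pixels in the window are non-zero, because the black pixel at row 3, column 2 falls inside it.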
232 00:10:19,220 --> 00:10:21,590 Okay, the last benefit of 233 00:10:21,590 --> 00:10:23,390 going to this low-level approach 234 00:10:23,390 --> 00:10:27,200 to images is that we can change pixels very fast as well. 235 00:10:27,200 --> 00:10:29,825 Previously we were drawing rectangles and 236 00:10:29,825 --> 00:10:32,390 setting a fill and a line width. 237 00:10:32,390 --> 00:10:33,770 This is nice if you want to do 238 00:10:33,770 --> 00:10:35,690 something like change the color of the fill 239 00:10:35,690 --> 00:10:39,420 from the line or draw complex shapes like polygons, 240 00:10:39,420 --> 00:10:41,690 but we really just wanted a line 241 00:10:41,690 --> 00:10:44,030 here and that's really easy to do, 242 00:10:44,030 --> 00:10:46,100 all we have to do is change a number of 243 00:10:46,100 --> 00:10:50,045 luminosity values from 255 to zero. 244 00:10:50,045 --> 00:10:54,000 Here's an example, let's create a big white matrix. 245 00:10:54,000 --> 00:10:56,090 So white matrix equals, and we'll use 246 00:10:56,090 --> 00:10:58,840 np dot full. We want a 12 by 247 00:10:58,840 --> 00:11:01,970 12 matrix, so we'll pass that in as the tuple 248 00:11:01,970 --> 00:11:05,240 for the shape of it, then 255 because 249 00:11:05,240 --> 00:11:07,250 we want everything to be white, and we'll 250 00:11:07,250 --> 00:11:09,440 set the dtype to be np dot 251 00:11:09,440 --> 00:11:11,145 uint8, so that means 252 00:11:11,145 --> 00:11:14,905 one byte per pixel to describe it. 253 00:11:14,905 --> 00:11:16,470 Now let's display that. 254 00:11:16,470 --> 00:11:18,060 So remember we have to do Image dot 255 00:11:18,060 --> 00:11:20,039 fromarray to convert to PIL, 256 00:11:20,039 --> 00:11:22,770 passing the white matrix and telling it 257 00:11:22,770 --> 00:11:24,810 that it's luminosity values, 258 00:11:24,810 --> 00:11:27,810 and let's print out the white matrix as well.
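The white matrix described above can be built with `np.full`, as in this short sketch that assumes `numpy` is installed. The variable name `white_matrix` follows the narration.

```python
import numpy as np

# 12x12 matrix, every entry 255 (white), one unsigned byte per pixel.
white_matrix = np.full((12, 12), 255, dtype=np.uint8)

# nbytes confirms the storage cost: one byte for each of the 144 pixels.
print(white_matrix.shape, white_matrix.dtype, white_matrix.nbytes)
```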
259 00:11:27,810 --> 00:11:30,295 So this looks pretty boring, 260 00:11:30,295 --> 00:11:31,820 it's just a giant white square 261 00:11:31,820 --> 00:11:33,575 which we actually can't see, 262 00:11:33,575 --> 00:11:35,600 but if we want we can easily 263 00:11:35,600 --> 00:11:37,795 color a column to be black. 264 00:11:37,795 --> 00:11:41,880 So if we go white matrix sub, and here I just 265 00:11:41,880 --> 00:11:45,110 want all row values so I just use 266 00:11:45,110 --> 00:11:48,530 a colon, comma six, so I want the sixth, 267 00:11:48,530 --> 00:11:50,960 well in this case the seventh column, 268 00:11:50,960 --> 00:11:53,450 and I want to set this to be np dot 269 00:11:53,450 --> 00:11:56,495 full, and I just want to pass in 270 00:11:56,495 --> 00:11:59,450 a shape of one by 12, 271 00:11:59,450 --> 00:12:03,500 filled with zero, and that's also dtype uint8, 272 00:12:03,500 --> 00:12:05,180 and I want to display this 273 00:12:05,180 --> 00:12:07,100 to the screen, so Image dot 274 00:12:07,100 --> 00:12:10,130 fromarray, white matrix, which now won't just be 275 00:12:10,130 --> 00:12:14,370 white, and then print out the white matrix. 276 00:12:17,630 --> 00:12:21,710 That's essentially what we wanted to do. So, 277 00:12:21,710 --> 00:12:23,915 why should we do it this way when it seems 278 00:12:23,915 --> 00:12:26,090 so much more low level? 279 00:12:26,090 --> 00:12:29,360 Really the answer is speed. This paradigm of using 280 00:12:29,360 --> 00:12:31,490 matrices to store and manipulate 281 00:12:31,490 --> 00:12:33,980 bytes of data for images is much closer 282 00:12:33,980 --> 00:12:35,480 to how low-level API and 283 00:12:35,480 --> 00:12:37,520 hardware developers think about 284 00:12:37,520 --> 00:12:39,970 storing files and bytes in memory. 285 00:12:39,970 --> 00:12:42,180 How much faster is this? 286 00:12:42,180 --> 00:12:44,055 Well, that's up to you to discover.
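Blacking out a column, as described above, can be sketched as follows, assuming `numpy` is installed. The lecture assigns an `np.full` array of zeros to the column; this sketch swaps in a plain scalar `0`, which NumPy broadcasts across the whole column and which gives the same result for this case.

```python
import numpy as np

white_matrix = np.full((12, 12), 255, dtype=np.uint8)

# All rows (:), column index 6, which is the seventh column.
# Assigning a scalar broadcasts it to every selected entry.
white_matrix[:, 6] = 0

remaining_white = np.count_nonzero(white_matrix)
print(white_matrix[0])
print(remaining_white)
```

Twelve of the 144 pixels are now black, leaving 132 non-zero entries.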
287 00:12:44,055 --> 00:12:46,040 There is an optional assignment for this week 288 00:12:46,040 --> 00:12:47,990 to convert our old code over into 289 00:12:47,990 --> 00:12:49,490 this new format to compare 290 00:12:49,490 --> 00:12:50,840 both the readability and 291 00:12:50,840 --> 00:12:53,940 the speed of the two different approaches.
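One possible starting point for that comparison is sketched below. It is not the assignment's solution, only an illustration under stated assumptions: `numpy` is installed, the 200 by 200 image is synthetic, and the pixel-by-pixel Python loop stands in for the per-pixel style of the earlier PIL code. The function names are hypothetical.

```python
import timeit
import numpy as np

# Synthetic white image with one black column, standing in for a page scan.
image = np.full((200, 200), 255, dtype=np.uint8)
image[:, 100] = 0

def count_white_loop(arr):
    """Count white pixels one at a time, in the style of per-pixel access."""
    count = 0
    for row in arr:
        for value in row:
            if value == 255:
                count += 1
    return count

def count_white_numpy(arr):
    """Count white pixels with a single vectorized NumPy operation."""
    return int(np.count_nonzero(arr == 255))

loop_result = count_white_loop(image)
numpy_result = count_white_numpy(image)

# Time a few repetitions of each; absolute numbers depend on the machine.
loop_time = timeit.timeit(lambda: count_white_loop(image), number=3)
numpy_time = timeit.timeit(lambda: count_white_numpy(image), number=3)
print(f"loop: {loop_time:.4f}s  numpy: {numpy_time:.4f}s")
```

Both functions should agree on the count, so any difference you measure is purely in running time and readability.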