1 00:00:00,000 --> 00:00:04,060 In this video we will be covering NumPy in 1D, 2 00:00:04,060 --> 00:00:06,790 in particular ndarrays. 3 00:00:06,790 --> 00:00:10,270 NumPy is a library for scientific computing. 4 00:00:10,270 --> 00:00:12,520 It has many useful functions. 5 00:00:12,520 --> 00:00:15,790 It also has other advantages, such as speed and memory efficiency. 6 00:00:15,790 --> 00:00:18,730 NumPy is also the basis for pandas. 7 00:00:18,730 --> 00:00:21,070 So check out our pandas video. 8 00:00:21,070 --> 00:00:23,350 In this video we will be covering 9 00:00:23,350 --> 00:00:25,830 the basics and array creation, 10 00:00:25,830 --> 00:00:27,540 indexing and slicing, 11 00:00:27,540 --> 00:00:31,060 basic operations, and universal functions. 12 00:00:31,060 --> 00:00:34,560 Let's go over how to create a NumPy array. 13 00:00:34,560 --> 00:00:36,990 A Python list is a container that 14 00:00:36,990 --> 00:00:39,510 allows you to store and access data. 15 00:00:39,510 --> 00:00:42,670 Each element is associated with an index. 16 00:00:42,670 --> 00:00:44,750 We can access each element using 17 00:00:44,750 --> 00:00:47,210 square brackets as follows. 18 00:00:47,210 --> 00:00:52,020 A NumPy array, or ndarray, is similar to a list. 19 00:00:52,020 --> 00:00:54,180 It's usually fixed in size 20 00:00:54,180 --> 00:00:56,510 and each element is of the same type, 21 00:00:56,510 --> 00:00:58,290 in this case integers. 22 00:00:58,290 --> 00:00:59,720 We can cast a list to 23 00:00:59,720 --> 00:01:02,460 a NumPy array by first importing NumPy. 24 00:01:02,460 --> 00:01:05,230 We then cast the list as follows. 25 00:01:05,230 --> 00:01:08,010 We can access the data via an index. 26 00:01:08,010 --> 00:01:10,490 As with the list, we can access 27 00:01:10,490 --> 00:01:14,070 each element with an integer index in square brackets. 28 00:01:14,070 --> 00:01:17,240 The value of a is stored as follows. 
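As a minimal sketch of the list-to-array cast described above: the transcript only says the array holds five integers, so the values 0 through 4 and the helper names `values`, `first` and `last` are assumptions for illustration.

```python
import numpy as np

# Illustrative values only; the video shows its own values on screen.
values = [0, 1, 2, 3, 4]

# Cast the Python list to a NumPy array.
a = np.array(values)

# As with a list, each element is accessed with an integer index
# in square brackets.
first = a[0]
last = a[4]

print(a)
print(first, last)
```

The array keeps the same order and indices as the list it was created from.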
29 00:01:17,240 --> 00:01:23,220 If we check the type of the array, we get numpy.ndarray. 30 00:01:23,220 --> 00:01:26,970 As NumPy arrays contain data of the same type, 31 00:01:26,970 --> 00:01:28,750 we can use the attribute 32 00:01:28,750 --> 00:01:32,720 dtype to obtain the data type of the array's elements. 33 00:01:32,720 --> 00:01:35,420 In this case, a 64-bit integer. 34 00:01:35,420 --> 00:01:38,150 Let's review some basic array attributes 35 00:01:38,150 --> 00:01:39,950 using the array a. 36 00:01:39,950 --> 00:01:41,480 The attribute size is 37 00:01:41,480 --> 00:01:43,560 the number of elements in the array. 38 00:01:43,560 --> 00:01:46,850 As there are five elements, the result is five. 39 00:01:46,850 --> 00:01:48,890 The next two attributes will make 40 00:01:48,890 --> 00:01:51,890 more sense when we get to higher dimensions, 41 00:01:51,890 --> 00:01:53,440 but let's review them. 42 00:01:53,440 --> 00:01:56,360 The attribute ndim represents 43 00:01:56,360 --> 00:01:59,720 the number of array dimensions or the rank of the array, 44 00:01:59,720 --> 00:02:01,280 in this case one. 45 00:02:01,280 --> 00:02:04,130 The attribute shape is a tuple of 46 00:02:04,130 --> 00:02:05,720 integers indicating the size of 47 00:02:05,720 --> 00:02:07,730 the array in each dimension. 48 00:02:07,730 --> 00:02:11,090 We can create a NumPy array with real numbers. 49 00:02:11,090 --> 00:02:12,960 When we check the type of the array, 50 00:02:12,960 --> 00:02:15,920 we get numpy.ndarray. 51 00:02:15,920 --> 00:02:18,480 If we examine the attribute dtype, 52 00:02:18,480 --> 00:02:22,520 we see float64, as the elements are not integers. 53 00:02:22,520 --> 00:02:26,960 There are many other attributes; check out numpy.org. 54 00:02:26,960 --> 00:02:30,620 Let's review some indexing and slicing methods. 55 00:02:30,620 --> 00:02:32,810 We can change the first element of 56 00:02:32,810 --> 00:02:35,220 the array to 100 as follows. 
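Before the indexing examples, here is a rough sketch of the attributes reviewed above. The element values of `a` and `b` are assumed for illustration. The integer dtype may differ by platform, so it is not hard-coded here.

```python
import numpy as np

a = np.array([0, 1, 2, 3, 4])                  # assumed integer values
b = np.array([3.1, 11.02, 6.2, 213.2, 5.2])    # assumed real-valued example

print(type(a))   # the array type, numpy.ndarray
print(a.dtype)   # an integer dtype (int64 on most 64-bit setups)
print(a.size)    # number of elements in the array
print(a.ndim)    # number of array dimensions
print(a.shape)   # tuple with the size of the array in each dimension
print(b.dtype)   # floating-point dtype, since the elements are not integers
```

For a one-dimensional array, `shape` is a one-element tuple whose single entry equals `size`.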
57 00:02:35,220 --> 00:02:37,940 The array's first value is now 100. 58 00:02:37,940 --> 00:02:41,340 We can change the fifth element of the array as follows. 59 00:02:41,340 --> 00:02:44,090 The fifth element is now zero. 60 00:02:44,090 --> 00:02:48,000 Like lists and tuples, we can slice a NumPy array. 61 00:02:48,000 --> 00:02:49,880 The elements of the array correspond 62 00:02:49,880 --> 00:02:51,470 to the following indices. 63 00:02:51,470 --> 00:02:54,740 We can select the elements at indices one to three and assign 64 00:02:54,740 --> 00:02:58,160 them to a new NumPy array d as follows. 65 00:02:58,160 --> 00:03:01,460 The elements in d correspond to those indices. 66 00:03:01,460 --> 00:03:03,350 As with lists, the slice does not include 67 00:03:03,350 --> 00:03:06,140 the element corresponding to the last index. 68 00:03:06,140 --> 00:03:08,920 We can assign the corresponding indices 69 00:03:08,920 --> 00:03:11,130 to new values as follows. 70 00:03:11,130 --> 00:03:14,040 The array c now has new values. 71 00:03:14,040 --> 00:03:16,520 See the labs or numpy.org for 72 00:03:16,520 --> 00:03:19,340 more examples of what you can do with NumPy. 73 00:03:19,340 --> 00:03:21,410 NumPy makes it easier to do 74 00:03:21,410 --> 00:03:22,610 many operations that are 75 00:03:22,610 --> 00:03:24,900 commonly performed in data science. 76 00:03:24,900 --> 00:03:26,630 The same operations are 77 00:03:26,630 --> 00:03:28,580 usually computationally faster and 78 00:03:28,580 --> 00:03:32,570 require less memory in NumPy compared to regular Python. 79 00:03:32,570 --> 00:03:34,830 Let's review some of these operations 80 00:03:34,830 --> 00:03:37,050 on one-dimensional arrays. 81 00:03:37,050 --> 00:03:39,710 We will look at many of the operations in the context of 82 00:03:39,710 --> 00:03:43,250 Euclidean vectors to make things more interesting. 83 00:03:43,250 --> 00:03:46,070 Vector addition is a widely used operation 84 00:03:46,070 --> 00:03:47,360 in data science. 
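Looking back at the indexing and slicing steps described earlier, a minimal sketch might look like this. The starting values of `c`, the slice bounds `3:5` and the new values 300 and 400 are assumptions; the transcript only states that the first element becomes 100, the fifth becomes zero, and `d` holds the elements at indices one to three.

```python
import numpy as np

c = np.array([20, 1, 2, 3, 4])   # assumed starting values

c[0] = 100   # change the first element
c[4] = 0     # change the fifth element

# Slice indices 1, 2 and 3; the element at the stop index 4 is excluded.
d = c[1:4]
d_before = d.tolist()   # snapshot of d at this point

# Assign new values to a range of indices (assumed range and values).
c[3:5] = 300, 400

print(c)
print(d_before)
```

One design point worth knowing: a basic slice such as `c[1:4]` is a view of `c`, not a copy, so the later assignment to `c[3]` is also visible through `d`. Use `c[1:4].copy()` if an independent array is needed.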
85 00:03:47,360 --> 00:03:50,000 Consider the vector u with two elements; 86 00:03:50,000 --> 00:03:53,700 the elements are distinguished by the different colors. 87 00:03:53,700 --> 00:03:57,530 Similarly, consider the vector v with two components. 88 00:03:57,530 --> 00:03:59,540 In vector addition, we create 89 00:03:59,540 --> 00:04:01,900 a new vector, in this case z. 90 00:04:01,900 --> 00:04:03,790 The first component of z 91 00:04:03,790 --> 00:04:05,430 is the sum of the first components 92 00:04:05,430 --> 00:04:08,730 of vectors u and v. Similarly, 93 00:04:08,730 --> 00:04:10,040 the second component is 94 00:04:10,040 --> 00:04:11,720 the sum of the second components of 95 00:04:11,720 --> 00:04:15,110 u and v. This new vector z is now 96 00:04:15,110 --> 00:04:17,780 a linear combination of the vectors u and 97 00:04:17,780 --> 00:04:20,510 v. Representing vector addition 98 00:04:20,510 --> 00:04:23,010 with line segments or arrows is helpful. 99 00:04:23,010 --> 00:04:25,220 The first vector is represented in red. 100 00:04:25,220 --> 00:04:26,540 The vector will point in 101 00:04:26,540 --> 00:04:29,000 the direction of the two components. 102 00:04:29,000 --> 00:04:31,340 The first component of the vector is one. 103 00:04:31,340 --> 00:04:33,870 As a result, the arrow is offset 104 00:04:33,870 --> 00:04:36,860 one unit from the origin in the horizontal direction. 105 00:04:36,860 --> 00:04:39,090 The second component is zero; 106 00:04:39,090 --> 00:04:42,150 we represent this component in the vertical direction. 107 00:04:42,150 --> 00:04:44,450 As this component is zero, 108 00:04:44,450 --> 00:04:47,700 the vector does not point in the vertical direction. 109 00:04:47,700 --> 00:04:50,480 We represent the second vector in blue. 110 00:04:50,480 --> 00:04:52,470 The first component is zero, 111 00:04:52,470 --> 00:04:54,050 therefore the arrow does not 112 00:04:54,050 --> 00:04:56,180 point in the horizontal direction. 
113 00:04:56,180 --> 00:04:58,730 The second component is one. 114 00:04:58,730 --> 00:05:00,740 As a result, the vector points in 115 00:05:00,740 --> 00:05:02,790 the vertical direction one unit. 116 00:05:02,790 --> 00:05:05,480 When we add the vectors u and v, 117 00:05:05,480 --> 00:05:07,430 we get the new vector z. 118 00:05:07,430 --> 00:05:09,570 We add the first components; 119 00:05:09,570 --> 00:05:12,360 this corresponds to the horizontal direction. 120 00:05:12,360 --> 00:05:14,670 We also add the second components. 121 00:05:14,670 --> 00:05:16,640 It's helpful to use the tip to 122 00:05:16,640 --> 00:05:18,900 tail method when adding vectors, 123 00:05:18,900 --> 00:05:22,800 placing the tail of the vector v on the tip of vector u. 124 00:05:22,800 --> 00:05:25,640 The new vector z is constructed by connecting 125 00:05:25,640 --> 00:05:28,640 the tail of the first vector u with the tip of the 126 00:05:28,640 --> 00:05:31,890 second, v. The following three lines of code 127 00:05:31,890 --> 00:05:33,500 will add the two lists and place 128 00:05:33,500 --> 00:05:35,780 the result in the list z. 129 00:05:35,780 --> 00:05:37,460 We can also perform 130 00:05:37,460 --> 00:05:40,840 vector addition with one line of NumPy code. 131 00:05:40,840 --> 00:05:43,280 It would require multiple lines to perform 132 00:05:43,280 --> 00:05:45,230 vector addition on two lists 133 00:05:45,230 --> 00:05:47,360 as shown on the right side of the screen. 134 00:05:47,360 --> 00:05:50,720 In addition, the NumPy code will run much faster. 135 00:05:50,720 --> 00:05:53,580 This is important if you have lots of data. 136 00:05:53,580 --> 00:05:56,540 We can also perform vector subtraction by changing 137 00:05:56,540 --> 00:05:59,240 the addition sign to a subtraction sign. 
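The addition and subtraction steps above can be sketched as follows, using the components given for u (one, zero) and v (zero, one). The names `z_list`, `u_arr`, `v_arr` and `w` are assumptions for illustration, and the loop is one plausible form of the list version, not necessarily the code shown on screen.

```python
import numpy as np

u = [1, 0]
v = [0, 1]

# List version: add the elements pair by pair in a loop.
z_list = []
for n, m in zip(u, v):
    z_list.append(n + m)

# NumPy version: one line per operation once the arrays exist.
u_arr = np.array(u)
v_arr = np.array(v)
z = u_arr + v_arr   # elementwise addition
w = u_arr - v_arr   # elementwise subtraction

print(z_list, z, w)
```

Note that `u + v` on the plain lists would concatenate them into a four-element list rather than add them, which is why the list version needs a loop.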
138 00:05:59,240 --> 00:06:00,770 It would require multiple lines 139 00:06:00,770 --> 00:06:02,130 to perform vector subtraction 140 00:06:02,130 --> 00:06:05,640 on two lists as shown on the right side of the screen. 141 00:06:05,640 --> 00:06:08,150 Vector multiplication with a scalar is 142 00:06:08,150 --> 00:06:10,470 another commonly performed operation. 143 00:06:10,470 --> 00:06:12,130 Consider the vector y; 144 00:06:12,130 --> 00:06:14,940 each component is specified by a different color. 145 00:06:14,940 --> 00:06:16,940 We simply multiply the vector by 146 00:06:16,940 --> 00:06:19,570 a scalar value, in this case two. 147 00:06:19,570 --> 00:06:22,680 Each component of the vector is multiplied by two; 148 00:06:22,680 --> 00:06:25,550 in this case each component is doubled. 149 00:06:25,550 --> 00:06:27,620 We can use line segments or 150 00:06:27,620 --> 00:06:29,960 arrows to visualize what's going on. 151 00:06:29,960 --> 00:06:32,610 The original vector y is in purple. 152 00:06:32,610 --> 00:06:35,900 After multiplying it by a scalar value of two, 153 00:06:35,900 --> 00:06:39,590 the vector is stretched by a factor of two as shown in red. 154 00:06:39,590 --> 00:06:42,870 The new vector is twice as long in each direction. 155 00:06:42,870 --> 00:06:45,500 Vector multiplication with a scalar only 156 00:06:45,500 --> 00:06:48,570 requires one line of code using NumPy. 157 00:06:48,570 --> 00:06:50,480 It would require multiple lines 158 00:06:50,480 --> 00:06:51,890 to perform the same task 159 00:06:51,890 --> 00:06:53,610 with Python lists, 160 00:06:53,610 --> 00:06:55,850 as shown on the right side of the screen. 161 00:06:55,850 --> 00:06:59,950 In addition, the operation would also be much slower. 162 00:06:59,950 --> 00:07:01,820 The Hadamard product is 163 00:07:01,820 --> 00:07:05,360 another widely used operation in data science. 
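As a rough sketch of the scalar multiplication described above: the components of `y` are assumed for illustration, since the transcript only gives the scalar value two. The names `z_list` and `repeated` are also assumptions.

```python
import numpy as np

y = [1, 2]   # assumed components

# List version: scale each element in a loop.
z_list = []
for n in y:
    z_list.append(2 * n)

# NumPy version: the scalar multiplies every component.
y_arr = np.array(y)
z = 2 * y_arr

# For comparison, multiplying a plain list by 2 repeats it instead.
repeated = 2 * y

print(z_list, z, repeated)
```

This difference in what `*` means for lists and arrays is one reason to convert to a NumPy array before doing vector arithmetic.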
164 00:07:05,360 --> 00:07:07,520 Consider the following two vectors, 165 00:07:07,520 --> 00:07:10,070 u and v. The Hadamard product 166 00:07:10,070 --> 00:07:12,280 of u and v is a new vector z. 167 00:07:12,280 --> 00:07:14,780 The first component of z is the product of 168 00:07:14,780 --> 00:07:18,020 the first elements of u and v. Similarly, 169 00:07:18,020 --> 00:07:19,160 the second component is 170 00:07:19,160 --> 00:07:20,990 the product of the second elements of 171 00:07:20,990 --> 00:07:24,380 u and v. The resultant vector consists of 172 00:07:24,380 --> 00:07:27,800 the entrywise product of u and v. We can 173 00:07:27,800 --> 00:07:29,840 also compute the Hadamard product 174 00:07:29,840 --> 00:07:31,910 with one line of code in NumPy. 175 00:07:31,910 --> 00:07:34,190 It would require multiple lines to compute 176 00:07:34,190 --> 00:07:36,020 the Hadamard product of two lists 177 00:07:36,020 --> 00:07:38,390 as shown on the right side of the screen. 178 00:07:38,390 --> 00:07:40,130 The dot product is 179 00:07:40,130 --> 00:07:42,930 another widely used operation in data science. 180 00:07:42,930 --> 00:07:45,370 Consider the vectors u and v. 181 00:07:45,370 --> 00:07:48,260 The dot product is a single number given by 182 00:07:48,260 --> 00:07:49,730 the following expression and 183 00:07:49,730 --> 00:07:52,200 represents how similar the two vectors are. 184 00:07:52,200 --> 00:07:55,460 We multiply the first components of v and u, 185 00:07:55,460 --> 00:07:57,530 we then multiply the second components 186 00:07:57,530 --> 00:07:59,300 and add the results together. 187 00:07:59,300 --> 00:08:01,370 The result is a number that represents 188 00:08:01,370 --> 00:08:03,860 how similar the two vectors are. 189 00:08:03,860 --> 00:08:07,490 We can also compute the dot product using the NumPy function 190 00:08:07,490 --> 00:08:11,280 dot and assign it to the variable result as follows. 
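A minimal sketch of both products, assuming illustrative components for `u` and `v` (the transcript does not state them). `np.dot` is the NumPy function named above; `z_list` and the loop are one plausible form of the list version.

```python
import numpy as np

u = np.array([1, 2])   # assumed components
v = np.array([3, 2])   # assumed components

# Hadamard (entrywise) product: one line with NumPy arrays.
z = u * v

# List version of the same product, for comparison.
z_list = []
for n, m in zip(u.tolist(), v.tolist()):
    z_list.append(n * m)

# Dot product: multiply matching components, then add the results.
result = np.dot(u, v)

print(z, z_list, result)
```

With these values the dot product is 1*3 + 2*2, which equals the sum of the entries of the Hadamard product.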
191 00:08:11,280 --> 00:08:13,320 Consider the array u; 192 00:08:13,320 --> 00:08:15,900 the array contains the following elements. 193 00:08:15,900 --> 00:08:18,480 If we add a scalar value to the array, 194 00:08:18,480 --> 00:08:21,390 NumPy will add that value to each element. 195 00:08:21,390 --> 00:08:24,360 This property is known as broadcasting. 196 00:08:24,360 --> 00:08:26,850 A universal function is a function that 197 00:08:26,850 --> 00:08:29,820 operates on ndarrays. 198 00:08:29,820 --> 00:08:33,570 We can apply a universal function to a NumPy array. 199 00:08:33,570 --> 00:08:35,640 Consider the array a; 200 00:08:35,640 --> 00:08:38,180 we can calculate the mean or average value of 201 00:08:38,180 --> 00:08:41,840 all the elements in a using the method mean. 202 00:08:41,840 --> 00:08:45,000 This corresponds to the average of all the elements. 203 00:08:45,000 --> 00:08:47,600 In this case the result is zero. 204 00:08:47,600 --> 00:08:49,490 There are many other functions. 205 00:08:49,490 --> 00:08:53,070 For example, consider the NumPy array b. 206 00:08:53,070 --> 00:08:56,460 We can find the maximum value using the method max. 207 00:08:56,460 --> 00:08:59,000 We see the largest value is five, 208 00:08:59,000 --> 00:09:02,400 therefore the method max returns five. 209 00:09:02,400 --> 00:09:05,100 We can use NumPy to create functions that 210 00:09:05,100 --> 00:09:08,100 map NumPy arrays to new NumPy arrays. 211 00:09:08,100 --> 00:09:11,310 Let's implement some code on the left side of the screen 212 00:09:11,310 --> 00:09:12,740 and use the right side of 213 00:09:12,740 --> 00:09:15,030 the screen to demonstrate what's going on. 214 00:09:15,030 --> 00:09:19,170 We can access the value of pi in NumPy as follows. 215 00:09:19,170 --> 00:09:22,500 We can create the following NumPy array in radians. 216 00:09:22,500 --> 00:09:25,830 This array corresponds to the following vector. 
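The steps described above can be sketched as follows. The element values are assumptions chosen to match what the transcript states (a mean of zero for `a`, a maximum of five for `b`); the contents of `u` and the angles in `x` are illustrative only.

```python
import numpy as np

# Adding a scalar: NumPy adds the value to every element (broadcasting).
u = np.array([1, 2, 3, -1])       # assumed elements
u_plus_one = u + 1

# Mean of all elements; these assumed values average to zero.
a = np.array([1, -1, 1, -1])
mean_a = a.mean()

# Maximum value; these assumed values have five as the largest.
b = np.array([1, -2, 3, 4, 5])
max_b = b.max()

# The constant pi, and an array of angles in radians (assumed angles).
x = np.array([0, np.pi / 2, np.pi])

print(u_plus_one, mean_a, max_b, x)
```

`mean` and `max` are called here as array methods, matching the wording above; `np.mean(a)` and `np.max(b)` give the same results.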
217 00:09:25,830 --> 00:09:28,670 We can apply the function sin to the array 218 00:09:28,670 --> 00:09:32,220 x and assign the values to the array y. 219 00:09:32,220 --> 00:09:34,010 This applies the sin function 220 00:09:34,010 --> 00:09:35,900 to each element in the array; 221 00:09:35,900 --> 00:09:37,820 this corresponds to applying 222 00:09:37,820 --> 00:09:40,800 the sine function to each component of the vector. 223 00:09:40,800 --> 00:09:43,730 The result is a new array y, 224 00:09:43,730 --> 00:09:45,260 where each value corresponds to 225 00:09:45,260 --> 00:09:47,000 a sine function being applied to 226 00:09:47,000 --> 00:09:49,430 each element in the array x. 227 00:09:49,430 --> 00:09:51,590 A useful function for plotting 228 00:09:51,590 --> 00:09:54,420 mathematical functions is linspace. 229 00:09:54,420 --> 00:09:56,390 linspace returns evenly 230 00:09:56,390 --> 00:09:59,100 spaced numbers over a specified interval. 231 00:09:59,100 --> 00:10:02,230 We specify the starting point of the sequence and 232 00:10:02,230 --> 00:10:04,240 the ending point of the sequence. 233 00:10:04,240 --> 00:10:06,650 The parameter num indicates 234 00:10:06,650 --> 00:10:08,580 the number of samples to generate, 235 00:10:08,580 --> 00:10:10,340 in this case five. 236 00:10:10,340 --> 00:10:12,980 The space between samples is one. 237 00:10:12,980 --> 00:10:15,670 If we change the parameter num to nine, 238 00:10:15,670 --> 00:10:18,450 we get nine evenly spaced numbers 239 00:10:18,450 --> 00:10:21,890 over the interval from negative two to two. 240 00:10:21,890 --> 00:10:23,570 As a result, the difference 241 00:10:23,570 --> 00:10:25,170 between consecutive samples is 242 00:10:25,170 --> 00:10:29,550 0.5, as opposed to one as before. 243 00:10:29,550 --> 00:10:33,450 We can use the function linspace to generate 100 244 00:10:33,450 --> 00:10:37,920 evenly spaced samples from the interval zero to two pi. 
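A minimal sketch of `sin` and `linspace` as described above. The angles in `x` are assumed for illustration; the `linspace` arguments follow the transcript (negative two to two with five and nine samples, and 100 samples from zero to two pi). The names `five`, `nine` and `x100` are hypothetical.

```python
import numpy as np

# Apply sin to each element of an array of angles in radians.
x = np.array([0, np.pi / 2, np.pi])   # assumed angles
y = np.sin(x)   # close to [0, 1, 0], up to floating-point error

# Evenly spaced samples over the interval from -2 to 2 (endpoints included).
five = np.linspace(-2, 2, num=5)   # spacing of 1 between samples
nine = np.linspace(-2, 2, num=9)   # spacing of 0.5 between samples

# 100 evenly spaced samples from 0 to 2*pi.
x100 = np.linspace(0, 2 * np.pi, num=100)

print(y)
print(five)
print(nine)
print(x100.size)
```

Because both endpoints are included, `num` samples produce `num - 1` gaps, which is why nine samples over a length of four give a spacing of 0.5.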
245 00:10:37,920 --> 00:10:40,520 We can use the NumPy function sin to 246 00:10:40,520 --> 00:10:43,800 map the array x to a new array y. 247 00:10:43,800 --> 00:10:46,510 We can import the library pyplot as 248 00:10:46,510 --> 00:10:49,860 plt to help us plot the function. 249 00:10:49,860 --> 00:10:52,340 As we are using a Jupyter notebook, 250 00:10:52,340 --> 00:10:56,980 we use the command %matplotlib inline to display the plot. 251 00:10:56,980 --> 00:10:59,220 The following command plots a graph. 252 00:10:59,220 --> 00:11:01,160 The first input corresponds to 253 00:11:01,160 --> 00:11:04,320 the values for the horizontal or x-axis. 254 00:11:04,320 --> 00:11:06,920 The second input corresponds to 255 00:11:06,920 --> 00:11:09,600 the values for the vertical or y-axis. 256 00:11:09,600 --> 00:11:12,320 There's a lot more you can do with NumPy. 257 00:11:12,320 --> 00:11:15,780 Check out the labs and numpy.org for more. 258 00:11:15,780 --> 00:11:18,950 Thanks for watching this video. 259 00:11:18,950 --> 00:11:23,000 (Music)
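For reference, the plotting steps described in this last segment can be sketched as a plain script. Two assumptions are made so that it runs outside a notebook: the non-interactive "Agg" backend is selected, and the figure is saved to a hypothetical file name `sine.png`. In a Jupyter notebook, the `%matplotlib inline` command would be used instead of those two steps.

```python
import numpy as np
import matplotlib

# Assumption: select a non-interactive backend so the script runs anywhere.
matplotlib.use("Agg")
import matplotlib.pyplot as plt

# 100 evenly spaced samples from 0 to 2*pi, mapped through sin.
x = np.linspace(0, 2 * np.pi, num=100)
y = np.sin(x)

# First input: horizontal (x-axis) values; second input: vertical (y-axis).
line, = plt.plot(x, y)

# Hypothetical output file, used here in place of inline display.
plt.savefig("sine.png")
```

`plt.plot` returns a list of line objects, so the trailing comma in `line, =` unpacks the single line that was drawn.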