1
00:00:00,000 --> 00:00:04,060
In this video we will be covering numpy in 1D,

2
00:00:04,060 --> 00:00:06,790
in particular ndarrays.

3
00:00:06,790 --> 00:00:10,270
Numpy is a library for scientific computing.

4
00:00:10,270 --> 00:00:12,520
It has many useful functions.

5
00:00:12,520 --> 00:00:15,790
There are many other advantages like speed and memory.

6
00:00:15,790 --> 00:00:18,730
Numpy is also the basis for pandas.

7
00:00:18,730 --> 00:00:21,070
So check out our pandas video.

8
00:00:21,070 --> 00:00:23,350
In this video we will be covering

9
00:00:23,350 --> 00:00:25,830
the basics and array creation,

10
00:00:25,830 --> 00:00:27,540
indexing and slicing,

11
00:00:27,540 --> 00:00:31,060
basic operations, universal functions.

12
00:00:31,060 --> 00:00:34,560
Let's go over how to create a numpy array.

13
00:00:34,560 --> 00:00:36,990
A Python list is a container that

14
00:00:36,990 --> 00:00:39,510
allows you to store and access data.

15
00:00:39,510 --> 00:00:42,670
Each element is associated with an index.

16
00:00:42,670 --> 00:00:44,750
We can access each element using

17
00:00:44,750 --> 00:00:47,210
a square bracket as follows.

18
00:00:47,210 --> 00:00:52,020
A numpy array, or ndarray, is similar to a list.

19
00:00:52,020 --> 00:00:54,180
It's usually fixed in size

20
00:00:54,180 --> 00:00:56,510
and each element is of the same type,

21
00:00:56,510 --> 00:00:58,290
in this case integers.

22
00:00:58,290 --> 00:00:59,720
We can cast a list to

23
00:00:59,720 --> 00:01:02,460
a numpy array by first importing numpy.

24
00:01:02,460 --> 00:01:05,230
We then cast the list as follows;

25
00:01:05,230 --> 00:01:08,010
we can access the data via an index.

26
00:01:08,010 --> 00:01:10,490
As with the list, we can access

27
00:01:10,490 --> 00:01:14,070
each element with an integer and a square bracket.

28
00:01:14,070 --> 00:01:17,240
The value of a is stored as follows.

29
00:01:17,240 --> 00:01:23,220
If we check the type of the array, we get numpy.ndarray.

30
00:01:23,220 --> 00:01:26,970
As numpy arrays contain data of the same type,

31
00:01:26,970 --> 00:01:28,750
we can use the attribute

32
00:01:28,750 --> 00:01:32,720
dtype to obtain the data type of the array's elements.

33
00:01:32,720 --> 00:01:35,420
In this case a 64-bit integer.
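The steps described so far can be sketched as follows. The list values are an assumption for illustration, since the transcript does not state them, and the exact integer dtype depends on the platform.

```python
import numpy as np

# Assumed example values; any list of integers behaves the same way.
a = np.array([0, 1, 2, 3, 4])

first = a[0]            # index with an integer in square brackets, as with a list
kind = type(a)          # numpy.ndarray
element_type = a.dtype  # an integer dtype; int64 on most 64-bit platforms

print(first, kind, element_type)
```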

34
00:01:35,420 --> 00:01:38,150
Let's review some basic array attributes

35
00:01:38,150 --> 00:01:39,950
using the array a.

36
00:01:39,950 --> 00:01:41,480
The attribute size is

37
00:01:41,480 --> 00:01:43,560
the number of elements in the array.

38
00:01:43,560 --> 00:01:46,850
As there are five elements the result is five.

39
00:01:46,850 --> 00:01:48,890
The next two attributes will make

40
00:01:48,890 --> 00:01:51,890
more sense when we get to higher dimensions,

41
00:01:51,890 --> 00:01:53,440
but let's review them.

42
00:01:53,440 --> 00:01:56,360
The attribute ndim represents

43
00:01:56,360 --> 00:01:59,720
the number of array dimensions or the rank of the array,

44
00:01:59,720 --> 00:02:01,280
in this case one.

45
00:02:01,280 --> 00:02:04,130
The attribute shape is a tuple of

46
00:02:04,130 --> 00:02:05,720
integers indicating the size of

47
00:02:05,720 --> 00:02:07,730
the array in each dimension.
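A short sketch of the three attributes, assuming a five-element array like the one described (the values themselves are illustrative):

```python
import numpy as np

a = np.array([0, 1, 2, 3, 4])  # assumed values, five elements

print(a.size)   # number of elements: 5
print(a.ndim)   # number of dimensions: 1
print(a.shape)  # size in each dimension: (5,)
```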

48
00:02:07,730 --> 00:02:11,090
We can create a numpy array with real numbers.

49
00:02:11,090 --> 00:02:12,960
When we check the type of the array,

50
00:02:12,960 --> 00:02:15,920
we get numpy.ndarray.

51
00:02:15,920 --> 00:02:18,480
If we examine the attribute dtype,

52
00:02:18,480 --> 00:02:22,520
we see float64 as the elements are not integers.
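As a quick illustration, with real-number values that are assumed rather than taken from the video:

```python
import numpy as np

# Assumed example values; a list of Python floats becomes a float64 array.
b = np.array([3.1, 11.02, 6.2, 213.2, 5.2])

print(type(b))  # numpy.ndarray
print(b.dtype)  # float64
```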

53
00:02:22,520 --> 00:02:26,960
There are many other attributes; check out numpy.org.

54
00:02:26,960 --> 00:02:30,620
Let's review some indexing and slicing methods.

55
00:02:30,620 --> 00:02:32,810
We can change the first element of

56
00:02:32,810 --> 00:02:35,220
the array to 100 as follows.

57
00:02:35,220 --> 00:02:37,940
The array's first value is now 100.

58
00:02:37,940 --> 00:02:41,340
We can change the fifth element of the array as follows.

59
00:02:41,340 --> 00:02:44,090
The fifth element is now zero.
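A minimal sketch of both assignments. The starting values of c are an assumption; only the new values 100 and 0 come from the narration.

```python
import numpy as np

c = np.array([20, 1, 2, 3, 4])  # assumed starting values

c[0] = 100  # first element has index 0
c[4] = 0    # fifth element has index 4

print(c)
```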

60
00:02:44,090 --> 00:02:48,000
Like lists and tuples we can slice a NumPy array.

61
00:02:48,000 --> 00:02:49,880
The elements of the array correspond

62
00:02:49,880 --> 00:02:51,470
to the following index.

63
00:02:51,470 --> 00:02:54,740
We can select the elements from one to three and assign

64
00:02:54,740 --> 00:02:58,160
them to a new numpy array d as follows.

65
00:02:58,160 --> 00:03:01,460
The elements in d correspond to the index.

66
00:03:01,460 --> 00:03:03,350
Like lists, we do not count

67
00:03:03,350 --> 00:03:06,140
the element corresponding to the last index.

68
00:03:06,140 --> 00:03:08,920
We can assign the corresponding indices

69
00:03:08,920 --> 00:03:11,130
to new values as follows.

70
00:03:11,130 --> 00:03:14,040
The array c now has new values.
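The slicing steps might look like the sketch below. The array values and the assigned values 300 and 400 are assumptions for illustration.

```python
import numpy as np

c = np.array([100, 1, 2, 3, 0])  # assumed values

# Select indices 1, 2 and 3; the stop index 4 is not included.
d = c[1:4]
d_values = d.tolist()  # snapshot of d at this point

# Assign new values to indices 3 and 4 of c.
c[3:5] = 300, 400

# Note: basic slicing returns a view, so d shares memory with c
# and reflects the change made to index 3.
print(d_values, c, d)
```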

71
00:03:14,040 --> 00:03:16,520
See the labs or numpy.org for

72
00:03:16,520 --> 00:03:19,340
more examples of what you can do with numpy.

73
00:03:19,340 --> 00:03:21,410
Numpy makes it easier to do

74
00:03:21,410 --> 00:03:22,610
many operations that are

75
00:03:22,610 --> 00:03:24,900
commonly performed in data science.

76
00:03:24,900 --> 00:03:26,630
The same operations are

77
00:03:26,630 --> 00:03:28,580
usually computationally faster and

78
00:03:28,580 --> 00:03:32,570
require less memory in numpy compared to regular Python.

79
00:03:32,570 --> 00:03:34,830
Let's review some of these operations

80
00:03:34,830 --> 00:03:37,050
on one-dimensional arrays.

81
00:03:37,050 --> 00:03:39,710
We will look at many of the operations in the context of

82
00:03:39,710 --> 00:03:43,250
Euclidean vectors to make things more interesting.

83
00:03:43,250 --> 00:03:46,070
Vector addition is a widely used operation

84
00:03:46,070 --> 00:03:47,360
in data science.

85
00:03:47,360 --> 00:03:50,000
Consider the vector u with two elements,

86
00:03:50,000 --> 00:03:53,700
the elements are distinguished by the different colors.

87
00:03:53,700 --> 00:03:57,530
Similarly, consider the vector v with two components.

88
00:03:57,530 --> 00:03:59,540
In vector addition, we create

89
00:03:59,540 --> 00:04:01,900
a new vector in this case z.

90
00:04:01,900 --> 00:04:03,790
The first component of z

91
00:04:03,790 --> 00:04:05,430
is the addition of the first component

92
00:04:05,430 --> 00:04:08,730
of vectors u and v. Similarly,

93
00:04:08,730 --> 00:04:10,040
the second component is

94
00:04:10,040 --> 00:04:11,720
the sum of the second components of

95
00:04:11,720 --> 00:04:15,110
u and v. This new vector z is now

96
00:04:15,110 --> 00:04:17,780
a linear combination of the vector u and

97
00:04:17,780 --> 00:04:20,510
v. Representing vector addition

98
00:04:20,510 --> 00:04:23,010
with line segments or arrows is helpful.

99
00:04:23,010 --> 00:04:25,220
The first vector is represented in red.

100
00:04:25,220 --> 00:04:26,540
The vector will point in

101
00:04:26,540 --> 00:04:29,000
the direction of the two components.

102
00:04:29,000 --> 00:04:31,340
The first component of the vector is one.

103
00:04:31,340 --> 00:04:33,870
As a result the arrow is offset

104
00:04:33,870 --> 00:04:36,860
one unit from the origin in the horizontal direction.

105
00:04:36,860 --> 00:04:39,090
The second component is zero,

106
00:04:39,090 --> 00:04:42,150
we represent this component in the vertical direction.

107
00:04:42,150 --> 00:04:44,450
As this component is zero,

108
00:04:44,450 --> 00:04:47,700
the vector does not point in the vertical direction.

109
00:04:47,700 --> 00:04:50,480
We represent the second vector in blue.

110
00:04:50,480 --> 00:04:52,470
The first component is zero,

111
00:04:52,470 --> 00:04:54,050
therefore the arrow does not

112
00:04:54,050 --> 00:04:56,180
point to the horizontal direction.

113
00:04:56,180 --> 00:04:58,730
The second component is one.

114
00:04:58,730 --> 00:05:00,740
As a result the vector points in

115
00:05:00,740 --> 00:05:02,790
the vertical direction one unit.

116
00:05:02,790 --> 00:05:05,480
When we add the vector u and v,

117
00:05:05,480 --> 00:05:07,430
we get the new vector z.

118
00:05:07,430 --> 00:05:09,570
We add the first component,

119
00:05:09,570 --> 00:05:12,360
this corresponds to the horizontal direction.

120
00:05:12,360 --> 00:05:14,670
We also add the second component.

121
00:05:14,670 --> 00:05:16,640
It's helpful to use the tip to

122
00:05:16,640 --> 00:05:18,900
tail method when adding vectors,

123
00:05:18,900 --> 00:05:22,800
placing the tail of the vector v on the tip of vector u.

124
00:05:22,800 --> 00:05:25,640
The new vector z is constructed by connecting

125
00:05:25,640 --> 00:05:28,640
the base of the first vector u with the tip of the

126
00:05:28,640 --> 00:05:31,890
second v. The following three lines of code

127
00:05:31,890 --> 00:05:33,500
will add the two lists and place

128
00:05:33,500 --> 00:05:35,780
the result in the list z.

129
00:05:35,780 --> 00:05:37,460
We can also perform

130
00:05:37,460 --> 00:05:40,840
vector addition with one line of NumPy code.

131
00:05:40,840 --> 00:05:43,280
It would require multiple lines to perform

132
00:05:43,280 --> 00:05:45,230
vector addition on two lists

133
00:05:45,230 --> 00:05:47,360
as shown on the right side of the screen.

134
00:05:47,360 --> 00:05:50,720
In addition, the numpy code will run much faster.

135
00:05:50,720 --> 00:05:53,580
This is important if you have lots of data.
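The comparison can be sketched as follows, using the vectors u = (1, 0) and v = (0, 1) from the arrow example. The variable names z_list and z are chosen here for illustration.

```python
import numpy as np

u = [1, 0]
v = [0, 1]

# With plain lists we loop over the pairs of components.
z_list = []
for n, m in zip(u, v):
    z_list.append(n + m)

# With NumPy the addition is a single expression.
z = np.array(u) + np.array(v)

print(z_list, z)
```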

136
00:05:53,580 --> 00:05:56,540
We can also perform vector subtraction by changing

137
00:05:56,540 --> 00:05:59,240
the addition sign to a subtraction sign.

138
00:05:59,240 --> 00:06:00,770
It would require multiple lines

139
00:06:00,770 --> 00:06:02,130
to perform vector subtraction

140
00:06:02,130 --> 00:06:05,640
on two lists as shown on the right side of the screen.
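A sketch of subtraction, reusing the same assumed vectors u = (1, 0) and v = (0, 1):

```python
import numpy as np

u = np.array([1, 0])
v = np.array([0, 1])

# NumPy: componentwise subtraction in one line.
z = u - v

# Plain lists: subtract each pair of components explicitly.
z_list = [n - m for n, m in zip([1, 0], [0, 1])]

print(z, z_list)
```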

141
00:06:05,640 --> 00:06:08,150
Vector multiplication with a scalar is

142
00:06:08,150 --> 00:06:10,470
another commonly performed operation.

143
00:06:10,470 --> 00:06:12,130
Consider the vector y,

144
00:06:12,130 --> 00:06:14,940
each component is specified by a different color.

145
00:06:14,940 --> 00:06:16,940
We simply multiply the vector by

146
00:06:16,940 --> 00:06:19,570
a scalar value in this case two.

147
00:06:19,570 --> 00:06:22,680
Each component of the vector is multiplied by two,

148
00:06:22,680 --> 00:06:25,550
in this case each component is doubled.

149
00:06:25,550 --> 00:06:27,620
We can use the line segment or

150
00:06:27,620 --> 00:06:29,960
arrows to visualize what's going on.

151
00:06:29,960 --> 00:06:32,610
The original vector y is in purple.

152
00:06:32,610 --> 00:06:35,900
After multiplying it by a scalar value of two,

153
00:06:35,900 --> 00:06:39,590
the vector is stretched out by a factor of two as shown in red.

154
00:06:39,590 --> 00:06:42,870
The new vector is twice as long in each direction.

155
00:06:42,870 --> 00:06:45,500
Vector multiplication with a scalar only

156
00:06:45,500 --> 00:06:48,570
requires one line of code using numpy.

157
00:06:48,570 --> 00:06:50,480
It would require multiple lines

158
00:06:50,480 --> 00:06:51,890
to perform the same task

159
00:06:51,890 --> 00:06:53,610
with Python lists,

160
00:06:53,610 --> 00:06:55,850
as shown on the right side of the screen.

161
00:06:55,850 --> 00:06:59,950
In addition, the operation would also be much slower.
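A minimal sketch of scalar multiplication. The components of y are an assumption, since the transcript does not give them.

```python
import numpy as np

y = np.array([1, 2])  # assumed components

# NumPy: every component is multiplied by the scalar.
z = 2 * y

# Plain list equivalent.
z_list = [2 * n for n in [1, 2]]

print(z, z_list)
```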

162
00:06:59,950 --> 00:07:01,820
The Hadamard product is

163
00:07:01,820 --> 00:07:05,360
another widely used operation in data science.

164
00:07:05,360 --> 00:07:07,520
Consider the following two vectors,

165
00:07:07,520 --> 00:07:10,070
u and v. The Hadamard product

166
00:07:10,070 --> 00:07:12,280
of u and v is a new vector z.

167
00:07:12,280 --> 00:07:14,780
The first component of z is the product of

168
00:07:14,780 --> 00:07:18,020
the first element of u and v. Similarly,

169
00:07:18,020 --> 00:07:19,160
the second component is

170
00:07:19,160 --> 00:07:20,990
the product of the second element of

171
00:07:20,990 --> 00:07:24,380
u and v. The resultant vector consists of

172
00:07:24,380 --> 00:07:27,800
the entrywise product of u and v. We can

173
00:07:27,800 --> 00:07:29,840
also perform the Hadamard product

174
00:07:29,840 --> 00:07:31,910
with one line of code in numpy.

175
00:07:31,910 --> 00:07:34,190
It would require multiple lines to perform

176
00:07:34,190 --> 00:07:36,020
the Hadamard product on two lists

177
00:07:36,020 --> 00:07:38,390
as shown on the right side of the screen.
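As a sketch, with assumed values for u and v, the entrywise product uses the `*` operator on two arrays of the same shape:

```python
import numpy as np

u = np.array([1, 2])  # assumed values
v = np.array([3, 2])  # assumed values

# NumPy: * multiplies matching components.
z = u * v

# Plain list equivalent.
z_list = [n * m for n, m in zip([1, 2], [3, 2])]

print(z, z_list)
```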

178
00:07:38,390 --> 00:07:40,130
The dot product is

179
00:07:40,130 --> 00:07:42,930
another widely used operation in data science.

180
00:07:42,930 --> 00:07:45,370
Consider the vector u and v,

181
00:07:45,370 --> 00:07:48,260
the dot product is a single number given by

182
00:07:48,260 --> 00:07:49,730
the following term and

183
00:07:49,730 --> 00:07:52,200
represents how similar two vectors are.

184
00:07:52,200 --> 00:07:55,460
We multiply the first component from v and u,

185
00:07:55,460 --> 00:07:57,530
we then multiply the second component

186
00:07:57,530 --> 00:07:59,300
and add the result together.

187
00:07:59,300 --> 00:08:01,370
The result is a number that represents

188
00:08:01,370 --> 00:08:03,860
how similar the two vectors are.

189
00:08:03,860 --> 00:08:07,490
We can also compute the dot product using the numpy function

190
00:08:07,490 --> 00:08:11,280
dot and assign it to the variable result as follows.
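A short sketch using `np.dot`, with assumed values for u and v:

```python
import numpy as np

u = np.array([1, 2])  # assumed values
v = np.array([3, 2])  # assumed values

# 1*3 + 2*2: multiply matching components, then add the results.
result = np.dot(u, v)

print(result)
```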

191
00:08:11,280 --> 00:08:13,320
Consider the array u,

192
00:08:13,320 --> 00:08:15,900
the array contains the following elements.

193
00:08:15,900 --> 00:08:18,480
If we add a scalar value to the array,

194
00:08:18,480 --> 00:08:21,390
numpy will add that value to each element.

195
00:08:21,390 --> 00:08:24,360
This property is known as broadcasting.
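A minimal example of adding a scalar to an array. The elements of u and the scalar 1 are assumptions for illustration.

```python
import numpy as np

u = np.array([1, 2, 3, -1])  # assumed elements

# The scalar is added to every element of u.
z = u + 1

print(z)
```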

196
00:08:24,360 --> 00:08:26,850
A universal function is a function that

197
00:08:26,850 --> 00:08:29,820
operates on ndarrays.

198
00:08:29,820 --> 00:08:33,570
We can apply a universal function to a numpy array.

199
00:08:33,570 --> 00:08:35,640
Consider the array a,

200
00:08:35,640 --> 00:08:38,180
we can calculate the mean or average value of

201
00:08:38,180 --> 00:08:41,840
all the elements in a using the method mean.

202
00:08:41,840 --> 00:08:45,000
This corresponds to the average of all the elements.

203
00:08:45,000 --> 00:08:47,600
In this case the result is zero.

204
00:08:47,600 --> 00:08:49,490
There are many other functions.

205
00:08:49,490 --> 00:08:53,070
For example, consider the numpy array b.

206
00:08:53,070 --> 00:08:56,460
We can find the maximum value using the method max.

207
00:08:56,460 --> 00:08:59,000
We see the largest value is five,

208
00:08:59,000 --> 00:09:02,400
therefore the method max returns a five.
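A sketch of both methods. The array contents are assumed, chosen so that the mean is zero and the maximum is five, matching the results mentioned in the narration.

```python
import numpy as np

a = np.array([1, -1, 1, -1])   # assumed values with an average of zero
b = np.array([1, 2, 3, 4, 5])  # assumed values with a largest element of five

mean_a = a.mean()  # average of all the elements
max_b = b.max()    # largest element

print(mean_a, max_b)
```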

209
00:09:02,400 --> 00:09:05,100
We can use numpy to create functions that

210
00:09:05,100 --> 00:09:08,100
map numpy arrays to new numpy arrays.

211
00:09:08,100 --> 00:09:11,310
Let's implement some code on the left side of the screen

212
00:09:11,310 --> 00:09:12,740
and use the right side of

213
00:09:12,740 --> 00:09:15,030
the screen to demonstrate what's going on.

214
00:09:15,030 --> 00:09:19,170
We can access the value of pi in numpy as follows.

215
00:09:19,170 --> 00:09:22,500
We can create the following numpy array in radians.

216
00:09:22,500 --> 00:09:25,830
This array corresponds to the following vector.

217
00:09:25,830 --> 00:09:28,670
We can apply the function sin to the array

218
00:09:28,670 --> 00:09:32,220
x and assign the values to the array y.

219
00:09:32,220 --> 00:09:34,010
This applies the sin function

220
00:09:34,010 --> 00:09:35,900
to each element in the array,

221
00:09:35,900 --> 00:09:37,820
this corresponds to applying

222
00:09:37,820 --> 00:09:40,800
the sine function to each component of the vector.

223
00:09:40,800 --> 00:09:43,730
The result is a new array y,

224
00:09:43,730 --> 00:09:45,260
where each value corresponds to

225
00:09:45,260 --> 00:09:47,000
a sine function being applied to

226
00:09:47,000 --> 00:09:49,430
each element in the array x.
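A small sketch of these steps. The angles in x are an assumption; the results are floating-point approximations, so the sine of pi comes out as a tiny number rather than exactly zero.

```python
import numpy as np

# Assumed angles in radians: 0, pi/2 and pi.
x = np.array([0, np.pi / 2, np.pi])

# np.sin is applied to each element of x.
y = np.sin(x)

print(y)
```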

227
00:09:49,430 --> 00:09:51,590
A useful function for plotting

228
00:09:51,590 --> 00:09:54,420
mathematical functions is linspace.

229
00:09:54,420 --> 00:09:56,390
Linspace returns evenly

230
00:09:56,390 --> 00:09:59,100
spaced numbers over a specified interval.

231
00:09:59,100 --> 00:10:02,230
We specify the starting point of the sequence,

232
00:10:02,230 --> 00:10:04,240
and the ending point of the sequence.

233
00:10:04,240 --> 00:10:06,650
The parameter num indicates

234
00:10:06,650 --> 00:10:08,580
the number of samples to generate,

235
00:10:08,580 --> 00:10:10,340
in this case five.

236
00:10:10,340 --> 00:10:12,980
The space between samples is one.

237
00:10:12,980 --> 00:10:15,670
If we change the parameter num to nine,

238
00:10:15,670 --> 00:10:18,450
we get nine evenly spaced numbers

239
00:10:18,450 --> 00:10:21,890
over the interval from negative two to two.

240
00:10:21,890 --> 00:10:23,570
The result is the difference

241
00:10:23,570 --> 00:10:25,170
between subsequent samples is

242
00:10:25,170 --> 00:10:29,550
0.5 as opposed to one as before.
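The two calls can be sketched as follows, using the interval from negative two to two mentioned in the narration. The variable names are chosen here for illustration.

```python
import numpy as np

# Five samples from -2 to 2 (both endpoints included): spacing of 1.
five = np.linspace(-2, 2, num=5)

# Nine samples over the same interval: spacing of 0.5.
nine = np.linspace(-2, 2, num=9)

print(five)
print(nine)
```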

243
00:10:29,550 --> 00:10:33,450
We can use the function linspace to generate 100

244
00:10:33,450 --> 00:10:37,920
evenly spaced samples from the interval zero to two pi.

245
00:10:37,920 --> 00:10:40,520
We can use the numpy function sin to

246
00:10:40,520 --> 00:10:43,800
map the array x to a new array y.

247
00:10:43,800 --> 00:10:46,510
We can import the library pyplot as

248
00:10:46,510 --> 00:10:49,860
plt to help us plot the function.

249
00:10:49,860 --> 00:10:52,340
As we are using a Jupyter notebook,

250
00:10:52,340 --> 00:10:56,980
we use the command %matplotlib inline to display the plot.

251
00:10:56,980 --> 00:10:59,220
The following command plots a graph.

252
00:10:59,220 --> 00:11:01,160
The first input corresponds to

253
00:11:01,160 --> 00:11:04,320
the values for the horizontal or x-axis.

254
00:11:04,320 --> 00:11:06,920
The second input corresponds to

255
00:11:06,920 --> 00:11:09,600
the values for the vertical or y-axis.
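Putting the plotting steps together, a sketch might look like this. It assumes matplotlib is installed; in a Jupyter notebook the `%matplotlib inline` command would be run in a cell first, which is left out here because it is not plain Python.

```python
import numpy as np
import matplotlib.pyplot as plt

# 100 evenly spaced samples from 0 to 2*pi.
x = np.linspace(0, 2 * np.pi, num=100)

# Map x to a new array y with the sine function.
y = np.sin(x)

# First input: horizontal (x) axis values; second input: vertical (y) axis values.
lines = plt.plot(x, y)
```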

256
00:11:09,600 --> 00:11:12,320
There's a lot more you can do with numpy.

257
00:11:12,320 --> 00:11:15,780
Check out the labs and numpy.org for more.

258
00:11:15,780 --> 00:11:18,950
Thanks for watching this video.

259
00:11:18,950 --> 00:11:23,000
(Music)