1
00:00:00,000 --> 00:00:04,850
Dependencies or libraries are pre-written code to help solve problems.

2
00:00:04,850 --> 00:00:07,770
In this video, we will introduce Pandas,

3
00:00:07,770 --> 00:00:10,470
a popular library for data analysis.

4
00:00:10,470 --> 00:00:16,030
We can import the library or a dependency like Pandas using the following command.

5
00:00:16,030 --> 00:00:20,210
We start with the import command followed by the name of the library.

6
00:00:20,210 --> 00:00:24,510
We now have access to a large number of pre-built classes and functions.
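The import step just described could be sketched as follows. This is a minimal illustration that assumes pandas is already installed.

```python
# Import the library by name; this assumes pandas is already installed.
import pandas

# The library's classes and functions are now reachable through the name "pandas".
print(pandas.__name__)
```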

7
00:00:24,510 --> 00:00:27,100
This assumes the library is installed.

8
00:00:27,100 --> 00:00:28,430
In our lab environment,

9
00:00:28,430 --> 00:00:31,220
all the necessary libraries are installed.

10
00:00:31,220 --> 00:00:37,390
Let's say we would like to load a CSV file using the Pandas built-in function, read_csv.

11
00:00:37,390 --> 00:00:42,100
A CSV is a typical file type used to store data.

12
00:00:42,100 --> 00:00:44,530
We simply type the word Pandas,

13
00:00:44,530 --> 00:00:47,830
then a dot, and the name of the function with all the inputs.
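A rough sketch of that call is shown here. The CSV content and the column names are made up for illustration, and the text is held in memory so the example runs without a file on disk.

```python
import io
import pandas

# Hypothetical CSV content (not from the video), kept in memory for the sketch.
csv_text = "artist,album,released\nA,First,1990\nB,Second,2001\n"

# Library name, then a dot, then the function name with its inputs.
df = pandas.read_csv(io.StringIO(csv_text))
print(df.shape)
```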

14
00:00:47,830 --> 00:00:50,800
Typing Pandas all the time may get tedious.

15
00:00:50,800 --> 00:00:54,450
We can use the as keyword to shorten the name of the library.

16
00:00:54,450 --> 00:00:58,480
In this case, we use the standard abbreviation, pd.

17
00:00:58,480 --> 00:01:01,120
Now we type pd, and a dot,

18
00:01:01,120 --> 00:01:03,860
followed by the name of the function we would like to use.

19
00:01:03,860 --> 00:01:06,100
In this case, read_csv.
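With the abbreviation, the same kind of call might look like this. The data is a hypothetical in-memory CSV, used only so the sketch is self-contained.

```python
import io
import pandas as pd  # "pd" is the conventional short name

# Hypothetical CSV content, assumed for illustration only.
csv_text = "name,score\nAnn,90\nBen,85\n"

# Same function as before, now reached through the shorter name.
df = pd.read_csv(io.StringIO(csv_text))
print(df)
```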

20
00:01:06,100 --> 00:01:10,670
We are not limited to the abbreviation pd.

21
00:01:10,670 --> 00:01:14,050
In this case, we use the term banana.
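As a small sketch of an arbitrary alias, the name banana works the same way as pd. The CSV content is again a made-up in-memory example.

```python
import io
import pandas as banana  # any valid name works, though pd is the usual choice

# Hypothetical CSV content, assumed for illustration only.
csv_text = "name,score\nAnn,90\nBen,85\n"

# The alias stands in for the library name in every call.
df = banana.read_csv(io.StringIO(csv_text))
print(df)
```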

22
00:01:14,050 --> 00:01:17,010
We will stick with pd for the rest of this video.

23
00:01:17,010 --> 00:01:19,580
Let's examine this code more in-depth.

24
00:01:19,580 --> 00:01:23,710
One way Pandas allows you to work with data is with the data frame.

25
00:01:23,710 --> 00:01:28,690
Let's go over the process to go from a CSV file to a data frame.

26
00:01:28,690 --> 00:01:32,350
This variable stores the path of the CSV.

27
00:01:32,350 --> 00:01:36,910
It is used as an argument to the read_csv function.

28
00:01:36,910 --> 00:01:39,670
The result is stored to the variable df.

29
00:01:39,670 --> 00:01:42,370
This is short for data frame.

30
00:01:42,370 --> 00:01:46,410
Now that we have the data in a data frame, we can work with it.

31
00:01:46,410 --> 00:01:51,610
We can use the method head to examine the first five rows of a data frame.
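The steps from path to data frame to head could be sketched like this. The file name example.csv and its contents are hypothetical; the sketch writes the file itself so it can run anywhere.

```python
import os
import tempfile
import pandas as pd

# Write a small hypothetical CSV so the sketch has a real path to read.
tmp_dir = tempfile.mkdtemp()
csv_path = os.path.join(tmp_dir, "example.csv")
with open(csv_path, "w") as f:
    f.write("id,value\n")
    for i in range(8):
        f.write(f"{i},{i * 10}\n")

# The variable holding the path is passed as the argument to read_csv.
df = pd.read_csv(csv_path)

# head() returns the first five rows by default.
first_rows = df.head()
print(first_rows)
```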

32
00:01:51,610 --> 00:01:54,810
The process for loading an Excel file is similar.

33
00:01:54,810 --> 00:01:57,070
We use the path of the Excel file.

34
00:01:57,070 --> 00:01:59,450
The function is read_excel.

35
00:01:59,450 --> 00:02:01,610
The result is a data frame.

36
00:02:01,610 --> 00:02:04,910
A data frame is comprised of rows and columns.

37
00:02:04,910 --> 00:02:07,890
We can create a data frame out of a dictionary.

38
00:02:07,890 --> 00:02:10,800
The keys correspond to the column labels.

39
00:02:10,800 --> 00:02:13,920
The values are lists corresponding to the rows.

40
00:02:13,920 --> 00:02:18,720
We then cast the dictionary to a data frame using the function DataFrame.

41
00:02:18,720 --> 00:02:21,830
We can see the direct correspondence between the dictionary and the table.

42
00:02:21,830 --> 00:02:24,470
The keys correspond to the table headers.

43
00:02:24,470 --> 00:02:27,830
The values are lists corresponding to the rows.
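A minimal sketch of building a data frame from a dictionary follows. The keys and values are hypothetical sample data, not the table shown in the video.

```python
import pandas as pd

# Hypothetical data: each key is a column label,
# and each list holds that column's entries, one per row.
songs = {
    "Album": ["First", "Second", "Third"],
    "Released": [1990, 2001, 2015],
}

# Pass the dictionary to the DataFrame constructor.
df = pd.DataFrame(songs)
print(df)
```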

44
00:02:27,830 --> 00:02:31,340
We can create a new data frame consisting of one column.

45
00:02:31,340 --> 00:02:34,580
We just put the data frame name, in this case, df,

46
00:02:34,580 --> 00:02:38,480
and the name of the column header enclosed in double brackets.

47
00:02:38,480 --> 00:02:42,720
The result is a new data frame comprised of the original column.
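Selecting one column with double brackets might look like this. The data frame and its column names are assumed for illustration.

```python
import pandas as pd

# Hypothetical data frame, assumed for illustration only.
df = pd.DataFrame({
    "Album": ["First", "Second", "Third"],
    "Released": [1990, 2001, 2015],
    "Length": [42.5, 38.0, 51.2],
})

# Double brackets return a new data frame with the single column.
# (Single brackets, df["Length"], would return a Series instead.)
x = df[["Length"]]
print(x)
```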

48
00:02:42,720 --> 00:02:45,820
You can do the same thing for multiple columns.

49
00:02:45,820 --> 00:02:48,900
We just put the data frame name, in this case, df,

50
00:02:48,900 --> 00:02:53,560
and the names of the multiple column headers enclosed in double brackets.

51
00:02:53,560 --> 00:02:58,110
The result is a new data frame comprised of the specified columns.
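The multiple-column case can be sketched the same way. The data frame and its column names are again hypothetical.

```python
import pandas as pd

# Hypothetical data frame, assumed for illustration only.
df = pd.DataFrame({
    "Album": ["First", "Second", "Third"],
    "Released": [1990, 2001, 2015],
    "Length": [42.5, 38.0, 51.2],
})

# A list of column names inside the brackets selects those columns, in that order.
y = df[["Album", "Released"]]
print(y)
```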

52
00:02:58,110 --> 00:03:02,000
(Music)