1 00:00:00,000 --> 00:00:04,850 Dependencies or libraries are pre-written code to help solve problems. 2 00:00:04,850 --> 00:00:07,770 In this video, we will introduce Pandas, 3 00:00:07,770 --> 00:00:10,470 a popular library for data analysis. 4 00:00:10,470 --> 00:00:16,030 We can import a library or dependency like Pandas using the following command. 5 00:00:16,030 --> 00:00:20,210 We start with the import keyword followed by the name of the library. 6 00:00:20,210 --> 00:00:24,510 We now have access to a large number of pre-built classes and functions. 7 00:00:24,510 --> 00:00:27,100 This assumes the library is installed. 8 00:00:27,100 --> 00:00:28,430 In our lab environment, 9 00:00:28,430 --> 00:00:31,220 all the necessary libraries are installed. 10 00:00:31,220 --> 00:00:37,390 Let's say we would like to load a CSV file using the Pandas built-in function, read_csv. 11 00:00:37,390 --> 00:00:42,100 A CSV is a typical file type used to store data. 12 00:00:42,100 --> 00:00:44,530 We simply type the word pandas, 13 00:00:44,530 --> 00:00:47,830 then a dot, and the name of the function with all the inputs. 14 00:00:47,830 --> 00:00:50,800 Typing pandas all the time may get tedious. 15 00:00:50,800 --> 00:00:54,450 We can use the as keyword to shorten the name of the library. 16 00:00:54,450 --> 00:00:58,480 In this case, we use the standard abbreviation, pd. 17 00:00:58,480 --> 00:01:01,120 Now we type pd, and a dot, 18 00:01:01,120 --> 00:01:03,860 followed by the name of the function we would like to use. 19 00:01:03,860 --> 00:01:06,100 In this case, read_csv. 20 00:01:06,100 --> 00:01:10,670 We are not limited to the abbreviation pd. 21 00:01:10,670 --> 00:01:14,050 In this case, we use the term banana. 22 00:01:14,050 --> 00:01:17,010 We will stick with pd for the rest of this video. 23 00:01:17,010 --> 00:01:19,580 Let's examine this code in more depth. 
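The import steps described so far can be sketched as follows. This is a minimal illustration, not the code shown on screen: the CSV contents, the temporary file, and the variable names csv_text, csv_path, df_full, and df_banana are assumptions made so the example runs without any external file.

```python
import os
import tempfile

import pandas                # full name: pandas.read_csv(...)
import pandas as pd          # standard abbreviation: pd.read_csv(...)
import pandas as banana      # any valid name works, though pd is the convention

# Hypothetical CSV contents, written to a temporary file so the sketch is self-contained.
csv_text = "id,label\n1,a\n2,b\n3,c\n"
with tempfile.NamedTemporaryFile("w", suffix=".csv", delete=False) as handle:
    handle.write(csv_text)
    csv_path = handle.name

# All three names refer to the same library, so each call returns the same data.
df_full = pandas.read_csv(csv_path)
df = pd.read_csv(csv_path)
df_banana = banana.read_csv(csv_path)

# Clean up the temporary file.
os.remove(csv_path)
```

The alias only changes the name we type; pandas, pd, and banana all point to the same module.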
24 00:01:19,580 --> 00:01:23,710 One way Pandas allows you to work with data is with the data frame. 25 00:01:23,710 --> 00:01:28,690 Let's go over the process to go from a CSV file to a data frame. 26 00:01:28,690 --> 00:01:32,350 This variable stores the path of the CSV. 27 00:01:32,350 --> 00:01:36,910 It is used as an argument to the read_csv function. 28 00:01:36,910 --> 00:01:39,670 The result is stored in the variable df. 29 00:01:39,670 --> 00:01:42,370 This is short for data frame. 30 00:01:42,370 --> 00:01:46,410 Now that we have the data in a data frame, we can work with it. 31 00:01:46,410 --> 00:01:51,610 We can use the method head to examine the first five rows of a data frame. 32 00:01:51,610 --> 00:01:54,810 The process for loading an Excel file is similar. 33 00:01:54,810 --> 00:01:57,070 We use the path of the Excel file. 34 00:01:57,070 --> 00:01:59,450 The function is read_excel. 35 00:01:59,450 --> 00:02:01,610 The result is a data frame. 36 00:02:01,610 --> 00:02:04,910 A data frame consists of rows and columns. 37 00:02:04,910 --> 00:02:07,890 We can create a data frame out of a dictionary. 38 00:02:07,890 --> 00:02:10,800 The keys correspond to the column labels. 39 00:02:10,800 --> 00:02:13,920 The values are lists corresponding to the rows. 40 00:02:13,920 --> 00:02:18,720 We then cast the dictionary to a data frame using the function DataFrame. 41 00:02:18,720 --> 00:02:21,830 We can see the direct correspondence between the dictionary and the table. 42 00:02:21,830 --> 00:02:24,470 The keys correspond to the table headers. 43 00:02:24,470 --> 00:02:27,830 The values are lists corresponding to the rows. 44 00:02:27,830 --> 00:02:31,340 We can create a new data frame consisting of one column. 45 00:02:31,340 --> 00:02:34,580 We just put the data frame name, in this case, df, 46 00:02:34,580 --> 00:02:38,480 and the name of the column header enclosed in double brackets. 
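The dictionary-to-data-frame step and the head method can be sketched like this. The column names and values are made up for illustration and are not the data used in the video; the variable names data and first_rows are also assumptions.

```python
import pandas as pd

# Hypothetical data: each key becomes a column label,
# and each list holds that column's values, one entry per row.
data = {
    "item": ["a", "b", "c", "d", "e", "f"],
    "quantity": [1, 2, 3, 4, 5, 6],
}

# Convert the dictionary to a data frame.
df = pd.DataFrame(data)

# head() returns the first five rows by default.
first_rows = df.head()
print(first_rows)

# Loading an Excel file follows the same pattern with pd.read_excel(path);
# it is left out here because it needs an Excel engine such as openpyxl
# and a real file to read.
```

The data frame has six rows, so head leaves out the last one.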
47 00:02:38,480 --> 00:02:42,720 The result is a new data frame consisting of the selected column. 48 00:02:42,720 --> 00:02:45,820 You can do the same thing for multiple columns. 49 00:02:45,820 --> 00:02:48,900 We just put the data frame name, in this case, df, 50 00:02:48,900 --> 00:02:53,560 and the names of the column headers enclosed in double brackets. 51 00:02:53,560 --> 00:02:58,110 The result is a new data frame consisting of the specified columns. 52 00:02:58,110 --> 00:03:02,000 (Music)
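Column selection with double brackets can be sketched as follows. The data frame contents and the names one_column, two_columns, and as_series are assumptions for illustration only.

```python
import pandas as pd

# Hypothetical data frame with three columns.
df = pd.DataFrame({
    "item": ["a", "b", "c"],
    "quantity": [1, 2, 3],
    "year": [2001, 2002, 2003],
})

# Double brackets with one column name give a new one-column data frame.
one_column = df[["item"]]

# Double brackets with several column names give a data frame with just those columns.
two_columns = df[["item", "year"]]

# For contrast: single brackets return a Series, not a data frame.
as_series = df["item"]
```

The inner brackets are an ordinary Python list of column names, which is why the same syntax works for one column or many.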