When we have a data frame, we can work with the data and save the results in other formats.

Consider a stack of 13 blocks of different colors. We can see that there are three unique colors. Now say you would like to find out how many unique elements are in a column of a data frame. This may be much more difficult, because instead of 13 elements you may have millions. Pandas has the method unique to determine the unique elements in a column of a data frame.

Let's say we would like to determine the unique years in which the albums in the data set were released. We enter the name of the data frame, then the name of the column, released, within square brackets. Then we apply the method unique. The result is all of the unique elements in the column released.

Now let's say we would like to create a new data frame consisting of albums released in 1980 or later. We can check the column released for albums released after 1979, then select the corresponding rows. We can accomplish this in one line of code in Pandas, but let's break up the steps.

In Pandas, we can apply inequality operators to data frames and to their individual columns.
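A minimal sketch of the unique and comparison steps, assuming a small hypothetical data frame df with the columns album and released (the rows are illustrative, not the course data set). The call to nunique, which counts the distinct values directly, is an addition that is not mentioned in the transcript.

```python
import pandas as pd

# Hypothetical sample standing in for the album data set.
# The column name "released" follows the transcript; the rows are illustrative.
df = pd.DataFrame({
    "album": ["Thriller", "Back in Black", "The Dark Side of the Moon",
              "The Bodyguard", "Bat Out of Hell", "Rumours"],
    "released": [1982, 1980, 1973, 1992, 1977, 1977],
})

# unique() returns the distinct values in order of first appearance (not sorted).
unique_years = df["released"].unique()
print(unique_years)  # [1982 1980 1973 1992 1977]

# nunique() counts the distinct values without listing them.
print(df["released"].nunique())  # 5

# Comparing a column with a value gives a Boolean Series, one entry per row.
after_1979 = df["released"] > 1979
print(after_1979.tolist())  # [True, True, False, True, False, False]
```

Because unique keeps the order in which values first appear, wrap the result in sorted(...) if you need the years in ascending order.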
For our case, we simply specify the column released and the inequality for albums released after 1979. The result is a series of Boolean values: true for each row where the condition holds and false otherwise.

We can select the corresponding rows in one line. We simply write the name of the data frame, place the previously mentioned inequality inside square brackets, and assign the result to the variable df1. We now have a new data frame in which every album was released after 1979.

We can save the new data frame using the method to_csv. The argument is the name of the CSV file; make sure you include the .csv extension. Pandas has other methods, such as to_json and to_excel, to save the data frame in other formats.
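A sketch of the filtering and saving steps, again on a hypothetical df (illustrative rows, not the course data). The file name new_songs.csv is a placeholder, and index=False is an optional choice that keeps the row labels out of the saved file; the read_csv call at the end only checks the round trip.

```python
import pandas as pd

# Hypothetical sample data; the column name "released" follows the transcript.
df = pd.DataFrame({
    "album": ["Thriller", "Back in Black", "The Dark Side of the Moon",
              "The Bodyguard", "Bat Out of Hell", "Rumours"],
    "released": [1982, 1980, 1973, 1992, 1977, 1977],
})

# Boolean indexing: the Boolean Series inside the brackets keeps only
# the rows where the condition is True.
df1 = df[df["released"] > 1979]
print(df1)

# Save to CSV; "new_songs.csv" is a placeholder file name.
# index=False leaves the row labels out of the file.
df1.to_csv("new_songs.csv", index=False)

# Read the file back to confirm what was written.
check = pd.read_csv("new_songs.csv")
print(check["album"].tolist())  # ['Thriller', 'Back in Black', 'The Bodyguard']
```

Note that df1 keeps the original row labels (0, 1 and 3 here); call df1.reset_index(drop=True) if you want them renumbered from 0.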