1
00:00:00,000 --> 00:00:02,280
Once we have a data frame, we can work with

2
00:00:02,280 --> 00:00:06,050
the data and save the results in other formats.

3
00:00:06,050 --> 00:00:09,660
Consider a stack of 13 blocks of different colors.

4
00:00:09,660 --> 00:00:12,250
We can see there are three unique colors.

5
00:00:12,250 --> 00:00:13,970
Let's say you would like to find out

6
00:00:13,970 --> 00:00:17,140
how many unique elements are in a column of a data frame.

7
00:00:17,140 --> 00:00:20,960
This may be much more difficult because instead of 13 elements,

8
00:00:20,960 --> 00:00:22,620
you may have millions.

9
00:00:22,620 --> 00:00:25,060
Pandas has the method unique to

10
00:00:25,060 --> 00:00:28,630
determine the unique elements in a column of a data frame.

11
00:00:28,630 --> 00:00:34,000
Let's say we would like to determine the unique release years of the albums in the data set.

12
00:00:34,000 --> 00:00:36,150
We enter the name of the data frame,

13
00:00:36,150 --> 00:00:40,090
then enter the name of the column, released, within square brackets.

14
00:00:40,090 --> 00:00:42,410
Then we apply the method unique.

15
00:00:42,410 --> 00:00:46,400
The result is all of the unique elements in the column released.
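
The steps just described can be sketched as follows. This is a minimal example under assumptions: the album names and years are hypothetical placeholders, and the column label "Released" is assumed, since the actual data set is not shown in the transcript.

```python
import pandas as pd

# Hypothetical sample data; the real data set from the video is not reproduced here.
df = pd.DataFrame({
    "Album": ["Album A", "Album B", "Album C", "Album D"],
    "Released": [1976, 1982, 1982, 1992],  # assumed column label
})

# Select the column with square brackets, then call unique().
# The distinct values come back in order of first appearance.
unique_years = df["Released"].unique()
print(unique_years)
```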

16
00:00:46,400 --> 00:00:49,030
Let's say we would like to create a new data frame

17
00:00:49,030 --> 00:00:52,140
consisting of albums from the 1980s and after.

18
00:00:52,140 --> 00:00:56,670
We can look in the column released for albums released after 1979,

19
00:00:56,670 --> 00:00:59,230
then select the corresponding rows.

20
00:00:59,230 --> 00:01:02,580
We can accomplish this within one line of code in Pandas.

21
00:01:02,580 --> 00:01:04,730
But let's break up the steps.

22
00:01:04,730 --> 00:01:09,560
In Pandas, inequality operators are applied element by element.

23
00:01:09,560 --> 00:01:12,340
The result holds one Boolean value for each element.

24
00:01:12,340 --> 00:01:15,130
For our case, we simply specify the column

25
00:01:15,130 --> 00:01:19,810
released and the inequality for the albums after 1979.

26
00:01:19,810 --> 00:01:22,840
The result is a series of Boolean values.

27
00:01:22,840 --> 00:01:27,280
The result is true when the condition is true and false otherwise.
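
As a rough illustration of the comparison described here, the sketch below builds the Boolean series for a small, hypothetical data frame. The column label "Released" and the sample values are assumptions, not the data from the video.

```python
import pandas as pd

# Hypothetical sample data with an assumed "Released" column.
df = pd.DataFrame({
    "Album": ["Album A", "Album B", "Album C", "Album D"],
    "Released": [1976, 1982, 1982, 1992],
})

# Comparing a column to a number yields a Series of True/False values,
# one per row: True where the release year is 1980 or later.
mask = df["Released"] >= 1980
print(mask)
```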

28
00:01:27,280 --> 00:01:30,940
We can select the matching rows in one line.

29
00:01:30,940 --> 00:01:35,240
We simply write the data frame's name and, in square brackets, place

30
00:01:35,240 --> 00:01:40,090
the previously mentioned inequality, then assign the result to the variable df1.

31
00:01:40,090 --> 00:01:42,280
We now have a new data frame,

32
00:01:42,280 --> 00:01:45,930
where each album was released after 1979.
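
The one-line selection can be sketched like this, again with hypothetical sample data and an assumed "Released" column. Only the name df1 comes from the video.

```python
import pandas as pd

# Hypothetical sample data with an assumed "Released" column.
df = pd.DataFrame({
    "Album": ["Album A", "Album B", "Album C", "Album D"],
    "Released": [1976, 1982, 1982, 1992],
})

# Placing the Boolean condition inside the square brackets keeps only
# the rows where the condition is True.
df1 = df[df["Released"] >= 1980]
print(df1)
```

Note that df1 keeps the original row labels (1, 2, 3 here), because filtering does not renumber the index.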

33
00:01:45,930 --> 00:01:49,710
We can save the new data frame using the method to_csv.

34
00:01:49,710 --> 00:01:54,050
The argument is the name of the csv file.

35
00:01:54,050 --> 00:01:57,110
Make sure you include a .csv extension.
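
A minimal sketch of saving the filtered data frame is shown below. The file name new_songs.csv, the temporary directory, and the index=False option are illustrative assumptions, and the sample data is hypothetical.

```python
import os
import tempfile

import pandas as pd

# Hypothetical sample data with an assumed "Released" column.
df = pd.DataFrame({
    "Album": ["Album A", "Album B", "Album C", "Album D"],
    "Released": [1976, 1982, 1982, 1992],
})
df1 = df[df["Released"] >= 1980]

# Write to a temporary directory so the example leaves no stray files behind.
out_dir = tempfile.mkdtemp()
csv_path = os.path.join(out_dir, "new_songs.csv")  # hypothetical file name

# index=False (an optional choice) leaves the row labels out of the file.
df1.to_csv(csv_path, index=False)

# Read the file back to confirm what was written.
restored = pd.read_csv(csv_path)
print(restored)
```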

36
00:01:57,110 --> 00:02:01,000
There are other functions to save the data frame in other formats.

37
00:02:01,000 --> 00:02:06,000
(Music)