1 00:00:07,880 --> 00:00:12,225 Welcome back. To write a CSV file, 2 00:00:12,225 --> 00:00:17,370 you just need to write text strings that follow the comma-separated values format. 3 00:00:17,370 --> 00:00:20,280 Here we have the basic structure. 4 00:00:20,280 --> 00:00:24,405 You write a header line with field names separated by commas. 5 00:00:24,405 --> 00:00:29,895 You iterate through your list of objects and for each one you generate a line of output. 6 00:00:29,895 --> 00:00:33,870 What I've got here is just going to print the output to the Output Window, 7 00:00:33,870 --> 00:00:36,180 but it's in the CSV format. 8 00:00:36,180 --> 00:00:43,500 As you can see, we have one header row with the three headers, 9 00:00:43,500 --> 00:00:49,935 Name, Age, and Sport, and then each line has the same structure. 10 00:00:49,935 --> 00:00:52,685 There's somebody's name and then a comma, 11 00:00:52,685 --> 00:00:57,270 there's an age and then another comma and then a sport. 12 00:00:59,630 --> 00:01:03,735 Now, if we want to change this so that instead of 13 00:01:03,735 --> 00:01:07,795 printing to the Output Window we write to a file, 14 00:01:07,795 --> 00:01:13,645 we'll do our usual transformation where instead of print we'll use .write, 15 00:01:13,645 --> 00:01:17,600 and we'll have to open the file and close it. 16 00:01:17,600 --> 00:01:20,360 I've actually already got that code written out. 17 00:01:20,360 --> 00:01:27,010 I'm going to switch to it. So, I've opened the file for writing. 18 00:01:27,010 --> 00:01:31,565 I've assigned the file object to this variable name outfile. 19 00:01:31,565 --> 00:01:34,100 Then I'm writing the header on lines 20 00:01:34,100 --> 00:01:39,350 8 and 9, and with the row strings, instead of printing them out, 21 00:01:39,350 --> 00:01:44,890 I'm doing outfile.write, and because we're writing to a file, 22 00:01:44,890 --> 00:01:48,630 we have to explicitly put in the line breaks that we want.
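The print-to-file transformation described above might look roughly like the following. This is a minimal sketch, not the code shown on screen: the filename olympians.csv is hypothetical, and every data row other than the first is made up for illustration.

```python
# Illustrative data. The first row is the one mentioned in the video;
# the other rows are invented for this sketch.
olympians = [
    ("John Aalberg", 31, "Cross Country Skiing"),
    ("Example Athlete", 24, "Sailing"),
    ("Another Athlete", 19, "Cycling"),
]

# "olympians.csv" is a hypothetical filename.
outfile = open("olympians.csv", "w")

# Write the header row. Unlike print, write adds no line break,
# so the newline is written explicitly.
outfile.write("Name,Age,Sport")
outfile.write("\n")

# Write one comma-separated line per Olympian.
for olympian in olympians:
    row_string = "{},{},{}".format(olympian[0], olympian[1], olympian[2])
    outfile.write(row_string)
    outfile.write("\n")

outfile.close()
```

Opening the resulting file in a text editor should show the header followed by one line per tuple.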
23 00:01:52,490 --> 00:01:55,795 Instead of putting it in the Output Window, 24 00:01:55,795 --> 00:02:01,630 it's now in this data file which is available until we reload the page. 25 00:02:01,630 --> 00:02:04,820 Again, the key to outputting CSV format 26 00:02:04,820 --> 00:02:08,360 is just generating a string that contains one line, 27 00:02:08,360 --> 00:02:14,935 where that line has commas that are separating the field names or the values. 28 00:02:14,935 --> 00:02:19,520 There are two options that work well for generating a line with 29 00:02:19,520 --> 00:02:24,460 the commas separating the values and a third one that I don't really recommend. 30 00:02:24,460 --> 00:02:28,190 Shown here, we're using the first method that I do 31 00:02:28,190 --> 00:02:32,825 like, and actually the one I like the best, which is to use a format string. 32 00:02:32,825 --> 00:02:37,245 It's easy to see with this format string 33 00:02:37,245 --> 00:02:40,460 that we're going to have three values because we have three pairs 34 00:02:40,460 --> 00:02:43,520 of curly braces: one, two, three. 35 00:02:43,520 --> 00:02:47,490 It's easy to see that we've got a comma separating each of them. 36 00:02:47,990 --> 00:02:52,470 Then the values are just going to get substituted in. 37 00:02:53,210 --> 00:03:01,290 The first element is John Aalberg. The second element goes there, 38 00:03:01,290 --> 00:03:06,690 that's the 31, and Cross Country Skiing is the third element. 39 00:03:06,690 --> 00:03:11,625 The second possibility is to use the .join method. 40 00:03:11,625 --> 00:03:18,110 You may recall the .split method for chopping up a string into component parts. 41 00:03:18,110 --> 00:03:22,110 Its counterpart going the other way is the .join method. 42 00:03:23,900 --> 00:03:34,840 I would write something like this: row string equals comma.join of some values. 43 00:03:34,840 --> 00:03:39,570 Those values are going to be the same values that we used here.
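The format-string substitution described earlier in this segment can be sketched as follows. The tuple layout (name, age, sport) is an assumption based on the values read out in the video.

```python
# One Olympian as a tuple: (name, age, sport).
olympian = ("John Aalberg", 31, "Cross Country Skiing")

# Three pairs of curly braces separated by commas, one pair per value.
# format converts the integer age to text automatically.
row_string = "{},{},{}".format(olympian[0], olympian[1], olympian[2])

print(row_string)  # John Aalberg,31,Cross Country Skiing
```

Because format handles the integer, no explicit conversion is needed here.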
44 00:03:47,480 --> 00:03:51,609 So, I can use this instead, 45 00:03:51,860 --> 00:03:55,390 I'll comment out the old version. 46 00:03:56,440 --> 00:04:01,220 I always find this a little confusing and I think many students do too. 47 00:04:01,220 --> 00:04:05,750 You might think that the natural thing is to call the join method and pass 48 00:04:05,750 --> 00:04:10,820 comma as a parameter, like we did with .split. 49 00:04:10,820 --> 00:04:16,370 But here, we're seeing join as an operation that we do on 50 00:04:16,370 --> 00:04:19,735 the comma object and we have to 51 00:04:19,735 --> 00:04:24,380 pass in as values the things that are going to be joined together. 52 00:04:24,380 --> 00:04:29,315 You just have to remember that that's the opposite order of what you might expect. 53 00:04:29,315 --> 00:04:31,190 So, it's comma.join. 54 00:04:31,190 --> 00:04:35,825 There are a couple of other tricky things about the join method. 55 00:04:35,825 --> 00:04:37,745 I'll show them to you here. 56 00:04:37,745 --> 00:04:40,350 First I'm going to get rid of our markings. 57 00:04:41,280 --> 00:04:46,780 When I run this, I'm going to get an error. 58 00:04:47,270 --> 00:04:49,665 There are actually two problems. 59 00:04:49,665 --> 00:04:57,445 The first problem is that join is expecting two arguments and I've given it four. 60 00:04:57,445 --> 00:05:01,570 So, it really wants a list of things, 61 00:05:01,570 --> 00:05:04,910 not a bunch of different values. 62 00:05:04,910 --> 00:05:07,085 So, I have to give it a list. 63 00:05:07,085 --> 00:05:11,130 I can put all of these values into a list. 64 00:05:12,770 --> 00:05:21,480 One other tricky part about it is that join wants to have a list of strings, 65 00:05:21,480 --> 00:05:23,685 a sequence of strings. 66 00:05:23,685 --> 00:05:26,870 Olympian square bracket one is the number 31. 67 00:05:26,870 --> 00:05:29,840 It isn't a string, it's an integer.
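The two join pitfalls just described can be reproduced with a short sketch like this one. The variable names first_error and second_error are introduced here for illustration, and the exact wording of the error messages depends on the Python environment, so it may differ from what appears in the video.

```python
olympian = ("John Aalberg", 31, "Cross Country Skiing")

first_error = None
second_error = None

# Pitfall 1: passing the values separately instead of as one sequence.
try:
    row_string = ",".join(olympian[0], olympian[1], olympian[2])
except TypeError as err:
    first_error = str(err)
    print("Separate arguments:", first_error)

# Pitfall 2: a single list, but the age is an integer, not a string.
try:
    row_string = ",".join([olympian[0], olympian[1], olympian[2]])
except TypeError as err:
    second_error = str(err)
    print("Non-string item:", second_error)
```

Both attempts raise a TypeError: the first because join accepts exactly one sequence, the second because item 1 of that sequence is not a string.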
68 00:05:29,840 --> 00:05:33,395 So, we get an error that it expected a string, 69 00:05:33,395 --> 00:05:36,535 but it actually got something different. 70 00:05:36,535 --> 00:05:41,445 Sequence item one, that's Olympian square bracket one. 71 00:05:41,445 --> 00:05:45,700 So, if I turn it into a string, 72 00:05:46,030 --> 00:05:49,680 I'll finally have something that works. 73 00:05:50,900 --> 00:05:56,250 Now, I get the same output that I was getting before. 74 00:05:57,940 --> 00:06:00,970 So, this looks pretty complicated and you're 75 00:06:00,970 --> 00:06:03,695 probably thinking, why would anyone ever want to do this? 76 00:06:03,695 --> 00:06:08,890 Well, if my values were all strings, 77 00:06:11,300 --> 00:06:14,770 then I might be tempted to do it. 78 00:06:16,790 --> 00:06:20,940 Now, I don't need to say str of Olympian. 79 00:06:20,940 --> 00:06:26,750 I can just say Olympian square bracket one there and it'll work, 80 00:06:28,170 --> 00:06:31,870 and the thing that really makes this attractive in 81 00:06:31,870 --> 00:06:36,250 this situation is that I can just refer to Olympian, 82 00:06:36,250 --> 00:06:39,370 which is already a sequence of strings. 83 00:06:39,370 --> 00:06:41,290 It's a tuple with three strings in it. 84 00:06:41,290 --> 00:06:46,130 I can call .join and pass that tuple of strings and I still 85 00:06:46,130 --> 00:06:51,455 get this lovely compact code and the same output. 86 00:06:51,455 --> 00:06:54,160 So, if you have a list of strings, 87 00:06:54,160 --> 00:06:58,140 then this .join method might be pretty attractive. 88 00:06:58,140 --> 00:07:00,660 If not, and you're going to have to do any kind 89 00:07:00,660 --> 00:07:03,760 of converting integers to strings or things like that, 90 00:07:03,760 --> 00:07:09,760 then I think you're going to want the version on line 13 that uses the format string. 91 00:07:09,760 --> 00:07:14,780 A third possibility is to just use string concatenation.
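The two working uses of join described in this segment might be sketched as follows. The variable names row_mixed, row_strings, and olympian_strings are introduced here for illustration only.

```python
# Mixed types: the integer age has to be converted with str() before joining.
olympian = ("John Aalberg", 31, "Cross Country Skiing")
row_mixed = ",".join([olympian[0], str(olympian[1]), olympian[2]])

# All strings: the tuple is already a sequence of strings,
# so it can be passed to join directly.
olympian_strings = ("John Aalberg", "31", "Cross Country Skiing")
row_strings = ",".join(olympian_strings)

print(row_mixed)    # John Aalberg,31,Cross Country Skiing
print(row_strings)  # John Aalberg,31,Cross Country Skiing
```

The second form is the compact case where join is attractive; the first needs the same conversion work that the format string does automatically.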
92 00:07:14,780 --> 00:07:17,860 But it gets really hard to read. 93 00:07:17,860 --> 00:07:21,740 It also is still going to require us to convert the number to a string. 94 00:07:21,740 --> 00:07:24,785 So, we're going to have something like this, 95 00:07:24,785 --> 00:07:28,850 and I think I'm pretty unlikely to get it right the first time, 96 00:07:28,850 --> 00:07:36,350 but I will have row string equals 97 00:07:36,350 --> 00:07:43,785 Olympian square bracket zero plus a comma plus 98 00:07:43,785 --> 00:07:54,760 Olympian square bracket one plus another comma plus Olympian square bracket two. 99 00:07:55,430 --> 00:07:58,780 That might work, let's see. 100 00:07:59,360 --> 00:08:04,050 Yeah, I got the right output again. 101 00:08:04,050 --> 00:08:07,005 So, a little hard to read. 102 00:08:07,005 --> 00:08:12,154 My preference generally is to use the format string like on line 13, 103 00:08:12,154 --> 00:08:15,560 unless I really have a sequence of 104 00:08:15,560 --> 00:08:20,580 strings, in which case I might be tempted to use the comma.join from line 12. 105 00:08:20,900 --> 00:08:24,485 Now, suppose we had slightly different data, 106 00:08:24,485 --> 00:08:28,670 where one of the event names now has a comma in it, 107 00:08:28,670 --> 00:08:31,690 but not all of them do. 108 00:08:31,690 --> 00:08:36,459 Here you can see that for Cross Country Skiing, 109 00:08:36,950 --> 00:08:42,800 the 15 kilometer is specified rather than 110 00:08:42,800 --> 00:08:49,020 the 100 kilometer, and some of the other events don't have commas in them. 111 00:08:49,090 --> 00:08:52,070 You may remember that one of the ways we can handle 112 00:08:52,070 --> 00:08:54,890 this thing is with the advanced CSV format, 113 00:08:54,890 --> 00:08:59,640 where we put all of the values in quotes.
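The concatenation approach discussed at the start of this segment could be sketched like this, assuming the age is stored as an integer and therefore needs str().

```python
olympian = ("John Aalberg", 31, "Cross Country Skiing")

# Each comma is a separate string literal glued in with +.
# The integer age must be converted, or + raises a TypeError.
row_string = olympian[0] + "," + str(olympian[1]) + "," + olympian[2]

print(row_string)  # John Aalberg,31,Cross Country Skiing
```

The result matches the other two methods, but the commas are harder to spot among the plus signs.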
114 00:09:00,610 --> 00:09:07,490 So, this is one of the nice things about Python having both single quotes and 115 00:09:07,490 --> 00:09:11,360 double quotes as a way of delimiting a string: if we want to have 116 00:09:11,360 --> 00:09:15,455 double quotes as a character inside the string, 117 00:09:15,455 --> 00:09:20,365 we can use the single quotes as the delimiter as I've done here on line eight. 118 00:09:20,365 --> 00:09:22,245 With our format string, 119 00:09:22,245 --> 00:09:24,855 this is not too bad. 120 00:09:24,855 --> 00:09:30,250 We just have double quotes around each of the pairs 121 00:09:30,250 --> 00:09:34,720 of curly braces and we're still going to substitute in where the curly braces are. 122 00:09:34,720 --> 00:09:38,675 So, the value is going to be surrounded by the double quotes. 123 00:09:38,675 --> 00:09:42,520 This is a situation where I really appreciate the .format method. 124 00:09:42,520 --> 00:09:46,030 I wouldn't want to try this with .join and I 125 00:09:46,030 --> 00:09:50,525 definitely wouldn't want to try it using concatenation with the plus sign. 126 00:09:50,525 --> 00:09:56,115 You'll see that our outputs have those quotes around all of the values. 127 00:09:56,115 --> 00:09:58,780 In particular, we've got quotes around 128 00:09:58,780 --> 00:10:02,195 the whole Cross Country Skiing comma 15 kilometers. 129 00:10:02,195 --> 00:10:07,260 So, there's a comma that's inside one of the values inside the double quotes and 130 00:10:07,260 --> 00:10:12,985 the other commas are separating the different values. 131 00:10:12,985 --> 00:10:20,060 To summarize, the overall structure for writing a CSV file is to write the header line, 132 00:10:20,060 --> 00:10:22,975 that's what we did on lines 8 and 9, 133 00:10:22,975 --> 00:10:27,080 then iterate through all of your objects 134 00:10:27,080 --> 00:10:31,610 and for each of them you're going to write one line into the CSV file.
135 00:10:31,610 --> 00:10:33,635 So, I'm creating the row string, 136 00:10:33,635 --> 00:10:40,175 I'm writing it and then I'm tacking on the backslash n to indicate the new line. 137 00:10:40,175 --> 00:10:43,160 We'll see you next time.
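Putting the summary together with the quoted format discussed earlier, the whole pattern might look like the sketch below. The filename olympians_quoted.csv is hypothetical, the exact wording of the event name is an assumption, and the second data row is made up for illustration.

```python
# Illustrative data. The event wording with the embedded comma is assumed,
# and the second row is invented for this sketch.
olympians = [
    ("John Aalberg", 31, "Cross Country Skiing, 15 km"),
    ("Example Athlete", 24, "Sailing"),
]

# "olympians_quoted.csv" is a hypothetical filename.
outfile = open("olympians_quoted.csv", "w")

# Single quotes delimit the Python string so that double quotes
# can appear inside it as ordinary characters.
outfile.write('"Name","Age","Sport"')
outfile.write("\n")

for olympian in olympians:
    # Each pair of curly braces is wrapped in double quotes,
    # so every substituted value ends up quoted.
    row_string = '"{}","{}","{}"'.format(olympian[0], olympian[1], olympian[2])
    outfile.write(row_string)
    outfile.write("\n")

outfile.close()
```

The comma inside the first event name stays within its double quotes, while the other commas separate fields. This sketch does not handle values that themselves contain double quotes, which would need extra escaping.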