1 00:00:06,500 --> 00:00:10,365 So now let's have a look at RStudio. 2 00:00:10,365 --> 00:00:14,994 RStudio is an IDE, an Integrated Development Environment, 3 00:00:14,994 --> 00:00:20,450 and it's made for the programming language R. R is 4 00:00:20,450 --> 00:00:23,510 a statistical programming language which has been 5 00:00:23,510 --> 00:00:27,440 derived from the closed source S language. 6 00:00:27,440 --> 00:00:29,180 So R is open source, 7 00:00:29,180 --> 00:00:30,980 RStudio is open source, 8 00:00:30,980 --> 00:00:33,200 and it's completely free. 9 00:00:33,200 --> 00:00:38,630 The central data structure in R is the data frame. 10 00:00:38,630 --> 00:00:41,900 So we have here a window which contains 11 00:00:41,900 --> 00:00:44,360 an editor and below we 12 00:00:44,360 --> 00:00:47,750 have here a window which contains the console. 13 00:00:47,750 --> 00:00:51,320 The R interpreter is an interactive interpreter. 14 00:00:51,320 --> 00:00:54,545 So you have access to the interpreter here at any time, 15 00:00:54,545 --> 00:00:57,630 but it's handy if you have a text editor here. 16 00:00:57,630 --> 00:00:59,330 You can always execute 17 00:00:59,330 --> 00:01:01,850 the code you're writing here in this text editor. 18 00:01:01,850 --> 00:01:03,230 On the top right, 19 00:01:03,230 --> 00:01:04,900 you have a narrow window 20 00:01:04,900 --> 00:01:07,160 where your environment is displayed, 21 00:01:07,160 --> 00:01:09,710 so every variable on 22 00:01:09,710 --> 00:01:13,450 the heap is accessible and you can also inspect it. 23 00:01:13,450 --> 00:01:16,025 If you plot graphs, 24 00:01:16,025 --> 00:01:17,670 they end up here. 25 00:01:17,670 --> 00:01:21,475 So let's start with a data frame. 26 00:01:21,475 --> 00:01:24,350 So in order to find out where we are, 27 00:01:24,350 --> 00:01:28,320 we say, "getwd()", so get working directory. 28 00:01:28,320 --> 00:01:31,725 So we are in the "Home" folder. That's pretty cool. 
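The working-directory check narrated above can be sketched in the console roughly as follows. This is a minimal sketch: the printed path depends on your machine, and `wd` is an illustrative variable name that does not appear in the video.

```r
# Ask R which folder it is currently working in.
wd = getwd()
print(wd)

# List the entries in that folder (the result may be empty).
print(list.files(wd))
```

Relative paths passed to functions such as `read.csv` are resolved against this folder, which is why it is worth checking first.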
29 00:01:31,725 --> 00:01:34,845 So now we say, "df =". 30 00:01:34,845 --> 00:01:41,340 You might have seen this symbol here, "<-", so that's legacy. 31 00:01:41,340 --> 00:01:44,160 There's only a slight difference in function calls, 32 00:01:44,160 --> 00:01:45,740 so don't worry about it 33 00:01:45,740 --> 00:01:48,025 unless you're using it in function calls. 34 00:01:48,025 --> 00:01:50,840 So you can use the normal ordinary assignment operator, 35 00:01:50,840 --> 00:01:54,830 which makes R look a bit less ugly. 36 00:01:54,830 --> 00:02:01,560 So we say now read.csv and now we can say, 37 00:02:01,560 --> 00:02:04,815 "Go to my Downloads folder". 38 00:02:04,815 --> 00:02:08,940 In here we see all the possible files. 39 00:02:08,940 --> 00:02:11,550 So I know it started with "cu", 40 00:02:11,550 --> 00:02:13,605 so it's customer messages. 41 00:02:13,605 --> 00:02:18,740 We hit Enter, we can execute this line. 42 00:02:18,740 --> 00:02:20,645 Then this line gets copied 43 00:02:20,645 --> 00:02:23,500 to the console and gets executed. 44 00:02:23,500 --> 00:02:26,780 At the same time, you see here that df 45 00:02:26,780 --> 00:02:31,420 basically contains the data from that CSV, 46 00:02:31,420 --> 00:02:33,410 and we can now have a look at 47 00:02:33,410 --> 00:02:36,080 the contents of this data frame. 48 00:02:36,080 --> 00:02:39,800 If we say, "View(df)", you see 49 00:02:39,800 --> 00:02:44,465 here it's already shown inside the editor window. 50 00:02:44,465 --> 00:02:46,490 What we notice is that 51 00:02:46,490 --> 00:02:49,790 the first line has been interpreted as the header. 52 00:02:49,790 --> 00:02:51,905 That's something we don't want. 53 00:02:51,905 --> 00:02:59,040 So we can say, "header = FALSE". 54 00:02:59,040 --> 00:03:02,070 It's very handy that you have autocompletion here, as 55 00:03:02,070 --> 00:03:05,220 you always know what parameters a function has. 
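The loading step and the `header` argument described above can be sketched as follows. To stay self-contained, the sketch writes a small temporary CSV instead of reading the file from the video, so the file contents and the names `csv_path` and `with_header` are assumptions made for illustration. The last lines also show the one place where `=` and `<-` behave differently, namely inside a function call.

```r
# Write a tiny headerless CSV to a temporary file (illustrative data,
# not the customer messages file used in the video).
csv_path = tempfile(fileext = ".csv")
writeLines(c("1,hello", "2,world", "3,thanks"), csv_path)

# Default is header = TRUE: the first line is consumed as column names.
with_header = read.csv(csv_path)
print(nrow(with_header))

# header = FALSE keeps the first line as data; columns are named V1, V2.
df = read.csv(csv_path, header = FALSE)
print(nrow(df))
print(names(df))

# Inside a call, `=` only names an argument, whereas `<-` also assigns
# a variable in the calling environment before passing the value on.
m1 = mean(x = c(1, 2, 3))   # x is just the argument name here
m2 = mean(x <- c(1, 2, 3))  # additionally creates a variable x
print(x)
```

At the top level of a script, as in `df = read.csv(...)`, both operators assign in the same way, which is why the choice rarely matters outside function calls.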
56 00:03:05,220 --> 00:03:07,130 If you now have a look at df again, 57 00:03:07,130 --> 00:03:09,440 you see here the first line is 58 00:03:09,440 --> 00:03:12,475 now part of the data and not the header. 59 00:03:12,475 --> 00:03:15,980 So that's how you can load data. 60 00:03:15,980 --> 00:03:17,060 You can, of course, 61 00:03:17,060 --> 00:03:19,970 also load data from remote database systems. 62 00:03:19,970 --> 00:03:22,820 You can use ODBC for that, 63 00:03:22,820 --> 00:03:25,525 but that's beyond the scope of this tutorial. 64 00:03:25,525 --> 00:03:29,405 What I want to show you is another way of importing data. 65 00:03:29,405 --> 00:03:31,834 You can say here, "Import from CSV", 66 00:03:31,834 --> 00:03:34,955 and then the code is actually created for you. 67 00:03:34,955 --> 00:03:37,480 So let's open the file again. 68 00:03:37,480 --> 00:03:39,065 It's basically doing the same, 69 00:03:39,065 --> 00:03:41,870 just that you have a UI. 70 00:03:41,870 --> 00:03:45,095 You have a data preview with the same problem 71 00:03:45,095 --> 00:03:48,755 that you have the first line interpreted as the header. 72 00:03:48,755 --> 00:03:50,990 So you can actually uncheck 73 00:03:50,990 --> 00:03:53,780 this one here and then it's as it should be, 74 00:03:53,780 --> 00:03:55,990 and you say, "Import". 75 00:03:55,990 --> 00:03:59,990 You notice here you have a second object 76 00:03:59,990 --> 00:04:03,680 here on the heap which is called customer messages. 77 00:04:03,680 --> 00:04:07,650 It's also a data frame and it's also shown here. 78 00:04:09,230 --> 00:04:12,630 That concludes the first video. 79 00:04:12,630 --> 00:04:13,715 In the next video, 80 00:04:13,715 --> 00:04:17,590 I will show you how to work with libraries, 81 00:04:17,590 --> 00:04:22,920 and then after that I will show you how to create plots.