1 00:00:07,580 --> 00:00:11,660 So far, we’ve reviewed Python, R, and SQL. 2 00:00:11,660 --> 00:00:17,780 In this video, we will review some other languages that have compelling use cases for data science. 3 00:00:17,780 --> 00:00:24,690 Ok, so indisputably, Python, R, and SQL are the three most popular languages that data 4 00:00:24,690 --> 00:00:26,480 scientists use. 5 00:00:26,480 --> 00:00:30,960 But, there are many, many other languages that are worth your time when considering 6 00:00:30,960 --> 00:00:35,460 which language to use to solve a particular data science problem. 7 00:00:35,460 --> 00:00:42,870 Scala, Java, C++, and Julia are probably the most traditional data science languages on 8 00:00:42,870 --> 00:00:44,330 this slide. 9 00:00:44,330 --> 00:00:51,180 But JavaScript, PHP, Go, Ruby, Visual Basic, and others have all found their place in the 10 00:00:51,180 --> 00:00:53,070 data science community as well! 11 00:00:53,070 --> 00:01:00,890 I won’t dive as deeply into each of these languages, but I'll mention some notable highlights. 12 00:01:00,890 --> 00:01:07,330 Java is a tried-and-true general-purpose object oriented programming language. 13 00:01:07,330 --> 00:01:13,980 It's been widely adopted in the enterprise space and is designed to be fast and scalable. 14 00:01:13,980 --> 00:01:21,080 Java applications are compiled to bytecode and run on the Java Virtual Machine, or "JVM." 15 00:01:21,080 --> 00:01:27,690 Some notable data science tools built with Java include Weka, for data mining; Java-ML, 16 00:01:27,690 --> 00:01:34,360 which is a machine learning library; Apache MLlib, which makes machine learning scalable; 17 00:01:34,360 --> 00:01:39,840 and Deeplearning4j, for deep learning. 18 00:01:39,840 --> 00:01:43,650 Apache Hadoop is another Java-built application. 19 00:01:43,650 --> 00:01:49,270 It manages data processing and storage for big data applications running in clustered 20 00:01:49,270 --> 00:01:53,100 systems. 21 00:01:53,100 --> 00:01:58,520 Scala is a general-purpose programming language that provides support for functional programming 22 00:01:58,520 --> 00:02:01,909 and a strong static type system. 23 00:02:01,909 --> 00:02:06,640 Many of the design decisions in the construction of the Scala language were made to address 24 00:02:06,640 --> 00:02:09,709 criticisms of Java. 25 00:02:09,709 --> 00:02:13,870 Scala is also interoperable with Java, as it runs on the JVM. 26 00:02:13,870 --> 00:02:19,340 The name "Scala" is a combination of "scalable" and "language." 27 00:02:19,340 --> 00:02:24,880 This language is designed to grow along with the demands of its users. 28 00:02:24,880 --> 00:02:30,620 For data science, the most popular program built using Scala is Apache Spark. 29 00:02:30,620 --> 00:02:34,930 Spark is a fast and general-purpose cluster computing system. 30 00:02:34,930 --> 00:02:41,370 It provides APIs that make parallel jobs easy to write, and an optimized engine that supports 31 00:02:41,370 --> 00:02:44,990 general computation graphs. 32 00:02:44,990 --> 00:02:52,660 Spark includes Shark, which is a query engine; MLlib, for machine learning; GraphX, for graph 33 00:02:52,660 --> 00:02:55,340 processing; and Spark Streaming. 34 00:02:55,340 --> 00:03:00,020 Apache Spark was designed to be faster than Hadoop. 35 00:03:00,020 --> 00:03:04,460 C++ is a general-purpose programming language. 36 00:03:04,460 --> 00:03:09,480 It is an extension of the C programming language, or "C with Classes.” 37 00:03:09,480 --> 00:03:16,239 C++ improves processing speed, enables system programming, and provides broader control 38 00:03:16,239 --> 00:03:19,350 over the software application. 39 00:03:19,350 --> 00:03:24,520 Many organizations that use Python or other high-level languages for data analysis and 40 00:03:24,520 --> 00:03:31,700 exploratory tasks still rely on C++ to develop programs that feed that data to customers 41 00:03:31,700 --> 00:03:34,629 in real-time. 42 00:03:34,629 --> 00:03:40,220 For data science, a popular deep learning library for dataflow called TensorFlow was 43 00:03:40,220 --> 00:03:42,400 built with C++. 44 00:03:42,400 --> 00:03:48,970 But while C++ is the foundation of TensorFlow, it runs on a Python interface, so you don’t 45 00:03:48,970 --> 00:03:52,050 need to know C++ to use it. 46 00:03:52,050 --> 00:03:58,720 MongoDB, a NoSQL database for big data management, was built with C++. 47 00:03:58,720 --> 00:04:07,080 Caffe is a deep learning algorithm repository built with C++, with Python and MATLAB bindings. 48 00:04:07,080 --> 00:04:12,420 A core technology for the World Wide Web, JavaScript is a general-purpose language that 49 00:04:12,420 --> 00:04:20,030 extended beyond the browser with the creation of Node.js and other server-side approaches. 50 00:04:20,030 --> 00:04:23,990 Javascript is NOT related to the Java language. 51 00:04:23,990 --> 00:04:31,360 For data science, the most popular implementation is undoubtedly TensorFlow.js. 52 00:04:31,360 --> 00:04:37,089 TensorFlow.js makes machine learning and deep learning possible in Node.js as well as in 53 00:04:37,089 --> 00:04:39,110 the browser. 54 00:04:39,110 --> 00:04:48,970 TensorFlow.js was also adopted by other open source libraries, including brain.js and machinelearn.js. 55 00:04:48,970 --> 00:04:55,439 The R-js project is another great implementation of JavaScript for data science. 56 00:04:55,439 --> 00:05:03,020 R-js has re-written linear algebra specifications from the R Language into Typescript. 57 00:05:03,020 --> 00:05:07,740 This re-write will provide a foundation for other projects to implement more powerful 58 00:05:07,740 --> 00:05:13,270 math base frameworks like Numpy and SciPy of Python. 59 00:05:13,270 --> 00:05:19,009 Typescript is a superset of JavaScript. 60 00:05:19,009 --> 00:05:25,830 Julia was designed at MIT for high-performance numerical analysis and computational science. 61 00:05:25,830 --> 00:05:31,169 It provides speedy development like Python or R, while producing programs that run as 62 00:05:31,169 --> 00:05:35,379 fast as C or Fortran programs. 63 00:05:35,379 --> 00:05:41,870 Julia is compiled, which means that the code is executed directly on the processor as executable 64 00:05:41,870 --> 00:05:52,090 code; it calls C, Go, Java, MATLAB, R, Fortran, and Python libraries; and has refined parallelism. 65 00:05:52,090 --> 00:05:57,869 The Julia language is relatively new, having been written in 2012, but it has a lot of 66 00:05:57,869 --> 00:06:01,949 promise for future impact on the data science industry. 67 00:06:01,949 --> 00:06:08,229 JuliaDB is a particularly useful application of Julia for data science. 68 00:06:08,229 --> 00:06:13,020 It's a package for working with large persistent data sets. 69 00:06:13,020 --> 00:06:17,870 That's as far as we’ll dig into the many languages that are used to solve data science 70 00:06:17,870 --> 00:06:18,870 problems. 71 00:06:18,870 --> 00:06:24,169 If you have experience with a particular language, I recommend you do a web search to see what 72 00:06:24,169 --> 00:06:28,509 might already be possible in terms of using it for data science. 73 00:06:28,509 --> 00:06:30,740 You might be surprised at the possibilities!