1 00:00:00,000 --> 00:00:05,585 Hello, in this video you will learn how to access databases using Python. 2 00:00:05,585 --> 00:00:09,110 Databases are powerful tools for data scientists. 3 00:00:09,110 --> 00:00:11,030 After completing this module, 4 00:00:11,030 --> 00:00:13,310 you'll be able to explain the basic concepts 5 00:00:13,310 --> 00:00:16,385 related to using Python to connect to databases. 6 00:00:16,385 --> 00:00:18,215 Then you'll create tables, 7 00:00:18,215 --> 00:00:22,235 load data and query data using SQL from Jupyter Notebooks, 8 00:00:22,235 --> 00:00:24,620 and finally, analyze the data. 9 00:00:24,620 --> 00:00:26,210 In the lab assignments, 10 00:00:26,210 --> 00:00:28,690 you will learn how to create an instance in the Cloud, 11 00:00:28,690 --> 00:00:30,215 connect to a database, 12 00:00:30,215 --> 00:00:36,070 query data from the database using SQL, and analyze the data using Python. 13 00:00:36,070 --> 00:00:39,400 You will be able to explain the basic concepts related 14 00:00:39,400 --> 00:00:42,570 to connecting a Python application to a database. 15 00:00:42,570 --> 00:00:46,070 Describe SQL APIs as well as list some of 16 00:00:46,070 --> 00:00:52,030 the proprietary APIs used by popular SQL-based DBMS systems. 17 00:00:52,030 --> 00:00:55,715 Let's quickly review some of the benefits of using Python, 18 00:00:55,715 --> 00:00:59,445 a popular scripting language for connecting to databases. 19 00:00:59,445 --> 00:01:05,470 The Python ecosystem is very rich and provides easy to use tools for data science. 20 00:01:05,470 --> 00:01:08,385 Some of the most popular packages are NumPy, 21 00:01:08,385 --> 00:01:12,040 pandas, matplotlib, and SciPy. 22 00:01:12,040 --> 00:01:15,695 Python is easy to learn and has a simple syntax. 23 00:01:15,695 --> 00:01:17,710 Due to its open source nature, 24 00:01:17,710 --> 00:01:20,615 Python has been ported to many platforms. 25 00:01:20,615 --> 00:01:23,580 All your python programs can work on any of 26 00:01:23,580 --> 00:01:27,715 these platforms without requiring any changes at all. 27 00:01:27,715 --> 00:01:31,780 If you are careful and avoid any system dependent features, 28 00:01:31,780 --> 00:01:35,390 Python supports relational database systems. 29 00:01:35,390 --> 00:01:39,090 Writing Python code to access databases is made 30 00:01:39,090 --> 00:01:42,570 easier by the presence of the Python database API. 31 00:01:42,570 --> 00:01:46,225 Commonly referred to as the DB API, 32 00:01:46,225 --> 00:01:50,820 and detailed documentation related to Python is easily available. 33 00:01:50,820 --> 00:01:56,100 Notebooks are also very popular in the field of data science because they run 34 00:01:56,100 --> 00:02:01,025 in an environment that allows creation and sharing of documents that contain live code, 35 00:02:01,025 --> 00:02:05,810 equations, visualizations, and explanatory texts. 36 00:02:05,810 --> 00:02:10,835 A notebook interface is a virtual notebook environment used for programming. 37 00:02:10,835 --> 00:02:15,065 Examples of notebook interfaces include the Mathematica notebook, 38 00:02:15,065 --> 00:02:17,920 Maple worksheet, Matlab notebook, 39 00:02:17,920 --> 00:02:20,900 IPython Jupyter, R Markdown, 40 00:02:20,900 --> 00:02:26,435 Apache Zeppelin, Apache Spark notebook, and the Databricks cloud. 41 00:02:26,435 --> 00:02:30,335 In this module, we will be using Jupyter notebooks. 42 00:02:30,335 --> 00:02:33,510 The Jupyter notebook is an open source web application that 43 00:02:33,510 --> 00:02:36,815 allows you to create and share documents that contain live code, 44 00:02:36,815 --> 00:02:41,005 equations, visualizations, and narrative texts. 45 00:02:41,005 --> 00:02:44,570 Here are some of the advantages of using Jupyter notebooks. 46 00:02:44,570 --> 00:02:49,035 Notebook support for over 40 programming languages including Python, 47 00:02:49,035 --> 00:02:51,415 R, Julia, and Scala. 48 00:02:51,415 --> 00:02:54,170 Notebooks can be shared with others by email, 49 00:02:54,170 --> 00:02:57,865 Dropbox, GitHub, and the Jupyter notebook viewer. 50 00:02:57,865 --> 00:03:01,925 Your code can produce rich interactive output HTML, 51 00:03:01,925 --> 00:03:06,465 images, videos, LaTex, and customized types. 52 00:03:06,465 --> 00:03:11,745 You can leverage big data tools such as Apache Spark from Python, R, 53 00:03:11,745 --> 00:03:15,230 and Scala, and explore that same data with pandas, 54 00:03:15,230 --> 00:03:19,310 scikit-learn, ggplot2, and TensorFlow. 55 00:03:19,310 --> 00:03:21,760 This is how a typical user accesses 56 00:03:21,760 --> 00:03:25,340 databases using Python code written on a Jupyter notebook, 57 00:03:25,340 --> 00:03:27,045 a web based editor. 58 00:03:27,045 --> 00:03:32,230 There is a mechanism by which the Python program communicates with the DBMS. 59 00:03:32,230 --> 00:03:36,360 The Python code connects to the database using API calls. 60 00:03:36,360 --> 00:03:41,705 We will explain the basics of SQL APIs and Python DB APIs. 61 00:03:41,705 --> 00:03:44,740 An application programming interface is a set of 62 00:03:44,740 --> 00:03:48,355 functions that you can call to get access to some type of service. 63 00:03:48,355 --> 00:03:54,325 The SQL API consists of library function calls as an application programming interface, 64 00:03:54,325 --> 00:03:57,105 API, for the DBMS. 65 00:03:57,105 --> 00:04:00,290 To pass SQL statements to the DBMS, 66 00:04:00,290 --> 00:04:04,595 an application program calls functions in the API, 67 00:04:04,595 --> 00:04:06,935 and it calls other functions to retrieve 68 00:04:06,935 --> 00:04:11,100 query results and status information from the DBMS. 69 00:04:11,100 --> 00:04:16,690 The basic operation of a typical SQL API is illustrated in the figure. 70 00:04:16,690 --> 00:04:20,620 The application program begins its database access with one or 71 00:04:20,620 --> 00:04:24,730 more API calls that connect the program to the DBMS. 72 00:04:24,730 --> 00:04:28,000 To send the SQL statement to the DBMS, 73 00:04:28,000 --> 00:04:31,770 the program builds the statement as a text string in a buffer and then 74 00:04:31,770 --> 00:04:36,420 makes an API call to pass the buffer contents to the DBMS. 75 00:04:36,420 --> 00:04:39,920 The application program makes API calls to check 76 00:04:39,920 --> 00:04:44,275 the status of its DBMS request and to handle errors. 77 00:04:44,275 --> 00:04:47,850 The application program ends its database access 78 00:04:47,850 --> 00:04:51,825 with an API call that disconnects it from the database. 79 00:04:51,825 --> 00:04:54,510 Now, lets learn basic concepts about some of 80 00:04:54,510 --> 00:04:59,675 the proprietary APIs used by popular SQL-based DBMS systems. 81 00:04:59,675 --> 00:05:02,705 Each database system has its own library. 82 00:05:02,705 --> 00:05:09,640 As you can see, the table shows a list of a few applications and corresponding SQL APIs. 83 00:05:09,640 --> 00:05:14,390 MySQL C API provides low level access to 84 00:05:14,390 --> 00:05:21,100 the MySQL client server protocol and enables C programs to access database contents. 85 00:05:21,100 --> 00:05:28,135 The psycopg2 API connects Python applications in PostgreSQL databases. 86 00:05:28,135 --> 00:05:36,085 The IBM_DB API is used to connect Python applications to IBM DB2 databases. 87 00:05:36,085 --> 00:05:40,840 The dblib API is used to connect to SQL server databases. 88 00:05:40,840 --> 00:05:46,460 ODBC is used for database access for Microsoft Windows OS. 89 00:05:46,460 --> 00:05:49,820 OCI is used by Oracle databases. 90 00:05:49,820 --> 00:05:54,525 And finally, JDBC is used by Java applications. 91 00:05:54,525 --> 00:05:57,170 Thanks for watching this video.