1 00:00:07,500 --> 00:00:11,130 So let’s introduce the architecture of Jupyter. 2 00:00:11,130 --> 00:00:17,410 So on the left-hand side you have, of course, the user, who interacts with a browser, and 3 00:00:17,410 --> 00:00:21,340 the central component in Jupyter is the notebook server. 4 00:00:21,340 --> 00:00:28,539 So the notebook server loads and stores the Jupyter notebook file and sends the HTML 5 00:00:28,539 --> 00:00:30,720 content to the browser. 6 00:00:30,720 --> 00:00:37,300 Once this HTML content is rendered in the browser, it interacts with the notebook server using 7 00:00:37,300 --> 00:00:38,540 WebSockets. 8 00:00:38,540 --> 00:00:43,760 So it’s a split application: the UI runs in the browser using JavaScript. 9 00:00:43,760 --> 00:00:49,520 And the notebook server is responsible for keeping state, and the second thing is, it 10 00:00:49,520 --> 00:00:52,510 communicates with a so-called kernel. 11 00:00:52,510 --> 00:01:01,510 So the kernel is a wrapper around the execution framework, for example the Python 12 00:01:01,510 --> 00:01:02,510 interpreter. 13 00:01:02,510 --> 00:01:09,070 So one problem you might encounter if you are running a heavy load in Jupyter or 14 00:01:09,070 --> 00:01:15,930 JupyterLab is that this is all installed on a single machine, either on your laptop or on a server machine. 15 00:01:15,930 --> 00:01:21,640 Especially, for example, if you are using Amazon SageMaker, you have the Jupyter notebooks, 16 00:01:21,640 --> 00:01:29,290 but it’s all restricted to the machine, the EC2 instance you have chosen before. 17 00:01:29,290 --> 00:01:33,630 And another disadvantage is, of course, that if you are not using the machine you still pay 18 00:01:33,630 --> 00:01:35,170 for it. 19 00:01:35,170 --> 00:01:42,900 And if it’s not sufficient, then you either need to wait a long time, or it’s just impossible 20 00:01:42,900 --> 00:01:45,070 to run your workload.
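To make the browser, notebook server, and kernel split described above a bit more concrete, here is a minimal sketch of the kind of message the JavaScript UI sends over the WebSocket when a cell is run. It assumes the general shape of an `execute_request` in the Jupyter messaging protocol; the helper name `make_execute_request`, the `demo-user` username, the `5.3` version string, and the sample code are assumptions made for illustration and are not taken from the talk.

```python
import json
import uuid
from datetime import datetime, timezone


def make_execute_request(code, session_id, username="demo-user"):
    """Build a dict shaped like a Jupyter `execute_request` message.

    Hypothetical helper for illustration; real front ends build this
    message in JavaScript inside the browser.
    """
    header = {
        "msg_id": uuid.uuid4().hex,          # unique id for this message
        "session": session_id,               # id of the client session
        "username": username,                # assumed placeholder value
        "date": datetime.now(timezone.utc).isoformat(),
        "msg_type": "execute_request",
        "version": "5.3",                    # protocol version, assumed here
    }
    return {
        "header": header,
        "parent_header": {},                 # empty: not a reply to anything
        "metadata": {},
        "content": {
            "code": code,                    # source the kernel should run
            "silent": False,
            "store_history": True,
            "user_expressions": {},
            "allow_stdin": False,
            "stop_on_error": True,
        },
        "channel": "shell",                  # execution requests use the shell channel
    }


message = make_execute_request("print(1 + 1)", session_id=uuid.uuid4().hex)
wire_text = json.dumps(message)              # JSON text as it would cross the WebSocket
```

The notebook server does not run the code itself; it relays a message like this to the kernel, which executes `content["code"]` and sends results back the same way.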
21 00:01:45,070 --> 00:01:51,579 So you see that illustrated here: we have a single machine, and multiple users, for 22 00:01:51,579 --> 00:01:58,149 example, are running multiple kernels, and eventually the main memory is exhausted and 23 00:01:58,149 --> 00:02:00,450 it just crashes. 24 00:02:00,450 --> 00:02:02,460 So BOOM! 25 00:02:02,460 --> 00:02:11,180 And the solution to this is the Jupyter Enterprise Gateway, which IBM created and which IBM also 26 00:02:11,180 --> 00:02:17,900 open sourced, and which is now also part of the official Jupyter open source distribution. 27 00:02:17,900 --> 00:02:25,620 So the idea here is that you have a proxy, and behind the proxy there is a cluster of 28 00:02:25,620 --> 00:02:32,490 machines, and that proxy decides where to put the individual kernels, and the kernels are 29 00:02:32,490 --> 00:02:35,900 then running on those machines. 30 00:02:35,900 --> 00:02:45,310 To get a bit more into detail, here I’m using an architecture diagram of Jupyter Enterprise 31 00:02:45,310 --> 00:02:51,910 Gateway. So you have here the users on your left-hand side; as usual, there are Jupyter 32 00:02:51,910 --> 00:02:54,129 notebook instances running in the browser. 33 00:02:54,129 --> 00:03:02,550 So that’s a JavaScript application, and it communicates with the server using WebSockets; 34 00:03:02,550 --> 00:03:09,110 then you have JupyterLab, and then the Jupyter Enterprise Gateway kicks in. 35 00:03:09,110 --> 00:03:18,220 So instead of JupyterLab talking to a kernel directly, it talks to the Jupyter Enterprise Gateway, 36 00:03:18,220 --> 00:03:19,740 which acts as a proxy. 37 00:03:19,740 --> 00:03:28,880 So it basically acts like a kernel, but in reality it forwards all the calls to a remote 38 00:03:28,880 --> 00:03:35,370 kernel, and the remote kernel is running inside a Kubernetes pod.
39 00:03:35,370 --> 00:03:45,090 So it’s basically a container, and that container can be scheduled by Kubernetes on any cluster 40 00:03:45,090 --> 00:03:46,150 member. 41 00:03:46,150 --> 00:03:53,500 And the cool thing here is that you can now just start as many kernels as you like, and 42 00:03:53,500 --> 00:03:58,530 Kubernetes takes care of distributing them across the cluster. 43 00:03:58,530 --> 00:04:05,069 So of course I don’t want to pitch Watson Studio here, but you can see here that in 44 00:04:05,069 --> 00:04:13,110 the background, while this notebook was being loaded, a kernel was started on a remote 45 00:04:13,110 --> 00:04:16,030 Kubernetes cluster member. 46 00:04:16,030 --> 00:04:21,169 And whatever you execute here now goes through the Jupyter Enterprise Gateway. 47 00:04:21,169 --> 00:04:27,830 I’m just showing Watson Studio here because I don’t have an open source installation available 48 00:04:27,830 --> 00:04:30,820 that uses the Jupyter Enterprise Gateway. 49 00:04:30,820 --> 00:04:38,560 And if you now, for example, go through this project overview, then Watson Studio asks 50 00:04:38,560 --> 00:04:46,699 the Jupyter Enterprise Gateway about the status of the different containers that encapsulate 51 00:04:46,699 --> 00:04:47,699 the kernels. 52 00:04:47,699 --> 00:04:51,330 And you see here, I have one active environment. 53 00:04:51,330 --> 00:04:56,160 This is this one here; you see the hardware configuration, and this is actually running 54 00:04:56,160 --> 00:05:00,479 somewhere in the IBM container cloud on Kubernetes. 55 00:05:00,479 --> 00:05:08,850 That’s pretty cool, and just a final thing I want to show you here is that you can specify 56 00:05:08,850 --> 00:05:14,870 environments and you can attach environments to notebooks. 57 00:05:14,870 --> 00:05:22,210 So you can basically choose how many CPUs and how much RAM a particular notebook runs on.
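One way to picture an environment attached to a notebook is as a CPU and memory setting on the container that runs the kernel. The sketch below builds the `resources` section of a Kubernetes container spec from such a setting. The function name `kernel_container_resources`, the `ENVIRONMENTS` table, and its sizes are hypothetical; how Watson Studio stores or applies its environments is not shown in the talk.

```python
import json


def kernel_container_resources(cpus, memory_gib):
    """Return the `resources` section for a kernel container (illustrative).

    Requests and limits are set to the same values here, which is an
    assumption of this sketch rather than a requirement of Kubernetes.
    """
    cpu = str(cpus)                 # Kubernetes CPU quantity, e.g. "2"
    memory = f"{memory_gib}Gi"      # Kubernetes memory quantity, e.g. "8Gi"
    return {
        "resources": {
            "requests": {"cpu": cpu, "memory": memory},
            "limits": {"cpu": cpu, "memory": memory},
        }
    }


# Hypothetical environments a user could attach to a notebook.
ENVIRONMENTS = {
    "small": {"cpus": 1, "memory_gib": 4},
    "large": {"cpus": 8, "memory_gib": 32},
}

large_spec = kernel_container_resources(**ENVIRONMENTS["large"])
spec_text = json.dumps(large_spec, indent=2)
```

Because the setting travels with the kernel container rather than with a fixed machine, two notebooks in the same project can run with different amounts of CPU and RAM.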
58 00:05:22,210 --> 00:05:29,850 That’s particularly handy, for example, if you have an ETL task for data integration that 59 00:05:29,850 --> 00:05:32,670 uses Spark: you just use a Spark notebook. 60 00:05:32,670 --> 00:05:36,789 And if it’s a heavy task, you can increase the number of executors. 61 00:05:36,789 --> 00:05:45,710 Or you have, for example, a TensorFlow notebook, which then takes this data and does some training. 62 00:05:45,710 --> 00:05:50,060 Then, of course, you can also increase the number of CPUs and the main memory. 63 00:05:50,060 --> 00:05:57,110 That’s basically all, and I hope you have learned something about Jupyter and the Jupyter notebook 64 00:05:57,110 --> 00:06:04,069 architecture, and if you have questions, please post them in the comments section or 65 00:06:04,069 --> 00:06:05,700 use the Coursera discussion forum. 66 00:06:05,700 --> 00:06:05,712 Thanks a lot, bye!