1
00:00:07,500 --> 00:00:11,130
So let’s introduce the architecture of Jupyter.

2
00:00:11,130 --> 00:00:17,410
So on the left-hand side you have, of course,
the user, who interacts with a browser, and

3
00:00:17,410 --> 00:00:21,340
the central component in Jupyter is the notebook
server.

4
00:00:21,340 --> 00:00:28,539
So the notebook server loads and stores
the Jupyter notebook file and sends the HTML

5
00:00:28,539 --> 00:00:30,720
content to the browser.
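The notebook file that the server loads and stores (an .ipynb file) is plain JSON in the nbformat 4 layout. The sketch below builds and round-trips a minimal notebook with only the standard library; the helper names `new_notebook`, `save_notebook`, and `load_notebook` are hypothetical, and a real server would use the `nbformat` package, which also validates the structure.

```python
import json

def new_notebook(sources):
    """Build a minimal nbformat 4.4 notebook (a dict) with one code cell per source string."""
    cells = [
        {
            "cell_type": "code",
            "execution_count": None,  # not executed yet
            "metadata": {},
            "outputs": [],
            "source": src,
        }
        for src in sources
    ]
    return {"cells": cells, "metadata": {}, "nbformat": 4, "nbformat_minor": 4}

def save_notebook(nb, path):
    # On disk, the notebook is just this dict serialized as JSON text.
    with open(path, "w", encoding="utf-8") as f:
        json.dump(nb, f, indent=1)

def load_notebook(path):
    # Loading is the reverse: parse the JSON text back into a dict.
    with open(path, encoding="utf-8") as f:
        return json.load(f)

nb = new_notebook(["print('hello')"])
text = json.dumps(nb, indent=1)  # the same text save_notebook would write
```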

6
00:00:30,720 --> 00:00:37,300
Once this HTML content is rendered in the browser,
it interacts with the notebook server using

7
00:00:37,300 --> 00:00:38,540
WebSockets.

8
00:00:38,540 --> 00:00:43,760
So it’s a split application: the UI runs in
the browser using JavaScript.
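What travels over that WebSocket are Jupyter protocol messages encoded as JSON. As a sketch, the function below builds an `execute_request` with the message parts defined by the Jupyter messaging protocol (`header`, `parent_header`, `metadata`, `content`). The function name is hypothetical, the protocol version string is an assumption, and the `channel` key reflects how the notebook frontend tags WebSocket frames so the server knows which kernel channel to forward them to.

```python
import json
import uuid
from datetime import datetime, timezone

def execute_request(code, session_id, username="user"):
    """Build an execute_request message shaped per the Jupyter messaging protocol."""
    return {
        "header": {
            "msg_id": uuid.uuid4().hex,
            "session": session_id,
            "username": username,
            "date": datetime.now(timezone.utc).isoformat(),
            "msg_type": "execute_request",
            "version": "5.3",  # assumed protocol version
        },
        "parent_header": {},  # empty: this request starts a new exchange
        "metadata": {},
        "content": {
            "code": code,
            "silent": False,
            "store_history": True,
            "user_expressions": {},
            "allow_stdin": False,
            "stop_on_error": True,
        },
        "channel": "shell",  # which kernel channel the server should forward this to
    }

session = uuid.uuid4().hex
frame = json.dumps(execute_request("1 + 1", session))  # the text frame sent over the WebSocket
```

The replies (`execute_reply`, plus output messages such as `stream`) come back over the same WebSocket with this message's header copied into their `parent_header`, which is how the browser matches outputs to the cell that produced them.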

9
00:00:43,760 --> 00:00:49,520
And the notebook server is responsible for
keeping state, and, second, it

10
00:00:49,520 --> 00:00:52,510
communicates with a so-called kernel.

11
00:00:52,510 --> 00:01:01,510
So the kernel is a wrapper around the
execution framework, for example the Python

12
00:01:01,510 --> 00:01:02,510
interpreter.
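To make the "wrapper" idea concrete, here is a deliberately tiny, hypothetical stand-in for a kernel. It is not the ipykernel API: it just wraps Python's built-in `exec` behind an `execute()` method, keeps one namespace alive between calls (the way a kernel keeps variables alive between cells), and returns a reply dict loosely modeled on the protocol's `execute_reply` fields.

```python
import contextlib
import io

class ToyKernel:
    """Hypothetical minimal 'kernel': wraps the Python interpreter behind execute()."""

    def __init__(self):
        self.namespace = {}  # survives between calls, like variables between cells
        self.execution_count = 0

    def execute(self, code):
        self.execution_count += 1
        captured = io.StringIO()
        try:
            # Run the code in the persistent namespace and capture what it prints.
            with contextlib.redirect_stdout(captured):
                exec(code, self.namespace)
        except Exception as exc:
            return {"status": "error", "execution_count": self.execution_count,
                    "ename": type(exc).__name__, "evalue": str(exc)}
        return {"status": "ok", "execution_count": self.execution_count,
                "stdout": captured.getvalue()}

kernel = ToyKernel()
kernel.execute("x = 21")
reply = kernel.execute("print(x * 2)")  # uses x from the previous call
failure = kernel.execute("1 / 0")
```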

13
00:01:02,510 --> 00:01:09,070
One problem you might encounter if you
are running heavy workloads in Jupyter or

14
00:01:09,070 --> 00:01:15,930
JupyterLab is that everything is installed on a single machine,
either on your laptop or on a server machine.

15
00:01:15,930 --> 00:01:21,640
For example, if you are using Amazon
SageMaker, you have Jupyter notebooks.

16
00:01:21,640 --> 00:01:29,290
But it’s all restricted to the machine, the
EC2 instance you chose beforehand.

17
00:01:29,290 --> 00:01:33,630
Another disadvantage, of course, is that if
you are not using the machine, you still pay

18
00:01:33,630 --> 00:01:35,170
for it.

19
00:01:35,170 --> 00:01:42,900
And if it’s not sufficient, then you either
need to wait a long time, or it’s just impossible

20
00:01:42,900 --> 00:01:45,070
to run your workload.

21
00:01:45,070 --> 00:01:51,579
You can see that illustrated here: we have
a single machine, and multiple users, for

22
00:01:51,579 --> 00:01:58,149
example, are running multiple kernels, and
eventually the main memory is exhausted and

23
00:01:58,149 --> 00:02:00,450
it just crashes.

24
00:02:00,450 --> 00:02:02,460
So BOOM!

25
00:02:02,460 --> 00:02:11,180
The solution to this is Jupyter Enterprise
Gateway, which IBM created and also

26
00:02:11,180 --> 00:02:17,900
open-sourced, and which is now part of the
official Jupyter open-source distribution.

27
00:02:17,900 --> 00:02:25,620
So the idea here is that you have a proxy,
and behind the proxy there is a cluster of

28
00:02:25,620 --> 00:02:32,490
machines. The proxy decides where to put
the individual kernels, and the kernels are

29
00:02:32,490 --> 00:02:35,900
then running on those machines.
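The video doesn't say which placement policy the proxy uses (later on, placement is delegated to Kubernetes). Purely as an illustration, the sketch below assumes a simple least-loaded rule, where each new kernel goes to the node with the most free memory, and contrasts one machine with a small cluster. `Node`, `place_kernel`, the node names, and the memory figures are all hypothetical.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    name: str
    memory_gb: float
    kernels: list = field(default_factory=list)  # (kernel_id, memory_gb) pairs

    def free_gb(self):
        return self.memory_gb - sum(mem for _, mem in self.kernels)

def place_kernel(nodes, kernel_id, memory_gb):
    """Assumed least-loaded policy: pick the node with the most free memory.

    Returns the chosen node's name, or None if even that node lacks room.
    """
    best = max(nodes, key=lambda n: n.free_gb())  # ties go to the first node listed
    if best.free_gb() < memory_gb:
        return None  # a machine without this check would simply run out of memory here
    best.kernels.append((kernel_id, memory_gb))
    return best.name

demands = [6, 6, 6, 6]  # four kernels needing 6 GB each
single = [Node("laptop", 16)]
cluster = [Node(f"node-{i}", 16) for i in range(3)]
on_single = [place_kernel(single, f"k{i}", m) for i, m in enumerate(demands)]
on_cluster = [place_kernel(cluster, f"k{i}", m) for i, m in enumerate(demands)]
# on_single  == ["laptop", "laptop", None, None]
# on_cluster == ["node-0", "node-1", "node-2", "node-0"]
```

On one 16 GB machine only two of the four kernels fit; spread across three nodes, all four are placed.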

30
00:02:35,900 --> 00:02:45,310
To go into a bit more detail, here I’m
using an architecture diagram of Jupyter Enterprise

31
00:02:45,310 --> 00:02:51,910
Gateway. So here you have the users on your
left-hand side; as usual, there are Jupyter

32
00:02:51,910 --> 00:02:54,129
notebook instances running in the browser.

33
00:02:54,129 --> 00:03:02,550
So that’s a JavaScript application, and it
communicates with the server using WebSockets;

34
00:03:02,550 --> 00:03:09,110
then you have JupyterLab, and then Jupyter
Enterprise Gateway kicks in.

35
00:03:09,110 --> 00:03:18,220
So instead of JupyterLab talking to a kernel directly,
it talks to Jupyter Enterprise Gateway,

36
00:03:18,220 --> 00:03:19,740
which acts as a proxy.

37
00:03:19,740 --> 00:03:28,880
So it basically acts like a kernel, but in
reality it forwards all calls to a remote

38
00:03:28,880 --> 00:03:35,370
kernel, and the remote kernel runs inside
a Kubernetes pod.
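"Acts like a kernel but forwards all calls" is the classic proxy pattern. In this hypothetical sketch, `RemoteKernel` stands in for the kernel running inside a pod (here it simply runs locally) and `KernelProxy` exposes the same `execute()` method while delegating every call to it. The real gateway forwards Jupyter protocol messages over the network rather than making in-process Python calls.

```python
class RemoteKernel:
    """Hypothetical stand-in for a kernel running in a Kubernetes pod (runs locally here)."""

    def __init__(self, pod_name):
        self.pod_name = pod_name
        self.namespace = {}

    def execute(self, code):
        exec(code, self.namespace)
        return {"status": "ok", "pod": self.pod_name}


class KernelProxy:
    """Looks like a kernel to its caller, but forwards every call to a remote kernel."""

    def __init__(self, launch_remote):
        # launch_remote is a hypothetical callable that starts a kernel somewhere
        # in the cluster and returns a handle to it.
        self._remote = launch_remote()

    def execute(self, code):
        return self._remote.execute(code)  # pure forwarding: nothing runs in the proxy


proxy = KernelProxy(lambda: RemoteKernel("kernel-pod-1"))
reply = proxy.execute("y = 6 * 7")
```

Because the proxy has the same interface as a kernel, the client (JupyterLab in the diagram) does not need to know where the kernel actually runs.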

39
00:03:35,370 --> 00:03:45,090
So it’s basically a container, and Kubernetes
can schedule that container on any cluster

40
00:03:45,090 --> 00:03:46,150
member.

41
00:03:46,150 --> 00:03:53,500
And the cool thing here is that you can now
just start as many kernels as you like, and

42
00:03:53,500 --> 00:03:58,530
Kubernetes takes care of distributing them
on the cluster.

43
00:03:58,530 --> 00:04:05,069
Of course, I don’t want to pitch Watson
Studio here, but you can see that in

44
00:04:05,069 --> 00:04:13,110
the background, while this notebook was being
loaded, a kernel was started on a remote

45
00:04:13,110 --> 00:04:16,030
Kubernetes cluster member.

46
00:04:16,030 --> 00:04:21,169
And whatever you execute here now goes through
Jupyter Enterprise Gateway.

47
00:04:21,169 --> 00:04:27,830
I’m just showing Watson Studio here because
I don’t have an open source installation available

48
00:04:27,830 --> 00:04:30,820
that uses Jupyter Enterprise Gateway.

49
00:04:30,820 --> 00:04:38,560
And if you now, for example, go to this
project overview, Watson Studio asks

50
00:04:38,560 --> 00:04:46,699
Jupyter Enterprise Gateway about the status
of the different containers that encapsulate

51
00:04:46,699 --> 00:04:47,699
the kernels.
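Jupyter Enterprise Gateway serves the standard Jupyter kernels REST API, so a client that wants kernel status can issue `GET /api/kernels`, which returns a JSON list of kernel models with fields such as `id`, `name`, and `execution_state`. Which calls Watson Studio actually makes isn't shown in the video. The sketch below only builds the request without sending it; the gateway host name and the helper names are placeholders.

```python
import json
import urllib.request

def list_kernels_request(gateway_url):
    """Build (but do not send) a GET request for the gateway's running kernels."""
    return urllib.request.Request(gateway_url.rstrip("/") + "/api/kernels", method="GET")

def kernel_states(body):
    """Map kernel id -> execution_state from a /api/kernels JSON response body."""
    return {k["id"]: k["execution_state"] for k in json.loads(body)}

req = list_kernels_request("http://gateway.example.com:8888/")  # placeholder host
# To query a live gateway:
#   with urllib.request.urlopen(req) as resp:
#       print(kernel_states(resp.read()))
```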

52
00:04:47,699 --> 00:04:51,330
And you can see here that I have one active environment.

53
00:04:51,330 --> 00:04:56,160
It’s this one here; you can see the hardware
configuration, and it is actually running

54
00:04:56,160 --> 00:05:00,479
somewhere in the IBM container cloud on Kubernetes.

55
00:05:00,479 --> 00:05:08,850
That’s pretty cool. Just one final thing
I want to show you: you can specify

56
00:05:08,850 --> 00:05:14,870
environments, and you can attach environments
to notebooks.

57
00:05:14,870 --> 00:05:22,210
So you can basically choose how many CPUs
and how much RAM a particular notebook runs on.
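In Kubernetes terms, choosing CPUs and RAM for a notebook comes down to the `resources` requests and limits on the kernel's container. How Watson Studio maps its environments onto pods isn't shown here; this hypothetical helper just renders an environment size into the standard Kubernetes `resources` stanza, written as a Python dict.

```python
def kernel_resources(cpus, memory_gib):
    """Hypothetical helper: build a Kubernetes container 'resources' stanza.

    requests: what the scheduler reserves on a node when placing the pod.
    limits:   the ceiling the container may use; set equal to requests here.
    """
    quantity = {"cpu": str(cpus), "memory": f"{memory_gib}Gi"}
    return {"requests": dict(quantity), "limits": dict(quantity)}

small = kernel_resources(1, 4)   # e.g. a light exploration notebook
large = kernel_resources(8, 32)  # e.g. a heavy training notebook
```

Setting limits equal to requests is one possible choice; it gives the pod Kubernetes' "Guaranteed" quality-of-service class, at the cost of not letting the kernel burst above its reservation.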

58
00:05:22,210 --> 00:05:29,850
That’s particularly handy, for example, if
you have an ETL task for data integration that

59
00:05:29,850 --> 00:05:32,670
uses Spark: you just use a Spark notebook.

60
00:05:32,670 --> 00:05:36,789
And if it’s a heavy task, you can increase
the number of executors.
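For a Spark notebook, "increasing the number of executors" corresponds to standard Spark configuration properties such as `spark.executor.instances`, `spark.executor.cores`, and `spark.executor.memory`. The sketch only assembles these properties as a dict; the helper name and the default values are assumptions. With PySpark installed, each pair could be passed to `SparkSession.builder.config(key, value)`.

```python
def spark_executor_conf(executors, cores_per_executor=2, memory_per_executor="4g"):
    """Hypothetical helper: Spark properties that size the executor pool."""
    return {
        "spark.executor.instances": str(executors),
        "spark.executor.cores": str(cores_per_executor),
        "spark.executor.memory": memory_per_executor,
    }

light = spark_executor_conf(2)
heavy = spark_executor_conf(10)  # same task, more executors
total_cores = int(heavy["spark.executor.instances"]) * int(heavy["spark.executor.cores"])

# With PySpark available (not required for this sketch):
#   from pyspark.sql import SparkSession
#   builder = SparkSession.builder
#   for key, value in heavy.items():
#       builder = builder.config(key, value)
#   spark = builder.getOrCreate()
```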

61
00:05:36,789 --> 00:05:45,710
Or you have, for example, a TensorFlow notebook
that then takes this data and does some training.

62
00:05:45,710 --> 00:05:50,060
Then, of course, you can also increase the number
of CPUs and the amount of main memory.

63
00:05:50,060 --> 00:05:57,110
That’s basically all, and I hope you have
learned something about Jupyter and the Jupyter notebook

64
00:05:57,110 --> 00:06:04,069
architecture. If you have questions, please
post them in the comments section or

65
00:06:04,069 --> 00:06:05,700
use the Coursera discussion forum.

66
00:06:05,700 --> 00:06:05,712
Thanks a lot, bye!