1
00:00:00,540 --> 00:00:08,130
Hello and welcome to this lecture. In this lecture, we discuss the cluster upgrade process in Kubernetes.

2
00:00:09,360 --> 00:00:15,630
In the previous lecture, we saw how Kubernetes manages its software releases and how different components

3
00:00:15,630 --> 00:00:16,590
have their versions.

4
00:00:17,550 --> 00:00:24,690
We will set dependencies on external components like etcd and CoreDNS aside for now and focus on

5
00:00:24,690 --> 00:00:26,880
the core control plane components.

6
00:00:28,290 --> 00:00:31,960
Is it mandatory for all of these to have the same version?

7
00:00:32,970 --> 00:00:39,960
No, the components can be at different release versions. The kube-apiserver is the primary component

8
00:00:39,960 --> 00:00:44,430
in the control plane, and that is the component that all other components talk to.

9
00:00:44,790 --> 00:00:50,700
None of the other components should ever be at a version higher than the kube-apiserver.

10
00:00:51,960 --> 00:00:59,670
The controller manager and scheduler can be at one version lower. So if the kube-apiserver was at X, the controller

11
00:00:59,670 --> 00:01:07,530
manager and kube-scheduler can be at X minus one, and the kubelet and kube-proxy components can be at

12
00:01:07,530 --> 00:01:09,870
two versions lower, at X minus two.

13
00:01:10,750 --> 00:01:17,760
So if the kube-apiserver was at 1.10, then the controller manager and scheduler could be at

14
00:01:17,780 --> 00:01:23,510
1.10 or 1.9, and the kubelet and kube-proxy could be at 1.8.

15
00:01:24,780 --> 00:01:29,990
None of them could be at a version higher than the kube-apiserver, like 1.11.

16
00:01:30,960 --> 00:01:33,390
Now, this is not the case with kubectl.

17
00:01:33,960 --> 00:01:41,730
The kubectl utility could be at 1.11, a version higher than the API server at 1.10, at the same

18
00:01:41,730 --> 00:01:47,760
version as the API server, or at 1.9, a version lower than the API server.

19
00:01:48,360 --> 00:01:54,330
Now, this permissible skew in versions allows us to carry out live upgrades.

20
00:01:54,930 --> 00:01:58,080
We can upgrade component by component if required.

21
00:01:58,530 --> 00:02:00,090
So when should you upgrade?

22
00:02:01,680 --> 00:02:10,640
Say you were at 1.10, and Kubernetes releases versions 1.11 and 1.12. At any time, Kubernetes

23
00:02:10,660 --> 00:02:15,120
supports only up to the most recent three minor versions.

24
00:02:15,610 --> 00:02:22,110
So with 1.12 being the latest release, Kubernetes supports versions 1.12, 1.11

25
00:02:22,290 --> 00:02:23,340
and 1.10.

26
00:02:23,790 --> 00:02:34,380
So when 1.13 is released, only versions 1.13, 1.12 and 1.11 are supported. Just before the release

27
00:02:34,380 --> 00:02:39,480
of 1.13 would be a good time to upgrade your cluster to the next release.

28
00:02:40,260 --> 00:02:41,520
So how do we upgrade?

29
00:02:41,880 --> 00:02:45,020
Do we upgrade directly from 1.10 to 1.13?

30
00:02:45,540 --> 00:02:53,820
No. The recommended approach is to upgrade one minor version at a time: 1.10 to 1.11, then

31
00:02:53,820 --> 00:02:57,930
1.11 to 1.12, and then 1.12 to 1.13.

32
00:02:58,890 --> 00:03:02,400
The upgrade process depends on how your cluster is set up.

33
00:03:02,760 --> 00:03:08,670
For example, if your cluster is a managed Kubernetes cluster deployed on cloud service providers

34
00:03:08,670 --> 00:03:14,700
like Google. For instance, Google Kubernetes Engine lets you upgrade your cluster easily with just

35
00:03:14,700 --> 00:03:15,420
a few clicks.

36
00:03:16,170 --> 00:03:21,930
If you deployed your cluster using tools like kubeadm, then the tool can help you plan and upgrade the

37
00:03:21,930 --> 00:03:22,470
cluster.

38
00:03:23,100 --> 00:03:28,680
If you deployed your cluster from scratch, then you manually upgrade the different components of the

39
00:03:28,680 --> 00:03:29,850
cluster yourself.

40
00:03:30,240 --> 00:03:34,200
In this lecture, we will look at upgrading a cluster deployed with kubeadm.

41
00:03:35,070 --> 00:03:41,430
So you have a cluster with master and worker nodes running in production, hosting pods, serving users.

42
00:03:42,000 --> 00:03:44,790
The nodes and components are at version 1.10.

43
00:03:45,360 --> 00:03:48,570
Upgrading a cluster involves two major steps.

44
00:03:48,930 --> 00:03:55,950
First, you upgrade your master nodes and then upgrade the worker nodes. While the master is being upgraded,

45
00:03:56,280 --> 00:03:59,070
the control plane components, such as the API server,

46
00:04:00,310 --> 00:04:07,840
scheduler and controller manager, go down briefly. The master going down does not mean worker nodes

47
00:04:07,840 --> 00:04:10,060
and applications on the cluster are impacted.

48
00:04:10,630 --> 00:04:15,160
All workloads hosted on the worker nodes continue to serve users as normal.

49
00:04:15,580 --> 00:04:19,390
Since the master is down, all management functions are down.

50
00:04:19,690 --> 00:04:23,770
You cannot access the cluster using kubectl or other Kubernetes API clients.

51
00:04:24,190 --> 00:04:28,090
You cannot deploy new applications or delete or modify existing ones.

52
00:04:28,390 --> 00:04:30,880
The controller managers don't function either.

53
00:04:31,220 --> 00:04:35,290
If a pod were to fail, a new pod won't be automatically created.

54
00:04:35,650 --> 00:04:42,880
But as long as the nodes and the pods are up, your applications should be up and users will not be

55
00:04:42,880 --> 00:04:43,510
impacted.

56
00:04:44,020 --> 00:04:49,210
Once the upgrade is complete and the cluster is back up, it should function normally.

57
00:04:49,630 --> 00:04:55,870
We now have the master and the master components at version 1.11 and the worker nodes at version

58
00:04:55,870 --> 00:04:56,580
1.10.

59
00:04:57,310 --> 00:05:00,820
As we saw earlier, this is a supported configuration.

60
00:05:01,040 --> 00:05:04,080
It is now time to upgrade the worker nodes.

61
00:05:04,600 --> 00:05:07,810
There are different strategies available to upgrade the worker nodes.

62
00:05:08,290 --> 00:05:15,580
One is to upgrade all of them at once, but then your pods are down and users are no longer able to

63
00:05:15,580 --> 00:05:17,080
access the applications.

64
00:05:18,130 --> 00:05:24,550
Once the upgrade is complete, the nodes are back up, new pods are scheduled, and users can resume access.

65
00:05:25,180 --> 00:05:27,580
That's one strategy that requires downtime.

66
00:05:28,360 --> 00:05:31,780
The second strategy is to upgrade one node at a time.

67
00:05:32,170 --> 00:05:39,580
So going back to the state where we have our master upgraded and the nodes waiting to be upgraded, we first

68
00:05:39,580 --> 00:05:45,700
upgrade the first node, where the workloads move to the second and third nodes, and users are served from

69
00:05:45,700 --> 00:05:46,060
there.

70
00:05:46,750 --> 00:05:52,690
Once the first node is upgraded and back up, we then update the second node, where the workloads move

71
00:05:52,690 --> 00:05:54,340
to the first and third nodes.

72
00:05:56,170 --> 00:06:01,750
And finally, the third node, where the workloads are shared between the first two nodes, until we have all

73
00:06:01,750 --> 00:06:07,630
nodes upgraded to the newer version. We then follow the same procedure to upgrade the nodes from

74
00:06:07,630 --> 00:06:10,990
1.11 to 1.12, and then 1.12 to 1.13.

75
00:06:11,770 --> 00:06:19,300
A third strategy would be to add new nodes to the cluster, nodes with the newer software version.

76
00:06:21,060 --> 00:06:27,090
This is especially convenient if you're on a cloud environment where you can easily provision new nodes

77
00:06:27,090 --> 00:06:33,810
and decommission old ones. Nodes with the newer software version can be added to the cluster. We move

78
00:06:33,810 --> 00:06:35,250
the workload over to the new node,

79
00:06:37,550 --> 00:06:38,840
and remove the old node,

80
00:06:40,300 --> 00:06:44,800
until you finally have all new nodes with the new software version.

81
00:06:46,800 --> 00:06:54,330
Let us now see how it is done. Say we were to upgrade this cluster from 1.11 to 1.13. kubeadm

82
00:06:54,330 --> 00:07:01,770
has an upgrade command that helps in upgrading clusters. With kubeadm, run the kubeadm upgrade plan command,

83
00:07:01,800 --> 00:07:04,470
and it will give you a lot of good information.

84
00:07:04,680 --> 00:07:10,650
The current cluster version, the kubeadm tool version, the latest stable version of Kubernetes.

85
00:07:11,400 --> 00:07:17,730
Then it lists all the control plane components and their versions and what version these can be upgraded

86
00:07:17,730 --> 00:07:18,090
to.

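As a sketch, the plan step described above is a single command (the exact output varies with the kubeadm and cluster versions):

```shell
# Ask kubeadm what this cluster can be upgraded to.
# The output lists the current cluster version, the kubeadm tool version,
# the latest stable release, and per-component upgrade targets.
kubeadm upgrade plan
```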
87
00:07:18,570 --> 00:07:24,180
It also tells you that after we upgrade the control plane components, you must manually upgrade the

88
00:07:24,180 --> 00:07:25,860
kubelet versions on each node.

89
00:07:26,550 --> 00:07:30,570
Remember, kubeadm does not install or upgrade kubelets.

90
00:07:31,950 --> 00:07:35,130
Finally, it gives you the command to upgrade the cluster.

91
00:07:35,490 --> 00:07:40,620
Also note that you must upgrade the kubeadm tool itself before you can upgrade the cluster.

92
00:07:41,290 --> 00:07:45,990
The kubeadm tool also follows the same software versioning as Kubernetes.

93
00:07:46,690 --> 00:07:50,460
So we are at 1.11 and we want to go to 1.13.

94
00:07:50,880 --> 00:07:54,720
But remember, we can only go one minor version at a time.

95
00:07:54,870 --> 00:08:02,460
So we first go to 1.12. First, upgrade the kubeadm tool itself to version 1.12, then upgrade

96
00:08:02,460 --> 00:08:10,380
the cluster using the command from the upgrade plan output: kubeadm upgrade apply. It pulls the

97
00:08:10,380 --> 00:08:16,620
necessary images and upgrades the cluster components. Once complete, your control plane components are

98
00:08:16,620 --> 00:08:18,180
now at 1.12.

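On a Debian-based master node, the steps just described might look like the following sketch (the `1.12.0-00` package revision is an assumption; check the versions available in your package repository):

```shell
# First, upgrade the kubeadm tool itself to the target minor version
apt-get update
apt-get install -y kubeadm=1.12.0-00

# Then upgrade the control plane; kubeadm pulls the necessary images
# and upgrades the control plane components to v1.12.0
kubeadm upgrade apply v1.12.0
```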
99
00:08:18,930 --> 00:08:24,510
If you run the kubectl get nodes command, you will still see the master node at 1.11.

100
00:08:24,930 --> 00:08:31,710
This is because the output of this command shows the versions of the kubelets on each of these

101
00:08:31,710 --> 00:08:36,720
nodes registered with the API server and not the version of the API server itself.

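For instance, right after the control plane upgrade the output might look like this (node names and exact versions are illustrative; the VERSION column reflects each node's kubelet, not the API server):

```shell
kubectl get nodes
# NAME      STATUS   ROLES    AGE   VERSION
# master    Ready    master   5d    v1.11.3
# node-1    Ready    <none>   5d    v1.11.3
# node-2    Ready    <none>   5d    v1.11.3
```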
102
00:08:37,420 --> 00:08:39,480
So the next step is to upgrade the kubelets.

103
00:08:39,480 --> 00:08:45,440
Remember, depending on your setup, you may or may not have a kubelet running on your master node.

104
00:08:45,660 --> 00:08:51,840
In this case, the cluster deployed with kubeadm has kubelets on the master node, which are used to run

105
00:08:51,840 --> 00:08:55,320
the control plane components as pods on the master node.

106
00:08:56,100 --> 00:09:02,430
When we set up a Kubernetes cluster from scratch later during this course, we do not install kubelet on

107
00:09:02,430 --> 00:09:03,180
the master node.

108
00:09:03,390 --> 00:09:07,170
You will not see the master node in the output of this command in that case.

109
00:09:08,250 --> 00:09:11,250
So the next step is to upgrade the kubelet on the master node.

110
00:09:11,550 --> 00:09:16,980
If you have kubelets on them, run the apt-get upgrade kubelet command for this.

111
00:09:17,610 --> 00:09:20,400
Once the package is upgraded, restart the kubelet

112
00:09:20,400 --> 00:09:21,060
service.

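On the master node, the kubelet upgrade described above can be sketched as (again assuming a Debian-based system and the `1.12.0-00` package revision):

```shell
# Upgrade the kubelet package on the master node
apt-get install -y kubelet=1.12.0-00

# Restart the service so the new kubelet version takes over
systemctl restart kubelet
```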
113
00:09:22,970 --> 00:09:28,540
Running the kubectl get nodes command now shows that the master has been upgraded to 1.12.

114
00:09:29,120 --> 00:09:32,240
The worker nodes are still at 1.11.

115
00:09:33,230 --> 00:09:39,980
So next, the worker nodes. Let us start one at a time. We need to first move the workloads from the

116
00:09:39,980 --> 00:09:42,230
first worker node to the other nodes.

117
00:09:42,620 --> 00:09:48,800
The kubectl drain command lets you safely terminate all the pods on a node and reschedules them on

118
00:09:48,800 --> 00:09:49,520
the other nodes.

119
00:09:50,390 --> 00:09:54,200
It also cordons the node and marks it unschedulable.

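Draining the first worker node is a single command (the node name `node-1` is an assumption; `--ignore-daemonsets` is typically needed because DaemonSet pods cannot be evicted):

```shell
# Evict all pods from node-1 and cordon it so no new pods land there
kubectl drain node-1 --ignore-daemonsets
```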
120
00:09:54,230 --> 00:10:01,100
That way, no new pods are scheduled on it. Then upgrade the kubeadm and kubelet packages on the worker

121
00:10:01,100 --> 00:10:02,900
nodes as we did on the master node.

122
00:10:03,440 --> 00:10:10,520
Then, using the kubeadm upgrade node config command, update the node configuration for the new kubelet version,

123
00:10:11,210 --> 00:10:13,010
then restart the kubelet service.

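Taken together, the worker node steps above might look like this sketch (the package revision and the `kubeadm upgrade node config` syntax assume the 1.12-era tooling used in this lecture):

```shell
# On the worker node: upgrade the kubeadm and kubelet packages
apt-get install -y kubeadm=1.12.0-00 kubelet=1.12.0-00

# Update the node's kubelet configuration for the new kubelet version
kubeadm upgrade node config --kubelet-version v1.12.0

# Restart the kubelet so the node runs the new version
systemctl restart kubelet
```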
124
00:10:13,820 --> 00:10:16,770
The node should now be up with the new software version.

125
00:10:17,450 --> 00:10:24,050
However, when we drained the node, we actually marked it unschedulable, so we need to unmark it by running

126
00:10:24,050 --> 00:10:25,520
the kubectl

127
00:10:25,550 --> 00:10:27,260
uncordon node-1 command.

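Marking the node schedulable again is one command (node name `node-1` assumed, as before):

```shell
# Remove the unschedulable mark that kubectl drain set earlier
kubectl uncordon node-1
```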
128
00:10:28,220 --> 00:10:34,580
The node is now schedulable, but remember that it is not necessary that the pods come right back to this

129
00:10:34,580 --> 00:10:34,910
node.

130
00:10:35,390 --> 00:10:41,870
It is only marked as schedulable. Only when pods are deleted from the other nodes or when new pods

131
00:10:41,870 --> 00:10:44,710
are scheduled do they really come back to this first node.

132
00:10:45,560 --> 00:10:50,710
Well, that will soon happen when we take down the second node and perform the same steps to upgrade it.

133
00:10:51,260 --> 00:10:52,910
And finally, the third node.

134
00:10:54,670 --> 00:10:56,680
We now have all nodes upgraded.

135
00:10:57,220 --> 00:10:59,350
Well, that's it for this lecture.

136
00:10:59,680 --> 00:11:05,380
Head over to the practice test, where you will practice upgrading a live cluster with applications

137
00:11:05,380 --> 00:11:08,660
running on it without taking the applications down.

138
00:11:09,550 --> 00:11:10,090
Good luck.
