1
00:00:00,510 --> 00:00:03,590
Hello and welcome to this lecture. In this lecture

2
00:00:03,600 --> 00:00:07,440
we'll talk about ETCD in a high availability setup.

3
00:00:07,440 --> 00:00:13,020
So this is really a prerequisite lecture for the next lecture where we talk about configuring Kubernetes

4
00:00:13,140 --> 00:00:15,720
in a highly available mode.

5
00:00:15,720 --> 00:00:21,130
Well one portion of that deals with configuring ETCD in HA mode.

6
00:00:21,150 --> 00:00:28,440
So in this lecture we will discuss about ETCD in HA mode in the beginning of this course we took

7
00:00:28,440 --> 00:00:30,510
a quick look at ETCD.

8
00:00:30,570 --> 00:00:36,710
We will now recap real quick and more importantly focus on the cluster configuration of ETCD.

9
00:00:36,810 --> 00:00:41,970
So let’s recap real quick and look at the number of nodes in the cluster, what the RAFT protocol is, etc.

10
00:00:42,870 --> 00:00:43,980
So what is ETCD?

11
00:00:44,070 --> 00:00:48,580
It's a distributed, reliable key-value store that is simple, secure and fast.

12
00:00:48,600 --> 00:00:49,540
So let's break it up.

13
00:00:49,560 --> 00:00:53,820
Traditionally data was organized and stored in tables like this.

14
00:00:53,820 --> 00:01:00,030
For example to store details about a number of individuals. A key value store stores information in the

15
00:01:00,030 --> 00:01:02,000
form of documents or pages.

16
00:01:02,130 --> 00:01:08,270
So each individual gets a document and all information about that individual is stored within that file.

17
00:01:08,370 --> 00:01:15,090
These files can be in any format or structure, and changes to one file do not affect the others.

18
00:01:15,090 --> 00:01:21,470
In this case the working individuals can have salary fields in their files. While you could store and

19
00:01:21,470 --> 00:01:25,540
retrieve simple keys and values, when your data gets complex,

20
00:01:25,550 --> 00:01:30,090
you typically end up transacting in data formats like JSON or YAML.

21
00:01:30,830 --> 00:01:34,640
So that’s what etcd is and how you quickly get started with it.

22
00:01:34,640 --> 00:01:37,880
We also said that ETCD is distributed.

23
00:01:37,880 --> 00:01:38,810
So what does that mean?

24
00:01:39,680 --> 00:01:43,280
And that is what we're going to focus on in this lecture.

25
00:01:43,280 --> 00:01:49,340
We had ETCD on a single server. But it’s a database and may be storing critical data.

26
00:01:49,340 --> 00:01:54,090
So it is possible to have your datastore across multiple servers.

27
00:01:54,140 --> 00:02:00,890
Now you have 3 servers, all running etcd, and all maintaining an identical copy of the database.

28
00:02:00,890 --> 00:02:04,470
So if you lose one you still have two copies of your data.

29
00:02:04,580 --> 00:02:05,370
Perfect.

30
00:02:05,660 --> 00:02:09,530
But how does it ensure that the data on all the nodes is consistent?

31
00:02:10,220 --> 00:02:18,470
You can write to any instance and read your data from any instance.  ETCD ensures that the same consistent

32
00:02:18,470 --> 00:02:23,800
copy of the data is available on all instances at the same time.

33
00:02:23,810 --> 00:02:26,900
So how does it do that? With reads,

34
00:02:26,900 --> 00:02:30,640
it's easy. Since the same data is available across all nodes,

35
00:02:30,710 --> 00:02:35,590
you can easily read it from any node. But that is not the case with writes.

36
00:02:35,600 --> 00:02:41,180
What if two write requests come in on two different instances? Which one goes through?

37
00:02:41,240 --> 00:02:49,140
For example I have writes coming in for name set to john on one and name set to joe on the other.

38
00:02:49,250 --> 00:02:53,190
Of course we cannot have two different values on two different nodes.

39
00:02:53,510 --> 00:02:58,890
When I said ETCD can write through any instance, I wasn’t 100% right.

40
00:02:58,900 --> 00:03:01,760
ETCD does not process the writes on each node.

41
00:03:01,910 --> 00:03:08,600
Instead, only one of the instances is responsible for processing the writes. Internally

42
00:03:08,600 --> 00:03:15,620
the nodes elect a leader among themselves. Of the total instances one node becomes the leader and the other

43
00:03:15,620 --> 00:03:17,870
nodes become the followers.

44
00:03:17,870 --> 00:03:23,540
If the writes come in through the leader node, then the leader processes the write. The leader makes sure

45
00:03:23,870 --> 00:03:27,410
that the other nodes are sent a copy of the data.

46
00:03:27,530 --> 00:03:33,200
If the writes come in through any of the other follower nodes, then they forward the writes to the leader

47
00:03:33,260 --> 00:03:36,860
internally and then the leader processes the writes. Again

48
00:03:36,890 --> 00:03:43,640
when the writes are processed, the leader ensures that copies of the write are distributed to the other instances

49
00:03:43,790 --> 00:03:45,270
in the cluster.

50
00:03:45,350 --> 00:03:53,630
Thus a write is only considered complete if the leader gets consent from other members in the cluster.

51
00:03:53,630 --> 00:04:00,050
So how do they elect the leader among themselves? And how do they ensure a write is propagated across

52
00:04:00,140 --> 00:04:06,830
all instances? ETCD implements distributed consensus using the RAFT protocol.

53
00:04:06,830 --> 00:04:09,560
Let's see how that works in a three node cluster.

54
00:04:09,790 --> 00:04:16,100
When the cluster is set up we have 3 nodes that do not have a leader elected. The RAFT algorithm uses

55
00:04:16,130 --> 00:04:19,030
random timers for initiating requests.

56
00:04:19,190 --> 00:04:25,790
For example, a random timer is kicked off on the three managers. The first one to finish the timer sends

57
00:04:25,790 --> 00:04:32,360
out a request to the other nodes requesting permission to be the leader. The other managers on receiving

58
00:04:32,360 --> 00:04:39,140
the request respond with their vote and the node assumes the leader role. Now that it is elected the

59
00:04:39,150 --> 00:04:46,020
leader, it sends out notifications at regular intervals to the other masters informing them that it is continuing

60
00:04:46,020 --> 00:04:53,560
to assume the role of the leader. In case the other nodes do not receive a notification from the leader

61
00:04:53,580 --> 00:04:59,370
at some point in time which could either be due to the leader going down or losing network connectivity

62
00:04:59,880 --> 00:05:07,320
the nodes initiate a re-election process among themselves and a new leader is identified. Going back

63
00:05:07,320 --> 00:05:13,650
to our previous example, when a write comes in, it is processed by the leader and is replicated to the other

64
00:05:13,650 --> 00:05:20,280
nodes in the cluster. The write is considered to be complete only once it is replicated to the other

65
00:05:20,280 --> 00:05:22,500
instances in the cluster.

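As a side note, on a running cluster you can see which member currently holds the leader role using etcdctl's endpoint status command. This is just an illustrative sketch; the endpoint addresses below are placeholders.

    ETCDCTL_API=3 etcdctl endpoint status \
      --endpoints=https://10.0.0.1:2379,https://10.0.0.2:2379,https://10.0.0.3:2379 \
      --write-out=table
    # The output includes an IS LEADER column marking the current leader.
    # On a TLS-secured cluster you would also pass --cacert/--cert/--key.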
66
00:05:22,500 --> 00:05:29,190
We said that the ETCD cluster is highly available. So even if we lose a node it should still function.

67
00:05:29,250 --> 00:05:35,550
Say for example a new write comes in but one of the nodes is not responding, and hence the leader is only

68
00:05:35,550 --> 00:05:41,400
able to write to 2 nodes in the cluster. Is the write considered to be complete? Does it wait for the

69
00:05:41,400 --> 00:05:46,390
third node to be up? Or does it fail?  A write is considered to be complete,

70
00:05:46,500 --> 00:05:53,100
if it can be written on the majority of the nodes in the cluster. For example, in this case of the 3 nodes,

71
00:05:53,250 --> 00:05:54,790
the majority is 2.

72
00:05:54,840 --> 00:06:00,550
So if the data can be written on two of the nodes then the write is considered to be complete.

73
00:06:00,570 --> 00:06:05,430
If the third node were to come online then the data is copied to that as well.

74
00:06:05,490 --> 00:06:07,700
So what is the majority?

75
00:06:07,710 --> 00:06:15,000
Well a more appropriate term to use would be Quorum. Quorum is the minimum number of nodes that must

76
00:06:15,000 --> 00:06:21,720
be available for the cluster to function properly or make a successful write. In the case of 3,

77
00:06:21,720 --> 00:06:30,460
we know it's 2. For any given number of nodes, the quorum is the total number of nodes divided by 2, plus 1.

78
00:06:30,540 --> 00:06:36,960
So the quorum of 3 nodes is 3/2, which is 1.5, plus 1, which equals 2.5.

79
00:06:36,960 --> 00:06:42,280
If there is a .5, consider the whole number only, so that's 2.

80
00:06:42,390 --> 00:06:45,310
Similarly quorum of 5 nodes is 3.

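As a quick sanity check, here is a small shell snippet (not from the lecture) that applies the quorum formula; bash integer division already drops the .5:

    # Quorum = N / 2 + 1, using integer (floor) division
    for n in 1 2 3 4 5 6 7; do
      echo "nodes=$n  quorum=$(( n / 2 + 1 ))"
    done
    # nodes=3 -> quorum=2, nodes=5 -> quorum=3, and so on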
81
00:06:45,360 --> 00:06:53,000
So here is a table that shows the quorum of clusters of size 1 to 7. Quorum of 3 and 5 are what

82
00:06:53,000 --> 00:06:56,630
we calculated. Quorum of 1 is 1 itself.

83
00:06:56,630 --> 00:07:00,110
Meaning if you have a single node cluster, none of this really applies.

84
00:07:00,110 --> 00:07:05,950
If you lose that node, everything's gone. If you look at 2 and apply the same formula,

85
00:07:06,120 --> 00:07:10,440
the quorum is 2 itself. 2/2 is 1 and 1 + 1 is 2.

86
00:07:10,890 --> 00:07:17,610
So even if you have 2 instances in the cluster the majority is still 2. If one fails,

87
00:07:17,610 --> 00:07:18,530
there is no quorum.

88
00:07:18,540 --> 00:07:20,540
So writes won't be processed.

89
00:07:20,700 --> 00:07:27,570
So having two instances is like having one instance. It doesn't offer you any real value as quorum cannot

90
00:07:27,570 --> 00:07:28,740
be met.

91
00:07:28,740 --> 00:07:35,190
Which is why it is recommended to have a minimum of 3 instances in an ETCD cluster. That way

92
00:07:35,220 --> 00:07:38,380
it offers a fault tolerance of at least 1 node.

93
00:07:39,240 --> 00:07:43,920
If you lose one, you can still have quorum and the cluster will continue to function.

94
00:07:44,730 --> 00:07:50,220
So the first column minus the second column gives you the fault tolerance. The number of nodes that you

95
00:07:50,220 --> 00:07:53,620
can afford to lose while keeping the cluster alive.

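For reference, here is the table the lecture refers to, computed from quorum = N/2 + 1 and fault tolerance = instances minus quorum:

    Instances   Quorum   Fault Tolerance
    1           1        0
    2           2        0
    3           2        1
    4           3        1
    5           3        2
    6           4        2
    7           4        3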
96
00:07:53,700 --> 00:07:58,050
So we have 1 to 7 nodes here. 1 and 2 are out of consideration.

97
00:07:58,050 --> 00:08:01,540
So from 3 to 7 what do we consider?

98
00:08:01,770 --> 00:08:07,350
As you can see 3 and 4 have the same fault tolerance of 1, and 5 and 6 have the same fault

99
00:08:07,350 --> 00:08:14,850
tolerance of 2. When deciding on the number of master nodes, it is recommended to select an odd number

100
00:08:14,850 --> 00:08:21,010
as highlighted in the table: 3, 5 or 7. Say we have a 6 node cluster.

101
00:08:21,570 --> 00:08:28,420
So for example, due to a disruption, the network fails and gets partitioned.

102
00:08:28,440 --> 00:08:31,770
We have 4 nodes on one and 2 on the other.

103
00:08:31,770 --> 00:08:38,980
In this case the group with 4 nodes has quorum and continues to operate normally. However if the

104
00:08:38,980 --> 00:08:45,530
network got partitioned in a different way resulting in nodes being distributed equally between the two,

105
00:08:45,610 --> 00:08:47,860
each group now has 3 nodes only.

106
00:08:47,980 --> 00:08:54,650
But since we originally had 6 manager nodes, the quorum for the cluster to stay alive is 4

107
00:08:54,880 --> 00:08:59,770
But if you look at the groups here, neither of these groups has four managers to meet the quorum, so

108
00:08:59,770 --> 00:09:01,580
it results in a failed cluster.

109
00:09:02,410 --> 00:09:09,220
So with an even number of nodes there is a possibility of the cluster failing during a network segmentation.

110
00:09:10,210 --> 00:09:14,190
In case we had an odd number of managers originally, say 7,

111
00:09:14,260 --> 00:09:20,390
then after the network segmentation we have 4 on one segmented network and 3 on the other,

112
00:09:20,500 --> 00:09:26,380
and so our cluster still lives on in the group with 4 managers, as it meets the quorum of 4.

113
00:09:26,710 --> 00:09:33,010
No matter how the network segments there are better chances for your cluster to stay alive in case of

114
00:09:33,010 --> 00:09:35,950
network segmentation with an odd number of nodes.

115
00:09:36,580 --> 00:09:44,290
So an odd number of nodes is preferred over an even number. Having 5 is preferred over 6, and having

116
00:09:44,290 --> 00:09:51,750
more than 5 nodes is really not necessary as 5 gives you enough fault tolerance. To install ETCD on

117
00:09:51,750 --> 00:09:52,420
a server,

118
00:09:52,530 --> 00:09:55,380
download the latest supported binary.  Extract it,

119
00:09:55,410 --> 00:10:02,840
create the required directory structure. Copy over the certificate files generated for ETCD. We

120
00:10:02,840 --> 00:10:08,630
discussed how to generate these certificates in detail in the TLS Certificates section. Then configure

121
00:10:08,630 --> 00:10:10,490
the ETCD service.

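A rough sketch of the installation steps just described; the release version, download URL, and certificate file names here are illustrative, not prescriptive:

    # Download and extract the etcd release binary
    ETCD_VER=v3.4.9   # example version; use the latest supported release
    wget -q https://github.com/etcd-io/etcd/releases/download/${ETCD_VER}/etcd-${ETCD_VER}-linux-amd64.tar.gz
    tar xzvf etcd-${ETCD_VER}-linux-amd64.tar.gz
    mv etcd-${ETCD_VER}-linux-amd64/etcd* /usr/local/bin/
    # Create the directory structure and copy over the etcd certificates
    mkdir -p /etc/etcd /var/lib/etcd
    cp ca.pem etcd.pem etcd-key.pem /etc/etcd/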
122
00:10:10,490 --> 00:10:16,850
What's important here is to note the initial-cluster option, where we are passing the peer information.

123
00:10:17,270 --> 00:10:24,320
That’s how each etcd service knows that it is part of a cluster and where its peers are.

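For illustration, the initial-cluster option in the etcd service configuration might look something like this on one node; the names and IP addresses are placeholders, and the certificate flags are omitted for brevity:

    ExecStart=/usr/local/bin/etcd \
      --name etcd-1 \
      --initial-advertise-peer-urls https://10.0.0.1:2380 \
      --listen-peer-urls https://10.0.0.1:2380 \
      --listen-client-urls https://10.0.0.1:2379,https://127.0.0.1:2379 \
      --advertise-client-urls https://10.0.0.1:2379 \
      --initial-cluster etcd-1=https://10.0.0.1:2380,etcd-2=https://10.0.0.2:2380,etcd-3=https://10.0.0.3:2380 \
      --initial-cluster-state new
    # --initial-cluster lists every peer, which is how each instance knows about the others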
124
00:10:24,380 --> 00:10:31,220
Once installed and configured, use the etcdctl utility to store and retrieve data. The etcdctl utility has

125
00:10:31,220 --> 00:10:32,810
two API versions.

126
00:10:32,810 --> 00:10:36,850
V2 and V3. So the commands work differently in each version.

127
00:10:37,010 --> 00:10:38,540
Version 2 is the default.

128
00:10:38,870 --> 00:10:42,680
However we will use version 3. So set an environment variable

129
00:10:42,680 --> 00:10:47,970
ETCDCTL_API to 3, otherwise the below commands won’t work.

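The commands being described might look roughly like this; the key, value, and the flags used to list all keys are a sketch, so check your etcdctl version's help if they differ:

    export ETCDCTL_API=3                  # use the v3 API
    etcdctl put name john                 # store key "name" with value "john"
    etcdctl get name                      # returns: name  john
    etcdctl get / --prefix --keys-only    # one common way to list all keys under /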
130
00:10:48,000 --> 00:10:55,680
Run the etcdctl put command and specify the key as name and value as john. To retrieve data, run

131
00:10:55,720 --> 00:10:56,740
the etcdctl

132
00:10:56,740 --> 00:11:04,290
get command with the key name, and it returns the value john. To get all keys, run the etcdctl

133
00:11:04,330 --> 00:11:05,650
get --keys-only

134
00:11:05,780 --> 00:11:10,040
command. Going back to our design,

135
00:11:10,110 --> 00:11:14,460
how many nodes should our cluster have? In an HA environment

136
00:11:14,460 --> 00:11:20,370
as you can see having 1 or 2 instances doesn’t really make any sense, as losing 1 node in either

137
00:11:20,370 --> 00:11:26,220
case will leave you without quorum and thus render the cluster non-functional. Hence the minimum required

138
00:11:26,250 --> 00:11:29,330
nodes in an HA setup is 3.

139
00:11:29,340 --> 00:11:34,980
We also discussed why we prefer an odd number of instances over an even number. Having an even number of instances

140
00:11:34,980 --> 00:11:40,250
can leave the cluster without quorum in certain network partition scenarios.

141
00:11:40,500 --> 00:11:47,340
So even numbers of nodes are out of consideration. So we are left with 3, 5 and 7, or any odd number above

142
00:11:47,340 --> 00:11:53,820
that. 3 is a good start, but if you prefer a higher level of fault tolerance then 5 is better. But anything

143
00:11:53,820 --> 00:11:56,270
beyond that is just unnecessary.

144
00:11:56,310 --> 00:12:01,730
So considering your environment, the fault tolerance requirements and the cost that you can bear, you

145
00:12:01,740 --> 00:12:04,450
should be able to choose one number from this list.

146
00:12:04,530 --> 00:12:06,160
In our case we go with 3.

147
00:12:07,080 --> 00:12:10,200
So how does our design look now? With HA,

148
00:12:10,200 --> 00:12:13,440
the minimum required number of nodes for fault tolerance is 3.

149
00:12:13,890 --> 00:12:21,060
Now while it would be great to have 3 master nodes, we are limited by the capacity of our laptop. So we

150
00:12:21,060 --> 00:12:22,190
will just go with 2.

151
00:12:22,260 --> 00:12:27,330
But if you're deploying the setup in another environment and have sufficient capacity feel free to go

152
00:12:27,330 --> 00:12:28,390
with 3.

153
00:12:28,680 --> 00:12:35,130
We also chose to go with the stacked topology where we will have the ETCD servers on the master nodes

154
00:12:35,220 --> 00:12:36,390
themselves.

155
00:12:36,390 --> 00:12:37,830
Well that's it for this lecture.

156
00:12:38,340 --> 00:12:39,410
Thank you for listening.

157
00:12:39,450 --> 00:12:40,950
I will see you in the next.
