1
00:00:00,990 --> 00:00:09,370
Hi, and welcome to the solution video, where we will troubleshoot failures with the worker node. As soon

2
00:00:09,370 --> 00:00:11,070
as you click on the quiz portal,

3
00:00:11,100 --> 00:00:13,820
it will give you a message that it's breaking something.

4
00:00:13,990 --> 00:00:20,440
So let's wait for the actual question to load up before we start our troubleshooting exercise.

5
00:00:20,440 --> 00:00:26,410
While it loads the question, let's look at the official documentation for troubleshooting clusters.

6
00:00:29,990 --> 00:00:36,460
So this particular document might be handy when you want to troubleshoot issues with master or worker

7
00:00:36,470 --> 00:00:38,460
nodes. In this specific lab,

8
00:00:38,480 --> 00:00:41,650
we are looking at worker-node-related issues.

9
00:00:41,810 --> 00:00:47,580
So the two things that you would have to check are kubelet and kube-proxy, because those are the two

10
00:00:47,590 --> 00:00:54,830
Kubernetes-specific components which are running on the worker node. So we'll be focusing mainly

11
00:00:54,830 --> 00:01:00,930
on these two services. It looks like the first question has loaded up.

12
00:01:01,040 --> 00:01:06,390
So we have to fix the broken cluster. To see if it's really broken,

13
00:01:06,410 --> 00:01:09,220
let's run the kubectl get nodes command.

14
00:01:12,580 --> 00:01:19,960
So you can see that this is a two-node cluster that has one master, and there is a node01 worker which

15
00:01:19,960 --> 00:01:24,810
is now in a NotReady state. To troubleshoot further,

16
00:01:24,820 --> 00:01:28,920
let's SSH into that specific node.

17
00:01:33,980 --> 00:01:38,050
So the first thing we will do is check the kubelet process.

18
00:01:44,850 --> 00:01:48,630
So in this case, it doesn't look like a process for kubelet is running.

19
00:01:48,730 --> 00:01:57,680
Let's check the kubelet service. Here we'll use systemctl status kubelet.

20
00:02:01,640 --> 00:02:05,280
Now in this case, you can see that the kubelet service is dead.

21
00:02:10,780 --> 00:02:16,510
The journal has been rotated, so there are no additional logs that you can see. We can try to look for additional

22
00:02:16,510 --> 00:02:21,180
logs using -l, but sometimes that's not really useful.

23
00:02:22,550 --> 00:02:29,120
What we can try to do is restart the kubelet service and see if that works.

24
00:02:34,490 --> 00:02:38,790
So now, after restarting the kubelet service, it's back to an active state.

25
00:02:43,420 --> 00:02:52,060
Let's see if node01 goes back to a Ready state with just starting the kubelet service. So I'll

26
00:02:52,060 --> 00:02:58,540
go back to the master node and I'll run kubectl get nodes. So it looks like in this case node01

27
00:02:58,540 --> 00:03:05,290
is back to a Ready state after restarting the kubelet service. Let's validate if that's the correct answer.

28
00:03:09,500 --> 00:03:17,820
Okay, that was successful, so let's go on to the next question. The cluster is broken again. Investigate

29
00:03:17,850 --> 00:03:18,780
and fix the issue.

30
00:03:21,620 --> 00:03:28,700
So if I run kubectl get nodes, you can see that node01 is again back to a NotReady state. Yet

31
00:03:28,730 --> 00:03:37,210
another command that you can make use of is kubectl describe node. Now this might not give you in-depth

32
00:03:37,510 --> 00:03:44,530
details as to why the node has failed, but in some cases, if there are issues with the hardware, with the

33
00:03:44,530 --> 00:03:52,000
specifications of the nodes, or with one of the components, it can provide you some meaningful insights here

34
00:03:52,000 --> 00:04:01,140
as well. But in this case, it doesn't look like there is anything specific related to the current issue

35
00:04:01,500 --> 00:04:09,170
in the kubectl describe node, so we'll have to SSH into node01

36
00:04:15,560 --> 00:04:16,210
as before.

37
00:04:16,230 --> 00:04:20,790
Let's try to run our status for the kubelet service.

38
00:04:25,020 --> 00:04:30,040
I can see that the kubelet service is in an activating state.

39
00:04:30,040 --> 00:04:32,030
It's not dead or inactive.

40
00:04:32,050 --> 00:04:34,090
Looks like it's just not able to start.

41
00:04:41,630 --> 00:04:47,690
This is a case where we can use journalctl to troubleshoot the service further.

42
00:04:51,410 --> 00:04:51,750
here.

43
00:04:51,750 --> 00:05:00,670
I'm going to run journalctl -u kubelet. I'll scroll down to the bottom by pressing Shift+G,

44
00:05:00,680 --> 00:05:11,400
and here I can see an error that it's unable to load the client CA file, which is supposed to be

45
00:05:11,400 --> 00:05:24,510
located at /etc/kubernetes/pki, a wrong-CA file: no such file or directory.

46
00:05:24,620 --> 00:05:29,560
So now let's look at the kubelet configuration. By default,

47
00:05:29,660 --> 00:05:43,800
this is located at /etc/systemd/system/kubelet.service.d, and in this case, since it's created

48
00:05:43,800 --> 00:05:50,370
using kubeadm, there is a custom service definition file located in this directory.

49
00:05:53,030 --> 00:06:00,770
If you open this file, you'll see that it has references to other configuration files, such as the kubeconfig

50
00:06:01,250 --> 00:06:03,770
and the actual configuration used by kubelet.

51
00:06:07,640 --> 00:06:10,220
Let's check the kubelet configuration file.

52
00:06:19,580 --> 00:06:27,750
If you open this file and check the client CA file, you can see the incorrect path. Let's check if

53
00:06:27,750 --> 00:06:29,450
this path is valid or not.

54
00:06:44,530 --> 00:06:49,930
So the ca.crt file is located under /etc/kubernetes/pki.

55
00:06:54,120 --> 00:07:04,950
So let's correct that within the kubelet configuration.

56
00:07:05,020 --> 00:07:10,080
Now, once we have made this change, we'll have to do a systemctl daemon-reload,

57
00:07:19,480 --> 00:07:23,830
and once that's complete, we can do a restart of the kubelet service.

58
00:07:31,840 --> 00:07:36,000
Now the kubelet is in an active, or running, state.

59
00:07:36,210 --> 00:07:39,170
Let's go back to the master node and check the status.

60
00:07:42,560 --> 00:07:45,230
node01 is now back to a Ready state.

61
00:07:49,420 --> 00:07:51,430
The cluster is broken again.

62
00:07:51,460 --> 00:07:59,170
This is the third and final question. We'll have to do the same thing as before: troubleshoot why the

63
00:07:59,170 --> 00:08:00,250
cluster is broken.

64
00:08:03,290 --> 00:08:08,960
If you run the same command, you can see that node01 is now back to NotReady.

65
00:08:09,020 --> 00:08:14,590
Let's see if anything is listed within the kubectl describe command.

66
00:08:18,980 --> 00:08:23,360
It doesn't look like there is anything meaningful related to the error that we are seeing.

67
00:08:23,420 --> 00:08:25,850
So let's again go back to node01

68
00:08:31,340 --> 00:08:32,440
and, as before,

69
00:08:32,450 --> 00:08:42,840
let's run our systemctl status kubelet. In this case, it is active and running, but you can see a few errors.

70
00:08:49,710 --> 00:08:55,300
One of the errors that you would see is that it's getting a connection timeout or a connection refused

71
00:08:55,310 --> 00:09:06,540
error, TCP 172.17.0.22, port 6553. Let's check journalctl.

72
00:09:12,250 --> 00:09:20,020
It's also complaining of the same error: there is a connection refused to the IP 172.17.0.22

73
00:09:20,020 --> 00:09:27,700
and port 6553. So the kubelet is trying to communicate with the API server

74
00:09:27,700 --> 00:09:35,350
here, and it's fairly straightforward: if you remember the concepts, the default kube-apiserver port is

75
00:09:35,390 --> 00:09:45,770
6443. We can validate this by running kubectl cluster-info. So here you can see that the correct port

76
00:09:45,800 --> 00:09:50,060
is 172.17.0.22:6443.

77
00:09:53,480 --> 00:10:01,670
So let's SSH to node01 and check out the configuration for kubelet. As before, let's go to the

78
00:10:01,670 --> 00:10:04,670
default kubelet service definition file

79
00:10:13,600 --> 00:10:20,900
and let's check the kubeconfig. The kubeconfig is where the details for the API server are stored.

80
00:10:27,680 --> 00:10:35,550
Now if you look at the server section, you can see that the port is incorrect. Let's correct that and

81
00:10:35,550 --> 00:10:50,560
restart the kubelet service. As before, let's reload and restart.

82
00:10:50,820 --> 00:10:53,910
Let's go back to the master node and get the status of the nodes.

83
00:10:57,890 --> 00:10:59,800
Now it's back to a Ready state.

84
00:11:03,830 --> 00:11:05,730
So that seems to have fixed the cluster.

85
00:11:05,900 --> 00:11:07,820
Thank you for joining me in this video.

86
00:11:07,910 --> 00:11:11,720
I hope that you found it useful and I'll see you in the next one.
