1
00:00:02,570 --> 00:00:10,610
Hello and welcome to the solution video for the practice lab on troubleshooting control plane components.

2
00:00:12,070 --> 00:00:15,520
So, the first question: the cluster is broken.

3
00:00:15,520 --> 00:00:21,790
We tried deploying an application, but it's not working. Troubleshoot and fix the issue.

4
00:00:21,790 --> 00:00:29,110
So for this practice lab we'll have to repeatedly try and figure out issues with control plane components

5
00:00:29,860 --> 00:00:38,920
in the master node and fix them. So, for the first question, let's check the existing deployments in this cluster.

6
00:00:42,590 --> 00:00:46,880
So you can see that a deployment called app has been created.

7
00:00:47,080 --> 00:00:49,720
But the pod is in a Pending state.

8
00:00:52,580 --> 00:00:57,020
Let's run describe against the pod which is in the Pending state.

9
00:01:03,170 --> 00:01:11,520
So you can see that the image used is nginx:alpine and the status is Pending. There doesn't seem to be any

10
00:01:11,520 --> 00:01:13,580
other issue with this pod.

11
00:01:15,140 --> 00:01:21,860
So if you remember from the theory and from the lectures, the component which is responsible for scheduling

12
00:01:22,100 --> 00:01:28,850
a pod onto a worker node is the Kubernetes scheduler. You can read more about it in the Kubernetes

13
00:01:28,850 --> 00:01:37,180
documentation. Search for scheduler and you can see that the default scheduler is called

14
00:01:37,190 --> 00:01:45,170
kube-scheduler, which runs as part of the control plane. Let's check the scheduler status by running

15
00:01:45,170 --> 00:01:48,490
kubectl get pods in the kube-system namespace.

16
00:01:53,630 --> 00:02:01,040
Now straight away you will observe that kube-scheduler-master is in a CrashLoopBackOff state.

17
00:02:03,160 --> 00:02:06,070
Now let's check the logs for the Kubernetes scheduler.

18
00:02:14,580 --> 00:02:21,120
If you scroll all the way to the bottom, you will see that there's a warning with an error: failed to

19
00:02:21,120 --> 00:02:28,570
start container kube-scheduler, error response, and below that you can see that, on starting the container process, the executable

20
00:02:29,140 --> 00:02:35,920
was not found in the path. The process name is kube-scheduler, and you can see that scheduler

21
00:02:35,920 --> 00:02:45,410
is spelled incorrectly. So we'll have to fix this by updating the command in the kube-scheduler pod

22
00:02:45,470 --> 00:02:46,400
definition file.

23
00:02:49,780 --> 00:02:56,980
You'll also observe that the kube-scheduler pod name has the word master appended, which is the node where it's

24
00:02:56,980 --> 00:02:57,500
running.

25
00:02:57,550 --> 00:03:06,070
So this is a handy little tip which confirms that the pod is created as a static pod. For any static

26
00:03:06,070 --> 00:03:12,090
pod, the pod name has the name of the node where the pod is running appended to it.

27
00:03:12,100 --> 00:03:18,970
So in this case it's running on master because it's a control plane component, so we can assume that

28
00:03:18,970 --> 00:03:27,990
this is a static pod. Another way to confirm this is by checking whether a static pod directory has been

29
00:03:27,990 --> 00:03:29,440
configured in the kubelet.

30
00:03:32,580 --> 00:03:39,560
To do this, let's check the service configuration for the kubelet service. So I'm going to the default

31
00:03:39,680 --> 00:03:49,800
service directory here under /etc/systemd/system/kubelet.service.d, and it looks like there is a custom

32
00:03:50,490 --> 00:03:59,580
kubeadm configuration created for the kubelet service, and that has an additional argument for the

33
00:03:59,650 --> 00:04:05,210
kubelet configuration, which is stored in the config file under /var/lib/kubelet.

34
00:04:11,670 --> 00:04:23,830
Let's search for static pod in the Kubernetes documentation. So to create a static pod, you can see

35
00:04:23,830 --> 00:04:29,370
that a static pod path has to be defined in the kubelet configuration file.

36
00:04:29,400 --> 00:04:33,550
So let's search for this field within the configuration file

37
00:04:44,600 --> 00:04:51,750
So in this case you can see that the static pod path is /etc/kubernetes/manifests.

38
00:04:53,100 --> 00:04:58,950
This is the default location for static pods which are created by kubeadm.

39
00:04:59,190 --> 00:05:05,160
So let's go to this path and check the Kubernetes scheduler pod definition file.

40
00:05:25,910 --> 00:05:32,770
so here you can see that within the command section the name of the scheduler is incorrect.

41
00:05:32,870 --> 00:05:39,060
So let's correct that and save the file. As soon as you save the file

42
00:05:39,060 --> 00:05:46,680
or make any changes to kube-scheduler.yaml within the static pod path, the kubelet will recreate the pod.

43
00:05:56,080 --> 00:06:01,760
You can see that kube-scheduler-master was recreated 3 seconds ago.

44
00:06:13,020 --> 00:06:20,870
Now that the scheduler is up and running, the application pod should soon be scheduled onto a node.

45
00:06:20,890 --> 00:06:23,200
Now you can see that it's in the Running state.

46
00:06:26,430 --> 00:06:27,900
let's validate the answer

47
00:06:31,850 --> 00:06:32,890
For the next question,

48
00:06:32,900 --> 00:06:45,230
we have to scale the deployment to two pods, so I'll make use of kubectl scale deployment app with replicas

49
00:06:45,260 --> 00:06:46,460
equal to two.

50
00:06:51,230 --> 00:06:54,660
Looks like that has not worked, but let's continue.

51
00:06:57,410 --> 00:07:03,210
So for the third question: even though the deployment was scaled to two, the number of pods does not seem

52
00:07:03,210 --> 00:07:09,600
to increase. Investigate and fix the issue. Inspect the component responsible for managing deployments

53
00:07:09,600 --> 00:07:11,320
and replica sets.

54
00:07:11,370 --> 00:07:18,390
So again, going back to the theory, if you watched the lectures you would know that the component

55
00:07:18,390 --> 00:07:25,390
responsible for managing deployments and replica sets is the Kubernetes controller manager. So let's

56
00:07:25,390 --> 00:07:31,870
go back and check the control plane components in the kube-system namespace and check the status.

57
00:07:36,630 --> 00:07:44,440
Now in this case kube-controller-manager-master is also in a CrashLoopBackOff state.

58
00:07:44,670 --> 00:07:51,350
Now again, this pod name has the name of the node appended, which is master.

59
00:07:51,350 --> 00:07:58,530
So this is also most likely to be a static pod.

60
00:07:58,660 --> 00:08:01,180
We're already in the static pod directory.

61
00:08:01,180 --> 00:08:08,200
But first let's check the logs by running kubectl describe against the kube-controller-manager.

62
00:08:28,820 --> 00:08:35,660
Let's run kubectl logs against the kube-controller-manager. We did not get any meaningful information from

63
00:08:35,660 --> 00:08:43,780
kubectl describe, but from the logs you can see that it's trying to load a file called

64
00:08:43,790 --> 00:08:50,840
/etc/kubernetes/controller-manager-XXXX.conf, and it's not able to find such a file or directory.

65
00:09:00,630 --> 00:09:06,950
Now going back to the location where it's trying to find the file, there is no file called

66
00:09:06,970 --> 00:09:09,680
controller-manager-XXXX.conf.

67
00:09:10,060 --> 00:09:13,540
Instead there is a file called controller-manager.conf.

68
00:09:16,820 --> 00:09:23,540
So let's update the pod definition file for kube-controller-manager and correct the file name.

69
00:09:29,240 --> 00:09:33,600
So this is the kubeconfig which it's failing to find, so let's correct it.

70
00:09:33,650 --> 00:09:34,160
Finally

71
00:09:38,940 --> 00:09:46,360
As before, as soon as you make a change the pod will be redeployed. As you can see, it's right now in

72
00:09:46,360 --> 00:09:51,240
a Pending state, but soon it will go to a Running state.

73
00:09:51,510 --> 00:09:53,430
Now let's check our deployment

74
00:10:02,190 --> 00:10:14,890
As you can see, it is now scaling to two replicas. Let's validate the answer and move on to the next question.

75
00:10:15,050 --> 00:10:17,210
Something is wrong with scaling again.

76
00:10:17,210 --> 00:10:22,640
We just tried scaling the deployment to three replicas, but it's not happening, so it's safe to assume

77
00:10:22,640 --> 00:10:26,000
that the kube-controller-manager is broken again.

78
00:10:32,080 --> 00:10:40,130
And you can see that the kube-controller-manager pod is now back in a CrashLoopBackOff state.

79
00:10:40,170 --> 00:10:42,180
So let's do the same thing as before.

80
00:10:42,180 --> 00:10:45,240
First let's run describe against that pod.

81
00:10:49,560 --> 00:10:53,550
it doesn't seem like you're getting a lot of information related to the error here.

82
00:10:53,550 --> 00:10:56,250
So let's run kubectl logs.

83
00:11:01,740 --> 00:11:02,010
now.

84
00:11:02,010 --> 00:11:09,770
In this case you're seeing that it is unable to load the CA file, which should be

85
00:11:09,780 --> 00:11:11,810
/etc/kubernetes/pki/ca.crt.

86
00:11:12,030 --> 00:11:17,700
So this is from the container log, so let's see how the container is making use of this file.

87
00:11:23,280 --> 00:11:27,830
So I'm going back to the pod definition file for the

88
00:11:27,860 --> 00:11:28,400
kube-controller-manager.

89
00:11:32,780 --> 00:11:35,340
before that let's make sure that we're on the correct path.

90
00:11:35,340 --> 00:11:36,670
So we are.

91
00:11:36,660 --> 00:11:41,960
And it is the /etc/kubernetes/manifests directory, which is the static pod path.

92
00:11:53,760 --> 00:11:56,480
The CA file is /etc/kubernetes/pki/ca.crt,

93
00:11:56,480 --> 00:12:17,920
so I'm just going to copy this and try to see where this is configured.

94
00:12:18,110 --> 00:12:25,560
And again, it is used in the command as ca.crt.

95
00:12:25,580 --> 00:12:31,500
Now let's scroll down and try to make sense of how this is actually used.

96
00:12:31,550 --> 00:12:39,460
Now if you scroll down, you'll see that the pod has a volume mount with the name k8s-certs,

97
00:12:39,610 --> 00:12:46,450
which makes use of the mount path /etc/kubernetes/pki in the container.

98
00:12:46,450 --> 00:12:53,010
Now let's look at how the k8s-certs volume has been configured.

99
00:13:02,760 --> 00:13:09,810
So if you search for k8s-certs within the same pod definition file, you'll see that it is created

100
00:13:09,840 --> 00:13:18,370
as a hostPath, so the volume refers to the path /etc/kubernetes/WRONG-PKI-DIRECTORY

101
00:13:18,420 --> 00:13:24,390
within the master node, but it looks like this path itself is wrong.

102
00:13:29,580 --> 00:13:40,610
Now if you go back to /etc/kubernetes, there is a WRONG-PKI-DIRECTORY here, and it's got no

103
00:13:40,660 --> 00:13:42,790
ca.crt file within it.

104
00:13:43,110 --> 00:13:49,420
The correct directory should be /etc/kubernetes/pki.

105
00:13:54,700 --> 00:14:00,280
As you can see, that is the correct directory, which has the CA certificate.

106
00:14:00,390 --> 00:14:05,750
Now let's go back and edit the kube-controller-manager

107
00:14:11,140 --> 00:14:16,050
and we will fix the path for the k8s-certs volume.

108
00:14:38,370 --> 00:14:59,910
And as soon as I've made that change, the kube-controller-manager has been redeployed.

109
00:14:59,940 --> 00:15:03,740
Now you can see that the third replica is up and running.

110
00:15:04,060 --> 00:15:06,160
So let's validate the answer.

111
00:15:10,040 --> 00:15:13,270
Congratulations, you have successfully finished this lab.

112
00:15:13,370 --> 00:15:17,030
I hope that you found the solution video useful.

113
00:15:17,230 --> 00:15:19,070
I'll see you in the next one.

114
00:15:19,070 --> 00:15:19,730
Thank you.
