1
00:00:01,460 --> 00:00:03,830
Hello and welcome to this lecture.

2
00:00:03,920 --> 00:00:09,020
Throughout this course, we have worked on a number of troubleshooting exercises related

3
00:00:09,020 --> 00:00:11,660
to the topic we were covering at that point in time.

4
00:00:12,260 --> 00:00:15,060
So a lot of troubleshooting is already covered.

5
00:00:15,080 --> 00:00:22,160
We will go through an overview of troubleshooting techniques and procedures and work on some more practice

6
00:00:22,160 --> 00:00:24,290
tests in this section.

7
00:00:24,290 --> 00:00:27,290
We start with application failures.

8
00:00:27,390 --> 00:00:32,580
Let's take a look at a two-tier application that has a web server and a database server.

9
00:00:32,820 --> 00:00:38,820
The database pod hosts a database and serves the web servers through a database service.

10
00:00:39,320 --> 00:00:45,860
The web server is hosted on a web pod and serves users through the web service.

11
00:00:45,960 --> 00:00:51,120
It's good to write down or draw a map or chart of how your application is configured

12
00:00:51,150 --> 00:00:57,900
before you start. Depending on how much you know about the failure, you may choose to start from either

13
00:00:57,900 --> 00:00:59,160
end of this map.

14
00:00:59,460 --> 00:01:06,900
But remember to check every object and link in this map until you find the root cause of the issue. Say,

15
00:01:06,900 --> 00:01:08,020
in our case,

16
00:01:08,020 --> 00:01:12,260
users report an issue with accessing the application. First,

17
00:01:12,310 --> 00:01:16,810
we start with the application front-end. Use standard ways of testing

18
00:01:16,810 --> 00:01:22,630
if your application is accessible. If it’s a web application, check if the web server is accessible on

19
00:01:22,630 --> 00:01:26,890
the IP of the node and the node port using curl. Next, check

20
00:01:26,980 --> 00:01:30,830
the service. Has it discovered endpoints for the web pod?

21
00:01:30,850 --> 00:01:32,360
In this case it did.

22
00:01:32,360 --> 00:01:38,810
But if it did not, then you might want to check the service-to-pod discovery. Compare the selectors configured

23
00:01:38,810 --> 00:01:41,090
on the service to the labels on the pod.
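
As a rough illustration of the comparison just described, here is a minimal Python sketch that checks whether a service's selector matches a pod's labels. The function name and the sample label values are hypothetical, and it assumes the simple equality-based selectors that a service uses.

```python
def selector_matches(selector: dict, labels: dict) -> bool:
    """Return True if every key/value pair in the service selector
    also appears in the pod's labels."""
    # A service without a selector does not pick up pods automatically.
    if not selector:
        return False
    return all(labels.get(key) == value for key, value in selector.items())


# Hypothetical values, for illustration only.
service_selector = {"app": "web"}
pod_labels = {"app": "web", "tier": "frontend"}

print(selector_matches(service_selector, pod_labels))    # True
print(selector_matches({"app": "webapp"}, pod_labels))   # False
```

The pod may carry extra labels; only the pairs listed in the selector have to be present and equal.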

24
00:01:41,240 --> 00:01:43,940
Make sure they match. Next,

25
00:01:44,040 --> 00:01:47,820
check the pod itself and make sure it is in a Running state.

26
00:01:48,060 --> 00:01:53,760
The status of the pod as well as the number of restarts can give you an idea of whether the application

27
00:01:53,760 --> 00:01:57,840
on the pod is running or is getting restarted.
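
To make the status and restart check concrete, here is a small Python sketch that summarizes pods from JSON shaped like the output of `kubectl get pods -o json`. The function name and the sample data are hypothetical, and the sample is trimmed down to the few fields the sketch reads.

```python
import json


def summarize_pods(pod_list: dict) -> list:
    """Return (name, phase, total_restarts) for each pod in a pod list."""
    summary = []
    for pod in pod_list.get("items", []):
        status = pod.get("status", {})
        # Add up the restart counts of all containers in the pod.
        restarts = sum(
            cs.get("restartCount", 0)
            for cs in status.get("containerStatuses", [])
        )
        summary.append((pod["metadata"]["name"], status.get("phase"), restarts))
    return summary


# Hypothetical sample, for illustration only.
sample = json.loads("""
{
  "items": [
    {
      "metadata": {"name": "web"},
      "status": {
        "phase": "Running",
        "containerStatuses": [{"name": "web", "restartCount": 5}]
      }
    }
  ]
}
""")

for name, phase, restarts in summarize_pods(sample):
    print(name, phase, restarts)   # web Running 5
```

A pod that reports Running but has a growing restart count is a hint that the application inside it keeps failing. Note that the phase field is coarser than the STATUS column printed by `kubectl get pods`.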

28
00:01:57,840 --> 00:02:01,980
Check the events related to the pod using the describe command.

29
00:02:01,980 --> 00:02:08,340
Check the logs of the application using the logs command. If the pod is restarting due to a failure,

30
00:02:08,430 --> 00:02:13,320
then the logs of the container that is currently running in the pod

31
00:02:13,500 --> 00:02:16,470
may not reflect why it failed the last time.

32
00:02:17,070 --> 00:02:22,680
So you either have to watch these logs using the -f option and wait for the application to fail

33
00:02:22,690 --> 00:02:28,530
again, or use the --previous option to view the logs of the previous container instance.
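
The two log options just mentioned can be sketched as a small Python helper that builds the `kubectl logs` command line. The helper name and the pod name `web` are hypothetical; `-f` and `--previous` are the kubectl flags for following the logs and for reading the previous container instance.

```python
def logs_command(pod: str, follow: bool = False, previous: bool = False) -> list:
    """Build the argument list for a kubectl logs invocation."""
    cmd = ["kubectl", "logs", pod]
    if follow:
        cmd.append("-f")          # stream the logs as they are written
    if previous:
        cmd.append("--previous")  # logs of the previously terminated container
    return cmd


print(" ".join(logs_command("web", follow=True)))     # kubectl logs web -f
print(" ".join(logs_command("web", previous=True)))   # kubectl logs web --previous
```

The returned list could be passed to `subprocess.run` on a machine where kubectl is configured for the cluster.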

34
00:02:29,470 --> 00:02:36,190
Next, check the status of the db-service as before. And finally, check the DB pod itself.

35
00:02:36,270 --> 00:02:42,670
Check the logs of the DB pod and look for any errors in the database.

36
00:02:42,690 --> 00:02:47,920
There are some more tips documented on the Kubernetes documentation page for troubleshooting applications.

37
00:02:47,940 --> 00:02:51,410
This will help in the upcoming practice tests as well as in the exam.