1
00:00:03,340 --> 00:00:09,940
For this shell scripting exercise we're going to write a script named show-attackers.sh.

2
00:00:10,000 --> 00:00:17,230
And this script is going to require a file be provided as an argument, and if a file is not provided

3
00:00:17,260 --> 00:00:21,940
or we can't read it for some reason then what we're going to do is have the script display an error

4
00:00:21,940 --> 00:00:25,620
message and exit with an exit status of one.

5
00:00:25,720 --> 00:00:29,990
The script counts the number of failed login attempts by IP address.

6
00:00:30,100 --> 00:00:36,400
If there are any IP addresses with more than 10 login attempts, the number of attempts made, the IP address

7
00:00:36,400 --> 00:00:41,910
from which those attempts were made, and the location of that IP address will be displayed.

8
00:00:42,070 --> 00:00:48,490
By the way, we're going to be using a command called geoiplookup to determine the location of that

9
00:00:48,520 --> 00:00:49,330
IP address.

10
00:00:49,330 --> 00:00:53,700
We may or may not have covered that in a previous demonstration or lesson.

11
00:00:53,710 --> 00:00:59,470
So just be aware of that command: give it an IP address and it'll tell you where it thinks that IP address

12
00:00:59,530 --> 00:01:01,270
originates from.

13
00:01:01,270 --> 00:01:04,490
We're going to make this script produce output in CSV.

14
00:01:04,600 --> 00:01:11,260
And of course CSV stands for comma-separated values, and we're going to give the output a header of Count

15
00:01:11,590 --> 00:01:14,540
comma IP comma Location.

16
00:01:14,560 --> 00:01:20,350
So here I am on a system and the sample data that we're going to use for this exercise is included in

17
00:01:20,350 --> 00:01:21,490
the course download.

18
00:01:21,520 --> 00:01:25,750
I've already placed a copy of it in the shared vagrant folder for this project.

19
00:01:25,990 --> 00:01:32,500
So if I cd into /vagrant then I can see all the files that are in that folder on my local

20
00:01:32,500 --> 00:01:33,920
physical machine.

21
00:01:33,970 --> 00:01:39,880
The first thing I do when working on these types of problems is to look at the data that I'm dealing

22
00:01:39,880 --> 00:01:40,370
with.

23
00:01:40,600 --> 00:01:45,670
I want to know what the input looks like first then I can start to look at ways to transform that data

24
00:01:45,940 --> 00:01:51,550
so that it's easy to work with or so that it meets my requirements. So let me just look at the contents

25
00:01:51,550 --> 00:01:52,390
of this file.

26
00:01:55,800 --> 00:01:57,710
That's a lot of data.

27
00:01:57,720 --> 00:02:05,940
One thing I see a lot of are lines that say Failed password for root.

28
00:02:06,000 --> 00:02:10,860
I'm going to bet that people are not only trying to log in as the root user but probably some other

29
00:02:10,860 --> 00:02:12,790
common users as well.

30
00:02:12,900 --> 00:02:15,370
Let's see if that's the case.

31
00:02:15,390 --> 00:02:22,260
So a common thing here on these lines for Failed password for root is the word Failed with a capital

32
00:02:22,260 --> 00:02:22,500
F.

33
00:02:22,500 --> 00:02:31,110
So let me just grep for that pattern: grep Failed syslog-sample. Here on our screen I only see root.

34
00:02:31,110 --> 00:02:35,090
I think I saw a variation or two above when this output was scrolling by.

35
00:02:35,250 --> 00:02:38,460
So let's exclude root to see what else we get.
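
A minimal sketch of that exclusion step, assuming a few made-up log lines in place of the real file (the host name, PIDs, and IP addresses here are hypothetical):

```shell
# Hypothetical sample lines standing in for the real log file.
sample_lines='Mar  1 10:00:01 host sshd[100]: Failed password for root from 192.0.2.10 port 4000 ssh2
Mar  1 10:00:02 host sshd[101]: Failed password for invalid user admin from 192.0.2.11 port 4001 ssh2
Mar  1 10:00:03 host sshd[102]: Failed password for lp from 192.0.2.12 port 4002 ssh2'

# Keep the failed attempts, then drop every line that mentions root.
non_root=$(echo "${sample_lines}" | grep Failed | grep -v root)
echo "${non_root}"
```

Against the real file, the same idea would be grep Failed on the file piped to grep -v root.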

36
00:02:42,630 --> 00:02:48,090
If we look at the last five lines or so of output we see login attempts for the ubuntu user, the admin

37
00:02:48,090 --> 00:02:52,980
user, the lp user, the admin user again, and a user named A.

38
00:02:53,250 --> 00:02:57,260
What I notice is that the lines aren't exactly in the same format.

39
00:02:57,270 --> 00:03:03,450
What sticks out to me here is that the words invalid user exist on most of those lines here but they

40
00:03:03,450 --> 00:03:05,160
don't exist on all of the lines.

41
00:03:05,160 --> 00:03:11,480
For example, the lp line, that third line from the bottom here, says for lp from.

42
00:03:11,580 --> 00:03:17,790
But if you look at the next line it says for invalid user admin from. The important piece of information

43
00:03:17,790 --> 00:03:21,350
that we want to isolate on each of these lines is the IP address.

44
00:03:21,510 --> 00:03:25,640
One way we could do that is to split the line on the word from.

45
00:03:25,920 --> 00:03:28,650
So let's grep for Failed.

46
00:03:28,690 --> 00:03:34,270
We'll pipe this to awk and use the word from as a field separator.

47
00:03:34,410 --> 00:03:40,230
And actually I'm going to use from space, so we don't end up with that extra space here, and then we'll

48
00:03:40,230 --> 00:03:42,860
just print the second part of that.

49
00:03:42,900 --> 00:03:47,290
So we should end up with the IP address and the rest of the information on the line.

50
00:03:47,290 --> 00:03:48,350
Let's see what happens.

51
00:03:49,350 --> 00:03:53,800
Now we're left with four columns all separated by a single space.

52
00:03:53,820 --> 00:03:55,620
The first column is the IP address.

53
00:03:55,650 --> 00:04:00,240
The second column is the word port followed by the actual port number itself.

54
00:04:00,240 --> 00:04:06,510
And then finally the protocol, which here is ssh2. Now from here we can print the first column either

55
00:04:06,510 --> 00:04:09,190
with cut or awk.

56
00:04:09,200 --> 00:04:15,430
So let me just do this: pipe this to awk, print $1, and that gives us the IP address.

57
00:04:15,650 --> 00:04:23,120
Or instead of using awk here we can use cut: cut -d, and we're going to separate on a space, so that's

58
00:04:23,120 --> 00:04:26,110
our delimiter and we want the first field.

59
00:04:26,180 --> 00:04:28,390
And so we get the same output.
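
Here is a rough sketch of both variants, run against a few made-up log lines instead of the real file (the host name, PIDs, and IP addresses are hypothetical):

```shell
# Hypothetical sample lines; one valid user, one invalid user, one success.
sample_lines='Mar  1 10:00:01 host sshd[100]: Failed password for root from 192.0.2.10 port 4000 ssh2
Mar  1 10:00:02 host sshd[101]: Failed password for invalid user admin from 192.0.2.11 port 4001 ssh2
Mar  1 10:00:03 host sshd[102]: Accepted password for vagrant from 192.0.2.12 port 4002 ssh2'

# Split each failed line on "from " and keep everything after it,
# then take the first space-separated column with awk ...
ips_with_awk=$(echo "${sample_lines}" | grep Failed | awk -F 'from ' '{print $2}' | awk '{print $1}')

# ... or take the same column with cut, using a space as the delimiter.
ips_with_cut=$(echo "${sample_lines}" | grep Failed | awk -F 'from ' '{print $2}' | cut -d ' ' -f 1)

echo "${ips_with_awk}"
echo "${ips_with_cut}"
```

Both variables should end up holding the same list of IP addresses, one per line.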

60
00:04:28,400 --> 00:04:32,340
So let's go back and talk about another way to solve this problem.

61
00:04:34,830 --> 00:04:39,780
If we count the number of fields from the left to right we end up with the IP address being in different

62
00:04:39,780 --> 00:04:44,000
fields depending on whether or not the user was valid or invalid.

63
00:04:44,040 --> 00:04:48,570
But if we count the number of fields from the right to left or from the end of the line towards the

64
00:04:48,570 --> 00:04:53,700
beginning of the line you'll see that the IP address is always the fourth column from the end.

65
00:04:53,730 --> 00:04:58,950
That means we can use awk's special variable NF, which represents the total number of fields on a

66
00:04:58,950 --> 00:05:03,830
line and then do a little subtraction to end up with the IP address.

67
00:05:03,840 --> 00:05:05,820
So here we can do this

68
00:05:08,370 --> 00:05:15,960
awk print, and then we're going to take the number of fields and subtract 3 from that number and we should

69
00:05:15,960 --> 00:05:18,590
end up with a column that has the IP address in it.
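
As a small sketch of that NF approach, again assuming made-up log lines (host name, PIDs, and IP addresses are hypothetical):

```shell
# Hypothetical sample lines; the IP sits in a different field counted from
# the left, but it is always the fourth field counted from the right.
sample_lines='Mar  1 10:00:01 host sshd[100]: Failed password for root from 192.0.2.10 port 4000 ssh2
Mar  1 10:00:02 host sshd[101]: Failed password for invalid user admin from 192.0.2.11 port 4001 ssh2'

# NF is the number of fields on the line, so $(NF - 3) is the fourth from the end.
ips=$(echo "${sample_lines}" | grep Failed | awk '{print $(NF - 3)}')
echo "${ips}"
```

The parentheses matter here: $(NF - 3) selects a field, whereas $NF - 3 would subtract 3 from the value of the last field.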

70
00:05:19,900 --> 00:05:23,070
So I've demonstrated two ways to extract the IP address.

71
00:05:23,080 --> 00:05:28,960
There are other ways, and if you came up with another way that's totally fine as long as you have extracted

72
00:05:28,960 --> 00:05:32,950
the IP address from all those lines in that file.

73
00:05:32,950 --> 00:05:37,720
So this approach that I used here makes the most sense to me and it seems a little bit simpler so that's

74
00:05:37,720 --> 00:05:40,240
what I'm going to use going forward.

75
00:05:40,240 --> 00:05:45,160
We know we can use the uniq command to count the number of occurrences of a line.

76
00:05:45,220 --> 00:05:47,810
We also know that uniq requires sorted input.

77
00:05:47,830 --> 00:05:52,040
So let's first sort our list of IP addresses and then send it to uniq.

78
00:05:52,240 --> 00:05:57,100
So I'm just going to sort this list. It doesn't have to be a numeric sort; it can be any sort. As long as

79
00:05:57,100 --> 00:05:59,690
it's sorted, uniq doesn't care.

80
00:05:59,830 --> 00:06:05,700
So then all we have to do here is pass it into uniq and tell uniq to count the occurrences of each line.

81
00:06:05,770 --> 00:06:10,450
And of course the only things on these lines are IP addresses, so it'll count the occurrences

82
00:06:10,480 --> 00:06:11,890
of these IP addresses.

83
00:06:12,990 --> 00:06:16,540
Now that we have this bit of data let's sort this numerically.

84
00:06:16,560 --> 00:06:20,020
So we'll just pipe this to sort -n.

85
00:06:20,160 --> 00:06:25,030
Actually let's reverse this order and put the most failed attempts at the top of the list.

86
00:06:25,350 --> 00:06:31,430
So we can just add a -r to our sort command to reverse it and we end up with the most failed login

87
00:06:31,470 --> 00:06:35,460
attempts first and the fewest failed login attempts last.
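
The counting and ordering steps can be sketched like this, assuming a short made-up list of already extracted IP addresses (the addresses are hypothetical):

```shell
# Hypothetical list of extracted IP addresses, one per failed attempt.
ip_list='192.0.2.11
192.0.2.10
192.0.2.11
192.0.2.12
192.0.2.10
192.0.2.11'

# sort groups identical lines together, uniq -c prefixes each with its count,
# and sort -nr puts the highest count at the top.
counts=$(echo "${ip_list}" | sort | uniq -c | sort -nr)
echo "${counts}"
```

Each output line is a count followed by an IP address; the amount of leading whitespace that uniq -c adds in front of the count can vary between systems.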

88
00:06:35,460 --> 00:06:39,210
By the way, this sample log file contains entries from just one day.

89
00:06:39,210 --> 00:06:45,570
That means there were 6,749 failed login attempts from the IP address

90
00:06:45,600 --> 00:06:47,050
of 182.

91
00:06:47,130 --> 00:06:50,250
100.67.59.

92
00:06:50,460 --> 00:06:53,200
Now this could mean a couple of different things.

93
00:06:53,220 --> 00:06:57,770
The first thing that comes to my mind is someone was performing a brute force attack.

94
00:06:57,780 --> 00:07:03,540
However another possibility is that something is wrong with an account that we're using for some sort

95
00:07:03,540 --> 00:07:05,470
of automated process.

96
00:07:05,550 --> 00:07:11,250
Perhaps one of our servers in another data center is connecting to the system over SSH to do some

97
00:07:11,250 --> 00:07:11,630
work.

98
00:07:11,640 --> 00:07:17,820
But maybe the SSH key was accidentally changed or deleted, or the password for the account was changed,

99
00:07:17,820 --> 00:07:21,740
or some other configuration issue has cropped up here.

100
00:07:21,990 --> 00:07:26,460
So what I'm going to do is find the location of this IP address.

101
00:07:26,460 --> 00:07:31,830
Now there's a command called geoiplookup that returns the location of an IP address, and so let's

102
00:07:31,830 --> 00:07:32,610
run that now.

103
00:07:40,510 --> 00:07:42,930
That IP address is associated with China.

104
00:07:42,970 --> 00:07:48,970
If you happen to have servers in China or people who work from China this still might not be an attack

105
00:07:49,000 --> 00:07:51,490
but just a misconfiguration of some sort.

106
00:07:51,490 --> 00:07:56,960
However, let's assume our people only work in the United States, Canada, and Europe.

107
00:07:57,010 --> 00:08:03,160
Also let's assume our data centers are located in New York, London, and Amsterdam. In this particular case

108
00:08:03,250 --> 00:08:06,610
I would interpret this activity as a brute force attack.

109
00:08:06,720 --> 00:08:13,330
It would be nice to have this location information for any IP addresses that fail to log into our servers

110
00:08:13,330 --> 00:08:16,030
more than let's say 10 times.

111
00:08:16,060 --> 00:08:22,660
If we look at the data we have we have two columns a count in column one and an IP address in column

112
00:08:22,690 --> 00:08:23,480
two.

113
00:08:23,500 --> 00:08:27,740
We could loop through this output and test to see if the count is greater than 10.

114
00:08:27,850 --> 00:08:32,740
And then if it is, perform the geoiplookup on that associated IP address.

115
00:08:32,740 --> 00:08:34,000
Now let's take the command

116
00:08:34,000 --> 00:08:35,890
we worked out here on the command line,

117
00:08:36,010 --> 00:08:40,080
put it into a script, and start working on this last bit of logic.

118
00:08:49,210 --> 00:08:51,030
We'll give our script a header here.

119
00:09:06,840 --> 00:09:12,390
What I'm going to do is actually use a variable to define our limit. That way, if we decide our limit

120
00:09:12,390 --> 00:09:16,600
changes in the future we can quickly update that variable at the top of our script.

121
00:09:26,580 --> 00:09:28,820
Like I said I'm going to use a variable.

122
00:09:29,160 --> 00:09:34,530
You could actually create this script with an option and have the user specify that if you'd like but

123
00:09:34,530 --> 00:09:38,740
I'm just going to keep it simple and leave it at a hard-coded number here.

124
00:09:38,760 --> 00:09:44,760
However what we are going to do is ask the user to provide us a file and so that will be the first argument

125
00:09:44,760 --> 00:09:45,880
on the command line.

126
00:09:46,200 --> 00:09:49,530
And if it doesn't exist we need to tell them about it.

127
00:10:16,220 --> 00:10:21,530
OK, this is our little check here. If the file doesn't exist or we can't open it or read it, then we're

128
00:10:21,530 --> 00:10:25,550
going to tell them that we can't open the file they provided, or it's going to be blank if they don't

129
00:10:25,550 --> 00:10:30,580
provide a file and then we're going to exit with the exit status of one.
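
A minimal sketch of such a check, written here as a function so it can be tried without ending the current shell. The function name check_log_file and the exact wording of the error message are assumptions, not taken from the script on screen:

```shell
# Hypothetical helper: succeeds if the given file can be read,
# otherwise prints an error to standard error and returns 1.
check_log_file() {
  log_file="${1}"
  if [ ! -r "${log_file}" ]
  then
    # The file name is blank in the message when no argument was given.
    echo "Cannot open log file: ${log_file}" >&2
    return 1
  fi
  return 0
}

# In a script, the first command line argument would be checked like this:
#   check_log_file "${1}" || exit 1
```

The -r test is false both when the file does not exist and when it exists but cannot be read, so one test covers both cases.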

130
00:10:30,660 --> 00:10:35,080
Now what we need to do is loop through this data that we generated here.

131
00:10:44,520 --> 00:10:49,980
So I'm just going to paste that command we worked out and you remember that the command generated two

132
00:10:49,980 --> 00:10:50,750
columns of data.

133
00:10:50,760 --> 00:10:55,650
The first column being a count and then the second column being an IP address.

134
00:10:55,650 --> 00:11:03,270
So what we can do is pipe this to while read assign the first column to the variable named count and

135
00:11:03,270 --> 00:11:23,850
the second column to a variable named IP.

136
00:11:23,860 --> 00:11:30,610
So here we can just do a simple check here if the count is greater than the limit we set

137
00:11:36,360 --> 00:11:46,210
then we're going to determine a location. We'll use that geoiplookup command against the IP address

138
00:11:46,930 --> 00:11:48,880
and then we'll display this information.
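
The loop described here can be sketched as follows, assuming made-up count and IP pairs in place of the real pipeline output. The fallback to a placeholder when geoiplookup is not installed is an addition for this sketch only:

```shell
# Only report IP addresses with more failed attempts than this.
LIMIT='10'

# Hypothetical "count ip" pairs, in the shape produced by sort | uniq -c | sort -nr.
sample_counts='25 192.0.2.10
3 192.0.2.11'

report=$(echo "${sample_counts}" | while read -r COUNT IP
do
  # Skip any IP address that did not exceed the limit.
  if [ "${COUNT}" -gt "${LIMIT}" ]
  then
    # Assumption: use a placeholder when geoiplookup is not available.
    if command -v geoiplookup >/dev/null 2>&1
    then
      LOCATION=$(geoiplookup "${IP}")
    else
      LOCATION='unknown'
    fi
    echo "${COUNT} ${IP} ${LOCATION}"
  fi
done)
echo "${report}"
```

Keeping the limit in a variable at the top means a later change only touches one line.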

139
00:12:00,510 --> 00:12:05,670
One of my common typing mistakes here is to put the dollar sign outside of the quotation marks instead

140
00:12:05,670 --> 00:12:07,930
of inside and I've corrected that here.

141
00:12:08,310 --> 00:12:12,230
Let's see if I have any more. I'm just going to go to the top of the script for a quick look.

142
00:12:17,290 --> 00:12:18,460
OK that looks good.

143
00:12:18,460 --> 00:12:27,350
Let's go ahead and save our changes and test our script.

144
00:12:27,440 --> 00:12:30,340
We'll run it against the sample data we have here.

145
00:12:31,530 --> 00:12:35,020
So this is a good start but we need to clean up this output a bit.

146
00:12:35,040 --> 00:12:41,160
First let's remove that little bit of redundant information there, GeoIP Country Edition. Let's remove those

147
00:12:41,160 --> 00:12:48,180
words from the geoiplookup output. We could use cut and use a comma as the field separator, but that

148
00:12:48,180 --> 00:12:50,560
would leave a space before the country.

149
00:12:50,580 --> 00:12:56,610
However, we could use awk and include the space after the comma as the field separator, so let's do that

150
00:12:56,640 --> 00:12:57,400
instead.

151
00:12:59,130 --> 00:13:04,170
While we're here let's turn this output into CSV output and let's display a header.

152
00:13:10,310 --> 00:13:11,800
We'll display the Count,

153
00:13:11,940 --> 00:13:18,170
the IP address, and the Location here for our header.

154
00:13:18,180 --> 00:13:25,830
Now we're going to change this geoiplookup output. We're going to pipe that to awk with a field separator

155
00:13:25,830 --> 00:13:31,580
of a comma and a space and then that will leave us with the data we need here in the second field.
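
A short sketch of that cleanup step. The geoiplookup output line below is a stand-in typed in by hand so the example runs without the command installed; the count and IP address are the ones seen in this exercise:

```shell
# Stand-in for one line of geoiplookup output (assumed format).
geoip_output='GeoIP Country Edition: CN, China'

# Splitting on comma plus space leaves the country name, with no
# leading space, in the second field.
LOCATION=$(echo "${geoip_output}" | awk -F ', ' '{print $2}')

COUNT='6749'
IP='182.100.67.59'

# Header first, then one comma-separated line per IP address.
echo 'Count,IP,Location'
csv_line="${COUNT},${IP},${LOCATION}"
echo "${csv_line}"
```

Splitting on a comma alone, as cut -d ',' would, leaves a space in front of the country name, which is why awk with a two-character separator is used here.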

156
00:13:32,500 --> 00:13:38,470
And also, while I'm here, I'm going to change this echo command to have commas in between the data.

157
00:13:38,960 --> 00:13:39,910
And that looks good.

158
00:13:39,910 --> 00:13:44,420
Let's see what this brings us: show-attackers.sh syslog-sample.

159
00:13:44,840 --> 00:13:47,930
OK, great. We have a header of Count, IP, Location.

160
00:13:48,080 --> 00:13:56,230
Then on the first line we have a count of 6749 with an IP of 182.100.67.59 and a location of China.

161
00:13:56,230 --> 00:14:00,650
So this is exactly how we want our output to be displayed.

162
00:14:00,970 --> 00:14:06,260
While we're here let's make sure that the little file test that we wrote at the top of our script works.

163
00:14:06,340 --> 00:14:08,850
So let's provide some fake data to it.

164
00:14:11,970 --> 00:14:13,020
OK that looks good.

165
00:14:13,020 --> 00:14:15,130
We get an exit status of one.

166
00:14:15,360 --> 00:14:19,410
And let's make sure that if we don't provide a file that this also works.

167
00:14:19,500 --> 00:14:22,360
Sure enough, it says, hey, I can't open a file that doesn't exist.

168
00:14:22,710 --> 00:14:24,810
And we also get an exit status of one.

169
00:14:24,810 --> 00:14:29,080
So this completes the exercise walkthrough for this script.
