1
00:00:03,260 --> 00:00:08,540
In this lesson you'll learn how to sort data using the sort and uniq commands.

2
00:00:08,790 --> 00:00:13,680
Let's start out with some data we already have on the system, which is in the /etc/passwd file.

3
00:00:16,380 --> 00:00:23,800
If you want to sort the contents of a file alphabetically, you can use the sort command. Here you can

4
00:00:23,800 --> 00:00:30,690
see the line that starts with vbox is last because it is last alphabetically.

5
00:00:30,790 --> 00:00:35,470
If we look at the top of the output we'll see what's first alphabetically, so let me just pipe this to less

6
00:00:36,100 --> 00:00:40,550
and we see that a line starting with a is the first line because a is at the beginning of the alphabet.

7
00:00:40,750 --> 00:00:45,170
And for example V is near the end of the alphabet.

8
00:00:45,190 --> 00:00:47,440
Let me hit Q to get out of that view.

9
00:00:47,650 --> 00:00:49,630
If you want to reverse the order of the sort.

10
00:00:49,630 --> 00:00:51,280
Use the -r option

11
00:00:54,240 --> 00:01:00,410
so now that user is last instead of first because we reversed the order.

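The two sorts just shown can be sketched like this; a small throwaway file stands in for /etc/passwd so the example is self-contained:

```shell
# A made-up sample file to sort.
printf 'banana\napple\ncherry\n' > /tmp/fruit.txt
sort /tmp/fruit.txt      # alphabetical: apple, banana, cherry
sort -r /tmp/fruit.txt   # reversed: cherry, banana, apple
rm /tmp/fruit.txt
```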
12
00:01:00,660 --> 00:01:03,620
Let's see what happens when we use sort with numbers.

13
00:01:03,660 --> 00:01:08,350
I'm going to pull out the UID from the password file with the cut command.

14
00:01:08,580 --> 00:01:16,000
So I use a colon as the delimiter, and the UID is in the third field.

15
00:01:16,010 --> 00:01:20,710
Now let's use the output of the cut command as standard input to the sort command.

16
00:01:20,840 --> 00:01:25,000
So this is demonstrating that you don't have to run sort directly against files.

17
00:01:25,160 --> 00:01:29,610
It can accept standard input as well so we'll do this through a pipe.

18
00:01:29,670 --> 00:01:32,160
This might not be what you expected.

19
00:01:32,160 --> 00:01:35,190
The list is sorted but not numerically.

20
00:01:35,190 --> 00:01:39,040
We have 7, 74, 8, 81, etc.

21
00:01:39,150 --> 00:01:45,300
However when working with numbers you probably want a numeric sort so that 7 comes first then 8 then

22
00:01:45,300 --> 00:01:47,010
74 then 81.

23
00:01:47,010 --> 00:01:54,480
To do that, we can use the -n option.

24
00:01:54,510 --> 00:02:00,720
Now we have the smallest numbers first and the largest numbers last. Of course, you can reverse this order

25
00:02:00,720 --> 00:02:03,550
with the -r option.

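The difference between the default lexical sort and the numeric sort described above can be sketched with a few made-up numbers:

```shell
# Default sort compares character by character, so 74 lands between 7 and 8.
printf '81\n7\n74\n8\n' | sort      # 7, 74, 8, 81
# -n compares by numeric value instead.
printf '81\n7\n74\n8\n' | sort -n   # 7, 8, 74, 81
# Combine -n with -r to get the largest numbers first.
printf '81\n7\n74\n8\n' | sort -nr  # 81, 74, 8, 7
```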
26
00:02:03,560 --> 00:02:06,000
Let's talk about the du command quickly.

27
00:02:06,020 --> 00:02:08,490
It displays disk usage.

28
00:02:08,630 --> 00:02:13,490
So let's see how much space is being used in /var. By the way, there are going to be some files

29
00:02:13,490 --> 00:02:18,890
in there that are not readable by a normal user, so I'm going to use sudo to give us root privileges

30
00:02:18,890 --> 00:02:20,010
to look in there.

31
00:02:22,220 --> 00:02:28,280
You'll notice two columns. The first column is a number that represents disk usage, and by default this

32
00:02:28,280 --> 00:02:30,030
number is in kilobytes.

33
00:02:30,080 --> 00:02:35,270
The second column is the related directory that is using that particular amount of storage.

34
00:02:35,270 --> 00:02:41,360
Now let's perform a numeric sort to find out which directory in /var is using the most

35
00:02:41,360 --> 00:02:43,880
space.

36
00:02:44,630 --> 00:02:49,160
Of course, /var itself is at the very bottom since /var contains all the subdirectories within it.

37
00:02:49,160 --> 00:02:55,580
So it uses the most space, but the one before that, /var/lib, is the directory in /var that is using the most

38
00:02:55,580 --> 00:03:00,560
space, and then above that /var/lib/rpm, /var/lib/yum, and so on.

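The du-plus-sort idea can be sketched on a throwaway directory tree instead of /var, so it runs anywhere without sudo; the directory names are made up:

```shell
# Build a small tree with predictable relative sizes.
tmp=$(mktemp -d)
mkdir -p "$tmp/big" "$tmp/small"
dd if=/dev/zero of="$tmp/big/file" bs=1024 count=200 2>/dev/null
dd if=/dev/zero of="$tmp/small/file" bs=1024 count=50 2>/dev/null
# du -k reports usage in kilobytes; sort -n puts the biggest consumer last.
out=$(du -k "$tmp" | sort -n)
echo "$out"
rm -rf "$tmp"
```

The parent directory always lands on the last line, since its total includes every subdirectory, just like /var does in the lesson.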
39
00:03:00,620 --> 00:03:05,850
If we look at the /var/lib directory, it says it's using ninety-two thousand

40
00:03:05,840 --> 00:03:08,260
seven hundred and forty four kilobytes.

41
00:03:08,300 --> 00:03:14,300
If you don't want to see the sizes in kilobytes, you can use the -h option with du, which makes it

42
00:03:14,300 --> 00:03:17,130
print the sizes in a human readable format.

43
00:03:21,760 --> 00:03:27,510
So now we see at the bottom 93M, 4.0K, 91M, 83M, and so on.

44
00:03:27,550 --> 00:03:32,380
If we try to sort this human-readable data, it's not really going to work how we would like.

45
00:03:32,580 --> 00:03:36,920
Let's demonstrate that now.

46
00:03:37,010 --> 00:03:43,420
So in the bottom three lines, for example, you have 93 megabytes, then 96 kilobytes, and then 962 kilobytes,

47
00:03:43,430 --> 00:03:47,030
so it's not in the proper human readable order.

48
00:03:47,330 --> 00:03:49,660
And we could even try this with the -n option.

49
00:03:49,670 --> 00:03:56,870
And again we have 700K, 800K, and 962K, which are smaller amounts than those megabyte numbers that

50
00:03:56,870 --> 00:03:58,360
we were just talking about.

51
00:03:58,570 --> 00:04:04,250
Now the good news is that sort has a -h option that performs a human numeric sort.

52
00:04:04,400 --> 00:04:09,710
It understands that a number that ends in a capital G is a gigabyte, a number that ends in a capital M is a megabyte,

53
00:04:09,710 --> 00:04:10,480
and so on.

54
00:04:13,750 --> 00:04:19,990
So now we have 93M at the very bottom, then 91, then 83, then 5 megabytes, and so on.

55
00:04:20,000 --> 00:04:27,040
And likewise, the smaller sizes are at the top of this sorted list, so sort -h works with

56
00:04:27,040 --> 00:04:29,350
human readable numbers.

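The human numeric sort can be sketched with a few made-up du -h style sizes:

```shell
# sort -h understands human-readable size suffixes (K, M, G),
# so kilobyte values sort below megabyte values.
printf '93M\n962K\n96K\n4.0K\n' | sort -h   # 4.0K, 96K, 962K, 93M
```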
57
00:04:29,350 --> 00:04:34,660
In a previous lesson we used the netstat command to display open ports, so let's just walk through that

58
00:04:34,660 --> 00:04:35,560
again.

59
00:04:35,560 --> 00:04:42,750
So we have netstat with -n to display numbers instead of names, -u for UDP, -t for TCP,

60
00:04:43,090 --> 00:04:49,330
and -l for listening ports. So we enter that, and then we can see we have some data here, and the data that

61
00:04:49,330 --> 00:04:52,130
we're most interested in is in the fourth column.

62
00:04:52,360 --> 00:04:58,180
And also we need to remove the headers, and one way we talked about doing that was to look for something

63
00:04:58,180 --> 00:05:02,110
common in all the lines, and here a common thing is a colon.

64
00:05:02,110 --> 00:05:04,990
So let me just grep for a colon.

65
00:05:05,530 --> 00:05:10,480
And now we have the headers separated out of our way. Again, like I said, the data that we're

66
00:05:10,480 --> 00:05:12,020
after is in the fourth column.

67
00:05:12,160 --> 00:05:17,680
So we can use awk '{print $4}' to get that for us.

68
00:05:17,690 --> 00:05:25,040
Now what we can do is get the port numbers with awk, because they're all in the last field, the fields being

69
00:05:25,040 --> 00:05:32,310
separated by a colon in this instance. So we can run awk -F with a colon as the separator and then print

70
00:05:32,340 --> 00:05:33,910
$NF.

71
00:05:33,930 --> 00:05:37,150
So now we're left with a list of ports.

72
00:05:37,290 --> 00:05:43,080
However, they're not sorted, so let's fix that. Since they're all numbers, we can use sort -n to sort

73
00:05:43,080 --> 00:05:44,080
numerically.

74
00:05:44,430 --> 00:05:50,820
Now we have a sorted list, but there are duplicate ports. sort can handle this situation too with its

75
00:05:50,820 --> 00:05:57,600
-u option. -u stands for unique, and it only displays a line if it has not been previously

76
00:05:57,600 --> 00:05:58,750
displayed.

77
00:05:58,950 --> 00:06:02,660
So let's use -n -u and hit enter.

78
00:06:02,660 --> 00:06:05,950
So above we had 22, 22, 25, 25.

79
00:06:05,970 --> 00:06:11,640
But the output below, when we use the -u option, is 22, 25, and so on.

80
00:06:11,740 --> 00:06:16,720
By the way, we don't have to combine the -u option with the -n option; we can use -u

81
00:06:16,720 --> 00:06:17,510
on its own.

82
00:06:17,680 --> 00:06:21,760
So here we have a unique list of ports; they're just not numerically sorted.

83
00:06:21,820 --> 00:06:23,700
In addition to sort's -u option,

84
00:06:23,710 --> 00:06:29,290
there's a command called uniq, that's spelled u-n-i-q, which does something very similar to the -u

85
00:06:29,290 --> 00:06:35,470
option. With uniq, however, the lines coming into it have to be sorted in order for it to work, because

86
00:06:35,710 --> 00:06:38,430
it only compares the current line to the previous line.

87
00:06:38,440 --> 00:06:46,330
So let me show you. Let's do a sort -n and then pipe that to uniq.

88
00:06:46,350 --> 00:06:49,800
So now we have a unique list of ports with no duplicates.

89
00:06:49,860 --> 00:06:53,380
And just to show you that uniq doesn't work with an unsorted set of lines,

90
00:06:53,380 --> 00:06:58,790
let's remove the sort here and see what happens then.

91
00:06:58,800 --> 00:07:02,380
So now we have 22, 25, 22. Well, 22 is a repeat.

92
00:07:02,610 --> 00:07:06,530
And it didn't get removed because uniq is comparing 22 to 25.

93
00:07:06,810 --> 00:07:11,300
But if we had sorted it, it would have been 22, 22, and then uniq would have noticed,

94
00:07:11,320 --> 00:07:12,510
Ah that's a duplicate.

95
00:07:12,510 --> 00:07:15,960
So I'm only going to print the first one and not the second one.

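This adjacent-lines-only behavior can be sketched with the same three port numbers from the lesson:

```shell
# uniq only compares each line to the one just before it,
# so an unsorted repeat survives.
printf '22\n25\n22\n' | uniq             # 22, 25, 22
# Sorting first puts the duplicates next to each other.
printf '22\n25\n22\n' | sort -n | uniq   # 22, 25
```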
96
00:07:16,050 --> 00:07:20,820
So at first glance this might seem like an extra step like why would you ever want to use the unique

97
00:07:20,820 --> 00:07:25,710
command if you have to give it sorted data anyway, and sort already has its -u option.

98
00:07:25,740 --> 00:07:32,500
Well, when you want to know how many occurrences of each line there were, use uniq -c.

99
00:07:32,550 --> 00:07:40,120
So let's go up here to our sort -n piped to uniq, and let's add the -c option. The first column

100
00:07:40,120 --> 00:07:45,970
is the number of times the line appeared in the output followed by the line or the output itself.

101
00:07:46,270 --> 00:07:49,450
So here we can see there were two instances of 22.

102
00:07:49,450 --> 00:07:51,230
Two instances of 25.

103
00:07:51,370 --> 00:07:54,090
One instance of 68 and so on.

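The counting behavior described above can be sketched with the same port numbers:

```shell
# uniq -c prefixes each unique line with its number of occurrences:
# 2 for 22, 2 for 25, and 1 for 68.
printf '22\n22\n25\n25\n68\n' | uniq -c
```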
104
00:07:54,100 --> 00:07:59,530
Let's say you want to find out how many syslog messages a program is generating, and you can do that

105
00:07:59,530 --> 00:08:01,030
by doing something like this.

106
00:08:01,030 --> 00:08:09,600
So let's look at the data we're working with in /var/log/messages. If we count the fields here, 1, 2,

107
00:08:09,600 --> 00:08:16,620
3, 4, 5, the fifth field contains the application name or the program name that is writing to syslog.

108
00:08:16,620 --> 00:08:17,990
So let's pull that out.

109
00:08:22,640 --> 00:08:28,010
So here you can see systemd, then another command, then back to systemd, and at the bottom you see

111
00:08:28,070 --> 00:08:29,510
systemd-logind and so on.

111
00:08:30,390 --> 00:08:37,980
So let's sort this list, and now let's feed this list to uniq -c to get a count.

112
00:08:38,030 --> 00:08:43,730
So here we have some counts: one occurrence of lvm, eight of network, eighty-six of NetworkManager, and

113
00:08:43,730 --> 00:08:44,480
so on.

114
00:08:44,600 --> 00:08:50,660
But let's say we want to sort this output. So let's take the output from uniq and run it back through

115
00:08:50,660 --> 00:08:54,110
sort, so we can do this with sort -n.

116
00:08:54,110 --> 00:08:59,950
So now we know there were 342 messages generated from the kernel, 310 from systemd, and so on.

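The whole counting pipeline can be sketched like this; the log lines are made up to stand in for /var/log/messages, with the program name in field 5 as in the lesson:

```shell
# Hypothetical syslog-style lines standing in for /var/log/messages.
log='Jan 1 10:00:01 host kernel: boot
Jan 1 10:00:02 host systemd: started
Jan 1 10:00:03 host kernel: init
Jan 1 10:00:04 host kernel: done'
# Pull the program name, sort so duplicates are adjacent, count them,
# then numeric-sort so the noisiest program ends up on the last line.
out=$(echo "$log" | awk '{print $5}' | sort | uniq -c | sort -n)
echo "$out"
```

On a real system the first stage would be `sudo cat /var/log/messages` (or just the file as an awk argument) feeding the same pipeline.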
117
00:09:00,050 --> 00:09:04,760
You can apply this to all sorts of situations where you want to know how many occurrences of something

118
00:09:04,760 --> 00:09:05,860
there are.

119
00:09:05,930 --> 00:09:10,970
For example if you want to know what IPs are hitting your web server the most you can strip out the

120
00:09:10,970 --> 00:09:16,940
IP addresses, sort them, feed them to uniq -c, and then you'll end up with a count of hits by unique

121
00:09:17,000 --> 00:09:18,530
IP address.

122
00:09:18,530 --> 00:09:23,380
While we're on the subject of counting I want to spend just a quick minute here on the wc command.

123
00:09:23,420 --> 00:09:29,570
You can think of it as standing for word count, but it not only counts words; it can count bytes, characters,

124
00:09:29,600 --> 00:09:30,860
and lines.

125
00:09:30,860 --> 00:09:34,390
Personally I end up using the line count option most often.

126
00:09:34,550 --> 00:09:39,210
So let's provide the /etc/passwd file as an argument to the wc command.

127
00:09:41,660 --> 00:09:44,870
The first column is the number of lines in the file.

128
00:09:44,870 --> 00:09:50,600
The second column is the number of words, and the third column is the number of characters. Just to be

129
00:09:50,600 --> 00:09:51,040
clear.

130
00:09:51,040 --> 00:09:57,650
wc doesn't really understand language; it considers a word to be any non-zero-length sequence of characters

131
00:09:57,920 --> 00:09:59,790
delimited by a white space.

132
00:09:59,810 --> 00:10:02,870
We can make wc display just a word count with -w,

133
00:10:06,370 --> 00:10:07,940
just a byte count with -c,

134
00:10:11,650 --> 00:10:17,970
and finally just a line count with -l.

135
00:10:18,030 --> 00:10:22,650
This tells us that there are twenty-five accounts on the system, because there's one account on each

136
00:10:22,650 --> 00:10:25,050
line in the /etc/passwd file.

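The wc options just covered can be sketched on a couple of passwd-style lines in a throwaway file:

```shell
# Two made-up passwd-style lines.
printf 'root:x:0:0:root:/root:/bin/bash\nbin:x:1:1:bin:/bin:/sbin/nologin\n' > /tmp/sample
wc /tmp/sample      # lines, words, characters
wc -l /tmp/sample   # just the line count
wc -w /tmp/sample   # just the word count
wc -c /tmp/sample   # just the byte count
rm /tmp/sample
```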
137
00:10:25,050 --> 00:10:29,310
Let's say you wanted to know how many accounts are using the bash shell.

138
00:10:29,370 --> 00:10:36,450
First, you could display the lines that match the pattern bash with the grep command.

139
00:10:36,480 --> 00:10:40,380
Maybe this isn't the greatest example because we can quickly count that there are two lines but let's

140
00:10:40,380 --> 00:10:45,360
say you have hundreds of accounts on a system and there's a lot more output than you can just visually

141
00:10:45,360 --> 00:10:50,730
scan and recognize at a glance. In that case, what you would want to do is count the number of lines

142
00:10:50,730 --> 00:10:51,610
in the output.

143
00:10:51,690 --> 00:10:57,270
So you feed the output of the command into wc with the -l option.

144
00:10:57,270 --> 00:11:01,320
Now I know someone is going to bring this up if I don't put it in the video so I just want to be clear

145
00:11:01,530 --> 00:11:08,150
and say that in this particular situation you can also use the -c option for grep to perform a

146
00:11:08,150 --> 00:11:14,730
count. So we can do grep -c, which gives a count of how many lines contain bash in the /etc/passwd file.

147
00:11:14,730 --> 00:11:20,400
However if you're not using a command that performs a count then you'll end up having to pipe that output

148
00:11:20,400 --> 00:11:23,190
to wc to perform the count for you.

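The two counting approaches can be sketched side by side on made-up passwd-style lines:

```shell
data='root:x:0:0:root:/root:/bin/bash
daemon:x:2:2:daemon:/sbin:/sbin/nologin
vagrant:x:1000:1000::/home/vagrant:/bin/bash'
n1=$(echo "$data" | grep bash | wc -l)   # count via a pipe to wc
n2=$(echo "$data" | grep -c bash)        # grep counting on its own
echo "$n1 $n2"
```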
149
00:11:23,220 --> 00:11:25,170
OK let's get back to sorting.

150
00:11:25,180 --> 00:11:31,290
There's one last option to sort that I want to cover before we wrap things up, and that option is

151
00:11:31,290 --> 00:11:35,120
-k, which allows you to specify a sort key.

152
00:11:35,130 --> 00:11:38,820
So far we've been sorting on the very first bit of data in a line.

153
00:11:38,820 --> 00:11:44,460
If you have data separated into multiple fields perhaps you want to sort on a field other than the first

154
00:11:44,460 --> 00:11:45,010
one.

155
00:11:45,030 --> 00:11:47,160
So let's go back to our password file.

156
00:11:47,160 --> 00:11:55,310
So let's just cat it. Now let's say we want to sort the /etc/passwd file based on UID, and the UID is in the

157
00:11:55,310 --> 00:11:59,970
third column, with each column separated by a colon. By default,

158
00:11:59,990 --> 00:12:03,010
sort uses white space as the field separator.

159
00:12:03,230 --> 00:12:06,910
So to tell sort to use a colon, we need to use the -t option.

160
00:12:07,070 --> 00:12:10,580
Then we can use the -k option to provide a sort key.

161
00:12:10,610 --> 00:12:11,320
The simplest sort

162
00:12:11,330 --> 00:12:20,220
key is a number which represents the field to sort by. So we cat the password file, tell sort to use a colon

163
00:12:20,220 --> 00:12:22,260
as the field separator, and tell it to sort

164
00:12:22,290 --> 00:12:23,590
on the third field.

165
00:12:23,610 --> 00:12:26,200
This third field happens to be comprised of numbers.

166
00:12:26,250 --> 00:12:32,120
So we're going to use a numeric sort with -n. Of course, we can combine this with other options, like

167
00:12:32,120 --> 00:12:34,760
-r for a reverse sort, as well.

168
00:12:34,880 --> 00:12:41,570
So as we all know, the root account has a UID of 0, and then you can see the accounts here on this particular

169
00:12:41,570 --> 00:12:48,000
system: bin has a UID of 1, daemon has a UID of 2, and so on.

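The field-keyed sort can be sketched on a few made-up passwd-style lines, with the UID in the third colon-separated field:

```shell
# Sort numerically on the third colon-separated field (the UID).
printf 'adm:x:3:\nroot:x:0:\ndaemon:x:2:\nbin:x:1:\n' | sort -t : -k 3 -n
# root:x:0:, bin:x:1:, daemon:x:2:, adm:x:3:
```

Adding -r would flip it so the highest UID comes first.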
170
00:12:48,320 --> 00:12:54,560
Let's do a little demonstration on how to analyze a web server log file using sort and unique.

171
00:12:54,560 --> 00:13:00,410
Let's say you want to know how many times a particular URL was visited. First, let's look at what

172
00:13:00,410 --> 00:13:01,370
we have to work with.

173
00:13:01,370 --> 00:13:07,590
So I have an access log file here in /vagrant.

174
00:13:07,630 --> 00:13:11,110
My first goal is to extract the URL portion from the file.

175
00:13:11,110 --> 00:13:13,260
Now there are multiple ways to do this.

176
00:13:13,270 --> 00:13:18,670
However, what I notice is that the URL is contained within a set of quotation marks.

177
00:13:18,820 --> 00:13:24,790
So let me split on that and see where that takes us. We'll just feed this to cut.

178
00:13:24,820 --> 00:13:29,640
We'll use double quotation marks as a delimiter and we'll print the second field.

179
00:13:30,720 --> 00:13:31,940
And hit enter.

180
00:13:33,180 --> 00:13:36,200
By the way, I didn't have to cat that into cut like I did here.

181
00:13:36,300 --> 00:13:38,400
What I could do is actually supply that file to cut.

182
00:13:38,400 --> 00:13:39,510
Let me just do that here now.

183
00:13:39,510 --> 00:13:42,670
cut -d with a double quote as the delimiter, -f 2,

184
00:13:42,730 --> 00:13:46,040
against access_log, and we get the same result.

185
00:13:46,270 --> 00:13:51,060
Now it looks like we're left with three columns of data separated by a single space.

186
00:13:51,130 --> 00:13:56,290
The second column has the URL in it, so let's pull that out. We can do this with the cut command

187
00:13:56,290 --> 00:13:56,970
as well.

188
00:14:01,340 --> 00:14:05,580
Again if you saw things differently or think in a different way that's perfectly fine.

189
00:14:05,600 --> 00:14:11,090
Perhaps your mind went to counting the column numbers first, something like this, so let me just cat the

190
00:14:11,090 --> 00:14:12,310
access log here.

191
00:14:12,500 --> 00:14:17,960
And the first column is an IP address, so that's column 1, a dash is column 2, another dash is column 3.

192
00:14:17,960 --> 00:14:21,640
OK so one two three four five six seven.

193
00:14:21,640 --> 00:14:26,800
It looks like the 7th column is where the URLs are contained, so let's test that.

194
00:14:26,800 --> 00:14:31,260
So let's do awk '{print $7}' access_log.

195
00:14:31,500 --> 00:14:31,770
OK.

196
00:14:31,760 --> 00:14:34,040
And so we end up with the same data.

197
00:14:34,070 --> 00:14:38,460
I just want to be clear that there is no one exact perfect way to do this.

198
00:14:38,510 --> 00:14:44,270
So I'd just use whatever makes sense to you and however you visualize the data. Just keep extracting

199
00:14:44,270 --> 00:14:48,190
parts of it and transforming it until it looks like what you need.

200
00:14:48,440 --> 00:14:48,750
OK.

201
00:14:48,770 --> 00:14:50,090
So let's continue on.

202
00:14:50,090 --> 00:14:53,570
Let me go back to my command here using cut.

203
00:14:53,570 --> 00:14:59,840
Now I want to count the number of times each one of those URLs was visited.

204
00:14:59,840 --> 00:15:04,400
So I know I can do that with the uniq command, and I also know that I need to provide uniq with

205
00:15:04,400 --> 00:15:05,790
sorted data first.

206
00:15:05,810 --> 00:15:11,960
So what I have to do is pipe this to sort, and then I can go back and pipe it through uniq with the

207
00:15:11,970 --> 00:15:14,540
-c option to let it do the counting for me.

208
00:15:14,780 --> 00:15:18,740
So now we have a count in column 1 and the URL in column 2.

209
00:15:18,770 --> 00:15:24,470
So now what we can do is actually sort this uniquely counted output with the sort command.

210
00:15:24,470 --> 00:15:29,950
So let me bring in my command and run this through sort -n.

211
00:15:30,020 --> 00:15:38,640
So it looks like we had 1,271 visits to /wp-admin, 1,265 to /explorer, and so on.

212
00:15:38,690 --> 00:15:44,840
Now let's say we only want to display the top three most visited URLs, and we can do this simply

213
00:15:44,840 --> 00:15:49,820
by displaying the last three lines of output with the tail command. So we'll take this big long command

214
00:15:49,820 --> 00:15:55,100
that we've been building up and we'll pipe it yet again to another command, and this command is tail

215
00:15:55,310 --> 00:15:57,490
-3 to print the last three lines.

216
00:15:57,500 --> 00:16:04,130
So here you see the three most visited URLs according to that access log file.

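The full access-log pipeline built up above can be sketched end to end; the log lines are made up to imitate the access_log format, with the request inside double quotes:

```shell
# Hypothetical access-log lines.
log='10.0.0.1 - - [date] "GET /index HTTP/1.1" 200 99
10.0.0.2 - - [date] "GET /about HTTP/1.1" 200 99
10.0.0.1 - - [date] "GET /index HTTP/1.1" 200 99'
# Split on the quotes to isolate the request, keep the URL (2nd
# space-separated field), count each URL, numeric-sort the counts,
# and keep the last (busiest) three lines.
out=$(echo "$log" | cut -d '"' -f 2 | cut -d ' ' -f 2 | sort | uniq -c | sort -n | tail -3)
echo "$out"
```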
217
00:16:04,130 --> 00:16:08,240
Now I'm going to go ahead and put that command into a script so I don't have to solve this same problem

218
00:16:08,240 --> 00:16:09,450
again.

219
00:16:09,470 --> 00:16:14,430
So I'm actually going to copy it then I will edit my script here.

220
00:16:18,670 --> 00:16:21,860
Start with the shebang, then we'll tell what the script is doing.

221
00:16:31,740 --> 00:16:34,380
So let the user pass in the log file.

222
00:16:36,260 --> 00:16:40,070
And we want to make sure that that log file exists so we can do a quick check here.

223
00:16:45,310 --> 00:17:00,740
So this says: if the log file does not exist, then we'll give them an error message.

224
00:17:00,750 --> 00:17:07,320
Now I can paste my command that I copied earlier, and then I'll just change this to be the log file

225
00:17:07,350 --> 00:17:08,750
variable.

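The script being assembled here might look like the sketch below. The script name, comment, and error wording are assumptions; the pipeline and existence check come from the lesson. Writing it to a file keeps the sketch runnable:

```shell
# Hypothetical version of the script being written in the lesson.
cat > /tmp/top-urls.sh <<'EOF'
#!/bin/bash
# Display the three most visited URLs in a web-server access log.
LOG_FILE="$1"
if [ ! -e "$LOG_FILE" ]
then
  echo "Cannot open ${LOG_FILE}." >&2
  exit 1
fi
cut -d '"' -f 2 "$LOG_FILE" | cut -d ' ' -f 2 | sort | uniq -c | sort -n | tail -3
EOF
chmod +x /tmp/top-urls.sh
```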
226
00:17:08,890 --> 00:17:14,810
Let me save my changes, make my script executable, and then give it a try.

227
00:17:14,810 --> 00:17:18,250
So if we pass no data, it can't open anything.

228
00:17:18,370 --> 00:17:22,650
OK, it can't open asdf because that doesn't exist.

229
00:17:22,810 --> 00:17:26,430
So let's actually give it a path to our file that does exist.

230
00:17:26,650 --> 00:17:32,330
And here we go: it runs the command against what we provided it, giving us the three most visited

231
00:17:32,410 --> 00:17:33,850
URLs in that file.

232
00:17:34,120 --> 00:17:39,370
So to recap, in this lesson you learned how to sort data using the sort command. You learned how to use the

233
00:17:39,370 --> 00:17:41,980
-n option to sort numerically.

234
00:17:41,980 --> 00:17:46,250
You also learned how to use the -r option to reverse the sort order.

235
00:17:46,330 --> 00:17:52,230
From there you learned how to display only unique lines using sort -u and the uniq command.

236
00:17:52,330 --> 00:17:55,750
You also learned how to count items with the wc command.

237
00:17:55,750 --> 00:18:02,010
Finally, you used the -t and -k options to sort data based on a specific field.
