WEBVTT 0 00:02.590 --> 00:03.690 Welcome back! 1 00:03.730 --> 00:10.600 The empty file finder program works perfectly fine, but let's say it needs to process ten thousands 2 00:10.600 --> 00:12.610 of files in a directory. 3 00:12.610 --> 00:14.800 Let me show you how we can optimize it. 4 00:20.440 --> 00:21.280 In this program, 5 00:21.280 --> 00:23.050 everything is fine actually. 6 00:23.110 --> 00:30.750 However, one thing that I can optimize is allocating the backing array of the names slice only once. As 7 00:30.790 --> 00:31.770 you know, here, 8 00:31.870 --> 00:39.020 the program appends to the names slice in a loop. Doing so allocates a new backing array whenever the 9 00:39.020 --> 00:40.760 backing array becomes full. 10 00:41.210 --> 00:47.330 Instead, I can calculate the required size of the slice once, so that it will allocate the backing array 11 00:47.480 --> 00:51.530 only once. There are two ways to calculate the required size. 12 00:51.530 --> 00:57.110 The first one is using a heuristic like the average file name lengths across file systems and then 13 00:57.110 --> 01:01.120 multiplying it with the number of files inside the folder. 14 01:01.140 --> 01:06.240 The second way is calculating exactly how many bytes the program is going to need 15 01:06.240 --> 01:11.340 by looking at the name of the files. Let's take a look at the first way. 16 01:11.400 --> 01:14.910 Here is a table for the maximum file name lengths across 17 01:14.910 --> 01:20.790 common file systems. As you can see, on average it is 255 bytes. 18 01:20.790 --> 01:26.760 By the way, a single byte doesn't necessarily mean a single character, a single character can be more 19 01:26.760 --> 01:30.110 than one byte. However, to keep things simpler here, 20 01:30.180 --> 01:31.690 I'm going to assume it like so. 21 01:31.720 --> 01:40.170 OK, so I can multiply the number of files with this number 255 to find out the total required space. 22 01:40.230 --> 01:40.930 OK 23 01:41.190 --> 01:48.870 I'm also going to add a new line character after each file name. So, I'm going to add 1 to make it 256. 24 01:48.870 --> 01:55.000 Remember: This is a nil slice. So, it doesn't have a backing array until the program appends to it. 25 01:55.110 --> 02:01.470 But now, I've got the total required bytes, so I can allocate the backing array of the names slice once. 26 02:01.650 --> 02:02.460 To do that, 27 02:02.490 --> 02:09.420 I'm going to use the make function like so. Remember: When I give the length of 0 to the make function, I can 28 02:09.420 --> 02:12.930 easily add new elements to the beginning of the slice (using append()). 29 02:12.960 --> 02:15.910 Alright, let me run it, okay. 30 02:15.960 --> 02:17.670 It works perfectly. 31 02:17.670 --> 02:20.540 Now it allocates a backing array only once. 32 02:20.730 --> 02:27.300 Let me show you the total allocated bytes for the names slice by printing the capacity of the slice, 33 02:27.300 --> 02:31.990 like so. 34 02:32.240 --> 02:37.330 Let me also show you the total written bytes to the out.txt file. 35 02:37.580 --> 02:42.410 As you can see, I allocate an unnecessarily huge number of bytes. 36 02:42.680 --> 02:49.200 I allocate about one and half thousand bytes but I save only 33 bytes to the output file. 37 02:50.180 --> 02:51.860 To further optimize this, 38 02:51.860 --> 03:00.710 now, let's take a look at the second way. As you can see, for each file in the folder I reserve 256 bytes, 39 03:00.740 --> 03:03.130 no matter whether the file is empty or not. 40 03:03.380 --> 03:07.570 Let's optimize it further by calculating the exact space required. 41 03:07.910 --> 03:13.040 I can count the total length of every file name inside the folder to do that. 42 03:13.130 --> 03:16.670 I need to double this very same loop, like this. 43 03:16.670 --> 03:19.190 I don't know the total required bytes yet. 44 03:19.190 --> 03:22.030 So let me declare the total variable here, 45 03:22.070 --> 03:29.090 like so. Now I'm going to add the length of the next file name to the total like so. You know, the loop 46 03:29.180 --> 03:33.060 below also adds a new line after each file name. 47 03:33.350 --> 03:36.380 So I'm going to add one for the new line as well. 48 03:37.060 --> 03:39.410 Okay let me rerun it. 49 03:39.410 --> 03:45.500 As you can see, now the program only allocates 33 bytes and writes 33 bytes to the 50 03:45.500 --> 03:46.530 file. 51 03:46.580 --> 03:48.940 It is as efficient as it can be. 52 03:49.010 --> 03:53.870 Actually, this optimization isn't vital because this is a batch program. 53 03:53.870 --> 03:56.320 I mean a batch program runs only once, 54 03:56.360 --> 03:57.440 then it quits. 55 03:57.440 --> 04:03.860 However, sometimes you can use the same code in a long-running program such as a server. Especially for 56 04:03.860 --> 04:04.720 that case, 57 04:04.790 --> 04:07.330 you might want to optimize it as I did here. 58 04:07.340 --> 04:10.970 Otherwise, as I said before, you can trust the append function, 59 04:10.970 --> 04:12.590 most of the time... 60 04:12.790 --> 04:13.350 All right. 61 04:13.390 --> 04:14.170 That's all for now. 62 04:14.620 --> 04:15.820 Thank you for watching so far. Bye!