WEBVTT

00:09.600 --> 00:16.140
Welcome. In this lecture, I'm going to show you how to make a very, very simple search engine. For now,

00:16.140 --> 00:20.070
it will search the words inside a given string and let people query it.

00:20.520 --> 00:22.440
Now, let me show you how it's going to work.

00:24.210 --> 00:30.420
Let's say that you have a series of words, or in other words, a slice of words like these ones. As

00:30.480 --> 00:33.670
you know, each element here is a string value, right?

00:34.530 --> 00:40.280
OK, let's also say that the user wants to search inside these words by using a query like this.

00:40.950 --> 00:46.950
This query is also a slice of words, so the user can query the system for multiple words.

00:47.400 --> 00:51.840
For example, here, the imaginary user searches for the "again" and "and" words.

00:53.270 --> 01:00.740
So the user searches for these highlighted words inside the words, right? As you can see, there are

01:00.740 --> 01:02.970
multiple matches. For example,

01:03.050 --> 01:08.000
the "again" word appears three times, and the "and" word appears twice.

01:08.360 --> 01:10.700
Let me show you how to build this system.

01:18.510 --> 01:25.830
First, I'm going to put the words inside a constant named corpus here: "lazy cat jumps again and again

01:25.830 --> 01:26.390
and again".

01:27.060 --> 01:32.580
By the way, what's inside the corpus isn't important. The important thing is the algorithm that I'm

01:32.580 --> 01:33.840
going to program in a minute.

01:35.240 --> 01:39.200
I put it into a constant here because I don't want this string to obscure the code.

01:40.420 --> 01:40.990
All right.

01:41.180 --> 01:46.300
As I've done previously, I'm going to split the corpus into its words using the strings.Fields function.

01:50.270 --> 01:53.920
Now, the words variable contains the words as a slice of strings, right?

01:55.380 --> 02:02.670
As you can see here. Think of the words variable as a database that people can query. Its contents

02:02.670 --> 02:06.740
might have come from a file, from the web, from the network, and so on.

02:07.110 --> 02:11.490
So it doesn't matter where its content comes from, as long as you have that corpus.

02:12.720 --> 02:17.730
All right, now I'm going to get the queries from the command line using os.Args, as usual.

02:20.720 --> 02:23.810
Here I used the expression that I showed you before.

02:24.290 --> 02:27.650
Remember, this returns a new slice without its first item.

02:27.870 --> 02:28.220
OK.

02:29.590 --> 02:36.700
OK, now I'm going to type two nested loops using for. The first one will loop over the query words,

02:37.090 --> 02:40.990
so each query will be searched against the words, the corpus,

02:40.990 --> 02:46.750
our database. By the way, of course, this is not a perfect search algorithm, but it will help me

02:46.750 --> 02:49.090
to show you how a simple search algorithm works.

02:51.600 --> 02:56.460
Anyway, first, let me type the first loop that ranges over the slice like this.

02:58.220 --> 03:03.500
Well, to start, here I didn't use the index variable, and I skipped it by using the blank identifier

03:03.500 --> 03:08.420
because I don't need it, right? In this loop, I'm only going to check each query word.

03:09.500 --> 03:11.450
All right, now let me show you how it works.

03:12.350 --> 03:14.090
Let me first comment out this words

03:14.090 --> 03:21.800
variable here to prevent the unused variable warning. Let me also add a Println here to print the query words.

03:23.910 --> 03:26.220
go run main.go again and

03:28.720 --> 03:32.020
Let's try again, and this time without the words.

03:33.020 --> 03:34.810
OK, it prints the words.

03:35.330 --> 03:36.020
Let's get back.

03:37.030 --> 03:41.770
Let me remove this Println here and uncomment the words variable here.

03:42.900 --> 03:43.320
OK.

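NOTE
A rough sketch of the first loop at this stage, under the assumption that the query comes from os.Args with the program name dropped. The helper names queryWords and printQuery are hypothetical and were introduced only to keep the example easy to test.
```go
package main
import (
	"fmt"
	"os"
)
// queryWords is a hypothetical helper. It returns a slice without the
// first element, which for os.Args is the program name.
func queryWords(args []string) []string {
	return args[1:]
}
// printQuery prints each query word on its own line.
func printQuery(query []string) {
	// The index is not needed here, so the blank identifier discards it.
	for _, q := range query {
		fmt.Println(q)
	}
}
func main() {
	printQuery(queryWords(os.Args))
}
```
Run with the arguments again and, this prints the two query words on separate lines; with no arguments it prints nothing.
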
03:44.410 --> 03:46.690
Now, I'm going to add the next loop inside here.

03:48.060 --> 03:53.630
This will loop over the words variable. So inside this nested loop, I'm going to search for each

03:53.650 --> 03:55.540
query word in the words slice.

03:56.220 --> 03:56.520
OK.

03:57.450 --> 04:04.560
So how can I check whether a query matches a word inside the words slice? I can use a simple if statement,

04:04.560 --> 04:06.090
right? Let's do that.

04:07.070 --> 04:11.880
So when a query matches a word, then Go will simply execute this if statement.

04:12.740 --> 04:18.500
Now I'm going to print a message which prints the position of the word inside the corpus and the word

04:18.500 --> 04:19.070
itself.

04:20.050 --> 04:24.640
Now I'm going to type the position of the word inside the words with this verb.

04:25.990 --> 04:31.210
Then I'm going to print the word using %q. Next, I'm going to pass the position

04:31.210 --> 04:31.780
like this.

04:33.550 --> 04:35.380
Then I'm going to pass the word like this.

04:37.310 --> 04:39.680
I used i+1 here. Do you know why?

04:40.680 --> 04:46.890
I did so because I want to print the position of the word, not its index inside the slice. It is like

04:46.890 --> 04:49.500
one, two, three, not like zero, one, two.

04:50.450 --> 04:52.860
All right, let's run this now and see it in action.

04:53.510 --> 05:01.310
Let's try it with "again" and "and", with "again" only, with "and", and without any arguments.

05:02.730 --> 05:03.900
OK, it works, right?

05:03.940 --> 05:08.310
Cool. However, as you can see, it brings back duplicate results.

05:09.160 --> 05:13.930
For example, when I search for the "again" word, it brings back multiple results, right?

05:15.820 --> 05:17.740
OK, let's talk about this a little bit.

05:19.060 --> 05:25.060
Let's say that the system should only return unique words, so it should skip searching for more words

05:25.060 --> 05:28.550
after it finds the word. For example, here,

05:28.570 --> 05:31.290
the user searches for multiple words, right?

05:31.900 --> 05:38.580
So the system will only print the first matched word once, even though there are multiple matches.

05:39.190 --> 05:40.780
So it'll skip the duplicates.

05:41.620 --> 05:42.550
How can you do that?

05:44.180 --> 05:49.440
Well, in the search logic, you can use a break statement and quit from the nested loop when it finds

05:49.440 --> 05:49.960
the word.

05:50.270 --> 05:50.900
Let me show you.

05:53.090 --> 05:59.450
As I said, I need to break from the nested loop when it finds one of the query words. So to do that,

05:59.450 --> 06:01.550
I just need to add a break statement here.

06:03.530 --> 06:04.370
OK, let's try this.

06:05.850 --> 06:10.980
As you can see, as soon as it finds the query word, it stops searching the rest of the words

06:10.980 --> 06:12.000
inside the corpus.

06:13.070 --> 06:17.630
The break statement terminates the nested loop after it finds the searched word.

06:18.620 --> 06:21.290
Now it only prints the unique words. Cool.

06:23.980 --> 06:29.830
Now it's exercise time. I prepared little exercises for you. You can find them all in the course repository,

06:29.830 --> 06:31.170
of course. Here,

06:31.180 --> 06:32.890
I'm only going to show you one of them.

06:33.190 --> 06:33.900
Let's check it out.

06:35.600 --> 06:41.780
The first exercise's name is Case Insensitive Search. Currently, the word finder program doesn't return

06:41.780 --> 06:44.870
results for words with different cases, right?

06:46.450 --> 06:52.390
For example, it doesn't find the word when the user queries the word with uppercase letters. Let's

06:52.390 --> 06:54.730
say that the user runs the program like this.

06:58.630 --> 06:59.590
Or like this.

07:04.440 --> 07:05.130
Or this.

07:08.220 --> 07:14.760
OK, here is your exercise: for all the cases here, the word finder program should find the "lazy" keyword

07:14.760 --> 07:15.830
inside the corpus.

07:16.290 --> 07:18.310
Of course, this "lazy" keyword can change.

07:18.330 --> 07:19.440
This is just an example.

07:20.190 --> 07:25.580
OK, for more details about this exercise and for all the other exercises, please check out the course

07:25.950 --> 07:27.210
repository, as always.

07:28.250 --> 07:32.290
All right, there is only one step left: labeled statements. They are important.

07:33.230 --> 07:34.470
OK, I hope to see you there.

07:34.610 --> 07:35.260
Bye for now.