WEBVTT 00:02.430 --> 00:06.130 Hi and welcome to the last episode of this course. 00:06.140 --> 00:12.170 Today we will be building a distributed search engine, a very little one. To do so, 00:12.300 --> 00:19.880 we're going to be using structs, slices, if statements, for loops, range clauses, channels, goroutines, select 00:19.910 --> 00:29.100 statements, in a very, very small number of lines. The search engine consists of one database, which is 00:29.760 --> 00:37.110 a collection of users. As you can see, the user is just a struct with an email and a name, and the database 00:37.170 --> 00:40.370 is a collection of those with random values. 00:40.380 --> 00:46.800 So what we have to do is to implement some sort of function that, given an input, which is going to be 00:46.800 --> 00:51.810 an email, returns the name of whoever owns that email. 00:51.810 --> 00:52.110 Right. 00:52.800 --> 00:58.110 And we're gonna be doing this using a struct that we will call Worker. 00:58.110 --> 00:59.640 So let's get started. 00:59.700 --> 01:07.890 The first thing we have to do is to create a worker struct, so let's do type Worker struct, and 01:07.890 --> 01:14.300 this worker will have a copy of our database, which we'll call users. 01:15.420 --> 01:20.920 So we have the users, and let's build a constructor for these workers. 01:20.940 --> 01:28.090 So NewWorker is going to receive the database, 01:28.460 --> 01:31.580 and it will return a Worker pointer. 01:31.680 --> 01:33.050 Right. 01:33.120 --> 01:41.030 So return a Worker, and we'll pass the users as an argument. 01:41.030 --> 01:41.650 Right. 01:41.670 --> 01:47.490 And now we need to build the Find function, which is going to be a method of the Worker struct. 01:47.570 --> 01:56.030 So the receiver is gonna be the Worker, and the Find function will receive an email as a string and then will return 01:56.440 --> 01:57.210 a user. 01:57.230 --> 01:57.620 OK. 01:57.920 --> 01:59.600 So how do we do this?
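A minimal sketch of the types described so far, for readers following without the video. The field names, their capitalization, and the two placeholder records are assumptions; the course's actual database is not reproduced here.

```go
package main

import "fmt"

// User is one record in the database: an email and its owner's name.
// (Field names are assumed; the transcript only says "an email and a name".)
type User struct {
	Email string
	Name  string
}

// Worker holds the users it is responsible for searching.
type Worker struct {
	users []User
}

// NewWorker builds a Worker over the given users and returns a pointer to it.
func NewWorker(users []User) *Worker {
	return &Worker{users: users}
}

func main() {
	// Placeholder records, not the course's data.
	db := []User{
		{Email: "ann@example.com", Name: "Ann Example"},
		{Email: "bob@example.com", Name: "Bob Example"},
	}
	w := NewWorker(db)
	fmt.Println("worker holds", len(w.users), "users")
}
```

Returning a pointer from the constructor means every method call works on the same Worker value rather than on a copy.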
01:59.630 --> 02:03.080 We need to iterate over our database. 02:03.080 --> 02:15.770 So it is going to be a range over w.users. We'll have a user, which is going to be an element of users. 02:15.830 --> 02:18.730 i is the index into users, so it is going to be like this. 02:18.730 --> 02:23.830 But as we want to return a pointer, it is going to be the address of the user. 02:23.840 --> 02:24.750 Right. 02:24.810 --> 02:26.020 And now we need to do the checking. 02:26.020 --> 02:35.130 So if user.Email is equal to the email that we just passed, we will return the user, and afterwards, if 02:35.130 --> 02:38.570 we didn't return anything, we just return nil. 02:38.800 --> 02:39.960 Right. 02:40.070 --> 02:42.410 We have an error in line 34. 02:42.510 --> 02:44.300 Because this is plural. 02:44.360 --> 02:44.630 Yes. 02:45.120 --> 02:45.530 OK. 02:46.380 --> 02:49.030 Well, we still can't run anything because we have no main function. 02:49.140 --> 02:52.740 So let's write a main function. 02:52.740 --> 02:59.550 So first of all we need to create the worker, which is going to be NewWorker, and the argument is gonna 02:59.580 --> 03:02.250 be the database. 03:02.380 --> 03:05.890 Think of the database as something external, perhaps. 03:05.920 --> 03:13.450 We're just putting it in the code for this example. And we can easily call w.Find with some email, right? 03:13.780 --> 03:15.940 Now, where does this email come from? 03:15.940 --> 03:19.750 We're gonna take it from the arguments. When we call the program, 03:19.750 --> 03:25.410 we can pass arguments, and those arguments can be retrieved using the os package. 03:26.020 --> 03:34.310 So let's import the os package, os.Args, and let's take the first one, which is not number zero, because number 03:34.310 --> 03:36.140 zero is the name of the program. 03:36.200 --> 03:37.770 So it is gonna be number one. 03:37.970 --> 03:38.480 Right. 03:38.510 --> 03:47.520 So we will say log.Printf, looking for the email, right? 03:47.560 --> 03:49.880 There's gonna be a user, and what can happen?
03:49.880 --> 03:50.520 Two things. 03:50.530 --> 03:56.670 The first one is that the user is not nil, which means we found something. 03:56.670 --> 04:07.050 So we will print that the email is owned by this user. 04:07.050 --> 04:13.120 We're going to pass user.Email and user.Name, right? Now, 04:13.200 --> 04:17.850 if we couldn't find the email, we will say log.Printf: 04:17.940 --> 04:22.260 the email was not found. 04:23.490 --> 04:23.850 OK. 04:24.110 --> 04:26.680 And now we can try this thing. 04:27.040 --> 04:34.630 So let's do go run and try to find some email. It says, 04:34.630 --> 04:35.320 OK. 04:35.350 --> 04:36.900 The email was not found. 04:36.970 --> 04:42.210 If I type an invalid email, it's not found. 04:42.720 --> 04:47.700 But if I go to my database and I try to search for Mia Smith, for example, 04:51.620 --> 05:01.220 Mia Smith at example dot com, it says the email Mia Smith at example dot com is owned by Mia Smith. So, 05:01.690 --> 05:02.000 good. 05:02.420 --> 05:04.250 But this is not distributed at all. 05:04.250 --> 05:10.280 So we need to make this distributed, and to do so we need to run multiple instances of the worker at the 05:10.280 --> 05:18.290 same time, and probably give a portion of the database to each of those, so they can iterate over their 05:18.290 --> 05:25.900 own database. So we will need to make some changes. The first thing is to make this asynchronous. 05:25.890 --> 05:36.610 So to do that we will start by creating a channel. So there's going to be a channel of User, and the 05:36.610 --> 05:43.470 response of the method, of the Find function, is not going to be returned as we're doing here in line 59; 05:44.470 --> 05:49.090 what will happen is that the worker will write into the channel. 05:49.090 --> 05:56.980 So it needs to know about the channel. We will pass it here, and as we are passing the channel, here it is going to 05:56.980 --> 06:03.620 be a channel of User, and we need to create it in the struct.
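For reference, here is a sketch of the synchronous version as it stands before the switch to channels: Find returns a pointer to the first exact match or nil, and main takes the email from os.Args[1]. The message wording, identifier casing, and sample records are assumptions; only the overall shape follows the walkthrough.

```go
package main

import (
	"log"
	"os"
)

type User struct {
	Email string
	Name  string
}

type Worker struct {
	users []User
}

func NewWorker(users []User) *Worker {
	return &Worker{users: users}
}

// Find scans the worker's users and returns a pointer to the first user
// whose email matches exactly, or nil when there is no match.
func (w *Worker) Find(email string) *User {
	for i := range w.users {
		user := &w.users[i] // address of the slice element, not of a copy
		if user.Email == email {
			return user
		}
	}
	return nil
}

func main() {
	// Placeholder records, not the course's data.
	db := []User{
		{Email: "ann@example.com", Name: "Ann Example"},
		{Email: "bob@example.com", Name: "Bob Example"},
	}

	// os.Args[0] is the program name, so the first real argument is index 1.
	if len(os.Args) < 2 {
		log.Println("usage: pass the email to look for as the first argument")
		return
	}
	email := os.Args[1]

	log.Printf("Looking for %s", email)
	if user := NewWorker(db).Find(email); user != nil {
		log.Printf("The email %s is owned by %s", user.Email, user.Name)
	} else {
		log.Printf("The email %s was not found", email)
	}
}
```

The length check on os.Args is an addition for safety; indexing os.Args[1] directly panics when the program is run without arguments.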
06:03.650 --> 06:10.270 So it is going to be a channel of pointers to User, and finally we set channel to channel. 06:10.300 --> 06:10.780 Okay. 06:10.850 --> 06:11.750 Right. 06:11.850 --> 06:14.050 And now we need to change the signature of this function. 06:15.010 --> 06:18.370 So instead of returning a user, we're not returning it; 06:18.380 --> 06:21.990 we are just sending it to the channel. 06:22.500 --> 06:30.490 We can remove this return, and remove this return, and we're not returning the user anymore. 06:30.490 --> 06:32.750 So we're gonna be doing this. 06:32.830 --> 06:37.990 And we also want this to run in the background, because it's asynchronous, it's concurrent, right? 06:38.710 --> 06:46.180 So this is gonna be running in the background, and now we can safely wait for the channel to pass a message. 06:46.180 --> 06:51.820 So the user is not coming from the Find function, but it's coming from the channel. 06:51.820 --> 06:52.240 Right. 06:52.300 --> 07:00.830 So Find will write into the channel whenever it can, and we're gonna be listening here; we're 07:00.830 --> 07:06.110 reading from the channel. And we can comment this out because we don't need it anymore. 07:06.340 --> 07:09.490 So we have the user and we print it, right? 07:09.680 --> 07:10.980 Undefined: channel. 07:11.000 --> 07:11.310 Yeah. 07:11.360 --> 07:16.420 Because this has to be a property of the worker. 07:16.870 --> 07:17.480 Okay. 07:17.620 --> 07:19.380 So let's try it again. 07:19.390 --> 07:19.780 Good. 07:19.810 --> 07:20.390 We have it. 07:20.640 --> 07:24.340 But what happens if we go and we pass an invalid email address? 07:24.340 --> 07:26.540 There is an error, but it's not an error 07:26.550 --> 07:29.280 we are controlling; it's kind of a runtime error, 07:29.290 --> 07:30.340 a Go error. 07:30.340 --> 07:33.920 And what it's saying is that the goroutines are asleep. 07:34.030 --> 07:41.140 So the runtime is smart enough to detect that this function is gonna be waiting forever.
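A sketch of the channel-based version at this stage, assuming the channel is stored on the worker as a field named ch (the name is a guess). Find no longer returns anything; it sends matches on the channel and runs in its own goroutine. The example only looks up an address that exists, because with no match the receive in main would block and the Go runtime would stop the program with its "all goroutines are asleep" deadlock error.

```go
package main

import "fmt"

type User struct {
	Email string
	Name  string
}

// Worker now also carries the channel it reports matches on.
type Worker struct {
	users []User
	ch    chan *User
}

func NewWorker(users []User, ch chan *User) *Worker {
	return &Worker{users: users, ch: ch}
}

// Find sends every exact match on the worker's channel instead of returning it.
func (w *Worker) Find(email string) {
	for i := range w.users {
		user := &w.users[i]
		if user.Email == email {
			w.ch <- user
		}
	}
}

func main() {
	// Placeholder records, not the course's data.
	db := []User{
		{Email: "ann@example.com", Name: "Ann Example"},
		{Email: "bob@example.com", Name: "Bob Example"},
	}

	ch := make(chan *User)
	w := NewWorker(db, ch)

	go w.Find("ann@example.com") // search in the background

	// Blocks until the worker sends a match. With an email that is not in
	// db nothing is ever sent, and the runtime reports a deadlock.
	user := <-ch
	fmt.Printf("The email %s is owned by %s\n", user.Email, user.Name)
}
```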
07:41.500 --> 07:43.560 Why is it going to be waiting forever? 07:43.990 --> 07:49.470 Because this function basically returned and never wrote to the channel. 07:49.780 --> 07:56.160 So we need a way to say, OK, if after a while I can't read from the channel, we'll time out. 07:56.560 --> 07:56.850 OK. 07:57.250 --> 08:06.240 So to do so we will use the select statement, and we will say, OK, in the case this channel returns, print. 08:06.330 --> 08:14.410 Now we need a way to define a timeout, and to do it we will use the time package, and what time has is a function called 08:14.530 --> 08:23.020 After, and if you see the signature of this function, it receives a duration and returns a channel of 08:23.020 --> 08:24.320 time messages. 08:24.400 --> 08:29.820 So we'll say, OK, after one second we can assume that we couldn't find anything. 08:29.830 --> 08:34.550 So we'll reuse this thing and we'll print the email was not found. 08:35.000 --> 08:35.380 OK. 08:35.650 --> 08:36.720 Let's try to run this. 08:37.430 --> 08:44.060 If you take a closer look at this, there is one second between this first message and the second message. 08:45.270 --> 08:51.420 So let's increase the time to three seconds, for example, and let's run again, and there are three seconds 08:51.420 --> 08:57.830 between those, which means that we are timing out after a fixed amount of time. 08:57.840 --> 09:02.420 Let's turn it back to one. If we want a shorter time we can say, OK, 09:02.460 --> 09:09.430 let's do it for a hundred milliseconds, so the output is gonna be way faster. 09:09.430 --> 09:16.610 Now you see it's kind of good enough, so we can even keep it at 100 milliseconds. 09:17.080 --> 09:17.610 OK. 09:17.710 --> 09:19.520 So we have a timeout now. 09:19.570 --> 09:24.340 Now this thing starts looking like a client-server architecture, where I make a request, I wait some time, 09:24.670 --> 09:28.360 and then get a response. If during a specific amount of time 09:28.360 --> 09:29.050 I get nothing,
09:29.050 --> 09:31.900 I can assume nothing happened: 09:31.900 --> 09:32.800 the email was not found. 09:32.800 --> 09:33.220 Right. 09:33.280 --> 09:39.580 So now we're in pretty good shape to basically distribute this thing among many workers. 09:39.580 --> 09:47.460 So if we look at the size of the database, it is one, two, three, four... 18. 09:47.500 --> 09:49.790 So we have 18 records here. 09:49.870 --> 09:55.720 What I want to do is to give the first half to one worker and the second half of the data to the other 09:55.720 --> 09:58.600 worker, so they don't have to repeat the job. 09:58.600 --> 10:02.530 It wouldn't make sense to create many workers doing exactly the same thing. 10:02.560 --> 10:09.790 So what I'm doing is what in databases they call sharding: splitting the database into different portions, 10:09.880 --> 10:14.060 and each of the workers will work on a particular portion. 10:14.080 --> 10:15.680 How do I do this? 10:15.760 --> 10:20.390 First of all, let me refactor this into a way simpler notation. 10:20.500 --> 10:26.980 So we're gonna do method chaining here. I'm creating the worker, but I'm not holding the reference to the 10:26.980 --> 10:27.520 worker. 10:27.970 --> 10:31.510 I'm just calling Find over this expression. 10:31.540 --> 10:31.890 OK. 10:31.960 --> 10:37.380 I'm gonna do the same for this one. Now, instead of passing the whole database, what I'm gonna do is 10:37.370 --> 10:45.110 pass a slice from 0 to 9, and from 9 to 18. 10:45.150 --> 10:52.950 So this is the same as doing 0 to 9 and 9 to 18, but you can do that with this syntax, which is shorter. 10:53.130 --> 10:59.190 So I'm passing the first half of the database to one worker and the second half to the other worker, and 10:59.190 --> 11:01.290 the effect should be the same. 11:01.380 --> 11:04.190 Nothing was found, and Mia Smith was found. 11:04.260 --> 11:04.620 Right.
11:05.720 --> 11:12.470 But think about this: when you have millions of records, then you can take advantage of the distributed 11:12.530 --> 11:14.500 side of this implementation. 11:14.650 --> 11:14.930 OK. 11:15.350 --> 11:23.150 So again, if I want to do it for three workers, I could do something like from zero to six, from six to 11:23.390 --> 11:26.450 twelve, and from twelve to the end. 11:26.450 --> 11:27.430 Right. 11:27.530 --> 11:29.580 And it's probably gonna work as well. 11:29.630 --> 11:33.330 Now, just in case you're curious, you can pass... 11:33.350 --> 11:35.180 Let's say we want to name the workers. 11:35.180 --> 11:35.690 So, 11:38.780 --> 11:41.740 number three here. We could name them, 11:41.810 --> 11:47.810 but then we need to receive a name here, which is going to be a string; here, same here. 11:47.810 --> 11:51.560 name string, and we can pass the name here. 11:51.800 --> 11:55.430 Name. And when one specific worker finds it, 11:55.490 --> 11:57.300 we can just print it here. 11:57.700 --> 12:01.570 Okay, so print w.name, right? 12:01.820 --> 12:03.650 And some station. 12:03.980 --> 12:08.210 Okay, so I just want to know which worker it was. 12:08.290 --> 12:10.020 This is gonna be number three, right? 12:11.160 --> 12:13.570 So let's try with Alexander Davies. 12:13.590 --> 12:14.990 It should be worker number one. 12:15.620 --> 12:18.830 Alexander Davies, number one, right? 12:19.500 --> 12:21.370 So this is cool. 12:21.480 --> 12:26.670 Basically, at this point we already have our distributed search engine. 12:26.670 --> 12:35.370 We basically split the database into three fractions, three segments, and each of my workers is working over 12:35.640 --> 12:37.790 one of these subsets. 12:37.820 --> 12:43.790 Now this is pretty boring, because if I don't know the exact email I probably can't find the name 12:43.790 --> 12:44.300 of the owner.
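A sketch of the three named workers. The shards helper is hypothetical: the walkthrough writes the three slice expressions by hand, and the helper just computes the same boundaries for any worker count. The generated records, the "#1" style names, and the log wording are also assumptions.

```go
package main

import (
	"fmt"
	"log"
	"time"
)

type User struct {
	Email string
	Name  string
}

// Worker now carries a name so we can tell which one found the match.
type Worker struct {
	users []User
	ch    chan *User
	name  string
}

func NewWorker(users []User, ch chan *User, name string) *Worker {
	return &Worker{users: users, ch: ch, name: name}
}

// Find logs the worker's name before sending each exact match.
func (w *Worker) Find(email string) {
	for i := range w.users {
		user := &w.users[i]
		if user.Email == email {
			log.Printf("worker %s found a match", w.name)
			w.ch <- user
		}
	}
}

// shards (hypothetical helper) splits db into at most n contiguous slices of
// near-equal size, e.g. 18 records and n=3 give [0:6], [6:12] and [12:18].
func shards(db []User, n int) [][]User {
	if n < 1 {
		n = 1
	}
	size := (len(db) + n - 1) / n // ceiling division
	out := make([][]User, 0, n)
	for start := 0; start < len(db); start += size {
		end := start + size
		if end > len(db) {
			end = len(db)
		}
		out = append(out, db[start:end])
	}
	return out
}

func main() {
	// 18 generated placeholder records, not the course's data.
	var db []User
	for i := 0; i < 18; i++ {
		db = append(db, User{
			Email: fmt.Sprintf("user%d@example.com", i),
			Name:  fmt.Sprintf("User %d", i),
		})
	}

	ch := make(chan *User)
	for i, part := range shards(db, 3) {
		go NewWorker(part, ch, fmt.Sprintf("#%d", i+1)).Find("user14@example.com")
	}

	select {
	case user := <-ch:
		log.Printf("The email %s is owned by %s", user.Email, user.Name)
	case <-time.After(100 * time.Millisecond):
		log.Printf("The email was not found")
	}
}
```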
12:44.300 --> 12:52.550 So what I want to do now is to make this search engine a little bit smarter and pass it a pattern, 12:52.610 --> 12:54.540 like a fragment of the email, 12:54.710 --> 12:59.610 and make sure that I can get the answer back anyway. 12:59.610 --> 13:05.270 Now, if I pass a fragment of the email, I might get more than one answer. 13:05.390 --> 13:09.480 So let's see how we deal with that as well. 13:09.530 --> 13:18.050 First of all, instead of doing this comparison we will use the strings package. The strings package 13:18.050 --> 13:29.610 has this function, Contains, and Contains basically verifies that in this string, the email, this 13:29.610 --> 13:31.220 fragment is contained. 13:31.360 --> 13:31.780 Okay. 13:31.920 --> 13:39.260 And the rest is pretty much the same: if user.Email contains the keyword that I'm gonna give it, 13:39.300 --> 13:40.520 put it in the channel again. 13:40.860 --> 13:42.690 So let's try it. 13:42.780 --> 13:47.300 This is still working, but now I'm able to pass Alexander. 13:47.460 --> 13:51.840 And if you take a look, Alexander is owned by Alexander Davies. 13:51.840 --> 14:02.200 So at first I needed to use the whole email. Yeah, but here I have two emails with Alexander: I have Alexander 14:02.210 --> 14:05.060 Davies and Alexander Jackson, and I just got one. 14:05.060 --> 14:07.610 So what happened? After I read from this channel, 14:07.610 --> 14:09.220 the program basically quit. 14:09.290 --> 14:14.690 So the only thing that I need to do, which you've probably already guessed, is put this into a 14:14.690 --> 14:22.330 for loop, and this is basically going to be reading from the channel all the time until I get to the timeout, 14:22.450 --> 14:22.990 and that's it. 14:22.990 --> 14:24.740 It's that simple. 14:24.760 --> 14:27.910 So let me run this again. 14:28.260 --> 14:29.060 Yeah. 14:29.320 --> 14:32.920 What happened is I got the two Alexanders and then not found forever.
14:32.920 --> 14:40.020 So when I reach this point, instead of printing not found I will say return; basically, quit. 14:40.360 --> 14:42.700 And now I have two Alexanders. 14:42.730 --> 14:50.170 And I can also be less specific and say, OK, give me all the emails that contain the word son, and 14:50.180 --> 14:55.300 I get Jackson, Jackson, Robinson, Mason, and Jackson again. 14:55.300 --> 14:55.590 Right. 14:56.840 --> 15:06.140 So this is pretty much it for this chapter. We covered nearly every feature we learned in the Go language, 15:06.170 --> 15:14.360 so you can see that it's pretty simple. We went through our own custom types, constructors, methods, 15:14.990 --> 15:23.830 argument passing, channels, goroutines, select statements, range, for loops, many things. 15:24.220 --> 15:28.560 So I think from now on you are able to start writing your own applications. 15:28.570 --> 15:31.180 You have all the tools you need. 15:31.270 --> 15:33.400 So I think that's it. 15:33.400 --> 15:34.180 Happy Hacking.