1
00:00:00,090 --> 00:00:05,930
In this session we are going to talk about data persistence in Redis. We have two approaches.

2
00:00:05,930 --> 00:00:14,720
One is called RDB and the other one is called AOF, and each has its own advantages and disadvantages,

3
00:00:14,750 --> 00:00:16,790
which we are going to discuss.

4
00:00:16,790 --> 00:00:20,340
Which actually brings us to a third approach, which is called the hybrid approach.

5
00:00:20,360 --> 00:00:22,160
So you use both.

6
00:00:22,280 --> 00:00:25,260
You can use RDB in combination with AOF.

7
00:00:26,060 --> 00:00:28,010
Okay.  So that is going to be the third approach

8
00:00:31,510 --> 00:00:38,410
So how RDB works is, you enable RDB by making a change in the configuration file,

9
00:00:38,790 --> 00:00:42,340
or you can also use the command line to provide the

10
00:00:44,930 --> 00:00:48,020
configuration-related instructions using redis-cli.

11
00:00:49,100 --> 00:00:56,240
Okay, so what RDB is going to do is, it is going to take a point-in-time snapshot of the data which is

12
00:00:56,250 --> 00:01:04,900
available in Redis. So by that what I mean is, you define a kind of snapshotting policy, and then

13
00:01:04,900 --> 00:01:11,740
based on whatever snapshotting policy has been defined, Redis is going to take a snapshot

14
00:01:11,770 --> 00:01:16,050
of the data which is available at that point in time. Okay.

15
00:01:16,060 --> 00:01:24,000
Based on the configuration which you have specified, whenever you restart Redis or whenever

16
00:01:24,000 --> 00:01:29,610
a crash occurs, you can always restore from that saved point-in-time snapshot.

17
00:01:30,570 --> 00:01:30,900
Okay.

18
00:01:30,930 --> 00:01:36,670
So if you focus here on the second line, which says save, then 60, then space, 1000.

19
00:01:37,140 --> 00:01:42,590
So this is one of the ways to define an RDB snapshotting policy.

20
00:01:42,700 --> 00:01:46,590
So you tell Redis to save every 60 seconds

21
00:01:46,630 --> 00:01:51,830
if at least 1000 keys have been updated or inserted.

22
00:01:52,580 --> 00:01:52,860
Okay

23
00:01:57,080 --> 00:02:01,820
So you can have any number of such configurations defined in the config file.

24
00:02:01,900 --> 00:02:09,110
We'll see how to define these things, but you can have something like this as one line,

25
00:02:09,110 --> 00:02:17,000
and then you can have another line, which says save every 5 seconds when you have 100000 keys

26
00:02:17,090 --> 00:02:23,090
inserted or updated, and then you can have a third line which says save every 1 second if you have 1 million

27
00:02:23,860 --> 00:02:26,720
keys inserted or updated.  So something like  that
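
The three save lines described above can be modeled as (seconds, changes) pairs. The following is a minimal sketch of how such rules might be evaluated, not the Redis source code: the function name `snapshot_due` and the list `SAVE_RULES` are hypothetical, and the check is simplified to "enough time has passed and enough keys have changed since the last save".

```python
# Simplified model of RDB save rules; names here are illustrative only.
# Each rule is (seconds, changes), using the three example lines from the talk.
SAVE_RULES = [(60, 1000), (5, 100_000), (1, 1_000_000)]


def snapshot_due(seconds_since_last_save, changes_since_last_save, rules=SAVE_RULES):
    """Return True if any rule is satisfied by the elapsed time and change count."""
    return any(
        seconds_since_last_save >= seconds and changes_since_last_save >= changes
        for seconds, changes in rules
    )


print(snapshot_due(60, 1000))    # first rule is met
print(snapshot_due(60, 999))     # not enough changes for any rule
print(snapshot_due(5, 250_000))  # second rule is met
```

Any one matching rule is enough to trigger a snapshot, which is why a busy server saves more often than a quiet one.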

28
00:02:31,090 --> 00:02:33,420
And then there is the second approach, which is called AOF.

29
00:02:33,460 --> 00:02:42,260
So AOF is just Append Only File. So what it does is, basically you are telling Redis to make a

30
00:02:42,260 --> 00:02:48,130
note of all the operations, all the write operations, which have been performed in Redis.

31
00:02:48,320 --> 00:02:54,560
And basically it creates one file and it keeps on writing all the operations which have been performed

32
00:02:54,560 --> 00:02:55,830
by Redis.

33
00:02:55,910 --> 00:03:04,480
So whenever Redis goes down, you actually use this file and you bring back all the instructions.

34
00:03:04,480 --> 00:03:08,330
You basically replay whatever has been logged in the AOF.

35
00:03:08,900 --> 00:03:09,180
Okay.

36
00:03:09,200 --> 00:03:15,440
So let's say you are writing to Redis by using the SET command. So you give SET,

37
00:03:15,500 --> 00:03:17,880
you define a key and you define a value.

38
00:03:18,050 --> 00:03:23,480
So once you run that command, the same command is also going to get logged in this AOF

39
00:03:23,480 --> 00:03:30,460
file. So whenever Redis is shut down and it is started again,

40
00:03:30,500 --> 00:03:32,590
whatever is there in this AOF file,

41
00:03:32,630 --> 00:03:34,550
this is going to be played back.
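
The log-and-replay idea can be sketched with a toy store. This is an illustration under stated assumptions, not how Redis is implemented: the class `TinyStore` and the file name `toy.aof` are hypothetical, and the log uses one plain-text command per line, whereas a real AOF stores commands in the Redis protocol format.

```python
import os
import tempfile


class TinyStore:
    """Toy key-value store with an append-only log (hypothetical, not Redis code)."""

    def __init__(self, log_path):
        self.log_path = log_path
        self.data = {}

    def set(self, key, value):
        # Apply the write in memory, then append the same command to the log.
        self.data[key] = value
        with open(self.log_path, "a", encoding="utf-8") as log:
            log.write(f"SET {key} {value}\n")

    @classmethod
    def recover(cls, log_path):
        # Rebuild the in-memory state by replaying every logged command in order.
        store = cls(log_path)
        if os.path.exists(log_path):
            with open(log_path, encoding="utf-8") as log:
                for line in log:
                    command, key, value = line.rstrip("\n").split(" ", 2)
                    if command == "SET":
                        store.data[key] = value
        return store


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "toy.aof")
    store = TinyStore(path)
    store.set("user:1", "alice")
    store.set("user:1", "bob")
    # Simulate a restart: throw away the in-memory state and replay the log.
    recovered = TinyStore.recover(path)
    print(recovered.data)
```

Because commands are replayed in the order they were written, the last SET for a key wins, just as it did before the restart.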

42
00:03:34,670 --> 00:03:37,440
So that is the mechanism by which Redis is

43
00:03:37,490 --> 00:03:44,680
restoring all the data which is stored using the AOF mechanism. Okay, in AOF

44
00:03:44,680 --> 00:03:52,740
we have something called the fsync policy. So we can define how frequently we want to sync,

45
00:03:52,930 --> 00:03:55,690
how frequently we want to write the log out to disk.

46
00:03:55,690 --> 00:03:59,810
Remember, whatever policy we define is going to impact the performance.

47
00:04:00,250 --> 00:04:06,700
So suppose you are asking Redis to always sync the AOF.

48
00:04:07,030 --> 00:04:13,000
So basically the AOF resides on the disk, and whenever you are asking that for each operation which Redis is

49
00:04:13,010 --> 00:04:17,790
performing, it should be syncing to disk, then the performance is going to be very slow.

50
00:04:18,540 --> 00:04:19,200
OK.

51
00:04:19,330 --> 00:04:24,050
If you want better performance, you can define a policy which is every second, everysec.

52
00:04:24,310 --> 00:04:29,110
And then there is another policy, which is never; the setting is no. Okay, in that case the operating system

53
00:04:29,110 --> 00:04:35,450
itself decides when to flush, which on Linux is usually around every 30 seconds. So the performance is going to be

54
00:04:35,450 --> 00:04:44,380
faster. Okay, we'll be discussing all of these approaches and we will see the difference in performance

55
00:04:44,380 --> 00:04:45,650
which we are able to achieve.
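
To make the trade-off between the three policies concrete, here is a rough sketch that counts how often a log file would be forced to disk under each one. It is a simplified model: the function `append_commands` is hypothetical, the policy names mirror the `appendfsync` values `always`, `everysec` and `no`, and the once-per-second check is done inline in the write loop rather than in the background.

```python
import os
import tempfile
import time


def append_commands(path, commands, policy):
    """Append commands to a log and return how many times os.fsync was called."""
    fsync_calls = 0
    last_sync = time.monotonic()
    with open(path, "ab") as log:
        for command in commands:
            log.write(command.encode("utf-8") + b"\n")
            log.flush()  # hand the bytes to the operating system
            now = time.monotonic()
            due = policy == "always" or (policy == "everysec" and now - last_sync >= 1.0)
            if due:
                os.fsync(log.fileno())  # force the OS to write to the disk
                fsync_calls += 1
                last_sync = now
            # With policy "no" the OS alone decides when the data reaches the disk.
    return fsync_calls


commands = [f"SET key:{i} {i}" for i in range(100)]
with tempfile.TemporaryDirectory() as tmp:
    always_calls = append_commands(os.path.join(tmp, "always.aof"), commands, "always")
    everysec_calls = append_commands(os.path.join(tmp, "everysec.aof"), commands, "everysec")
    no_calls = append_commands(os.path.join(tmp, "no.aof"), commands, "no")

print("always:", always_calls)
print("everysec:", everysec_calls)
print("no:", no_calls)
```

Every forced sync is a wait on the disk, so the fewer of them a policy needs, the faster the writes go and the more recent data is at risk if the machine loses power.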

56
00:04:46,060 --> 00:04:53,530
So the point is, there is a persistence mechanism available which is called RDB, and with RDB

57
00:04:53,830 --> 00:04:59,040
you are not going to get point-of-failure data recovery.

58
00:04:59,040 --> 00:05:00,800
So it is Point In Time Data Recovery.

59
00:05:00,790 --> 00:05:08,450
So basically you define a Configuration and based on that you are going to be able to restore whatever

60
00:05:08,450 --> 00:05:10,440
the latest data is available in that RDB.

61
00:05:14,190 --> 00:05:21,400
And with AOF, you can actually get almost point-of-failure recovery, you know, you can restore

62
00:05:21,400 --> 00:05:23,820
without losing any data in Redis.

63
00:05:29,120 --> 00:05:33,760
So now let's discuss more about RDB advantages and disadvantages.

64
00:05:33,860 --> 00:05:37,740
So RDB basically is a  Compact,  Single File.

65
00:05:38,150 --> 00:05:45,740
So by that what I mean is, when you perform RDB snapshotting, Redis creates a child process using

66
00:05:45,740 --> 00:05:46,290
fork.

67
00:05:46,940 --> 00:05:54,460
So that process snapshots all the Redis data into a single RDB file, which by default is called

68
00:05:54,480 --> 00:05:55,220
dump.rdb. Okay,

69
00:05:55,520 --> 00:06:02,710
that single file actually contains all of the data which is available in Redis. Okay.

70
00:06:02,720 --> 00:06:11,180
So this is very good if you want to perform any external backups. Let's say every 24 hours or every 6

71
00:06:11,180 --> 00:06:17,680
hours, you want to take a backup or you want to perform an external backup.

72
00:06:17,800 --> 00:06:18,080
OK.

73
00:06:18,100 --> 00:06:24,730
So RDB is very good in those kinds of scenarios, and the performance of RDB backup is very good.

74
00:06:24,730 --> 00:06:30,670
The reason is, there is very little overhead on the main Redis process. The only thing which Redis has to do is just create

75
00:06:30,670 --> 00:06:40,300
a child process, which takes care of the snapshotting. And also, when you are backing up using RDB, when you are

76
00:06:40,300 --> 00:06:47,090
restarting Redis, it is going to be faster, because it can quickly load whatever is there in the RDB file into

77
00:06:47,090 --> 00:06:48,670
Redis memory.

78
00:06:48,690 --> 00:06:56,800
Okay, so it is a faster approach when you compare it with AOF, and the disadvantage in the case of RDB is

79
00:06:56,830 --> 00:06:58,750
that there are chances of data loss.

80
00:06:58,750 --> 00:07:03,290
So it is not giving you point-of-failure data recovery.

81
00:07:03,700 --> 00:07:10,060
So whenever the latest snapshot has happened, you are going to restore only from that point

82
00:07:10,060 --> 00:07:13,600
in time. You are not going to restore from the point of failure.

83
00:07:14,500 --> 00:07:21,810
So yes, there are chances of data loss when you are using only the RDB approach. And also, when you

84
00:07:21,810 --> 00:07:32,290
have a very huge data set, Redis has to fork a child process to perform the snapshotting,

85
00:07:32,330 --> 00:07:40,670
so for a few milliseconds it may have to temporarily stop serving clients, but that is only

86
00:07:40,670 --> 00:07:45,710
going to happen when you have a really large data set. You will not even notice it when the data set

87
00:07:45,980 --> 00:07:49,180
is not very large.

88
00:07:49,190 --> 00:07:50,780
Now let's discuss AOF.

89
00:07:51,170 --> 00:07:57,830
So AOF is Append Only File. Append Only File is much more durable when you compare it with RDB,

90
00:08:00,490 --> 00:08:06,550
and in AOF you can configure different fsync  policies. So it gives you more flexibility when compared with

91
00:08:06,610 --> 00:08:07,090
RDB.

92
00:08:07,120 --> 00:08:12,520
So you can define how frequently you want your data to be saved or logged to AOF.

93
00:08:15,450 --> 00:08:20,490
And in AOF, when there is any power loss, the chances

94
00:08:20,490 --> 00:08:23,070
of corruption are very low.

95
00:08:23,100 --> 00:08:26,950
The reason being, AOF consists of

96
00:08:27,450 --> 00:08:31,230
all the instructions which you have given to
Redis, line by line.

97
00:08:31,230 --> 00:08:38,830
So basically if you ran 1000 SET instructions, AOF is going to keep all those thousand entries,

98
00:08:38,880 --> 00:08:42,360
one after another, and it is going to keep on appending them to a single file.

99
00:08:42,360 --> 00:08:47,270
So in case there is a power outage, the chances of corruption of that file are very low.

100
00:08:47,580 --> 00:08:52,560
And even if some corruption happens, there is a tool which is called

101
00:08:52,560 --> 00:08:53,520
redis-check-aof.

102
00:08:54,000 --> 00:08:57,840
You can use this tool to fix any problem with this AOF file.

103
00:09:02,690 --> 00:09:09,230
And since the AOF file is more like, you know, a data file or any other text file, you can open this file,

104
00:09:09,260 --> 00:09:16,540
and if you do not want to run some of the lines when Redis restarts,

105
00:09:16,990 --> 00:09:19,010
you can remove those lines.

106
00:09:19,010 --> 00:09:24,740
This is especially going to be useful when, let's say, you accidentally run the FLUSHALL or FLUSHDB command and

107
00:09:25,940 --> 00:09:28,670
then you just want to restore everything back from AOF.

108
00:09:28,760 --> 00:09:30,440
So when you run the FLUSHALL command,

109
00:09:30,880 --> 00:09:34,580
Redis is also going to make an entry of the FLUSHALL command in the AOF file.

110
00:09:35,500 --> 00:09:39,830
Okay, but let's say when you restart Redis, you don't want all the data to be flushed. You can

111
00:09:39,830 --> 00:09:46,220
remove that FLUSHALL line from this AOF file, and then you can restore from this AOF file.
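
The manual edit described above can also be sketched in code. This is only an illustration under assumptions: the helpers `encode`, `decode` and `drop_trailing_flushall` are hypothetical, and the sketch assumes an AOF that contains nothing but commands in the Redis protocol format (an array header `*N` followed by N length-prefixed strings). Newer Redis versions may also store a snapshot preamble or split the AOF into several files, which this sketch does not handle. In practice you would stop Redis and work on a copy of the file.

```python
def encode(args):
    """Encode one command (a list of bytes) in the Redis protocol format."""
    out = [b"*%d\r\n" % len(args)]
    for arg in args:
        out.append(b"$%d\r\n%s\r\n" % (len(arg), arg))
    return b"".join(out)


def decode(buf):
    """Decode a buffer of protocol-encoded commands into lists of bytes."""
    commands, pos = [], 0
    while pos < len(buf):
        end = buf.index(b"\r\n", pos)
        if buf[pos:pos + 1] != b"*":
            raise ValueError("expected array header")
        count = int(buf[pos + 1:end])
        pos = end + 2
        args = []
        for _ in range(count):
            end = buf.index(b"\r\n", pos)
            if buf[pos:pos + 1] != b"$":
                raise ValueError("expected bulk string header")
            length = int(buf[pos + 1:end])
            pos = end + 2
            args.append(buf[pos:pos + length])
            pos += length + 2  # skip the payload and its trailing CRLF
        commands.append(args)
    return commands


def drop_trailing_flushall(buf):
    """Return the log without its last command if that command is FLUSHALL."""
    commands = decode(buf)
    if commands and commands[-1][0].upper() == b"FLUSHALL":
        commands = commands[:-1]
    return b"".join(encode(command) for command in commands)


log = b"".join(encode(c) for c in [
    [b"SET", b"a", b"1"],
    [b"SET", b"b", b"2"],
    [b"FLUSHALL"],
])
cleaned = drop_trailing_flushall(log)
print(decode(cleaned))
```

Only the final FLUSHALL is removed here, so the two SET commands before it would be replayed on restart.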

112
00:09:46,620 --> 00:09:46,890
OK.

113
00:09:46,910 --> 00:09:55,010
So basically this AOF file gives you the liberty to parse the file and play with the set of instructions

114
00:09:55,010 --> 00:10:01,840
or the commands which are available in this file. So now let's discuss the disadvantages when compared

115
00:10:01,840 --> 00:10:02,900
to RDB.

116
00:10:03,220 --> 00:10:11,230
So usually the size of the AOF is going to be bigger than the RDB. RDB, when it performs snapshotting, compresses

117
00:10:11,230 --> 00:10:11,670
the data.

118
00:10:11,680 --> 00:10:20,470
So the size is smaller when compared to AOF, and AOF can also be slower than RDB, again depending

119
00:10:20,470 --> 00:10:22,660
on the fsync  policies which you have defined.

120
00:10:22,660 --> 00:10:29,380
So if you have defined an fsync policy of always, in those cases every instruction has to be

121
00:10:29,380 --> 00:10:36,440
synced to the AOF on disk, so AOF can be slower, and the AOF can also grow bigger in size. Okay.

122
00:10:36,470 --> 00:10:38,500
So let's see all of these,

123
00:10:38,520 --> 00:10:44,610
I mean both of these things in action, and then we will also see how we can take advantage of the hybrid

124
00:10:44,610 --> 00:10:45,030
approach.
