How to back up a MongoDB database using LVM snapshots – Part 1

In this series of blog posts, I will show you how to properly run full backups of a MongoDB replica set using LVM snapshots, followed by incremental backups using the oplog. I will also cover restores with point-in-time recovery using the previous backup. Snapshots work by creating pointers between the live data and a special snapshot volume. These pointers are theoretically equivalent to “hard links.” As the working data diverges from the snapshot, the snapshot process uses a copy-on-write strategy. As a result, the snapshot only grows as data is modified.
What is LVM
Logical Volume Manager (LVM) is a program that abstracts disk images from physical devices. It provides a number of raw disk manipulation and snapshot capabilities that are useful for system management. A logical volume snapshot is a copy-on-write technology that monitors changes to an existing volume’s data blocks: when a write is made to one of the blocks, the block’s value at snapshot time is copied to a snapshot volume.
To be able to utilize an LVM snapshot, your server must be using logical volume management, in particular on the partition where you mount your MongoDB data directory. Let’s see this with an actual example.
I’m not going into details on how to configure LVM and how to create your volumes; that is already very well covered in other blog posts. I’ll quickly share sample output of pvdisplay, vgdisplay, and lvdisplay from my test system, and we will use this output in our backup commands later. Additionally, try to keep your MongoDB directories separate from the rest of your file system so the backup contains only MongoDB-related data.
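For reference, here is a minimal sketch of how such a layout could be created; the device /dev/sdc1, the sizes, and the volume and mount point names match my test system, so adjust them for yours.
pvcreate /dev/sdc1                        # register the partition as an LVM physical volume
vgcreate vgdiskdata /dev/sdc1             # create the volume group
lvcreate -L20G -n lvmongo vgdiskdata      # create a 20 GiB logical volume, leaving free space for snapshots
mkfs.xfs /dev/vgdiskdata/lvmongo          # format it with xfs
mkdir -p /mnt/mongo
mount /dev/vgdiskdata/lvmongo /mnt/mongo  # mount it as the MongoDB data directory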
root@mongodbhc:~# pvdisplay
  --- Physical volume ---
  PV Name               /dev/sdc1
  VG Name               vgdiskdata
  PV Size               <50.00 GiB / not usable 3.00 MiB
  Allocatable           yes
  PE Size               4.00 MiB
  Total PE              12799
  Free PE               7679
  Allocated PE          5120
  PV UUID               d1xGEN-mwW0-edRe-l1dF-SAjp-qmMA-1RdE9a
The above output from pvdisplay shows that there is a physical volume /dev/sdc1 and that it is used in the volume group vgdiskdata. Please note that we only have a single physical volume here; in your setup there could be more.
root@mongodbhc:~# vgdisplay
  --- Volume group ---
  VG Name               vgdiskdata
  System ID
  Format                lvm2
  Metadata Areas        1
  Metadata Sequence No  341
  VG Access             read/write
  VG Status             resizable
  MAX LV                0
  Cur LV                1
  Open LV               1
  Max PV                0
  Cur PV                1
  Act PV                1
  VG Size               <50.00 GiB
  PE Size               4.00 MiB
  Total PE              12799
  Alloc PE / Size       5120 / 20.00 GiB
  Free  PE / Size       7679 / <30.00 GiB
  VG UUID               d99fRZ-Naup-ixHj-GRYl-XIBp-tJgv-T958AO
The output of vgdisplay, as shown above, has the volume group name, its format, size, and other parameters such as the free space. From a total of 50 GiB we have a little under 30 GiB free. We can extend the volume group by adding more physical volumes to it.
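For example, adding another partition to the group could look like this; the device /dev/sdd1 is a hypothetical name used only for illustration.
pvcreate /dev/sdd1             # register the new partition as a physical volume (hypothetical device)
vgextend vgdiskdata /dev/sdd1  # add it to the existing volume group
vgdisplay vgdiskdata           # Free PE / Size should now be larger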
root@mongodbhc:~# lvdisplay
  --- Logical volume ---
  LV Path                /dev/vgdiskdata/lvmongo
  LV Name                lvmongo
  VG Name                vgdiskdata
  LV UUID                kJ2e5W-DVWM-rXwC-xEwn-H5kP-Ws0w-TtuI2L
  LV Write Access        read/write
  LV Creation host, time mongodbhc, 2022-09-27 10:59:30 +0000
  LV Status              available
  # open                 1
  LV Size                20.00 GiB
  Current LE             5120
  Segments               1
  Allocation             inherit
  Read ahead sectors     auto
  - currently set to     256
  Block device           253:1
root@mongodbhc:~#
Finally, the lvdisplay output shows the logical volume path, its name, and the volume group it belongs to. Note that the VG Name matches our volume group vgdiskdata, since that is the volume group in which the logical volume was created. Each time we create a snapshot of this logical volume lvmongo, the snapshot will be created in the same volume group vgdiskdata. This is why the size of the volume group is important.
Now, if we want to see where this logical volume lvmongo is mounted, we can run lsblk or df. In my case, the MongoDB data directory is /mnt/mongo, so that is the file system path I want to back up.
root@mongodbhc:~# lsblk /dev/sdc1
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
sdc1 8:33 0 50G 0 part
└─vgdiskdata-lvmongo 253:1 0 20G 0 lvm /mnt/mongo
root@mongodbhc:~# df -hTP /mnt/mongo/
Filesystem Type Size Used Avail Use% Mounted on
/dev/mapper/vgdiskdata-lvmongo xfs 20G 2.2G 18G 11% /mnt/mongo
Taking the snapshot
From the above output, we can see that the logical volume is mounted at the /mnt/mongo mount point and that its type is lvm. So this is the file system partition that belongs to the logical volume /dev/vgdiskdata/lvmongo.
Now that we have all of the information about the file system partitions and logical volumes on our system, let’s create an LVM snapshot backup. We can use this backup to restore our database in case of a failure scenario.
Before we create the snapshot, let’s follow MongoDB best practices and lock the database for writes. Even though this is not required for an LVM snapshot, taking this step ensures the success of the backup and the eventual restore by guaranteeing that writes do not happen during the snapshot process. As soon as we create the snapshot and mount it read-only on the file system, we will unlock the database. This will also be useful for getting the oplog position, as we want to take incremental backups after each full backup.
Typically, we run backups on MongoDB Secondary nodes, so this should have zero impact. In particular, you could have a hidden node or a delayed Secondary as a dedicated backup node in your replica set, for example:
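As a sketch, marking an existing member as hidden (so it serves no application reads and can never become primary) could look like the following; run it against the primary, and note that the member index 2 is an assumption you must adjust to your own replica set configuration.
mongo -u<username> -p<password> --port <port> --quiet --eval '
  cfg = rs.conf();               // fetch the current replica set configuration
  cfg.members[2].priority = 0;   // hidden members must have priority 0
  cfg.members[2].hidden = true;  // hide this member from clients
  rs.reconfig(cfg);              // apply the new configuration (run on the primary)
'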
If you are running the command fsyncLock() in a mongo shell, it will look like this:
mongohc:SECONDARY> db.fsyncLock()
{
    "info" : "now locked against writes, use db.fsyncUnlock() to unlock",
    "lockCount" : NumberLong(1),
    "seeAlso" : "http://dochub.mongodb.org/core/fsynccommand",
    "ok" : 1,
    "$clusterTime" : {
        "clusterTime" : Timestamp(1666351737, 1),
        "signature" : {
            "hash" : BinData(0,"5cqLjc3Rr7I9/Y+8D+dqeGrUBCY="),
            "keyId" : NumberLong("7148394471967686661")
        }
    },
    "operationTime" : Timestamp(1666351737, 1)
}
mongohc:SECONDARY>
You will most likely be using a script to perform this sequence of steps. This could be scripted in an fsynclock.js file and then called with the mongo client like this:
mongo -u<username> -p<password> --port <port> --quiet fsynclock.js
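The contents of fsynclock.js can be as small as a single call; a minimal sketch, created here with a shell heredoc, might be:
cat > fsynclock.js <<'EOF'
// lock the instance against writes and print the server response
printjson(db.fsyncLock());
EOF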
One additional note I’d like to highlight here concerns incremental backups using the oplog. If you want to take oplog backups between each daily full backup, you should capture the oplog position now, while the database is locked and before you create the snapshot. You can do this with a script like the one below:
root@mongodbhc:~# cat backup_oplog_ts.js
var local = db.getSiblingDB('local');
var last = local['oplog.rs'].find().sort({'$natural': -1}).limit(1)[0];
var result = {};
if(last != null) {
    result = {position : last['ts']};
}
print(JSON.stringify(result));
root@mongodbhc:~# mongo -u<username> -p<password> --port <port> --quiet backup_oplog_ts.js > oplog_position
The oplog_position file will have information like below:
{"position":{"$timestamp":{"t":1666355398,"i":1}}}
We have a full backup up to this point in time; for our incremental backups, we will use this timestamp as the starting point. How to run the incremental backups will be covered in part two of this blog series.
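If you later need the raw values in a script, the timestamp can be extracted from the saved file; this small sketch assumes jq is installed:
OPLOG_TS=$(jq -r '.position["$timestamp"].t' oplog_position)   # seconds part of the oplog timestamp
OPLOG_INC=$(jq -r '.position["$timestamp"].i' oplog_position)  # increment part
echo "Incremental backups will start from ${OPLOG_TS}:${OPLOG_INC}"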
Our next command will create a snapshot using lvmongo as the source volume.
root@mongodbhc:~# lvcreate -L500M -s -n mongosnap_21oct2022 /dev/vgdiskdata/lvmongo
  Logical volume "mongosnap_21oct2022" created.
root@mongodbhc:~#
A few notes on the above command:
-L specifies the size of the snapshot. We need to ensure that the volume group holding the original volume has enough free space for the size we specify here. Even if we go above the available size, the snapshot will only use what is remaining in the volume group. This is important if you plan to keep your snapshot for a longer time on a system with heavy writes, because the snapshot grows as data is modified.
root@mongodbhc:~# lvcreate -L40G -s -n largesnap /dev/vgdiskdata/lvmongo
  Reducing COW size 40.00 GiB down to maximum usable size 20.08 GiB.
  Logical volume "largesnap" created.
root@mongodbhc:~#
-n specifies the snapshot name. Please note that names such as snapshot are reserved and cannot be used.
-s specifies that we are creating a snapshot.
Let’s check how the lvdisplay output looks now:
root@mongodbhc:~# lvdisplay
  --- Logical volume ---
  LV Path                /dev/vgdiskdata/lvmongo
  LV Name                lvmongo
  VG Name                vgdiskdata
  LV UUID                kJ2e5W-DVWM-rXwC-xEwn-H5kP-Ws0w-TtuI2L
  LV Write Access        read/write
  LV Creation host, time mongodbhc, 2022-09-27 10:59:30 +0000
  LV snapshot status     source of mongosnap_21oct2022 [active]
  LV Status              available
  # open                 1
  LV Size                20.00 GiB
  Current LE             5120
  Segments               1
  Allocation             inherit
  Read ahead sectors     auto
  - currently set to     256
  Block device           253:1

  --- Logical volume ---
  LV Path                /dev/vgdiskdata/mongosnap_21oct2022
  LV Name                mongosnap_21oct2022
  VG Name                vgdiskdata
  LV UUID                fv7BHZ-JClM-axTQ-AJUU-XTkP-DezD-c8e8Q2
  LV Write Access        read/write
  LV Creation host, time mongodbhc, 2022-10-21 10:57:57 +0000
  LV snapshot status     active destination for lvmongo
  LV Status              available
  # open                 0
  LV Size                20.00 GiB
  Current LE             5120
  COW-table size         500.00 MiB
  COW-table LE           125
  Allocated to snapshot  0.31%
  Snapshot chunk size    4.00 KiB
  Segments               1
  Allocation             inherit
  Read ahead sectors     auto
  - currently set to     256
  Block device           253:3
root@mongodbhc:~#
As we can see, there is another logical volume named mongosnap_21oct2022 that is part of the same volume group vgdiskdata. For lvmongo, the snapshot status shows “source of mongosnap_21oct2022 [active]”, while for mongosnap_21oct2022 it shows “active destination for lvmongo”.
Additionally, we can use the lvs command, whose output is shown below. There are two logical volumes belonging to the same volume group, and for the snapshot we can see that its origin is lvmongo.
root@mongodbhc:~# lvs
  LV                  VG         Attr       LSize   Pool Origin  Data%  Meta%  Move Log Cpy%Sync Convert
  lvmongo             vgdiskdata owi-aos--- 20.00g
  mongosnap_21oct2022 vgdiskdata swi-a-s--- 500.00m      lvmongo 3.25
The new snapshot can be used as a logical volume and mounted on the file system. From there, we can copy the files to offsite locations. Using this approach, we can restore an exact copy of the database, as it was at the time we took the backup, on any system.
The steps will include:
Mount the snapshot to a temporary mount point as a read-only file system. This is important because, with an LVM snapshot, the live data keeps changing while the snapshot must remain a frozen copy; mounting it read-only guarantees nothing modifies the files we are about to archive. If writes were to happen to these files while we back up and copy the archive elsewhere, the database would be in a corrupted state when we try to restore.
mkdir /tmp/mongosnap
mount -t xfs -o nouuid,ro /dev/vgdiskdata/mongosnap_21oct2022 /tmp/mongosnap/
Before we go further: the snapshot is taken and mounted read-only on the file system, so let’s unlock the database for writes and let replication resume. Only a few seconds should pass between the lock and the unlock, so this should have no real impact on replication lag.
mongohc:SECONDARY> db.fsyncUnlock()
{
    "info" : "fsyncUnlock completed",
    "lockCount" : NumberLong(0),
    "ok" : 1,
    "$clusterTime" : {
        "clusterTime" : Timestamp(1666351827, 1),
        "signature" : {
            "hash" : BinData(0,"/ZbNlG1binKXSO9f4trXCc2LdsE="),
            "keyId" : NumberLong("7148394471967686661")
        }
    },
    "operationTime" : Timestamp(1666351737, 1)
}
mongohc:SECONDARY>
Or, if you want to call this from the mongo client, place the command in an fsyncunlock.js file and execute:
mongo -u<username> -p<password> --port <port> --quiet fsyncunlock.js
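As with the lock script, fsyncunlock.js only needs a single call; a minimal sketch:
cat > fsyncunlock.js <<'EOF'
// release the write lock and print the server response
printjson(db.fsyncUnlock());
EOF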
Tar the directory with the designated name and move the tar backup file to the backups location. This can be an NFS mount point or a separate disk on the system, or the archive can even be copied to a cloud bucket.
tar -czf mongodb_backup_$(date '+%Y%m%d%H%M').tar.gz -C /tmp/mongosnap/ .
mv mongodb_backup_$(date '+%Y%m%d%H%M').tar.gz /backups/
Or
tar -czf /backups/mongodb_backup_$(date '+%Y%m%d%H%M').tar.gz --absolute-names /tmp/mongosnap
Finally, unmount the partition and remove the snapshot.
umount /tmp/mongosnap
lvremove /dev/vgdiskdata/mongosnap_21oct2022
If you repeat this daily, you will have a daily full backup, and between each full backup you can take incremental backups using the oplog.
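To show how the whole sequence fits together, here is a minimal sketch of a daily backup script; the credential and port placeholders, the /backups destination, and the presence of the .js helper files from this post are assumptions, and error handling is kept to a minimum.
#!/bin/bash
# Sketch of a daily full backup of a MongoDB Secondary running on LVM.
# Assumptions: replace <username>/<password>/<port>, /backups exists, and
# fsynclock.js, backup_oplog_ts.js, fsyncunlock.js are in the working directory.
set -e

MONGO="mongo -u<username> -p<password> --port <port> --quiet"
SNAP=mongosnap_$(date '+%d%b%Y')
ARCHIVE=mongodb_backup_$(date '+%Y%m%d%H%M').tar.gz

$MONGO fsynclock.js                                    # 1. lock the Secondary against writes
$MONGO backup_oplog_ts.js > oplog_position             # 2. record the oplog position for incrementals
lvcreate -L500M -s -n "$SNAP" /dev/vgdiskdata/lvmongo  # 3. create the snapshot
mkdir -p /tmp/mongosnap
mount -t xfs -o nouuid,ro /dev/vgdiskdata/"$SNAP" /tmp/mongosnap/
$MONGO fsyncunlock.js                                  # 4. unlock as soon as the snapshot is mounted
tar -czf /backups/"$ARCHIVE" -C /tmp/mongosnap/ .      # 5. archive the data files
umount /tmp/mongosnap                                  # 6. clean up
lvremove -f /dev/vgdiskdata/"$SNAP"                    #    -f skips the confirmation prompt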
NEXT:
INCREMENTAL BACKUPS USING THE OPLOG is covered in Part 2 of this series.
RESTORES WITH POINT IN TIME RECOVERY is covered in Part 3 of this series.
Conclusion
Running LVM provides additional flexibility and makes it possible to use snapshots to back up MongoDB. A daily full backup followed by incremental oplog backups will allow you to do point-in-time recovery (PITR). You can restore your database to the point just before an erroneous operation was executed and recover your data with confidence.