Choosing a database for your valuable data is always a difficult process. You need to assess different aspects such as performance, scalability, features and robustness. The ultimate goal of this assessment is to increase the level of trust you have in the solution you choose. One important criterion for building this trust is whether the solution provides easy and reliable means of backing up and restoring your data.
This article explains how you can perform backups and restores of data managed by Warp 10. We will cover both the standalone and distributed versions, so fasten your seatbelts!
The standalone version of Warp 10 uses the LevelDB library to manage the data it persists. We covered how LevelDB works in a previous post, but put simply, LevelDB manages a set of files under a single directory (configured via leveldb.home in Warp 10).
It could be tempting to periodically back up those files and to fall back on one of those backups if something goes wrong. Unfortunately, you would risk creating corrupt backups, as some of the files in the leveldb directory are updated by the library as data are pushed. The only way to ensure the integrity of the files in the leveldb directory is to shut down Warp 10 so that the LevelDB database is closed; you can then back up the files safely.
This works, but it is rather impractical: shutting down and restarting Warp 10 means reloading the metadata for all known series to populate the directory at startup. Even though this process is rather fast, it can take several minutes if you have millions of series.
To overcome this problem, Warp 10 ships with the ability to take snapshots. A snapshot is what the name implies: a coherent photograph of the LevelDB files.
Snapshots are created by setting up hard links to LevelDB files. Creating those links is an order of magnitude faster than copying the files and consumes far less space. The constraint is that hard links can only be created on the same block device as the original file. The standard place where snapshots are created by Warp 10 is in a snapshots subdirectory of the
Creating a snapshot is fast both because we only create hard links and because we do not shut down Warp 10 completely, so there is no need to reload the directory afterwards.
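To see why hard links make snapshots both cheap and safe, here is a small shell sketch. It is only an illustration: the db and snap directories and the file name are made up and do not reflect the actual Warp 10 layout.

```shell
# Illustration only: 'db' and 'snap' are hypothetical directories on the same file system.
work=$(mktemp -d)
mkdir "$work/db" "$work/snap"
printf 'sst-data' > "$work/db/000001.sst"

# A hard link is a second name for the same data: nothing is copied.
ln "$work/db/000001.sst" "$work/snap/000001.sst"

# Removing the original name (as a LevelDB compaction might do)
# leaves the data reachable through the snapshot directory.
rm "$work/db/000001.sst"
snapshot_content=$(cat "$work/snap/000001.sst")
echo "snapshot still contains: $snapshot_content"

# Clean up the scratch directory.
rm -rf "$work"
```

The data only disappears from disk once the last name pointing to it has been removed, which is what allows a snapshot to stay coherent while the live files keep changing.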
Two types of snapshots
Warp 10 can create two types of snapshots: full or incremental. A full snapshot pauses the LevelDB backend and creates hard links to all SST files, the MANIFEST and the latest log file from the leveldb directory. An incremental snapshot is based on another snapshot: LevelDB is paused, the list of files to hard link is established, and the files which need to be linked but are not in the base snapshot are linked from the leveldb directory, at which point LevelDB is unpaused. The rest of the files are then linked from the base snapshot directory. The time during which LevelDB is paused is therefore reduced even further. By taking regular incremental snapshots, the pause time can go down to a few milliseconds per snapshot.
Both types of snapshots are then completed with a copy of the configuration files.
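The incremental logic can be sketched in a few lines of shell. This is not the code Warp 10 runs: incremental_links is a hypothetical helper, it does not pause LevelDB, and for simplicity it links every regular file instead of selecting the SST, MANIFEST and log files. It also assumes file names contain no whitespace, which holds for LevelDB files.

```shell
# incremental_links DB_DIR BASE_SNAPSHOT NEW_SNAPSHOT
# Hypothetical helper illustrating the two phases of an incremental snapshot.
incremental_links() {
  db_dir=$1; base=$2; new=$3
  mkdir -p "$new" || return 1

  # Phase 1 (LevelDB would be paused here): establish the list of files and
  # link from the live directory only those the base snapshot does not have.
  from_base=""
  for f in "$db_dir"/*; do
    [ -f "$f" ] || continue              # skip subdirectories and empty globs
    name=${f##*/}
    if [ -f "$base/$name" ]; then
      from_base="$from_base $name"       # already captured, handled in phase 2
    else
      ln "$f" "$new/$name" || return 1
    fi
  done

  # Phase 2 (LevelDB would be running again): link the remaining files
  # from the base snapshot, which cannot change under our feet.
  for name in $from_base; do
    ln "$base/$name" "$new/$name" || return 1
  done
}
```

The more recent the base snapshot, the fewer files fall into the first phase, which is why regular incremental snapshots keep the pause so short.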
Once a snapshot has been taken, its content can be copied to an external system for long-term archival. Snapshots which have been copied can safely be removed: the file system reclaims the space of files which no longer have any remaining hard links. If you take incremental snapshots, you may want to retain one snapshot to serve as the base snapshot for the next incremental one.
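As an illustration of the archive-then-remove cycle, here is a minimal sketch. The archive_snapshot helper is hypothetical, and ARCHIVE_DIR simply stands for wherever your external storage is mounted.

```shell
# archive_snapshot SNAPSHOT_DIR ARCHIVE_DIR
# Hypothetical helper: archives a snapshot directory, then removes the local copy.
archive_snapshot() {
  snapshot_dir=$1; archive_dir=$2
  name=$(basename "$snapshot_dir")
  parent=$(dirname "$snapshot_dir")
  archive="$archive_dir/$name.tar"

  # Copy the snapshot content into a single archive file.
  tar -cf "$archive" -C "$parent" "$name" || return 1

  # Only delete the local snapshot once the archive can be listed back.
  tar -tf "$archive" > /dev/null || return 1
  rm -rf "$snapshot_dir" || return 1

  # Print the archive path so that callers can log or ship it.
  echo "$archive"
}
```

When working with incremental snapshots, you would call this helper on every snapshot except the one you keep as the base for the next run.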
Restoring a Warp 10 instance from a snapshot is as simple as stopping Warp 10, replacing the content of the leveldb directory with that of the snapshot, replacing the configuration files with those in the snapshot, and restarting Warp 10.
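The file replacement step could look like the sketch below. restore_leveldb is a hypothetical helper which assumes that SNAPSHOT_DIR holds only LevelDB files and that LEVELDB_DIR is given without a trailing slash. Stopping Warp 10, restoring the configuration files and restarting are deliberately left to the operator.

```shell
# restore_leveldb SNAPSHOT_DIR LEVELDB_DIR
# Hypothetical helper: run it only while Warp 10 is stopped.
restore_leveldb() {
  snapshot_dir=$1; leveldb_dir=$2
  staging="$leveldb_dir.restore"
  previous="$leveldb_dir.previous"

  # Refuse to overwrite the leftovers of an earlier restore.
  [ ! -e "$previous" ] || return 1

  # Copy first: the snapshot may live inside the directory we are about to replace.
  rm -rf "$staging" || return 1
  cp -R "$snapshot_dir" "$staging" || return 1

  # Keep the old directory aside instead of deleting it, so the restore can be undone.
  mv "$leveldb_dir" "$previous" || return 1
  mv "$staging" "$leveldb_dir"
}
```

Once Warp 10 has restarted and the data has been checked, the directory kept aside can be deleted.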
Taking a snapshot of a Warp 10 instance is done using the warp10-standalone.sh script with the following syntax:
warp10-standalone.sh snapshot FULL
This will create a full snapshot named FULL. The snapshot will be in the subdirectory snapshots/FULL of your
To create an incremental snapshot named INCREMENTAL based on a snapshot named FULL, use the syntax:
warp10-standalone.sh snapshot INCREMENTAL FULL
The resulting incremental snapshot will be in
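To chain incremental snapshots, each one needs the name of the previous snapshot as its base. Here is a minimal sketch of that bookkeeping. Only the two snapshot syntaxes shown above come from Warp 10; the state file, the function names and the WARP10_SCRIPT variable are assumptions made for the example.

```shell
# Path to the script; overridable so the functions can be dry-run (for instance with 'echo').
WARP10_SCRIPT=${WARP10_SCRIPT:-warp10-standalone.sh}

# take_snapshot NAME [BASE]
# Full snapshot when BASE is empty or omitted, incremental otherwise.
take_snapshot() {
  if [ -n "${2:-}" ]; then
    "$WARP10_SCRIPT" snapshot "$1" "$2"
  else
    "$WARP10_SCRIPT" snapshot "$1"
  fi
}

# rolling_snapshot STATE_FILE NAME
# Hypothetical helper: uses the snapshot recorded in STATE_FILE as the base,
# then records NAME so that the next run builds on it.
rolling_snapshot() {
  state_file=$1; name=$2
  base=""
  if [ -f "$state_file" ]; then
    base=$(cat "$state_file")
  fi
  take_snapshot "$name" "$base" || return 1
  printf '%s\n' "$name" > "$state_file"
}
```

A periodic job could then call something like rolling_snapshot /path/to/state "snap-$(date +%Y%m%d%H%M)", where both the path and the naming scheme are yours to choose. The first run produces a full snapshot since no base has been recorded yet.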
The distributed version of Warp 10 uses Apache HBase as a storage backend. HBase organizes its key space in key ranges called Regions. Regions are backed by files in HDFS.
As in the case of the standalone version, copying the underlying HDFS files is not a suitable way of performing backups as the integrity of the backed up data would then not be guaranteed.
To ensure you back up a coherent set of HDFS files for the data stored in HBase, you should first create an HBase snapshot. Note that due to the distributed nature of HBase, your snapshot may not reflect a global state of your data unless you stop the Directory Warp 10 components for the duration of the snapshot.
The following syntax in the HBase shell triggers a snapshot:
hbase> snapshot 'table-name', 'snapshot-name'
The resulting snapshot will be stored under
hbase is the base directory for your HBase cell. Note that an HBase snapshot does not copy data files. It simply creates a manifest listing the files belonging to the snapshot and marks those files as part of a snapshot so the cleaning mechanisms of HBase will not delete them.
Once your HBase snapshot is taken, you can export the snapshot to an external system. Exporting a snapshot may be a long operation as it involves copying the data files.
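HBase ships with an ExportSnapshot tool for this purpose. The wrapper below is only a sketch: the export_snapshot function, the HBASE variable and the default of 4 mappers are assumptions, and the destination URI is whatever cluster or file system you archive to.

```shell
# Command used to launch HBase tools; overridable for dry runs (for instance with 'echo').
HBASE=${HBASE:-hbase}

# export_snapshot SNAPSHOT_NAME DESTINATION_URI [MAPPERS]
# Copies the snapshot manifest and the data files it references to DESTINATION_URI.
export_snapshot() {
  "$HBASE" org.apache.hadoop.hbase.snapshot.ExportSnapshot \
    -snapshot "$1" -copy-to "$2" -mappers "${3:-4}"
}
```

For example, export_snapshot 'snapshot-name' hdfs://backup-cluster:8020/hbase 16 would run the copy with 16 mappers, backup-cluster being a made-up host name.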
You could also use the HDFS snapshot capability to snapshot the hbase directory after you have taken an HBase snapshot (and before you restart the Directory components). This provides an additional safety measure, as you would be able to restore a deleted HBase backup directly from your Hadoop cluster.
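An HDFS snapshot takes two commands: one to mark the directory as snapshottable and one to create the snapshot. The sketch below wraps them in a hypothetical hdfs_snapshot function. The directory is passed as an argument, so substitute the base directory of your own HBase installation.

```shell
# Command used to talk to HDFS; overridable for dry runs (for instance with 'echo').
HDFS=${HDFS:-hdfs}

# hdfs_snapshot DIRECTORY SNAPSHOT_NAME
hdfs_snapshot() {
  # Marking a directory as snapshottable requires HDFS administrator rights.
  "$HDFS" dfsadmin -allowSnapshot "$1" || return 1
  # The snapshot is created instantly; no data blocks are copied.
  "$HDFS" dfs -createSnapshot "$1" "$2"
}
```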
Restoring data from an HBase snapshot is very easy. You can replace an existing table with the content of a snapshot, or create a new table from a snapshot. In the latter case, by spawning dedicated Egress components, you could read the data from the snapshot using Warp 10.
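Both options map to HBase shell commands: restore_snapshot, which requires the table to be disabled first, and clone_snapshot. The two functions below are hypothetical helpers which merely print those commands, so you can review them before feeding them to the HBase shell.

```shell
# restore_commands TABLE SNAPSHOT
# Prints the HBase shell commands that replace TABLE with the content of SNAPSHOT.
restore_commands() {
  printf "disable '%s'\nrestore_snapshot '%s'\nenable '%s'\n" "$1" "$2" "$1"
}

# clone_commands SNAPSHOT NEW_TABLE
# Prints the HBase shell command that creates NEW_TABLE from SNAPSHOT.
clone_commands() {
  printf "clone_snapshot '%s', '%s'\n" "$1" "$2"
}
```

Assuming your HBase version supports the non-interactive mode of the shell, the output can be piped directly, as in restore_commands 'table-name' 'snapshot-name' | hbase shell -n.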
The two methods described above create backups of all the data stored in Warp 10. If you wish to create partial backups, such as backups for a given application, you need to use the /api/v0/fetch endpoint, which will dump the selected data in a format suitable for later ingestion via
Such data dumps require a read token with access to the data you wish to dump. The time needed to dump the data depends on the number of Geo Time Series™ and the amount of data, as every single datapoint has to be read. The good part is that those dumps do not require you to stop Warp 10. As a matter of fact, they would not work if you did!
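A dump can be scripted with curl. The sketch below rests on several assumptions to check against the documentation of your Warp 10 version: that the token is passed in the X-Warp10-Token header, and that the endpoint accepts selector, now and timespan parameters. The dump_series function and the CURL variable are made up for the example.

```shell
# HTTP client; overridable for dry runs (for instance with 'echo').
CURL=${CURL:-curl}

# dump_series BASE_URL READ_TOKEN SELECTOR NOW TIMESPAN
# Writes the selected datapoints to standard output.
dump_series() {
  # --get turns the url-encoded pairs into a query string, which takes care of
  # the braces and other special characters found in selectors.
  "$CURL" --silent --show-error --fail --get \
    --header "X-Warp10-Token: $2" \
    --data-urlencode "selector=$3" \
    --data-urlencode "now=$4" \
    --data-urlencode "timespan=$5" \
    "$1/api/v0/fetch"
}
```

You would typically redirect the output to a file, one per application or per selector, and keep the read token out of your shell history.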
Whether you are using the standalone or distributed version of Warp 10, backing up and restoring your data is straightforward.
Don't wait until it's too late. Plan early and create periodic snapshots and backups of your data. Your users will love you for that!
Co-Founder & Chief Technology Officer