LevelDB is the storage library used in the standalone Warp 10 version. Learn how it works and how to read its LOG.
The standalone version of Warp 10 persists its data on disk using the LevelDB library. LevelDB is an open source library created by Jeff Dean and Sanjay Ghemawat at Google. It implements a Log-Structured Merge-Tree (LSM tree), which provides very fast writes regardless of the size of the database, along with efficient scans. These properties make LevelDB an ideal storage library for a time series database.
LevelDB is very robust and rather simple, but its inner workings benefit from a clear explanation. This blog post aims to provide one, giving you enough knowledge to understand how LevelDB behaves and, ultimately, how your Warp 10 instance performs.
Data pushed to LevelDB ends up in files called SST files, SST standing for Sorted String Table. An SST is a lexicographically sorted list of key/value pairs where both keys and values are byte arrays. SST files are immutable: once created, they are never altered, although they may be removed during a process called compaction. SST files are all the files with the .sst suffix in your leveldb directory, and they are organized in levels.
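To make the idea of a sorted, immutable table concrete, here is a minimal in-memory sketch. It assumes a plain Python tuple stands in for the on-disk format; the real file format (data blocks, index, compression) is not modeled, and the class name SSTable is hypothetical.

```python
from bisect import bisect_left
from typing import Iterable, Iterator, Optional, Tuple


class SSTable:
    """Hypothetical in-memory stand-in for an immutable Sorted String Table."""

    def __init__(self, pairs: Iterable[Tuple[bytes, bytes]]):
        # Sort once at creation time; the table is never modified afterwards.
        self._pairs = tuple(sorted(pairs))
        self._keys = tuple(k for k, _ in self._pairs)

    def get(self, key: bytes) -> Optional[bytes]:
        # Sorted keys allow a binary search instead of a full scan.
        i = bisect_left(self._keys, key)
        if i < len(self._keys) and self._keys[i] == key:
            return self._pairs[i][1]
        return None

    def scan(self, start: bytes, end: bytes) -> Iterator[Tuple[bytes, bytes]]:
        # Range scans are cheap: keys in [start, end) are contiguous.
        i = bisect_left(self._keys, start)
        while i < len(self._keys) and self._keys[i] < end:
            yield self._pairs[i]
            i += 1


table = SSTable([(b"b", b"2"), (b"a", b"1"), (b"c", b"3")])
print(table.get(b"b"))               # b'2'
print(list(table.scan(b"a", b"c")))  # [(b'a', b'1'), (b'b', b'2')]
```

Keeping the entries sorted is what makes both point lookups and range scans efficient, which matters for time series where a scan typically reads consecutive timestamps of one series.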
Following its Log-Structured Merge-Tree architecture, LevelDB does not write data directly to SST files. It first appends new key/value mutations (either additions or deletions) to a log file, the file with the .log suffix in your leveldb directory. The data stored in the log file is also kept in an in-memory structure called the memtable.
When the log file reaches a certain size (around 4 MB), its content is written to a new SST file, a new log file and a new memtable are started, and the previous memtable is discarded. These fresh SST files are collectively called the level 0 files. Files at level 0 are special because their key ranges can overlap, since each one is simply a copy of a different log file.
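The write path described above can be sketched as follows. This is a toy model, not LevelDB's code: the class WritePath is hypothetical, the flush threshold counts mutations rather than bytes (LevelDB uses a size of around 4 MB), and None is assumed to represent a deletion.

```python
from typing import Dict, List, Optional, Tuple

Entry = Tuple[bytes, Optional[bytes]]  # a value of None marks a deletion


class WritePath:
    """Toy model of the log + memtable write path, flushed to level 0."""

    def __init__(self, flush_threshold: int):
        # Assumption of this sketch: threshold is a mutation count, not bytes.
        self.flush_threshold = flush_threshold
        self.log: List[Entry] = []
        self.memtable: Dict[bytes, Optional[bytes]] = {}
        self.level0: List[List[Entry]] = []

    def _apply(self, key: bytes, value: Optional[bytes]) -> None:
        self.log.append((key, value))  # 1. append the mutation to the log
        self.memtable[key] = value     # 2. mirror it in the memtable
        if len(self.log) >= self.flush_threshold:
            self._flush()

    def put(self, key: bytes, value: bytes) -> None:
        self._apply(key, value)

    def delete(self, key: bytes) -> None:
        self._apply(key, None)

    def _flush(self) -> None:
        # Dump the memtable, sorted by key, as a new level-0 file,
        # then start a fresh log and memtable.
        self.level0.append(sorted(self.memtable.items()))
        self.log, self.memtable = [], {}


wp = WritePath(flush_threshold=3)
for k in [b"c", b"a", b"e", b"b", b"d", b"f"]:
    wp.put(k, k.upper())
for sst in wp.level0:
    print(sst[0][0], sst[-1][0])  # key range of each level-0 file
# b'a' b'e'
# b'b' b'f'
```

The two level-0 files cover the key ranges a to e and b to f, which overlap: this is exactly why level 0 is treated differently from the deeper levels.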
When the number of files at level 0 reaches a threshold (4 files with LevelDB's default settings), a compaction is triggered. Compaction chooses a set of overlapping files at level 0, together with the files at the next level (level 1) whose key ranges they overlap, and merges the data in those files to create a new set of SST files at level 1. The chosen SST files are then discarded.
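The merge at the heart of a compaction can be sketched as a k-way merge of sorted runs where, for a key present in several inputs, the newest version wins. This is a simplified illustration, not LevelDB's implementation: the function name compact, the entry-count limit per output file (LevelDB cuts output files by size) and the use of None as a deletion marker are assumptions. The drop_tombstones flag models the fact that a deletion marker can only be discarded once no deeper level may still hold the key.

```python
import heapq
from typing import List, Optional, Sequence, Tuple

Entry = Tuple[bytes, Optional[bytes]]  # None stands for a deletion tombstone


def compact(inputs_newest_first: Sequence[List[Entry]],
            max_entries_per_file: int,
            drop_tombstones: bool = False) -> List[List[Entry]]:
    """Merge sorted runs into sorted, non-overlapping output files."""
    # Tag entries with the age of their input (0 = newest) so that, for
    # equal keys, heapq.merge yields the newest version first.
    tagged = [[(key, age, value) for key, value in run]
              for age, run in enumerate(inputs_newest_first)]
    outputs: List[List[Entry]] = []
    current: List[Entry] = []
    last_key = None
    for key, _age, value in heapq.merge(*tagged):
        if key == last_key:
            continue  # older version of a key already handled: discard it
        last_key = key
        if value is None and drop_tombstones:
            continue  # safe only when no deeper level can hold this key
        current.append((key, value))
        if len(current) == max_entries_per_file:
            outputs.append(current)
            current = []
    if current:
        outputs.append(current)
    return outputs


newer_l0 = [(b"a", b"A2"), (b"c", None)]
older_l0 = [(b"a", b"A1"), (b"b", b"B1")]
level1 = [(b"b", b"B0"), (b"c", b"C0"), (b"d", b"D0")]
print(compact([newer_l0, older_l0, level1], max_entries_per_file=2))
# [[(b'a', b'A2'), (b'b', b'B1')], [(b'c', None), (b'd', b'D0')]]
```

Because the outputs come from a single sorted stream, the new level-1 files do not overlap each other, unlike the level-0 inputs.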
LevelDB continuously inspects files at each level and triggers a compaction when the number of files (at level 0) or the total size (at deeper levels) goes beyond a set threshold. LevelDB manages 7 levels of files. The list of current SST files is kept in a MANIFEST file, and the name of the current MANIFEST file is stored in the CURRENT file.
When reading data, LevelDB uses the information from the MANIFEST to determine which SST files to access, opens the required files and reconciles their content with the current memtable, taking overwrites and deletions into account.
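The lookup order implied by this reconciliation can be sketched as below. It is a simplified model under stated assumptions: the function name lookup is hypothetical, files are plain sorted lists, None marks a deletion, and the indexes and caches a real LevelDB read relies on are ignored.

```python
from bisect import bisect_left
from typing import Dict, List, Optional, Tuple

Entry = Tuple[bytes, Optional[bytes]]  # None marks a deletion tombstone
_MISSING = object()


def _search(run: List[Entry], key: bytes):
    # Binary search in one sorted file; (key,) sorts just before (key, value).
    i = bisect_left(run, (key,))
    if i < len(run) and run[i][0] == key:
        return run[i][1]
    return _MISSING


def lookup(key: bytes,
           memtable: Dict[bytes, Optional[bytes]],
           level0_newest_first: List[List[Entry]],
           deeper_levels: List[List[List[Entry]]]) -> Optional[bytes]:
    """Return the live value for key, or None if absent or deleted."""
    # 1. The memtable holds the most recent mutations.
    if key in memtable:
        return memtable[key]
    # 2. Level-0 files may overlap, so each one is checked, newest first.
    for run in level0_newest_first:
        found = _search(run, key)
        if found is not _MISSING:
            return found  # a tombstone (None) hides older values
    # 3. Deeper levels: files do not overlap, so at most one file per
    #    level has a key range covering the key.
    for level in deeper_levels:
        for run in level:
            if run and run[0][0] <= key <= run[-1][0]:
                found = _search(run, key)
                if found is not _MISSING:
                    return found
    return None


memtable = {b"a": b"A3"}
level0 = [[(b"b", None)], [(b"b", b"B1"), (b"c", b"C1")]]
levels = [[[(b"a", b"A0"), (b"b", b"B0")], [(b"d", b"D0"), (b"e", b"E0")]]]
print(lookup(b"a", memtable, level0, levels))  # b'A3'
print(lookup(b"b", memtable, level0, levels))  # None (deleted)
```

The key b"b" still exists in older files, but the newer tombstone masks it: this is how overwrites and deletions are reconciled at read time.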
All operations performed on the SST, log and MANIFEST files are logged in the LOG file.
Understanding the LevelDB LOG entries
The LOG file in your leveldb directory is a rich source of information to understand how your Warp 10 instance is behaving.
LevelDB startup
When the LevelDB database is opened, the current .log file is recovered and, if it contains data, converted to an SST file. A new .log file is then started and a new MANIFEST file is created with up-to-date information about all the known .sst files. While scanning the files upon startup, LevelDB may also remove files which are no longer referenced.
When opening a LevelDB database with the following files:
CURRENT LOG MANIFEST-000839 .... plenty of .sst files .... 000837.sst 000841.log 000842.sst
The following entries then appear in the new LOG file (the previous LOG file is renamed LOG.old):
2019/09/05-09:54:01.235885 7000045d0000 Recovering log #841
2019/09/05-09:54:01.237443 7000045d0000 Delete type=0 #841
2019/09/05-09:54:01.237509 7000045d0000 Delete type=3 #839
The first line says the log file was recovered; since it was empty in our case, no new SST file was created. Then the log file (type=0) and the old MANIFEST (type=3) were deleted. The leveldb directory now contains the following files:
000837.sst 000842.sst LOG.old CURRENT 000844.log MANIFEST-000843 LOG
You can see a fresh log file (with id 844) and a fresh MANIFEST (with id 843). The CURRENT file contains the full name of the MANIFEST file (MANIFEST-000843).
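As a small illustration, the following read-only sketch resolves the active MANIFEST from the CURRENT file. It assumes, as in upstream LevelDB, that CURRENT holds a single line with the MANIFEST name followed by a newline; the function name current_manifest and the directory path are hypothetical.

```python
import os
import tempfile


def current_manifest(db_dir: str) -> str:
    """Return the path of the MANIFEST named in db_dir/CURRENT (read-only)."""
    with open(os.path.join(db_dir, "CURRENT"), encoding="ascii") as f:
        # CURRENT holds one line such as "MANIFEST-000843" plus a newline.
        name = f.read().strip()
    return os.path.join(db_dir, name)


# Usage on a throwaway directory mimicking the listing above.
with tempfile.TemporaryDirectory() as d:
    with open(os.path.join(d, "CURRENT"), "w", encoding="ascii") as f:
        f.write("MANIFEST-000843\n")
    print(os.path.basename(current_manifest(d)))  # MANIFEST-000843
```

The sketch only reads the file; modifying files in the leveldb directory of a running instance is not something to experiment with.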
As you push data, the .log file fills up until LevelDB decides to rotate it. When this happens, the following LOG entries appear:
2019/09/05-10:00:07.976574 7000043c7000 Level-0 table #846: started
2019/09/05-10:00:08.022811 7000043c7000 Level-0 table #846: 3264502 bytes OK
2019/09/05-10:00:08.024013 7000043c7000 Delete type=0 #844
The first line indicates that a new SST file (with id 846) was started at level 0. The second line, about 46 ms later, states that 3,264,502 bytes were transferred to it from the log file being rotated. The third line indicates that the log file that was just rotated (#844) was deleted. The new log file was created just before the new SST file and therefore has id 845. The files in the leveldb directory are therefore:
000842.sst LOG.old CURRENT 000846.sst 000845.log LOG MANIFEST-000843
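Timings such as the 46 ms above can be computed directly from LOG entries. This sketch assumes every line follows the layout seen in this post (timestamp, thread id, message, separated by spaces); the function name parse_log_line is hypothetical.

```python
from datetime import datetime


def parse_log_line(line: str):
    """Split a LevelDB LOG line into (timestamp, thread id, message)."""
    stamp, thread, message = line.split(" ", 2)
    return datetime.strptime(stamp, "%Y/%m/%d-%H:%M:%S.%f"), thread, message


started = parse_log_line(
    "2019/09/05-10:00:07.976574 7000043c7000 Level-0 table #846: started")
done = parse_log_line(
    "2019/09/05-10:00:08.022811 7000043c7000 Level-0 table #846: 3264502 bytes OK")

elapsed_ms = (done[0] - started[0]).total_seconds() * 1000
print(f"{elapsed_ms:.3f} ms")  # 46.237 ms
```

Applied to every "started"/"bytes OK" pair, the same computation gives an idea of how long level-0 flushes take on your storage.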
After writing data for a while, the number of SST files at level 0 reaches a threshold, which triggers a compaction. LevelDB then selects files at level 0 and level 1 and merges them, creating new SST files at level 1.
The LOG entries for compaction look like this:
2019/09/05-10:12:29.932735 7000043c7000 Compacting 3@0 + 5@1 files
2019/09/05-10:12:29.935664 7000043c7000 Generated table #855: 1 keys, 299 bytes
2019/09/05-10:12:29.953873 7000043c7000 Level-0 table #858: started
2019/09/05-10:12:29.971680 7000043c7000 Level-0 table #858: 1836221 bytes OK
2019/09/05-10:12:29.973850 7000043c7000 Delete type=0 #853
2019/09/05-10:12:30.004067 7000043c7000 Generated table #856: 181829 keys, 2120146 bytes
2019/09/05-10:12:30.063737 7000043c7000 Generated table #859: 181828 keys, 2120137 bytes
2019/09/05-10:12:30.117959 7000043c7000 Generated table #860: 181825 keys, 2120084 bytes
2019/09/05-10:12:30.137805 7000043c7000 Generated table #861: 58498 keys, 682097 bytes
2019/09/05-10:12:30.238360 7000043c7000 Generated table #862: 13 keys, 153 bytes
2019/09/05-10:12:30.238379 7000043c7000 Compacted 3@0 + 5@1 files => 7042916 bytes
2019/09/05-10:12:30.238592 7000043c7000 compacted to: files[ 2 8 56 69 0 0 0 ]
The compaction involves 3 files at level 0 (3@0) and 5 files at level 1 (5@1). It produces a set of 6 files (#855, #856, #859, #860, #861 and #862), whose sizes add up to the 7,042,916 bytes reported at the end. Table #858, which appears in the middle of the compaction, is a level-0 table created by a log rotation (the rotated log #853 is deleted right after). For each generated file, you can see its size and the number of key/value pairs it contains, which gives you an idea of the footprint of each datapoint. In our example, file #856 contains 181,829 entries for a total of 2,120,146 bytes, i.e. a footprint of about 11.66 bytes per datapoint, covering the id of the series, the timestamp and the value.
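The per-datapoint footprint can be extracted automatically from the "Generated table" entries. This sketch assumes the message format shown in the excerpt above; the variable names are illustrative, and the result is only an approximation since SST files also contain index data.

```python
import re

# Pattern inferred from the "Generated table" entries shown above.
GENERATED = re.compile(r"Generated table #(\d+): (\d+) keys, (\d+) bytes")

lines = [
    "2019/09/05-10:12:29.935664 7000043c7000 Generated table #855: 1 keys, 299 bytes",
    "2019/09/05-10:12:29.971680 7000043c7000 Level-0 table #858: 1836221 bytes OK",
    "2019/09/05-10:12:30.004067 7000043c7000 Generated table #856: 181829 keys, 2120146 bytes",
    "2019/09/05-10:12:30.063737 7000043c7000 Generated table #859: 181828 keys, 2120137 bytes",
    "2019/09/05-10:12:30.117959 7000043c7000 Generated table #860: 181825 keys, 2120084 bytes",
    "2019/09/05-10:12:30.137805 7000043c7000 Generated table #861: 58498 keys, 682097 bytes",
    "2019/09/05-10:12:30.238360 7000043c7000 Generated table #862: 13 keys, 153 bytes",
]

# table id -> (number of keys, size in bytes); the level-0 flush is skipped.
tables = {}
for line in lines:
    m = GENERATED.search(line)
    if m:
        tables[int(m[1])] = (int(m[2]), int(m[3]))

total_keys = sum(keys for keys, _ in tables.values())
total_bytes = sum(size for _, size in tables.values())
print(len(tables), total_bytes)                   # 6 7042916
print(round(tables[856][1] / tables[856][0], 2))  # 11.66
print(round(total_bytes / total_keys, 2))         # 11.66
```

Averaging over the whole compaction rather than a single file smooths out small tables such as #855, whose fixed overhead dominates.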
The last two LOG entries summarize the compaction, giving you the total size of the generated SST files and the current number of files at the various levels (2 at level 0, 8 at level 1, 56 at level 2, 69 at level 3 and none at the deeper levels).
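To track how files are distributed across levels over time, the files[ ... ] summary can be turned into per-level counts. The regular expression below is an assumption based on the entries shown in this post, and files_per_level is a hypothetical helper.

```python
import re
from typing import Dict, Optional

# Matches the "files[ 2 8 56 69 0 0 0 ]" summary seen in this post.
FILES = re.compile(r"files\[([\d\s]+)\]")


def files_per_level(line: str) -> Optional[Dict[int, int]]:
    """Map each level (0 to 6) to its number of SST files, if present."""
    m = FILES.search(line)
    if m is None:
        return None
    return {level: int(n) for level, n in enumerate(m.group(1).split())}


summary = ("2019/09/05-10:12:30.238592 7000043c7000 "
           "compacted to: files[ 2 8 56 69 0 0 0 ]")
print(files_per_level(summary))
# {0: 2, 1: 8, 2: 56, 3: 69, 4: 0, 5: 0, 6: 0}
```

The same summary is appended to other entries as well, so the helper searches anywhere in the line rather than relying on a fixed prefix.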
If we continue pushing data to our Warp 10 instance, we can see new types of LOG entries, such as:
2019/09/05-10:21:04.142429 7000043c7000 Level-0 table #864: started
2019/09/05-10:21:04.172528 7000043c7000 Level-0 table #864: 3264502 bytes OK
2019/09/05-10:21:04.174262 7000043c7000 Delete type=2 #745
2019/09/05-10:21:04.175685 7000043c7000 Delete type=2 #746
2019/09/05-10:21:04.176116 7000043c7000 Delete type=2 #747
2019/09/05-10:21:04.176521 7000043c7000 Delete type=2 #748
2019/09/05-10:21:04.176642 7000043c7000 Delete type=2 #780
2019/09/05-10:21:04.177114 7000043c7000 Delete type=2 #850
2019/09/05-10:21:04.177535 7000043c7000 Delete type=2 #852
2019/09/05-10:21:04.177843 7000043c7000 Delete type=2 #854
2019/09/05-10:21:04.178211 7000043c7000 Delete type=0 #857
2019/09/05-10:21:04.178511 7000043c7000 Moved #856 to level-2 2120146 bytes OK: files[ 2 8 57 69 0 0 0 ]
2019/09/05-10:21:04.178773 7000043c7000 Moved #859 to level-2 2120137 bytes OK: files[ 2 7 58 69 0 0 0 ]
Delete type=2 entries report the removal of SST files which were merged during a previous compaction and are no longer needed.
The last two entries indicate that some SST files were moved as is from level 1 to level 2 to rebalance the number of files at the various levels.
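The Delete and Moved entries can be decoded into file names with a small helper. This sketch only knows the three file types discussed in this post (0 for log files, 2 for SST files, 3 for the MANIFEST) and builds names following the directory listings shown above; describe and file_name are hypothetical helpers.

```python
import re
from typing import Optional

# File types seen in the "Delete type=N #id" entries of this post.
FILE_TYPES = {0: "log file", 2: "SST file", 3: "MANIFEST"}
DELETE = re.compile(r"Delete type=(\d+) #(\d+)")
MOVED = re.compile(r"Moved #(\d+) to level-(\d+)")


def file_name(file_type: int, number: int) -> str:
    # Name patterns follow the directory listings shown in this post.
    if file_type == 0:
        return f"{number:06d}.log"
    if file_type == 2:
        return f"{number:06d}.sst"
    if file_type == 3:
        return f"MANIFEST-{number:06d}"
    return f"<type {file_type}> #{number}"


def describe(line: str) -> Optional[str]:
    """Turn a Delete or Moved LOG entry into a short description."""
    m = DELETE.search(line)
    if m:
        t, n = int(m[1]), int(m[2])
        return f"deleted {FILE_TYPES.get(t, 'unknown file')} {file_name(t, n)}"
    m = MOVED.search(line)
    if m:
        return f"moved {int(m[1]):06d}.sst to level {m[2]}"
    return None


print(describe("2019/09/05-10:21:04.174262 7000043c7000 Delete type=2 #745"))
# deleted SST file 000745.sst
print(describe("2019/09/05-10:21:04.178511 7000043c7000 Moved #856 to level-2 "
               "2120146 bytes OK: files[ 2 8 57 69 0 0 0 ]"))
# moved 000856.sst to level 2
```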
You should now understand how LevelDB works and be able to read its LOG file to extract valuable information such as the average size of datapoints in SST files.
As an LSM tree backend, LevelDB handles deletions by writing tombstones to the log file. This means that deletions only reclaim space once the SST files containing the tombstones have been merged down to the deepest level. To overcome this, SenX offers a commercial extension to Warp 10 which reclaims space immediately by intelligently suppressing SST files. Please contact our sales team if you are interested in this offering. Note that this extension is used by default in the sandbox and in our hosted Warp 10 instances.