Demystifying LevelDB

The standalone version of Warp 10™ persists its data on disk using the LevelDB library. LevelDB is an open source library created by Jeff Dean and Sanjay Ghemawat at Google. It implements a Log-Structured Merge-Tree, which provides very fast writes regardless of the size of the database, along with scan capabilities. This combination makes LevelDB the ideal storage library for a time series database.

LevelDB is very robust and rather simple, but a clear explanation of how it works is still helpful. This blog post aims to give you enough knowledge to understand how LevelDB behaves and, ultimately, how your Warp 10 instance performs.

Overview

Data pushed to LevelDB ends up in files called SST files, SST standing for Sorted String Table. An SST is a lexicographically sorted list of key/value pairs where both keys and values are byte arrays. SST files are immutable, meaning that once created they are never altered, although they may be removed during a process called compaction. SST files are all the files with the .sst suffix in your leveldb directory. They are organized in levels.

Following the Log-Structured Merge-Tree architecture, LevelDB does not store data directly in SST files. It first stores new key/value pair mutations (either additions or deletions) in a log file. This log file is the file with the .log suffix in your leveldb directory. The data stored in the log file is also kept in a memory structure called the memtable.

When the log file reaches a certain size (around 4 MB), its content is transferred to a new SST file, a new log file and a new memtable are initiated, and the previous memtable is discarded. Those fresh SST files are collectively called the level 0 files. Files at level 0 are special because their key ranges can overlap, since they are simply copies of the various log files.
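The write path described above can be sketched as a toy model. This is only an illustration, not LevelDB's implementation: the class name `ToyLSM`, its attributes and the in-memory "log" are hypothetical, and the toy memtable keeps only the latest mutation per key.

```python
from typing import Optional


class ToyLSM:
    """Toy model of the write path: a log plus a memtable, flushed to level 0."""

    def __init__(self, flush_threshold: int = 4 * 1024 * 1024):
        self.flush_threshold = flush_threshold  # ~4 MB, the size quoted in the text
        self.log = []        # stands in for the append-only .log file
        self.log_bytes = 0   # approximate size of the current log
        self.memtable = {}   # in-memory view of the log content
        self.level0 = []     # immutable sorted tables, oldest first

    def put(self, key: bytes, value: Optional[bytes]) -> None:
        # Every mutation is appended to the log and mirrored in the memtable.
        self.log.append((key, value))
        self.log_bytes += len(key) + (len(value) if value is not None else 0)
        self.memtable[key] = value
        if self.log_bytes >= self.flush_threshold:
            self._rotate()

    def delete(self, key: bytes) -> None:
        # A deletion is just another mutation; None plays the role of a tombstone.
        self.put(key, None)

    def _rotate(self) -> None:
        # Turn the memtable into an immutable, key-sorted level 0 table,
        # then start over with an empty log and an empty memtable.
        table = tuple(sorted(self.memtable.items(), key=lambda kv: kv[0]))
        self.level0.append(table)
        self.log, self.log_bytes, self.memtable = [], 0, {}
```

Because each level 0 table is a snapshot of one log, two such tables can easily cover the same key range, which is why level 0 needs special treatment.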

When the number of files at level 0 reaches a threshold (4 files by default in stock LevelDB), a compaction is triggered. Compaction will choose a set of overlapping files at level 0 and the set of files they overlap at the next level (level 1), and will merge the data in those files to create a new set of SST files at level 1. The chosen SST files are then discarded.
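The merge step of a compaction can be illustrated with a minimal sketch. It assumes tables are represented as sorted tuples of (key, value) pairs, with `None` standing for a deletion marker. The function names `compact` and `split` are hypothetical, and the real library streams through files rather than building a dictionary in memory.

```python
def compact(level0_tables, level1_tables):
    """Merge the chosen tables into one sorted run; the newest entry wins.

    level0_tables must be ordered oldest first. Level 1 data is older than
    anything at level 0, so it is applied first and then overwritten.
    """
    merged = {}
    for table in level1_tables:
        merged.update(table)
    for table in level0_tables:
        merged.update(table)
    # Deletion markers (None) are kept: deeper levels may still hold older values.
    return tuple(sorted(merged.items(), key=lambda kv: kv[0]))


def split(run, max_entries):
    """Cut a sorted run into non-overlapping tables of at most max_entries pairs."""
    return [run[i:i + max_entries] for i in range(0, len(run), max_entries)]
```

The output tables do not overlap each other, which is what distinguishes level 1 and deeper levels from level 0.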

LevelDB continuously inspects files at each level and triggers compactions when the number of files or the total size at a given level goes beyond a set threshold. LevelDB manages 7 levels of files. The list of current SST files is kept in a MANIFEST file. The id of the current MANIFEST file is stored in the CURRENT file.

When reading data, the set of SST files to access is determined from the MANIFEST and the required files are opened. The data to read is then reconciled from those files and the current memtable, taking overwrites and deletions into account.
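A point lookup following this reconciliation order might look like the sketch below. It is an illustration under simplifying assumptions: tables are sorted tuples of (key, value) pairs, `None` marks a deletion, and the names `lookup` and `get` are hypothetical rather than part of any LevelDB API.

```python
from bisect import bisect_left


def lookup(table, key):
    """Binary search in one sorted table; returns (found, value)."""
    keys = [k for k, _ in table]
    i = bisect_left(keys, key)
    if i < len(keys) and keys[i] == key:
        return True, table[i][1]
    return False, None


def get(key, memtable, level0, deeper_levels):
    """Return the newest value for key, or None if it is absent or deleted."""
    # 1. The memtable holds the most recent mutations.
    if key in memtable:
        return memtable[key]
    # 2. Level 0 tables may overlap, so check them from newest to oldest.
    for table in reversed(level0):
        found, value = lookup(table, key)
        if found:
            return value
    # 3. Deeper levels hold non-overlapping tables: at most one can match per level.
    for level in deeper_levels:
        for table in level:
            if table and table[0][0] <= key <= table[-1][0]:
                found, value = lookup(table, key)
                if found:
                    return value
    return None
```

The first match wins, so a deletion marker in a recent table hides any older value stored deeper.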

All operations performed on the SST, log and MANIFEST files are logged in the LOG file.

Understanding the LevelDB LOG entries

The LOG file in your leveldb directory is a rich source of information to understand how your Warp 10 instance is behaving.

LevelDB start up

When the LevelDB database is opened, the current .log file is recovered and converted to an SST file, a new .log file is initiated and a new MANIFEST file is created with updated information about all the known .sst files. While scanning the SST files upon startup, LevelDB may decide to remove unreferenced files.

When opening a LevelDB database with the following files:

CURRENT
LOG
MANIFEST-000839
.... plenty of .sst files ....
000837.sst
000841.log
000842.sst

The following entries will appear in the new LOG file (the previous LOG file will be renamed LOG.old):

2019/09/05-09:54:01.235885 7000045d0000 Recovering log #841
2019/09/05-09:54:01.237443 7000045d0000 Delete type=0 #841
2019/09/05-09:54:01.237509 7000045d0000 Delete type=3 #839

The first line says the log file was recovered. Since it was empty in our case, the recovery did not lead to the creation of a new SST file. Then the log file (type=0) was deleted and the old MANIFEST (type=3) was removed. The leveldb directory now contains the following files:

000837.sst
000842.sst
LOG.old
CURRENT
000844.log
MANIFEST-000843
LOG

You can see a fresh log file (with id 844) and a fresh MANIFEST (with id 843). The CURRENT file contains the full name of the MANIFEST file (MANIFEST-000843).
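To check which MANIFEST is active, it is enough to read the CURRENT file. Here is a minimal sketch, assuming the directory layout shown above. The helper name `current_manifest` is hypothetical, and it should only be used for inspection, never to modify the files of a running instance.

```python
from pathlib import Path


def current_manifest(db_dir):
    """Return the path of the MANIFEST file named in CURRENT."""
    db_path = Path(db_dir)
    # CURRENT contains a single line holding the MANIFEST file name.
    name = (db_path / "CURRENT").read_text().strip()
    return db_path / name
```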

Log rotations

As you push data, the .log file fills up and eventually reaches the point where LevelDB decides to rotate it. When this happens, the following LOG entries appear:

2019/09/05-10:00:07.976574 7000043c7000 Level-0 table #846: started
2019/09/05-10:00:08.022811 7000043c7000 Level-0 table #846: 3264502 bytes OK
2019/09/05-10:00:08.024013 7000043c7000 Delete type=0 #844

The first line indicates that a new SST file (with id 846) was initiated at level 0. The second line, about 46 ms later, states that 3,264,502 bytes were transferred to it from the log file being rotated. The third line indicates that the log file that was just rotated was deleted. The new log file was created just before the new SST file and thus has the id 845. The files in the leveldb directory are therefore:

000842.sst
LOG.old
CURRENT
000846.sst
000845.log
LOG
MANIFEST-000843
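The rotation entries shown earlier can be used to measure how long each log-to-SST transfer takes. The sketch below assumes the LOG line layout seen in this post (timestamp, thread id, message); the regular expression and the function name `flush_stats` are illustrative only.

```python
import re
from datetime import datetime

TS_FORMAT = "%Y/%m/%d-%H:%M:%S.%f"
L0_RE = re.compile(r"^(\S+) \S+ Level-0 table #(\d+): (started|(\d+) bytes OK)$")


def flush_stats(lines):
    """Map table id -> (bytes written, seconds elapsed) for each rotation."""
    started = {}
    stats = {}
    for line in lines:
        m = L0_RE.match(line.strip())
        if m is None:
            continue  # not a "Level-0 table" entry
        ts = datetime.strptime(m.group(1), TS_FORMAT)
        table = int(m.group(2))
        if m.group(3) == "started":
            started[table] = ts
        elif table in started:
            elapsed = (ts - started.pop(table)).total_seconds()
            stats[table] = (int(m.group(4)), elapsed)
    return stats
```

Applied to the three entries of the rotation example, it reports 3264502 bytes written for table 846.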

Compactions

After writing data for a while, the number of SST files at level 0 will reach the threshold that triggers a compaction. LevelDB will then select files at level 0 and level 1 and merge them, creating new SST files at level 1.

The LOG entries for compaction look like this:

2019/09/05-10:12:29.932735 7000043c7000 Compacting 3@0 + 5@1 files
2019/09/05-10:12:29.935664 7000043c7000 Generated table #855: 1 keys, 299 bytes
2019/09/05-10:12:29.953873 7000043c7000 Level-0 table #858: started
2019/09/05-10:12:29.971680 7000043c7000 Level-0 table #858: 1836221 bytes OK
2019/09/05-10:12:29.973850 7000043c7000 Delete type=0 #853
2019/09/05-10:12:30.004067 7000043c7000 Generated table #856: 181829 keys, 2120146 bytes
2019/09/05-10:12:30.063737 7000043c7000 Generated table #859: 181828 keys, 2120137 bytes
2019/09/05-10:12:30.117959 7000043c7000 Generated table #860: 181825 keys, 2120084 bytes
2019/09/05-10:12:30.137805 7000043c7000 Generated table #861: 58498 keys, 682097 bytes
2019/09/05-10:12:30.238360 7000043c7000 Generated table #862: 13 keys, 153 bytes
2019/09/05-10:12:30.238379 7000043c7000 Compacted 3@0 + 5@1 files => 7042916 bytes
2019/09/05-10:12:30.238592 7000043c7000 compacted to: files[ 2 8 56 69 0 0 0 ]

The compaction involves 3 files at level 0 (3@0) and 5 files at level 1 (5@1). The result of this compaction is a set of 6 files (855, 856, 859, 860, 861 and 862), whose sizes add up to the 7,042,916 bytes reported in the summary. The entries about table #858 and the deletion of log #853 belong to a log rotation that happened while the compaction was running. You can see the size of each generated file and the number of key/value pairs it contains. This gives you an idea of the footprint of each datapoint. In our example, file 856 contains 181,829 entries for a total of 2,120,146 bytes, so a footprint of 11.66 bytes per datapoint. That's 11.66 bytes for the id of the series, the timestamp and the value.
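The footprint computation can be automated for every generated table. This is a small sketch assuming the "Generated table" line format shown above; the function name `bytes_per_key` is hypothetical.

```python
import re

GENERATED_RE = re.compile(r"Generated table #(\d+): (\d+) keys, (\d+) bytes")


def bytes_per_key(lines):
    """Map table id -> average bytes per key/value pair."""
    footprint = {}
    for line in lines:
        m = GENERATED_RE.search(line)
        if m is None:
            continue  # not a "Generated table" entry
        table, keys, size = (int(g) for g in m.groups())
        if keys > 0:
            footprint[table] = size / keys
    return footprint
```

Very small tables, such as the one holding a single key, are dominated by fixed file overhead, so the larger tables give the more meaningful figure.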

The last two LOG entries summarize the compaction, giving you the total size of the generated SST files and the current number of files at the various levels (2 at level 0, 8 at level 1, 56 at level 2, etc).
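The per-level file counts can be extracted from any entry containing a files[ ... ] summary. A minimal sketch, assuming the format shown above and using a hypothetical function name `level_counts`:

```python
import re

FILES_RE = re.compile(r"files\[ ([\d ]+)\]")


def level_counts(line):
    """Return the number of files per level (index 0 is level 0), or None."""
    m = FILES_RE.search(line)
    if m is None:
        return None
    return [int(n) for n in m.group(1).split()]
```

Tracking these counts over time shows how data migrates toward the deeper levels as compactions proceed.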

Housekeeping

If we continue pushing data to our Warp 10 instance, we can see new types of LOG entries, such as:

2019/09/05-10:21:04.142429 7000043c7000 Level-0 table #864: started
2019/09/05-10:21:04.172528 7000043c7000 Level-0 table #864: 3264502 bytes OK
2019/09/05-10:21:04.174262 7000043c7000 Delete type=2 #745
2019/09/05-10:21:04.175685 7000043c7000 Delete type=2 #746
2019/09/05-10:21:04.176116 7000043c7000 Delete type=2 #747
2019/09/05-10:21:04.176521 7000043c7000 Delete type=2 #748
2019/09/05-10:21:04.176642 7000043c7000 Delete type=2 #780
2019/09/05-10:21:04.177114 7000043c7000 Delete type=2 #850
2019/09/05-10:21:04.177535 7000043c7000 Delete type=2 #852
2019/09/05-10:21:04.177843 7000043c7000 Delete type=2 #854
2019/09/05-10:21:04.178211 7000043c7000 Delete type=0 #857
2019/09/05-10:21:04.178511 7000043c7000 Moved #856 to level-2 2120146 bytes OK: files[ 2 8 57 69 0 0 0 ]
2019/09/05-10:21:04.178773 7000043c7000 Moved #859 to level-2 2120137 bytes OK: files[ 2 7 58 69 0 0 0 ]

The Delete type=2 entries report the removal of the SST files that were merged in a previous compaction. The Delete type=0 entry removes the log file (#857) that was just rotated into table #864.

The last two entries indicate that some SST files were moved as is from level 1 to level 2 to rebalance the number of files at the various levels.
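The Delete entries seen throughout this post can be decoded with a short helper. The sketch below only maps the three type codes discussed here (0 for log files, 2 for SST files, 3 for MANIFEST files) and reports any other code as is. The name `parse_delete` is hypothetical.

```python
import re

DELETE_RE = re.compile(r"^\S+ \S+ Delete type=(\d+) #(\d+)$")

# Only the type codes explained in this post; others are reported unchanged.
FILE_TYPES = {0: "log", 2: "sst", 3: "MANIFEST"}


def parse_delete(line):
    """Return (file kind, file id) for a Delete entry, or None for other lines."""
    m = DELETE_RE.match(line.strip())
    if m is None:
        return None
    ftype, number = int(m.group(1)), int(m.group(2))
    return FILE_TYPES.get(ftype, f"type {ftype}"), number
```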

Takeaway

You should now understand how LevelDB works and you should be able to read the LOG file to find valuable information such as the average size of datapoints in SST files.

As an LSM tree backend, LevelDB handles deletions by writing tombstones to the log file. This means that deletions only reclaim space when the SST files containing the tombstones are merged at the deepest level. To overcome this, SenX™ offers a commercial extension to Warp 10 that reclaims space immediately by intelligently suppressing SST files. Please contact our sales team if you are interested in this offering. Note that this extension is used by default in the sandbox and in our hosted Warp 10 instances.