Warp 10 Raspberry Pi 4 benchmark for industrial IoT

Ingesting 300k data points per second in the Warp 10 time series database on a Raspberry Pi is possible with the new Warp 10 2.1 ingestion format. We also benched the PINE64 board!

Time for IoT again! Embedding Warp 10 on a Raspberry Pi or another embedded target has some advantages. You can use WarpScript everywhere to process data locally, and use REXEC and other server-to-server functions for data synchronization. But what about performance?

The Warp 10 documentation states that edge deployments can handle around 10k data points/s, and that the standalone version on a single computer can handle around 100k data points/s.

Time to check those figures!

Explicit Warning: Warp 10 2.1 spoilers inside.

Bench material

ARM32: Rpi3 B

Not the most recent model, but so common among makers that it is a "must bench". Note that industrial-grade Raspberry Pi boards now exist.

Quad Core 1.2 GHz Broadcom BCM2837 64-bit SoC (Cortex-A53)
1 GB RAM
32-bit (armv7l) Raspbian OS.

Yes, Raspbian is a 32-bit OS running on a 64-bit ARM CPU. This has an impact on the LevelDB implementation.

ARM64: PINE64 LTS

Not very common, but it offers a very good performance/price ratio.

Quad Core 1.15 GHz Allwinner A64 SoC (Cortex-A53)
LPDDR3 RAM (up to 2 GB)
Armbian 64-bit (aarch64) with a pretty old patched 3.10 kernel (from before sunxi support started being merged into mainline).

ARM32: Rpi4 B

The most recent release of the famous Raspberry Pi.

Quad Core 1.5 GHz Broadcom BCM2711B0 (Cortex-A72)
2 GB RAM
Twice the RAM bandwidth of the Rpi3.
32-bit (armv7l) Raspbian OS.

Again, Raspbian is a 32-bit OS running on a 64-bit ARM CPU.

A fan

Yes. I want to bench Warp 10, not the weird throttling strategies implemented in both SoCs.

SD Cards

OK, these are not the fastest on the market, but I don't care about sequential read or write. I chose them for a customer project because they support a SMART-like protocol that clearly reports how many write cycles have been performed.

Meanwhile, during the bench, I was so impressed by the Rpi4 that I also bought an A2 class SD card. This class of SD card sustains 2000 random write IOPS, which matters more for a database than sequential write performance. When you buy an SD card, forget about "speeds up to 60 MB/s". The card I bought is A2 class (2000 random write IOPS, 10 MB/s sustained sequential write) and V30 (30 MB/s minimum sequential write).

Marketing: "up to 60MB/s". Reality: A2 V30

Anyway, I will also do tests with a RAM drive. Again, I want to bench Warp 10, not SD cards.

Software setup

Java: OpenJDK 1.8

Warp 10: the latest 2.0.3 master… and a pretty cool new branch we will merge soon. I will explain it later on.

Warp 10 configuration: the out-of-the-box default configuration. There may be room for improvement.

Datasets

The dataset has a huge impact on performance. Ingesting booleans is quicker than ingesting doubles (parsing time), and there are a few tweaks you should know about in the Warp 10 input format. Warp 10 2.1 also allows something really new.

For the test, I imagine I am logging 22 channels on the CAN bus of a car.

The naive dataset format

1412918379300000/47.363331:4.898984/214748 throttlepedal{id=xaaabvb} 23.5
1412918379300000/47.363331:4.898984/214748 torque{id=xaaabvb} 78
1412918379300000/47.363331:4.898984/214748 boostpressure{id=xaaabvb} 0.69
1412918379300000/47.363331:4.898984/214748 gammalongitudinal{id=xaaabvb} -0.05078125
1412918379300000/47.363331:4.898984/214748 speed{id=xaaabvb} 101.818769659
1412918379300000/47.363331:4.898984/214748 wheeltorque{id=xaaabvb} 222
1412918379300000/47.363331:4.898984/214748 externaltemp{id=xaaabvb} 15.0
1412918379300000/47.363331:4.898984/214748 yawrate{id=xaaabvb} -11.89
1412918379300000/47.363331:4.898984/214748 engineoiltemp{id=xaaabvb} 92.0

Each line carries a different classname, and the label set is repeated on every line.
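Each line of the naive dataset follows the Warp 10 GTS input layout TS/LAT:LON/ELEV class{labels} value. A minimal Python parser sketch can make that structure explicit. It assumes a full (non-continuation) line, ignores URL-encoded characters in names and labels, and keeps the value as raw text instead of detecting its type; the names LINE_RE, DataPoint and parse_line are hypothetical, chosen for illustration.

```python
import re
from dataclasses import dataclass
from typing import Dict, Optional

# Sketch of the GTS input line layout: TS/LAT:LON/ELEV class{labels} value
# LAT:LON and ELEV are optional; the value is kept as raw text.
LINE_RE = re.compile(
    r"^(?P<ts>\d*)/(?:(?P<lat>-?[\d.]+):(?P<lon>-?[\d.]+))?/(?P<elev>-?\d+)?"
    r"\s+(?P<cls>[^{\s]+)\{(?P<labels>[^}]*)\}\s+(?P<value>.+)$"
)


@dataclass
class DataPoint:
    ts: Optional[int]      # microseconds with the default time unit
    lat: Optional[float]
    lon: Optional[float]
    elev: Optional[int]
    classname: str
    labels: Dict[str, str]
    value: str             # raw value text; type detection is left out


def parse_line(line: str) -> DataPoint:
    """Parse one full (non-continuation) GTS input line."""
    m = LINE_RE.match(line.strip())
    if m is None:
        raise ValueError(f"not a full GTS input line: {line!r}")
    labels: Dict[str, str] = {}
    if m["labels"]:
        for pair in m["labels"].split(","):
            key, val = pair.split("=", 1)
            labels[key] = val
    return DataPoint(
        ts=int(m["ts"]) if m["ts"] else None,
        lat=float(m["lat"]) if m["lat"] else None,
        lon=float(m["lon"]) if m["lon"] else None,
        elev=int(m["elev"]) if m["elev"] else None,
        classname=m["cls"],
        labels=labels,
        value=m["value"],
    )


p = parse_line("1412918379300000/47.363331:4.898984/214748 torque{id=xaaabvb} 78")
print(p.classname, p.labels, p.value)  # torque {'id': 'xaaabvb'} 78
```

Parsing every line like this is exactly the repeated work the more compact formats below try to reduce: the classname and labels are re-read for every single data point.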

The ordered dataset format

1435854991700000/47.36333099193871:4.898983938619494/214748 wheeltorque{id=xaaabvb} 0
1435854991600000/47.36333099193871:4.898983938619494/214748 wheeltorque{id=xaaabvb} 0
1435854991500000/47.36333099193871:4.898983938619494/214748 wheeltorque{id=xaaabvb} 0
1435854991400000/47.36333099193871:4.898983938619494/214748 wheeltorque{id=xaaabvb} 0
1435854991300000/47.36333099193871:4.898983938619494/214748 wheeltorque{id=xaaabvb} 0
1435854991200000/47.36333099193871:4.898983938619494/214748 wheeltorque{id=xaaabvb} 0
1435854991100000/47.36333099193871:4.898983938619494/214748 wheeltorque{id=xaaabvb} 0
1435854991000000/47.36333099193871:4.898983938619494/214748 wheeltorque{id=xaaabvb} 0
1435854990900000/47.36333099193871:4.898983938619494/214748 wheeltorque{id=xaaabvb} 0
1435854990800000/47.36333099193871:4.898983938619494/214748 wheeltorque{id=xaaabvb} 0

I grouped the data points by channel, but the classname and labels are still repeated on every line.
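Going from the naive layout to the ordered one is just a regrouping by series. A Python sketch of that step, assuming the whole file fits in memory and that the class{labels} token never contains spaces (true for the dataset above); the function names are hypothetical. It keeps the original order of points within each series and does not sort timestamps.

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def series_key(line: str) -> str:
    # The class{labels} token is the second whitespace-separated field
    return line.split()[1]


def group_by_series(lines: Iterable[str]) -> List[str]:
    """Make all points of one GTS contiguous, keeping their relative order."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for line in lines:
        if line.strip():
            groups[series_key(line)].append(line)
    ordered: List[str] = []
    for series_lines in groups.values():  # dicts preserve first-seen order
        ordered.extend(series_lines)
    return ordered


naive = [
    "1/47.36:4.89/214748 torque{id=xaaabvb} 78",
    "1/47.36:4.89/214748 speed{id=xaaabvb} 101.8",
    "2/47.36:4.89/214748 torque{id=xaaabvb} 71",
]
for line in group_by_series(naive):
    print(line)
```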

The optimized dataset format

1435854991700000/47.36333099193871:4.898983938619494/214748 wheeltorque{id=xaaabvb} 0
=1435854991600000/47.36333099193871:4.898983938619494/214748 0
=1435854991500000/47.36333099193871:4.898983938619494/214748 0
=1435854991400000/47.36333099193871:4.898983938619494/214748 0
=1435854991300000/47.36333099193871:4.898983938619494/214748 0
=1435854991200000/47.36333099193871:4.898983938619494/214748 0
=1435854991100000/47.36333099193871:4.898983938619494/214748 0
=1435854991000000/47.36333099193871:4.898983938619494/214748 0
=1435854990900000/47.36333099193871:4.898983938619494/214748 0
=1435854990800000/47.36333099193871:4.898983938619494/214748 0

I used the continuation syntax: a leading = sign tells Warp 10 "go on, this is the same GTS as on the previous line".
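Once the lines are grouped, the continuation syntax can be applied mechanically: whenever a line has the same class{labels} as the previous full line, drop that token and prefix the line with =. A minimal Python sketch, with a hypothetical function name; it assumes well-formed full input lines and splits each one into at most three fields so that values containing spaces stay intact.

```python
from typing import Iterable, List, Optional


def to_continuation(lines: Iterable[str]) -> List[str]:
    """Rewrite full GTS input lines so repeated series use the '=' syntax."""
    out: List[str] = []
    previous_series: Optional[str] = None
    for line in lines:
        if not line.strip():
            continue
        ts_geo, series, value = line.split(None, 2)
        if series == previous_series:
            # Same class{labels} as the previous full line: drop it, prefix '='
            out.append(f"={ts_geo} {value}")
        else:
            out.append(line)
            previous_series = series
    return out


ordered = [
    "1435854991700000/47.36333099193871:4.898983938619494/214748 wheeltorque{id=xaaabvb} 0",
    "1435854991600000/47.36333099193871:4.898983938619494/214748 wheeltorque{id=xaaabvb} 0",
]
print("\n".join(to_continuation(ordered)))
```

This only pays off on grouped input: on the naive layout, every line switches series, so nothing would be shortened.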

The brand new Warp 10 2.1 multi value format

Up to Warp 10 2.0, you could store booleans, longs, doubles, or strings. Warp 10 2.1 introduces a new format: the binary format. It means you can push whatever you want in the value field, prefixed with b64: (followed by URL-safe Base64 content) or hex: (followed by hex-encoded content), or push multiple values enclosed in [ ].
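As an illustration of the two binary prefixes, a Python sketch that turns raw bytes (for example a raw CAN frame payload) into a b64: or hex: value field. It assumes the URL-safe Base64 alphabet implied by "Base64 URL" above; whether Warp 10 expects the trailing = padding to be kept or stripped is an assumption to check against the Warp 10 input format documentation. The helper names are hypothetical.

```python
import base64


def b64_value(payload: bytes) -> str:
    # URL-safe Base64 alphabet ('-' and '_' instead of '+' and '/').
    # Assumption: '=' padding is kept; verify against the Warp 10 docs.
    return "b64:" + base64.urlsafe_b64encode(payload).decode("ascii")


def hex_value(payload: bytes) -> str:
    # Lowercase hex encoding of the raw bytes
    return "hex:" + payload.hex()


frame = bytes([0x12, 0x34, 0xFB, 0xFF])  # hypothetical raw frame payload
print(b64_value(frame))
print(hex_value(frame))  # hex:1234fbff
```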

In this example, I have a fixed list of 22 channels, so I use a list of values with one value per channel. The timestamp and the position appear only once per line, and my GTS is named allchannels.

1435848939300000/47.36333099193871:4.898983938619494/214748 allchannels{id=xaaabvb} [ 222 37.0 71.89889 830.909846282 826.901874311 92.0 -11.89 823.158183564 3 -0.05078125 101.818769659 78 15.0 4796.16306954 0.69 0.578125 810.153929247 0.0 86.0 23.0 23.5 53 ]
=1435848939400000/47.36333099193871:4.898983938619494/214748 [ 203 38.0 71.90889 831.370375502 827.586206897 92.0 -12.69 823.723228995 3 -0.046875 101.898755149 71 15.0 4803.84307446 0.69 0.55859375 809.71659919 0.0 86.0 23.0 23.5 49 ]
=1435848939500000/47.36333099193871:4.898983938619494/214748 [ 209 38.0 71.90889 834.144306965 830.909846282 92.0 -13.19 824.175824176 3 -0.0546875 102.009079961 74 15.0 4815.40930979 0.69 0.640625 810.920394648 0.0 86.0 23.0 29.5 50 ]
=1435848939600000/47.36333099193871:4.898983938619494/214748 [ 258 37.0 71.90889 837.637861231 832.177531207 92.0 -12.89 826.56013225 3 -0.0546875 102.229729587 91 15.0 4830.9178744 0.69 0.640625 811.249323959 0.0 86.0 23.0 36.3 62 ]
=1435848939700000/47.36333099193871:4.898983938619494/214748 [ 371 37.0 71.91889 837.637861231 832.523935063 92.0 -11.69 836.353498745 3 -0.05078125 102.629657033 131 15.0 4854.36893204 0.65 0.64453125 814.885236996 0.0 86.0 23.0 37.0 91 ]
=1435848939800000/47.36333099193871:4.898983938619494/214748 [ 394 37.0 71.91889 840.925017519 836.353498745 92.0 -11.19 841.160801907 3 -0.0625 103.168869555 139 15.0 4870.12987013 0.75 0.58984375 816.659861168 0.0 85.0 23.0 38.0 96 ]
=1435848939900000/47.36333099193871:4.898983938619494/214748 [ 369 36.0 71.91889 843.407365758 835.421888053 92.0 -10.69 843.407365758 3 -0.0625 103.539836738 130 15.0 4885.99348534 0.85 0.5625 821.35523614 0.0 85.0 23.0 39.1 91 ]
=1435848940000000/47.36333099193871:4.898983938619494/214748 [ 399 36.0 71.91889 846.501128668 842.815002107 92.0 -11.19 846.023688663 3 -0.08984375 103.859778695 141 15.0 4901.96078431 1.0 0.5625 825.082508251 0.0 85.0 23.0 39.1 98 ]
=1435848940100000/47.36333099193871:4.898983938619494/214748 [ 384 36.0 71.92889 849.978750531 843.88185654 92.0 -11.39 849.016555823 3 -0.08984375 104.299698885 135 15.0 4926.10837438 0.94 0.625 826.332461094 0.0 85.0 23.0 40.1 95 ]
=1435848940200000/47.36333099193871:4.898983938619494/214748 [ 408 36.0 71.92889 852.999715667 847.457627119 92.0 -11.29 853.606487409 3 -0.09375 104.68997291 144 15.0 4942.33937397 1.0 0.5546875 831.8314155 0.0 85.0 23.0 40.7 101 ]

The Warp 10 ingestion process encodes the list into a wrapped ENCODER. When you FETCH the data, you need to decode it to recover the values. This operation is really fast.

If you want an even faster ingestion rate, you can disable the compression of this ENCODER: just add ! right after the opening [:

1435848939300000/47.36333099193871:4.898983938619494/214748 allchannels{id=xaaabvb} [! 222 37.0 71.89889 830.909846282 826.901874311 92.0 -11.89 823.158183564 3 -0.05078125 101.818769659 78 15.0 4796.16306954 0.69 0.578125 810.153929247 0.0 86.0 23.0 23.5 53 ]
...
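Producing such lines from an acquisition loop is straightforward. A Python sketch of a line builder, assuming numeric channel values only; it writes the full class{labels} form when a series is given and the = continuation form otherwise, and switches to [! when compression is disabled. Values are rendered with Python's repr, which keeps 15.0 as a double and 222 as a long; very large or very small floats would come out in exponent notation, which is worth checking before relying on it. The function and parameter names are hypothetical.

```python
from typing import Optional, Sequence, Union

Number = Union[int, float]


def multi_value_line(ts: int, values: Sequence[Number],
                     series: Optional[str] = None, lat_lon: str = "",
                     elev: str = "", compress: bool = True) -> str:
    """Build one multi-value input line (full form if series is given)."""
    for v in values:
        if isinstance(v, bool) or not isinstance(v, (int, float)):
            raise TypeError(f"this sketch only handles numbers, got {v!r}")
    prefix = f"{ts}/{lat_lon}/{elev}"
    opening = "[" if compress else "[!"  # '[!' disables ENCODER compression
    body = " ".join(repr(v) for v in values)
    if series is None:
        return f"={prefix} {opening} {body} ]"  # continuation line
    return f"{prefix} {series} {opening} {body} ]"


geo = "47.36333099193871:4.898983938619494"
print(multi_value_line(1435848939300000, [222, 37.0, -0.05078125],
                       series="allchannels{id=xaaabvb}", lat_lon=geo, elev="214748"))
print(multi_value_line(1435848939400000, [203, 38.0, -0.046875],
                       lat_lon=geo, elev="214748", compress=False))
```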

The file now has 57841 lines instead of 1272524: with 22 channels per line, that is 22 times fewer lines.

Raw Bench Results

Here are the results in data points per second for each dataset. Remember that this is rather low-performance hardware:

Setup	naive	ordered	optimized	Warp 10™ 2.1 multi-value	Warp 10™ 2.1 multi-value, no compression
Raspberry Pi 3B, SD card	7 200	13 000	19 000	77 000 (**)	-
Raspberry Pi 3B, RAM drive	8 900	17 000	23 000	100 000	-
PINE64, SD card	10 600	26 000	31 000	74 000 (**)	-
PINE64, RAM drive	15 000	33 000	45 000	130 000	-
Raspberry Pi 4B, SD card	11 000	16 500	32 000	140 000	212 000
Raspberry Pi 4B, A2 SD card	14 000	18 000	48 000	180 000	300 000
Raspberry Pi 4B, RAM drive	17 000	23 000	55 000	195 000	330 000
i7 laptop, SSD	113 000	340 000	430 000	(*)	-

(*) Ingestion time is between 1.2 s and 1.7 s (~1 000 000 data points/s). This 1.2M data point dataset is too small for a reliable figure.
(**) Hardly repeatable results, somewhere between 70 000 and 80 000. The Cortex-A53 SD card/CPU interface is the limit here.

Analysis

We had not benched the Raspberry Pi in two years. On the Warp 10 website, we advertise an ingestion rate of 10k data points/s for edge applications, a figure obtained on a mix of hardware. A Raspberry Pi 3B can now reach about 20k data points/s with the optimized format.
A PINE64 LTS outperforms the Raspberry Pi 3B everywhere the SD card is not the limiting factor. Both use Cortex-A53 cores at nearly the same clock. Is it the 64-bit effect?
I used the same SD card everywhere (except for the A2 card)… The Cortex-A72 of the Raspberry Pi 4B clearly removed the CPU/SD card bottleneck!
With the Warp 10 2.1 multi-value ingestion format, A2 class SD card performance is really close to that of a RAM drive…
SD card random access is still a bottleneck on all hardware: with the Warp 10 2.1 multi-value ingestion format, both the Raspberry Pi 3B and the PINE64 are limited to ~70k data points/s by the SD card.
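To put numbers on the storage comparison, a small Python sketch that computes ratios from the Raspberry Pi 4B rows of the results table. Only figures from the table are used; the dictionary keys and helper names are hypothetical.

```python
# Raspberry Pi 4B rows of the results table (data points/s)
RESULTS = {
    "sd":  {"naive": 11_000, "optimized": 32_000, "multi": 140_000, "multi_nocomp": 212_000},
    "a2":  {"naive": 14_000, "optimized": 48_000, "multi": 180_000, "multi_nocomp": 300_000},
    "ram": {"naive": 17_000, "optimized": 55_000, "multi": 195_000, "multi_nocomp": 330_000},
}


def storage_ratio(storage: str, fmt: str) -> float:
    """Throughput on a given storage relative to the RAM drive, same format."""
    return RESULTS[storage][fmt] / RESULTS["ram"][fmt]


def format_speedup(storage: str, fmt: str, baseline: str = "naive") -> float:
    """How many times faster a format is than the baseline, same storage."""
    return RESULTS[storage][fmt] / RESULTS[storage][baseline]


for storage in ("sd", "a2"):
    print(storage, f"{storage_ratio(storage, 'multi_nocomp'):.0%} of RAM drive")
print(f"multi-value vs naive on RAM drive: x{format_speedup('ram', 'multi'):.1f}")
```

With the uncompressed multi-value format, the A2 card reaches about 91% of the RAM drive throughput, against about 64% for the original SD card, and the multi-value format alone is roughly 11.5 times faster than the naive one on the RAM drive.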

Conclusion

The Warp 10 2.1 multi-value ingestion format is perfect for aligned data, typically found in industrial applications. A CAN or Modbus network can be stored "as is": each request or frame is stored in a row. You need to keep the mapping (database/schema) somewhere else for easy decoding. This mapping could be stored as JSON in an attribute of the GTS. If the mapping is subject to a major breaking change, just add the mapping major revision number in a label to create a new GTS.
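The mapping idea can be sketched on the client side in Python. This assumes the attribute holds an ordered JSON list of channel names and that the major revision lives in a label called rev; the channel names, their order, the label name and the helper names are all hypothetical, not the ones used in the benchmark.

```python
import json
from typing import Dict, List, Sequence, Union

Number = Union[int, float]

# Hypothetical attribute content: ordered channel names, one per list slot
MAPPING_JSON = json.dumps(["wheeltorque", "externaltemp", "speed"])


def decode_row(values: Sequence[Number], mapping_json: str) -> Dict[str, Number]:
    """Pair each value of a multi-value row with its channel name."""
    channels: List[str] = json.loads(mapping_json)
    if len(channels) != len(values):
        raise ValueError(f"mapping has {len(channels)} channels, row has {len(values)} values")
    return dict(zip(channels, values))


def series_name(classname: str, device_id: str, mapping_rev: int) -> str:
    # A breaking mapping change bumps the revision label, which creates a new GTS
    return f"{classname}{{id={device_id},rev={mapping_rev}}}"


print(decode_row([222, 15.0, 101.818769659], MAPPING_JSON))
print(series_name("allchannels", "xaaabvb", 2))  # allchannels{id=xaaabvb,rev=2}
```

Checking the row length against the mapping is a cheap way to detect that a row was written with a different mapping revision than the one you are reading.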

As no serious Warp 10 user relies on the naive GTS input format, we can reasonably say that 20k data points/s is now the average performance of Warp 10 2.x edge applications.

By the way… this is a single-threaded benchmark. If several sources push data, multithreading will speed up ingestion!
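A Python sketch of pushing several sources concurrently to the Warp 10 update endpoint (/api/v0/update, write token in the X-Warp10-Token header). The URL and token below are placeholders to adapt, and the helper names are hypothetical. Each source is sent as one request so that = continuation lines always stay in the same request as the full line they refer to; the send function is injectable so the fan-out logic can be tried without a running instance.

```python
import urllib.request
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Sequence

# Assumptions: a local Warp 10 instance and a valid write token
UPDATE_URL = "http://127.0.0.1:8080/api/v0/update"
WRITE_TOKEN = "YOUR_WRITE_TOKEN"  # hypothetical placeholder


def http_push(body: str) -> int:
    """POST one batch of GTS input lines and return the HTTP status."""
    req = urllib.request.Request(
        UPDATE_URL,
        data=body.encode("utf-8"),
        headers={"X-Warp10-Token": WRITE_TOKEN, "Content-Type": "text/plain"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return resp.status


def push_in_parallel(sources: Sequence[List[str]],
                     send: Callable[[str], int] = http_push,
                     workers: int = 4) -> List[int]:
    """Send one request per source from a thread pool; keep results in order."""
    bodies = ["\n".join(lines) + "\n" for lines in sources]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(send, bodies))
```

Usage: push_in_parallel([can_bus_lines, modbus_lines]) with two lists of input lines, one per source. Whether the gain is significant on a Raspberry Pi will depend on how many cores are left once the SD card becomes the limit.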

How does your time series database performance compare to that of Warp 10? Let us know!