Warp 10™ Raspberry Pi 4 bench for industrial IoT

Time for IoT again! Embedding Warp 10™ on a Raspberry Pi or another embedded target has several advantages: you can use WarpScript™ everywhere to process data locally, and use REXEC and other server-to-server functions for data synchronization. But what about performance?

If you look at the Warp 10™ documentation, you can read that edge deployments can handle around 10k datapoints/s. You can also see that the standalone version on a single computer can handle around 100k datapoints/s.

Time to check those figures!

Explicit Warning: Warp 10™ 2.1 spoilers inside.

Bench material

ARM32: Rpi3 B

Bench Me I’m Famous.

Not the most recent, but so common among makers that it is a “must bench”. Note that industrial-grade Raspberry Pi boards now exist.

  • Quad Core 1.2GHz Broadcom BCM2837 64-bit SOC (Cortex A-53)
  • 1GB RAM
  • 32-bit (armv7l) Raspbian OS.

Yes, Raspbian is a 32-bit OS running on a 64-bit ARM CPU. This has an impact on the LevelDB implementation.

ARM64: PINE64 LTS

I am not famous, but I can manage a battery.

Not very common, but a very good performance/price ratio.

  • Quad Core 1.15 GHz Allwinner A64 SOC (Cortex A-53)
  • LPDDR3 RAM (up to 2GB)
  • 64-bit (aarch64) Armbian with a rather old patched 3.10 kernel (from before sunxi support started merging into mainline).

ARM32: Rpi4 B

I am Lord Of the Rpi!

The most recent release of the famous Raspberry Pi.

  • Quad Core 1.5GHz Broadcom BCM2711B0 SOC (Cortex A-72)
  • 2GB RAM
  • Twice the RAM bandwidth of the Rpi3
  • 32-bit (armv7l) Raspbian OS.

Again, Raspbian is a 32-bit OS running on a 64-bit ARM CPU.

A fan

Why use a small one?

Yes. I want to bench Warp 10™, not the weird throttling strategies implemented in these SOCs.

SD Cards

Industrial SD cards for everyone

OK, these are not the fastest on the market, but I don’t care about sequential read or write. I used them for a customer because they support a SMART-like protocol that clearly tells you how many write cycles have been performed.

During the bench, I was so impressed by the Rpi4 that I also bought an A2 class SD card. This class of SD card sustains 2000 IOPS in random write, and for a database, random write matters more than sequential write performance. When you buy an SD card, forget about “speeds up to 60MB/s”. The card I bought is A2 class (2000 IOPS random write, 10MB/s sustained sequential write) and V30 (30MB/s minimum sequential write).

Marketing: “up to 60MB/s”. Reality: A2 V30

Anyway, I will also do tests with a RAM drive. Again, I want to bench Warp 10™, not SD cards.

Software setup

Java: OpenJDK 1.8

Warp 10™: the latest 2.0.3 master… and a pretty cool new branch we will merge soon. I will explain it later on.

Warp 10™ configuration: the out-of-the-box default configuration. There may be room for improvement.

Datasets

The dataset has a huge impact on performance. Ingesting booleans is quicker than ingesting doubles (parsing time), and there are a few tweaks you must know about the Warp 10™ input format. Warp 10™ 2.1 also allows something really new.

For the test, I imagine I am logging 22 channels on the CAN bus of a car.

The naive dataset format

1412918379300000/47.363331:4.898984/214748 throttlepedal{id=xaaabvb} 23.5
1412918379300000/47.363331:4.898984/214748 torque{id=xaaabvb} 78
1412918379300000/47.363331:4.898984/214748 boostpressure{id=xaaabvb} 0.69
1412918379300000/47.363331:4.898984/214748 gammalongitudinal{id=xaaabvb} -0.05078125
1412918379300000/47.363331:4.898984/214748 speed{id=xaaabvb} 101.818769659
1412918379300000/47.363331:4.898984/214748 wheeltorque{id=xaaabvb} 222
1412918379300000/47.363331:4.898984/214748 externaltemp{id=xaaabvb} 15.0
1412918379300000/47.363331:4.898984/214748 yawrate{id=xaaabvb} -11.89
1412918379300000/47.363331:4.898984/214748 engineoiltemp{id=xaaabvb} 92.0

Each line has a different classname, and the label is repeated on every line.
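To make the structure of a naive line concrete, here is a minimal Python sketch that splits one line into its fields: timestamp (in microseconds, the default Warp 10™ time unit), latitude:longitude, elevation, classname, labels and value. It only handles the shapes used in this dataset (position and elevation always present, no whitespace in classname or labels) and keeps the value as its raw token. It is not the Warp 10™ parser, and `parse_naive` and `DataPoint` are hypothetical names.

```python
import re
from typing import NamedTuple

# Matches lines such as:
# 1412918379300000/47.363331:4.898984/214748 torque{id=xaaabvb} 78
# Assumption: position and elevation are always present, as in this dataset.
LINE_RE = re.compile(
    r"^(?P<ts>\d+)/(?P<lat>-?[\d.]+):(?P<lon>-?[\d.]+)/(?P<elev>-?\d+)\s+"
    r"(?P<cls>[^{\s]+)\{(?P<labels>[^}]*)\}\s+(?P<value>\S+)$"
)


class DataPoint(NamedTuple):
    ts: int          # timestamp in microseconds
    lat: float
    lon: float
    elev: int
    classname: str
    labels: dict
    value: str       # raw token; Warp 10 decides the type (long, double...)


def parse_naive(line: str) -> DataPoint:
    """Split one naive-format line into its fields (illustrative only)."""
    m = LINE_RE.match(line.strip())
    if m is None:
        raise ValueError(f"unsupported line: {line!r}")
    # Labels are comma-separated key=value pairs; "{}" yields an empty dict.
    labels = dict(kv.split("=", 1) for kv in m["labels"].split(",") if kv)
    return DataPoint(
        int(m["ts"]), float(m["lat"]), float(m["lon"]), int(m["elev"]),
        m["cls"], labels, m["value"],
    )
```

Every one of the 1272524 naive lines carries this full prefix and selector, which is exactly what the next two formats try to avoid.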

The ordered dataset format

1435854991700000/47.36333099193871:4.898983938619494/214748 wheeltorque{id=xaaabvb} 0
1435854991600000/47.36333099193871:4.898983938619494/214748 wheeltorque{id=xaaabvb} 0
1435854991500000/47.36333099193871:4.898983938619494/214748 wheeltorque{id=xaaabvb} 0
1435854991400000/47.36333099193871:4.898983938619494/214748 wheeltorque{id=xaaabvb} 0
1435854991300000/47.36333099193871:4.898983938619494/214748 wheeltorque{id=xaaabvb} 0
1435854991200000/47.36333099193871:4.898983938619494/214748 wheeltorque{id=xaaabvb} 0
1435854991100000/47.36333099193871:4.898983938619494/214748 wheeltorque{id=xaaabvb} 0
1435854991000000/47.36333099193871:4.898983938619494/214748 wheeltorque{id=xaaabvb} 0
1435854990900000/47.36333099193871:4.898983938619494/214748 wheeltorque{id=xaaabvb} 0
1435854990800000/47.36333099193871:4.898983938619494/214748 wheeltorque{id=xaaabvb} 0

I grouped the datapoints of each channel together, but the classname and label are still repeated on every line.
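Going from the naive layout to the ordered one is just a stable regrouping by series. The sketch below uses hypothetical helper names and assumes, as in these datasets, that the `classname{labels}` selector is the second whitespace-separated field. It keeps series in order of first appearance and datapoints in their input order.

```python
def series_key(line: str) -> str:
    # Second whitespace-separated field: "classname{labels}".
    return line.split()[1]


def to_ordered(lines):
    """Group naive-format lines so that each series is contiguous.

    Series appear in order of first occurrence (dicts keep insertion order),
    and datapoints keep their original order inside each series.
    """
    groups = {}
    for line in lines:
        groups.setdefault(series_key(line), []).append(line)
    return [line for group in groups.values() for line in group]
```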

The optimized dataset format

1435854991700000/47.36333099193871:4.898983938619494/214748 wheeltorque{id=xaaabvb} 0
=1435854991600000/47.36333099193871:4.898983938619494/214748 0
=1435854991500000/47.36333099193871:4.898983938619494/214748 0
=1435854991400000/47.36333099193871:4.898983938619494/214748 0
=1435854991300000/47.36333099193871:4.898983938619494/214748 0
=1435854991200000/47.36333099193871:4.898983938619494/214748 0
=1435854991100000/47.36333099193871:4.898983938619494/214748 0
=1435854991000000/47.36333099193871:4.898983938619494/214748 0
=1435854990900000/47.36333099193871:4.898983938619494/214748 0
=1435854990800000/47.36333099193871:4.898983938619494/214748 0

I used the continuation syntax, with the ‘=’ sign, to tell Warp 10™: “go on, it is the same GTS as on the previous line”.
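The rewrite into continuation syntax can be sketched the same way: the first line of each series keeps its full `classname{labels}` selector, and the following lines of the same series start with `=` and carry only the timestamp/position prefix and the value. This hypothetical helper assumes the input is already grouped by series and that each line has exactly three whitespace-separated fields.

```python
def to_continuation(ordered_lines):
    """Replace repeated selectors with the '=' continuation syntax.

    Input: lines grouped by series, each "prefix classname{labels} value".
    Output: full line for the first datapoint of a series, "=prefix value"
    for the following datapoints of the same series.
    """
    out = []
    previous = None
    for line in ordered_lines:
        prefix, selector, value = line.split()
        if selector == previous:
            out.append(f"={prefix} {value}")
        else:
            out.append(line)
            previous = selector
    return out
```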

The brand new Warp 10™ 2.1 multi value format

Up to Warp 10™ 2.0, you can store booleans, longs, doubles or string values. Warp 10™ 2.1 introduces a new format: binary values. This means you can push whatever you want in the value field, either starting with b64: (followed by URL-safe Base64 content) or hex: (followed by hex-encoded content), or as multiple values enclosed in [ ].

In this example, I have a fixed list of 22 channels, so I use a list of values, one value per channel. The timestamp and position appear only once per line, and my GTS is named allchannels.

1435848939300000/47.36333099193871:4.898983938619494/214748 allchannels{id=xaaabvb} [ 222 37.0 71.89889 830.909846282 826.901874311 92.0 -11.89 823.158183564 3 -0.05078125 101.818769659 78 15.0 4796.16306954 0.69 0.578125 810.153929247 0.0 86.0 23.0 23.5 53 ]
=1435848939400000/47.36333099193871:4.898983938619494/214748 [ 203 38.0 71.90889 831.370375502 827.586206897 92.0 -12.69 823.723228995 3 -0.046875 101.898755149 71 15.0 4803.84307446 0.69 0.55859375 809.71659919 0.0 86.0 23.0 23.5 49 ]
=1435848939500000/47.36333099193871:4.898983938619494/214748 [ 209 38.0 71.90889 834.144306965 830.909846282 92.0 -13.19 824.175824176 3 -0.0546875 102.009079961 74 15.0 4815.40930979 0.69 0.640625 810.920394648 0.0 86.0 23.0 29.5 50 ]
=1435848939600000/47.36333099193871:4.898983938619494/214748 [ 258 37.0 71.90889 837.637861231 832.177531207 92.0 -12.89 826.56013225 3 -0.0546875 102.229729587 91 15.0 4830.9178744 0.69 0.640625 811.249323959 0.0 86.0 23.0 36.3 62 ]
=1435848939700000/47.36333099193871:4.898983938619494/214748 [ 371 37.0 71.91889 837.637861231 832.523935063 92.0 -11.69 836.353498745 3 -0.05078125 102.629657033 131 15.0 4854.36893204 0.65 0.64453125 814.885236996 0.0 86.0 23.0 37.0 91 ]
=1435848939800000/47.36333099193871:4.898983938619494/214748 [ 394 37.0 71.91889 840.925017519 836.353498745 92.0 -11.19 841.160801907 3 -0.0625 103.168869555 139 15.0 4870.12987013 0.75 0.58984375 816.659861168 0.0 85.0 23.0 38.0 96 ]
=1435848939900000/47.36333099193871:4.898983938619494/214748 [ 369 36.0 71.91889 843.407365758 835.421888053 92.0 -10.69 843.407365758 3 -0.0625 103.539836738 130 15.0 4885.99348534 0.85 0.5625 821.35523614 0.0 85.0 23.0 39.1 91 ]
=1435848940000000/47.36333099193871:4.898983938619494/214748 [ 399 36.0 71.91889 846.501128668 842.815002107 92.0 -11.19 846.023688663 3 -0.08984375 103.859778695 141 15.0 4901.96078431 1.0 0.5625 825.082508251 0.0 85.0 23.0 39.1 98 ]
=1435848940100000/47.36333099193871:4.898983938619494/214748 [ 384 36.0 71.92889 849.978750531 843.88185654 92.0 -11.39 849.016555823 3 -0.08984375 104.299698885 135 15.0 4926.10837438 0.94 0.625 826.332461094 0.0 85.0 23.0 40.1 95 ]
=1435848940200000/47.36333099193871:4.898983938619494/214748 [ 408 36.0 71.92889 852.999715667 847.457627119 92.0 -11.29 853.606487409 3 -0.09375 104.68997291 144 15.0 4942.33937397 1.0 0.5546875 831.8314155 0.0 85.0 23.0 40.7 101 ]

The Warp 10™ ingestion process encodes the list into a wrapped ENCODER. When you FETCH the data, you need to decode it to recover the values. This operation is really fast.

If you want an even faster ingestion rate, you can disable the compression of this ENCODER: just add ! right after the [:

=1435848939300000/47.36333099193871:4.898983938619494/214748 [! 222 37.0 71.89889 830.909846282 826.901874311 92.0 -11.89 823.158183564 3 -0.05078125 101.818769659 78 15.0 4796.16306954 0.69 0.578125 810.153929247 0.0 86.0 23.0 23.5 53 ]
...
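A multi-value line is the usual timestamp/position prefix followed by the values, in a fixed channel order, between [ and ], with [! instead of [ to skip compression. Here is a hypothetical formatting helper, assuming numeric channel values only (booleans and strings would need their own Warp 10™ syntax) and Python's default number formatting:

```python
def multi_value_line(ts, lat, lon, elev, values, selector=None, compress=True):
    """Format one Warp 10 2.1 multi-value input line (illustrative sketch).

    values:   numeric channel values in a fixed, agreed order
    selector: "classname{labels}" for the first line of a series,
              None for a continuation line starting with '='
    compress: False writes '[!' to disable the ENCODER compression
    """
    opening = "[" if compress else "[!"
    body = " ".join(str(v) for v in values)
    prefix = f"{ts}/{lat}:{lon}/{elev}"
    if selector is None:
        return f"={prefix} {opening} {body} ]"
    return f"{prefix} {selector} {opening} {body} ]"


# First line of the series, then an uncompressed continuation line.
print(multi_value_line(1435848939300000, 47.36333099193871, 4.898983938619494,
                       214748, [222, 37.0, -0.05078125],
                       selector="allchannels{id=xaaabvb}"))
print(multi_value_line(1435848939400000, 47.36333099193871, 4.898983938619494,
                       214748, [203, 38.0], compress=False))
```

Keeping the channel order fixed is what makes each row decodable later without storing channel names on every line.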

The dataset now has 57841 lines instead of 1272524 (22 channels, so 22 times fewer lines).

Raw Bench Results

Here are the results, in datapoints per second, for each dataset. Remember that this is rather low-performance hardware:

Hardware, storage           |   naïve | ordered | optimized | Warp 10™ 2.1
Raspberry Pi 3B, SD card    |   7 200 |  13 000 |    19 000 |  77 000 (**)
Raspberry Pi 3B, RAM drive  |   8 900 |  17 000 |    23 000 | 100 000
PINE64, SD card             |  10 600 |  26 000 |    31 000 |  74 000 (**)
PINE64, RAM drive           |  15 000 |  33 000 |    45 000 | 130 000
Raspberry Pi 4B, SD card    |  11 000 |  16 500 |    32 000 | 140 000 (212 000 w/o compression)
Raspberry Pi 4B, A2 SD card |  14 000 |  18 000 |    48 000 | 180 000 (300 000 w/o compression)
Raspberry Pi 4B, RAM drive  |  17 000 |  23 000 |    55 000 | 195 000 (330 000 w/o compression)
i7 laptop, SSD              | 113 000 | 340 000 |   430 000 | (*)

(*) Ingestion time is between 1.2 s and 1.7 s (~1 000 000 datapoints/s). This 1.2M-datapoint dataset is too small to give a reliable figure.
(**) Results are hard to reproduce: somewhere between 70 000 and 80 000. The Cortex A-53 SD card/CPU interface is the limit here.
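The ~1 000 000 datapoints/s in the (*) footnote follows from a simple division. Assuming the laptop run ingests the same 1 272 524 datapoints as the naive dataset (one datapoint per naive line), this short sketch gives the range implied by the 1.2 s to 1.7 s timings:

```python
def throughput(datapoints: int, seconds: float) -> float:
    """Datapoints ingested per second."""
    return datapoints / seconds


DATAPOINTS = 1_272_524  # assumption: one datapoint per line of the naive dataset
fastest = throughput(DATAPOINTS, 1.2)
slowest = throughput(DATAPOINTS, 1.7)
print(f"{slowest:,.0f} to {fastest:,.0f} datapoints/s")
# prints 748,544 to 1,060,437 datapoints/s
```

With only about a second of wall-clock time, any fixed overhead (JVM warm-up, request setup) weighs heavily, which is why the figure is not considered reliable.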

Analysis

  • We had not benchmarked the Raspberry Pi for two years. On the Warp 10™ website, we advertise an ingestion rate of 10k datapoints/s for edge applications, a figure based on a mix of several hardware platforms. A Raspberry Pi 3B can now reach 20k datapoints/s.
  • A PINE64 LTS outperforms the Raspberry Pi 3B wherever the SD card is not the limiting factor. Both use Cortex A-53 cores at nearly the same clock. Is it the 64-bit effect?
  • I used the same SD card everywhere… The Cortex A-72 of the Raspberry Pi 4B clearly removed the CPU/SD card bottleneck!
  • With the Warp 10™ 2.1 multi-value ingestion format, the A2 class SD card comes really close to the RAM drive on the Raspberry Pi 4B…
  • SD card random access is still a bottleneck on every board: with the Warp 10™ 2.1 multi-value ingestion format, both the Raspberry Pi 3B and the PINE64 are limited to ~70k datapoints/s by the SD card.
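To put numbers on the SD card versus RAM drive comparison, this small sketch takes the Warp 10™ 2.1 multi-value results (compressed ENCODER) from the results table and computes which fraction of the RAM drive throughput each SD card reaches. The dictionary layout and the `sd_vs_ram` helper are illustrative only.

```python
# Warp 10 2.1 multi-value results from the table (datapoints/s, compressed).
RESULTS = {
    "Raspberry Pi 3B": {"sd": 77_000, "ram": 100_000},
    "PINE64": {"sd": 74_000, "ram": 130_000},
    "Raspberry Pi 4B": {"sd": 140_000, "a2": 180_000, "ram": 195_000},
}


def sd_vs_ram(board: str, card: str = "sd") -> float:
    """Fraction of the RAM drive throughput reached with the given SD card."""
    r = RESULTS[board]
    return r[card] / r["ram"]


for board in RESULTS:
    print(f"{board}: SD card reaches {sd_vs_ram(board):.0%} of the RAM drive")
print(f"Raspberry Pi 4B, A2 card: {sd_vs_ram('Raspberry Pi 4B', 'a2'):.0%}")
```

The A2 card on the Raspberry Pi 4B lands above 90% of the RAM drive, while the standard card on the PINE64 stays below 60%.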

Conclusion

The Warp 10™ 2.1 multi-value ingestion format is perfect for aligned data, typical of industrial applications. Traffic from a CAN or Modbus network can be stored “as is”: each request or frame is stored in one row. You need to keep the mapping (database/schema) somewhere else to decode the values easily; it could be stored as JSON in an attribute of the GTS. If the mapping undergoes a major breaking change, just add the mapping's major revision number in a label to create a new GTS.
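Here is a sketch of that idea, under stated assumptions: the mapping is a JSON document listing channel names in row order plus a major revision number. The three channel names, the `rev` label and the helper names are hypothetical (the real 22-channel order is not given here). One function decodes a row back into named channels, the other builds a selector carrying the revision label, so a breaking mapping change lands in a new GTS.

```python
import json

# Hypothetical mapping: channel names in the same order as the list values.
# The text suggests storing such JSON in a GTS attribute; names are illustrative.
MAPPING_JSON = json.dumps(
    {"rev": 1, "channels": ["wheeltorque", "externaltemp", "speed"]}
)


def decode_row(values, mapping_json=MAPPING_JSON):
    """Turn one multi-value row back into {channel: value} using the mapping."""
    channels = json.loads(mapping_json)["channels"]
    if len(values) != len(channels):
        raise ValueError(f"expected {len(channels)} values, got {len(values)}")
    return dict(zip(channels, values))


def selector(classname, device_id, mapping_json=MAPPING_JSON):
    """Add the mapping major revision as a label: a new revision means a new GTS."""
    rev = json.loads(mapping_json)["rev"]
    return f"{classname}{{id={device_id},rev={rev}}}"


print(decode_row([222, 15.0, 101.818769659]))
print(selector("allchannels", "xaaabvb"))
```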

As no serious Warp 10™ user relies on the naive GTS input format, we can reasonably say that 20k datapoints/s is now the average performance achieved by Warp 10™ 2.x edge applications.

By the way, this is a single-thread benchmark. If several sources push data, multithreading will speed up ingestion!
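If you parallelize pushes yourself, one detail matters with the continuation syntax: a line starting with = refers to the GTS of the previous line, so we assume it must never be the first line of a separate request. The sketch below splits lines only before a full line and pushes the chunks through a thread pool. `send` is a placeholder for whatever performs the HTTP request to your Warp 10™ instance (not shown), and the helper names and default sizes are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def split_chunks(lines, max_lines):
    """Split input-format lines into chunks for parallel pushes.

    A chunk never starts with a '=' continuation line, so a chunk may grow
    past max_lines until the next full line is reached.
    """
    chunks, current = [], []
    for line in lines:
        if len(current) >= max_lines and not line.startswith("="):
            chunks.append(current)
            current = []
        current.append(line)
    if current:
        chunks.append(current)
    return chunks


def push_all(lines, send, max_lines=10_000, workers=4):
    """Push chunks concurrently; `send` receives one request body per chunk.

    Threads suit this I/O-bound work; results come back in chunk order.
    """
    bodies = ["\n".join(chunk) + "\n" for chunk in split_chunks(lines, max_lines)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(send, bodies))
```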

How does your timeseries database performance compare to that of Warp 10™? Let us know!

Hold my fan… I will try Raspberry 3 too.
