Thinking in WarpScript™ – Multi-Value Support

Thinking in WarpScript #5 Multi-value Support

With the advent of release 2.1 of Warp 10™ a new feature is available to store multiple values at a given timestamp of a Geo Time Series™. This post will walk you through this feature and how you can put it to work for different use cases.

Overview

Geo Time Series™ in Warp 10™ can have values of four different types, LONG, DOUBLE, BOOLEAN, and STRING. Until release 2.1, the STRING values had to be valid UTF-8 strings. Starting with release 2.1, a derived datatype we call BINARY has been introduced. It allows storing STRING values that use the ISO-8859-1 charset and thus can contain any sequence of bytes.
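ISO-8859-1 can carry arbitrary bytes because it maps each of the 256 possible byte values to exactly one character, whereas many byte sequences are not valid UTF-8. The short Python sketch below illustrates that property outside of Warp 10™. It is only an illustration, and the sample bytes are arbitrary.

```python
# Every possible byte value, including many that are invalid in UTF-8.
raw = bytes(range(256))

# ISO-8859-1 maps each byte to one character, so the round trip is lossless.
as_text = raw.decode("iso-8859-1")
round_trip = as_text.encode("iso-8859-1")

# The same bytes cannot be decoded as UTF-8.
try:
    raw.decode("utf-8")
    utf8_ok = True
except UnicodeDecodeError:
    utf8_ok = False
```

This is why a STRING interpreted with ISO-8859-1 can hold any byte sequence without loss.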

Such values are inserted into Geo Time Series™ using the b64:<base64url_encoded_content> or hex:<hex_encoded_content> syntax for the values pushed to the /update endpoint. Within WarpScript™ code, using a byte array as the value parameter of a call to ADDVALUE has the same effect. This capability means we can efficiently store any byte sequence, including serialized objects, without first encoding it to make it a valid UTF-8 string.
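On the client side, producing such values comes down to prefixing an encoded form of the bytes. Here is a minimal Python sketch. The helper names are hypothetical, and it assumes that the padding characters emitted by the standard base64url encoder and lower-case hex digits are acceptable to the /update endpoint.

```python
import base64


def to_b64_value(payload: bytes) -> str:
    """Render bytes as a b64: value (hypothetical helper)."""
    # Assumption: padding ('=') is kept as produced by urlsafe_b64encode.
    return "b64:" + base64.urlsafe_b64encode(payload).decode("ascii")


def to_hex_value(payload: bytes) -> str:
    """Render bytes as a hex: value (hypothetical helper)."""
    # Assumption: lower-case hex digits are accepted.
    return "hex:" + payload.hex()


# Two bytes that do not form a valid UTF-8 sequence.
payload = bytes([0xFB, 0xFF])
b64_value = to_b64_value(payload)
hex_value = to_hex_value(payload)
```

Either string can then be used where a value is expected in a line sent to /update.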

It turns out the result of the WRAPRAW function is a byte sequence representing a serialized object called a GTS Wrapper which contains a compressed Geo Time Series™, or more precisely a compressed GTS Encoder. You can now store such objects as values of Geo Time Series™. This is exactly what the multi-value support does. Upon receiving a value in multi-value format, Warp 10™ packs it in a GTS Wrapper and stores the wrapper's serialized form as a binary value in the Geo Time Series™. When using FETCH, the binary values can then be extracted and unwrapped to recover each element of the original multivariate value.

Multi-value Syntax

Specifying a multivariate value is very easy: the syntax is a mix of the WarpScript™ list syntax ([ ... ]) and of the GTS Input Format.

A multivariate value is a list of space-separated elements enclosed between an opening [ and a closing ]. Each element can have one of the following formats:

VALUE
TIMESTAMP/VALUE
TIMESTAMP/LAT:LON/VALUE
TIMESTAMP/LAT:LON/ELEVATION/VALUE
TIMESTAMP//ELEVATION/VALUE

When using the VALUE format without a TIMESTAMP, the implied timestamp is 0.

Each VALUE can be in any of the supported Warp 10™ value formats, including the multi-value syntax. Yes, you have guessed right, this means you can generate values which are multi-dimensional arrays or tensor-like structures. The example below shows a line in the input format accepted by the /update endpoint.

1561734797000000// class{label=value} [ 1 2/42.0 3/48.0:-4.5/T 4/48.0:-4.5/10000/'foo' 5//5000/[ 1 2 3 ] ]
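If you generate such lines from client code, the five element formats map naturally onto a small formatter. The Python sketch below is one possible way to do it, not an official client. The Element class and the format_* helpers are hypothetical names, and the sketch assumes STRING values are percent-encoded between single quotes and that numeric values are finite.

```python
from dataclasses import dataclass
from typing import Any, Optional
from urllib.parse import quote


@dataclass
class Element:
    """One element of a multi-value (hypothetical helper type)."""
    value: Any
    ts: Optional[int] = None
    lat: Optional[float] = None
    lon: Optional[float] = None
    elev: Optional[int] = None


def format_value(value):
    if isinstance(value, bool):  # bool is a subclass of int, so test it first
        return "T" if value else "F"
    if isinstance(value, (int, float)):
        return repr(value)
    if isinstance(value, str):
        # Assumption: strings are percent-encoded inside single quotes.
        return "'" + quote(value, safe="") + "'"
    if isinstance(value, (list, tuple)):
        return format_multivalue(value)  # nested multi-value
    raise TypeError(f"unsupported value type: {type(value).__name__}")


def format_element(item):
    if not isinstance(item, Element):
        return format_value(item)  # bare VALUE form, implied timestamp 0
    rendered = format_value(item.value)
    has_location = item.lat is not None and item.lon is not None
    if item.ts is None:
        if has_location or item.elev is not None:
            raise ValueError("location or elevation requires a timestamp")
        return rendered
    if not has_location and item.elev is None:
        return f"{item.ts}/{rendered}"
    location = f"{item.lat}:{item.lon}" if has_location else ""
    if item.elev is None:
        return f"{item.ts}/{location}/{rendered}"
    return f"{item.ts}/{location}/{item.elev}/{rendered}"


def format_multivalue(items):
    return "[ " + " ".join(format_element(item) for item in items) + " ]"


multivalue = format_multivalue([
    1,
    Element(42.0, ts=2),
    Element(True, ts=3, lat=48.0, lon=-4.5),
    Element("foo", ts=4, lat=48.0, lon=-4.5, elev=10000),
    Element([1, 2, 3], ts=5, elev=5000),
])
line = "1561734797000000// class{label=value} " + multivalue
```

With these inputs, line reproduces the example line above.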

The content enclosed between the opening and closing brackets is packed into a GTS Encoder, which is similar to a Geo Time Series™ but can contain values of multiple types. The encoder is then compressed as a GTS Wrapper. If the opening bracket is replaced by [! the GTS Wrapper will be a little less compact but creation time will be greatly decreased.

The PARSEVALUE function can parse the multi-value syntax and create a matching GTS Wrapper. Such a wrapper can be converted back to a representation using the multi-value syntax via the ->MVSTRING function.

The example below illustrates the multi-value syntax and its manipulation using PARSEVALUE and ->MVSTRING.

"[ 1 2/42.0 3/48.0:-4.5/T 4/48.0:-4.5/10000/'foo' 5//5000/[ 1 2 3 ] ]" PARSEVALUE
UNWRAPENCODER
->MVSTRING

Similarly, GTS Encoders created via NEWENCODER can be converted to their multi-value syntax representation.

NEWENCODER

0 NaN NaN NaN 1 ADDVALUE
2 NaN NaN NaN 42.0 ADDVALUE
3 48.0 -4.5 NaN T ADDVALUE
4 48.0 -4.5 10000 'foo' ADDVALUE
5 NaN NaN 5000 "[ 1 2 3 ]" PARSEVALUE ADDVALUE

->MVSTRING

Retrieving multivariate Geo Time Series™

As mentioned above, BINARY values are really STRING values using the ISO-8859-1 charset instead of UTF-8. Therefore we added features to WarpScript™ to be able to identify those values specifically when retrieving Geo Time Series™ from the Warp 10™ storage engine.

The FETCH syntax using a parameter map with the typeattr option set will retrieve Geo Time Series™ and split each one into a set of GTSs, each bearing an attribute with the type of its values. A Geo Time Series™ containing values of different types is therefore retrieved as up to 5 GTSs, with the attribute whose name is specified in typeattr set to one of LONG, DOUBLE, BOOLEAN, STRING or, since 2.1, BINARY.

The BINARY values can then be interpreted using WarpScript™ code similar to:

// Extract the bytes from the string (BINARY) value using the ISO-8859-1 charset
$value 'ISO-8859-1' ->BYTES

// Deserialize the GTS Wrapper into a GTS Encoder
UNWRAPENCODER

If the extracted encoder contains nested multivariate values, you can use the ->GTS function to split the encoder into one GTS per type and repeat the above process.

The above method deals with BINARY values at a very low level. Release 2.1 also brings two handy functions, MVINDEXSPLIT and MVTICKSPLIT, which take as input a Geo Time Series™ or GTS Encoder and output a list of Geo Time Series™ built by extracting values from the input, expanding multivariate values in a transparent manner.

Using those functions, you can FETCH a single multivariate GTS and end up with multiple GTSs, just as if the data in the multivariate values had been stored in separate GTSs.

Please refer to the MVINDEXSPLIT and MVTICKSPLIT documentation for more details.

Use cases

The multi-value support opens a broad range of possibilities that few, if any, other time-series databases support. Here are just a few examples of the possible applications.

VTQ support for industrial data historians

The multi-value syntax allows Warp 10™ to be used as a super-efficient industrial data historian as it can then support the VTQ (Value, Time, Quality) data model. The industrial world needs to have a quality indicator associated with values produced by sensors. With the multi-value syntax, you can easily represent such a model with input data of the form:

TIME// class{labels} [ VALUE QUALITY ]

Where QUALITY could be a numerical, boolean or string indicator of the quality of VALUE. At retrieval time, the quality indicator can be extracted in a Geo Time Series™ of its own and manipulated jointly with the GTS of sensor values.
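As an illustration, a collector could emit one such line per sensor reading. The Python sketch below assumes a boolean quality flag. The vtq_line helper, the class name and the label are hypothetical, and label values are assumed not to need any escaping.

```python
def vtq_line(ts, class_name, labels, value, quality_ok):
    """Build one 'TIME// class{labels} [ VALUE QUALITY ]' line (hypothetical helper)."""
    # Assumption: label names and values contain no characters needing escaping.
    label_str = ",".join(f"{name}={val}" for name, val in sorted(labels.items()))
    quality = "T" if quality_ok else "F"
    return f"{ts}// {class_name}{{{label_str}}} [ {value!r} {quality} ]"


line = vtq_line(
    1561734797000000,      # timestamp of the reading
    "sensor.temperature",  # hypothetical class name
    {"plant": "A"},        # hypothetical label
    21.5,                  # measured value
    True,                  # quality indicator
)
```

Here line is "1561734797000000// sensor.temperature{plant=A} [ 21.5 T ]", which can be pushed as is to the /update endpoint.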

Efficient storage of industrial bus data or PLC process image

Industrial systems rely on PLCs and industrial buses such as CAN bus or Modbus. The PLCs store all of their inputs and outputs in a process image, a range of memory with bits for each input and output. The process image is a coherent ensemble, so storing it as a whole is what makes the most sense. With release 2.1 you can store it either as a binary value and decide to extract elements later, or using the multi-value syntax and access individual fields in a very convenient way at read time.

The data flowing on industrial buses can also be stored using the multi-value syntax. A bus such as Modbus will send the state of multiple registers when polled. You can store all those values using the multi-value format.

Storing multi-sensor measurements

It is quite common to have real-life systems monitored by multiple sensors. Those sensors are very often sampled as a whole and the sampled values can therefore be stored using the multi-value syntax. Modern BTSs in cellular networks, for example, have close to 2000 values coming from either monitored sensors or counters. Those values are sampled at the same time and storing them as a multi-value is both straightforward and space-efficient.

A carrier with 200,000 BTSs could store each sensor or counter data in its own Geo Time Series™. But this would lead to 400M series, which is not infeasible with Warp 10™ but very demanding in terms of resources. Instead, storing the 2000 values as a single multi-value will lead to only 200,000 GTSs, which could very well fit on a single small server.
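The arithmetic behind those figures is simple enough to check in a few lines of Python, using only the numbers quoted above:

```python
bts_count = 200_000       # base stations operated by the carrier
values_per_bts = 2_000    # sensors and counters sampled together on each BTS

# One Geo Time Series per sensor or counter.
series_one_per_value = bts_count * values_per_bts

# One multivariate Geo Time Series per BTS.
series_multi_value = bts_count

reduction_factor = series_one_per_value // series_multi_value
```

The series count drops from 400,000,000 to 200,000, a factor of 2000.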

Storing maps

Using the nesting ability of the multi-value format, it is possible to store a map in two ways: as a multi-value containing one nested multi-value per key/value pair, or as two nested multi-values, one with the keys, the other with the associated values.

The syntax would be:

[ [ 'key1' 42.0 ] [ 'key2' 'bar' ] ]

or

[ [ 'key1' 'key2' ] [ 42.0 'bar' ] ]

Both forms represent the mapping of 'key1' to 42.0 and of 'key2' to 'bar'.
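Both layouts are easy to generate from a dictionary on the client side. The following Python sketch shows one way to do it. The function names are hypothetical, and keys and string values are assumed not to contain characters that need escaping.

```python
def fmt(value):
    """Render a scalar in multi-value syntax (hypothetical helper)."""
    if isinstance(value, bool):
        return "T" if value else "F"
    if isinstance(value, str):
        # Assumption: the string needs no escaping.
        return f"'{value}'"
    return repr(value)


def as_pairs(mapping):
    """One nested multi-value per key/value pair."""
    pairs = [f"[ {fmt(key)} {fmt(val)} ]" for key, val in mapping.items()]
    return "[ " + " ".join(pairs) + " ]"


def as_columns(mapping):
    """Two nested multi-values: the keys, then the values in the same order."""
    keys = " ".join(fmt(key) for key in mapping)
    values = " ".join(fmt(val) for val in mapping.values())
    return f"[ [ {keys} ] [ {values} ] ]"


mapping = {"key1": 42.0, "key2": "bar"}
pairs_form = as_pairs(mapping)
columns_form = as_columns(mapping)
```

The two results match the two syntaxes shown above. The second layout relies on keys and values being kept in the same order.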

Storing market data

End of day market data traditionally comprise what are called OHLC values, for Open, High, Low and Close prices of each instrument. Those four values, and possibly some more, can be stored together using the multi-value syntax. This means a single Geo Time Series™ per financial instrument, speeding up retrieval and shrinking storage requirements.

Storage of tensor-like structures

With the advent of AI and Machine Learning, tensors are getting more and more common in the computer science space, and the time-series database field is no different. As tensors can be seen as multi-dimensional arrays, they can very easily be represented using the multi-value syntax.

The following syntax is for a 3x3x3 tensor:

[ [ [ 0 1 2 ] [ 3 4 5 ] [ 6 7 8 ] ] [ [ 9 10 11 ] [ 12 13 14 ] [ 15 16 17 ] ] [ [ 18 19 20 ] [ 21 22 23 ] [ 24 25 26 ] ] ] 
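Producing this representation from a nested array is a short recursive function. The Python sketch below assumes the tensor is held as nested lists of numbers. The to_multivalue name is hypothetical.

```python
def to_multivalue(nested):
    """Render nested lists of numbers in multi-value syntax (hypothetical helper)."""
    if isinstance(nested, list):
        return "[ " + " ".join(to_multivalue(item) for item in nested) + " ]"
    return repr(nested)


# A 3x3x3 tensor holding the integers 0 to 26 in row-major order.
tensor = [
    [[9 * i + 3 * j + k for k in range(3)] for j in range(3)]
    for i in range(3)
]
encoded = to_multivalue(tensor)
```

The encoded string is the 3x3x3 value shown above.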

Saving big bucks on your SaaS bill

Imagine you have a SaaS plan with a limited series budget. Using multivariate values you can pack measurements sampled at the same time in a single value, using the ticks of the multivariate values as indices to your data. Later, with MVTICKSPLIT, you can dispatch the measurements in their own series. This way you can turn a plan for 1000 series into a plan for a million at no additional cost.
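On the write side, packing amounts to giving each measurement its index as a tick, using the TIMESTAMP/VALUE element format. Here is a minimal Python sketch. The pack_by_tick helper is hypothetical and the measurements are assumed to be finite numbers.

```python
def pack_by_tick(measurements):
    """Pack measurements into one multi-value, using each index as the tick."""
    elements = [f"{index}/{value!r}" for index, value in enumerate(measurements)]
    return "[ " + " ".join(elements) + " ]"


packed = pack_by_tick([1.5, 2.5, 3.5])

# Going from a 1000-series plan to a million logical series.
measurements_per_series = 1_000_000 // 1_000
```

Here packed is "[ 0/1.5 1/2.5 2/3.5 ]", and the plan arithmetic works out to 1000 measurements packed in each stored series.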

Takeaways

The multi-value syntax is a major feature of release 2.1, and we really encourage you to migrate to this version to benefit from it.

Let us know what you build using that new syntax not found in any other time series database!
