Learn how to use Thrift and Protocol Buffers directly from within WarpScript and without the need for code generation
Introduction
Serialization/deserialization frameworks such as Thrift and Protocol Buffers (Protobuf) have slowly gained in popularity. They are now used in many applications, whether for exchanging messages through RPC calls or for storing data in extensible and compact structures.
The benefit of their approach is simple. Define your data structures once and use them in multiple languages without having to write specific code. This way interoperability can be ensured between heterogeneous environments and without having to learn all the dialects involved.
The magic is performed by a compiler, which takes as input the structure and service definitions and generates code for a target language.
The workflow for using either Thrift or Protobuf is therefore to get hold of the definitions you need, compile them so that the code generation magic happens, and then include the generated code in your application so that you can serialize and deserialize according to the defined schemas.
This works great for applications which manipulate a predefined set of structures. But for data analytics applications, which need to adapt to whatever datasets the user wants to access, it is cumbersome. In the case of WarpScript, for example, one would need to add an extension with support for a given set of structures, and repeat this process for each new dataset. This leads to slow development cycles and a loss of efficiency.
Introducing dynamic structures
WarpScript is a very flexible language for data analytics: on time series data of course, since it was introduced by Warp 10, but really on any type of data. So we wanted to make WarpScript the most flexible environment when it comes to using Thrift and Protobuf.
To achieve that, we recently created two extensions for supporting Thrift and Protobuf within WarpScript code. They both work similarly, so it makes sense to describe them in the same post.
The idea implemented by those extensions is to dynamically generate structures directly from their description. So at runtime, it is possible to parse a Thrift .idl or Protobuf .proto description and end up with entities which can be used for serializing or deserializing content.
The possibilities this opens are endless. It is, for example, possible to read a schema definition from a database or website, parse it, and start serializing data according to the newly loaded schema. One could also generate or extend structure definitions on the fly. This is a huge gain in flexibility and productivity compared with the standard workflow described above.
Parsing
The parsing step consumes a structure definition expressed in the native syntax of either Thrift or Protobuf. The WarpScript extensions mentioned earlier add the functions THRIFTC and PROTOC, which parse those native definitions and produce the described dynamic structures as an internal WarpScript object.
Those internal objects are correctly recovered after a SNAPSHOT / EVAL cycle, so they can be transmitted along REXEC calls.
The examples below show THRIFTC and PROTOC in action.
<'
enum ThriftEnum { A, B, C,}
struct ThriftStructure {
1:ThriftEnum enumField,
2:string stringField,
3:i32 intField,
}
'>
THRIFTC
'thrift' STORE
<'
message ProtobufMessage {
enum ProtobufEnum {
A = 0;
B = 1;
C = 2;
}
ProtobufEnum enumField = 1;
string stringField = 2;
uint32 intField = 3;
}
'>
PROTOC
'proto' STORE
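The SNAPSHOT / EVAL cycle mentioned above can be sketched as follows. This is a minimal illustration, not taken from the extension documentation: it assumes the Protobuf extension is loaded and that nothing else is on the stack when SNAPSHOT is called. The variable name protoCopy is purely illustrative.

```warpscript
// Parse a small Protobuf definition into a dynamic structure.
<'
message ProtobufMessage {
  string stringField = 1;
}
'>
PROTOC

// SNAPSHOT replaces the stack content with a STRING of WarpScript code
// able to rebuild that content.
SNAPSHOT

// EVAL executes that code, which should leave an equivalent parsed
// structure on the stack (this is the behavior the article describes).
EVAL

// 'protoCopy' is a hypothetical variable name used for this sketch.
'protoCopy' STORE
```

The same string produced by SNAPSHOT is what would travel to a remote instance when the structure is used in code passed to REXEC.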
Serializing
Creating Thrift or Protobuf serialized content is straightforward: create a MAP containing the values for the various fields and use the structure definitions from the parsing step.
Protobuf serialization is performed using the ->PB function and Thrift serialization via the ->THRIFT function.
The result is a byte array containing the original structure serialized using the specified definition.
Any field present in the input map but absent from the structure definition will simply be ignored.
{
'enumField' 'A'
'stringField' 'Hello Protobuf'
'intField' 42
}
$proto 'ProtobufMessage' ->PB
{
'enumField' 'A'
'stringField' 'Hello Thrift'
'intField' 42
} $thrift 'ThriftStructure' ->THRIFT
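To illustrate the rule about fields missing from the definition, here is a small sketch which assumes the Protobuf extension is loaded. The key unknownField is a made-up name that does not exist in the message definition, and ->HEX is only used to make the resulting byte array readable.

```warpscript
// Parse a definition with a single declared field.
<'
message ProtobufMessage {
  string stringField = 1;
}
'>
PROTOC
'proto' STORE

{
  'stringField' 'Hello Protobuf'
  // 'unknownField' is hypothetical: it is not declared in the message,
  // so according to the rule above it should simply be ignored.
  'unknownField' 'not part of the schema'
}
$proto 'ProtobufMessage' ->PB

// Convert the byte array to a hexadecimal STRING for inspection.
->HEX
```

If the extra key is indeed ignored, the output should be identical to the one obtained from a map containing only stringField.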
Deserializing
Deserializing follows a similar pattern. Using the PB-> or THRIFT-> function together with structure definitions from PROTOC or THRIFTC, a byte array containing serialized content can be deserialized into a MAP.
The field values will be converted to WarpScript types, meaning that all numeric fields will be either DOUBLE or LONG.
Enum values will appear as a STRING containing the name of the value.
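A round trip makes the pattern concrete. The sketch below assumes both extensions are loaded and that PB-> and THRIFT-> take their arguments in the same order as ->PB and ->THRIFT (byte array, parsed definition, structure name). That argument order is an assumption drawn from the "similar pattern" described above, so check it against the extension documentation. The variable names decodedProto and decodedThrift are illustrative.

```warpscript
// Parse the Protobuf definition used earlier in this post.
<'
message ProtobufMessage {
  enum ProtobufEnum {
    A = 0;
    B = 1;
    C = 2;
  }
  ProtobufEnum enumField = 1;
  string stringField = 2;
  uint32 intField = 3;
}
'>
PROTOC
'proto' STORE

// Parse the Thrift definition used earlier in this post.
<'
enum ThriftEnum { A, B, C,}
struct ThriftStructure {
  1:ThriftEnum enumField,
  2:string stringField,
  3:i32 intField,
}
'>
THRIFTC
'thrift' STORE

// Protobuf: serialize a MAP to bytes, then deserialize it back to a MAP.
{
  'enumField' 'A'
  'stringField' 'Hello Protobuf'
  'intField' 42
}
$proto 'ProtobufMessage' ->PB
// Assumed argument order, mirroring ->PB.
$proto 'ProtobufMessage' PB->
'decodedProto' STORE

// Thrift: same round trip with the Thrift functions.
{
  'enumField' 'A'
  'stringField' 'Hello Thrift'
  'intField' 42
}
$thrift 'ThriftStructure' ->THRIFT
// Assumed argument order, mirroring ->THRIFT.
$thrift 'ThriftStructure' THRIFT->
'decodedThrift' STORE

// Leave both decoded MAPs on the stack for inspection.
$decodedProto
$decodedThrift
```

Given the conversion rules above, intField should come back as a LONG and enumField as a STRING holding the enum value's name in both maps.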
Takeaways
The Thrift and Protocol Buffers WarpScript extensions bring the power of those serialization frameworks without the hassle of intermediate code generation.
That possibility coupled with a large number of integrations of the WarpScript data analytics language confirms it as one of the best choices for manipulating data.
Both the Thrift and Protobuf extensions are readily available on WarpFleet.