Thrift and Protocol Buffers support in WarpScript

Learn how to use Thrift and Protocol Buffers directly from WarpScript, without the need for code generation.

Thinking In WarpScript: Thrift and Protocol Buffers

Introduction

Serialization/deserialization frameworks such as Thrift and Protocol Buffers (Protobuf) have slowly gained in popularity. They are now used in many applications, whether for exchanging messages through RPC calls or for storing data in compact, extensible structures.

The benefit of their approach is simple: define your data structures once and use them in multiple languages without writing language-specific code. This ensures interoperability between heterogeneous environments, without having to learn every dialect involved.

The magic is performed by a compiler which takes as input the structure and service definitions and generates code for a target language.

The workflow for using either Thrift or Protobuf is, therefore, to get a hold of the definitions you need, compile them so the code generation magic happens, and then include this generated code inside of your application so you can serialize and deserialize according to the defined schemas.

This works great for applications which manipulate a pre-defined set of structures. But for data analytics applications, which need to adapt to whatever datasets the user wants to access, this is cumbersome. In the case of WarpScript, for example, one would need to add an extension supporting a given set of structures and repeat this process for each new dataset. This leads to slow development cycles and a loss of efficiency.

Introducing dynamic structures

WarpScript is a very flexible data analytics language. It was introduced by Warp 10 with Time Series data in mind, but it really works on any type of data. So we wanted to make WarpScript the most flexible environment when it comes to using Thrift and Protobuf.

To achieve that, we recently created two extensions that support Thrift and Protobuf within WarpScript code. They both work in a similar way, so it makes sense to describe them in the same post.

The idea implemented by those extensions is to generate structures dynamically, directly from their descriptions. At runtime, it is possible to parse a Thrift IDL (.thrift) or Protobuf .proto description and obtain entities which can be used for serializing or deserializing content.

This opens up many possibilities. It is, for example, possible to read a schema definition from a database or website, parse it and start serializing data according to the newly loaded schema. One could also generate or extend structure definitions on the fly. This is a huge gain in flexibility and productivity compared with the standard workflow described above.
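As a minimal sketch of generating a definition on the fly, the snippet below assembles a Protobuf description by plain string concatenation, parses it with PROTOC and immediately serializes a MAP with it. The message name DynamicMessage and the field name label are hypothetical, and the snippet assumes PROTOC accepts the same style of definition as the parsing examples in the next section.

```warpscript
// Hypothetical names, which could just as well come from a database or a web page
'DynamicMessage' 'msgName' STORE
'label' 'fieldName' STORE

// Assemble a Protobuf definition as a plain STRING, then parse it
'message ' $msgName + ' { string ' + $fieldName + ' = 1; }' +
PROTOC 'dyn' STORE

// Serialize a MAP whose key is only known at runtime
{ $fieldName 'built at runtime' } $dyn $msgName ->PB
'bytes' STORE
```

Because the definition is just a STRING until PROTOC runs, any WarpScript string manipulation can be used to build or extend it before parsing.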

Parsing

The parsing step consumes a structure definition expressed in the native syntax of either Thrift or Protobuf. The WarpScript extensions mentioned earlier add the functions THRIFTC and PROTOC, which parse those native definitions and produce the corresponding dynamic structures as internal WarpScript objects.

Those internal objects are correctly restored after a SNAPSHOT / EVAL cycle, so they can be transmitted as part of REXEC calls, which execute WarpScript code on a remote Warp 10 instance.

The examples below show THRIFTC and PROTOC in action.

// Parse a Thrift IDL definition and store the result in the 'thrift' variable
<'
enum ThriftEnum {
  A,
  B,
  C,
}
struct ThriftStructure {
  1:ThriftEnum enumField,
  2:string stringField,
  3:i32 intField,
}
'>
THRIFTC 'thrift' STORE
// Parse a Protobuf definition and store the result in the 'proto' variable
<'
message ProtobufMessage {
  enum ProtobufEnum {
    A = 0;
    B = 1;
    C = 2;
  }
  ProtobufEnum enumField = 1;
  string stringField = 2;
  uint32 intField = 3;
}
'>
PROTOC 'proto' STORE
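The SNAPSHOT / EVAL property mentioned above can be checked with a small sketch: snapshot a parsed definition, rebuild it with EVAL, and verify that the original and the restored definitions serialize the same MAP to identical bytes. The Sample message and its fields are hypothetical, and comparing the hexadecimal output of ->HEX is simply one convenient way to compare byte arrays.

```warpscript
// Hypothetical definition, written in the same style as the examples above
<'
message Sample {
  string name = 1;
  int64 value = 2;
}
'>
PROTOC 'sample' STORE

CLEAR               // make sure SNAPSHOT only captures the definitions
$sample
SNAPSHOT            // the stack now holds a single STRING of WarpScript code
'code' STORE        // this STRING is what would travel inside a REXEC call
$code EVAL          // executing it rebuilds the parsed definitions
'restored' STORE

// Serialize the same MAP with the original and the restored definitions
{ 'name' 'sensor-1' 'value' 42 } 'msg' STORE
$msg $sample 'Sample' ->PB ->HEX 'before' STORE
$msg $restored 'Sample' ->PB ->HEX 'after' STORE
```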

Serializing

Creating Thrift or Protobuf serialized content is straightforward: create a MAP containing the values for the various fields and use the structure definitions from the parsing step.

Protobuf serialization is performed using the ->PB function and Thrift serialization via the ->THRIFT function. Both take the MAP, the parsed definitions and the name of the message or structure to use.

The result is a byte array containing the content of the MAP serialized according to the specified definition.

Any field present in the input map but absent from the structure definition will simply be ignored.

// Serialize a MAP as a ProtobufMessage (leaves a byte array on the stack)
{
  'enumField' 'A'
  'stringField' 'Hello Protobuf'
  'intField' 42
} $proto 'ProtobufMessage' ->PB
// Serialize a MAP as a ThriftStructure (leaves a byte array on the stack)
{
  'enumField' 'A'
  'stringField' 'Hello Thrift'
  'intField' 42
} $thrift 'ThriftStructure' ->THRIFT
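To illustrate that fields absent from the definition are ignored, the sketch below serializes two MAPs that differ only by an undeclared key and compares the hexadecimal form of the resulting byte arrays. The Greeting message and the extra key are hypothetical names used only for this example.

```warpscript
// Hypothetical one-field message
<'
message Greeting {
  string text = 1;
}
'>
PROTOC 'greeting' STORE

// Same value for 'text'; the second MAP also carries an undeclared 'extra' key
{ 'text' 'Hello' } $greeting 'Greeting' ->PB ->HEX 'plain' STORE
{ 'text' 'Hello' 'extra' 123 } $greeting 'Greeting' ->PB ->HEX 'withExtra' STORE
```

If the extra key is indeed ignored, both variables hold the same hexadecimal string.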

Deserializing

Deserializing follows a similar pattern. Using the PB-> or THRIFT-> function together with structure definitions produced by PROTOC or THRIFTC, a byte array containing serialized content can be deserialized into a MAP.

The field values are converted to WarpScript types, meaning that every numeric field becomes either a DOUBLE or a LONG.

Enum values appear as a STRING containing the name of the value.
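A minimal round-trip sketch follows, reusing the definitions from the parsing step. It assumes that PB-> and THRIFT-> expect the byte array, the parsed definitions and the message or structure name, in the same order as ->PB and ->THRIFT; check the extension documentation on WarpFleet for the exact signatures. The enum value 'B' is used rather than 'A' because, under proto3 rules, a field holding its default value is not written to the wire and might therefore be absent from the decoded MAP.

```warpscript
<'
message ProtobufMessage {
  enum ProtobufEnum {
    A = 0;
    B = 1;
    C = 2;
  }
  ProtobufEnum enumField = 1;
  string stringField = 2;
  uint32 intField = 3;
}
'>
PROTOC 'proto' STORE

<'
enum ThriftEnum { A, B, C }
struct ThriftStructure {
  1:ThriftEnum enumField,
  2:string stringField,
  3:i32 intField,
}
'>
THRIFTC 'thrift' STORE

// Protobuf: serialize, then decode the bytes (assumed argument order: bytes, definitions, name)
{ 'enumField' 'B' 'stringField' 'Hello Protobuf' 'intField' 42 }
$proto 'ProtobufMessage' ->PB
$proto 'ProtobufMessage' PB->
'decoded' STORE

// Thrift: same round trip with THRIFT-> (same assumed argument order)
{ 'enumField' 'B' 'stringField' 'Hello Thrift' 'intField' 42 }
$thrift 'ThriftStructure' ->THRIFT
$thrift 'ThriftStructure' THRIFT->
'tdecoded' STORE
```

Per the conversion rules above, enumField should come back as the STRING 'B' and intField as a LONG in both decoded MAPs.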

Takeaways

The Thrift and Protocol Buffers WarpScript extensions bring the power of those serialization frameworks without the hassle of intermediate code generation.

That capability, coupled with the large number of integrations of the WarpScript data analytics language, confirms WarpScript as one of the best choices for manipulating data.

Both the Thrift and Protobuf extensions are readily available on WarpFleet.