Archiving Time Series Data into S3


When collecting Time Series data, whether from monitoring your infrastructure, tracking your assets, or listening to your IoT devices, at some point you will want to move older data to less expensive storage.

If you are using Warp 10™, there is a WarpScript™ extension which can do just that, enabling you to push some of your data to Amazon S3 while retaining the ability to fetch that data and analyze it in the same way you analyze the data still stored in Warp 10™.

This extension, called warp10-ext-s3, is fully Open Source and works with any S3-compatible object store, such as Ceph Object Gateway or Scality's Zenko multi-cloud controller. It adds three new functions, S3STORE, S3LOAD, and S3BUCKETS, to store and retrieve data and to list available buckets.

The rest of this post will guide you through the deployment and use of this extension so you can start using S3 as a Warp 10™ data source.

Installing the S3 WarpScript™ Extension

The source code for the S3 extension is available on GitHub. First, clone the repository:

git clone https://github.com/senx/warp10-ext-s3.git

Then you can build the extension:

cd warp10-ext-s3
./gradlew shadowJar

The build process produces a jar file in build/libs/warp10-ext-s3.jar. Copy this jar file into the lib directory of your Warp 10™ instance, then enable the S3 extension by adding the following line to your Warp 10™ configuration:

warpscript.extension.s3=io.warp10.script.ext.s3.S3WarpScriptExtension

You can then restart your Warp 10™ instance.

Launching a test S3 server

In order to test the S3 extension you need an S3 server you can talk to. If you have an Amazon S3 account you can use that; otherwise you can run Scality's S3 Server via Docker:

docker run -d --name s3server -p 8000:8000 scality/s3server

The local S3 Server instance will be available at http://127.0.0.1:8000/.

Using the S3 extension

Once your Warp 10™ instance has been restarted with the S3 extension enabled, you can interact with S3 object stores from WarpScript™. You can use the object store as a repository for various data structures using serialization functions available in WarpScript™.

The SNAPSHOT function can be used to serialize the content of a complete stack into a WarpScript™ code fragment. When executed, this code will re-create the stack as it was prior to the call to SNAPSHOT. For Geo Time Series™, the WRAPRAW function can be used to serialize a GTS as a byte array. Such a byte array can later be converted back to a Geo Time Series™ using UNWRAP.
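As a quick illustration of the wrap/unwrap round trip, the following sketch builds a single-point GTS, serializes it to a byte array, and immediately re-creates it (the series name and value are arbitrary for the example):

```
NEWGTS 'sensor.temperature' RENAME   // create an empty GTS
NOW NaN NaN NaN 21.5 ADDVALUE        // add one datapoint, without location or elevation
WRAPRAW                              // serialize the GTS into a byte array
UNWRAP                               // turn the byte array back into a GTS
```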

Snapshotting a stack and storing the result in an S3 object is as simple as:

SNAPSHOT
'UTF-8' ->BYTES
'key'
{
  'endPoint' 'http://127.0.0.1:8000/'
  'accessKey' 'accessKey1'
  'secretKey' 'verySecretKey1'
}
S3STORE

And later retrieving the snapshot and re-creating the stack is as simple as:

'key'
{
  'endPoint' 'http://127.0.0.1:8000/'
  'accessKey' 'accessKey1'
  'secretKey' 'verySecretKey1'
}
S3LOAD
'UTF-8' BYTES-> EVAL

In order to list the available buckets in an object store you can use the S3BUCKETS function:

{
  'endPoint' 'http://127.0.0.1:8000/'
  'accessKey' 'accessKey1'
  'secretKey' 'verySecretKey1'
}
S3BUCKETS

The call to S3BUCKETS will leave a list of known buckets on the stack.

Using this logic you can archive your Time Series data to an S3 service by calling FETCH, wrapping the resulting Geo Time Series™, and storing the produced blobs in S3. Once the data is archived, you can call DELETE to purge it from Warp 10™. Should you need to manipulate the archived data at a later time, it is simply a matter of calling S3LOAD and UNWRAP to re-create Geo Time Series™ you can access in your WarpScript™ code.
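Putting the pieces together, an archiving script could look like the following sketch. The read token, the class selector, the object key, and the negative count passed to FETCH (to retrieve the most recent datapoints) are assumptions for this example; adapt them to your instance:

```
// Archive: fetch data, wrap it, store the blob in S3
// 'READ_TOKEN', the selector and the key below are placeholders
[ 'READ_TOKEN' 'sensor.temperature' {} NOW -1000 ] FETCH
0 GET                                // take the single GTS out of the fetched list
WRAPRAW                              // serialize it as a byte array
'archive/sensor.temperature'         // S3 object key
{
  'endPoint' 'http://127.0.0.1:8000/'
  'accessKey' 'accessKey1'
  'secretKey' 'verySecretKey1'
}
S3STORE

// Later: load the blob and re-create the GTS
'archive/sensor.temperature'
{
  'endPoint' 'http://127.0.0.1:8000/'
  'accessKey' 'accessKey1'
  'secretKey' 'verySecretKey1'
}
S3LOAD
UNWRAP
```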
