FETCHez la data !

Accessing your time series data is the first thing you have to do, which is why Warp 10 offers so many options for retrieving it. Discover them in this post!


Your journey with time series data has to start somewhere. When using Warp 10, your starting point is usually a call to FETCH, which retrieves data from the underlying Time Series DataBase (TSDB), namely the Warp 10 Storage Engine.

The FETCH function has a plethora of options. This post covers some very useful combinations so you can rapidly access your data in ways you did not know were possible.

General syntax

The FETCH function has a simple calling signature, which takes a list of parameters as input, and an advanced one, which relies on a parameter map. This post focuses on the latter, as it is the one allowing the most flexibility.

Selecting Geo Time Series

In order to fetch data, you will need credentials in the form of a Read Token. This token should always be present in the parameter map under the key token.

The next thing you need is a set of parameters defining which series to access. Those parameters match the class and the labels or attributes of the series. They can be specified in either of two ways.

Using class and labels

The class key of the parameter map can be associated with a value representing either an exact match on the Geo Time Series class or a regular expression that class names should match. An exact match is simply a STRING, which must be prefixed with = if the class name itself starts with = or ~. A regular expression is a STRING made of ~ followed by the regular expression to use for matching.

The labels key of the parameter map should be associated with a map whose keys are label or attribute names and values are either exact matches (prefixed with =) or regular expressions (prefixed with ~).

The following example will select all series from class foo with a label bar matching the regular expression .*matchme.*.

{ 'token' 'xxxxx' 'class' 'foo' 'labels' { 'bar' '~.*matchme.*' } } FETCH
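As a further sketch, the opposite combination is also possible: a regular expression on the class and an exact match on a label. The class pattern sensor.*, the label name site and the value paris are hypothetical and only serve as an illustration.

```warpscript
{
  'token' 'xxxxx'
  // Regular expression on the class (hypothetical pattern)
  'class' '~sensor.*'
  // Exact match on a label, written with the explicit = prefix
  'labels' { 'site' '=paris' }
} FETCH
```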

Using selectors

A Geo Time Series (GTS) selector is a STRING of the form CLASS_SEL{LABELS_SEL} where CLASS_SEL conforms to the syntax of the class parameter above and LABELS_SEL is a comma separated list of label or attribute names, each immediately followed by an exact match (prefixed with =) or a regular expression (prefixed with ~).

The parameter map can contain a single selector under the key selector or a list of selectors under selectors. This allows you to retrieve series for which a single regular expression would prove cumbersome to craft.

The example below fetches two sets of series, based on two selectors.

{ 'token' 'xxxxx' 'selectors' [ '=class1{label1~.*regexp.*}' '~class2.*{label=value}' ] } FETCH
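For comparison, here is a sketch of the earlier class and labels example (class foo, label bar matching .*matchme.*) rewritten with a single selector. It is assumed to select the same series, since the selector simply packs the class and label criteria into one STRING.

```warpscript
// Same selection as 'class' 'foo' with 'labels' { 'bar' '~.*matchme.*' }
{
  'token' 'xxxxx'
  'selector' 'foo{bar~.*matchme.*}'
} FETCH
```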

Note that for both approaches, STRING values should be URL encoded (using %hh) if they contain characters such as ,, {, =, ~ or }.

Retrieving data within a time range

The most common pattern for accessing data is to retrieve the data within a given time range. In WarpScript this is easily done by specifying either an end timestamp and a duration (timespan), or start and end timestamps.

When retrieving data with an end timestamp and a duration, the combination of those elements will be converted to a start timestamp (start = end - timespan + 1).

Here are two examples:

{
  'token' 'xxxxx'
  'selector' '....'
  'end' NOW
  'timespan' 24 h // timespan is expressed in platform time units
} FETCH

{
  'token' 'xxxxx'
  'selector' '....'
  NOW 'now' STORE
  'end' $now
  'start' $now 24 h - 1 + // start = end - timespan + 1
} FETCH
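To see why the conversion adds 1, the arithmetic can be worked through with small numbers. The values below are hypothetical and expressed in platform time units. The snippet does not call FETCH, it only reproduces the formula start = end - timespan + 1.

```warpscript
// Hypothetical values, in platform time units
1000 'end' STORE
100 'timespan' STORE

// start = end - timespan + 1
$end $timespan - 1 + 'start' STORE

// Both bounds are inclusive, so the range covers exactly timespan ticks
$end $start - 1 +   // leaves 100 on the stack
$start              // leaves 901 on the stack
```

Since both start and end are included in the range, dropping the + 1 would make the range one tick longer than the requested timespan.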

Accessing the most recent values

Another popular pattern is to retrieve N values before or at a given instant. This is easily done with FETCH by specifying an end timestamp and a number of datapoints to retrieve.

{
  'token' 'xxxxx'
  'selector' '....'
  'end' NOW
  'count' 20 // Retrieve up to 20 data points at or before end.
} FETCH

Selecting active or quiet Geo Time Series

Warp 10 can track what it calls activity of Geo Time Series. When this feature is enabled (via ingress.activity.window), updates and metadata changes performed on a GTS will modify its last activity timestamp.

It is then possible for a FETCH or FIND call to only select series which were active or quiet after a given timestamp.

This is simply done by using the active.after or quiet.after keys in the parameter map.

{
  'token' 'xxxxx'
  'selector' '....'
  'end' NOW
  'count' 20 // Retrieve up to 20 datapoints at or before end.
  // Only consider series which were active within the last 24 hours
  // but quiet in the last 12 hours
  'active.after' NOW 24 h -
  'quiet.after' NOW 12 h -
} FETCH

Fetching boundary values

Apart from selecting active or quiet series, everything we have described so far can more or less be found in every time series database.

Boundaries are a totally different story. To the best of our knowledge, apart from some industrial data historians, Warp 10 is the only time series database to support them.

Boundaries are data points which are either before (pre boundary) or after (post boundary) a specific time range. When working with IoT data, boundaries are very useful, because they allow you to greatly limit the amount of data you need to store. How is that?

Well, imagine you have sensors whose values hardly ever change. This is rather common when a sensor reports the position of a valve, for example. You could store the value of the sensor at periodic intervals, consuming possibly costly bandwidth, or you could only store the value when it changes. The latter case is the ideal one, but without boundary support in your TSDB, you will have a hard time fetching and analyzing those values.

Indeed, imagine your valve only changes state every week, so you record one value per week for the associated sensor. How do you determine the state of the valve at any given time? Without boundaries you have to basically guess, fetching data at random after the moment you are interested in, hoping to find the first value after that instant. The value before the moment you are interested in is usually easier to fetch, luckily.

With boundaries, the problem is easily solved: simply specify the time range you are interested in and ask FETCH to retrieve the first value just before the time range and the first value right after it. If the valve did not change state within the specified time range, it is no big deal. You will still end up with the values before and after it and you can proceed with your analysis.

The boundaries are specified in the following manner:

{
  'token' 'xxxxx'
  'selector' '....'
  NOW 'now' STORE
  'end' $now 7 d -
  'timespan' 24 h
  'boundary' 1 // specify 1 datapoint for both pre and post boundaries
  // Alternatively you can specify pre and post boundaries separately
  // 'boundary.pre' 1
  // 'boundary.post' 1
} FETCH

If you have followed the explanation carefully, you have correctly concluded that by using boundary.post you can retrieve the first datapoints after a given timestamp, something very few time series databases can do!
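Applied to the valve scenario, a sketch could look like the one below. It fetches a one hour window ending at an instant of interest and asks for one datapoint on each side, using the separate boundary.pre and boundary.post keys. The instant (7 days ago) and the window length are arbitrary choices made for the illustration.

```warpscript
// Hypothetical instant of interest: 7 days ago
NOW 7 d - 'instant' STORE

{
  'token' 'xxxxx'
  'selector' '....'
  'end' $instant
  'timespan' 1 h
  'boundary.pre' 1    // last value before the window, if any
  'boundary.post' 1   // first value after the window, if any
} FETCH
```

Even if the valve reported nothing during that hour, the pre boundary gives the state it was in when the window started, and the post boundary gives the next recorded change.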

Sampling values

Sometimes you are not interested in all the values within a time range. This is why FETCH supports sampling of your data.

Simply specify the ratio of data points which should be returned. The FETCH call will sample your data, returning only this proportion of the values.

Note that sampling is not applied to pre and post boundaries.

The syntax for sampling is:

{
  'token' 'xxxxx'
  'selector' '....'
  'end' NOW
  'sample' 0.5 // Retain 50% of the data points encountered
} FETCH

The sampling mechanism still has to seek to each value, so even though only a portion of the encountered data points is returned, no significant performance improvement will be noticed.

Skipping values

If you should find it useful to skip values, know this is something FETCH can do. Simply use the following syntax:

{
  'token' 'xxxxx'
  'selector' '....'
  'end' NOW
  'count' 10 // Return 10 data points
  'skip' 10 // Skip the 10 most recent values before returning data
} FETCH
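One possible use of skip combined with count is paging backwards through the data. The sketch below computes the number of values to skip from a page number. The variable names page and pageSize are hypothetical, and treating skip as a paging offset is an assumption drawn from the example above rather than a documented pattern.

```warpscript
// Hypothetical paging parameters
3 'page' STORE         // 1-based page number
10 'pageSize' STORE
NOW 'end' STORE        // in a real paging loop, reuse one fixed timestamp

{
  'token' 'xxxxx'
  'selector' '....'
  'end' $end
  'count' $pageSize
  'skip' $page 1 - $pageSize *   // (page - 1) * pageSize = 20
} FETCH
```

Keeping the same end timestamp for every page matters here: if end moves forward between calls, newly ingested values would shift the pages.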

As with sampling, skipping still needs to seek in the data files, so performance should only be slightly better than when fetching the skipped values.

Combining it all

Yes, all the options we have described can be combined, giving you the most flexible fetching capability of all time series databases.
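As an illustration, the sketch below puts several of the options covered in this post into one parameter map: a list of selectors, a time range, pre and post boundaries, sampling and an activity filter. The specific values (a 24 hour window ending one day ago, a 10% sample, activity within the last 7 days) are arbitrary and only there to show the shape of the call.

```warpscript
NOW 'now' STORE

{
  'token' 'xxxxx'
  'selectors' [ '=class1{label1~.*regexp.*}' '~class2.*{label=value}' ]
  'end' $now 1 d -             // window ends one day ago
  'timespan' 24 h
  'boundary.pre' 1             // boundaries are not subject to sampling
  'boundary.post' 1
  'sample' 0.1                 // retain about 10% of the data points in the window
  'active.after' $now 7 d -    // only series active within the last 7 days
} FETCH
```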

Tell us what you achieved using those fancy fetching patterns!

If you are using a version of Warp 10 older than 2.3.0, some of the options described above may not be available. Please upgrade to test drive everything.