We present 12 simple tips to make full use of the powerful MAP framework for manipulating time series with sliding window algorithms.
Obviously in a time series, time matters, and in most cases each data point is well correlated with neighboring points. So, when applying an operation on data points, we may often want to have access to their neighbors. That is why WarpLib has the MAP
framework: it allows to apply a function on a sliding window so that each data point will be processed with its contextual information. Functions that are computed along the way on the sliding window are called mappers.
If WarpLib is new to you, we recommend that you read first a gentle introduction to WarpScript, and the WarpScript 101 about the syntax. Or if you prefer a conventional syntax to use WarpLib, you can also read an introduction to FLoWS.
Oftentimes when the MAP
function is called in WarpScript, it takes a list of five arguments:
[ gts mapper pre post occurrences ] MAP
For example, computing a moving average can be done with a single line:
// compute the moving average with 9 values
[ $gts mapper.mean 4 4 1000 ] MAP
Here the moving window includes the current value, 4 before (pre), and 4 after (post). The mapper is computed for the first thousands of ticks (occurrences).
It's no surprise that MAP
is one of the most used functions of WarpLib. But we can do much more with it than the basic usage provided with the default arguments.
Here in what follows are some useful tips to make better use of MAP
.
Mastering these few tips is all it takes to become an expert on applying sliding window algorithms.
Tips about the syntax
#1 If you operate on multiple GTS at once, all you need is still a single line of code
It is easy to forget that this is a feature since it is often naturally used in scripts and is common to other WarpLib's frameworks as well. But since they are the most used functions, that saves quite a lot of code.
For example, in the line above that computes the moving average, the variable $gts
can be a single GTS or a list of GTS. No need to declare a loop in the latter case.
That is, if $gts
is a list of GTS, the line:
[ $gts mapper.mean 4 4 0 ] MAP
is equivalent to:
[
$gts
<%
[ SWAP mapper.mean 4 4 0 ] MAP
%>
FOREACH
]
FLATTEN
Recall that this tip is also true for other framework functions. Making use of that feature easily improves your script readability.
#2 If occurrences is negative, the GTS will be traversed in reverse order
In the argument list of MAP
, the number of occurrences is the fifth one:
[ $gts $mapper $pre $post $occurrences ] MAP
Note that the forward order is defined from the oldest tick to the most recent (or from left to right on the time axis). So for example, if we want to apply a mapper on the 60
most recent ticks, we would need $occurrences
to be equal to -60
.
#3 If occurrences equals 0, the mapper will be applied for every tick
Obviously one would not use MAP
to not use the mapper a single time… So the value 0
is used to be a shorthand that means "set the number of occurrences to the size of the input time series".
As a consequence, we often see this idiomatic code pattern in WarpScript for applying single-value mappers:
[ $gts $mapper 0 0 0 ] MAP
#4 We can use a map syntax instead of a list syntax to declare the arguments
This one can be surprising since it is seldom seen but yes, MAP
can accept a map as input. For example, the following line is perfectly valid:
$gts { 'mapper' 0.1 mapper.mul } MAP
And is equivalent to this one:
[ $gts 0.1 mapper.mul 0 0 0 ] MAP
Tips for defining the width of the sliding window
#5 Don't forget that the sliding window can have different sizes, especially on the borders!
Suppose we have a basic GTS with these ticks: 100 200 300 400 500
If we slide a window with pre = 2 like that [ $gts $mapper 2 0 0 ] MAP
, then the windows will be:
[100], [100,200], [100,200,300], [200,300,400], [300,400,500]
If we forgot this detail, we could easily introduce a bug in the code without knowing it.
A convenient function to use in that scenario is STRICTMAPPER
. It will ensure that the mapper is computed only at windows with a certain number of ticks.
For example, with [ $gts $mapper 3 3 STRICTMAPPER 2 0 0 ] MAP
, the windows will have at least 3 ticks (and at most 3 ticks):
[100,200,300], [200,300,400], [300,400,500]
#6 If pre (or post) is negative, it will define a window size in time units
Even though it is usually more convenient to define the width of the sliding window in terms of a number of ticks, sometimes, it will create windows with ticks that are very far apart.
For example, suppose we have this series of ticks: 100 200 300 1700 1800 1900 ...
With this example [ $gts $mapper 1 1 0 ] MAP
, the window centered around 1700
would be [200,1700,1800]
. The data point at tick 200
is probably irrelevant.
Instead, we can do [ $gts $mapper -100 -100 0 ] MAP
, and the windows would be:
[100,200], [100,200,300], [200,300], [1700,1800], [1700,1800,1900], [1800,1900,2000], ...
Note that we can mix a window with both semantic, for example with the argument pre defining a time width, and the argument post defining a number of ticks.
In our example, ticks are small values to ease the reading. In reality, your ticks will have values like 1620831427195637
and separated by several milliseconds, seconds or minutes, etc. In that case, pre and post can have values like -10 s
or -5 m
or even 2 d
for instance, meaning respectively 10 seconds, 5 minutes or 2 days.
Tips for defining the positions of the sliding window
#7 The hidden argument step allows to jump ticks
As a matter of fact, the input list of MAP
can have more than 5 arguments! The sixth one is the step argument. It allows you to jump ticks by not computing the mapper for every tick. This can be handy to subsample the input GTS. That can save a lot of time if the mapper is time-consuming.
For example, this line [ $gts $mapper 100 100 0 10 ] MAP
will call the mapper 1 out of every 10 times.
#8 You can define precisely where the windows are created
With the default syntax, a window is created for each tick of the GTS. But what if the ticks are randomly distributed, and we want to output results at precise timestamps?
For example, suppose that we want to output one value per day, for this use case, we can use the hidden argument ticks, available only with the map syntax (see tip #4).
First, let us create a list of ticks. The following line creates a list with 366 ticks, each being the last timestamp of each day of 2020:
[
0 365
<%
'2020-01-01T23:59:59.999999Z' TOTIMESTAMP
SWAP ADDDAYS
%>
FOR
]
'outputTicks' STORE
Then we call MAP
using the map syntax:
$gts
{
'mapper' $mapper
'pre' 12 h
'ticks' $outputTicks
}
MAP
This will create windows that span 12 hours each day of 2020. The outputs will have one tick per day, as defined by the ticks list.
Tips for defining custom mappers
#9 You can pass a macro as argument instead of a mapper
WarpLib has a lot of mappers that are ready to use off-the-shelf. But what if the one we need is not implemented?
Then indeed we can define custom operations! Instead of the mapper argument, we can pass a macro. This macro expects a subgts as input, and must output a data point, in the form [ value ]
or [ tick lat lon elev value ]
or [ tick value ]
(or [ NULL ]
for no data point).
For example, the following mapper returns a boolean indicating whether all data points in the sliding window are located inside a geographical area:
<%
DUP $shape GEO.WITHIN
<% 0 ATINDEX true 4 SET %>
<% 0 ATINDEX false 4 SET %>
IFTE
%>
'is.within.shape' STORE
[ $gts $is.within.shape -10 s -10 s 0 ] MAP
#10 You can use MACROMAPPER to define precisely a custom mapper
Tips #9 is convenient, but sometimes we need more information about the sliding window (for instance, if we need to know from which tick the window has been created).
In that case, we can use the function MACROMAPPER
. This function wraps a macro into a mapper, but instead of expecting an input sub GTS like in tip #9, this macro expects a fully defined window consisting of the following items:
[tick_of_computation,[gts_classes],[label_maps],[ticks],[latitudes],[longitudes],[elevations],[values]]
Note that this input syntax is similar to that of MACROREDUCER
, thus why [gts_classes]
and [label_maps]
are lists.
#11 If the custom mapper changes the output tick, do not forget to specify it
By default, if the macro of our custom operation returns a tick, it won't be taken into consideration, and the tick of the output data point will be the one from which the window was created. This behavior is the default one to prevent altering accidentally the time reference.
Hopefully, when the tick base has to change, MAP
has the hidden overrideTick parameter that allows that by setting it to true
. It is the seventh positional argument, for example:
[ $gts $customMapper $pre $post 0 1 true ] MAP
#12 You can compute multiple output GTS in a single pass with a custom mapper
To save looping multiple times through the ticks, or for convenience, we can output more than one GTS.
It is possible to do that whether we are in the case of tip #9 and #10.
Instead of pushing back a list, the macro must push back a map:
{
'res_1': [tick, lat, lon, elev, val], // or [val], or [tick, val]
'res_2': [tick, lat, lon, elev, val],
...
}
Here, res_1
and res_2
are the GTS classnames in the output list.
Conclusion
That's all for the tips on the MAP
framework.
If you read through all of them, congratulations! You have become a MAP
power user.
How many of these tips have you already applied?
Do you see more tips to add to this list? Do not hesitate to tell us on Twitter or on Slack!
Read more
Thinking in WarpScript – Fetch data within a geographical area
An AIS data set for the 2022 Ocean Hackathon
The COVID-19 Pandemic and the art of Geo Time Series
Machine Learning Engineer