Compare data hour to hour, day to day

Comparing the data of every Sunday in a given time zone is not trivial. With WarpScript, it becomes a fairly straightforward time series analysis.

If you already know the TIMEMODULO and TIMESHIFT functions and the BUCKETIZE and REDUCE frameworks, and if you have mastered time zone problems, this article is NOT for you.

Level 1: compare data day to day

Customer: I want the mean hourly power consumption over the last 20 days, hour by hour. That is, the mean of the data at h, h-24, h-48, h-72, and so on. There might be missing data points. Can I do this in WarpScript?

Me: Sure, it can be done in a few lines. Give me your data!

20 days with a few missing points

The first thing to do is to align your data: one point per hour, each being the mean of the data between h and h+59 minutes. Then just use TIMEMODULO to create one split per day:

// align timestamps, take first gts in the list
[ $OneMeter20dGTS bucketizer.mean 0 1 h 0 ] BUCKETIZE 0 GET 

//create a list of gts with a split attribute, one per day
1 d  'split' TIMEMODULO 
After TIMEMODULO, you have 20 GTS, one per day. The violet one has some missing data points.

Computing a mean hour by hour is a REDUCE operation, with reducer.mean.exclude-nulls. The exclude-nulls variant tolerates missing data points in the series.

// compute the mean of all the points at each tick
[ SWAP [] reducer.mean.exclude-nulls ] REDUCE
3 lines of WarpScript…

The WarpScript and the wrapped test data are playable on WarpStudio, just click this link.
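
If you do not have the wrapped data at hand, the three steps above can be tried on a synthetic series. The sketch below is only an illustration: the series built in the loop is a hypothetical stand-in for $OneMeter20dGTS (3 days of hourly points whose value is the hour of the day, with one point skipped), and the variable names are assumptions.

```warpscript
// Hypothetical stand-in for $OneMeter20dGTS: 3 days of hourly points whose
// value is the hour of the day, with one point skipped to simulate a gap.
NEWGTS 'power' RENAME 'gts' STORE
0 71 <%
  'i' STORE
  <% $i 30 != %>  // skip the point at hour 30 (day 1, 06:00 UTC)
  <% $gts $i h NaN NaN NaN $i 24 % TODOUBLE ADDVALUE DROP %>
  IFT
%> FOR

// align timestamps on hourly buckets, take the first GTS of the list
[ $gts bucketizer.mean 0 1 h 0 ] BUCKETIZE 0 GET
// one GTS per day, the day number is stored in the 'split' label
1 d 'split' TIMEMODULO
// mean of all the days at each tick, ignoring the missing point
[ SWAP [] reducer.mean.exclude-nulls ] REDUCE
```

With this input, the result should be a single GTS of 24 points in which the value at each hour equals the hour itself, including the hour where one day has no data.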

Level 2: Compare data Sunday to Sunday

Customer: I also have a Sunday effect I want to isolate. Power consumption is higher on Sundays for some customers. Is it possible to get series for a single day of the week, in the Paris time zone?

Me: It is slightly more complex because you need to play with ->TSELEMENTS, but as it is a very useful function, we put it on WarpFleet. If you're running Warp 10 2.0, it is really easy. Please show me your data!

Customer: We don't have such data. Please add 2.0 to every value on Sundays, Paris time zone.

Create data

As an experienced WarpScript programmer, you guess that it can be done in a single mapper. For each value, in a single-value window, read the day from the timestamp and add 2.0 if that day is a Sunday:

[
  $OneMeter50dGTS
  <%
    'i' STORE
    <% $i 0 GET 'Europe/Paris' ->TSELEMENTS 8 GET 7 == %> // true if the day of week is 7 (Sunday) in Paris
    <%
      $i [ 3 7 ] SUBLIST FLATTEN  //original data
      $i 7 GET 0 GET 2.0 + //get the value, add 2.0 on value
      4 SET //change the value
    %>
    <%
      $i [ 3 7 ] SUBLIST FLATTEN //push original data
    %> IFTE
  %> MACROMAPPER
  0 0 0 
] MAP 

Again, a fully playable example is available on this WarpStudio snapshot. Don't forget to select the right timezone in the DataViz!

Don't forget to select the timezone
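
The mapper above relies on ->TSELEMENTS to read the day of the week in a given time zone. As a small illustration, the sketch below decodes one timestamp (chosen arbitrarily for this example) in UTC and in Paris time. It assumes the usual element order of the returned list: year, month, day, hours, minutes, seconds, subseconds, day of year, day of week, week of year.

```warpscript
// 2019-06-02T22:30:00Z is a Sunday evening in UTC,
// but already Monday 00:30 in Paris (UTC+2 in summer).
'2019-06-02T22:30:00Z' TOTIMESTAMP 'ts' STORE

// index 8 is the day of week (1 = Monday ... 7 = Sunday), index 7 the day of year
$ts 'UTC' ->TSELEMENTS 8 GET           // day of week in UTC: 7 (Sunday)
$ts 'Europe/Paris' ->TSELEMENTS 8 GET  // day of week in Paris: 1 (Monday)
$ts 'Europe/Paris' ->TSELEMENTS 7 GET  // day of year in Paris: 154
```

The same instant falls on two different days depending on the time zone, which is why the time zone has to be passed explicitly when isolating Sundays.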

One line split by day

Now that you have some fake data, it is time to cut it by day of the week. On Warp 10 2.0, the WarpFleet repository is activated by default, which means that Warp 10 will look online for macros not found locally. Here we will use @senx/cal/bydayofweek:

$OneMeter50dGTS 'Europe/Paris' @senx/cal/bydayofweek
Output of senx/cal/bydayofweek macro

Ok, we can now TIMESPLIT each series with a one-day quiet period. The output will be a list of lists of GTS, with one GTS per day:

1 d 1 'split' TIMESPLIT 
List of list of splits
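
To see what the quiet period does, here is a minimal sketch on a single hypothetical GTS (not the article's data) made of two bursts of hourly points one week apart. Any gap longer than one day starts a new split.

```warpscript
// Hypothetical series: 3 hourly points, then 3 more one week (168 hours) later
NEWGTS 'demo' RENAME 'gts' STORE
[ 0 1 2 168 169 170 ]
<%
  'hr' STORE
  $gts $hr h NaN NaN NaN 1.0 ADDVALUE DROP
%> FOREACH

// quiet period of 1 day, at least 1 value per split, split number stored in the 'split' label
$gts 1 d 1 'split' TIMESPLIT
SIZE  // number of GTS produced: 2
```

Applied to a series that only contains Sundays, the same call yields one GTS per Sunday, since consecutive Sundays are separated by six days without data.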

Timeshift with timezone

Now we must shift all the GTS to the same day in order to REDUCE them into a mean. TIMEMODULO could do the job… but it would not take the time zone into account. For each series, we must look up the day of the year in the time zone and shift the series into the past accordingly. Applying a function to each element of a list to produce a new list can be done with LMAP:

<%
  DROP
  'gts' STORE
  $gts $gts FIRSTTICK 'Europe/Paris' ->TSELEMENTS 7 GET -1 * d TIMESHIFT
%> LMAP
Nearly the same as TIMEMODULO?

The result is really close to my first TIMEMODULO example, but:

  • It takes timezones into account
  • Look at the GTS labels: they are labelled with .dayofweek.
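
The shift done inside the LMAP macro can be checked on a single point. The sketch below uses a hypothetical one-point series (the date and value are arbitrary) and prints where its first tick lands after the shift.

```warpscript
// Hypothetical one-point series: Sunday 2019-06-02 at 10:00 Paris time (08:00 UTC)
NEWGTS 'demo' RENAME
'2019-06-02T08:00:00Z' TOTIMESTAMP NaN NaN NaN 42.0 ADDVALUE
'gts' STORE

// day of the year of the first tick, read in the Paris time zone (153 for June 2nd)
$gts FIRSTTICK 'Europe/Paris' ->TSELEMENTS 7 GET 'doy' STORE

// shift the series back by that many days, then read its new first tick
$gts $doy -1 * d TIMESHIFT
FIRSTTICK ISO8601
```

The point should land on 2018-12-31 at 08:00 UTC, the day just before January 1st, which is where every series ends up whatever its original date. Because the shift is a whole number of 24-hour days, two series taken on either side of a daylight saving change would end up one hour apart after the shift, which may be worth checking if your data spans such a change.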

The .dayofweek label allows REDUCE to work on equivalence classes, that is, on groups of GTS sharing the same label value. The final result needs just one more step:

[ SWAP [ '.dayofweek' ] reducer.mean.exclude-nulls ] REDUCE
'mean by day' RENAME
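
As a minimal sketch of what reducing per equivalence class means, the example below builds three hypothetical one-point GTS by hand. The label values '7' and '1' and the extra 'week' label are assumptions made for the illustration, not necessarily what the macro produces.

```warpscript
[
  // two "Sunday" series and one "Monday" series, all with a point at tick 0
  NEWGTS 'power' RENAME { '.dayofweek' '7' 'week' '1' } RELABEL 0 NaN NaN NaN 1.0 ADDVALUE
  NEWGTS 'power' RENAME { '.dayofweek' '7' 'week' '2' } RELABEL 0 NaN NaN NaN 3.0 ADDVALUE
  NEWGTS 'power' RENAME { '.dayofweek' '1' 'week' '1' } RELABEL 0 NaN NaN NaN 10.0 ADDVALUE
]
// one output GTS per distinct value of the '.dayofweek' label
[ SWAP [ '.dayofweek' ] reducer.mean.exclude-nulls ] REDUCE
```

The output should contain two GTS: one for .dayofweek 7 holding the mean of 1.0 and 3.0, and one for .dayofweek 1 holding 10.0.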

The playable example is available here.


WarpScript step by step animation!

Conclusion

Comparing data day by day or year by year to extract trends is a very common request from our customers. We know the thinking path is not trivial. I hope this article will help you think in WarpScript!

Explore the other posts about Thinking in WarpScript.