Compare data hour to hour, day to day

If you already know the TIMEMODULO and TIMESHIFT functions and the BUCKETIZE and REDUCE frameworks, and you have mastered timezone problems, this article is NOT for you.

Level 1: day to day comparison

Customer: I want the mean of hourly power consumption over the last 20 days, hour by hour. I mean, compute the mean of the data at h, h-24, h-48, h-72, and so on. There might be missing datapoints. Can I do this in WarpScript?

Me: Sure, it could be done in a few lines, give me your data!

20 days with a few missing points

The first thing you need to do is align your data: one point per hour, holding the mean of the data between h and h+59 minutes. Then use TIMEMODULO to create one-day splits:

[ $OneMeter20dGTS bucketizer.mean 0 1 h 0 ] BUCKETIZE 0 GET  // align timestamps, take first gts in the list
1 d  'split' TIMEMODULO //create a list of gts with a split attribute, one per day
After TIMEMODULO, you have 20 GTS, one per day. The violet one has some missing datapoints.

Computing a mean hour by hour is a REDUCE operation, with reducer.mean.exclude-nulls. The exclude-nulls variant tolerates missing data in the series.

[ SWAP [] reducer.mean.exclude-nulls ] REDUCE // compute the mean of all points at each tick
3 lines of WarpScript...

The WarpScript and the wrapped test data are playable on WarpStudio; just click this link.
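
To make the TIMEMODULO and REDUCE steps easier to follow without the real meter data, here is a minimal sketch on a toy series. The class name `toy.power`, the dates and the values are all made up for illustration, and the points are assumed to be already aligned on the hour, so the BUCKETIZE step is skipped.

```warpscript
// Hypothetical toy series: hourly points over three days, one point missing
NEWGTS 'toy.power' RENAME
'2021-01-01T01:00:00Z' TOTIMESTAMP NaN NaN NaN 10.0 ADDVALUE
'2021-01-01T02:00:00Z' TOTIMESTAMP NaN NaN NaN 20.0 ADDVALUE
'2021-01-02T01:00:00Z' TOTIMESTAMP NaN NaN NaN 14.0 ADDVALUE  // no point at 02:00 on this day
'2021-01-03T01:00:00Z' TOTIMESTAMP NaN NaN NaN 12.0 ADDVALUE
'2021-01-03T02:00:00Z' TOTIMESTAMP NaN NaN NaN 30.0 ADDVALUE

1 d 'split' TIMEMODULO  // one GTS per day, ticks folded into a single UTC day
[ SWAP [] reducer.mean.exclude-nulls ] REDUCE  // mean of the available values at each tick
```

If the functions behave as described above, the reduced series should hold the mean of three values at the 1 h tick (12.0) and the mean of only two values at the 2 h tick (25.0), since the missing point is simply left out.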

Level 2: Sunday to Sunday comparison

Customer: I also have a Sunday effect I want to isolate. Power consumption is higher on Sundays for some customers. Is it possible to have series for one day of the week only, in the Paris timezone?

Me: It is slightly more complex to do because you need to play with ->TSELEMENTS, but as it is a very useful function, we put it on WarpFleet. If you're running Warp 10 2.0, it is really easy. Please show me your data!

Customer: We don't have such data. Please add 2.0 to every value on Sundays, Paris timezone.

Create data

As an experienced WarpScript™ programmer, you can guess that this can be done in a single mapper. For each value, in a single-value window, read the day from the timestamp and add 2.0 if the day is a Sunday:

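[ $OneMeter50dGTS
  <% 'i' STORE  // mapper input: [ tick, names, labels, ticks, lats, lons, elevs, values ]
    <% $i 0 GET 'Europe/Paris' ->TSELEMENTS 8 GET 7 == %>  // is the day of week 7 (Sunday) in Paris?
    <%
      $i [ 3 7 ] SUBLIST FLATTEN  // original [ tick, lat, lon, elev, value ]
        $i 7 GET 0 GET 2.0 +  // get the value and add 2.0 to it
      4 SET  // replace the value
    %>
    <%
      $i [ 3 7 ] SUBLIST FLATTEN  // push the original data unchanged
    %> IFTE
  %> MACROMAPPER 0 0 0 ] MAP  // single-value window: pre = 0, post = 0, all occurrences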

Again, a fully playable example is available on this WarpStudio snapshot. Don't forget to select the right timezone in the DataViz!
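
The mapper above relies on the position of each field in the list returned by ->TSELEMENTS. As a small sketch, the snippet below decodes a single hand-picked instant (the date is only an illustration, chosen because it falls on a Sunday) and reads the fields used in this article.

```warpscript
// 2021-01-03 12:00 UTC, a Sunday (illustrative date)
'2021-01-03T12:00:00Z' TOTIMESTAMP
'Europe/Paris' ->TSELEMENTS  // [ year month day hours minutes seconds subseconds day_of_year day_of_week week_of_year ]
'elements' STORE

$elements 3 GET  // hour of day in Paris: 13, since Paris is at UTC+1 in January
$elements 7 GET  // day of year: 3
$elements 8 GET  // day of week: 7, i.e. Sunday (1 is Monday)
```

Index 8 is what the Sunday test uses, and index 7 is the day of year used later for the timezone-aware shift.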

One line split by day

Now that you have some fake data, it is time to split it by day of the week. On Warp 10 2.0, the WarpFleet repository is activated by default, which means that Warp 10™ will look online for macros not found locally. Here we will use @senx/cal/bydayofweek:

$OneMeter50dGTS 'Europe/Paris' @senx/cal/bydayofweek
Output of senx/cal/bydayofweek macro

OK, we can now TIMESPLIT each series with a one-day quiet period. The output will be a list of lists of GTS, one GTS per day:

1 d 1 'split' TIMESPLIT 
List of lists of splits

Timeshift with timezone

Now we must shift all the GTS to the same day, in order to REDUCE them into a mean. TIMEMODULO could do the job, but it does not take the timezone into account. For each series, we must look at the day of the year in the timezone and shift the series into the past accordingly. Applying a function to each element of a list to produce a new list can be done with LMAP:

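<%
  DROP  // discard the index pushed by LMAP
  'gts' STORE
  // shift back by the day of year (index 7 of ->TSELEMENTS) of the first tick, in Paris time
  $gts  $gts FIRSTTICK 'Europe/Paris' ->TSELEMENTS 7 GET -1 * d TIMESHIFT
%> LMAP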
Nearly the same as TIMEMODULO ?
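
If LMAP is new to you, the following minimal sketch shows its calling convention on a plain list of numbers (the values are arbitrary). It also shows why the macro above starts with DROP.

```warpscript
[ 10 20 30 ]
<%
  DROP  // LMAP pushes the element, then its index on top: discard the index
  2 *   // transform the element
%> LMAP  // leaves [ 20 40 60 ] on the stack
```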

The result is really close to my first TIMEMODULO example, but:

  • It takes timezones into account
  • Look at the GTS labels: they are labelled with .dayofweek.
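
To illustrate why the timezone matters, here is a small sketch with two hand-picked instants on either side of the 2021 daylight saving change in Paris. A plain modulo of one day, which is the arithmetic TIMEMODULO relies on, places local midnight at a different position in the UTC day before and after the change.

```warpscript
// Two local midnights in Paris, written as UTC instants (illustrative dates)
'2021-03-26T23:00:00Z' TOTIMESTAMP 'winter' STORE  // 2021-03-27 00:00 in Paris (UTC+1)
'2021-03-28T22:00:00Z' TOTIMESTAMP 'summer' STORE  // 2021-03-29 00:00 in Paris (UTC+2)

$winter 1 d % 1 h /  // 23: Paris midnight sits at 23:00 within the UTC day
$summer 1 d % 1 h /  // 22: one hour earlier once summer time applies
```

Series folded with a UTC modulo would therefore be misaligned by one hour across the change, which the ->TSELEMENTS based shift avoids.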

The .dayofweek label lets REDUCE work on one equivalence class per day of the week. The final result needs just one more line, followed by a RENAME:

[ SWAP [ '.dayofweek' ] reducer.mean.exclude-nulls ] REDUCE
'mean by day' RENAME

The playable example is available here.

WarpScript step-by-step animation!

Conclusion

Comparing data day by day or year by year to extract trends is a very common request from our customers. We know the thinking path is not trivial. I hope this article will help you to think in WarpScript!
