Fast & Curious: Measuring Execution Time

When you write WarpScript™, you should always aim to write clean, understandable code. That said, keep in mind that your code may have to analyze millions of points, or even billions with Spark. The way you model your data and design your algorithms can have a tremendous impact: with an inefficient data model, the execution of your code may never finish at all.

So you may ask: how can I easily choose between two implementations? How can I find where to improve my code? Well, WarpScript has functions to help you with that: CHRONOSTART, CHRONOEND, TIMED, and CHRONOSTATS. They will be available in version 2.1, but you can already try them on the Sandbox!

With CHRONOSTART and CHRONOEND you can start and stop a named timer. If you reuse the same timer name, the time between each start and end is accumulated, and each call to CHRONOSTART is counted too. When you call CHRONOSTATS, generally at the end of your script, you get for each named timer the number of calls and the cumulative time spent between start and end, in nanoseconds. To make things even easier, we added TIMED, which creates a copy of a macro with the timing calls added for you.
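For instance, here is a minimal illustration of both approaches (the timer and macro names are arbitrary):

// Accumulate time in a named timer
'loop' CHRONOSTART
  1 1000 <% DROP %> FOR
'loop' CHRONOEND

// TIMED returns a copy of the macro, instrumented with the same calls
<% 2 * %> 'double' TIMED 'double' STORE
21 @double DROP

// One entry per timer: number of calls and cumulative time in ns
CHRONOSTATS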

Simple A/B Testing

With these functions, we can compare several pieces of code to find the best approach in terms of execution time. Here we present a simple microbenchmark to find the fastest way to sum two Geo Time Series™ (GTS) of 100,000 random values each. We all agree that results from microbenchmarks must be taken with care, especially since we don't use tools like JMH that avoid common benchmarking pitfalls. Nevertheless, this kind of information can help you choose between two implementations.

// Benchmark data
NEWGTS 
1 100000
<% s NaN NaN NaN RAND ADDVALUE %>
FOR
'gts1' STORE

NEWGTS
1 100000
<% s NaN NaN NaN RAND ADDVALUE %>
FOR
'gts2' STORE

[ $gts1 $gts2 ] 'lgts' STORE
// End data

// Version A: REDUCE
<% [ $lgts [] reducer.sum ] REDUCE 0 GET %>
'A' TIMED
'A' STORE

// Version B: +
<% $gts1 $gts2 + %>
'B' TIMED
'B' STORE

1 10000
<% 
  DROP 
  @A DROP
  @B DROP
%>
FOR

CHRONOSTATS

This script took approximately 145 s to run and outputs:

[
  {
    "A": {
      "total_calls": 10000,
      "total_time": 113401508459
    },
    "B": {
      "total_calls": 10000,
      "total_time": 29891906553
    }
  }
]

So what does this tell us? Version B certainly seems faster than version A: A took approximately 113 s (1 min 53 s) for 10,000 executions, while B took only about 30 s for the same number of executions. This is because REDUCE creates intermediary GTS to compute the result, while + does not.
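Converting the nanosecond totals above to per-call averages is simple arithmetic:

// Average time per call, in microseconds (total ns / calls / 1000)
113401508459 10000 / 1000 /   // A: ~11340 µs, i.e. ~11.3 ms per call
29891906553 10000 / 1000 /    // B: ~2989 µs, i.e. ~3 ms per call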

In this case the result is pretty obvious, but in other cases it may not be, for example because it can vary greatly depending on the data being manipulated. These numbers should always be considered a hint, not invariable truth. As a word of warning, always be careful when mixing calls, and think about potential underlying optimizations, for instance function calls on sorted GTS versus unsorted ones.

Finding Bottlenecks

Doing microbenchmarks may be fun, but optimizing each line of your WarpScript is not the way to go. You should mainly care about optimizing when you notice a job is too slow. Even then, you have to find where it is slow. Timed macros are too coarse for that job: this is when CHRONOSTART and CHRONOEND come in.

In the following script we do a simple and, to be honest, totally useless job: computing the mean of every 100 data points of a random GTS. The goal is to find which part of the macro takes the most time to execute.

<%
  'factor' STORE

  'data' CHRONOSTART
    NEWGTS 
    0 999999
    <% s NaN NaN NaN RAND $factor * ADDVALUE %>
    FOR
    'gts' STORE
  'data' CHRONOEND

  'modulo' CHRONOSTART
    $gts 100 s 'mod' TIMEMODULO
  'modulo' CHRONOEND

  'shift' CHRONOSTART
    <%
      DROP
      DUP FIRSTTICK -1 *
      TIMESHIFT
    %>
    LMAP
  'shift' CHRONOEND

  <%
    [ SWAP [] reducer.mean ] REDUCE
  %> 'reduce' TIMED EVAL // You can use this alternate version
%>
'rand_reduce' STORE

1 @rand_reduce DROP
42 @rand_reduce DROP
1337 @rand_reduce DROP

CHRONOSTATS

Which outputs:

[
  {
    "reduce": {
      "total_calls": 3,
      "total_time": 311338643
    },
    "data": {
      "total_calls": 3,
      "total_time": 2177168254
    },
    "shift": {
      "total_calls": 3,
      "total_time": 24920181
    },
    "modulo": {
      "total_calls": 3,
      "total_time": 221328093
    }
  }
]

Decomposing each part of the rand_reduce macro makes it obvious that the data creation part takes almost 80% of the total time. If this macro must be optimized, this is the first place to look. Using MAKEGTS may help; reusing a GTS instance will lead to the best improvement.
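As a sanity check, the share of time spent in the data section can be recomputed directly from the CHRONOSTATS output above:

// data time divided by the sum of all four timers, as a percentage
2177168254 TODOUBLE
311338643 2177168254 + 24920181 + 221328093 + TODOUBLE
/ 100 *
// ~79.6: data creation accounts for almost 80% of the total time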

Timing Overhead

If time is critical to you, you may wonder about the impact of timing your macros. Nothing's easier: just time a timed macro:

<% %> 'content' TIMED 'timer' TIMED
'empty' STORE

1 10000000
<% DROP @empty %>
FOR

CHRONOSTATS
'stats' STORE

$stats 'timer' GET 'total_time' GET
$stats 'content' GET 'total_time' GET
-
$stats 'timer' GET 'total_calls' GET
/

Which outputs:

[904]

So the average overhead of each timing is less than one microsecond. As a consequence, in production you should avoid timing macros that are called in "big" loops, mappers, reducers, and the like. Everything else should be fine!
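If you still want to keep timers in your scripts, one possible pattern, sketched here with a hypothetical DEBUG flag of your own, is to replace a macro by its timed copy only when that flag is set:

true 'DEBUG' STORE

<% 2 * %> 'double' STORE
// Wrap the macro with TIMED only when debugging
$DEBUG <% $double 'double' TIMED 'double' STORE %> IFT

21 @double DROP
CHRONOSTATS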

Take Away

Running WarpScript™ on billions of points is easy, but you must avoid some pitfalls for your code to be fast enough. The CHRONOSTART, CHRONOEND, TIMED, and CHRONOSTATS functions will help you optimize your scripts. Keep in mind that obscure WarpScript may look faster, but rethinking your data model will often make your scripts a lot faster while keeping them understandable. If your code uses a lot of variables, you may find that POPR, PUSHR, VARS, and ASREGS help too.
