WarpScript for Pythonists

Pythonists can benefit from using WarpScript in Python. In this post, we explain why, when, and how to do that.

Some of the contents in this article are taken from the talk I gave at a PyData meetup recently. The slides are available here.

If this is the first time you have heard of WarpScript™, I suggest reading this post first.

Why and when do you need WarpScript?

Python users already have the Pandas library to work with time-series, so when would they need WarpScript?

  • WarpScript supports pickle (using ->PICKLE and PICKLE-> functions). This means that data can flow efficiently between Python and WarpScript.
  • WarpScript has built-in functions to manipulate data and meta-data coming from time-series databases. For example, multi-way grouping and computing the mean series in each group can be done with one line of code: [ $gts [ 'key1' 'key2' ] reducer.mean ] REDUCE.
  • The WarpScript library is specialized in time series (and geo time series) and contains more than 900 functions written to answer common practical use cases, from time and geo manipulation to graphical content generation and more. Not reinventing the wheel will save you time!
  • Some functions overlap between WarpScript and pandas, for example BUCKETIZE with .resample() and MAP with .rolling(). They differ enough, however, to justify using the WarpScript version in practical cases (for example when data are missing).
  • The same WarpScript can be executed either on a single server, or can be distributed with PySpark. See the doc here.
  • WarpScript doesn't need a Warp 10 platform. You can use it for its library, or process any input source on-the-fly. For example, it can transform any Hadoop input format at loading time.
  • You can include WarpScript macros from a trusted remote repository easily. Just use the syntax @repo/my/macro in your WarpScript to use a remote macro.

How to use WarpScript in Python

Using WarpScript in Python can be done in just a few steps.

Method 1: From a Jupyter notebook

Just pip install the extension and load it in your notebook.

!pip install warp10-jupyter
%load_ext warpscript

You can now use the %%warpscript cell magic. The --local/-l flag indicates that you are using the WarpScript library locally. To connect to a Warp 10 platform instead, specify the --address and --port on which its Py4J gateway runs (see this post for more information).

%%warpscript --local --stack stack
'Hello world of WarpScript!'
top:  'Hello world of WarpScript!'

The WarpScript execution environment is stored under the variable stack. It will be reused in subsequent %%warpscript cells, and you can also use it to execute WarpScript code directly with stack.exec("some-warpscript-code").

Method 2: Not from a Jupyter notebook

Note that the same package also provides functions to execute WarpScript code outside of a notebook.

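import warpscript  # provided by the warp10-jupyter package (pip install warp10-jupyter)

stack = warpscript.newLocalStack()  # or newStack(address, port, auth_token)

# The WarpScript string literal keeps its own quotes inside the Python string
stack.exec("'Hello world of WarpScript!'")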
...

Note that .exec() executes one-line statements and .execMulti() executes multi-line strings.
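As a minimal sketch of that distinction, the helper below picks one method or the other depending on whether the code spans several lines. The names run_warpscript and RecordingStack are hypothetical and not part of the warp10-jupyter package: RecordingStack is only a stand-in that records calls, so the snippet runs without a JVM. With a real stack, you would pass the object returned by newLocalStack() instead.

```python
def run_warpscript(stack, code):
    """Send `code` to `stack`, using exec() for one line and execMulti() otherwise.

    Hypothetical convenience helper, not part of warp10-jupyter.
    """
    code = code.strip()
    if "\n" in code:
        stack.execMulti(code)
    else:
        stack.exec(code)
    return stack


class RecordingStack:
    """Stand-in for a real stack (illustration only): records which method was called."""

    def __init__(self):
        self.calls = []

    def exec(self, line):
        self.calls.append(("exec", line))

    def execMulti(self, script):
        self.calls.append(("execMulti", script))


demo = RecordingStack()
run_warpscript(demo, "'Hello world of WarpScript!'")  # single line
run_warpscript(demo, "1 2 +\n3 *")                    # two lines
print([name for name, _ in demo.calls])
```

The design choice here is simply to keep the decision in one place, so that scripts loaded from files or built from strings go through the same entry point.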

Method 3: With the Py4J library

If you want more control over your interaction with the stack and the JVM (for example to use a specific Warp 10 version, WarpScript extensions, or simply other libraries from the Java world), you can do all of the above using the Py4J library directly.

With this method, you need a Warp 10 jar first. You can download one from bintray, then untar it:

wget https://dl.bintray.com/senx/generic/io/warp10/warp10/X.Y.Z/warp10-X.Y.Z.tar.gz
tar xvzf warp10-X.Y.Z.tar.gz

Now, launch a Py4J gateway. This gateway is responsible for creating a stack (the environment which executes WarpScript code). If you want to be connected to a Warp 10 platform, connect to its gateway rather than launching one (see this post for more information).

from py4j.java_gateway import launch_gateway, JavaGateway, GatewayParameters
import warpscript  # optional (from the warp10-jupyter package): overrides how stack and GTS objects are printed

# Start a local JVM with the Warp 10 jar on its classpath; with enable_auth=True, a port and an auth token are returned
port, token = launch_gateway(enable_auth=True, die_on_exit=True, classpath='warp10-2.1.0/bin/warp10-2.1.0.jar')
gateway = JavaGateway(gateway_parameters=GatewayParameters(port=port, auto_convert=True, auth_token=token))

Specify a WarpScript configuration and create the stack:

default_conf = {}
default_conf['warp.timeunits'] = 'us'
default_conf['py4j.stack.nolimits'] = 'true'
entry_point = gateway.jvm.io.warp10.Py4JEntryPoint(default_conf)
stack = entry_point.newStack()

You can now play with it!

stack.exec("'Hello world of WarpScript!'")
...

Conversions

The gateway already converts common objects automatically: numbers, lists, dicts, strings, bytes, and so on.

For larger objects, the principled way to transfer them between Python and WarpScript is to use the pickle representation.

For example, in what follows we transfer some data to WarpScript:

import pickle

stack.push(pickle.dumps(ticks))
stack.push(pickle.dumps(values))
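The snippet above assumes that ticks and values already exist. A minimal sketch of what they could look like follows, assuming plain Python lists of the same length, with ticks expressed in microseconds to match the 'warp.timeunits' = 'us' setting shown in Method 3. The sample data (48 hourly points from an arbitrary start timestamp) is entirely hypothetical.

```python
import pickle

# Hypothetical sample data: one point per hour over two days.
HOUR_US = 3_600 * 1_000_000          # one hour, in microseconds
start = 1_600_000_000 * 1_000_000    # arbitrary start timestamp, in microseconds
ticks = [start + i * HOUR_US for i in range(48)]
values = [float(i % 24) for i in range(48)]

# pickle.dumps returns bytes: this is what would be handed to stack.push(...).
ticks_payload = pickle.dumps(ticks)
values_payload = pickle.dumps(values)

# Sanity check on the Python side: the payloads round-trip unchanged.
assert pickle.loads(ticks_payload) == ticks
assert pickle.loads(values_payload) == values
print(len(ticks), type(ticks_payload).__name__)
```

Checking the round trip locally before pushing is a cheap way to catch objects that do not pickle cleanly.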

Now we use a WarpScript function (here it is TIMESPLIT):

%%warpscript --local --stack stack --not-verbose
[ 'ticks' 'values' ] STORE
$ticks PICKLE-> [] [] [] $values PICKLE-> MAKEGTS

1 d 2 'piece' TIMESPLIT

VALUES ->PICKLE

... and we retrieve data back in Python:

result = pickle.loads(stack.pop())

Additional tips

  • In a Jupyter notebook, you can use NOOP (the no-operation function) to initialize a stack:
%%warpscript --local --stack stack
NOOP
Local gateway launched on port 40641
Creating a new WarpScript stack accessible under variable "stack".
  • You can load macros from any trusted remote repository.

For example, we can set the list of trusted repositories:

trusted_repos = stack.getAttribute('warpfleet.repos')
if trusted_repos is None:
    trusted_repos = []
trusted_repos.append('https://raw.githubusercontent.com/randomboolean/shareable/master/')

stack.setAttribute('warpfleet.repos', trusted_repos)

And then call a macro from this repo:

%%warpscript --local --stack stack
2 @mc2/RANDNORMAL
top: 	-0.9514475048807272
2: 	-0.6635116324344921
  • You can use the --not-verbose/-v flag if your stack contains big pickle objects, to prevent the notebook from displaying them below the executed cell.
  • Stack objects have the same public methods as those defined in the WarpScript source code. Notable ones are .peek(), .pop(), .get(int), .push(), .getAttribute(key), .setAttribute(key, value) and .depth(). You can list them all with dir(stack).
  • The --stack/-s flag uses stack as its default value, so --stack stack is in fact not needed.
  • The --local/-l flag is only needed when the stack is initialized. After that, the stack variable given as argument (or the default one) is reused.
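The trusted-repository tip above can be wrapped in a small helper so that running a cell twice does not register the same URL twice. This is only a sketch: add_trusted_repo and AttributeStack are hypothetical names, and AttributeStack is a stand-in exposing getAttribute and setAttribute so the snippet runs without a JVM. With a real stack, passing a plain Python list is assumed to work as in the tip above, which relies on the gateway's automatic conversion.

```python
REPOS_ATTRIBUTE = 'warpfleet.repos'  # attribute name used in the tip above


def add_trusted_repo(stack, url):
    """Add `url` to the stack's trusted repositories unless it is already listed.

    Hypothetical helper, not part of warp10-jupyter.
    """
    repos = stack.getAttribute(REPOS_ATTRIBUTE)
    repos = [] if repos is None else list(repos)
    if url not in repos:
        repos.append(url)
    stack.setAttribute(REPOS_ATTRIBUTE, repos)
    return repos


class AttributeStack:
    """Stand-in for a real stack (illustration only): stores attributes in a dict."""

    def __init__(self):
        self._attributes = {}

    def getAttribute(self, key):
        return self._attributes.get(key)

    def setAttribute(self, key, value):
        self._attributes[key] = value


demo = AttributeStack()
url = 'https://raw.githubusercontent.com/randomboolean/shareable/master/'
add_trusted_repo(demo, url)
add_trusted_repo(demo, url)  # second call changes nothing
print(demo.getAttribute(REPOS_ATTRIBUTE))
```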

Conclusion

In this post, we reviewed why and when to use WarpScript in Python and how to do it. Happy WarpScripting in Python!

You may also be interested in reading the previous post on the Py4J plugin for Warp 10, the post that presented the notebook extension, and the related page from the official documentation.
