WarpScript in Jupyter notebooks

Discover how to use WarpScript in a Jupyter notebook for doing data science on time series data.


In this blog post, we'll talk about how to use WarpScript in a Jupyter notebook. WarpScript is the data programming language provided by the Warp 10 platform. An introduction to WarpScript is available at this link.

First, we briefly review the method of sending HTTP POST requests. Then, we present a more principled method for using WarpScript in Jupyter, which lets you go back and forth between WarpScript and Python while keeping the state of the WarpScript stack. Finally, we show how to efficiently convert a list of Geo Time Series (GTS) into a Pandas DataFrame.

In what follows, we use Python 3.6.

1. Using POST requests

The first way to use WarpScript in a notebook is by hitting the /exec endpoint of a Warp 10 platform.

Example

Let us assume that you have a WarpScript file named tmp.mc2. If not, then you can write one within a Jupyter notebook using the magic %%writefile. For example, in the following cell, we create a GTS with the function MAKEGTS:
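If you prefer plain Python to the %%writefile magic, the same file can be written as sketched below. The script body is an illustrative guess, not the original cell: it assumes the five-list form of MAKEGTS (ticks, latitudes, longitudes, elevations, values), and the class name, ticks and values are made up.

```python
from pathlib import Path

# Illustrative WarpScript (an assumption, not the original cell):
# build a GTS from a list of ticks and a list of values, leaving
# latitudes, longitudes and elevations empty.
WARPSCRIPT = """\
[ 100 200 300 400 ]   // ticks
[] [] []              // latitudes, longitudes, elevations (none)
[ 1.0 4.0 9.0 16.0 ]  // values
MAKEGTS
'squares' RENAME
"""

# Same effect as a %%writefile tmp.mc2 cell: write the script to disk.
script_path = Path("tmp.mc2")
script_path.write_text(WARPSCRIPT, encoding="utf-8")
print(f"wrote {script_path} ({len(WARPSCRIPT)} characters)")
```

Check the MAKEGTS documentation of your Warp 10 release before relying on the empty location lists.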

Then, in the next cell, we send it via an HTTP POST request.
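As a rough sketch of such a request, the code below uses only the standard library (requests.post with the script as data would do the same job). The URL is an assumption about a local Warp 10 instance, and the helper names build_exec_request and run_warpscript are hypothetical.

```python
import urllib.request

# Assumed endpoint of a local Warp 10 instance; adjust host and port.
EXEC_URL = "http://localhost:8080/api/v0/exec"


def build_exec_request(script: str, url: str = EXEC_URL) -> urllib.request.Request:
    """Build a POST request whose body is the raw WarpScript code."""
    return urllib.request.Request(url, data=script.encode("utf-8"), method="POST")


def run_warpscript(script: str, url: str = EXEC_URL, timeout: float = 10.0) -> str:
    """Send the script and return the response body (the stack as JSON text)."""
    with urllib.request.urlopen(build_exec_request(script, url), timeout=timeout) as resp:
        return resp.read().decode("utf-8")


# Building the request needs no running platform.
request = build_exec_request("1 2 +")
print(request.get_method(), request.full_url)

# With a platform running, tmp.mc2 would be sent with:
# body = run_warpscript(open("tmp.mc2", encoding="utf-8").read())
```

Keeping request construction separate from the network call makes it possible to inspect what will be sent before a platform is available.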

Finally, the result of the WarpScript can be used in the notebook by parsing the JSON accordingly. For instance, let us plot that.
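The parsing step might look like the sketch below. The sample body is made up and only mimics the shape assumed here for a stack holding a single GTS: a JSON list of stack levels, where a GTS is an object whose "v" field lists its points. The helper gts_to_xy is hypothetical.

```python
import json

# Made-up response body in the shape assumed for a stack holding one GTS:
# "c" is the class name, "l" the labels, "a" the attributes and
# "v" the list of points, here [tick, value] pairs.
SAMPLE_BODY = '[{"c":"squares","l":{},"a":{},"v":[[100,1.0],[200,4.0],[300,9.0]]}]'


def gts_to_xy(gts: dict):
    """Split the points of a JSON-decoded GTS into ticks and values.

    Each point starts with its tick and ends with its value; any
    location fields in between are ignored here.
    """
    ticks = [point[0] for point in gts["v"]]
    values = [point[-1] for point in gts["v"]]
    return ticks, values


stack = json.loads(SAMPLE_BODY)      # list of stack levels, top first
ticks, values = gts_to_xy(stack[0])
print(ticks, values)
# With matplotlib installed, the two lists can be passed to plt.plot(ticks, values).
```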

Pros

  • It does not require any particular configuration of the Warp 10 platform. You just hit its /exec endpoint, like with any other WarpScript.

Cons

  • You need to deserialize the WarpScript stack from JSON and parse it to assign the result to a Python variable.
  • Should you need to use a Python variable on the WarpScript stack, you have to recreate it using WarpScript code.
  • Once the cell is executed, the WarpScript stack is not kept in memory.

The next method addresses these drawbacks. It gives Python direct access to the WarpScript objects that are on the stack, so you can go back and forth between the Python interpreter and the WarpScript stack.


2. Using the Warp10-Jupyter extension

You can install this Jupyter extension with pip install warp10-jupyter.

Description

This extension contains the cell magic %%warpscript. It works by asking a Warp 10 platform to create a new WarpScript™ execution environment (a stack) with which the notebook will interact.

It does this by connecting to a gateway launched by the Py4J library (the same library that PySpark uses). This gateway can be started either by a Warp 10 platform (with the Py4J plugin) or by the Warp10-Jupyter extension itself.

If connecting to the gateway of a Warp 10 platform, note that the FETCH, FIND, and FINDSTATS functions won’t work, unless the property egress.clients.expose of the Warp 10 platform is set to true.

To use a local gateway (i.e. one that is not connected to a Warp 10 platform), you can use the syntax %%warpscript --local (release 0.4+ of the extension).

Example

To enable this magic in your notebook, you can do:

Then, let us try a basic WarpScript.

What happened here?

The Python interpreter has opened a connection with a Warp 10 platform, which by default is located at localhost:25333. Then, a new WarpScript stack has been created. It is accessible in this notebook. For example, the magic %whos acknowledges it.

We can then interact with the stack. For example, we can use the pop method to extract the top of the stack.

We can use peek() or get(0) to retrieve the list that was left on top of the stack. In the next cells, we retrieve this list to see that we can modify it in Python.

Similarly, in Python, we can use variables that were stored by the stack by using load().

Arguments

This cell magic has multiple optional arguments. You can invoke its doc with %%warpscript?.

How are objects converted between Java and Python?

Under the hood, this extension uses the Py4J protocol and its automatic conversions, which are supported for basic classes e.g. primitive types, lists, sets, maps, iterators and byte arrays. You can also define your own conversions, but we won't detail that in this blog post. For more information, see here.

Therefore, GTS are (obviously) not automatically converted from the WarpScript stack to the Python interpreter. This is the subject of the next section.

3. Conversions between Geo Time Series and DataFrame

Let us start a new notebook. For convenience, we will use the alias %%w for %%warpscript.

Also, we will store the WarpScript stack in the Python variable s with the inline argument --stack / -s.

From GTS to DataFrame

To convert a list of GTS into a Pandas DataFrame, we will first store its content into a map of lists and pickle it on the stack using ->PICKLE. The result is an array of bytes that is converted efficiently from Java to Python by Py4J. Then, we unpickle it and feed a new DataFrame with it.
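The Python half of this round trip can be sketched without a Warp 10 instance by pickling a stand-in dict locally. The dict-of-lists layout, the column names and the helper names are assumptions made for illustration; in the notebook the bytes would come from the stack instead.

```python
import pickle

# Stand-in for the bytes that Py4J would hand over from the stack.
# The dict-of-lists layout and the column names are assumptions.
columns_on_java_side = {
    "timestamps": [3600000000, 7200000000, 10800000000],
    "randGTS": [0.34, 0.36, 0.86],
    "randTS": [float("nan"), 0.42, float("nan")],
}
payload = pickle.dumps(columns_on_java_side)


def unpickle_columns(raw: bytes) -> dict:
    """Turn the pickled bytes back into a dict mapping column name to list."""
    return pickle.loads(raw)


def to_dataframe(columns: dict):
    """Build a DataFrame from the columns (pandas is imported lazily)."""
    import pandas as pd
    return pd.DataFrame(columns)


columns = unpickle_columns(payload)
print(sorted(columns), len(columns["timestamps"]))
# In the notebook, with the stack stored in s:
# df = to_dataframe(unpickle_columns(s.pop()))
```

Passing a single byte array avoids converting every value one by one between Java and Python. Only unpickle bytes that come from a platform you trust.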

The Java part is done by the macro ListGTStoPickledDict that can be obtained here. This macro takes as arguments a list of GTS, followed by a boolean indicating whether to keep the labels or not.

Let us show a quick example.

Then, we do the Python part of the conversion, as in the cell that follows.

    timestamps   randGTS.lat  randGTS.lon  randGTS   randTS    stringTS
0   3600000000   0.039877     0.898650     0.340894  NaN       NaN
1   7200000000   0.253846     0.460449     0.359312  0.422285  NaN
2   10800000000  0.290711     0.682014     0.863143  NaN       NaN
3   14400000000  0.411854     0.902713     0.001440  0.822429  NaN
4   18000000000  0.509833     0.261209     0.235570  NaN       NaN
5   21600000000  0.386919     0.225772     0.006609  0.279210  NaN
6   25200000000  0.334531     0.559515     0.638833  NaN       NaN
7   28800000000  0.068498     0.073961     0.926505  0.415326  a string
8   32400000000  0.345870     0.765727     0.021555  NaN       NaN
9   36000000000  0.794293     0.546248     0.712494  0.481021  NaN
10  39600000000  0.521795     0.559817     0.579806  NaN       NaN
11  43200000000  0.606693     0.609749     0.510658  0.325373  a string
12  46800000000  0.062864     0.886699     0.639813  NaN       NaN
13  50400000000  0.800582     0.215258     0.009598  0.145950  NaN
14  54000000000  0.191501     0.564057     0.315020  NaN       NaN
15  57600000000  0.272096     0.807898     0.921799  0.974948  a string
16  61200000000  0.207882     0.925999     0.177088  NaN       NaN
17  64800000000  0.118950     0.006078     0.062828  0.378946  NaN
18  68400000000  0.806578     0.994662     0.240190  NaN       NaN
19  72000000000  0.701807     0.357126     0.196903  0.837751  a string

Note that the GTS in a list can have different ticks, so the macro ListGTStoPickledDict fills missing ticks with NaN. This is just an example; in real cases it is good practice to align the ticks of the GTS beforehand in WarpScript (for example, see BUCKETIZE or FILL).
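To make the NaN filling concrete, here is a small Python illustration of the idea. It is not the macro's code: the function align_on_ticks and the toy series are hypothetical, and the real work happens in WarpScript on the Java side.

```python
import math


def align_on_ticks(series: dict) -> dict:
    """Align several {tick: value} series on the union of their ticks.

    Returns a dict of equally long lists: one "timestamps" list plus one
    list per series, with NaN wherever a series has no value at a tick.
    """
    all_ticks = sorted(set().union(*(points.keys() for points in series.values())))
    table = {"timestamps": all_ticks}
    for name, points in series.items():
        table[name] = [points.get(tick, math.nan) for tick in all_ticks]
    return table


aligned = align_on_ticks({
    "randGTS": {1: 0.34, 2: 0.36, 3: 0.86},  # a value at every tick
    "randTS": {2: 0.42},                      # a value at one tick only
})
print(aligned)
```

Every column ends up with the same length, which is what a DataFrame constructor expects from a dict of lists.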

From DataFrame to GTS

The inverse conversion can be done similarly. To illustrate this, let us convert the randGTS columns back into a GTS.
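One possible shape for the Python side of this inverse conversion is sketched below. It is an assumption, not the original cell: the helper column_to_gts_payload and the key names of the pickled dict are hypothetical, locations are left out for brevity, and the column lists stand in for the result of df.to_dict("list").

```python
import math
import pickle


def column_to_gts_payload(columns: dict, name: str) -> bytes:
    """Pickle the ticks and values of one column, skipping NaN rows.

    `columns` maps column names to equally long lists, for instance the
    result of df.to_dict("list"). The key names of the pickled dict are
    an arbitrary choice for this sketch.
    """
    rows = [
        (tick, value)
        for tick, value in zip(columns["timestamps"], columns[name])
        if not (isinstance(value, float) and math.isnan(value))
    ]
    gts = {
        "classname": name,
        "ticks": [tick for tick, _ in rows],
        "values": [value for _, value in rows],
    }
    return pickle.dumps(gts)


payload = column_to_gts_payload(
    {"timestamps": [1, 2, 3], "randTS": [math.nan, 0.42, math.nan]},
    "randTS",
)
print(pickle.loads(payload))
```

The resulting bytes would then be handed back to the stack and unpickled in WarpScript before rebuilding the GTS; that WarpScript side is not shown here.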

Conclusion

We have seen that we can use WarpScript in Jupyter and even interact with the stack. We presented the Warp10-Jupyter extension with some short examples. The source code is available at https://github.com/senx/warp10-jupyter.