In this blog post, we’ll talk about how to use WarpScript™ in a Jupyter notebook. WarpScript™ is the data programming language provided by the Warp 10 platform. An introduction to WarpScript™ is available on this link.
- 1. Using POST requests
- 2. Using the Warp10-Jupyter extension
- 3. Conversions between Geo Time Series™ and DataFrame
First, we quickly review the method that consists of sending HTTP POST requests. Then, we explain the principled way to use WarpScript™ in Jupyter, which lets you go back and forth between WarpScript™ and Jupyter while keeping the state of the WarpScript™ stack. Finally, we show how to efficiently convert a list of Geo Time Series™ (GTS) into a Pandas DataFrame.
In what follows, we use Python 3.6.
1. Using POST requests
The first way to use WarpScript™ in a notebook is by hitting the /exec endpoint of a Warp 10™ platform.
Let us assume that you have a WarpScript™ file named tmp.mc2. If not, then you can write one within a Jupyter notebook using the magic %%writefile. For example, in the following cell, we create a GTS with the function
Then, in the next cell, we send it via an HTTP POST request.
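Such a request can be sketched with the Python standard library alone. This is a minimal sketch under assumptions: the URL http://localhost:8080/api/v0/exec is a placeholder for your own platform's /exec endpoint, and the helper names build_exec_request and run_warpscript are made up for this illustration.

```python
import json
import urllib.request

# Assumption: a Warp 10 platform exposes its exec endpoint at this URL.
EXEC_URL = "http://localhost:8080/api/v0/exec"


def build_exec_request(warpscript, url=EXEC_URL):
    """Build (without sending) a POST request whose body is the WarpScript code."""
    return urllib.request.Request(url, data=warpscript.encode("utf-8"), method="POST")


def run_warpscript(warpscript, url=EXEC_URL):
    """Send the script and return the resulting stack, decoded from JSON."""
    with urllib.request.urlopen(build_exec_request(warpscript, url)) as response:
        return json.loads(response.read().decode("utf-8"))


# Usage (requires a reachable platform):
# with open("tmp.mc2") as f:
#     stack = run_warpscript(f.read())
```

Keeping the request construction separate from the network call makes it possible to inspect what would be sent without a running platform.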
Finally, the result of the WarpScript™ can be used in the notebook by parsing the JSON accordingly. For instance, let us plot that.
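The parsing step might look like the sketch below. It assumes the response is a JSON array of stack levels and that a GTS is serialized as an object whose "v" field lists data points, each starting with the tick and ending with the value. The sample body and the helper name gts_to_series are hypothetical.

```python
import json


def gts_to_series(gts):
    """Split a JSON-decoded GTS into parallel lists of ticks and values."""
    # Each point starts with its tick and ends with its value; optional
    # location/elevation fields, if present, sit in between and are ignored.
    ticks = [point[0] for point in gts["v"]]
    values = [point[-1] for point in gts["v"]]
    return ticks, values


# Hypothetical response body: one stack level holding one GTS.
sample_body = '[{"c": "randGTS", "l": {}, "a": {}, "v": [[1, 0.5], [2, 0.25], [3, 0.75]]}]'
stack = json.loads(sample_body)
ticks, values = gts_to_series(stack[0])
print(ticks)   # [1, 2, 3]
print(values)  # [0.5, 0.25, 0.75]
```

The two lists can then be handed to any plotting library, for example as the x and y arguments of a line plot.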
Pros:
- Does not require any particular configuration of the Warp 10™ platform. You just hit its /exec endpoint like with any other WarpScript™.
Cons:
- You need to deserialize the WarpScript™ stack from JSON and parse it to assign the result to a Python variable.
- Should you need to use a Python variable on the WarpScript™ stack, you have to recreate it using WarpScript™ code.
- Once the cell is executed, the WarpScript™ stack is not kept in memory.
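To illustrate what recreating a Python variable in WarpScript™ code involves, here is a rough sketch that renders a Python value as WarpScript™ source text. The helper to_warpscript is hypothetical and deliberately limited to booleans, finite numbers and (nested) lists. Strings and maps would need proper quoting and encoding, which is not attempted here.

```python
import math


def to_warpscript(value):
    """Render a Python bool, number, or (nested) list as WarpScript source."""
    if isinstance(value, bool):  # checked first: bool is a subclass of int
        return "true" if value else "false"
    if isinstance(value, float) and not math.isfinite(value):
        raise ValueError("non-finite floats are not handled by this sketch")
    if isinstance(value, (int, float)):
        return repr(value)
    if isinstance(value, (list, tuple)):
        # WarpScript lists are written as space-separated items between brackets.
        return " ".join(["[", *(to_warpscript(item) for item in value), "]"])
    raise TypeError("unsupported type: " + type(value).__name__)


my_list = [1, 2.5, [True, 3]]
# Prepend the rendered value to a script and store it under a name of your choice.
script = to_warpscript(my_list) + " 'myList' STORE"
print(script)  # [ 1 2.5 [ true 3 ] ] 'myList' STORE
```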
The next method addresses these cons. It gives Python direct access to the WarpScript™ objects that are on the stack, so you can go back and forth between the Python interpreter and the WarpScript™ stack.
2. Using the Warp10-Jupyter extension
You can install this Jupyter extension with pip install warp10-jupyter.
This extension contains the cell magic %%warpscript. It works by asking a Warp 10™ platform to create a new WarpScript™ execution environment (a stack) with which the notebook will interact.
It does this by connecting to a gateway launched by the Py4J library (the same library used by PySpark). This gateway can be started either by a Warp 10™ platform (with the Py4J plugin) or by the Warp10-Jupyter extension itself.
To use a local gateway (i.e. one that is not connected to a Warp 10™ platform), you can use the syntax %%warpscript --local (release 0.4+ of the extension).
To enable this magic in your notebook, you can do:
Then, let us try a basic WarpScript™.
What happened here?
The Python interpreter has opened a connection to a Warp 10™ platform, which by default is located at localhost:25333. Then, a new WarpScript™ stack has been created. It is accessible in this notebook. For example, the magic %whos lists it.
We can then interact with the stack. For example, we can use the pop method to extract the top of the stack.
We can use get(0) to retrieve the list that was left on top of the stack. In the next cells, we retrieve this list to see that we can modify it in Python.
Similarly, in Python we can use variables that were stored by the stack by using
This cell magic has multiple optional arguments. You can invoke its doc with
How are objects converted between Java and Python?
Under the hood, this extension uses the Py4J protocol and its automatic conversions, which are supported for basic classes, e.g. primitive types, lists, sets, maps, iterators and byte arrays. You can also define your own conversions, but we won't detail that in this blog post. For more information, see here.
Therefore, GTS are (obviously) not automatically converted from the WarpScript™ stack to the Python interpreter. This is the subject of the next section.
3. Conversions between Geo Time Series™ and DataFrame
Let us start a new notebook. For convenience, we will use the alias
Also, we will store the WarpScript™ stack in the Python variable s with the inline argument --stack / -s.
From GTS to DataFrame
To convert a list of GTS into a Pandas DataFrame, we will first store its content into a map of lists and pickle it on the stack using ->PICKLE. The result is an array of bytes that is converted efficiently from Java to Python by Py4J. Then, we unpickle it and feed a new DataFrame with it.
The Java part is done by the macro ListGTStoPickledDict that can be obtained here. This macro takes as arguments a list of GTS followed by a boolean indicating whether to keep the labels or not.
Let us show a quick example.
Then, we do the Python part of the conversion as in the cell that follows.
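A minimal sketch of that Python step, assuming the macro left a pickled map of equal-length lists on the stack and that Py4J handed it over as a byte array. The helper names and the column names in the stand-in data are hypothetical. pandas is only needed for the final step.

```python
import pickle


def unpickle_columns(pickled):
    """Turn the pickled byte array into a dict mapping column names to lists."""
    # Only unpickle bytes that come from a platform you trust.
    columns = pickle.loads(bytes(pickled))
    lengths = {len(column) for column in columns.values()}
    if len(lengths) > 1:
        raise ValueError("columns must all have the same length")
    return columns


def to_dataframe(columns):
    """Build a DataFrame with one column per key (requires pandas)."""
    import pandas as pd  # imported here so the unpickling step works without pandas
    return pd.DataFrame(columns)


# Stand-in for the bytes popped from the stack; the column names are made up.
fake_bytes = pickle.dumps({"timestamps": [1, 2, 3], "randGTS": [0.5, 0.25, 0.75]})
columns = unpickle_columns(fake_bytes)
print(columns["randGTS"])  # [0.5, 0.25, 0.75]
# df = to_dataframe(columns)
```

Checking the column lengths before building the DataFrame gives an explicit error if the lists are not aligned.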
Note that the GTS in a list can have different ticks, so the macro ListGTStoPickledDict also fills missing ticks with NaN. This is fine for a quick example, but in real cases it is good practice to align the ticks of the GTS beforehand in WarpScript™ (for example, see BUCKETIZE or FILL).
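The effect of this NaN filling can be illustrated in plain Python. This is not the macro's implementation, only a sketch of the idea: the helper align_series, the "timestamps" column name and the sample series are all hypothetical.

```python
import math


def align_series(series_by_name):
    """Map {name: {tick: value}} to equal-length lists, using NaN for missing ticks."""
    # The shared index is the sorted union of every tick seen in any series.
    all_ticks = sorted(set().union(*(s.keys() for s in series_by_name.values())))
    columns = {"timestamps": all_ticks}
    for name, series in series_by_name.items():
        columns[name] = [series.get(tick, math.nan) for tick in all_ticks]
    return columns


aligned = align_series({
    "a": {1: 10.0, 2: 20.0},
    "b": {2: 5.0, 3: 6.0},
})
print(aligned["timestamps"])  # [1, 2, 3]
print(aligned["a"])           # [10.0, 20.0, nan]
print(aligned["b"])           # [nan, 5.0, 6.0]
```

When the series share few ticks, most cells end up as NaN, which is why aligning the ticks on the WarpScript™ side first usually gives a more useful DataFrame.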
From DataFrame to GTS
The inverse conversion can be done similarly. To illustrate this, let us convert randGTS back into a GTS.
We have seen that we can use WarpScript™ in Jupyter and even interact with the stack. We presented the Warp10-Jupyter extension with some short examples. The source code is available at https://github.com/senx/warp10-jupyter.