Learn how to index spatio-temporal data in Warp 10™ so you can query your series by geographic area and time range.
When we started the company behind Warp 10 six years ago, the challenge we were trying to solve was to store, index and query data coming from moving sensors.
This led us to create a data model called Geo Time Series which merges the readings from each sensor with the location of the sensor at the time of the reading. So Geo Time Series, or GTS for short, are both a time series and a three-dimensional track merged into a common data structure. These data structures are at the core of the Warp 10 Storage and Analytics Engines.
Early versions of Warp 10 included a component called GeoDirectory. It was later removed, as using Geo Time Series for doing spatio-temporal indexing of other Geo Time Series was deemed simpler and more efficient.
This post will explain how you can do spatio-temporal indexing of your data using only Warp 10.
The spatio-temporal indexing process is based upon observation of Geo Time Series which cross (have datapoints in) spatio-temporal cells. So we first need to define characteristics of the spatio-temporal cells we will use. Spatio-temporal cells determine the spatial and temporal resolutions at which the data will be indexed. The spatial resolution is the resolution of the HHCode cells we consider. The temporal resolution is the time during which we observe data in the given geo cells.
The choice of spatial and temporal resolutions depends on the type of moving objects you are indexing. If you are tracking planes which fly at 900 km/h, they cover 1 km every 4 seconds. So in a 5-minute window, a plane can cover 75 kilometers. If you choose 5 minutes as your temporal resolution, it is probably wise to choose 8 as your spatial resolution, to have cells measuring roughly 75 km by 150 km at the equator. Another element to consider is the number of cells: at spatial resolution N, the surface of the globe is covered by 2^(2N) cells. Our spatio-temporal indexing process will create up to one index Geo Time Series per geo-cell. So in the case of our aircraft, at resolution 8 we could end up with up to 2^(2*8) = 65,536 GTS, depending on where planes were spotted.
Here is a summary of the various resolutions, the approximate size of a cell and the number of cells needed to cover the Earth's surface at each of them.
+----------------+-------------+----------------------------+
| Resolution     | Approximate | Number of cells            |
| (HHCode level) | scale       | for global coverage        |
+----------------+-------------+----------------------------+
| 2              | 10,000 km   | 16                         |
| 4              | 2,500 km    | 256                        |
| 6              | 625 km      | 4,096                      |
| 8              | 156 km      | 65,536                     |
| 10             | 39 km       | 1,048,576                  |
| 12             | 10 km       | 16,777,216                 |
| 14             | 2.5 km      | 268,435,456                |
| 16             | 600 m       | 4,294,967,296              |
| 18             | 150 m       | 68,719,476,736             |
| 20             | 40 m        | 1,099,511,627,776          |
| 22             | 10 m        | 17,592,186,044,416         |
| 24             | 2.5 m       | 281,474,976,710,656        |
| 26             | 60 cm       | 4,503,599,627,370,496      |
| 28             | 15 cm       | 72,057,594,037,927,936     |
| 30             | 4 cm        | 1,152,921,504,606,846,976  |
| 32             | 1 cm        | 18,446,744,073,709,551,616 |
+----------------+-------------+----------------------------+
It is obvious that you should not choose too fine a resolution if you are tracking objects which cover a large part of the globe. But when tracking objects within a small area, fine resolutions are OK.
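The arithmetic behind these figures can be sketched in a few lines of Python. This is only an illustration: the function names are made up for this example, and the scale is approximated by dividing a round 40,000 km circumference by 2^N, which is an assumption that happens to match the figures listed in this post.

```python
EQUATOR_KM = 40_000  # assumption: round figure for the Earth's circumference


def cell_count(resolution: int) -> int:
    """Number of cells covering the globe at resolution N, i.e. 2^(2N)."""
    return 2 ** (2 * resolution)


def approx_scale_km(resolution: int) -> float:
    """Approximate cell scale at the equator, in kilometers."""
    return EQUATOR_KM / 2 ** resolution


def distance_km(speed_kmh: float, window_seconds: float) -> float:
    """Distance covered at a constant speed during one temporal window."""
    return speed_kmh * window_seconds / 3600.0


print(cell_count(8))           # 65536
print(approx_scale_km(8))      # 156.25
print(distance_km(900, 300))   # 75.0
```

For the aircraft example, a 5-minute window at 900 km/h gives 75 km, which is why a resolution with cells of comparable size is a reasonable starting point.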
Spatio-Temporal Indexing Process
Spatio-temporal indexing is a multistep process. We need to load the data to index, identify the geo-cells which have datapoints, build values for the indexing Geo Time Series and then store those values back in Warp 10.
In order for the indexing to work, we need to attach a numerical id to each Geo Time Series we intend to index. This id must fit in fewer than 64 bits. It will be stored as an attribute of the Geo Time Series.
The first step is to load the data to index from Warp 10. If you have several related Geo Time Series, with identical associated positions, you do not need to index them all. You can simply use the same numerical id for all related Geo Time Series and only load one of the Geo Time Series from each related set.
You should load data for a time range which is a multiple of your temporal resolution. If loading data for more than one time interval, you will need to apply the rest of the process for each time interval considered by using a loop, for example.
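As an illustration of that loop, the following Python sketch splits a time range into consecutive windows of one temporal resolution each. The function name `time_windows` is hypothetical, and the time unit is left to the caller (seconds in the example, whereas your platform may use another unit).

```python
def time_windows(start: int, end: int, resolution: int) -> list:
    """Split [start, end) into consecutive windows of `resolution` time units."""
    if resolution <= 0:
        raise ValueError("resolution must be positive")
    if (end - start) % resolution != 0:
        raise ValueError("time range must be a multiple of the temporal resolution")
    return [(t, t + resolution) for t in range(start, end, resolution)]


# Three 5-minute windows, expressed in seconds
for window_start, window_end in time_windows(0, 900, 300):
    print(window_start, window_end)
```

Each window returned here would then go through the rest of the indexing process on its own.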
Once the data is loaded (we consider a single time interval here), we need to identify which geo-cells contain datapoints and, for each geo-cell, which Geo Time Series have datapoints in it.
This process is really building an inverted index with postings lists where terms are geo-cells and documents are Geo Time Series.
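To make the inverted-index idea concrete, here is a minimal Python sketch. It is a conceptual illustration only: `grid_cell` is a simplified stand-in that splits each axis into 2^N equal bands, not the actual HHCode encoding, and none of these names come from Warp 10.

```python
from collections import defaultdict


def grid_cell(lat: float, lon: float, resolution: int) -> tuple:
    """Simplified cell key: 2^N bands per axis (assumption, not real HHCode)."""
    n = 1 << resolution
    row = min(int((lat + 90.0) / 180.0 * n), n - 1)
    col = min(int((lon + 180.0) / 360.0 * n), n - 1)
    return (row, col)


def build_postings(tracks: dict, resolution: int) -> dict:
    """Map each cell (term) to the set of series ids (documents) seen in it."""
    postings = defaultdict(set)
    for gts_id, points in tracks.items():
        for lat, lon in points:
            postings[grid_cell(lat, lon, resolution)].add(gts_id)
    return dict(postings)


# Hypothetical tracks for one time interval: id -> list of (lat, lon)
tracks = {
    1: [(48.85, 2.35)],
    2: [(48.86, 2.36), (40.7, -74.0)],
}
postings = build_postings(tracks, 8)
print(postings[(197, 129)])  # {1, 2}
print(postings[(185, 75)])   # {2}
```

Series 1 and 2 share a cell, so both ids end up in the same postings list, while the second position of series 2 produces a separate entry.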
This process is very easy, as we at SenX provide a macro in our WarpFleet repository which does just that. The macro is @senx/geo/geoindex. It expects as parameters the list of Geo Time Series to index, the name of the attribute containing the numerical id, the spatial resolution, and the tick to generate, usually the beginning or end of the time range being indexed. It outputs a list of index Geo Time Series named senx.geoindex with a label named cell which contains the geo-cell.
Persisting the index is as simple as storing the Geo Time Series generated by the call to this macro.
The indexing process can be automated by creating a runner in the directory of your Warp 10 instance. This script should periodically fetch data to index, create the index values and store them back in Warp 10. The indexing of recent data can even be incremental during the current time range at the index temporal resolution, so you can query those data shortly after they were stored.
Spatio-Temporal Querying
With the index Geo Time Series generated above, it is possible to answer spatio-temporal queries. These queries include both a geographical area and a time range. The geographic area is a GEOSHAPE generated from either WKT or GeoJSON as described in our post Working with Geo Data, but it needs to be processed by GEO.OPTIMIZE before it can be used for querying the spatio-temporal index, to ensure that the GEOSHAPE only contains cells at or below the HHCode resolution of the index. Once this is done, the regular expression produced by GEO.REGEXP can be used as a selector for the cell label of the senx.geoindex index Geo Time Series. The time range of the FETCH should take into consideration the temporal resolution and the choice of ticks for the index GTS datapoints, typically either the beginning or end of temporal resolution time ranges.
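The effect of the tick choice on the fetch range can be sketched as follows. This example rests on assumptions that are not stated in this post: windows are taken to be aligned on multiples of the temporal resolution, window k covers [k*res, (k+1)*res), and an "end" tick is placed at (k+1)*res. The function name is hypothetical.

```python
def index_tick_range(query_start: int, query_end: int,
                     resolution: int, tick_at: str = "start") -> tuple:
    """First and last index ticks whose windows overlap [query_start, query_end]."""
    if tick_at not in ("start", "end"):
        raise ValueError("tick_at must be 'start' or 'end'")
    first_window = query_start // resolution
    last_window = query_end // resolution
    # Assumed convention: a start tick sits at k*res, an end tick at (k+1)*res
    offset = 0 if tick_at == "start" else resolution
    return (first_window * resolution + offset,
            last_window * resolution + offset)


# Query from t=450 to t=1000 with 300-unit windows
print(index_tick_range(450, 1000, 300, "start"))  # (300, 900)
print(index_tick_range(450, 1000, 300, "end"))    # (600, 1200)
```

With ticks at the start of each window, the fetch has to begin before the query start to pick up the window already in progress. With ticks at the end, it has to extend past the query end instead.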
When the index GTS datapoints have been retrieved, the @senx/geo/geofind macro can be called with the list of Geo Time Series as parameter. This macro is available in the SenX WarpFleet repository and will return a list of numerical ids observed in the selected spatio-temporal cells. This list of ids can then be used for querying the original data that was indexed. We highly recommend that you use the REOPTALT function to generate an optimized regular expression from the list of ids, as this can speed up the identification of matching Geo Time Series.
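Conceptually, the lookup boils down to a union of postings lists followed by the construction of a selector from the resulting ids. The Python sketch below illustrates that idea with hypothetical names and data. It builds a plain alternation, not the optimized expression that REOPTALT produces, and it does not reflect how the macros are implemented.

```python
import re


def find_ids(postings: dict, cells: list) -> list:
    """Union of the ids observed in the selected cells, sorted."""
    ids = set()
    for cell in cells:
        ids |= postings.get(cell, set())
    return sorted(ids)


def ids_to_regex(ids: list) -> str:
    """Plain alternation matching exactly the given ids (no optimization)."""
    if not ids:
        return "(?!)"  # pattern that never matches
    return "^(?:" + "|".join(re.escape(str(i)) for i in ids) + ")$"


# Hypothetical index content: cell -> ids of the series seen in that cell
postings = {(197, 129): {1, 2}, (185, 75): {2}, (0, 0): {9}}

ids = find_ids(postings, [(197, 129), (185, 75), (5, 5)])
pattern = ids_to_regex(ids)
print(ids)      # [1, 2]
print(pattern)  # ^(?:1|2)$
```

The anchors in the pattern matter: without them, an id such as 12 would also match a selector built for ids 1 and 2.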
This post has walked you through the process of creating a spatio-temporal index for Geo Time Series data. This simple process opens a world of possibilities for your applications, and we really hope you will be creative using it! Please tell us what you build using this technique. We would love to know.