Working with GEOSHAPEs: code contest results

Discover the results of the GEOSHAPE code contest about a car’s GPS track on Route 66. Step by step, explore the data and see how to find the answers.


Following Mathias’ article about GEOSHAPEs, I wrote a small code contest to practice. Here are the results.

Questions to solve

Given the Route 66 geoshape and the GTS from the datalogger:

  • How many kilometers did this car drive on Route 66?
  • Its fuel consumption, in liters per 100 km, is approximately 8 × (speed in km/h / 80) + 1. One liter of fuel releases 2392 g of CO2. How much CO2 did the car release while driving on Route 66?

Step 1: explore data

It seems obvious, but I have met people who try to manipulate data without taking a glance at it first. This step is very important: look at the data range and extrapolate a rough result from what you can guess. For example, the answer to the first question cannot be 2000 kilometers.

In the contest, I provided a WarpStudio snapshot (a link to a WarpScript). Open it, click on Dataviz, and tick the map view.

Figure: Dataviz map view of the track.
Judging by the map scale, the track spans around 300 km at most.

You can see that there are some weird things in the input data: sometimes the car is on the road, sometimes it is on a parallel road that is not part of the given geoshape. Fine. You have to live with that; do not try to fix the input data.

You can also see that it is not a one-way trip. Where does the drive start? You can SHRINK the input to keep only the first 20 ticks, for example.

Figure: the track starts somewhere in Williams, goes east to Flagstaff, then heads back west.
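Outside WarpScript, "keep only the first ticks" just means sorting by timestamp and slicing. The Python sketch below is not WarpScript and does not reproduce SHRINK itself; it assumes a hypothetical layout where each datapoint is a (tick in microseconds, latitude, longitude) tuple, and the coordinates are made-up values for illustration.

```python
def first_ticks(gts, n):
    """Return the n oldest datapoints of a track, sorted by tick."""
    return sorted(gts, key=lambda p: p[0])[:n]


# Toy track, deliberately out of order (made-up values for illustration).
track = [
    (30_000_000, 35.20, -111.65),
    (0, 35.25, -112.19),
    (10_000_000, 35.24, -112.10),
]
start = first_ticks(track, 2)
print(start)  # [(0, 35.25, -112.19), (10000000, 35.24, -112.1)]
```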

What is the sampling frequency? To find the delta between consecutive ticks, you can use mapper.tick, then convert the result to seconds.

In the code, I chain three MAP operations:

  • replace each value with its tick
  • compute the delta between two consecutive values (note the window size)
  • divide everything by STU (my Warp 10 instance is configured with microsecond time units, so STU returns 1 000 000)
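The three chained MAP steps can be mirrored in plain Python as a sanity check. This is a sketch, not WarpScript: it assumes ticks in microseconds (STU = 1 000 000, as on the author's instance), and unlike a Warp 10 mapper it returns one value per consecutive pair rather than one value per tick.

```python
STU = 1_000_000  # time units per second; assumes a microsecond-configured instance


def tick_intervals_seconds(ticks):
    """Delta in seconds between each tick and the previous one."""
    ticks = sorted(ticks)
    # Step 1: the values are the ticks themselves (what mapper.tick does).
    # Step 2: delta over a two-point window (previous tick, current tick).
    # Step 3: divide by STU to convert time units to seconds.
    return [(cur - prev) / STU for prev, cur in zip(ticks, ticks[1:])]


ticks = [0, 10_000_000, 20_000_000, 70_000_000, 3_670_000_000]
print(tick_intervals_seconds(ticks))  # [10.0, 10.0, 50.0, 3600.0]
```

The last value shows how a one-hour pause in the record stands out immediately.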

Figure: interval in seconds between consecutive ticks.

Interesting result: there are some pauses in the GPS record, sometimes around one hour long. As a WarpScript user, my immediate thought is: “TIMESPLIT will be useful”.

If I remove values greater than 100 seconds by adding another mapper, I can see that the GPS datalogger records its position roughly every 10 s, with a worst case of 50 s without any data.
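Filtering out the long pauses and summarizing what remains can be sketched in Python as follows. The 100-second threshold comes from the text; the sample intervals are made up, and the function name is hypothetical.

```python
def regular_intervals(intervals_s, max_gap_s=100):
    """Drop long pauses so that only the regular sampling intervals remain."""
    return [dt for dt in intervals_s if dt <= max_gap_s]


intervals = [10.0, 9.0, 11.0, 50.0, 3600.0, 10.0]  # made-up sample
kept = regular_intervals(intervals)
worst = max(kept)
median = sorted(kept)[len(kept) // 2]
print(worst, median)  # 50.0 10.0
```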

So, to sum up:

  • The car often leaves the road: the input data are not perfect, and the road sometimes splits into separate lanes that the geoshape does not cover.
  • The system records one datapoint every 10 s or so, with gaps of up to 50 s.
  • The driver took a few naps in the meantime.

Step 2: keep data ON the road

This is a very straightforward task:

  • Keep datapoints that are within the geoshape.
  • Split into multiple continuous GTS (continuous = less than one minute between two consecutive timestamps).
  • Keep splits with more than one datapoint.
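The three steps can be sketched in Python to show the logic. Two assumptions to flag: the geoshape is replaced by a plain polygon with a ray-casting point-in-polygon test (Warp 10's GEOSHAPE is a cell-based approximation, so results near the edges can differ), and datapoints are hypothetical (tick in microseconds, lat, lon) tuples. All function names are illustrative.

```python
def inside(polygon, lat, lon):
    """Ray-casting test: is (lat, lon) inside the polygon of (lat, lon) vertices?"""
    result = False
    n = len(polygon)
    for i in range(n):
        y1, x1 = polygon[i]
        y2, x2 = polygon[(i + 1) % n]
        if (y1 > lat) != (y2 > lat):  # edge crosses the horizontal line through the point
            x_cross = x1 + (lat - y1) * (x2 - x1) / (y2 - y1)
            if lon < x_cross:
                result = not result
    return result


def split_on_gaps(points, max_gap_us):
    """Start a new split whenever two consecutive ticks are max_gap_us or more apart."""
    splits, current = [], []
    for p in sorted(points):
        if current and p[0] - current[-1][0] >= max_gap_us:
            splits.append(current)
            current = []
        current.append(p)
    if current:
        splits.append(current)
    return splits


def on_road_splits(gts, polygon, max_gap_s=60, stu=1_000_000):
    on_road = [p for p in gts if inside(polygon, p[1], p[2])]  # 1. geofence
    splits = split_on_gaps(on_road, max_gap_s * stu)           # 2. continuity split
    return [s for s in splits if len(s) > 1]                   # 3. drop single points


square = [(0.0, 0.0), (0.0, 1.0), (1.0, 1.0), (1.0, 0.0)]  # toy "road" shape
track = [
    (0, 0.5, 0.5), (10_000_000, 0.5, 0.6), (20_000_000, 0.5, 2.0),  # third point is off-road
    (30_000_000, 0.5, 0.7), (200_000_000, 0.5, 0.5),                # isolated point after a pause
    (400_000_000, 0.2, 0.2), (410_000_000, 0.3, 0.3),
]
print([[p[0] for p in s] for s in on_road_splits(track, square)])
```

Note that continuity is judged after the geofence filter, so an off-road point does not by itself break a split.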

Here I used MOTIONSPLIT, which can do much more than TIMESPLIT, but TIMESPLIT does the same job here.

There are three large relevant splits, and a few others where the track only occasionally matches the road.

The input data is far from perfect, but I keep all 7 splits anyway.

Step 3: kilometers on the road

Between each pair of consecutive datapoints, we can compute the distance in meters, then sum all these small distances to get the total distance traveled.

  • mapper.hdist does the hard part: it computes the distance in meters between consecutive datapoints. In every output GTS, each value is now that distance.
  • MERGE takes the list of “distance GTS” and merges them into a single GTS.
  • bucketizer.sum, used with a single bucket, sums all the values.
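Here is a Python sketch of the same hdist, MERGE and sum pipeline. It assumes a spherical Earth with a mean radius of 6 371 km and uses the haversine formula; Warp 10's mapper.hdist may use a different radius or formula, so small differences from the 79.8 km result are expected on real data. Splits are hypothetical lists of (tick, lat, lon) tuples.

```python
import math

EARTH_RADIUS_M = 6_371_000  # assumed mean Earth radius


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def segment_distances(split):
    # Like mapper.hdist with a two-point window: distance from the previous datapoint.
    return [haversine_m(a[1], a[2], b[1], b[2]) for a, b in zip(split, split[1:])]


def total_distance_km(splits):
    merged = [d for s in splits for d in segment_distances(s)]  # like MERGE
    return sum(merged) / 1000                                   # like bucketizer.sum, one bucket


# Two toy splits: one degree north along a meridian, one degree east along the equator.
splits = [[(0, 0.0, 0.0), (1, 1.0, 0.0)], [(5, 0.0, 0.0), (6, 0.0, 1.0)]]
print(round(total_distance_km(splits), 2))
```

Distances are computed within each split only, so the jump between the end of one split and the start of the next is never counted.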

The output is a GTS named “totalDistanceTraveled”, with a single datapoint that contains the total distance traveled on Route 66: 79.8 km.

Alternative solution: as Nicolas did, you can also use a mapper with a huge window (MAXLONG for the pre and post parameters).

This is perfectly correct and gives the same result. Since a mapper is an aggregator, you can also use it in a BUCKETIZE operation, which again gives the same result.
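Why does a huge window work? A mapper applies its function to the datapoints from `pre` ticks before to `post` ticks after the current one, so with effectively unbounded pre and post every datapoint sees the whole series. The Python sketch below models that windowing rule in simplified form (it is not the Warp 10 implementation); `sys.maxsize` stands in for MAXLONG, and the distances are made up.

```python
import sys


def apply_window_mapper(values, pre, post, fn):
    """Apply fn to the window [i - pre, i + post] around each datapoint."""
    out = []
    for i in range(len(values)):
        lo = max(0, i - pre)
        hi = min(len(values), i + post + 1)
        out.append(fn(values[lo:hi]))
    return out


HUGE = sys.maxsize  # stands in for MAXLONG
distances = [120.0, 80.0, 300.0]  # made-up segment distances in meters
print(apply_window_mapper(distances, HUGE, HUGE, sum))  # [500.0, 500.0, 500.0]
```

Every output datapoint carries the total, so any one of them is the answer.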

Step 4: CO2 emissions

Emissions are directly linked to the vehicle's speed, so the first step is to compute the instantaneous speed. WarpLib already provides a mapper for this:

  • mapper.hspeed computes the speed between consecutive datapoints (note the mapper window: pre = 1 and post = 0, so two datapoints). The value of each datapoint becomes the instantaneous speed in SI units (m/s).
  • A second step multiplies every value by 3.6 to get km/h.
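The speed computation can be sketched in Python as distance over elapsed time for each pair of consecutive datapoints, then a conversion from m/s to km/h. Assumptions: haversine distance on a sphere of radius 6 371 km (mapper.hspeed may differ slightly), ticks in microseconds, and unique ticks within a split. The first datapoint of a split gets 0 km/h on purpose, to match the behavior described in the text.

```python
import math

EARTH_RADIUS_M = 6_371_000  # assumed mean Earth radius


def haversine_m(lat1, lon1, lat2, lon2):
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def speeds_kmh(split, stu=1_000_000):
    """Speed at each datapoint, measured from the previous one (pre = 1, post = 0)."""
    out = [0.0]  # first datapoint: no previous point, hence 0 km/h
    for a, b in zip(split, split[1:]):
        dt_s = (b[0] - a[0]) / stu
        speed_ms = haversine_m(a[1], a[2], b[1], b[2]) / dt_s  # SI units: m/s
        out.append(speed_ms * 3.6)                             # m/s -> km/h
    return out


# One degree of latitude covered in one hour.
split = [(0, 0.0, 0.0), (3_600_000_000, 1.0, 0.0)]
print([round(v, 1) for v in speeds_kmh(split)])
```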

If you display the output, you see the speed for every input split.

(note: the driver respects the speed limit)

This code is not perfect: the first datapoint of each split is 0 km/h, because on that datapoint the mapper window contains only one datapoint. (This could be fixed with STRICTMAPPER.)

Data exploration: the speed is around 120 km/h. Not a big surprise on this road.

According to the given formula, consumption at 120 km/h is 8 × (120 / 80) + 1 = 13 liters/100 km. With around 80 km driven, the total amount of fuel burned should be around 13 × 0.8 = 10.4 liters.
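The back-of-the-envelope estimate can be written down as a quick Python check, so the final answer has something to be compared against. The 120 km/h and 80 km figures are the rough values from the exploration above, not exact data.

```python
CO2_G_PER_L = 2392  # from the contest statement


def consumption_l_per_100km(speed_kmh):
    # Contest formula: 8 × (speed / 80) + 1, in liters per 100 km.
    return 8 * (speed_kmh / 80) + 1


rate = consumption_l_per_100km(120)    # typical speed on this road
fuel_l = rate * 80 / 100               # roughly 80 km driven
co2_kg = fuel_l * CO2_G_PER_L / 1000
print(rate, round(fuel_l, 1), round(co2_kg, 1))  # 13.0 10.4 24.9
```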

At this point, we have the distances $distances and the speed $speedKmh for each datapoint.

For each datapoint, we can apply the given formula to compute the number of liters burned since the previous tick.

I then sum all these amounts with bucketizer.sum, just as I did for the total distance.
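Per datapoint, the liters burned are the consumption rate at that speed times the distance of the segment ending at that datapoint. Here is a Python sketch of that computation followed by the sum and the CO2 conversion. It assumes, as in the text, that `distances_m[i]` and `speeds_kmh[i]` describe the same segment; the names and sample values are hypothetical.

```python
CO2_G_PER_L = 2392  # from the contest statement


def consumption_l_per_100km(speed_kmh):
    return 8 * (speed_kmh / 80) + 1


def total_liters(distances_m, speeds_kmh):
    """Sum of liters burned over each segment (previous tick -> tick i)."""
    liters = [
        consumption_l_per_100km(v) * (d / 1000) / 100  # L/100 km × km / 100
        for d, v in zip(distances_m, speeds_kmh)
    ]
    return sum(liters)  # like bucketizer.sum with a single bucket


def co2_kg(liters):
    return liters * CO2_G_PER_L / 1000


# Made-up sample: 1 km at 80 km/h, then 2 km at 120 km/h.
fuel = total_liters([1000.0, 2000.0], [80.0, 120.0])
print(round(fuel, 2), round(co2_kg(fuel), 4))  # 0.35 0.8372
```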

The output is a GTS named “totalLiters”, with a single datapoint that contains the total amount of fuel burned on Route 66: 10.05 liters (10.05 × 2392 g ≈ 24 kg of CO2).

The full code is available as a playable snapshot here. You can insert STOP wherever you want to inspect intermediate results, or try to improve it!

And the winner is…