An AIS data set for the 2022 Ocean Hackathon

As part of the 2022 Ocean Hackathon, SenX provides the contestants with an AIS data set covering more than 2 years and containing 500 billion data points.

The city of Brest is behind a now-global event called Ocean Hackathon. Over a long weekend, teams from around the world compete to deliver projects related to the maritime environment. To help the contestants, data providers make various data sets available during the event so that ideas can be tested against reality.

AIS data are among the provided data sets, but in previous editions teams found it challenging to make use of them in such a short period of time. They reported having a hard time structuring the data in a way that makes them easy to query and integrate into whatever application they were building. So for this edition of Ocean Hackathon, we decided to provide access to a large AIS data set that is easy to query, so teams can focus on their project and not worry about the underlying plumbing.

The data set

At SenX we operate a receiving station that is part of the AISHub network. As a station operator, we have access to the global data stream produced by all the stations. We store these data in a Warp 10 instance and periodically generate HFiles containing one month of data each.

We make available data covering the period from February 2020 to November 2022, with the exception of summer 2022, when we temporarily paused data collection. Those data represent around 30 billion ship observations extracted from AIS messages 1, 2, 3, 4, 5, 9, 18, 19, 21, 24, and 27. Each observation usually contains the following 12 fields:

  • timestamp
  • MMSI
  • latitude
  • longitude
  • message type
  • position accuracy
  • special maneuver indicator
  • navigation status
  • rate of turn
  • course over ground
  • heading
  • speed over ground

and for messages 5, 21, 24 and 27 the following additional fields:

  • IMO
  • dimensions, expressed as the distances from the position of the AIS station on board the ship to starboard, port, bow and stern
  • ETA
  • callsign
  • ship type
  • ship name
  • draught
  • destination

So in total, we provide around 500 billion pieces of information for 30 billion spottings of several hundred thousand AIS stations.

The data set has been indexed from February 2020 to March 2022, so it can be queried by MMSI, by geographic zone and by time range. Data outside this time range can only be queried by MMSI.

The storage footprint of this data set and its accompanying indices is less than 240 GB, thanks to the efficiency of HFiles.
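To put that footprint in perspective, here is a back-of-the-envelope calculation. It assumes the 240 figure is in decimal gigabytes and uses the rounded totals quoted above, so the results are upper bounds rather than exact measurements.

```python
# Rough storage cost per data point, using the rounded figures from this article.
# Assumption: the footprint is 240 gigabytes (10**9 bytes each), indices included.
FOOTPRINT_BYTES = 240 * 10**9
OBSERVATIONS = 30 * 10**9   # ship observations
VALUES = 500 * 10**9        # individual pieces of information

bytes_per_observation = FOOTPRINT_BYTES / OBSERVATIONS
bytes_per_value = FOOTPRINT_BYTES / VALUES

print(f"at most {bytes_per_observation:.1f} bytes per observation")
print(f"at most {bytes_per_value:.2f} bytes per value")
```

In other words, under these assumptions each observation costs a handful of bytes on disk, indices included.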

Querying the data set

We make the data set freely available to whoever wants to interact with it during the 2022 Ocean Hackathon. We only request that you explicitly cite SenX as the provider of the data, with a link to either our website or this article.

In order to access the data you will need a token. You can request such a token by joining the Warp 10 Lounge on Slack and requesting it in the community channel with the id of your Ocean Hackathon project and a short description of what you intend to do with the data.

Once you have such a token, you can query the data set as described in the following sections. You will need basic knowledge of HTTP and JSON, and some familiarity with WarpScript, Warp 10's own query and analytics language. Further information can be found on the Warp 10 site and this blog.

Access to the data is simplified by a set of macros which will be described below.

Identifying MMSIs

Access to the actual observations is done by MMSI, so one of the first tasks you need to perform is identifying the MMSI you are interested in.

This can be done via various ship registries or ship tracking services such as Vessel Finder or Marine Traffic. Alternatively, you may use a macro we provide, which identifies the MMSIs observed in a given geographic zone during a specific period.

Defining geographic zones

The geographic zones used within Warp 10 are specified using a format known as WKT (Well-Known Text). You can generate WKT via this online editor.

Zones are not limited to bounding boxes. They can be freely combined via set operations such as union, intersection and difference. You can learn more in this post.

The ->WKT function can produce WKT from various formats, so it may prove useful if you have your zone description in an alternate format.
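If your zone is just a list of coordinates or a bounding box, the WKT string can also be built by hand. The sketch below is illustrative only: the helper names polygon_wkt and bbox_wkt are made up for this example, and it assumes coordinates are given as (longitude, latitude) pairs, which is the order used in the polygons shown in this article.

```python
from typing import Iterable, Tuple


def polygon_wkt(points: Iterable[Tuple[float, float]]) -> str:
    """Build a WKT POLYGON from (longitude, latitude) pairs, closing the ring if needed."""
    ring = list(points)
    if len(ring) < 3:
        raise ValueError("a polygon needs at least three points")
    if ring[0] != ring[-1]:
        ring.append(ring[0])  # a WKT ring must end on its first point
    coords = ",".join(f"{lon} {lat}" for lon, lat in ring)
    return f"POLYGON(({coords}))"


def bbox_wkt(west: float, south: float, east: float, north: float) -> str:
    """Build a rectangular zone from its bounds, listed counter-clockwise."""
    return polygon_wkt([(west, south), (east, south), (east, north), (west, north)])


# Hypothetical box, chosen only to illustrate the output format.
print(bbox_wkt(-6.0, 47.0, -4.0, 49.0))
```

The resulting string can be used wherever a zone is expected, for example as the value of the wkt parameter described in the next section.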

Finding MMSIs spotted in a zone

The macro @aishub/fetch can be used to identify which MMSIs were observed in a zone. The macro is called by submitting the following script using the HTTP POST method to the URL

  // Parameter map passed to the macro
  {
    'token' '@TOKEN@' // The token you were provided for Ocean Hackathon
    'wkt' 'POLYGON((-5.358985311200019 49.590998020558544,-6.490577108075019 49.032333481058025,-6.930030233075019 47.80010585901511,-6.732276326825019 47.22129625286147,-5.029395467450019 47.48180809464544,-5.413916951825019 48.78682847044892,-4.644873983075019 49.183366572735224,-5.358985311200019 49.590998020558544))'
    'end' '2021-07-14T23:59:59.999999Z' TOTIMESTAMP
    'start' '2021-07-14T00:00:00.000000Z' TOTIMESTAMP
  }
  @aishub/fetch

You can test the above code in our web IDE, WarpStudio. Simply replace @TOKEN@ with your token.

The result is a JSON list of the MMSIs which were observed in the specified area. Note that the area is approximated as a set of cells roughly 2.5 km wide, so the result may include some MMSIs which were observed slightly outside the specified area.

Fetching observations

If, instead of fetching MMSIs, you would like to retrieve actual ship observations, simply add a data entry to the parameter map passed to @aishub/fetch, listing what you would like to retrieve. The possible values are pos for ship positions and info for ship information.

The following returns observations and information for all ships spotted in part of the Ushant traffic separation scheme:

  {
    // The token you were provided for Ocean Hackathon
    'token' '....TOKEN....'

    // Ushant rail
    'wkt' 'POLYGON((-5.358985311200019 49.590998020558544,-6.490577108075019 49.032333481058025,-6.930030233075019 47.80010585901511,-6.732276326825019 47.22129625286147,-5.029395467450019 47.48180809464544,-5.413916951825019 48.78682847044892,-4.644873983075019 49.183366572735224,-5.358985311200019 49.590998020558544))'

    'end' '2021-07-14T23:59:59.999999Z' TOTIMESTAMP
    'start' '2021-07-14T00:00:00.000000Z' TOTIMESTAMP
    'data' [ 'pos' 'info' ]
  }
  @aishub/fetch

Test this code using this snapshot.
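Outside of WarpStudio, the same scripts can be assembled and submitted from your own code. The following Python sketch is a minimal illustration, not an official client. It assumes the parameters are wrapped in a WarpScript map literal followed by the macro call. The helper names are made up for this example, and EXEC_URL is a placeholder that you must replace with the endpoint URL you were given.

```python
import urllib.request
from typing import Optional, Sequence

# Placeholder only: replace with the endpoint URL provided with your token.
EXEC_URL = "https://example.invalid/REPLACE-WITH-ENDPOINT-URL"


def build_fetch_script(token: str, wkt: str, start: str, end: str,
                       data: Optional[Sequence[str]] = None) -> str:
    """Assemble a WarpScript that passes a parameter map to @aishub/fetch."""
    lines = [
        "{",
        f"  'token' '{token}'",
        f"  'wkt' '{wkt}'",
        f"  'end' '{end}' TOTIMESTAMP",
        f"  'start' '{start}' TOTIMESTAMP",
    ]
    if data:
        # e.g. ['pos', 'info'] becomes: 'data' [ 'pos' 'info' ]
        items = " ".join(f"'{item}'" for item in data)
        lines.append(f"  'data' [ {items} ]")
    lines += ["}", "@aishub/fetch"]
    return "\n".join(lines)


def run_warpscript(script: str, url: str = EXEC_URL) -> bytes:
    """Submit the script with an HTTP POST and return the raw response body."""
    request = urllib.request.Request(url, data=script.encode("utf-8"), method="POST")
    with urllib.request.urlopen(request, timeout=60) as response:
        return response.read()


script = build_fetch_script(
    token="@TOKEN@",
    wkt="POLYGON((-6.0 47.0,-4.0 47.0,-4.0 49.0,-6.0 49.0,-6.0 47.0))",
    start="2021-07-14T00:00:00.000000Z",
    end="2021-07-14T23:59:59.999999Z",
    data=["pos", "info"],
)
print(script)
```

Calling run_warpscript(script) would then send the request. It is left out of the example because it needs a valid token and the real endpoint URL.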

If you explicitly specify a list of MMSIs under the mmsi key, only those will be considered. As an example, you can visualize all the tracks of some SNSM lifeboats based in Brittany during the summer of 2021.

If wkt is specified, only the observations actually in the specified area will be returned.

The result is a list of Geo Time Series whose data points have the timestamp, latitude and longitude of the spottings. The associated values are either booleans set to true or a STRING with a JSON structure containing the ship information.
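On the client side, the response can be walked with a few lines of Python. This sketch assumes the usual Warp 10 JSON serialization, where the response is a list representing the stack, each Geo Time Series is an object with the keys c (class), l (labels) and v (data points), and each data point is a list starting with the timestamp and ending with the value. The sample data, class names and field names below are made up for illustration and should be checked against what you actually receive.

```python
import json


def iter_points(gts: dict):
    """Yield (timestamp, latitude, longitude, value) for each data point of one series."""
    for point in gts.get("v", []):
        ts, value = point[0], point[-1]
        # Located points have at least 4 elements: ts, lat, lon, [elevation,] value.
        lat, lon = (point[1], point[2]) if len(point) >= 4 else (None, None)
        yield ts, lat, lon, value


def decode_value(value):
    """Ship information is a JSON string; position markers are plain booleans."""
    return json.loads(value) if isinstance(value, str) else value


# Made-up response body standing in for what the HTTP call would return.
sample_stack = [[
    {"c": "pos", "l": {}, "a": {},
     "v": [[1626220800000000, 48.5, -5.5, True]]},
    {"c": "info", "l": {}, "a": {},
     "v": [[1626220800000000, 48.5, -5.5, json.dumps({"example_field": "example"})]]},
]]
body = json.dumps(sample_stack)

stack = json.loads(body)
series_list = stack[0]  # first element: the list of Geo Time Series left on the stack
for gts in series_list:
    for ts, lat, lon, value in iter_points(gts):
        print(gts["c"], ts, lat, lon, decode_value(value))
```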

Fetching parameters

To extract individual parameters, call the @aishub/params macro after the call to @aishub/fetch. This will produce one Geo Time Series for each available parameter:

GTS class name | Parameter | Format | Default value
sog | Speed over Ground | LONG, in tenths of a knot | 1023
cog | Course over Ground | LONG, in tenths of a degree | 3600
rot | Rate of Turn | LONG, see ROT formula | 128
hdg | Heading | LONG, in degrees | 511
id | AIS message id (type) | LONG | N/A
navstatus | Navigation Status | LONG, see Table 7, Navigation Status | 15
mi | Special Maneuver Indicator | LONG, see Table 8, Maneuver Indicator | 0
posacc | Position accuracy | LONG: 1 indicates a DGPS-quality fix with an accuracy better than 10 m; 0, the default, indicates an unaugmented GNSS fix with an accuracy worse than 10 m | 0
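The raw values in this table usually need converting before use. The Python sketch below shows one way to do it, treating the listed default values as "not available". The function names are made up for this example, and the rate of turn conversion assumes the referenced ROT formula is the standard AIS encoding, ROT_AIS = 4.733 × sqrt(ROT_sensor) with the sign preserved.

```python
import math
from typing import Optional


def decode_sog(raw: int) -> Optional[float]:
    """Speed in knots, or None for the 'not available' default (1023)."""
    return None if raw == 1023 else raw / 10.0


def decode_cog(raw: int) -> Optional[float]:
    """Course in degrees, or None for the 'not available' default (3600)."""
    return None if raw == 3600 else raw / 10.0


def decode_hdg(raw: int) -> Optional[float]:
    """Heading in degrees, or None for the 'not available' default (511)."""
    return None if raw == 511 else float(raw)


def decode_rot(raw: int) -> Optional[float]:
    """Rate of turn in degrees per minute (negative means turning to port).

    Assumes the standard AIS encoding. Magnitudes of 127 and above carry no
    usable rate (128 is the default listed in the table), so None is returned.
    """
    if abs(raw) >= 127:
        return None
    return math.copysign((raw / 4.733) ** 2, raw)


print(decode_sog(123), decode_cog(3600), decode_hdg(270), decode_rot(-20))
```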

You can try this macro in this snapshot which extracts the parameters from all observations of the lifeboat Amiral Amman.

Converting Geo Time Series to a data frame

If you wish to post-process the data in tools such as Python or R which work on data frames, you can call the @senx/todf macro to convert a list of Geo Time Series into a list of rows. Each row will have the following columns:

  • class
  • labels
  • attributes
  • ts
  • latitude
  • longitude
  • elevation
  • l_value (value if type is LONG, null otherwise)
  • d_value (value if type is DOUBLE, null otherwise)
  • b_value (value if type is BOOLEAN, null otherwise)
  • s_value (value if type is STRING, null otherwise)
  • bin_value (value if type is BYTES, null otherwise)

You may then parse the JSON and convert it to the native dataframe format of your tool, for example, pandas dataframes.
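As an illustration, the rows can be turned into records with the standard library alone. This sketch assumes each row arrives as a JSON array whose elements follow the column order listed above; that layout is an assumption to verify against the actual output. The sample row and the helper name rows_to_records are made up for this example.

```python
import json

COLUMNS = ["class", "labels", "attributes", "ts", "latitude", "longitude",
           "elevation", "l_value", "d_value", "b_value", "s_value", "bin_value"]
VALUE_COLUMNS = ("l_value", "d_value", "b_value", "s_value", "bin_value")


def rows_to_records(rows):
    """Turn positional rows into dicts and merge the typed value columns into one."""
    records = []
    for row in rows:
        record = dict(zip(COLUMNS, row))
        typed = [record.pop(name) for name in VALUE_COLUMNS]
        # At most one typed column is expected to be non-null per row.
        record["value"] = next((v for v in typed if v is not None), None)
        records.append(record)
    return records


# Made-up JSON standing in for the output of the macro.
body = json.dumps([
    ["sog", {}, {}, 1626220800000000, 48.5, -5.5, None, 123, None, None, None, None],
])
records = rows_to_records(json.loads(body))
print(records[0]["class"], records[0]["value"])
```

A list of dicts like this one can be passed directly to the pandas DataFrame constructor if pandas is installed.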

When using Python, you may add a BOOLEAN parameter to the call to @senx/todf which, if true, will pickle the content prior to returning it. You may then unpickle it instead of parsing the JSON.

Experiment with this macro in this snapshot.


You now know how to interact with the AIS data set offered by SenX during the 2022 Ocean Hackathon. We hope this will help you complete your project.

If you have any queries regarding the efficiency of AIS or ADS-B data management, please reach out to us via email.