Learn what a Time Series Database is and why you should seriously consider adding one to your technology stack.
DB-Engines repeats it every month: Time Series databases are the fastest growing category in the database market. But what exactly is a Time Series Database?
A number of Open Source projects and companies have emerged in recent years with technologies collectively called Time Series Databases. Is this simply a trend, a new name for existing databases, or is there really something special about this kind of data?
Categories of databases
Databases fall into broad categories: relational, document, column stores, graph, and now Time Series.
With some effort, we can use any database engine to solve any problem. The challenge is how you model your data and how much pain you are ready to endure when it comes to performance. Regardless of how generic a database engine claims to be, trade-offs were always made in its design, and some data access patterns, whether read or write, are a better fit than others for a given engine.
This is what gave birth to document stores: the need to access complex structures made of different fields, which would have required multiple JOIN operations in a relational database. Document-oriented databases basically removed the need for those costly JOINs, so much so that document stores are the most common type of NoSQL database.
Graph databases were also created because of specific access patterns. They introduced the notion of relationships and the ability to query them. Here again, other types of databases could do the trick, but you would have to tweak the model and the query language quite a bit.
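To make the JOIN argument concrete, here is a minimal sketch in Python. The order and item schema, the sample rows and the function names are all hypothetical, and a plain dictionary stands in for a document store; it only illustrates the access pattern, not any particular product.

```python
import sqlite3

# Relational model: one order is spread over two tables (hypothetical schema).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT);
    CREATE TABLE items (order_id INTEGER, sku TEXT, quantity INTEGER);
    INSERT INTO orders VALUES (1, 'alice');
    INSERT INTO items VALUES (1, 'bolt', 10), (1, 'nut', 20);
""")

def load_order_relational(order_id):
    # Reassembling the full structure requires a JOIN across both tables.
    rows = conn.execute(
        "SELECT o.customer, i.sku, i.quantity FROM orders o "
        "JOIN items i ON i.order_id = o.id WHERE o.id = ? ORDER BY i.sku",
        (order_id,),
    ).fetchall()
    return {
        "customer": rows[0][0],
        "items": [{"sku": sku, "quantity": qty} for _, sku, qty in rows],
    }

# Document model: the same structure is stored and fetched as a single unit.
# A dict keyed by order id stands in for a document collection.
documents = {
    1: {
        "customer": "alice",
        "items": [{"sku": "bolt", "quantity": 10}, {"sku": "nut", "quantity": 20}],
    }
}

def load_order_document(order_id):
    # One lookup, no JOIN.
    return documents[order_id]
```

Both functions return the same nested structure; the difference is that the relational version has to rebuild it on every read.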
Time Series databases follow the same path
Time series databases were created to address the challenges posed by sensors, which push data around the clock, possibly at very high frequency and from an ever growing number of devices. Existing database technologies were used at first, and they worked well up to a certain level.
This level may seem very high when it comes to data generated by humans, but it is reached very rapidly when dealing with machine data. A time-based index is mandatory when dealing with time series, and very soon the sole process of maintaining it would overwhelm the servers and bring performance down to a level too low for any purpose.
These are the findings that led multiple teams to develop solutions tailored for time series data: the famous time series databases, which are designed so that neither the flow of incoming data, the number of series, nor the size of the historical datasets becomes a bottleneck. The purpose of a time series database is to deal magnificently with data indexed by time that will rarely (if ever) be updated.
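The access pattern described here, data indexed by time, appended almost always in order and rarely updated, can be sketched in a few lines of Python. The `SeriesBuffer` class and its method names are hypothetical and purely illustrative; real time series databases add compression, persistence and distribution on top of this idea.

```python
from bisect import bisect_left, bisect_right

class SeriesBuffer:
    """Hypothetical in-memory store for one series, kept sorted by timestamp."""

    def __init__(self):
        self._timestamps = []
        self._values = []

    def append(self, ts, value):
        # Sensor data mostly arrives in time order, so appending is the fast path.
        if self._timestamps and ts < self._timestamps[-1]:
            # Late point: locate its slot by binary search instead of re-sorting.
            i = bisect_right(self._timestamps, ts)
            self._timestamps.insert(i, ts)
            self._values.insert(i, value)
        else:
            self._timestamps.append(ts)
            self._values.append(value)

    def fetch(self, start, end):
        # Return the points with start <= ts < end using two binary searches.
        lo = bisect_left(self._timestamps, start)
        hi = bisect_left(self._timestamps, end)
        return list(zip(self._timestamps[lo:hi], self._values[lo:hi]))
```

For example, after appending points at timestamps 10, 20, 15 and 30, `fetch(10, 30)` returns the three points at 10, 15 and 20 in time order. There is no update path at all, which is exactly the simplification that write-once time series data allows.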
As time series databases matured, their query capabilities evolved from simple query languages, SQL or SQL-like, to more complete data flow languages such as the recent Flux or the more advanced WarpScript. The purpose of those languages is to let you perform complex analytics as close as possible to where the data is. That is why they are embedded in the time series database engines.
A unique challenge
Graph databases were created to solve graph-related problems which were hard to deal with using traditional databases. In the same way, time series databases were created to address the unique challenges of time series data.
But if you look closely, apart from some solutions which were built from scratch, time series databases were built on top of existing technologies, whether relational or column-oriented.
So when you conclude that you will store time series data in Cassandra, PostgreSQL, MongoDB, HBase or Accumulo, think about this decision for a second. Try to understand why time series databases were built. If you don't, you will quickly find yourself writing yet another time series database of your own.
The reason is that, out of the box, the technology you have chosen cannot efficiently handle time series data. To get an idea of what you will have to do, view this video by Cisco. You will discover how they had to tweak MongoDB for time series data.
You may have a bias or preference towards one of the database technologies I just mentioned. Maybe your ops team is already familiar with one. But you need to do a little more homework: identify which time series database was built on the technology you chose, rather than assume you can do without a TSDB.
Which Time Series Database?
So if your heart beats for Cassandra, have a look at KairosDB. If HBase is your backend of choice, look at OpenTSDB or Warp 10. Is Accumulo your favorite? Try Timely. If PostgreSQL is your ideal candidate, have a look at TimescaleDB. If Riak makes you tick, look at DalmatinerDB. Or look at a time series database built from scratch, such as InfluxDB or kdb+. If you want a hosted solution, there are also plenty of options, from Microsoft Time Series Insights to the recent Amazon Timestream.
Remember that if you do not seriously consider selecting a Time Series Database to add to your technology stack, you will ultimately end up writing your own. And this is probably neither what you intended to do in the first place nor what you really want to do!