Discover the essential criteria to take into consideration before choosing a Time Series Database.
What is a Time Series?
To understand the value of a time series database, you need to know that time series data differ from more conventional data in several respects:
- The rate at which they are produced. Most of the time, it is machines rather than humans that produce time series data. So data points are added at high frequency (up to several kHz for some industrial applications) and without interruption.
- The total volume to manage. It is often necessary to retain the data produced for several years, for legal reasons but also, for example, to train predictive models.
- The analyses performed on time series data go beyond simple aggregations and summary statistics, to be closer to signal processing.
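To get a feel for the volumes these characteristics imply, here is a rough back-of-the-envelope sketch. The number of sensors, the sampling rate, the retention period, and the 16 bytes per point (an 8-byte timestamp plus an 8-byte value, before any compression) are illustrative assumptions, not figures from any particular product.

```python
SECONDS_PER_YEAR = 365 * 24 * 3600  # ignores leap years


def points_retained(sensors: int, hz: int, years: int) -> int:
    """Data points produced by `sensors` series sampled at `hz` for `years`."""
    return sensors * hz * years * SECONDS_PER_YEAR


def raw_size_bytes(points: int, bytes_per_point: int = 16) -> int:
    """Uncompressed size, assuming 8 bytes of timestamp and 8 bytes of value."""
    return points * bytes_per_point


if __name__ == "__main__":
    # Hypothetical plant: 100 sensors at 1 kHz, kept for 5 years.
    n = points_retained(sensors=100, hz=1000, years=5)
    print(f"{n:,} points, about {raw_size_bytes(n) / 1e12:.1f} TB uncompressed")
```

Real solutions compress data, so the actual footprint will differ, but an estimate like this helps decide what order of magnitude your tests should reach.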
A specific database to manage Time Series data
Time series databases were created to address those specific issues, offering high ingestion performance, scalability, and advanced analytics capabilities.
But how do you compare different solutions to find out which one will meet your needs?
Will you choose an open source solution or a proprietary one? Can you trust online rankings or benchmarks put together by third parties?
Just because a solution is popular doesn’t mean it’s the one for you. Some elements that characterize a solution may be important to some but not to you, and vice versa.
Read more about when you need a time series database.
What criteria should you consider for choosing a Time Series Database?
So to help you make the best possible choice based on your needs, we’ve listed 12 essential criteria that you should consider.
In the field of time series databases, there are two types of data models. The first comes from the world of relational databases. It is centered around the notion of a table, i.e. a set of named and typed columns. The advantage of this model is that it is easy to understand. The second model, the most common in time series solutions, is based on metadata associated with each series.
These metadata are very often sets of key/value pairs that attach context to a series. Depending on the solution, they are called class, metric, labels, tags, or attributes.
This second type of modeling takes some getting used to, but it offers much greater flexibility. You can add time series with new metadata sets without having to alter an existing table model or add new tables. If your use cases are related to IoT and industry, you should favor this model.
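The metadata-based model can be sketched in a few lines. This is a toy in-memory illustration, not the API of any real time series database: the names `SeriesKey`, `series_key`, and `Store` are hypothetical, and the labels used are made up.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SeriesKey:
    """Identifies a series by a class name plus a set of key/value labels."""
    name: str
    labels: frozenset  # frozenset of (key, value) pairs, so the key is hashable


def series_key(name: str, **labels: str) -> SeriesKey:
    return SeriesKey(name, frozenset(labels.items()))


class Store:
    """Toy store: one list of (timestamp, value) points per series."""

    def __init__(self):
        self._series = {}

    def add(self, key: SeriesKey, ts: int, value: float) -> None:
        self._series.setdefault(key, []).append((ts, value))

    def find(self, name: str, **selector: str) -> list:
        """Series of class `name` whose labels include every selector pair."""
        wanted = set(selector.items())
        return [k for k in self._series if k.name == name and wanted <= k.labels]


store = Store()
store.add(series_key("temperature", site="paris", sensor="t1"), 0, 21.5)
# A series carrying an extra label key is added without any schema change.
store.add(series_key("temperature", site="lyon", sensor="t7", firmware="2.1"), 0, 19.0)
lyon_series = store.find("temperature", site="lyon")
```

In a table-based model, the `firmware` label would typically require a new column or a new table before the second series could be written.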
Time series data frequently relate to sensitive equipment or to equipment belonging to people whose privacy must be protected. It is therefore important that the solution under consideration provides access control and authorization mechanisms that limit the scope (series, time ranges) accessible to a user.
So you should check that the implementation of these mechanisms does not degrade the performance of the solution. Often the results announced in benchmarks correspond to a situation where these control mechanisms are deactivated.
Another security feature to study is whether the solution can encrypt the data and/or the metadata. Again, you will have to evaluate the impact on performance.
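As an illustration of what limiting the scope to certain series and time ranges can mean, here is a minimal sketch. The `Scope` structure and the `allowed` and `filter_points` helpers are hypothetical and do not correspond to the token format of any specific product.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Scope:
    """Hypothetical read scope: required labels plus an inclusive time range."""
    required_labels: dict
    start_ts: int
    end_ts: int


def allowed(scope: Scope, labels: dict, ts: int) -> bool:
    """True if the series labels and the timestamp both fall inside the scope."""
    labels_ok = all(labels.get(k) == v for k, v in scope.required_labels.items())
    return labels_ok and scope.start_ts <= ts <= scope.end_ts


def filter_points(scope: Scope, labels: dict, points: list) -> list:
    """Keep only the (timestamp, value) points the scope is allowed to read."""
    return [(ts, v) for ts, v in points if allowed(scope, labels, ts)]


scope = Scope(required_labels={"owner": "alice"}, start_ts=100, end_ts=200)
visible = filter_points(scope, {"owner": "alice"}, [(50, 1.0), (150, 2.0), (250, 3.0)])
```

A check of this kind runs on every read, which is why its cost should be measured with the mechanism enabled rather than taken from a benchmark where it is switched off.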
Unlike conventional databases, time series databases often offer more advanced data manipulation functions.
It is important to assess the extent of these analysis capabilities. This lets you determine which gaps you will need to fill with third-party components, or even by developing a dedicated application. Ideally, the functions offered by the candidate solution, together with its extension mechanisms, should be sufficient to cover all the functional requirements.
Also watch out for an analysis environment that is too specialized. You might have difficulty transposing it to a use case other than the one it was initially designed for. Be careful with analysis environments offering a syntax close to SQL: this can be limiting when your analytical needs go beyond simple aggregations.
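To make "beyond simple aggregations" concrete, here is a small sketch of the kind of processing that is awkward to express as a plain aggregation query: flagging spikes against a rolling median. The function name, window size, and threshold are arbitrary choices for illustration, and this is only one simple technique among many.

```python
from statistics import median


def flag_spikes(values, window=5, threshold=3.0):
    """Indexes whose value differs from the median of its window by more than `threshold`."""
    half = window // 2
    spikes = []
    for i, v in enumerate(values):
        # Window centered on i, truncated at both ends of the series.
        neighbours = values[max(0, i - half): i + half + 1]
        if abs(v - median(neighbours)) > threshold:
            spikes.append(i)
    return spikes


readings = [10.0, 10.2, 9.9, 25.0, 10.1, 10.0, 9.8]
suspect = flag_spikes(readings)
```

When evaluating a candidate, it is worth checking whether processing like this can run inside the database or has to be done in an external application.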
If we set aside use cases related to real-time monitoring of systems, the vast majority of time series projects require, at one time or another, integration with other components: existing systems, proprietary systems, batch processing frameworks used to create datasets for training models, stream processing systems for on-the-fly analysis, or third-party data sources.
The better the chosen solution integrates in this way, the less additional development you will need, and the lower the total cost of ownership will be.
A solution is generally chosen with the goal of using it in production, so it is important to take operational characteristics into account. Study the reference architectures of each candidate to determine the possible deployment modes, the administration and update procedures, and the possibilities of migrating from one mode to another.
You should understand how the solution handles redundancy and sharding in order to better anticipate the ramp-up.
You should also look carefully at the backup and restore procedures in order to plan the operation of the solution over a long period of time.
Test with your data. Simulate a real read/write load, and try every candidate. All the solutions will work with 10,000 data points, but not every solution will scale up to billions of points with a long history. Generate synthetic data matching the use case you expect to have in a few years, and run real performance tests.
Test writing far in the past, test overwriting data, and test with years of history. If you are in IoT, simulate network problems, offline devices, and other real-life issues. Scaling must be anticipated.
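A test load with these characteristics can be generated with a few lines of code. The sketch below is one possible approach under simple assumptions: the function name, the value distribution, and the ratios are hypothetical, and pushing the writes to each candidate through its own ingestion API is left out.

```python
import random


def synthetic_writes(n, step=1, offline=(), overwrite_ratio=0.05, seed=42):
    """Return (timestamp, value) writes in arrival order.

    - Points whose timestamp falls in an `offline` range (from_ts, to_ts) are
      buffered and sent after the live points, i.e. written in the past.
    - A fraction `overwrite_ratio` of live points is re-sent later with a
      corrected value, to exercise overwrites.
    """
    rng = random.Random(seed)  # seeded so that every candidate gets the same load
    live, backfill, overwrites = [], [], []
    for i in range(n):
        ts = i * step
        value = rng.gauss(20.0, 2.0)
        if any(lo <= ts <= hi for lo, hi in offline):
            backfill.append((ts, value))
        else:
            live.append((ts, value))
            if rng.random() < overwrite_ratio:
                overwrites.append((ts, value + 1.0))
    return live + backfill + overwrites


# Hypothetical device that was offline between timestamps 100 and 199.
writes = synthetic_writes(1000, offline=[(100, 199)])
```

Replaying the same seeded sequence against each candidate keeps the comparison fair. Scale `n` and the number of series up to the volumes you expect in a few years before drawing conclusions.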
If it does not scale as expected, ask for help from the companies behind the databases. Your data model may not fit every database, or you may have configuration problems.
Find all 12 criteria to consider in the white paper "Guide for choosing a Time Series Database".
In this guide you will also find:
- Anti-patterns to avoid when implementing a time series based solution
- Should you use DB-Engines to choose your time series database?
- A table to fill in to compare the features of the solutions you are considering.
The white paper can be downloaded for free (in English or French).