Sharing engineering data across a big company is a big challenge. Pushing your data into a time series database is an elegant solution!
Any resemblance to situations in any company is purely coincidental.
Time series? What are we talking about?
Any kind of data that is a function of time:
- Sensor data are time series.
- A bank account balance is a time series.
- The position of a moving object is a time series.
- A security portal which reads a user-id produces a time series.
- Every metric you collect from a live system is a time series.
Read more about what a time series database is.
This is the first time I have heard of a database for that. Who invented this kind of database?
One of the first industries that needed to store and analyze lots of time series was the computing industry. Imagine you monitor CPU load, free hard disk space, network load, and a few other metrics, 10 metrics in total. You collect these 10 metrics every second for a whole datacenter. There are 100,000 servers in the datacenter. This means you have to analyze one million datapoints per second. You need a tool to handle this.
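The arithmetic behind that figure can be checked in a few lines. This is only a sketch using the numbers quoted above; the per-day total is a straightforward extrapoliation of the same rate and is not stated in the original example.

```python
# Figures taken from the datacenter example above.
METRICS_PER_SERVER = 10
SERVERS = 100_000
SAMPLES_PER_SECOND = 1  # each metric is collected once per second

datapoints_per_second = METRICS_PER_SERVER * SERVERS * SAMPLES_PER_SECOND
# Extrapolation (not in the original text): the same rate sustained for 24 hours.
datapoints_per_day = datapoints_per_second * 24 * 60 * 60

print(f"{datapoints_per_second:,} datapoints per second")
print(f"{datapoints_per_day:,} datapoints per day")
```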
Why would I need a database?
I work in an automotive company, and we store our time series in files on a shared disk. One million datapoints per second is common during our engineering validations. Why would I need a database?
Have you ever tried to analyze all the data you gathered for one type of vehicle in a given country? It would be great to find the vehicles that have this connectivity problem on the coolant temperature sensor. Please find all the discontinuities in this sensor data, and tell me which vehicle is the worst.
Well, of course, I have to open every file and write a program to do this. Matlab can handle that. I can do it in a day or so.
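For a single set of files, such a program might look like the minimal Python sketch below. It assumes that a "discontinuity" means a jump of more than 10 °C between two consecutive samples; that threshold, the function names, and the sample data are all invented for illustration and would need to be adapted to the real sensor.

```python
from typing import Dict, List, Tuple

Sample = Tuple[float, float]  # (timestamp in seconds, temperature in °C)

# Assumed threshold: a larger jump between two consecutive samples
# is counted as a discontinuity.
MAX_JUMP_C = 10.0


def count_discontinuities(series: List[Sample], max_jump: float = MAX_JUMP_C) -> int:
    """Count consecutive sample pairs whose values differ by more than max_jump."""
    return sum(
        1
        for (_, prev), (_, cur) in zip(series, series[1:])
        if abs(cur - prev) > max_jump
    )


def worst_vehicle(fleet: Dict[str, List[Sample]]) -> Tuple[str, int]:
    """Return the vehicle id with the most discontinuities, and that count."""
    counts = {vehicle: count_discontinuities(s) for vehicle, s in fleet.items()}
    worst = max(counts, key=counts.get)
    return worst, counts[worst]


# Invented sample data, already loaded in memory.
fleet = {
    "VEH-A": [(0, 88.0), (1, 88.5), (2, 89.0), (3, 89.2)],
    "VEH-B": [(0, 90.0), (1, 0.0), (2, 90.5), (3, 91.0), (4, -40.0)],
}
print(worst_vehicle(fleet))
```

The analysis itself is short. The sketch skips the hard part of the file-based approach, which is locating, opening, and decoding every file before this function can run.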
Now, find all the acquisitions of vehicles that were on the Paris Ring Road with a speed over 60 km/h for more than 3 minutes. Then, find where in the world the engine coolant temperature of your latest 2.0-liter gasoline engine rises above 112°C for more than 30 seconds.
The engine type is not easily accessible, but I could put it in the filename, for example. Otherwise I have to cross-reference another database. As for tooling, I know Python has some functions to do this. Data scientists use it, so I am going to ask them for a few tweaks.
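The "above a threshold for more than N seconds" part of that request can be sketched in plain Python. This is an illustrative sketch only: the function name and sample data are invented, samples are assumed to be sorted by time, and the geographic filter (Paris Ring Road) and the engine-type lookup are left out.

```python
from typing import Iterable, List, Tuple


def intervals_above(
    samples: Iterable[Tuple[float, float]],
    threshold: float,
    min_duration: float,
) -> List[Tuple[float, float]]:
    """Return (start, end) timestamps of runs where the value stays above
    threshold for more than min_duration seconds.

    Duration is measured between the first and last sample of the run,
    which is an approximation that depends on the sampling period.
    """
    intervals = []
    run_start = None
    last_ts = None
    for ts, value in samples:
        if value > threshold:
            if run_start is None:
                run_start = ts
            last_ts = ts
        else:
            if run_start is not None and last_ts - run_start > min_duration:
                intervals.append((run_start, last_ts))
            run_start = None
    # Close a run that is still open at the end of the series.
    if run_start is not None and last_ts - run_start > min_duration:
        intervals.append((run_start, last_ts))
    return intervals


# Invented coolant temperature samples: (seconds, °C), one every 10 s.
coolant = [
    (0, 105.0), (10, 113.0), (20, 114.0), (30, 115.0), (40, 113.0),
    (50, 116.0), (60, 110.0), (70, 113.0), (80, 114.0), (90, 111.0),
]
print(intervals_above(coolant, threshold=112.0, min_duration=30.0))
```

The same function would cover the speed question with `threshold=60.0` and `min_duration=180.0`, provided the speed samples have already been restricted to the area of interest.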
What about CSV files?
You just said Matlab or Python. Does that mean you are able to read every kind of file produced by people in your company? INCA files? CANalyzer files?
Well, there are ways to convert them into CSV files that I could read with Matlab or Python. The only problem is the size of the CSV files: I need a few hundred GB of free space. And the conversion has to be done manually by someone who has the license. This task will take ages.
I didn’t even ask you to do this on all the data produced by all the people in your company…
It won’t be possible. It is very hard to exchange data between a tool used by one department and a different tool used by another. To save time, we sometimes put two different acquisition systems in the same vehicle. But it would be great to be able to do that in the future!
To do so… You need a time series database
Imagine your future:
- Everyone in your company pushes their acquisitions into the same database, together with the car serial number, and maybe a mission name.
- The database index is connected to your central serial number database. Selecting data by engine type or car color is easy.
- You select the engine coolant temperature of every car between two dates in one line.
- You run your analysis program. Looking elsewhere in the database is a one-line change.
- If you need to industrialize the data processing and run it on several machines, you do not need to rewrite the analysis program.
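To make the idea of labeled series and one-line selection concrete, here is a toy in-memory sketch in Python. It is not the API of Warp 10 or of any real time series database: `TinySeriesStore`, `push`, `fetch`, the series names, and the label names are all hypothetical. As a simplification, the engine type is stored directly as a label instead of being looked up in a central serial number database.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class Series:
    name: str                # e.g. "engine.coolant.temperature"
    labels: Dict[str, str]   # e.g. serial number, mission name, engine type
    points: List[Tuple[int, float]] = field(default_factory=list)  # (timestamp, value)


class TinySeriesStore:
    """Hypothetical in-memory stand-in for a shared time series database."""

    def __init__(self) -> None:
        self._series: List[Series] = []

    def push(self, name: str, labels: Dict[str, str],
             points: List[Tuple[int, float]]) -> None:
        self._series.append(Series(name, dict(labels), sorted(points)))

    def fetch(self, name: str, start: int, end: int,
              labels: Optional[Dict[str, str]] = None) -> List[Series]:
        """Select series by name, time range (inclusive), and optional labels."""
        wanted = labels or {}
        result = []
        for s in self._series:
            if s.name != name:
                continue
            if any(s.labels.get(k) != v for k, v in wanted.items()):
                continue
            pts = [(t, v) for t, v in s.points if start <= t <= end]
            if pts:
                result.append(Series(s.name, s.labels, pts))
        return result


# Invented data pushed by different teams into the same store.
store = TinySeriesStore()
store.push("engine.coolant.temperature",
           {"serial": "SN001", "engine": "2.0-gasoline"},
           [(100, 90.0), (200, 95.0), (300, 113.0)])
store.push("engine.coolant.temperature",
           {"serial": "SN002", "engine": "1.5-diesel"},
           [(100, 85.0), (250, 88.0)])
store.push("vehicle.speed",
           {"serial": "SN001", "engine": "2.0-gasoline"},
           [(100, 50.0)])

# Coolant temperature of every car between two dates, in one line.
all_coolant = store.fetch("engine.coolant.temperature", 150, 300)
# Looking elsewhere in the data is a one-line change.
gasoline = store.fetch("engine.coolant.temperature", 150, 300,
                       {"engine": "2.0-gasoline"})
print(len(all_coolant), [s.labels["serial"] for s in gasoline])
```

The point of the design is that the analysis code only ever sees the result of `fetch`. Changing which vehicles, dates, or sensors are analyzed means changing the selection, not the program.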
Do you work in industry? This article may interest you: Our vision of Industry 4.0
Warp 10 provides outlier and pattern detection, advanced geographic functions, and all the statistical and mathematical tools you need. Warp 10 and WarpScript can easily scale up from one machine to a cluster. You can also use WarpScript directly from within Python code, so your data scientists can benefit from its features too.
Only time series technologies (such as Warp 10) are able to provide lasting, high-performance answers to the questions raised by processing data from connected health devices.
During the Vendée Globe 2020, Team Malizia monitored its yacht and gave access to this data. We jumped at the opportunity to build some nice visualizations and analyses.