Any resemblance to situations in any company is purely coincidental.
Time series? What are we talking about?
Any kind of data that is a function of time:
- Sensor data are time series.
- A bank account balance is a time series.
- Position of a moving object is a time series.
- A security portal that reads user IDs produces a time series.
- Every metric you collect from a live system is a time series.
This is the first time I’ve heard of a database for that. Who invented this kind of database?
One of the first industries that needed to store and analyze lots of time series was the computing industry. Imagine you monitor CPU load, free disk space, network load, and a few other metrics: 10 metrics in all, collected every second, on each of the 100,000 servers in a datacenter. That means you have to analyze one million datapoints per second. You need a tool to handle this.
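The arithmetic above, extended to a daily volume, as a short Python sketch (the per-day total is derived from the stated figures, not quoted from any measurement):

```python
METRICS_PER_SERVER = 10      # CPU load, free disk space, network load, ...
SERVERS = 100_000            # servers in the datacenter
SAMPLES_PER_SECOND = 1       # each metric is collected once per second
SECONDS_PER_DAY = 24 * 60 * 60

points_per_second = METRICS_PER_SERVER * SERVERS * SAMPLES_PER_SECOND
points_per_day = points_per_second * SECONDS_PER_DAY

print(f"{points_per_second:,} datapoints/s")    # 1,000,000 datapoints/s
print(f"{points_per_day:,} datapoints/day")     # 86,400,000,000 datapoints/day
```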
I work at an automotive company, and we store our time series in files on a shared disk. One million datapoints per second is common during our engineering validations. Why would I need a database?
Have you ever tried to analyze all the data you gathered for one type of vehicle in a given country? It would be great to find the vehicles that have this connectivity problem on the coolant temperature sensor. Please find all the discontinuities in this sensor data and tell me which vehicle is the worst.
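A minimal Python sketch of that request, assuming each vehicle's coolant temperature is already loaded as a list of `(timestamp, value)` pairs keyed by a vehicle ID. "Discontinuity" is taken here to mean either a jump larger than `max_jump` between consecutive samples or a gap in timestamps longer than `max_gap`; both thresholds, the function names and the sample data are hypothetical.

```python
from typing import Dict, List, Tuple

Sample = Tuple[float, float]  # (timestamp in seconds, temperature in °C)

def count_discontinuities(samples: List[Sample],
                          max_jump: float = 10.0,
                          max_gap: float = 2.0) -> int:
    """Count consecutive-sample pairs whose value jumps by more than max_jump
    or whose timestamps are more than max_gap apart (hypothetical thresholds)."""
    count = 0
    for (t0, v0), (t1, v1) in zip(samples, samples[1:]):
        if abs(v1 - v0) > max_jump or (t1 - t0) > max_gap:
            count += 1
    return count

def worst_vehicle(acquisitions: Dict[str, List[Sample]]) -> Tuple[str, int]:
    """Return the vehicle ID with the most discontinuities, and that count."""
    # Sort each acquisition by timestamp before comparing neighbours.
    scores = {vin: count_discontinuities(sorted(s)) for vin, s in acquisitions.items()}
    vin = max(scores, key=scores.get)
    return vin, scores[vin]

# Hypothetical data: VIN-B has a spurious reading and a 4-second dropout.
acquisitions = {
    "VIN-A": [(0, 85.0), (1, 85.2), (2, 85.1), (3, 85.3)],
    "VIN-B": [(0, 85.0), (1, -40.0), (2, 85.1), (6, 85.4)],
}
print(worst_vehicle(acquisitions))  # ('VIN-B', 3)
```

Ranking by raw count favors vehicles with longer acquisitions; normalizing by recording duration would be a reasonable refinement.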
Well, of course I’d have to open every file and write a program to do this. MATLAB can handle that. I can do it in a day or so.
Now, find all the acquisitions of vehicles that were on the Paris Ring Road at a speed over 60 kph for more than 3 minutes. Then, find where in the world the engine coolant temperature of your latest 2.0-liter gasoline engine rises above 112°C for more than 30 seconds.
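The duration conditions in both requests boil down to finding runs where a signal stays above a threshold for longer than a given time. A minimal Python sketch, assuming samples are `(timestamp in seconds, value)` pairs and measuring a run from its first to its last sample above the threshold; the geographic filter on the Paris Ring Road is left out, and the function name and data are hypothetical.

```python
from typing import List, Tuple

Sample = Tuple[float, float]  # (timestamp in seconds, value)

def episodes_above(samples: List[Sample], threshold: float,
                   min_duration: float) -> List[Tuple[float, float]]:
    """Return (start, end) of each run of consecutive samples strictly above
    threshold whose duration (last minus first timestamp) exceeds min_duration."""
    episodes = []
    start = last = None
    for t, v in sorted(samples):
        if v > threshold:
            if start is None:
                start = t          # a new run begins
            last = t               # extend the current run
        else:
            if start is not None and last - start > min_duration:
                episodes.append((start, last))
            start = None           # the run is broken
    # Close a run that lasts until the end of the acquisition.
    if start is not None and last - start > min_duration:
        episodes.append((start, last))
    return episodes

# Hypothetical coolant trace sampled every 5 s, above 112 °C from t=10 to t=50.
coolant = [(t, 115.0 if 10 <= t <= 50 else 105.0) for t in range(0, 100, 5)]
print(episodes_above(coolant, 112.0, 30.0))   # [(10, 50)]

# Hypothetical speed trace at 70 kph for 190 s: more than 3 minutes above 60 kph.
speed = [(t, 70.0) for t in range(0, 200, 10)]
print(episodes_above(speed, 60.0, 180.0))     # [(0, 190)]
```

Because the duration is measured between samples, a run can be underestimated by up to one sampling period at each end.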
The engine type is not easily accessible, but I could put it in the filename, for example. I’d have to cross-reference it with another database. For tooling, I know Python has some functions to do this. Our data scientists use them; I’ll ask them for a few tweaks.
You just said MATLAB or Python. Does that mean you can read every kind of file produced by people in your company? INCA files? CANalyzer files?
Well, there are ways to convert them into CSV files that I could read with MATLAB or Python. The only problem is the size of the CSV files: I’d need a few hundred GB of free space. And the conversion has to be done manually by someone who has the license. This task would take ages.
And I didn’t even ask you to run this on all the data produced by everyone in your company…
That wouldn’t be possible. It is very hard to exchange data between a tool used by one department and a tool used by another. To save time, we sometimes install two different acquisition systems in the same vehicle. But it would be great to be able to do that in the future!
To do that, you need a time series database. Imagine your future:
- Everyone in your company pushes their acquisitions into the same database, together with the car serial number and maybe a mission name.
- The database index is connected to your central serial number database. Selecting data by engine type or car color is easy.
- You select the engine coolant temperature of every car between two dates in one line.
- You run your analysis program. Looking elsewhere in the database is a one line change.
- If you need to industrialize the data processing, and run it on several machines, you do not need to rewrite the analysis program.
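The list above can be illustrated with a toy in-memory model in Python: each series carries a name and labels (serial number, mission), and a hypothetical registry stands in for the central serial-number database. This is not Warp 10's API; `Series`, `VEHICLES`, `select` and the data are illustrative names only.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

@dataclass
class Series:
    name: str                        # e.g. "engine.coolant.temp" (hypothetical)
    labels: Dict[str, str]           # e.g. {"vin": ..., "mission": ...}
    points: List[Tuple[int, float]] = field(default_factory=list)  # (timestamp, value)

# Hypothetical central serial-number database: VIN -> vehicle attributes.
VEHICLES = {
    "VIN-001": {"engine": "2.0-gasoline", "color": "red"},
    "VIN-002": {"engine": "1.5-diesel", "color": "blue"},
}

def select(store: List[Series], name: str, start: int, end: int,
           **vehicle_filter: str) -> List[Series]:
    """Return the series called `name`, restricted to start <= t < end, for the
    vehicles whose registry attributes match every key=value in vehicle_filter."""
    result = []
    for s in store:
        if s.name != name:
            continue
        attrs = VEHICLES.get(s.labels.get("vin", ""), {})
        if all(attrs.get(key) == want for key, want in vehicle_filter.items()):
            pts = [(t, x) for t, x in s.points if start <= t < end]
            result.append(Series(s.name, dict(s.labels), pts))
    return result

store = [
    Series("engine.coolant.temp", {"vin": "VIN-001", "mission": "summer-test"},
           [(100, 90.0), (200, 113.5)]),
    Series("engine.coolant.temp", {"vin": "VIN-002", "mission": "summer-test"},
           [(150, 88.0)]),
]
hits = select(store, "engine.coolant.temp", 0, 1000, engine="2.0-gasoline")
print([(s.labels["vin"], s.points) for s in hits])
# [('VIN-001', [(100, 90.0), (200, 113.5)])]
```

Looking at another engine type, color or date range only changes the arguments passed to `select`.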
Warp 10™ is able to fulfill all these requirements. WarpScript™ is our analysis language. It provides over 900 functions to handle time series, including outlier and pattern detection, advanced geographic functions, and all the statistical and mathematical tools you need. Warp 10™ and WarpScript™ can easily scale up from a single machine to a cluster. You can also use WarpScript™ directly from Python code, so your data scientists can benefit from its features too.
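To feed such a database, each datapoint has to be written in its ingestion format. Warp 10's GTS input format is `TS/LAT:LON/ELEV NAME{LABELS} VALUE`. The Python sketch below formats one acquisition that way, putting the serial number and mission name in labels. It assumes the platform's default microsecond time unit and omits location; the class and label names are hypothetical, and the Warp 10 documentation remains the reference for encoding details.

```python
from typing import Dict, Iterable, List, Tuple
from urllib.parse import quote

def _enc(s: str) -> str:
    """Percent-encode so characters like '{', '}', ',' and '=' cannot break the syntax."""
    return quote(s, safe="")

def gts_lines(class_name: str, labels: Dict[str, str],
              points: Iterable[Tuple[int, float]]) -> List[str]:
    """Format (timestamp_us, value) points as GTS input lines with no location
    (empty LAT:LON and ELEV, hence the '//' after the timestamp)."""
    label_str = ",".join(f"{_enc(k)}={_enc(v)}" for k, v in sorted(labels.items()))
    return [f"{ts}// {_enc(class_name)}{{{label_str}}} {value!r}"
            for ts, value in points]

# Hypothetical acquisition: one coolant temperature sample for one vehicle.
lines = gts_lines("engine.coolant.temp",
                  {"vin": "VIN-001", "mission": "summer test"},
                  [(1_600_000_000_000_000, 91.5)])
print(lines[0])
# 1600000000000000// engine.coolant.temp{mission=summer%20test,vin=VIN-001} 91.5
```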
Electronics engineer, fond of computer science, embedded solution developer.