Migrating 38 TB of data from one Warp 10 server to another. How do you do it? Is it more complex than migrating a 38 TB SQL database?
Migrating a database from one server to another can be painful.
Recently, this article describing how to migrate a 40 TB SQL database to another SQL database caught my attention.
It took the author an enormous effort to migrate what looks like time series data (HAProxy traffic logs). With Warp 10 in the same situation, the migration would have been a lot simpler.
For years, you have been storing data on a Warp 10 standalone instance. You have reached the limit of your disk space: only a few TB remain out of 40 TB. You do not want to delete anything. For this application, you have just one token.
host: oldserver
Warp 10 port: 8080
read token: OldReadToken
You bought 10 new servers and decided to move to the Warp 10 distributed version. Everything is ready for production.
ingress host: newserver
Warp 10 port: 8080
write token: NewWriteToken
1/ Redirect your traffic log output to the new server.
2/ If they are not on the same network, set up a tunnel between the old server and the new one. The more bandwidth, the better.
3/ Launch curl:
curl -H "X-Warp10-Token: OldReadToken" \
  "http://oldserver:8080/api/v0/fetch?start=1970-01-01T00%3A00%3A00.000Z&stop=2020-11-16T00%3A00%3A00.000Z&selector=~.*%7B%7D&format=text&dedup=false" \
| curl -T - -H "Transfer-Encoding: chunked" -H "X-Warp10-Token: NewWriteToken" \
  "http://newserver:8080/api/v0/update"
Basically, read all from 1970 to now, and push it to the new server.
You do not need any extra disk space. You do not need a lot of RAM either, thanks to chunking.
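Before committing to a multi-hour transfer, it can be worth validating both tokens and the pipeline on a small time window first. A minimal sketch, assuming the placeholder hosts and tokens above; the helper function and the one-day window are my additions, not part of the original procedure:

```shell
# Build the fetch URL for a given time window. The selector ~.*{}
# (URL-encoded as ~.*%7B%7D) matches every series the token can read.
fetch_url() {
  local start="$1" stop="$2"
  echo "http://oldserver:8080/api/v0/fetch?start=${start}&stop=${stop}&selector=~.*%7B%7D&format=text&dedup=false"
}

# Dry run on a single day before launching the full range:
#   curl -H "X-Warp10-Token: OldReadToken" \
#     "$(fetch_url 2020-11-15T00%3A00%3A00.000Z 2020-11-16T00%3A00%3A00.000Z)" \
#   | curl -T - -H "Transfer-Encoding: chunked" \
#     -H "X-Warp10-Token: NewWriteToken" "http://newserver:8080/api/v0/update"
```

If the dry run lands the expected data points on the new server, the full command is just the same pipeline with the 1970 start date.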
During my tests between two old computers on my local network, with a 7 GB LevelDB directory, it took 50 minutes to output 36 GB of text, at 12 MB/s. Wi-Fi was not the bottleneck.
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 36.3G    0 36.3G    0     0  12.2M      0 --:--:--  0:50:33 --:--:-- 11.2M
On a gigabit network, the limit may be your old server's read throughput. Depending on your hardware, you can surely reach 50 MB/s.
|      | Small test | Serious one |
|------|------------|-------------|
| Time | 50 minutes | ~45 days    |
My test dataset contains a lot of Geo Time Series with location and elevation, which makes the text output verbose. Your data may be denser, so it may well take less than 45 days with just a curl command. You can also cut the time roughly in half by running the migration on two threads. With spinning disks, though, more threads will mostly increase seek time.
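A rough sketch of the two-thread variant, assuming the placeholder hosts and tokens above; the function name and the split date are arbitrary choices of mine:

```shell
# Migrate one time slice from the old server to the new one.
migrate() {
  local start="$1" stop="$2"
  curl -H "X-Warp10-Token: OldReadToken" \
    "http://oldserver:8080/api/v0/fetch?start=${start}&stop=${stop}&selector=~.*%7B%7D&format=text&dedup=false" \
  | curl -T - -H "Transfer-Encoding: chunked" \
    -H "X-Warp10-Token: NewWriteToken" \
    "http://newserver:8080/api/v0/update"
}

# Two workers, each on half of the time range. The split date below is
# an arbitrary example: pick one that roughly balances the data volume.
#   migrate "1970-01-01T00%3A00%3A00.000Z" "2010-01-01T00%3A00%3A00.000Z" &
#   migrate "2010-01-01T00%3A00%3A00.000Z" "2020-11-16T00%3A00%3A00.000Z" &
#   wait
```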
Now, back to the article that caught my attention: moving a 40 TB SQL database meant an 11-month project, tons of specific scripts, lots of development hours, an 8437-word blog article, and an estimated 40 minutes just to read it!
Q: I have several tokens on my database!
A: Assuming you also recreated the tokens on your new server, iterate over them with curl. If your new server shares the same secrets, the tokens will be identical and the migration even easier. If you are familiar with tokens, you can also create some super read/write tokens.
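Such an iteration can be sketched as a simple loop, assuming you maintain a list of matching read/write token pairs; the pair values below are placeholders:

```shell
# Hypothetical "readToken:writeToken" pairs, one per application.
# Replace with the tokens you actually generated on each side.
TOKEN_PAIRS="OldReadTokenA:NewWriteTokenA OldReadTokenB:NewWriteTokenB"

for pair in $TOKEN_PAIRS; do
  read_token="${pair%%:*}"   # part before the colon
  write_token="${pair#*:}"   # part after the colon
  curl -H "X-Warp10-Token: ${read_token}" \
    "http://oldserver:8080/api/v0/fetch?start=1970-01-01T00%3A00%3A00.000Z&stop=2020-11-16T00%3A00%3A00.000Z&selector=~.*%7B%7D&format=text&dedup=false" \
  | curl -T - -H "Transfer-Encoding: chunked" \
    -H "X-Warp10-Token: ${write_token}" \
    "http://newserver:8080/api/v0/update" \
  || echo "migration failed for token ${read_token}" >&2
done
```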
Q: I want to migrate Warp 10 standalone to another Warp 10 standalone, not to a distributed one!
A: Copy all Warp 10 files from the old server to the new one using rsync while the old server is still running. Then stop the old server and run rsync again (this second pass should take just a few minutes). Start Warp 10 on the new server. Enjoy.
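The rsync procedure can be sketched as follows; the install directory is an assumption and may differ on your installation:

```shell
# Hypothetical install directory, assumed identical on both hosts.
WARP10_HOME=/opt/warp10

warm_sync() {
  # First pass while the old server is still running: moves the bulk
  # of the data, and may take a long time for tens of TB.
  rsync -a "${WARP10_HOME}/" newserver:"${WARP10_HOME}/"
}

final_sync() {
  # After stopping Warp 10 on the old server, resync only the delta;
  # this second pass should take just a few minutes.
  rsync -a --delete "${WARP10_HOME}/" newserver:"${WARP10_HOME}/"
}
```

Run `warm_sync`, stop Warp 10 on the old server, run `final_sync`, then start Warp 10 on the new server.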
Q: I don't want any service interruption!
A: Contact us. There are some solutions.
Q: Bandwidth is an issue for me. Can't we compress text?
A: Sure you can! Pipe the output through gzip before piping it to the second curl, and add the matching Content-Type header to the second curl:
-H "Content-Type: application/gzip".
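Concretely, the full pipeline with compression could look like this; it is wrapped in a function for readability, and the hosts and tokens are the placeholders used above:

```shell
# Same read/write pipeline as before, but gzipping the text stream in
# transit to save bandwidth.
migrate_gzip() {
  curl -H "X-Warp10-Token: OldReadToken" \
    "http://oldserver:8080/api/v0/fetch?start=1970-01-01T00%3A00%3A00.000Z&stop=2020-11-16T00%3A00%3A00.000Z&selector=~.*%7B%7D&format=text&dedup=false" \
  | gzip \
  | curl -T - -H "Transfer-Encoding: chunked" \
      -H "Content-Type: application/gzip" \
      -H "X-Warp10-Token: NewWriteToken" \
      "http://newserver:8080/api/v0/update"
}
```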
I can feel your disappointment: just a little 600-word article, without even a complex WarpScript or a huge shell script to explain. I am sorry. Migrating from Warp 10 to Warp 10 is just too simple.