Deploying a Distributed Warp 10 instance

The Distributed version of Warp 10 coordinates multiple processes on multiple servers. This article explains the deployment process.

Introduction

With its 3.0 release, the Warp 10 platform comes in multiple flavors. Most projects can start with the standalone version and then migrate to a standalone+ deployment when usage grows. But when your usage outgrows what a standalone+ instance can handle, it is time to play in the big league and deploy a distributed instance of Warp 10.

This blog post will show you how this can be done and will give you hints on how to scale a distributed instance.

Preparing the installation

The distributed version of Warp 10 is designed to scale to millions of writes and reads per second, billions of series, and trillions of data points. To achieve such a high level of performance it relies on external components, namely Kafka for message passing between the distributed elements and FoundationDB for data persistence.

Deploying a distributed Warp 10 instance therefore requires that those external components be available, whether as managed services or as deployments you operate yourself.

ZooKeeper deployment

The first external component that is required is a ZooKeeper ensemble. This is needed to synchronize the Kafka brokers and for the registration of Warp 10 Directory instances.

Deployment of ZooKeeper is rather straightforward. For testing purposes, a single ZooKeeper server can be set up. And for production environments, it is recommended to run at least three servers to ensure fault tolerance.

For the rest of this article, we will assume that your ZooKeeper ensemble is available at zka.zkb.zkc.zkd:2181.

Unless you have specific requirements, we advise you to select the latest stable release of ZooKeeper, 3.7.1 at time of writing.
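
As a rough sketch, a zoo.cfg for a three-server ensemble could look like the following. The hostnames zk1/zk2/zk3.example.org and the data directory are placeholders to adapt to your environment, and each server also needs a matching myid file in its data directory:

# Minimal zoo.cfg sketch for a three-server ensemble (hostnames and dataDir are placeholders)
tickTime=2000
initLimit=10
syncLimit=5
dataDir=/var/lib/zookeeper
clientPort=2181
server.1=zk1.example.org:2888:3888
server.2=zk2.example.org:2888:3888
server.3=zk3.example.org:2888:3888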

Kafka deployment

The next component which is needed is a Kafka cluster. The role of Kafka is to act as a buffer between the various distributed daemons of Warp 10.

As for ZooKeeper, a single broker can be deployed for testing purposes but a multi-broker infrastructure is highly recommended for production environments.

Warp 10 3.0 embeds version 3.x of the Kafka client, so a broker in any compatible version will do. We recommend you use a recent stable version of Kafka to avoid potential bugs. Version 3.5.0 is known to work.

Configure Kafka to use the ZooKeeper ensemble you just deployed, and add the following line to the server.properties file:

zookeeper.connect = zka.zkb.zkc.zkd:2181
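
Beyond zookeeper.connect, a broker needs at least an id, a listener, and a log directory. A minimal server.properties sketch could look like the following; the broker id, listener address, and log directory are placeholders for your environment:

# Minimal server.properties sketch (broker id, listener address and log directory are placeholders)
broker.id=0
listeners=PLAINTEXT://ka.kb.kc.kd:9092
log.dirs=/var/lib/kafka/logs
zookeeper.connect=zka.zkb.zkc.zkd:2181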

Kafka topics

There are a few important configurations to take into account. The first one is that the log cleanup policy for all Warp 10 topics should be set to delete and not to compact; this ensures the daemons see every message.

The second one is to define a number of partitions consistent with the level of parallelism you plan to achieve. Topic consumers cannot outnumber the topic partitions, so plan accordingly.

With that in mind, you need to create the following topics for Warp 10:

data
metadata
runner
throttling

We recommend you use a minimum of 16 partitions per topic to ensure you can scale your instance when needed without having to repartition the topics.

The Kafka cluster is assumed to be reachable at ka.kb.kc.kd:9092 for the rest of this article.
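
As an illustration, the topics can be created with the kafka-topics.sh tool shipped in the bin directory of the Kafka distribution. The replication factor of 1 below is only suitable for a single-broker test setup, and the topic names must match those configured in your Warp 10 instance:

# Create the Warp 10 topics with 16 partitions each (replication factor 1 is for a test setup only)
for topic in data metadata runner throttling
do
  kafka-topics.sh --create \
    --bootstrap-server ka.kb.kc.kd:9092 \
    --topic "${topic}" \
    --partitions 16 \
    --replication-factor 1 \
    --config cleanup.policy=delete
done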

FoundationDB

As for standalone+ instances, distributed Warp 10 deployments use FoundationDB as their persistence layer.

In order to deploy a FoundationDB cluster you need to install the foundationdb-clients and foundationdb-server packages from a FoundationDB release on the machines of the cluster. Warp 10 3.0 supports FoundationDB 7.1, so choose a release 7.1.x from the release page. At the time of writing, release 7.1.33 has an issue with the packages provided for Ubuntu, so you may pick 7.1.31 instead to spare yourself some headaches.

The foundationdb-server package should be deployed on all nodes running FoundationDB, and the foundationdb-clients package both on the FoundationDB nodes and on the nodes where Warp 10 daemons run.
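
On a Debian or Ubuntu system, installing the packages downloaded from the release page could look like this; the exact file names depend on the release and architecture you picked:

# Install the FoundationDB packages (file names depend on the chosen release and architecture)
sudo dpkg -i foundationdb-clients_7.1.31-1_amd64.deb
sudo dpkg -i foundationdb-server_7.1.31-1_amd64.deb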

The configuration of FoundationDB as a cluster of multiple machines is not covered in this article. We assume you have a working deployment of FoundationDB at this point.

Initialize the database

Once your FoundationDB instance is up and running, you need to initialize the database which will store the data for Warp 10. This is done by issuing the following statement in the fdbcli utility:

configure new single ssd-redwood-1-experimental

The recommended storage engine for Warp 10 is ssd-redwood-1-experimental as it provides prefix compression, which lowers the storage footprint of the data. Note that this engine is no longer considered experimental despite experimental being part of its name; the suffix should be dropped in the next release, as Snowflake has now tested the engine extensively.

Of course, if your FoundationDB deployment supports data replication you should replace single with whatever replication level you want to use.
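
For instance, on a cluster with enough machines for double replication, the initial configuration sketch above would become:

configure new double ssd-redwood-1-experimental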

If you intend to use the FoundationDB cluster to store data from multiple Warp 10 instances, or if you want to be able to assess the storage consumption of the various users of your distributed deployment, you need to configure the cluster to require the use of tenants. This is done by issuing the following command in fdbcli:

configure tenant_mode=required_experimental

Deploying a Warp 10 distributed instance

The easiest way to deploy a distributed instance is to first validate that everything works with a single node assuming all the roles.

We assume Warp 10 has been unarchived in WARP10_HOME. Initialize a distributed deployment by issuing:

WARP10_HOME/bin/warp10.sh init distributed

You then need to copy the fdb.cluster file from any of your FoundationDB cluster nodes (located in /etc/foundationdb/fdb.cluster) into the WARP10_HOME/etc directory and add a few lines to the Warp 10 configuration file WARP10_HOME/etc/conf.d/99-init.conf:

### Name the Warp 10 instance
warp10.instance = distributed-test

### Define how to reach ZooKeeper
zk.quorum = zka.zkb.zkc.zkd:2181

### Define how to reach Kafka
kafka.bootstrap.servers = ka.kb.kc.kd:9092

### Declare the components to run
components = ingress,directory,store,egress,runner

### Declare the tenants to use for FoundationDB
directory.fdb.tenant = distributed-test
store.fdb.tenant = ${directory.fdb.tenant}
egress.fdb.tenant = ${store.fdb.tenant}
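
The copy of the fdb.cluster file mentioned above can be done, for example, with scp; fdb-node is a placeholder and SSH access to a FoundationDB node is assumed:

# Copy the cluster file from a FoundationDB node (fdb-node is a placeholder)
scp fdb-node:/etc/foundationdb/fdb.cluster WARP10_HOME/etc/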

Note that the tenant distributed-test must be created on FoundationDB using fdbcli:

createtenant distributed-test

You can now launch the Warp 10 instance using:

WARP10_HOME/bin/warp10.sh start
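
To check that all the components came up correctly, you can follow the Warp 10 log file, assuming the default log location under WARP10_HOME/logs:

# Follow the Warp 10 log to check that all declared components started (default log location assumed)
tail -f WARP10_HOME/logs/warp10.log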

Testing your instance

Now that your instance is running, you can test it. For that, you will need read and write tokens. Demo tokens can easily be generated using the following command:

WARP10_HOME/bin/warp10.sh tokengen WARP10_HOME/tokens/demo-tokengen.mc2 

The output of this command will be similar to:

[{
  "ident" : "ffbc7d148be0b36c",
  "id" : "DemoWriteToken",
  "token" : "....."
},{
  "ident" : "47c32ff41cd66fa9",
  "id" : "DemoReadToken",
  "token" : "....."
}]

The read and write tokens are the values associated with the key token in both JSON maps. We will refer to those tokens as READ and WRITE.
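
If you want to script this step, and assuming the tokengen output is valid JSON and jq is installed, the tokens can for instance be extracted as follows:

# Extract the demo tokens from the tokengen output (assumes the output is valid JSON)
WARP10_HOME/bin/warp10.sh tokengen WARP10_HOME/tokens/demo-tokengen.mc2 > /tmp/demo-tokens.json
WRITE=$(jq -r '.[] | select(.id == "DemoWriteToken") | .token' /tmp/demo-tokens.json)
READ=$(jq -r '.[] | select(.id == "DemoReadToken") | .token' /tmp/demo-tokens.json)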

To test writing data to your Warp 10 instance, issue the following command:

curl -H 'X-Warp10-Token: WRITE' --data-binary '0// test{} 42' http://127.0.0.1:8080/api/v0/update

And to read the data point you just wrote, run:

curl -g 'http://127.0.0.1:8080/api/v0/fetch?token=READ&selector=test{}&now=0&count=1'

If you see the value 42 at timestamp 0 of the GTS test{}, your deployment of a distributed Warp 10 instance was successful, congratulations!

Scaling the instance

With a single daemon assuming all roles, your distributed Warp 10 instance is not yet really distributed and will therefore be limited in terms of performance.

In order to increase performance, you will need to launch multiple instances of some of the roles. For a start, you can launch two instances of the ingress, directory, store and egress components.

Launching multiple instances of roles requires that some Kafka group ids be modified in the configuration.

Configuration key                 | Component | Comment
store.kafka.data.groupid          | store     | Use the same group id for all instances
directory.kafka.metadata.groupid  | directory | Use a separate group id for each instance (unless setting up shards, in which case one per shard)
ingress.kafka.metadata.groupid    | ingress   | Use a different group id per instance (if used)
ingress.kafka.throttling.groupid  | ingress   | Use a different group id per instance (if used)
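
As an illustration, two directory instances and the store instances could be configured as follows; the group id values themselves are arbitrary placeholders:

### On the first directory instance
directory.kafka.metadata.groupid = directory-1

### On the second directory instance
directory.kafka.metadata.groupid = directory-2

### On every store instance, keep the same group id so the topic partitions are shared
store.kafka.data.groupid = store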

For egress instances, parallel scanners can be enabled; see the recommended configuration for the standalone+ version.

There are many more configurations you can tweak to scale a distributed Warp 10 instance, but those changes are usually needed only when your traffic gets quite intense.

Conclusion

This article walked you through the process of deploying a distributed Warp 10 instance. As part of the 3.0 release we tried to simplify the process as much as possible, but any distributed infrastructure has some inherent complexity.

In case you need help you can join the Warp 10 community on the Lounge or reach out to SenX to work with our professional services team.