Deploying resource intensive analytics services

When deploying analytics services, we can run into resource allocation issues. In this article, we cover some of the built-in Warp 10 mechanisms that solve them.

With Warp 10, deploying a service based on time series analytics is simple. All we need to do is encapsulate a program in a macro and put it on a findable path under the server’s macro folder root.

The service can be anything that can be executed inside a WarpScript macro – that is, anything that is computable. Indeed, Warp 10 has the ability to call any system subprogram using the CALL function.

Then, to call the deployed service, just hit the WarpScript (or FLoWS) endpoint, for example with WarpStudio or any HTTP client, and execute the macro:

[ someArguments ... ] @path/to/my/macro
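For the HTTP client route, here is a minimal Python sketch of what such a call could look like. It assumes a Warp 10 instance reachable at http://localhost:8080 that exposes the WarpScript endpoint at /api/v0/exec. The host, the port and the helper names build_exec_request and call_macro are illustrative assumptions, not part of the original setup.

```python
import json
import urllib.request


def build_exec_request(base_url: str, warpscript: str) -> urllib.request.Request:
    """Build a POST request that submits WarpScript code to the exec endpoint."""
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/api/v0/exec",  # assumed endpoint path
        data=warpscript.encode("utf-8"),            # the script is the request body
        method="POST",
    )


def call_macro(base_url: str, warpscript: str):
    """Send the script and return the resulting stack, decoded from JSON."""
    with urllib.request.urlopen(build_exec_request(base_url, warpscript)) as resp:
        return json.loads(resp.read().decode("utf-8"))


# Building the request does not touch the network; call_macro() would.
# The base URL is a hypothetical local instance.
request = build_exec_request(
    "http://localhost:8080", "[ someArguments ... ] @path/to/my/macro"
)
```

Keeping the request construction separate from the network call makes the client easy to check without a running server.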

Fine then. For most macros, there is nothing more to do. It is deployed and ready to use. Some are even scheduled to run automatically if put in the right folder.

Automatic resource allocation

The concern is with resource-intensive macros, the ones that can be very time-consuming or that require a lot of CPU or GPU power or RAM. In a multi-tenant environment, being able to manage the available resources is crucial, or else a few resource-hungry threads can take a server hostage.

Fortunately, in most situations, for example when a macro relies only on functions of the WarpLib core, Warp 10 already takes care of that management. Configuration parameters set limits on a WarpScript execution to keep it within bounds deemed reasonable.

Fine-tuned resource allocation

Sometimes we do want to deploy services that are resource-intensive. For example, the concurrent EVAL extension allows implementing multi-threaded scripts, and the CALL function allows executing any subprogram.

The default resource allocation mechanism is based on a queue. For example, the CALL function takes a parameter that defines the maximum number of threads the subprogram can launch simultaneously. When a user calls a service implemented with CALL and that maximum capacity is reached, the execution waits in a queue and resumes once capacity becomes available.

It is also worth pointing out that a macro’s execution time can be bounded with the TIMEBOX function, and that the maximum runtime can be managed on a per-user basis using token capabilities.

Singleton execution

Queuing up executions can be a good solution. But sometimes it is preferable to raise an exception to notify that the service is busy. For this purpose, we can design a singleton execution pattern, by raising an error if the deployed macro is already running.

To do that, we will use the MUTEX function from the SHM (shared memory) extension. Under the protection of a mutex, we share a variable across WarpScript executions that tells whether the macro is already being executed.

Here is an example of a deployed macro with singleton execution:

//
// With a shared variable we can constrain the deployed macro to singleton use
//
<%
  //
  // Try to take the lock on the isAlive variable, or raise an exception
  //
  <%
    'isAlive' SHMDEFINED
    <% 'The macro is already executing right now' MSGFAIL %> IFT
    true 'isAlive' SHMSTORE
  %>
  'my.mutex' MUTEX

  //
  // Some hungry macro
  //
  <% ->PICKLE ->HEX 'yummy-gpu.py' CALL %>

  //
  // Always release the lock at this point.
  // The finally block of TRY ensures the release even if the macro threw an error
  //
  <% %>
  <%
    <% NULL 'isAlive' SHMSTORE %> 'my.mutex' MUTEX
    RETHROW
  %>
  TRY
%>

This pattern ensures that only one execution of this macro can run at a time.
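On the client side, a caller may want to tell a "service busy" failure apart from other errors. The following Python sketch assumes that a failed WarpScript execution is reported with an HTTP 5xx status and that the failure message is carried in an X-Warp10-Error-Message response header. The helper name is_service_busy and the sample values are hypothetical.

```python
from typing import Mapping

# Message raised by the singleton macro when it is already running.
BUSY_MESSAGE = "The macro is already executing right now"


def is_service_busy(status: int, headers: Mapping[str, str]) -> bool:
    """Return True when the response looks like the singleton 'busy' failure."""
    message = ""
    for name, value in headers.items():
        # Header names are case-insensitive, so compare in lower case.
        if name.lower() == "x-warp10-error-message":
            message = value
    # Assumption: a script failure surfaces as a 5xx status with the
    # MSGFAIL text somewhere inside the error header.
    return status >= 500 and BUSY_MESSAGE in message


# Hypothetical response values, for illustration only.
busy = is_service_busy(
    500, {"X-Warp10-Error-Message": "The macro is already executing right now"}
)
```

A caller can then decide to retry later instead of treating the failure as a bug in the service.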

Execution tickets

A singleton execution can be limiting. What if we want to allow up to a certain number of concurrent executions, and raise an error once that capacity is reached? The implementation is almost the same as in the singleton case, except that instead of checking for the existence of a shared variable, we keep track of a counter. This counter can be seen as a number of available execution tickets.

Here is what adapting the previous code gives:

//
// We keep track of a pool of 8 execution tickets
//
<%
  <%
    //
    // We define the maximum number of concurrent runs to allow,
    // here we choose 8
    //
    'tickets' SHMDEFINED NOT
    <% 8 'tickets' SHMSTORE %> IFT

    //
    // We check if a ticket is available. If not, we raise an exception
    //
    'tickets' SHMLOAD 0 <=
    <% 'No more available running slot. Wait for a run to finish.' MSGFAIL %> IFT

    //
    // We reserve a ticket
    //
    'tickets' SHMLOAD 1 - 'tickets' SHMSTORE
  %>
  'my.mutex' MUTEX

  //
  // Some hungry macro
  //
  <% ->PICKLE ->HEX 'yummy-gpu.py' CALL %>

  //
  // Release the ticket and rethrow the error, if any
  //
  <% %>
  <%
    <% 'tickets' SHMLOAD 1 + 'tickets' SHMSTORE %> 'my.mutex' MUTEX
    RETHROW
  %>
  TRY
%>
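To reason about the ticket logic outside Warp 10, it can help to model it in a few lines of Python. This is only an illustrative sketch, not how Warp 10 implements anything: a threading.Lock stands in for the mutex, an integer stands in for the shared tickets variable, and the TicketPool class is a hypothetical name.

```python
import threading


class TicketPool:
    """Fail-fast pool of execution tickets (a model of the pattern, not Warp 10 code)."""

    def __init__(self, capacity: int) -> None:
        self._lock = threading.Lock()  # stands in for the named mutex
        self._tickets = capacity       # stands in for the shared 'tickets' variable

    @property
    def available(self) -> int:
        with self._lock:
            return self._tickets

    def acquire(self) -> None:
        with self._lock:
            # Refuse immediately instead of queuing when no ticket is left.
            if self._tickets <= 0:
                raise RuntimeError(
                    "No more available running slot. Wait for a run to finish."
                )
            self._tickets -= 1

    def release(self) -> None:
        with self._lock:
            self._tickets += 1

    def run(self, job):
        # Acquire outside the try block: a refused run never took a ticket,
        # so it must not give one back.
        self.acquire()
        try:
            return job()
        finally:
            # Always return the ticket, even if the job raised.
            self.release()


pool = TicketPool(8)
result = pool.run(lambda: "done")
```

The important detail is that the ticket is reserved before the protected block starts, so the release step only ever runs for executions that actually hold a ticket.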

Hiding the resource intensive part

In the previous examples, the hungry macros are explicitly defined in the same script. If they were instead called with the syntax @path/to/my/macro, we should make sure that the path cannot be guessed (for example, by using a randomly generated subfolder name). Otherwise, a user of the service could call the hungry macro directly, without the encapsulating mechanism we just discussed.

By the way, the same applies to the executable file called by CALL. In the case of a function (for example, if the hungry component is a function from an extension), one can use the shadow extension.

Conclusion

Warp 10 has many built-in features to support a multi-tenant environment. Some are automatic, while others need some fine-tuning, as covered in this article.

In many ways, Warp 10 is much more than a time series database. The topic of this blog post is just another example.