February 2024: Warp 10 release 3.2.0

Warp 10 3.2.0 has just been released, paving the way to WarpScript debugging & profiling, and mitigating a JDK 17 related issue.

Warp 10 version 3.2.0 was released on February 29, 2024. This is an important release for multiple reasons even though the changes are barely visible from the outside.

This version brings several bug fixes for issues encountered in some specific cases. It also brings two more significant changes, each of which deserves its own section in this blog post.

Download version 3.2.0 here.

Bug fixes

For users of the distributed version of Warp 10 with multiple Directory shards (a setup you end up with when you manage series by the hundreds of millions or even by the billions, like the folks at Clever Cloud), this new release patches NullPointerExceptions which could be thrown when producing HFiles.

When deleting data, the /api/v0/delete endpoint did not honor the activeAfter and quietAfter parameters when performing a dry run; this is now fixed.

On the WarpScript side of things, the function UNBUCKETIZE.CALENDAR has been fixed to handle empty buckets correctly.

The configuration files of your Warp 10 instance are now read in the numeric order of the prefixes appearing before the first -, so 999-xxx.conf will be placed after 99-xxx.conf.
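To illustrate what ordering by numeric prefix means, here is a minimal Java sketch. It is not the code used by Warp 10: the class ConfigOrder, its method names, the choice to sort names without a numeric prefix last, and the tie-break on the full file name are all assumptions made for the example. It also shows a case where numeric and plain string ordering differ, since a string sort would place 100-b.conf before 99-xxx.conf.

```java
import java.util.ArrayList;
import java.util.List;

class ConfigOrder {
    // Numeric value of the prefix found before the first '-'.
    // Assumption for this sketch: names without a numeric prefix sort last.
    static long numericPrefix(String fileName) {
        int dash = fileName.indexOf('-');
        if (dash <= 0) {
            return Long.MAX_VALUE;
        }
        try {
            return Long.parseLong(fileName.substring(0, dash));
        } catch (NumberFormatException e) {
            return Long.MAX_VALUE;
        }
    }

    // Returns a copy of the names sorted by numeric prefix.
    // Ties are broken on the full name (an assumption, not documented behavior).
    static List<String> sortByNumericPrefix(List<String> names) {
        List<String> sorted = new ArrayList<>(names);
        sorted.sort((a, b) -> {
            int byPrefix = Long.compare(numericPrefix(a), numericPrefix(b));
            return byPrefix != 0 ? byPrefix : a.compareTo(b);
        });
        return sorted;
    }

    public static void main(String[] args) {
        List<String> files = List.of("999-xxx.conf", "100-b.conf", "99-xxx.conf", "9-a.conf");
        // Prints [9-a.conf, 99-xxx.conf, 100-b.conf, 999-xxx.conf]
        System.out.println(sortByNumericPrefix(files));
    }
}
```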

Paving the way for the Trace Plugin

Working with WarpScript code (any code, as a matter of fact) can prove cumbersome when your program does not behave as you intended. For those situations, tooling is important so you can better understand how your code is executed and identify the issue as fast as possible.

In 2019, we started working on a module for debugging WarpScript code. This work was put on hold on various occasions because we had higher priorities, but last year we decided it was high time this became a high priority too. And here we are: this tool, for now named the trace plugin, is almost finished.

It should be available by the end of Q1 2024 to our customers and will be deployed on the Warp 10 sandbox within the next two weeks.

The tool will allow you to do step-by-step execution of your WarpScript code, visualizing and interacting with the intermediate states. It will also allow you to profile your WarpScript code to identify where the time is spent and possibly help you optimize your code.

This feature is not directly present in this release but relies on internal changes, namely statement wrapping, which are introduced by version 3.2.0. This means that Warp 10 3.2.0 will be the lowest version supported by the trace plugin.

Stay tuned for exciting news about that great tool!

Staying sane when using JDK 17!

Lastly, release 3.2.0 adds a check to determine if Warp 10 is run using Java 17 and if so ensures that specific configuration is applied to mitigate a nasty issue that only happens with this version of Java.

Shortly after we rolled out Warp 10 3.0 beta0, we had a chat with a user who encountered surprising errors while starting Warp 10.

To better understand the context, Warp 10 assigns IDs to Geo Time Series based on a computation performed on their class name and set of labels. This computation is performed extensively at various locations within the code base and it is therefore very important that it be fast.

The most expensive part is the computation of the labels-related ID. You can read the code of the labelsId method and see for yourself that it deals with UTF-8 decoding in a rather optimized way to avoid allocating too many objects in the process. This code dates back to the very beginning of Warp 10 and has changed only slightly over the years. This part of the code is considered very robust, being probably one of the most called methods within the code base and having been used in production for years without any issues.

The IDs that are computed for each GTS are attached to their metadata and used as part of the storage keys for both metadata and data persistence. For safety purposes, when reading the metadata from persistent storage at Directory startup, the IDs are recomputed and the values are compared with those persisted on disk. If differences are detected, error messages are emitted, with Warp 10 possibly aborting depending on its configuration.
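The startup check described above can be sketched in a few lines of Java. This is only an illustration under stated assumptions: the class IdCheck and its methods are hypothetical, the hash is a stand-in (64-bit FNV-1a) and not the algorithm Warp 10 uses, and the way labels are serialized before hashing is invented for the example. The point is the shape of the check: recompute the IDs from the class name and labels, then compare them with the stored values.

```java
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.TreeMap;

class IdCheck {
    // Stand-in hash (64-bit FNV-1a). NOT the hash used by Warp 10.
    static long hash(byte[] data) {
        long h = 0xcbf29ce484222325L;
        for (byte b : data) {
            h ^= (b & 0xffL);
            h *= 0x100000001b3L;
        }
        return h;
    }

    static long classId(String className) {
        return hash(className.getBytes(StandardCharsets.UTF_8));
    }

    // Labels are sorted by name so the ID does not depend on insertion order.
    // The NUL-separated serialization is an assumption made for this sketch.
    static long labelsId(Map<String, String> labels) {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, String> e : new TreeMap<>(labels).entrySet()) {
            sb.append(e.getKey()).append('\0').append(e.getValue()).append('\0');
        }
        return hash(sb.toString().getBytes(StandardCharsets.UTF_8));
    }

    // Recomputes both IDs and compares them with the values read from storage.
    static boolean isConsistent(String className, Map<String, String> labels,
                                long storedClassId, long storedLabelsId) {
        return classId(className) == storedClassId
            && labelsId(labels) == storedLabelsId;
    }

    public static void main(String[] args) {
        Map<String, String> labels = Map.of("host", "h1", "dc", "eu");
        long storedClassId = classId("temperature");
        long storedLabelsId = labelsId(labels);
        // A miscomputed labels ID, as in the JDK 17 issue, would make this check fail.
        System.out.println(isConsistent("temperature", labels, storedClassId, storedLabelsId));
    }
}
```

Because the IDs are part of the storage keys, a single miscomputed value is enough to make such a check fail, which is why a rare JIT-related miscalculation surfaces as a startup error.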

Given the role of those IDs in the coherency of the data, you understand that their correct computation is very important. That is why the error reported by that early user of Warp 10 3.0 was taken very seriously. We immediately tried to understand where it was coming from.

Our extensive testing did not allow us to reproduce the error in a deterministic manner. It only ever appeared when using version 17 of the JDK. The random nature of the occurrences led us to think the issue might be related to non-deterministic processes within the JVM, namely the GC (Garbage Collector), the code inlining, and the JIT (Just In Time compiler). Bumping the heap configuration did not solve the issue, which made the GC an unlikely culprit and left the JIT and code inlining as the last suspects on our list.

We did multiple runs with various settings for both the inlining and the JIT compiler, varying the threshold at which they would kick in. This seemed to confirm we were spot on. When inlining and JIT were not used, the problem never appeared! We ran other tests with a slightly modified version of the labelsId method code and realized that the issue was due to an incorrect write-back of data to a buffer. To put it in simpler terms, this issue led to computing the hash on unintended content. This was bad.

We read the code of the JDK extensively for the methods we were calling within the labelsId method and decided to add a call to a method in CharBuffer which did nothing in the case of our use of that class and… it made the issue go away! Without going into the details of how the JIT compiler works (our understanding of which is somewhat limited), it seems the otherwise useless call introduced a call site which made the JIT compiler behave differently.

This change seemed to solve the issue of our user and was included in the 3.0.0 release.

Then, we did not hear about that same issue for quite some time. But in late 2023, a similar issue was reported to us, again only related to JDK 17. That is when we introduced a configuration key to select another, slower, implementation of the labels ID computation. Setting the key (labelsid.slowimpl) to true solved the issue as the code which was triggering the JIT issue was no longer executed.

And that brings us to the 3.2.0 release. To keep users from bumping into that JDK 17 related issue, we changed the behavior of Warp 10 so that, when run using JDK 17, it requires the labelsid.slowimpl configuration to be set to true. We could have forced that configuration silently, but decided it was better to warn our users: using JDK 17 can trigger an issue we are aware of and have a way to circumvent, and there may be other occurrences of the same underlying problem that no one has reported yet.

So if you run Warp 10 with JDK 17, the launch will fail if you have not set labelsid.slowimpl to true in the configuration.
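The logic of that startup check can be sketched as follows. This is not the code shipped in Warp 10: the class Jdk17Guard, its check method and the error message are hypothetical, and only the labelsid.slowimpl key comes from the release. The Java version is passed as a parameter so the behavior can be exercised on any JDK.

```java
import java.util.Properties;

class Jdk17Guard {
    static final String SLOW_IMPL_KEY = "labelsid.slowimpl";

    // Refuses to start on Java 17 unless the slower labels ID
    // implementation has been explicitly enabled in the configuration.
    static void check(int javaFeatureVersion, Properties conf) {
        boolean slowImpl = "true".equals(conf.getProperty(SLOW_IMPL_KEY));
        if (javaFeatureVersion == 17 && !slowImpl) {
            throw new IllegalStateException(
                "Running on Java 17 requires '" + SLOW_IMPL_KEY + "' to be set to true");
        }
    }

    public static void main(String[] args) {
        Properties conf = new Properties();
        conf.setProperty(SLOW_IMPL_KEY, "true");
        // Runtime.version().feature() returns the major Java version (Java 10 and later).
        check(Runtime.version().feature(), conf);
        System.out.println("startup check passed");
    }
}
```

Failing fast rather than silently switching implementations matches the choice described above: the operator is made aware that JDK 17 is in use and has to opt in explicitly.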

Takeaways

Our final recommendation, in the face of such a rare, complex and still unexplained issue, is to avoid JDK 17 and go with either JDK 8, JDK 11 or JDK 21. As to whether or not we filed an issue, the answer is no. We did not open an issue for the JDK as we do not have a 100% reliable way of reproducing the problem 🙁

Have you encountered any issues, or do you have suggestions for new features? Join the discussion in the Warp 10 Lounge! You can also contact our sales team if you would like SenX to help you directly. And lastly, stay tuned for the upcoming release of the Trace Plugin!
