Warp 10™ is known for its ability to handle hundreds of millions of Geo Time Series™. This post takes you on a journey across the components and functions involved in managing the metadata of those series.
Meet the Directory
Each Geo Time Series™ has attached metadata. The combination of class and labels uniquely identifies a series. There is also a secondary set of key/value pairs called attributes, which are not part of the identifier. Finally, a last-activity timestamp records when a series was last updated.
The component of Warp 10 in charge of keeping track of those metadata is called the Directory. In the standalone version of Warp 10 it is included in the single process of each instance. In the distributed version of Warp 10, it is a component which can be run all by itself and can be replicated and sharded.
The role of the Directory is to identify which Geo Time Series™ match a set of search criteria. Those criteria are regular expressions on the class and on label or attribute values, plus an optional criterion on the last activity timestamp. The Directory returns the list of matching Geo Time Series™, which can then be used for fetching data.
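To make the matching logic concrete, here is a minimal in-memory sketch in Python. It is a toy model, not the way Warp 10 implements the Directory: the `SeriesMetadata` record, the `find` helper and the sample data are all hypothetical, and the sketch assumes a regular expression must match the whole class, label or attribute value.

```python
import re
from dataclasses import dataclass, field


@dataclass
class SeriesMetadata:
    """Hypothetical, simplified metadata record for one series."""
    class_name: str
    labels: dict                                     # part of the identifier
    attributes: dict = field(default_factory=dict)   # not part of the identifier
    last_activity: int = 0                           # last update, arbitrary time unit


def find(directory, class_re, label_res, active_after=None):
    """Return the metadata entries that satisfy every criterion."""
    matches = []
    for meta in directory:
        if not re.fullmatch(class_re, meta.class_name):
            continue
        # A criterion key may refer to a label or to an attribute.
        merged = {**meta.attributes, **meta.labels}
        if not all(key in merged and re.fullmatch(rx, merged[key])
                   for key, rx in label_res.items()):
            continue
        # Optional criterion on the last activity timestamp.
        if active_after is not None and meta.last_activity < active_after:
            continue
        matches.append(meta)
    return matches


directory = [
    SeriesMetadata("sensor.temperature", {"room": "kitchen"}, {"unit": "C"}, 100),
    SeriesMetadata("sensor.temperature", {"room": "garage"}, {"unit": "C"}, 50),
    SeriesMetadata("sensor.humidity", {"room": "kitchen"}, {}, 120),
]

# Every sensor class located in the kitchen: two series match.
kitchen = find(directory, r"sensor\..*", {"room": "kitchen"})

# Temperature series in Celsius updated at or after time 80: only the kitchen one.
recent_temp = find(directory, r"sensor\.temperature", {"unit": "C"}, active_after=80)
```

Note that the result only carries metadata. As in the real Directory, retrieving the data points themselves is a separate step.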
Several WarpScript™ functions interact with the Directory. The rest of this post details the usage of each of them.
The FIND function retrieves a list of Geo Time Series™ matching some criteria on class and labels/attributes. The result is a list of GTS with no values.
Because FIND only interacts with the Directory, its response time is very short.
Note that the structure of your regular expressions may impact the performance of the Directory search. If you have regular expression criteria for alternate values (i.e. a construct of the form A|B|C) with lots of alternatives, we encourage you to use the REOPTALT function which will produce an optimized regular expression from a list of values.
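To give an intuition of what optimizing an alternation can mean, the Python sketch below factors the common prefixes of a list of values using a trie. This is only an illustration of the general idea: the algorithm actually used by REOPTALT is not described in this post, and the `optimize_alternation` helper is hypothetical.

```python
import re


def optimize_alternation(values):
    """Build a regex matching exactly the given strings, sharing common prefixes."""
    trie = {}
    for value in values:
        node = trie
        for char in value:
            node = node.setdefault(char, {})
        node[""] = {}  # end-of-word marker

    def render(node):
        ends_here = "" in node
        branches = [re.escape(char) + render(child)
                    for char, child in sorted(node.items()) if char != ""]
        if not branches:
            return ""
        if len(branches) == 1 and not ends_here:
            body = branches[0]
        else:
            body = "(?:" + "|".join(branches) + ")"
        # If a value ends at this node, whatever follows is optional.
        return body + "?" if ends_here else body

    return render(trie)


pattern = optimize_alternation(["room1", "room2", "roof"])
print(pattern)  # roo(?:f|m(?:1|2))
```

A regular expression engine can reject a non-matching value as soon as the shared prefix fails, instead of trying each alternative in turn, which is why this kind of rewrite tends to help with long alternations.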
The METASET function behaves much like FIND but produces an object called a Metaset, which can later be used as a parameter to FETCH. A Metaset contains credentials, restrictions on timestamps and a list of Geo Time Series™, all encrypted in an opaque structure.
Using a Metaset you can give access to your data without giving out your credentials (token). You can also limit the time range which can be fetched.
Lastly, in cases where the Directory search is slow, using a Metaset can speed things up, since it already contains all of the GTS metadata and using it does not require a Directory search.
Sometimes you need to learn things about your Geo Time Series. The FINDSTATS function will identify matching Geo Time Series and will return some statistics about them.
Those statistics contain estimates of the number of GTS, of classes and of label values. If the number of matching classes or labels is above some thresholds, detailed statistics will not be returned.
The last function, FINDSETS, identifies matching Geo Time Series and returns various pieces of information about them.
FINDSETS analyzes the matched GTS and outputs the list of classes and, for labels and attributes, a map with the list of values associated with each label or attribute.
This function is very useful for providing autocompletion in front end applications, displaying drop down menus with possible values for class or labels.
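The kind of aggregation described above can be sketched in a few lines of Python. The `collect_sets` helper and the sample tuples are hypothetical and only mimic the shape of the result (classes, then values per label, then values per attribute). They do not call Warp 10.

```python
def collect_sets(matched):
    """Aggregate (class, labels, attributes) tuples into sorted sets of values."""
    classes = set()
    label_values = {}
    attribute_values = {}
    for class_name, labels, attributes in matched:
        classes.add(class_name)
        for key, value in labels.items():
            label_values.setdefault(key, set()).add(value)
        for key, value in attributes.items():
            attribute_values.setdefault(key, set()).add(value)

    def as_sorted_lists(mapping):
        return {key: sorted(values) for key, values in mapping.items()}

    return sorted(classes), as_sorted_lists(label_values), as_sorted_lists(attribute_values)


matched = [
    ("sensor.temperature", {"room": "kitchen"}, {"unit": "C"}),
    ("sensor.temperature", {"room": "garage"}, {"unit": "C"}),
    ("sensor.humidity", {"room": "kitchen"}, {}),
]

classes, labels, attributes = collect_sets(matched)
# labels["room"] now holds the values a "room" drop-down menu could offer.
```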
You now know that the Warp 10 Directory is responsible for tracking all known Geo Time Series™ and answering search queries.
You have also discovered functions related to GTS discovery.
We hope these functions will help you take a fresh look at your data.