Monday, June 04, 2012

Kyua gets its own blog

For the last couple of weeks, I have been pondering the creation of a Kyua-specific blog. And, after a lot of consideration, I have finally taken the plunge. Say hello to Engineering Kyua!

From now on, all Kyua-related posts (as well as ATF posts) will go to the new blog. I recommend that you subscribe to Engineering Kyua's Atom feed right now so that you don't miss a beat! If you care enough about Kyua, that is...

I may still post Kyua-related stuff in here once in a while, but you should assume that all news and, in particular, weekly status reports will be sent to the new blog.

"Why?" Well, The Julipedia is supposed to be (and always has been) my personal blog. Looking back at all the recent posts, they are almost exclusively about Kyua and there is no personal content in them. Out of respect for the readers of this blog (who may not care about Kyua at all), and in an attempt to give Kyua a more definite identity, it makes sense to move these posts to their own blog.

Also, by having a blog dedicated to Kyua, I will not feel uncomfortable about publishing weekly status reports again. I previously felt that they were adding too much noise to this blog, and that was the main reason why I stopped posting them at some point. Weekly reports have their value, mostly to keep myself focused and to allow outsiders to know what the project is up to (particularly in a world of DVCSs, where code changes may be kept private for weeks at a time).

And you may wonder: "will you continue to post content here?" Sure I will, but I need ideas (suggestions welcome)! Today's social ecosystem makes it difficult for me to decide whether a post belongs in a blog, in Google+, in Twitter... and updating them all at once to provide the same content is pointless.

Here is my take: for most of the irrelevant stuff that one may want to share at a personal level (photos, videos, arbitrary thoughts), social networks seem to provide a better platform. The blog seems better suited for short essays that should be indexed and accessible to users across the web: for example, how-tos, technical explanations of a particular concept, or opinion articles. And, finally, Twitter seems like the place to throw pointers to longer articles elsewhere and very short opinion comments. I think this summarizes my current "practices" around these systems pretty well. And, as you can deduce (and have probably experienced), this also explains why the blog gets less content than ever: most things are better suited for a social network.

Saturday, June 02, 2012

Exposing a configuration tree through Lua

In the previous post, I discussed the type-safe tree data structure that is now in the Kyua codebase, aimed at representing the configuration of the program. In this post, we'll see how this data structure ties to the parsing of the configuration file.

One goal in the design of the configuration file was to make its contents a simple key/value association (i.e. assigning values to predetermined configuration variables). Of course, the fact that the configuration file is just a Lua script means that additional constructions (conditionals, functions, etc.) can be used to compute these values before assignment, but in the end all we want to have is a collection of values for known keys. The tree data structure does exactly the latter: it maintains the mapping of keys to values and ensures that only a set of "valid" keys can be set. But, as a data structure, it does not contain any of the "logic" involved in computing those values: that is the job of the script.

Now, consider the following possible syntaxes in the configuration file:

simple_variable = "the value"
complex.nested.variable = "some other value"

These assignments map, exactly, to a tree::set() function call: the name of the key is passed as the first argument to tree::set() and the value is passed as the second argument. (Let's omit types for simplicity.) What we want to do is modify the Lua environment so that these assignments are possible, and that when such assignments happen, the internal tree object gets updated with the new values.

In order to achieve this, the configuration library modifies the Lua environment as follows:
  • The __newindex metamethod of _G is overridden so that an assignment causes a direct call to the set method of the referenced key. The key name is readily available in the __newindex arguments, so no further magic is needed. This handles the case of "a = b" (top-level variables).
  • The __index metamethod of _G is overridden so that, if the indexed element is not found, a new table is generated and injected into _G. This new table has a metatable of its own that performs the same operations as the __newindex and __index methods described here. This handles the case of "a.b = c", as this trick causes the intermediate tables (in this case "a") to be created transparently.
  • Each of the tables created by index has a "key" metatable field that contains the fully qualified key of the node the table corresponds to. This is necessary to be able to construct the full key to pass to the set method.
  • There is further magic to ensure that values pre-populated in the tree (aka default values) can be queried from within Lua, and that variables can be set more than once. These details are uninteresting though.
At the moment, we deny setting variables that have not been pre-defined in the tree structure, which means that if the user wants to define auxiliary variables or functions, these must be declared local to prevent calling into the _G hooks. This is quite nice, but we may need to change this later on if we want to export the standard Lua modules to the configuration files.
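Lua metatables have no direct C++ equivalent, but the key-building idea behind them can be mimicked. The following is a C++ analogue, not Kyua's code: simple_tree and key_proxy are hypothetical names, and values are plain strings for brevity. Each proxy remembers its fully qualified key (the role of the "key" metatable field), indexing yields a child proxy (the __index role), and assignment forwards the dotted key to set() (the __newindex role), which rejects keys that were not pre-defined.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <stdexcept>
#include <string>
#include <utility>

// Hypothetical stand-in for the configuration tree: it knows which keys
// were pre-defined and stores string values for them.
class simple_tree {
public:
    void define(const std::string& key) { _defined.insert(key); }

    // Reject keys that were not pre-defined, mirroring the policy of
    // denying assignments to unknown variables.
    void set(const std::string& key, const std::string& value) {
        if (_defined.count(key) == 0)
            throw std::invalid_argument("Unknown configuration key: " + key);
        _values[key] = value;
    }

    std::string lookup(const std::string& key) const {
        const auto iter = _values.find(key);
        if (iter == _values.end())
            throw std::out_of_range("Key not set: " + key);
        return iter->second;
    }

private:
    std::set<std::string> _defined;
    std::map<std::string, std::string> _values;
};

// Hypothetical analogue of the auto-created Lua tables: each proxy carries
// the fully qualified key of the node it stands for.
class key_proxy {
public:
    key_proxy(simple_tree& tree, std::string key)
        : _tree(tree), _key(std::move(key)) {}

    // __index analogue: descending into a child yields a new proxy whose
    // key is the parent key plus the child name.
    key_proxy operator[](const std::string& name) const {
        return key_proxy(_tree, _key.empty() ? name : _key + "." + name);
    }

    // __newindex analogue: an assignment becomes a set() call on the tree
    // using the full dotted key.
    key_proxy& operator=(const std::string& value) {
        _tree.set(_key, value);
        return *this;
    }

private:
    simple_tree& _tree;
    std::string _key;
};
```

With a root proxy such as `key_proxy root(tree, "")`, the statement `root["complex"]["nested"]["variable"] = "some other value";` ends up calling `tree.set("complex.nested.variable", "some other value")`, much like the Lua assignment shown earlier.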

Tuesday, May 29, 2012

Type-safe, dynamic tree data type

The core component of the new configuration library in Kyua is the utils::config::tree class: a type-safe, dynamic tree data type. This class provides a mapping of string keys to arbitrary types: all the nodes of the tree have a textual name, and they can either be inner nodes (no value attached to them) or leaf nodes (a value of an arbitrary type attached to them). Keys represent traversals through the tree and are written by separating the node names with dots (things of the form root.inner1.innerN.leaf).

The tree class is the in-memory representation of a configuration file, and is the data structure passed around to methods and algorithms to tune their behavior. It replaces the previous static config structure.

The following highlights describe the tree class:
  • Keys are (and thus the tree layout is) pre-registered. One side-effect of moving away from a static C++ structure to a dynamic structure such as a tree as the representation of the configuration is that the compiler can no longer validate the names of the configuration settings when they are queried. In the past, doing something like config.architecture would only compile if architecture was a valid field of the structure... but now, code like config["architecture"] cannot be validated during the build.
    In order to overcome this limitation, trees must have their keys pre-defined. Pre-defining the keys declares their type within the tree. Accesses to unknown keys result in an error right away, and accesses to pre-defined keys must always happen with their pre-recorded types.
    Note that pre-defined nodes may or may not hold a value. The concept of "being set" is different from "being defined".
  • Some nodes can be dynamic. Sometimes we do not know what particular keys are valid within a context. For example, the test_suites subtree of the configuration can contain arbitrary test suite names and properties within it, and there is no way for Kyua (at the moment) to know what keys are valid or not.
    As a result, the tree class allows defining a particular node as "dynamic", at which point accesses to any undefined keys below that node result in the creation of the node.
  • Type safety. Every node has a type attached to it. The base configuration library provides common types such as bool_node, int_node and string_node, but the consumer can define its own node types to hold any other kind of data type. (It'd be possible, for example, to define a map_node to hold a full map as a tree leaf.)
    The "tricky" (and cool) part of type safety in this context is to avoid exposing type casts to the caller: the caller always knows what type corresponds to every key (because, remember, the caller had to predefine them!), so it knows what type to expect from every node. The tree class achieves this by using template methods, which just query the generic internal nodes and cast them out (after validation) to the requested type.
  • Plain string representations. The end user has to be able to provide overrides to configuration properties through the command line... and the command line is untyped: everything is a string. The tree library, therefore, needs a mechanism to internalize strings (after validation) and convert them to the particular node types. Similarly, it is interesting to have a way to export the contents of a tree to strings so that they can be shown to the user.
With that said, let's see a couple of examples. First, a simple one. Let's create a tree with a couple of fictitious nodes (one a string, one an integer), set some values and then query such values:

config::tree tree;

// Predefine the valid keys.
tree.define< config::string_node >("kyua.architecture");
tree.define< config::int_node >("kyua.timeout");

// Populate the tree with some sample values.
tree.set< config::string_node >("kyua.architecture", "powerpc");
tree.set< config::int_node >("kyua.timeout", 300);

// Query the sample values.
const std::string architecture =
    tree.lookup< config::string_node >("kyua.architecture");
const int timeout =
    tree.lookup< config::int_node >("kyua.timeout");

Yep, that's it. Note how the code just knows about keys and their types, but does not have to mess around with type casts nor tree nodes. And, if there is any typo in the property names or if there is a type mismatch between the property and its requested node type, the code will fail early. This, coupled with extensive unit tests, ensures that configuration keys are always queried consistently.
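To make the "template methods that validate and cast" idea concrete, here is a minimal sketch of how define(), set() and lookup() could be built. It assumes a flat map from full dotted keys to polymorphic nodes and uses dynamic_cast for the type check; the namespace sketch, the typed_leaf helper and the exception types are hypothetical, and the real utils::config::tree may well be organized differently (for example, as actual nested nodes).

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <stdexcept>
#include <string>

namespace sketch {

// Hypothetical polymorphic base so that nodes of any type share one map.
class base_node {
public:
    virtual ~base_node() = default;
};

// A leaf holding a value of type T; being defined and being set differ.
template <typename T>
class typed_leaf : public base_node {
public:
    using value_type = T;
    bool is_set() const { return _set; }
    const T& value() const { return _value; }
    void set_value(const T& value) { _value = value; _set = true; }
private:
    bool _set = false;
    T _value{};
};

class string_node : public typed_leaf<std::string> {};
class int_node : public typed_leaf<int> {};

class tree {
public:
    // Pre-registers a key and records its node type.
    template <typename Node>
    void define(const std::string& key) {
        _nodes[key] = std::make_unique<Node>();
    }

    template <typename Node>
    void set(const std::string& key, const typename Node::value_type& value) {
        find<Node>(key).set_value(value);
    }

    template <typename Node>
    const typename Node::value_type& lookup(const std::string& key) const {
        const Node& node = find<Node>(key);
        if (!node.is_set())
            throw std::runtime_error("Key defined but not set: " + key);
        return node.value();
    }

private:
    // Fails early on unknown keys and on type mismatches; callers never
    // see the cast.
    template <typename Node>
    Node& find(const std::string& key) const {
        const auto iter = _nodes.find(key);
        if (iter == _nodes.end())
            throw std::invalid_argument("Unknown key: " + key);
        Node* node = dynamic_cast<Node*>(iter->second.get());
        if (node == nullptr)
            throw std::invalid_argument("Type mismatch for key: " + key);
        return *node;
    }

    std::map<std::string, std::unique_ptr<base_node>> _nodes;
};

}  // namespace sketch
```

Usage mirrors the example above: `t.define<sketch::int_node>("kyua.timeout")`, then `t.set<sketch::int_node>("kyua.timeout", 300)` and `t.lookup<sketch::int_node>("kyua.timeout")`. Looking up the same key as a string_node throws instead of returning garbage.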

Note that we could also have set the keys above as follows:

tree.set_string("kyua.architecture", "powerpc");
tree.set_string("kyua.timeout", "300");

... which would result in the validation of "300" as a proper integer, conversion of it to a native integer, and storing the resulting number as the integer node it corresponds to. This is useful, again, when reading configuration overrides from the command line as types are not known in that context yet we want to store their values in the same data structure as the values read from the configuration file.
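The string path boils down to each node type knowing how to internalize and export text. The following is a minimal sketch of that idea, assuming a hypothetical leaf_node interface with set_string() and to_string() methods (these names are illustrative, not necessarily Kyua's). The integer node accepts "300" but rejects "300s", and it only overwrites its value after the whole string has been validated.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical interface: every leaf can be set from, and shown as, text.
class leaf_node {
public:
    virtual ~leaf_node() = default;
    virtual void set_string(const std::string& raw) = 0;
    virtual std::string to_string() const = 0;
};

class int_node : public leaf_node {
public:
    // Validate the whole string before touching the stored value.
    void set_string(const std::string& raw) override {
        std::size_t consumed = 0;
        int parsed = 0;
        try {
            parsed = std::stoi(raw, &consumed);
        } catch (const std::logic_error&) {
            // std::stoi throws invalid_argument or out_of_range.
            consumed = 0;
        }
        if (raw.empty() || consumed != raw.size())
            throw std::invalid_argument("Invalid integer value: '" + raw + "'");
        _value = parsed;
    }

    // Export the native value back to text, e.g. to show it to the user.
    std::string to_string() const override { return std::to_string(_value); }

    int value() const { return _value; }

private:
    int _value = 0;
};
```

A command-line override such as "kyua.timeout=300" would then only need to find the node for the key and call set_string("300") on it, without knowing its type.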

Let's now see another very simple example showcasing dynamic nodes (which is a real-life example from the current Kyua configuration file):

config::tree tree;

// Predefine a subtree as dynamic.
tree.define_dynamic("test_suites");

// Populate the subtree with fictitious values.
tree.set< config::string_node >("test_suites.NetBSD.ffs", "ext2fs");
tree.set< config::int_node >("test_suites.NetBSD.iterations", 5);

// And the querying would happen exactly as above with lookup().

Indeed, it'd be very cool if this tree type followed more standard STL conventions (iterators, for example). But I didn't really think about this when I started writing this class and, to be honest, I don't need this functionality.

Now, if you paid close attention to the above, you can start smelling the relation of this structure to the syntax of configuration files. I'll tell you how this ties together with Lua in a later post. (Which may also explain why I chose this particular representation.)

Monday, May 28, 2012

Rethinking Kyua's configuration system

In the previous blog post, I described the problems that the implementation of the Kyua configuration file parsing and in-memory representation posed. I also hinted that some new code was coming and, after weeks of work, I'm happy to say that it has just landed in the tree!

I really want to get to explaining the nitty-gritty details of the implementation, but I'll keep these for later. Let's focus first on what the goals for the new configuration module were, as these drove a lot of the implementation details:
  • Key/value pairs representation: The previous configuration system did this already, and it is a pretty good form for a configuration file because it is a simple, understandable and widespread format. Note that I have not said anything yet about the types of the values.
  • Tree-like representation: The previous configuration schema grouped test-suite specific properties under a "test_suites" map while it left internal run-time properties in the global namespace. The former is perfect and the latter was done just for simplicity. I want to move towards a tree of properties to give context to each of them so that they can be grouped semantically (e.g. kyua.report.*, kyua.runtime.*, etc.). The new code has not changed the structure of the properties yet (to remain compatible with previous files), but it adds very simple support to change this in the near future.
  • Single-place parsing and validation: A configuration file is an external representation of a set of properties. This data is read (parsed) once and converted into an in-memory representation. All validation of the values of the properties must happen at this stage, and not when the properties are queried. The reason is that validation of external values must be consistent and has to happen in a controlled location (so that errors can all be reported at the same time).
    I have seen code in other projects where the configuration file is stored in memory as a set of key/value string pairs and parsing to other types (such as integers, etc.) is delayed until the values are used. The result is that, if a property is queried from more than one place, the validation ends up implemented in different forms, each with its own bugs, which results in dangerous inconsistencies.
  • Type safety: This is probably the trickiest bit. Every configuration node must be stored in the type that makes most sense for its value. For example: a timeout in seconds is an integer, so the in-memory representation must be an integer. Or another example: the type describing the "unprivileged user" is a data structure that maps to a system user, yet the configuration file just specifies either a username or a UID.
    Keeping strict type validation in the code is interesting because it helps to ensure that parsing and validation happen in just a single place: whenever the configuration file is read, every property will have to be converted to its in-memory type, and this means that the validation can only happen at that particular time. Once the data is in memory, we can and have to assume that it is valid. Additionally, strict types ensure that the code querying such properties uses the values as intended, without having to do additional magic to map them to other types.
  • Extensibility: Parsing a configuration file is a very generic concept, yet the previous code made the mistake of tying this logic with the specific details of Kyua configuration files. A goal of the new code has been to write a library that parses configuration files, and allows the Kyua-specific code to define the schema of the configuration file separately. (No, the library is not shipped separately at this point; it's placed in its own utils::config module.)
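The "parse and validate once, then trust the typed values" goal can be illustrated with a small loader that converts raw strings into a typed structure and collects every error in one pass. This is a sketch under stated assumptions, not Kyua code: runtime_config, load_result and load_config are hypothetical names, the raw input is assumed to be a map of key/value strings already extracted from the file, and the two properties are merely illustrative.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical typed, in-memory configuration: once built, it is trusted.
struct runtime_config {
    std::string architecture;
    int timeout_seconds = 0;
};

// Result of a single load: the typed config plus every validation error.
struct load_result {
    runtime_config config;
    std::vector<std::string> errors;
};

// Validate and convert all raw properties in one place, reporting all
// problems together instead of stopping at the first one.
load_result load_config(const std::map<std::string, std::string>& raw) {
    load_result result;
    for (const auto& entry : raw) {
        const std::string& key = entry.first;
        const std::string& value = entry.second;
        if (key == "architecture") {
            if (value.empty())
                result.errors.push_back("architecture: must not be empty");
            else
                result.config.architecture = value;
        } else if (key == "timeout") {
            std::size_t consumed = 0;
            int seconds = 0;
            try {
                seconds = std::stoi(value, &consumed);
            } catch (const std::logic_error&) {
                consumed = 0;
            }
            if (value.empty() || consumed != value.size() || seconds <= 0)
                result.errors.push_back(
                    "timeout: expected a positive integer, got '" + value + "'");
            else
                result.config.timeout_seconds = seconds;
        } else {
            result.errors.push_back(key + ": unknown property");
        }
    }
    return result;
}
```

Code that later receives a runtime_config simply reads timeout_seconds as an int; it never re-parses or re-validates the original text.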
With all this code in place, there are a bunch of things that can now be easily implemented. Consider the following:
  • Properties to define the timeout of test cases depending on their size (long-standing issue 5).
  • Properties to tune the UI behavior: width of the screen, whether to use color or not (no, there is no color support yet), etc.
  • Properties to configure what reports look like "by default": if you generate reports of any form frequently, it is very likely that you will want them to look the same every time and hence you will want to define the report settings once in the configuration file.
  • Hooks: one of the reasons for using Lua-based configuration files was to allow providing extra customization abilities to the user. Kyua could theoretically call back into Lua code to perform particular actions, and such actions could be explicitly stated by the user in the form of Lua functions. Neither the current configuration code nor Kyua has support for hooks, but the new implementation makes it rather easy to add them.
And that's all for today. Now that you know what the current code is trying to achieve and why, we will be able to look at how the implementation does all this in the next posts.

Saturday, May 26, 2012

Kyua's configuration system showing its age

A couple of years ago, when Kyua was still a newborn, I wrote a very ad-hoc solution for the parsing and representation of its configuration files. The requirements for the configuration were minimal, as there were very few parameters to be exposed to the user. The implementation was quick and simple to allow further progress on other more-important parts of the project. (Yep, quick is a euphemism for dirty: the implementation of the "configuration class" has to special-case properties everywhere to deal with their particular types... just as the Lua script has to do too.)

As I just mentioned in the previous paragraph, the set of parameters exposed through the configuration file was minimal. Let's recap what these are:
  • Run-time variables: architecture and platform, which are two strings identifying the system; and unprivileged_user, which (if defined) is the name of the user under which to run unprivileged tests. It is important to mention that the unprivileged_user is internally represented by a data type that includes several properties about a system user, and that it ensures that the data it contains is valid at all times. The fact that every property holds a specific type is an important design requirement.
  • Test suite variables: every test suite can accept arbitrary configuration variables. Actually, these are defined by the test programs themselves. All of these properties are strings (and cannot be anything else because ATF test programs have no way of indicating the type of the configuration variables they accept/expect).
Because of the reduced set of configurable properties, I opted to implement the configuration of the program as a simple data structure with one field per property, and a map of properties to represent the arbitrary test suite variables. The "parser" to populate this structure consists of a Lua module that loads these properties from a Lua script. The module hooks into the Lua metatables to permit things like "test_suites.NetBSD.timeout=20" to work without having to predeclare the intermediate tables.

Unfortunately, as I keep adding more and more functionality to Kyua, I encounter additional places where a tunable would be appreciated by the end user (e.g. "disallow automatic line wrapping"). Exposing such tunable through a command-line flag would be a possibility, but some of these need to be permanent in order to be useful. It is clear that these properties have to be placed in the configuration file, and attempting to add them to the current codebase shows that the current abstractions in Kyua are not flexible enough.

So, why am I saying all this? Well: during the last few weeks, I have been working on a new configuration module for Kyua. The goals have been simple:
  • Have a generic configuration module that parses configuration files only, without any semantics about Kyua (e.g. what variables are valid or not). This ensures that the implementation is extensible and at the right level of abstraction.
  • Be able to get rid of the ad-hoc parsing of configuration files.
  • Allow defining properties in a strictly-typed tree structure. Think about being able to group properties by function, e.g. "kyua.host.architecture"; this is more or less what we have today for test-suite properties but the implementation is a special-case again and cannot be applied to other tunables.
And... I am pleased to say that this code is about to get merged into the tree just in time for Kyua 0.4. In the next few posts, I will explain what the particular design constraints of this new configuration system were and briefly outline its implementation. I think it's a pretty cool hack that mixes C++ data structures and Lua scripts in a "transparent" manner, although you may think it's too complex. The key part is that, as this new configuration module is not specific to Kyua, you might want to borrow the code/ideas for your own use!

Monday, April 02, 2012

Kyua gets nicer console messages

For the last couple of weeks, particularly during a bunch of long flights, I have been improving the command-line user interface of Kyua by implementing controlled line wrapping on screen boundaries: messages that are too long to fit on the screen are preprocessed and split into multiple lines at word boundaries. This affects informational messages, error messages and, especially, the output of the built-in help command.

I originally got this idea from Monotone and later implemented it into ATF but, when writing Kyua's code, I decided to postpone its implementation until a later stage. Reusing the code from ATF was not "nice" because the API of the formatting code was quite nasty, and reimplementing this feature during the initial stages of Kyua felt like a waste of time.

However, because controlled line wrapping is crucial to having readable built-in help messages, I had to tackle this eventually, and the time finally came.

The ATF approach

Why did I say that the ATF code for line wrapping was quite nasty? The main reason is that the printing of messages was incredibly tied to their wrapping. All the code in ATF that prints a message to the screen has to deal with the line wrapping itself, which involves dealing with too many presentation details.

For example, consider the help routine that prints the table of options and their descriptions. This routine has to calculate the width of the longest option first and then, for every option, output its name, output some padding, and output the description properly refilled so that subsequent lines are properly arranged with respect to the previous one. While this may not sound too bad in writing, it actually is in code.

Furthermore, because all this formatting logic is spread throughout the code, there is no way to perform decent unit testing. The unit tests exercised some basic input text, but could not validate that more complex constructions were working right.

The Kyua approach

In the Kyua codebase, I decided to take a more declarative and functional approach. Smaller, pure building blocks that can be combined to achieve more complex constructions and that can be easily tested for correctness individually or in combination.

I started by implementing a simple function that takes a paragraph and reformats it to any given length. This simple function alone gives full flexibility to the caller to decide how to later merge this reformatted text with other text: e.g. place a line prefix or bundle such text inside a table cell.
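A paragraph refill of this kind can be a small pure function. The following is a minimal greedy sketch, not Kyua's actual code: refill is a hypothetical name, whitespace runs are collapsed, and a word longer than the width is assumed to stay unsplit on its own line.

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Greedy refill: pack words into lines of at most `width` characters,
// breaking only at whitespace. Over-long words get a line of their own.
std::vector<std::string> refill(const std::string& text,
                                const std::size_t width) {
    std::vector<std::string> lines;
    std::istringstream input(text);
    std::string word;
    std::string current;
    while (input >> word) {
        if (current.empty()) {
            current = word;
        } else if (current.size() + 1 + word.size() <= width) {
            current += " " + word;
        } else {
            lines.push_back(current);
            current = word;
        }
    }
    if (!current.empty())
        lines.push_back(current);
    return lines;
}
```

Because the result is just a vector of lines, the caller stays free to prefix each line, indent continuations, or drop the lines into a table cell.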

The next step was to implement tables. Code wishing to print, e.g., the collection of options/commands along with their descriptions only cares about declaring a table of two columns and N rows; why should it bother about properly lining up the two columns and printing them? It shouldn't, hence the table approach. With tables, the caller just decides which particular column needs to be wrapped if the table does not fit on the screen, and lets the formatting code do the rest. Plus, having these higher-level constructs means that we can eventually print the textual reports in a nicer, tabulated way (not done yet).

And the last step was to mix all these higher level constructs into the console frontend class. This class (the ui) knows how to query the width of the terminal and knows how to fit certain kinds of text and/or tables within such width. For example, error messages are not tables: they are messages prefixed with the command name; only the message has to be reformatted if it does not fit while the rest of the text has to flow after the command name. Or, for tables, the maximum width of the terminal determines how wide the table can be and thus how much one of its columns has to be refilled.
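The table and error-message layouts can both be built on top of the refill function. The following is a sketch under assumptions, not the Kyua ui class: wrap, format_table and format_error are hypothetical names, the table is fixed at two columns with only the second one refilled, and error continuation lines are assumed to be indented so that the message flows after the "program: " prefix.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Greedy word wrap, equivalent to a paragraph refill (hypothetical helper).
static std::vector<std::string> wrap(const std::string& text,
                                     const std::size_t width) {
    std::vector<std::string> lines;
    std::istringstream input(text);
    std::string word, current;
    while (input >> word) {
        if (current.empty()) {
            current = word;
        } else if (current.size() + 1 + word.size() <= width) {
            current += " " + word;
        } else {
            lines.push_back(current);
            current = word;
        }
    }
    if (!current.empty())
        lines.push_back(current);
    return lines;
}

using table_row = std::pair<std::string, std::string>;

// Render a two-column table within max_width characters: the first column
// is padded to its widest cell, the second is refilled to the remaining space.
std::vector<std::string> format_table(const std::vector<table_row>& rows,
                                      const std::size_t max_width) {
    const std::string separator = "  ";
    std::size_t first_width = 0;
    for (const table_row& row : rows)
        first_width = std::max(first_width, row.first.size());

    // Assumed policy: on a too-narrow terminal, still allow one character.
    const std::size_t used = first_width + separator.size();
    const std::size_t second_width = max_width > used ? max_width - used : 1;

    std::vector<std::string> output;
    for (const table_row& row : rows) {
        const std::vector<std::string> cell = wrap(row.second, second_width);
        const std::size_t height = std::max<std::size_t>(cell.size(), 1);
        for (std::size_t i = 0; i < height; ++i) {
            // Only the first physical line shows the left cell.
            std::string line = i == 0 ? row.first : std::string();
            line.resize(first_width, ' ');
            line += separator + (i < cell.size() ? cell[i] : std::string());
            line.erase(line.find_last_not_of(' ') + 1);  // No trailing blanks.
            output.push_back(line);
        }
    }
    return output;
}

// Error layout: "program: message", with continuation lines indented so
// that the wrapped message keeps flowing after the program name.
std::vector<std::string> format_error(const std::string& program,
                                      const std::string& message,
                                      const std::size_t max_width) {
    const std::string prefix = program + ": ";
    const std::size_t available =
        max_width > prefix.size() ? max_width - prefix.size() : 1;
    std::vector<std::string> output;
    const std::vector<std::string> body = wrap(message, available);
    for (std::size_t i = 0; i < body.size(); ++i)
        output.push_back((i == 0 ? prefix : std::string(prefix.size(), ' ')) +
                         body[i]);
    return output;
}
```

The terminal width only enters as the max_width argument, so each layout can be unit-tested with fixed widths, which is exactly what the tightly coupled ATF approach made hard.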

Getting this whole thing working right has proven to be extremely tricky and I'm sure there are still quite a few rough edges to be discovered. That said, it has been fun enough :-) But, after this experience, I certainly don't want to imagine the pain that the writers of HTML/CSS renderers have endured... this text-based table rendering is trivial compared to what web browsers do!

Tuesday, March 13, 2012

Kyua generates its first public HTML report

Lately, three long trips (5 hours in a bus, and 6 and 10 hours in two planes) have allowed me to work on the long-promised HTML reporting feature of Kyua. The result of these three trips is, effectively, the ability to generate HTML reports for specific test actions!

The current results are extremely rudimentary (they lack tons of would-be-useful information) and not that aesthetically pleasing. However, the database already records enough information to make these reports more useful and pretty, so "all that is left" is coming up with the necessary code to extract such information in an efficient way and spending time creating a visually-nicer appearance. None of these are as trivial as they sound, but I prefer to work one step at a time (i.e. coming up first with a very rough draft and improve it later) rather than keeping the feature private until it is "perfect".

Without further ado, you can take a look at the report of the execution of the NetBSD test suite. This output comes from my NetBSD-current virtual machine, and I've set up a couple of cron jobs to keep it up to date. (If the "action X" in the title does not increase periodically, you will know that something is broken on my side. The VM has already crashed once, so I don't know how long it will keep running after the restart...)