Sunday, October 30, 2005

Delta modulation

In a previous post, I outlined how Pulse Code Modulation (PCM) works (that was not the aim of the article, but it contains this information). Let's now look at another modulation technique used for voice signals: Delta Modulation (DM).

DM encodes each signal sample using a single bit, which is determined by comparing the current sample with the previous one (strictly speaking, with the approximation of the previous sample that the receiver has reconstructed so far). The bit specifies whether the new sample is higher than the previous one, hence describing only the variation in the information (and not its contents). The resulting sequence of bits can be drawn as a staircase that approximates the original signal, where a 1 is a step up and a 0 is a step down.
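To make the staircase concrete, here is a minimal C++ sketch of a delta modulator and its matching demodulator. The function names dm_encode and dm_decode, the fixed step size and the choice of 0 as the starting level are assumptions for illustration only. Note that the encoder compares each sample against the staircase it is building (the same one the receiver will rebuild), so both sides stay in sync.

```cpp
#include <cassert>
#include <vector>

// Delta modulation: one bit per input sample.
// true (1) = step up, false (0) = step down.
std::vector<bool> dm_encode(const std::vector<double>& samples, double step)
{
    std::vector<bool> bits;
    double approx = 0.0;  // staircase level; assumed to start at 0
    for (double s : samples) {
        bool up = s > approx;  // compare with the reconstructed level
        approx += up ? step : -step;
        bits.push_back(up);
    }
    return bits;
}

// Rebuilds the staircase from the bit stream, exactly as the encoder did.
std::vector<double> dm_decode(const std::vector<bool>& bits, double step)
{
    std::vector<double> stairs;
    double approx = 0.0;
    for (bool up : bits) {
        approx += up ? step : -step;
        stairs.push_back(approx);
    }
    return stairs;
}
```

For instance, encoding the samples 0.5, 1.5, 2.5, 2.5, 1.5 with a step of 1 yields the bits 1 1 1 0 0, which decode to the staircase 1, 2, 3, 2, 1. Since exactly one bit comes out per input sample, 8000 samples per second become 8000 bits per second.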

Remember that the number of samples per second is determined by the Nyquist theorem: you must sample at least twice as fast as the highest frequency present in the signal.

For example, consider a standard 4 kHz voice channel. You need to take 8000 samples per second (twice 4000) to preserve the original signal. Given this and using DM, you can encode the signal with 8000 bps, or 8 Kbps. OTOH, if you were using PCM with 256 levels (8 bits per sample), the same signal would occupy 8000 * 8 bps, or 64 Kbps.

It is clear that DM needs to transmit a lot less data than PCM. But, as you can imagine, its quality is poor: it does not adapt well to signals with sudden changes (such as a shouting voice), although it might work for monotonous voices. On the other hand, PCM with 256 levels provides very good quality.
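The weakness with sudden changes is usually called slope overload: the staircase can only move one step per sample, so it lags behind a fast-rising signal. Conversely, on a flat signal it keeps hunting up and down around the true value (granular noise). The following C++ sketch folds a fixed-step encoder and decoder into one hypothetical dm_staircase helper to show both effects; the step size and starting level of 0 are, again, assumptions.

```cpp
#include <cassert>
#include <vector>

// Fixed-step delta modulator plus decoder in one pass: returns the
// staircase the receiver would rebuild. It starts at 0 and can only move
// by +step or -step per sample.
std::vector<double> dm_staircase(const std::vector<double>& samples, double step)
{
    std::vector<double> stairs;
    double approx = 0.0;
    for (double s : samples) {
        approx += (s > approx) ? step : -step;
        stairs.push_back(approx);
    }
    return stairs;
}
```

A sudden jump from 0 to 10 (the "shout") with a step of 1 produces the levels 1, 2, ..., 10, so the first sample is reconstructed 9 units too low and the staircase only catches up after ten samples. Silence, in turn, comes out as -1, 0, -1, 0, ... Choosing the step is therefore a trade-off between the two effects.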

IIRC, DM was once used in the Japanese public telephony network but was soon dropped due to the poor results. (Sorry, couldn't find a link to demonstrate this.)

Saturday, October 29, 2005

Java: Dynamic class loading

One of the things I like most about Java is its ability to load classes on demand. Given its interpreted nature, the virtual machine can detect when a class is not yet loaded and bring it to memory transparently; think of this as page faults and the MMU handling them.

What's more, you can ask it to load a specific class based on its name, which allows you to design powerful abstract models painlessly. Of course, you can do the same thing with C or C++, but things are trickier: dynamically loadable modules are not easy to deal with. (You could also use link sets.)

As an example, let's consider (a simplification of) a set of generic classes for unit testing that I've written as part of a university project:
  • Test: An abstract class to implement unit tests. Each class in the project will have another class dedicated to testing, which will inherit from this one. For example, given a Foo class, there will be a TestFoo that does whatever is appropriate to ensure that Foo works.
  • Tester: A generic driver that, based on commands given in the standard input, loads the appropriate test class, runs its unit tests and verifies the results based on prerecorded (correct) output values.
The thing is: how do you get a Tester class that is completely independent of, say, TestFoo? Ideally, Tester should only know the Test class and nothing else. Here is where dynamic class loading comes to help.

The Tester class receives a class name (Foo in our example) on the program's standard input, among a set of arguments that describe the unit test to be executed. Based on this name, it tries to load its corresponding testing class and, if it succeeds, it executes the appropriate methods. This is only possible because all testing classes have Test as the common ancestor.

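As a code example, assume a className variable that holds the name read from standard input and an opName variable that has the name of the operation to be tested:
String className, opName;
/* Read className and opName from stdin. */
try {
    /* Load the class; its name must be fully qualified if it lives in a package. */
    Class<?> c = Class.forName("Test" + className);
    /* forName() only loads the class: we still need an instance of it. */
    Test t = (Test) c.getDeclaredConstructor().newInstance();
    t.execute(opName);
} catch (Exception e) {
    /* Catching Exception for simplicity: ClassNotFoundException,
     * ClassCastException, instantiation errors, etc. */
}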
It is up to your imagination how you use this feature, but, as you can see, it is extremely useful.
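For comparison, the C++ alternative can be approximated without any dynamic loading: each test class registers a factory under its name while global objects are initialized, and the driver looks names up in a table. This is only a sketch under assumptions. Test, TestFoo, Registrar and create_test are hypothetical names, every test class must be linked into the program (unlike Class.forName, nothing is loaded on demand), and true on-demand loading would instead need something like dlopen(3).

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <memory>
#include <string>
#include <utility>

// Hypothetical C++ counterpart of the Java Test base class.
class Test {
public:
    virtual ~Test() {}
    virtual void execute(const std::string& opName) = 0;
};

typedef std::function<std::unique_ptr<Test>()> Factory;

// Name -> factory table. It is a function-local static, so it is
// constructed before the first Registrar below uses it.
std::map<std::string, Factory>& registry()
{
    static std::map<std::string, Factory> table;
    return table;
}

// Objects of this type register a factory while globals are initialized.
struct Registrar {
    Registrar(const std::string& name, Factory f) { registry()[name] = std::move(f); }
};

// Hypothetical test class for a Foo class; remembers the last operation run.
class TestFoo : public Test {
public:
    std::string last_op;
    void execute(const std::string& opName) override { last_op = opName; }
};

static Registrar register_TestFoo("TestFoo",
    [] { return std::unique_ptr<Test>(new TestFoo()); });

// Rough equivalent of Class.forName("Test" + className) plus instantiation;
// returns an empty pointer if no such class was linked into the program.
std::unique_ptr<Test> create_test(const std::string& className)
{
    auto it = registry().find("Test" + className);
    if (it == registry().end())
        return std::unique_ptr<Test>();
    return it->second();
}
```

The driver then only knows Test and create_test, much like Tester only knows Test. One caveat of this design: if the test classes live in a static library, the linker may drop object files that nothing references, silently losing their registrars.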

Friday, October 28, 2005

USB mouse, pbbuttons and lack of udev

When I have to do intensive "graphical" work on the iBook, I often connect an external USB mouse because I find the touchpad uncomfortable. However, I've been suffering from a problem for a long time: the pbbuttons daemon did not recognize the mouse, so it didn't detect any activity and shut off the screen every now and then.

I was so tired of this annoying behavior that I sat down to track down the problem. It bothered me a lot because it used to work correctly in a previous installation. I'm running Debian without udev enabled; this is the most likely difference between the two installations.

Looking at pbbuttons' manual page, I found that it uses the /dev/input/event* devices to detect peripheral activity. I looked at those files and there were four of them (generated by MAKEDEV); strange, I thought, everything looked correct. So I did the obvious thing after reading pbbuttonsd.conf(5): I added autorescan=yes to /etc/pbbuttonsd.conf. No luck; it kept misbehaving.

The next thing was to check whether the daemon was using the devices or not. I ran lsof | grep pbbuttons as root and, as expected, it had all the appropriate devices open. Hmm... it reads from the devices but doesn't detect mouse activity? Maybe there is something wrong in the kernel? (No, fortunately ;-)

My next test was to run cat /dev/input/event? >/tmp/a for each device (replacing ? with its number), move the external mouse while the command was running, and check whether /tmp/a had a non-zero size afterwards. As you might expect, the event[0-3] devices (the standard ones) produced no output. Aha! That had to be the cause.

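I created a new device (input event devices are character devices with major number 13 and minor numbers starting at 64, so event4 gets minor 68):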

# cd /dev/input
# mknod event4 c 13 68
# chmod 660 event4


ensured that it received mouse events using the trick above, and restarted pbbuttons. Voila: it detected the mouse again. Maybe I should enable udev so that this works automatically...

Thursday, October 27, 2005

GNU Zebra

This semester, I'm taking a course (PIAM) about Internet protocols at the university. One of its main subjects is Internet routing protocols, of which we've seen RIP and OSPF as IGPs and BGP as the EGP.

In order to see them in action, we set up a bunch of machines to simulate the Internet and/or isolated LANs, using the protocols mentioned above. As you can imagine, we needed many routers to get this working (around 20), as the whole class had to practise the same thing at once. But hardware routers that implement these protocols (especially BGP) are very expensive, so this was not affordable.

Instead, we use regular PCs running Linux to act as real routers. There is a wide variety of software to do this, but we used GNU Zebra. Zebra is free routing software that runs on top of Unix systems and implements all the protocols mentioned above.

What I found very interesting, though, is that GNU Zebra is controlled through an IOS-like interface. It accepts most IOS commands verbatim, although some of them are a bit different. Also note that this is not restricted to the routing protocols: you can even set up the network interfaces and routing tables through this interface, so it can be used to hide the underlying OS from anyone who already knows IOS.

Wednesday, October 26, 2005

SoC: Map and results

Quoting Google Code news:
As the remaining awards are being distributed and the t-shirts are being prepped to ship out, we thought we'd give you an idea of the both the global scope and the hard work done by the students and mentors during the Summer of Code. We've put together a map and a list of projects for your examination. Please note that not all of the projects are listed quite yet, but we wanted to share some info with the people who follow this site.
It has taken a while, but the projects and results are finally announced. There certainly is a lot of interesting stuff in the list :-)

Tuesday, October 25, 2005

Remembering Freedows OS

Yesterday's MINIX post made me remember the Freedows OS project; I recently talked about it with a friend who didn't know it existed, hence this post.

Freedows was an operating system that aimed at providing multi-platform emulation. It was born in 1996 and was based on a microkernel architecture. It was supposed to have servers for different popular operating systems, such as Windows, Linux and DOS (IIRC), allowing the user to run applications from each of these seamlessly.

It is a pity that it was abandoned in 2002, possibly because of its high ambitions and expectations. It seemed to be quite popular in its time, as I recall articles about it in general computer magazines. What is worse, the web site disappeared together with the code (I guess it is still preserved somewhere else for historical reasons, though). Check out the Wikipedia article for more details.

Fortunately, we now have ReactOS, which is coming along nicely and also plans to incorporate non-Windows subsystems.

BTW, a quick Google search shows a Linux distribution from Brazil whose name is Freedows. Don't get confused, as it is completely unrelated.

Monday, October 24, 2005

MINIX 3 published

As announced on the MINIX news site, the third major release of this operating system has been published today. From the new web site:
MINIX 3 is a new open-source operating system designed to be highly reliable and secure. It is based somewhat on previous versions of MINIX, but is fundamentally different in many key ways. MINIX 1 and 2 were intended as teaching tools; MINIX 3 adds the new goal of being usable as a serious system on resource-limited and embedded computers and for applications requiring high reliability.
The fact that it is based on a microkernel and published under a BSD-like license has raised my interest in this new version; yes, I like the microkernel design more than the monolithic one (and this is one thing I don't like about NetBSD).

I've already tested the downloadable live CD and it looks promising. It's extremely fast, but obviously less functional than a complete BSD or Linux system (at least for now). I still remember when I tried Minix 2.something on a 386SX 16 MHz laptop; it really made a difference compared to Linux (using Slackware 3.4).

I guess we can expect nice things from this new code-base.

SoC: Payment received

Being part of Planet SoC, I think it is a good idea to post this: I've just received Google's cheque for my Summer of Code 2005 tmpfs project! I'm happy :-)

Unfortunately, due to some tax issues, Google has withheld 30% of the original payment. I hope to be able to ask for a refund next year...

Sunday, October 23, 2005

Sending mail from the command line with Mutt

During the migration to Blogger, I used the post-by-mail service to ease the move of all posts. I downloaded all the old ones into my computer and then automatically sent a mail for each of them to the appropriate posting address.

However, this was not easy. As all the posts were in HTML format, I needed to tell the mailer to send a multipart message with a text/html part. After many attempts, NetBSD's mail(1) command proved to be insufficient, so I had to look for another mailing utility to do the job. (Note that I know little about the mail protocols, so I may be missing something.)

My first thought was that Mutt could help. Indeed, if I composed an empty mail and attached an HTML file, it did what I wanted. The problem came when I had to automate this from a script... The first attempt was something like:

mutt -s "$(cat mail/$f.subject)" -a mail/$f.html address@example.org

OK, this worked, but having to type [Enter][Enter], then :wq and then y for each message was not automatic at all. The first thing I solved was the editor's save-and-quit step. Instead of using vi(1), I asked Mutt to use touch(1) as the editor:

EDITOR=touch mutt -s "$(cat mail/$f.subject)" -a mail/$f.html address@example.org

This solved the :wq part, but I still had to type [Enter][Enter]y for each post. How to solve it... I searched a bit (I mean, Googled a bit) and found that giving /dev/null as the standard input to Mutt was enough to silence it. So the command ended up being:

EDITOR=touch mutt -s "$(cat mail/$f.subject)" -a mail/$f.html address@example.org </dev/null

Still, it'd be nice if the standard mail(1) utility was able to send complex messages...

Saturday, October 22, 2005

Blog migrated to Blogger; welcome!

As I outlined a week ago, I was considering the migration of my blog (jmmv's weblog) from Livejournal to Blogger... and I finally did it. Therefore, welcome to the new site!

Before continuing, update your subscriptions! You can find the link to the new Atom feed in the Basics section on the sidebar (don't know how to do RSS yet). And forget about the older site; it will no longer be updated (though the contents will remain there for a long time).

As you can see, the name of the blog has changed to The Julipedia. See this page for more information on its etymology and a bit of history.

How did I do the migration? It was not easy; I was able to automate several things, but ended up doing a lot by hand. Of course, I could have spent a lot of time writing and creating a script that did the whole migration, but it was not worth it. Here is a brief outline of what I had to do:
  1. I exported all the messages from the old site to XML documents using Livejournal's export utility. This tool works on a month basis, so I had to go month by month (easy).
  2. With the XML files at hand, I created a little XSLT stylesheet that extracted the subject and body parts of each post and converted them to a plain text file with extremely simple delimiters.
  3. Then I created a Perl script that took this plain text file and wrote a pair of files for each post: one holding the subject and one the body, doing some HTML fixes along the way.
  4. With all these little files at hand, I used Blogger's post-by-mail feature to automatically post all my messages (by using a little shell loop and mutt). I will talk about this in another post.
  5. Unfortunately, this did not preserve timestamps, so I manually fixed them all. Big ew: 181 posts to fix by hand.
  6. At last, I migrated all user comments, also by hand. At first, I thought there were very few of them, so it seemed like a painless task. But in the end, there were a lot (which is a good thing :-).
    All my comments were migrated using my account. Comments made by other Livejournal users were migrated using their user name and a link to their blog. Comments made by other people were posted as anonymous.
    I also added a little note to each of them with the exact date on which it was originally posted, for archiving reasons.
And that is it. It has not been fun, but it is done. I hope you will enjoy the new site as much as I like it!

About The Julipedia

The Julipedia is Julio M. Merino Vidal's personal blog; it was born on June 22nd, 2004, and was previously known as jmmv's weblog. On October 22nd, 2005, jmmv's weblog was officially migrated from Livejournal to Blogger for multiple reasons (the ability to track visits being the most important one).

The transition was an ideal moment to rename the blog to give it an identity of its own, and hence The Julipedia was born. This name was invented by Brainstorm, a friend of mine, who often calls me that. It is composed of two parts, as you can see: Juli, which is the Catalan spelling of my first name, and pedia, which comes from Wikipedia. I believe this is because I often answer his Unix-related questions quickly :-)

I hope you like the new look and structure of the blog; it should be easier to navigate than before. And, of course, welcome to the blog if you are new to it!

(The aim of this post is to be linked from the Basics column on the sidebar. Its contents can change without further notice.)

How to contact Julio M. Merino Vidal

If you would like to suggest an idea for a future post, please use the Suggestion box. I will receive a notification when you add a note to that post, so rest assured that I will read it.

If you need to contact me personally, you can do it by sending an e-mail to my personal address. If your message is related to NetBSD in some way, you can use my NetBSD address. I will ignore technical questions sent to these (see above).

(The aim of this post is to be linked from the Basics column on the sidebar. Its contents can change without further notice.)

Suggestion box

I am open to suggestions from my readers (that is, you) for future posts. Keep in mind that these are mere suggestions: I reserve the right to talk about those that I find interesting and omit those that I don't. Note that this post's comments might be deleted in the future to leave room for newer ones.

Here is a non-exclusive list of topics I am inclined to cover:
  • General programming questions, especially if they are focused on C++.
  • General Unix questions, especially if they target BSD systems.
  • General portability questions.
  • pkgsrc internals and questions about the packages I maintain (mostly GNOME).
  • NetBSD internals (to some extent; I will be glad to explain what I have learned so far but I do not know much yet).
  • Questions about my own software projects.
Please note that this is not meant to be a technical support forum. If you need help using any of the projects listed above, please refer to their documentation and/or public mailing lists.

Yes, I took the idea of a suggestion box from The Old New Thing ;-)

(The aim of this post is to be linked from the Basics column on the sidebar. Its contents can change without further notice.)

Sunday, October 16, 2005

GNOME 2.12.1 hits pkgsrc

As usual, the latest stable version of the GNOME Platform and Desktop, 2.12.1, has been integrated into pkgsrc. It has been a tough job due to all the affected packages and comes a bit late, compared to all the previous updates, but I hope you'll enjoy it.

Please see the official announcement for more information.

Saturday, October 15, 2005

Blogger

Two years ago or so, I registered a weblog at Blogger, which only lasted a week (don't bother looking for it; it was deleted). At that time, I didn't like that site much, especially because it lacked a very important feature: integrated support for comments. Yes, you could use external utilities/sites to host comments, but that was complex.

However, a few minutes ago, I discovered that they now have comments support; yay! I spent some time surfing their site and found it very intuitive. Also, I registered a new weblog (whose name I won't unveil yet, but a friend of mine will know which it is ;-) just to see which features they support now.

And all I can say is that it seems better than Livejournal... Here are some reasons:

  • Blogger applies the weblog style to all pages: posts and comments. Here at Livejournal, only the front page gets the look you design, while all comment pages have a standard, non-customizable and ugly look (I don't know about paid accounts).
  • As regards the style, Blogger lets you customize it completely, editing the HTML code directly. Livejournal does not let you do this unless you have a paid account.
  • Blogger addresses are shorter and easier to remember: http://<blogname>.blogspot.com/ vs. http://www.livejournal.com/users/<user>.
  • The Blogger interface seems to be more intuitive than Livejournal's.
  • Blogger lets you post images on your blog (without external links).
  • Blogger lets you post by e-mail.
  • Neither Livejournal nor Blogger supports tracking blog statistics. However, Blogger lets you add statistics trackers from other sites, which is a plus.
  • My beloved Drivel supports both systems, so easier migration from this POV.

So... I'm seriously considering migrating this blog there, for the reasons mentioned above. However, there are two things against this idea: people will have to update their address (though I don't think there are many readers out there ;) and, worst of all, I don't know if it's possible to easily migrate all the existing posts to the new site (and this is a must).

Does anyone have comments on this? Any counter-arguments or suggestions? :-)

Edit 18:08: The statistics item was added.

Friday, October 14, 2005

Articles: Lightweight web serving with thttpd

Several months ago I started writing an article about setting up and using the lightweight web server thttpd, focusing on the NetBSD operating system. I finally decided to finish it a month ago and submitted it for publication. You can now read it on-line at ONLamp.

Hope you find it useful!

Wednesday, October 12, 2005

C++: Constructors and global data

As a general rule of thumb, accessing global data from within class constructors is dangerous and ought to be avoided. The C++ standard does not specify the order in which global variables defined in different source files (translation units) are initialized, so there is a good chance that a constructor running during that initialization accesses uninitialized data, producing unexpected results (e.g., crashes in some circumstances).

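It is important to stress this. The initialization order across files is unspecified, and using an object before its constructor has run is undefined behavior, so the results can change across compilers, architectures and even link orders. If your code relies on it, it's broken: something that apparently works on one machine may not work on another one.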

Let's see an example. First of all, consider an Integer class that wraps an integer (like Java's) and a Foo class with a not-yet-specified constructor:

#include <iostream>

class Integer {
    int m_value;

public:
    Integer(int v) { m_value = v; }
    int get_value(void) { return m_value; }
};

class Foo {
public:
    Foo(void);
};

Up to this point, everything is correct. Now, given that both classes are usable, we declare two global variables, one of each type:

static Integer Global_Integer(5);
static Foo Global_Foo;

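As these are objects, the C++ runtime will call their constructors when initializing them. Within a single file, this happens in the order in which they are defined; when they live in different files, the order is unspecified and we can't predict it.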

Now we define Foo's constructor body, which accesses and prints Global_Integer's value:

Foo::Foo(void)
{
    std::cout << Global_Integer.get_value() << std::endl;
}

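With this done, if you add an empty main function and run the program, you should see an integer: 5, because within a single file globals are constructed in order of definition (that is what I get with GNU G++ 4.0.1 on a Linux/powerpc box). Now try reversing the variable declaration lines, defining Global_Foo first, and see if the result changes (I get 0 on this box): Foo's constructor now reads Global_Integer before Integer's constructor has run, which is undefined behavior; the 0 simply comes from the zero-initialization that globals receive before any constructor runs.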

Clear? OK, this example is extremely simple, but imagine if this same structure was split among multiple files and classes in a large project... You couldn't easily predict what happens under the hood.
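One common way out is the "Construct On First Use" idiom: replace the global object with a function that returns a function-local static, which is constructed the first time the function is called, regardless of which global constructor gets there first. The sketch below reuses the post's Integer class; turning Global_Integer into a function and having Foo store the value (instead of printing it) are assumptions made so the result can be checked. It only solves the construction order; globals that use each other from their destructors need extra care.

```cpp
#include <cassert>

class Integer {
    int m_value;

public:
    Integer(int v) : m_value(v) {}
    int get_value(void) const { return m_value; }
};

// The "global" Integer is now a function-local static: it is built the
// first time Global_Integer() runs, whoever the caller is.
Integer& Global_Integer(void)
{
    static Integer instance(5);
    return instance;
}

class Foo {
    int m_seen;

public:
    // Records the value it saw at construction time.
    Foo(void) : m_seen(Global_Integer().get_value()) {}
    int seen(void) const { return m_seen; }
};

// Constructed during static initialization, yet it never observes an
// unconstructed Integer.
static Foo Global_Foo;
```

With this layout, Global_Foo.seen() is 5 no matter where Global_Foo is defined or in which file it lives.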

Saturday, October 08, 2005

Monotone: Using mini-branches to apply patches

The Monotone VCS provides the concept of mini-branches. A mini-branch is a lightweight branch created inside a formal branch whenever a commit causes "conflicts" with the current contents of the repository. For example, if your working copy is not up to date and you commit something, you will create a new head within the branch (that is, a mini-branch) that you will later need to merge (possibly manually) with the other head to remove the divergence.

Mini-branches can be used to easily apply externally-provided patches to your software project. Consider the following "collapsed" revision subgraph:

    tag: foo-2.1        tag: foo-3.0
          |                   |
A -> B -> C -> D -> E -> F -> G -> H -> I -> J

As you can see, some development happened in revisions A, B and C, at which point the program was considered stable and the 2.1 release was made. Some time later, and after lots of changes, revision G was tagged as 3.0 and that release was made.

One of this project's users notices a bug in the 2.1 version, tracks it down and fixes it. For whatever reason, he cannot update to 3.0 to see if his changes work with the latest version, so he decides to submit his fix as a patch against 2.1 to the mainstream developers.

So, how do they handle the patch? It is unlikely to apply cleanly to their current code-base, which is far past 3.0. Of course, they can inspect it, adjust it and apply it directly to revision J, but none of this process will be tracked anywhere. Users and developers could later be confused when comparing the original patch with the patch that was really applied: "why were those changes made?".

Here is where mini-branches come to help. The developers ask Monotone to check out a clean copy of C (the same that the user had), ensuring that the patch will apply cleanly. At that point, they apply the fix and commit it to the tree, thus "storing" the original patch file in it. As a result, the revision tree could look like:

    tag: foo-2.1        tag: foo-3.0
          |                   |
A -> B -> C -> D -> E -> F -> G -> H -> I -> J
           \
            \
             K

As you can see, the repository now has two heads (J and K) in the same branch (which can be inspected using monotone heads). J is much further along than K in terms of development, but that doesn't matter to the VCS. Note that, at this point, revision K carries the code in 2.1 plus the changes submitted by the user verbatim; they still haven't been adapted to J, and J is not affected at all by that commit.

Once this is done, and after inspecting why the patch does not apply, the developer decides to merge the heads (monotone merge), thus creating a new revision L that holds all J's code plus the fix added in K:

    tag: foo-2.1        tag: foo-3.0
          |                   |
A -> B -> C -> D -> E -> F -> G -> H -> I -> J -> L
           \                                     /
            \                                   /
             K --------------------------------'

Voila! There is now a single head, L, which holds all your code plus the fix sent by the user. Furthermore, the repository has kept track of all the patching process, storing the original and the modified versions of the changes.

Note that this has assumed that revisions are marked as tags rather than as formal branches. Of course, a similar process could be followed if each version was on its own branch (as done with any other VCS).

Thursday, October 06, 2005

Games: Half-Life 2

Half-Life 2's Game Of The Year edition was released last Friday. I finally bought it (I had been waiting for this since I threw away my illegal copy several months ago); this edition is cheaper than the original game and comes with some goodies.

And just a few minutes ago, I completed it :-) (had started with my old saved game, which was almost at the end). All I can say is that the game is really stunning. If you like FPSs, this is The One.

There is a lot of variety in the game, as opposed to Doom 3, which I find quite repetitive. Each chapter has its own style and suits the story well. There are also multiple weapons and, as you know, the physics are really well done.

Now it's time to play it again at a higher difficulty level. I think I'll wait until I renew my video card, though. And it's also time to play Half-Life: Source, even though I've already finished the first game three times :-)

Monday, October 03, 2005

C++: Templates and the ABI

In the previous post, we saw why inlined code is dangerous in a public API. Unfortunately, I carelessly put templates in the same bag in an attempt to generalize the idea too much, giving an incorrect impression of how they work. A reader (fellow_traveler) spotted the mistake and here is this new post to clarify what's up with templates and an ABI... if one can relate the two concepts at all.

C++ makes intensive use (or abuse?) of templates to achieve genericity; one can quickly notice this in the STL, where each container is parametrized by one or more types. Other libraries, such as Boost, go further and use templates in many situations where one would never have imagined them.

In order to understand what goes on with templates, let's recall how they work. A template defines a piece of code (be it a class or a function) that is parametrized by a type given by the developer. The template does not exist as binary code because, simply put, that is impossible: it lacks the type information needed to compile it.

Here is a trivial function that returns the sum of two objects whose type is defined by Type; we will use it to illustrate some examples below. Also, and to focus on the ABI, let's assume that this code is part of a public library; therefore, it must be placed in a header file (e.g., foo.hpp) to be useful to other users (if it were in a foo.cpp file, it'd simply be private and not usable outside that file).

template <class Type>
Type
add(const Type& p1, const Type& p2)
{
    return p1 + p2;
}

When the template is used in someone else's code — in other words, it is instantiated — the compiler grabs the template's source code, fills in the parametrized gaps with the type given by the developer and creates the final object code. For example, given:

int foo = add<int>(2, 3);
float bar = add<float>(2.4, 3.5);

The compiler gets the verbatim add function's code from the header file, replaces Type with int, generates the object code for the resulting function and stores it alongside the user's binary. The same happens with the float instance. Notice how the binary code is not in the library where the template came from, and also notice that the user's binary has gained two new functions, one for each instantiation.

So what happens? Templates cannot take advantage of (binary) shared libraries. Whenever the code in a template changes, the template's user is forced to rebuild his code (if he wants to get the new changes, of course). Imagine that there was a security bug (or any other serious bug) in the template's code: you'd need to make sure to rebuild all its uses to fix the issue, something well-known by users of static binaries.

Of course, the library developer could explicitly instantiate some common types in his library's binary so that users needn't duplicate the code. This could work in some cases but, as he cannot predict which types users will instantiate the template with, it is not a complete solution.
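A minimal sketch of that idea, assuming a C++11 compiler (extern template was standardized in C++11): the header declares an explicit instantiation with extern template, which tells users' compilers not to generate add<int> themselves, and the library's source file provides the explicit instantiation definition. Both halves are shown in one file here, with comments marking which file each part would live in; int is just an example of a "common type".

```cpp
#include <cassert>

// --- Would live in foo.hpp (the public header). ---
template <class Type>
Type
add(const Type& p1, const Type& p2)
{
    return p1 + p2;
}

// Explicit instantiation declaration: users' compilers must not emit
// add<int>; the library binary promises to provide it.
extern template int add<int>(const int&, const int&);

// --- Would live in foo.cpp (compiled into the library). ---
// Explicit instantiation definition: forces the object code for add<int>
// to be emitted here, once.
template int add<int>(const int&, const int&);
```

Any other type, such as double, is still instantiated in the user's binary as before, which is exactly the limitation described above.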

Other developers create templates in a two-layered design. The public template is a very thin wrapper over a private class that achieves genericity by using void * types. This way, the public template is unlikely to change, and the developers can safely change their internal code without requiring external rebuilds. I think I saw this in the STL itself, or maybe in Qt; I can't remember.
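Here is a minimal sketch of such a two-layered design, under assumptions: the names ptr_stack_impl and ptr_stack are hypothetical, and the core's members are defined inline only to keep the example in a single file. In a real library they would be defined out of line in the library's .cpp, so that fixes reach users through the shared object. Note that the core's data layout still leaks into the template (it is a member), so changing its fields would still break the ABI.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Private, non-template core: stores untyped pointers. This is the part
// that would be compiled into the shared library.
class ptr_stack_impl {
    std::vector<void*> m_items;

public:
    void push(void* p) { m_items.push_back(p); }
    // Precondition: the stack is not empty.
    void* pop() { void* p = m_items.back(); m_items.pop_back(); return p; }
    std::size_t size() const { return m_items.size(); }
};

// Public template: only converts to and from void *, so there is little
// in it that could ever need to change.
template <class T>
class ptr_stack {
    ptr_stack_impl m_impl;

public:
    void push(T* p) { m_impl.push(p); }
    T* pop() { return static_cast<T*>(m_impl.pop()); }
    std::size_t size() const { return m_impl.size(); }
};
```

Every ptr_stack<T> instantiation shares the same compiled core, and the user's binary only gains the trivial casting wrappers.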

Summarizing: as my reader said, it makes no sense to talk about templates and ABIs, because a template never has an ABI. It is only an API that, once compiled into third-party code, becomes part of it. I'm now wondering how Java 1.5's generics work and whether they suffer from these issues too...

Sunday, October 02, 2005

C++: Inlined code and the ABI

There are many development libraries that provide inline functions (or macros) as part of their public API. This is often done for efficiency reasons, although sometimes it's done because developers don't know the consequences (this last point is just a guess, but it can certainly happen). Providing such inlined functions breaks the whole idea of encapsulation and shared libraries. Let's see why.

Consider the following simple class:

/* In foo.h. */
class foo {
    int m_value;

public:
    /* Defined inside the class body, hence implicitly inline. */
    int get_value(void) { return m_value; }
    void set_value(int v);
    /* ... */
};

/* In foo.cpp; this is _not_ inlined. */
void
foo::set_value(int v)
{
    m_value = v;
}

Now imagine that this class belongs to a shared library, say libbar.so.1.0. Given this, our Joe user does this in his code, which is perfectly legal:

foo a;

a.set_value(5);
/* ... do whatever with 'a' ... */
int b = a.get_value();

When this code is compiled, the compiler will typically replace the call to foo::get_value() with the method's code, avoiding a function call, a return and all the stack setup; all the action takes place in the user's code, not in the library. (Even if it decides not to inline the call, the method's code is usually emitted into the user's binary rather than called from the library.) Typically, getting a value from a structure means reading a concrete position of memory within it, described by its offset from the beginning. OTOH, the call to foo::set_value() is compiled into a regular function call into the shared library's text.

Some time later, the libbar developers decide to change the internal representation of the foo class for whatever reason. According to the encapsulation principle used in object-oriented designs, they should be able to, after all. Let's suppose they add a new integer before the m_value field, called m_id. Unwittingly, the developers have just changed the ABI of their library and, if they don't take care to update the library's major number, serious problems will arise. But why?

Our Joe user later sees a new release of libbar, say 1.1, so he builds it and installs it on his machine, replacing libbar.so.1.0 with libbar.so.1.1; these two libraries typically share the same soname, libbar.so.1, because they are supposedly compatible. According to how shared libraries work, he shouldn't need to rebuild his application.

The set_value() call will continue to work correctly because the application will call the new function in the updated shared library. However, the execution of get_value() will be broken; oops! Remember the sample code shown above? It was compiled as an offset within the class, which is now different! This getter will return an incorrect value, no matter what he does. He'll be forced to rebuild his application to adjust to the new ABI.
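The failure can be simulated in a few lines of C++. In this sketch, the hypothetical structs foo_v1 and foo_v2 stand for the two layouts of foo (a real library would not rename the class, of course), stale_get_value plays the inlined getter compiled into Joe's binary, and library_get_value plays the getter the new library would provide if it were not inlined.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Layout of foo that Joe's application was compiled against (libbar 1.0).
struct foo_v1 {
    int m_value;
};

// Layout after the libbar developers insert m_id before m_value (1.1).
struct foo_v2 {
    int m_id;
    int m_value;
};

// The inlined get_value() baked this offset into Joe's binary.
const std::size_t compiled_offset = offsetof(foo_v1, m_value);

// Simulates that stale inlined getter: it reads an int at the old offset
// from an object laid out by the new library.
int stale_get_value(const foo_v2& obj)
{
    int v;
    std::memcpy(&v, reinterpret_cast<const unsigned char*>(&obj) + compiled_offset,
                sizeof v);
    return v;
}

// What an out-of-line getter inside the updated library would read.
int library_get_value(const foo_v2& obj)
{
    return obj.m_value;
}
```

Given an object with m_id = 7 and m_value = 42, the library's getter returns 42, but the stale inlined one returns 7: it reads m_id, which now sits where m_value used to be.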

Conclusion: be very careful when defining inlined methods and macros. If you need to fix a mistake or modify the internal representation of your code in the future, you will be unable to do so without breaking the ABI. Personally, I avoid inlined code in all public interfaces, even though this introduces a small performance penalty; however, it is perfectly fine in internal code.

It's a pity that careless C++ developers make such heavy use of inlined code. BTW, note that although this has focused on C++, the same is true for, e.g., C99, which provides an inline keyword.

Edit (Oct 3rd): Based on this reply, I've removed some (really minor) references to templated code from the article; they certainly didn't belong here.

Saturday, October 01, 2005

New utility: verifypc

After five months or so of not touching the code, I've finally cleaned up my verifypc utility and imported it into pkgsrc. Its purpose is to sanity check the dependencies of a given package based on its calls to the pkg-config program.

For more information see the announcement in the tech-pkg@ mailing list or this past post in which I detailed the idea.