The problem manifested itself as segmentation faults pretty much everywhere, and I could trace such crashes down to pieces of code like the following, of which the C code of ATF is full of:
void
foo_fmt(const char *fmt, ...)
{
va_list ap;
va_start(ap, fmt);
foo_ap(fmt, ap);
va_end(ap);
}
void
foo_ap(const char *fmt, va_list ap)
{
char buf[128];
vsnprintf(buf, sizeof(buf), fmt, ap);
... now, do something with buf ...
}
The codebase of ATF provides _fmt and _ap variants for many functions to give more flexibility to the caller and, as shown above, the _fmt variant just relies on the _ap variant to do the real work.Now, the crashes that appeared from the code above seemed to come from the call that consumes the ap argument, which in this case is vsnprintf. Interestingly, though, all the tests in other platforms but Linux x86_64 worked just fine, and this included OpenSolaris, other Linux distributions, some BSDs and even different hardware platforms.
As it turned out, you cannot blindly pass ap arguments around because they are not "normal" parameters (even though, unfortunately, they look like so!). In most platforms, the ap element will be just an "absolute" pointer to the stack, so passing the variable to an inner function calls is fine because the caller's stack has not been destroyed yet and, therefore, the pointer is still valid. But... the ap argument can have other representations. It'd be an offset to the stack instead of a pointer, or it'd be a data structure that holds all the variable parameters. If, for example, the ap argument held an offset, passing it to an inner function call would make such offset point to "garbage" because the stack would have been grown due to the new call frame. (I haven't investigated what specific representation is x86_64 using.)
The solution is to use the va_copy function to generate a new ap object that is valid for the current stack frame. This is easy, so as an example, we have to rewrite the foo_ap function above as follows:
void
foo_ap(const char *fmt, va_list ap)
{
char buf[128];
va_list ap2;
va_copy(ap2, ap);
vsnprintf(buf, sizeof(buf), fmt, ap2);
va_end(ap2);
... now, do something with buf ...
}
This duplication of the ap argument pointing to the variable list of arguments ensures that ap2 can be safely used from the new stack frame.
6 comments:
It seems that we will have to change some code in GMAC... Portability sucks :-S
Hmm, can you quote chapter and verse? My cursory reading of 7.15 Variable arguments <stdarg.h> doesn't seem to confirm your claim. To the contrary, #3 discussion of va_list seems to imply that that it is totally legal to pass ap as an argument and access it inside the called function. That access does "invalidate" ap in the caller, though.
uwe: Thanks for the information. No, I don't have any proof for what I said. This is a problem I found a long time ago and it looks like I drew the wrong conclusions altogether.
The good thing is that I can still reproduce the problem in Fedora by removing all the va_copy calls in the ATF code, so I'll see what the root cause of this is.
In the meantime, I have added a note to the article saying that it's plain wrong.
So... the standard may say that passing va_list objects around is fine, but the Linux manual page disagrees (surprise?)). And, actually, this is the behavior I'm observing while debugging the problem in Fedora x86_64: the type of a local va_list object is *different* from the value received on a called function. Things seem to work fine in a trivial test program, but they don't in the ATF code (which seems valid).
Here is what the Linux manual page has to say about va_copy:
An obvious implementation would have a va_list be a pointer to the stack frame of the variadic function. In such a setup (by far the most common) there seems nothing against an assignment
va_list aq = ap;
Unfortunately, there are also systems that make it an array of pointers (of length 1), and there one needs
va_list aq;
*aq = *ap;
Finally, on systems where arguments are passed in registers, it may be necessary for va_start() to allocate memory, store the arguments there, and also an indication of which argument is next, so that va_arg() can step through the list. Now va_end() can free the allocated memory again. To accommodate this situation, C99 adds a macro va_copy(), so that the above assignment can be replaced by
va_list aq;
va_copy(aq, ap);
...
va_end(aq);
Each invocation of va_copy() must be matched by a corresponding invocation of va_end() in the same function. Some systems that do not supply va_copy() have __va_copy instead, since that was the name used in the draft proposal.
jmmv: the passage you quote is correct and I don't see anything in it that would say that you must use va_copy to pass your ap to another function. The standard says:
7.15 Variable arguments
[#3] ... If access to the varying arguments is desired, the called function shall declare an object (referred to as ap in this subclause) having type va_list. The object ap may be passed as an argument to another function; if that function invokes the va_arg macro with parameter ap, the value of ap in the calling function is indeterminate and shall be passed to the va_end macro prior to any further reference to ap.*)
*) It is permitted to create a pointer to a va_list and pass that pointer to another function, in which case the original function may make further use of the original list after the other function returns.
uwe: Yeah, but a 'va_list' argument is not a pointer. I don't know at this point -- the observed behavior is weird and I can't reproduce it yet with a simpler test case. I'll keep looking.
Post a Comment