Sunday, November 23, 2008

Linux audio sucks

because of bugs in drivers, libraries, and applications. Here are some of them. They are quite real, because they interfere with my not-so-unrealistic requirements (play music through headphones connected to the onboard HD audio chip, make some noise with the PC speaker when someone calls me via SIP, and copy sound from my TV tuner to the HD audio chip when I watch TV).
  • In a driver. Take, for example, saa7134-alsa (a kernel module that allows recording audio from SAA713X-based TV tuners). It advertises support for two sample rates: 32 kHz and 48 kHz. What it fails to mention is that it doesn't support recording at 48 kHz except in the case of "original SAA7134 with the MIXER_ADDR_LINE2 capture source". In all other cases (e.g., SAA7133 with the MIXER_ADDR_TVTUNER capture source), it delivers 32 kHz samples but mislabels them as 48 kHz. Thus, applications such as MPlayer have to be configured to use only the 32 kHz sample rate. This is not a big problem if such configuration is possible (i.e., if you are not using PulseAudio together with HAL), but it is still a bug.
  • In the ALSA library. Today I discovered that the PC speaker at home no longer plays ring tones in SIP clients. It did work before, but I can no longer find a working revision. While testing this sound device with speaker-test, I found this floating point exception inside the ALSA library.
  • In applications. The most frequent bug is that an application offers no way to use an arbitrary ALSA device, and instead has a drop-down box with pre-defined choices. Such applications often cannot work with Bluetooth headsets (which I don't have), FireWire audio cards (which I don't have either), or non-default PulseAudio devices (this has bitten me when I tried to use Linphone with PulseAudio: I couldn't configure it to use the PC speaker through PulseAudio). What's even worse is that HAL endorses this faulty "drop-down box" enumeration scheme.
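
To put a number on the driver bug from the first item: here is a small sketch of what happens if 32 kHz samples that were labelled as 48 kHz get stored and are later played back at the labelled rate. The function names are made up for this illustration and have nothing to do with the saa7134-alsa or ALSA APIs; the only inputs are the two rates mentioned above.

```cpp
#include <cmath>

// Hypothetical helper: how much faster the audio plays when samples that were
// really captured at actual_hz are played back at labelled_hz.
double speed_factor(double actual_hz, double labelled_hz)
{
    return labelled_hz / actual_hz;
}

// Hypothetical helper: the pitch shift, in equal-tempered semitones, caused by
// playing the audio back speed-up times faster.
double pitch_shift_semitones(double speedup)
{
    return 12.0 * std::log2(speedup);
}
```

With 32000 and 48000 as arguments, the speed factor is 1.5 and the pitch goes up by roughly seven semitones, which is why forcing the application to 32 kHz is the practical workaround.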

Sunday, November 16, 2008

Strange backtrace

Some time ago I had to debug a strange crash. It was in a multithreaded program and manifested itself only on FreeBSD i386. The code (with all the needed declarations included, seemingly irrelevant details removed, and everything renamed) looks like this:


#include <cstdio>
#include <cstring>
#include <cerrno>
#include <fcntl.h>
#include <unistd.h>
#include <exception>


// Exception carrying the text for the errno value at the time of the throw.
class system_error : public std::exception
{
public:
    system_error() throw() : error_text(strerror(errno)) {}
    virtual const char* what() const throw() { return error_text; }
private:
    const char* error_text;
};

class strange_thing
{
public:
    strange_thing(); // fills in some useful defaults
private:
    // lots of implementation details
};

class strange_container
{
public:
    strange_container();
    ~strange_container() { if (fd != -1) close(fd); }
    void play_with_strange_thing(const char* filename);
private:
    int fd;
};

strange_container::strange_container()
    : fd(-1)
{
    play_with_strange_thing("test.file");
}

void strange_container::play_with_strange_thing(const char* filename)
{
    fd = open(filename, O_CREAT | O_TRUNC | O_RDWR, 0666);
    if (fd == -1)
        throw system_error();
    strange_thing ss;
    /* here goes some code that uses ss and fd */
}

// well, actually it is not the main function, but something buried in a thread
int main(int argc, char* argv[])
{
    strange_container c;
    return 0;
}



The test file is not created, and the segfault looks like this:

[aep@bsd1 ~/crashtest]$ gdb ./a.out
GNU gdb 6.1.1 [FreeBSD]
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB. Type "show warranty" for details.
This GDB was configured as "i386-marcel-freebsd"...
(gdb) run
Starting program: /usr/home/aep/crashtest/a.out
[New LWP 100043]
[New Thread 0x28301100 (LWP 100043)]

Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x28301100 (LWP 100043)]
strange_container::play_with_strange_thing (this=0x28306098,
filename=0x8048c33 "test.file") at crashtest.cpp:57
57 fd = open(filename, O_CREAT | O_TRUNC | O_RDWR, 0666);


At this point, everything looks valid, including the "this" pointer. By all applicable logic, the program just cannot segfault by calling open() with valid parameters. So I started adding debugging printf() statements. The statement just before the call to play_with_strange_thing() worked fine, and none of the statements inside play_with_strange_thing() worked. Moreover, when I added a printf() as the very first line of strange_container::play_with_strange_thing() and ran gdb on the result, it showed this printf() in the backtrace!

So, I didn't believe my eyes. I thought (wrongly) that printf() and its buffering somehow interacted with the segfault, and thus invented a different mechanism to find out whether a certain line of code was reached by the program. Namely, I replaced all my debugging printf() calls in strange_container::play_with_strange_thing() with thrown exceptions, intending to remove them one by one:


void strange_container::play_with_strange_thing(const char* filename)
{
    throw system_error(); // (1)
    fd = open(filename, O_CREAT | O_TRUNC | O_RDWR, 0666);
    throw system_error();
    if (fd == -1)
        throw system_error();
    throw system_error(); // (2)
    strange_thing ss;
    throw system_error(); // (3)
    /* here goes some code that uses ss and fd, err... throws system_error() */
}


This worked. I knew for sure that point (2) was reached and point (3) wasn't. So something goes wrong with the creation of the strange thing, yet gdb doesn't point at its constructor.

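So, I had to take another look at the implementation of the strange_thing class. The real issue was the huge size of the object (several megabytes), so it is no wonder that it overflowed the thread stack. This also plausibly explains the misleading backtrace: compilers typically reserve stack space for all of a function's locals on entry, so the fault happens at the first instruction that touches the oversized frame (here, apparently, the call to open()) rather than anywhere in the constructor. You can reproduce the crash on your own system by replacing "lots of implementation details" with "char c[2000000];", implementing the strange_thing default constructor, and running with a low enough "ulimit -s" setting.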
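
The reproduction recipe only works when the object is larger than the stack it lands on, so here is a rough sketch for comparing the two. The stand-in struct and both helper functions are invented for this illustration; getrlimit() with RLIMIT_STACK is the POSIX call that reports the limit behind "ulimit -s" (in bytes rather than kilobytes).

```cpp
#include <cstddef>
#include <sys/resource.h>

// Stand-in with the 2000000-byte member from the reproduction recipe;
// not the real strange_thing.
struct strange_thing_standin
{
    char c[2000000];
};

// Hypothetical helper: true if an object of object_size bytes cannot fit into
// a stack of stack_limit bytes, ignoring whatever else is already on it.
bool exceeds_stack_limit(std::size_t object_size, rlim_t stack_limit)
{
    if (stack_limit == RLIM_INFINITY)
        return false;
    return static_cast<unsigned long long>(object_size)
        >= static_cast<unsigned long long>(stack_limit);
}

// Hypothetical helper: the soft stack limit of the current process in bytes.
rlim_t current_stack_limit()
{
    struct rlimit rl;
    if (getrlimit(RLIMIT_STACK, &rl) != 0)
        return RLIM_INFINITY; // this sketch treats a failed query as "no known limit"
    return rl.rlim_cur;
}
```

Calling exceeds_stack_limit(sizeof(strange_thing_standin), current_stack_limit()) gives a quick answer for the main thread. Stacks of other threads are sized separately (for example through pthread attributes), so for a crash buried in a thread, like the original one, the number to compare against is that thread's stack size.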

Once the cause of the crash was known, it was easy to fix properly: by not creating huge objects on the stack.
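
One way to keep such an object off the stack is to allocate it on the heap and hold only a pointer locally. The following is a minimal sketch of that idea, not the actual fix that went into the program: the class body is the stand-in from the reproduction recipe, make_strange_thing() is a name made up here, and std::unique_ptr assumes a C++11 compiler.

```cpp
#include <cstddef>
#include <memory>

// Stand-in for the real class: the array comes from the reproduction recipe,
// not from the original implementation details.
class strange_thing
{
public:
    strange_thing() : c() {}   // zero-fills the buffer
    std::size_t size() const { return sizeof(c); }
    char first() const { return c[0]; }
private:
    char c[2000000];
};

// Hypothetical helper: the big object is constructed on the heap, and only the
// small smart pointer occupies stack space in the caller.
std::unique_ptr<strange_thing> make_strange_thing()
{
    return std::unique_ptr<strange_thing>(new strange_thing());
}
```

Inside play_with_strange_thing(), "strange_thing ss;" would then become something like "std::unique_ptr<strange_thing> ss = make_strange_thing();", with member access going through "ss->". The object is still released automatically when the function returns or throws.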