Serial port programming under Linux: when legacy comes back to haunt you

Programming serial ports on Linux (and other POSIX-compatible systems) can be tricky, because a lot of legacy is involved. Michael R. Sweet's Serial Programming Guide for POSIX Operating Systems is an invaluable general resource. What I found out the hard way is that you really need to look carefully at all available flags, otherwise legacy settings can come back to haunt you.

I was working on implementing a binary protocol over a serial port (the details are not important), and I tried to set up the serial port with the proper settings: make sure the baud rate is right, configure the number of data bits, whether parity is used, and the number of stop bits, make sure the terminal is in noncanonical mode (for raw access), make sure flow control is configured properly, and configure the right timeouts. What I didn't do, however, was look closely at all of the input flags available.
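A setup along those lines might look roughly like the following sketch, written against Python's termios module. The function name configure_8n1_raw, the 115200 baud rate, the 8N1 framing and the 0.5 second timeout are assumptions made for illustration; none of them come from the original project. It works on the attribute list returned by termios.tcgetattr() and deliberately does not touch the input translation flags, so it has the same gap as the setup described above.

```python
import termios


def configure_8n1_raw(attrs, baud=termios.B115200):
    """Return a copy of a tcgetattr() attribute list configured for
    8N1, noncanonical mode, no flow control and a read timeout.

    Hypothetical helper; the concrete values are illustrative assumptions.
    """
    iflag, oflag, cflag, lflag, _ispeed, _ospeed, cc = attrs
    cc = list(cc)

    # 8 data bits, no parity, one stop bit.
    cflag &= ~(termios.CSIZE | termios.PARENB | termios.CSTOPB)
    cflag |= termios.CS8
    # Enable the receiver and ignore modem control lines.
    cflag |= termios.CREAD | termios.CLOCAL

    # No hardware flow control (CRTSCTS is not defined on every platform).
    cflag &= ~getattr(termios, "CRTSCTS", 0)
    # No software flow control.
    iflag &= ~(termios.IXON | termios.IXOFF | termios.IXANY)

    # Noncanonical mode: no line buffering, echo or signal characters.
    lflag &= ~(termios.ICANON | termios.ECHO | termios.ECHOE | termios.ISIG)

    # Timeouts: read() returns as soon as data is available,
    # or with no data after 5 tenths of a second.
    cc[termios.VMIN] = 0
    cc[termios.VTIME] = 5

    # Note: INLCR, IGNCR, ICRNL and IUCLC in iflag are left as found.
    return [iflag, oflag, cflag, lflag, baud, baud, cc]
```

With an open file descriptor fd for the port, this would be applied as termios.tcsetattr(fd, termios.TCSANOW, configure_8n1_raw(termios.tcgetattr(fd))).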

On the first computer (where I wrote the code) everything worked as expected. When trying this on a second computer, however, I had a nasty surprise waiting for me: reading from the serial port would return wrong data in some cases. At first I suspected random bit flips, but the corruption was deterministic. Specifically, when receiving 4-byte integers, weird things would appear. As an example, the value 5901 (properly sent by the other side of the serial connection) would consistently arrive as 5898. With the same USB-to-serial adapter, connected to the same counterpart, this did not happen on the first computer though - everything still worked there.

I finally figured out the issue though: the number 5901 is represented by the bytes 0x0d 0x17 0x00 0x00 (in little endian; the endianness doesn't matter here though), while the number 5898 is represented by 0x0a 0x17 0x00 0x00. In other words, 0x0d was consistently replaced by 0x0a. The value 0x0d didn't occur that often in the protocol I was implementing (why is not important), so this only affected a small number of things, but enough to make the software break completely.

The cause is an input flag for serial ports (ICRNL) that tells the kernel to translate an ASCII carriage return (0x0d) into a newline character (0x0a) on input, because that can be useful when dealing with ASCII protocols. And for some reason (I still don't know why) the default on the second computer was to have this flag enabled, while the default on the first computer was to have it disabled. What's even worse is that, looking through the list of flags, there are also flags that do the reverse (map newlines to carriage returns) and that map uppercase ASCII letters to lowercase ASCII letters. Imagine that specific problem hitting in production because on yet another system one of these flags is enabled by default.
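The arithmetic is easy to check by hand or with a few lines of Python. The sketch below only imitates the carriage return translation on a byte string; apply_icrnl is a made-up helper for illustration, not a kernel interface.

```python
import struct


def apply_icrnl(data: bytes) -> bytes:
    """Imitate the CR-to-NL input translation: every 0x0d becomes 0x0a."""
    return data.replace(b"\r", b"\n")


sent = struct.pack("<I", 5901)                 # 4-byte little-endian integer
received = apply_icrnl(sent)                   # what the reader would see
value = struct.unpack("<I", received)[0]

print(sent.hex(" "), "->", received.hex(" "), "=", value)
# prints: 0d 17 00 00 -> 0a 17 00 00 = 5898
```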

The solution is quite simple: receiving binary data over a serial port on Linux (and other POSIX systems) requires that the input flags INLCR, IGNCR, ICRNL and IUCLC are all disabled, otherwise decades-old legacy will come back to haunt you.
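As a minimal sketch of what disabling those flags can look like, here is a version using Python's termios module. The helper names are invented for this example. IUCLC is not defined on every platform, so the code falls back to 0 where the module lacks it.

```python
import termios

# Input flags that rewrite or drop bytes on the way in.
# IUCLC is not available everywhere, hence the getattr() fallback.
TRANSLATING_IFLAGS = (
    termios.INLCR      # map NL to CR on input
    | termios.IGNCR    # ignore (drop) CR on input
    | termios.ICRNL    # map CR to NL on input
    | getattr(termios, "IUCLC", 0)  # map uppercase to lowercase on input
)


def without_input_translation(attrs):
    """Return a copy of a tcgetattr() attribute list with the
    translating input flags cleared (hypothetical helper)."""
    new_attrs = list(attrs)
    new_attrs[0] &= ~TRANSLATING_IFLAGS  # index 0 is iflag
    return new_attrs


def disable_input_translation(fd):
    """Clear the translating input flags on an open terminal descriptor."""
    attrs = termios.tcgetattr(fd)
    termios.tcsetattr(fd, termios.TCSANOW, without_input_translation(attrs))
```

Calling disable_input_translation(fd) right after opening the port, before the first read, should make the result independent of whatever defaults the system happens to have for these four flags.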
