Consistent Overhead Byte Stuffing

March 26, 2011 | Embedded Software

Framing packets transmitted on a UART serial link (such as RS-232 or RS-485) is a common problem in embedded programming. The typical approach is to put framing characters at the beginning and end of the packet. If a framing character occurs in the packet, it is escaped using a technique called byte stuffing. This sucks for a few reasons, but there's a better way. An algorithm called Consistent Overhead Byte Stuffing (COBS), developed in the late 1990s, is a great solution that should be in every embedded software engineer's toolbox.

First, a bit of background. I'm working on a project where I have to transmit data between two microcontrollers using RS-422. RS-422 is a differential signalling standard that is often used in place of RS-232 to connect two UARTs when electrical isolation and/or noise immunity are required.

The question came up: how do we assemble packets on the transmitter and detect packet boundaries on the receiver? Our packets all have the same size and can contain arbitrary binary data. My first thought was to use idle line detection. If our UART detects enough bit times without a start bit, it can give us an idle line interrupt. Because we're feeding the transmitting UART with DMA, we can guarantee that there are no gaps in the middle of the packet that would cause a false idle line to be detected.

My boss pointed out that I was relying on an implementation detail that might not be valid if we ever moved to a really cheap microcontroller. He encouraged me to come up with a solution that put the framing into the packet itself.

The traditional way to do this is with byte stuffing. You designate a character (such as STX, ASCII 0x02) as the start of frame character and another (often ETX, ASCII 0x03) as the end of frame character. If there's a possibility that ETX or STX will occur in your data, you designate a third character as the escape character (ESC, ASCII 0x1B, for example). STX, ETX, and ESC characters in the data are replaced by an ESC followed by a mangled version of the character (the character XORed with 0x20 is a popular choice).

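Here's a minimal sketch of that classic scheme, just to make it concrete. The function name and the worst-case buffer requirement are my own; the control characters and the XOR trick are simply the conventional choices described above, not any particular standard.

#include <stdint.h>
#include <stddef.h>

#define FRAME_STX 0x02  /* start of frame */
#define FRAME_ETX 0x03  /* end of frame */
#define FRAME_ESC 0x1B  /* escape character */
#define FRAME_XOR 0x20  /* XORed with escaped characters */

/* Wraps "length" bytes from "input" in STX/ETX, escaping any STX,
 * ETX or ESC bytes in the payload. Returns the number of bytes
 * written to "output", which must be able to hold 2 * length + 2
 * bytes in the worst case. */
size_t classic_stuff(const uint8_t *input, size_t length, uint8_t *output)
{
    size_t out = 0;

    output[out++] = FRAME_STX;
    for(size_t in = 0; in < length; in++)
    {
        uint8_t byte = input[in];
        if(byte == FRAME_STX || byte == FRAME_ETX || byte == FRAME_ESC)
        {
            output[out++] = FRAME_ESC;
            output[out++] = byte ^ FRAME_XOR;
        }
        else
        {
            output[out++] = byte;
        }
    }
    output[out++] = FRAME_ETX;

    return out;
}
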
There are two problems with this system:

  • It can add a lot of overhead. In the worst case, the encoded data could be twice the size of the original data. Unless you can be sure this won't happen, you have to design your buffers and bandwidth to handle this worst case.

  • The amount of overhead is variable. If you want to use DMA or FIFO buffers to send and receive your data, dealing with variable length data can be annoying. For example, you can't reliably request an interrupt after a frame's worth of data has been received. When you're transmitting at multiple megabits per second, you really don't want to check for a complete frame after each character is received.

Returning to the story above, I was really annoyed. My boss had a good point, but byte stuffing is just a pain. I did a bunch of googling, and finally came across COBS. It has the following great properties:

  • It removes all zero bytes from the original message, allowing the zero byte to be used as a frame delimiter.

  • The worst-case overhead is just 1 byte for every 254 bytes of data, or 0.39%.

  • For messages smaller than 254 bytes, the overhead is constant (exactly 1 byte).

  • The encoding process is fairly efficient (no complicated rearranging of input data across multiple bytes).

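To see how the encoding actually works, here's a small trace (my own illustration; it matches what the encoder below produces). Every zero byte in the message is removed, and each block of data is prefixed with a code byte giving the distance to the next zero; a code of 0xFF means "254 data bytes with no zero", which is where the worst-case overhead comes from.

    unencoded message:  00 11 22 00 33
    COBS encoded:       01 03 11 22 02 33

The leading 01 means "a zero follows immediately", the 03 means "two data bytes, then a zero", and the final 02 means "one data byte, then the end of the packet". Since the encoded output contains no zero bytes, the transmitter can append a literal 00 as the frame delimiter, and the receiver simply accumulates bytes until it sees one.
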
There's sample code on the Wikipedia COBS page. I've improved the Wikipedia code in a few ways:

  • My encoder and decoder return the length of the output data, because that size varies depending on the input data.

  • My decoder checks the validity of its input data to protect against corrupted or malicious data. The original Wikipedia code would cause a buffer overflow if an overly high block length code occurred at the end of the input data.

  • My decoder doesn't append the implicit trailing zero byte to its output.

  • My encoder and decoder use modern C99. For example, specifying the restrict qualifier on the input and output pointers means the code can be more heavily optimized (restrict is a promise that the two buffers do not overlap, which helps the optimizer eliminate unneeded loads and stores).

COBS resources:

  • The Consistent Overhead Byte Stuffing article on Wikipedia, which has the sample code mentioned above.

  • Stuart Cheshire and Mary Baker, "Consistent Overhead Byte Stuffing", IEEE/ACM Transactions on Networking, vol. 7, no. 2, April 1999.

Here's my C implementation of COBS:

/* Copyright 2011, Jacques Fortier. All rights reserved.
 *
 * Redistribution and use in source and binary forms are permitted, with or without modification.
 */
#include <stdint.h>
#include <stddef.h>

/* Stuffs "length" bytes of data at the location pointed to by
 * "input", writing the output to the location pointed to by
 * "output". Returns the number of bytes written to "output".
 *
 * Remove the "restrict" qualifiers if compiling with a
 * pre-C99 C dialect.
 */
size_t cobs_encode(const uint8_t * restrict input, size_t length, uint8_t * restrict output)
{
    size_t read_index = 0;
    size_t write_index = 1;
    size_t code_index = 0;
    uint8_t code = 1;

    while(read_index < length)
    {
        if(input[read_index] == 0)
        {
            /* A zero in the input closes the current block: write its
             * code byte and reserve a slot for the next block's code. */
            output[code_index] = code;
            code = 1;
            code_index = write_index++;
            read_index++;
        }
        else
        {
            /* Non-zero bytes are copied through unchanged. */
            output[write_index++] = input[read_index++];
            code++;
            if(code == 0xFF)
            {
                /* A block holds at most 254 data bytes; close it
                 * and start a new one. */
                output[code_index] = code;
                code = 1;
                code_index = write_index++;
            }
        }
    }

    output[code_index] = code;

    return write_index;
}

/* Unstuffs "length" bytes of data at the location pointed to by
 * "input", writing the output to the location pointed to by
 * "output". Returns the number of bytes written to "output" if
 * "input" was successfully unstuffed, and 0 if there was an
 * error unstuffing "input".
 *
 * Remove the "restrict" qualifiers if compiling with a
 * pre-C99 C dialect.
 */
size_t cobs_decode(const uint8_t * restrict input, size_t length, uint8_t * restrict output)
{
    size_t read_index = 0;
    size_t write_index = 0;
    uint8_t code;
    uint8_t i;

    while(read_index < length)
    {
        code = input[read_index];

        /* Reject a block length that would run past the end of the
         * input; this is the check that protects against corrupted
         * or malicious data. */
        if(read_index + code > length && code != 1)
        {
            return 0;
        }

        read_index++;

        /* Copy the block's data bytes through unchanged. */
        for(i = 1; i < code; i++)
        {
            output[write_index++] = input[read_index++];
        }
        /* A code of 0xFF means a full block with no zero after it;
         * the zero implied at the very end of the input is not
         * written to the output. */
        if(code != 0xFF && read_index != length)
        {
            output[write_index++] = '\0';
        }
    }

    return write_index;
}
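
Here's a quick usage sketch, assuming it's compiled together with the two functions above (the main() harness, buffer sizes, and printing are mine, not part of the implementation). It encodes a small packet, appends the zero delimiter for transmission, decodes it again, and shows the decoder rejecting a truncated frame:

#include <stdio.h>
#include <string.h>

int main(void)
{
    const uint8_t message[] = { 0x00, 0x11, 0x22, 0x00, 0x33 };

    /* Worst case for this encoder: one extra byte per 254 bytes of
     * input, plus one code byte, plus one byte for the delimiter. */
    uint8_t frame[sizeof(message) + sizeof(message) / 254 + 2];
    uint8_t decoded[sizeof(message)];

    size_t encoded_length = cobs_encode(message, sizeof(message), frame);
    frame[encoded_length] = 0x00; /* the zero byte marks the end of the frame */

    for(size_t i = 0; i < encoded_length + 1; i++)
        printf("%02X ", frame[i]); /* prints: 01 03 11 22 02 33 00 */
    printf("\n");

    /* The receiver strips the trailing zero and decodes the rest. */
    size_t decoded_length = cobs_decode(frame, encoded_length, decoded);
    printf("decoded %zu bytes, match: %d\n", decoded_length,
           memcmp(decoded, message, sizeof(message)) == 0);

    /* A corrupted (here, truncated) frame is rejected with a return of 0. */
    printf("truncated frame decodes to %zu bytes\n",
           cobs_decode(frame, 2, decoded));

    return 0;
}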