
#40 COBS decoding support

awaiting-feedback
nobody
COBS (1)
5
2022-07-25
2015-12-11
No

Please consider adding support for COBS (Consistent Overhead Byte Stuffing) decoding, which seems like a simple, robust, low-overhead way to implement packet framing, while avoiding out-of-band signaling (like idle/break or 9th bit) that can be HW-dependent. See

https://en.wikipedia.org/wiki/Consistent_Overhead_Byte_Stuffing

for an explanation of the protocol, including sample encoding / decoding routines (in C).

Thanks for providing a very useful product!

Discussion

  • Simon Bridger

    Simon Bridger - 2016-01-07

    Might be possible to add the encoding and decoding.

    However I am not seeing how this would be actually used in a terminal with respect to data entry of the packets and how the packets would be displayed on the screen.

    How do you visualise it working?
    What sort of data is in the packets and how would it be entered?
    How long are these packets?

    Is anyone else interested?

     
    • Andy Sapuntzakis

      I only have a need for decode, in which case a COBS "Display As" option would work. Instead of the input stream, each decoded packet would get its own line. Data that failed parsing could be displayed raw, in italics, or a different color.

      BTW, wouldn't "Display As" be more flexible as a drop-down list? Looking at 3.0.0.30, it seems like there's an option below "Hex CSV" that's getting chopped.

      For encode, I would suggest something similar for the "Send" tab. Replace the Numbers/Hex/ASCII buttons with a dropdown and TX/Send button, add COBS as a new option in the dropdown. The user would type the unencoded packet.

       

      Last edit: Andy Sapuntzakis 2016-11-22
  • Andy Sapuntzakis

    Here's some documentation I wrote up on COBS back when I found the wikipedia page to be confusing. I've included some information on optional enhancements.

    Sources
    http://conferences.sigcomm.org/sigcomm/1997/papers/p062.pdf
    https://en.wikipedia.org/wiki/Consistent_Overhead_Byte_Stuffing

    Note that the current [Dec 2015] version of the wikipedia article provides a confusing explanation. For now, the Cheshire / Baker article is probably clearer.

    Problem
    8/10-bit serial data protocols sending 8-bit binary data create synchronization problems (P6 serial, inter-board communications, etc.) Common solutions include BREAK / IDLE detection or framing (header / checksum) patterns. BREAK and IDLE detection require low-level UART access - high-level (Windows) APIs focus on minimizing data processing overhead, so control & timing "details" are often unavailable. Framing sequence implementations often need to backtrack and resynchronize, since they are subject to false positives caused by the framing pattern appearing in the data.

    Solution
    COBS simplifies framing by encoding data so that the framing (packet delimiter) 00-byte value is unused. It does not address issues of data integrity, which could be achieved by adding a checksum prior to encoding.

    The basis of COBS encoding is that any sequence of non-zero data, followed by a 00-byte, is modified by prefixing a length byte whose value is 1 more than the number of non-zero bytes, and dropping the trailing 00-byte.

    The encoding of several such (hex byte) sequences is shown below:

    11 22 33 00 ... => 04 11 22 33 ...
    00 11 22 33 00 44 00 ... => 01 04 11 22 33 02 44 ...
    11 22 33 00 00 ... => 04 11 22 33 01 ...

    Of course, not all packets end with a 00-byte - especially after a checksum is appended - so COBS unconditionally adds a 00-byte prior to encoding, and unconditionally removes it when decoding. This added byte gives the protocol its name. The encoded example packets below include both the encoded "overhead" 00-byte and the delimiter (framing) 00-byte:

    00 => 01 01 00
    11 22 00 33 => 03 11 22 02 33 00

    Long payloads may contain runs of non-zero data which exceed the capacity of the length byte. To address this, an FF length indicates 254 non-zero bytes without a trailing 00-byte.
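
The rules described so far (length byte, appended "overhead" 00-byte, FF block for 254 non-zero bytes, trailing delimiter) can be sketched as a small C encoder. This is only an illustrative sketch: the function name `cobs_encode` and its signature are assumptions chosen to mirror the decoder in this post, and the caller is assumed to provide an output buffer of at least length + length/254 + 2 bytes.

```c
#include <stddef.h>

/* Hypothetical helper (not taken from any library): COBS-encode `length`
 * bytes of `input` into `output` and append the 00 frame delimiter.
 * Assumes `output` has room for length + length/254 + 2 bytes.
 * Returns the number of bytes written, delimiter included. */
size_t cobs_encode(const unsigned char *input, size_t length, unsigned char *output)
{
    size_t write_index = 1;  /* next free output slot */
    size_t code_index = 0;   /* slot reserved for the current length byte */
    unsigned char code = 1;  /* 1 + number of non-zero bytes in this block */

    for (size_t i = 0; i < length; i++) {
        if (input[i] != 0) {
            output[write_index++] = input[i];
            code++;
        }
        /* Close the block on a 00-byte, or when 254 non-zero bytes are held. */
        if (input[i] == 0 || code == 0xFF) {
            output[code_index] = code;
            code_index = write_index++;
            code = 1;
        }
    }

    /* Final block: equivalent to encoding the appended "overhead" 00-byte. */
    output[code_index] = code;

    /* Frame delimiter. */
    output[write_index++] = 0;
    return write_index;
}
```

With this sketch, the payload 11 22 00 33 encodes to 03 11 22 02 33 00, matching the example above. A payload of exactly 254 non-zero bytes produces an FF block followed by a 01 length byte, which is one common convention; some encoders omit that final 01.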

    Implementation
    COBS is easily implemented in software, especially when compared to bit-stuffing algorithms like HDLC, which are more suited to hardware implementations.

    Although a COBS decoder must be well-behaved when fed corrupt data (e.g. length bytes whose value exceeds the number of remaining data bytes), it can probably assume that any 00-bytes will have already been interpreted as delimiters by the receiver.

    #include <stddef.h>

    // decodes one packet (without its 00 delimiter); returns decoded length, or 0 on bad input
    size_t cobs_decode(const unsigned char * input, size_t length, unsigned char * output)
    {
        size_t read_index = 0;
        size_t write_index = 0;
        unsigned char code;

        while (read_index < length)
        {
            // length of non-zero + trailing 0
            code = input[read_index];

            // make sure there's room (don't error on final 0 pad)
            if (read_index + code > length && code != 1)
                return 0;

            read_index++;

            // copy non-zero data
            for (unsigned i = 1; i < code; i++)
                output[write_index++] = input[read_index++];

            // add trailing 0 (except final 0 pad or non-zero seq > max code)
            if (code != 0xFF && read_index < length)
                output[write_index++] = 0;
        }

        return write_index;
    }

    Optional Features
    If a non-zero delimiter byte is desired, each COBS-encoded output byte can be XORed with the desired delimiter byte value prior to transmission. The receiver would repeat the (symmetrical) XOR operation before passing the data to the COBS decoder.
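
A minimal sketch of that XOR step, assuming the frame is held in a byte buffer that still includes its trailing 00 delimiter. The function name `cobs_xor_delimiter` is made up for illustration. Because encoded bytes are never 00, no byte other than the delimiter can end up equal to the chosen delimiter value.

```c
#include <stddef.h>

/* Hypothetical helper: XOR every byte of an encoded frame, in place, with
 * the desired delimiter value. The 00 delimiter becomes `delimiter`, and
 * calling the function a second time restores the original bytes. */
void cobs_xor_delimiter(unsigned char *frame, size_t length, unsigned char delimiter)
{
    for (size_t i = 0; i < length; i++)
        frame[i] ^= delimiter;
}
```

For example, with a delimiter of 7E the frame 03 11 22 02 33 00 becomes 7D 6F 5C 7C 4D 7E. The receiver would split the stream on 7E, apply the same XOR, and then decode.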

    A further optimization of the COBS algorithm is zero-pair elimination. The modified algorithm works as described above, except that an E0 length indicates the (max) 223 non-zero bytes without a trailing 00-byte, and E1 thru FF lengths indicate a sequence of 0 - 30 non-zero bytes followed by two 00-bytes.

    Another optimization, called COBS/R (Reduced), replaces the last length byte with the last data byte, as long as the value of the data byte is larger than that of the length byte. The decoder is able to detect this situation since the value of the expected length byte exceeds the number of remaining (non-zero) data bytes.

     
  • Simon Bridger

    Simon Bridger - 2016-11-22

    I assume you would want this to be a DisplayAs choice which displays Hex bytes as the format?
    Or do you see it as a Binary Sync option that is separate from the DisplayAs choice?

    Would it need to have the optional features to be useful?

    BTW, adding the encoder/s as a CRC is probably easy. Post the encoder sample, and I may add it when next CRC work is done

     

    Last edit: Simon Bridger 2016-11-22
    • Andy Sapuntzakis

      I imagined it as a Display As option (see my shorter post in reply to your 1st question above), displaying packets as hex bytes with a newline between packets, and some distinctive font or non-hex delimiter for data that couldn't be parsed.

      Would be nice to be able to enable translation for file capture as well.

      I guess Binary Sync is another possibility. One of the reasons for switching to COBS is because those kinds of sync patterns seemed to work poorly for 8-bit data.

      I don't think the optional features are needed, though the XOR might be useful / easy?

      Thanks for considering this!

       
  • Simon Bridger

    Simon Bridger - 2016-11-24

    on the send side it will be a CRC option, that fits the structure.

    Can you provide encoder code?

     
  • Simon Bridger

    Simon Bridger - 2017-06-08
    • status: open --> awaiting-feedback
     
    • Andy Sapuntzakis

      The wikipedia page above provides encoder and decoder example C code.

       
  • Andy Sapuntzakis

    On the receive side, RealTerm currently chooses between raw (hex) data or data interpreted as ASCII. COBS would add another option, where the hex stream would be shown as 0-delimited packets, 1 per line.
    On the transmit side, there could be COBS encoding of individual packets specified in the Send tab, as well as from a file where each packet is delimited by a CR/LF newline.
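
The receive-side display described in this thread could look roughly like the C sketch below. It is not RealTerm code. The names `cobs_decode_frame`, `cobs_format_stream` and `COBS_ERROR` are hypothetical, the '!' prefix stands in for the "distinctive font" suggested for unparsable data, and frames longer than 256 bytes are simply shown raw to keep the buffer handling short.

```c
#include <stddef.h>

#define COBS_ERROR ((size_t)-1)   /* hypothetical "could not decode" marker */

/* Decode one frame (without its 00 delimiter). Unlike the decoder posted
 * earlier, errors are reported as COBS_ERROR so that an empty packet is
 * not mistaken for a failure. */
static size_t cobs_decode_frame(const unsigned char *in, size_t len, unsigned char *out)
{
    size_t r = 0, w = 0;
    while (r < len) {
        unsigned char code = in[r];
        if (r + code > len)            /* length byte runs past the frame */
            return COBS_ERROR;
        r++;
        for (unsigned i = 1; i < code; i++)
            out[w++] = in[r++];
        if (code != 0xFF && r < len)   /* restore the 00 that was dropped */
            out[w++] = 0;
    }
    return w;
}

/* Append one character, always keeping the text NUL-terminated. */
static void put_char(char *text, size_t size, size_t *used, char c)
{
    if (*used + 1 < size) {
        text[(*used)++] = c;
        text[*used] = '\0';
    }
}

/* Write one line of hex per 00-delimited frame into `text`. Frames that
 * fail to decode are shown raw with a leading '!'. Empty frames are skipped,
 * and bytes after the last delimiter are ignored (an incomplete frame).
 * Returns the number of lines produced. */
size_t cobs_format_stream(const unsigned char *stream, size_t length,
                          char *text, size_t size)
{
    static const char hex[] = "0123456789ABCDEF";
    unsigned char packet[256];
    size_t frames = 0, start = 0, used = 0;

    if (size > 0)
        text[0] = '\0';
    for (size_t i = 0; i < length; i++) {
        if (stream[i] != 0)
            continue;                          /* still inside a frame */
        const unsigned char *bytes = stream + start;
        size_t n = i - start;
        start = i + 1;
        if (n == 0)
            continue;                          /* back-to-back delimiters */
        size_t decoded = (n <= sizeof packet)
                       ? cobs_decode_frame(bytes, n, packet)
                       : COBS_ERROR;
        int raw = (decoded == COBS_ERROR);
        if (raw) {
            put_char(text, size, &used, '!');
        } else {
            bytes = packet;
            n = decoded;
        }
        for (size_t k = 0; k < n; k++) {
            if (k > 0 || raw)
                put_char(text, size, &used, ' ');
            put_char(text, size, &used, hex[bytes[k] >> 4]);
            put_char(text, size, &used, hex[bytes[k] & 0x0F]);
        }
        put_char(text, size, &used, '\n');
        frames++;
    }
    return frames;
}
```

Fed the stream 03 11 22 02 33 00 05 11 00 01 01 00, this sketch produces three lines: "11 22 00 33", "! 05 11" (the length byte 05 exceeds the frame), and "00".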

     
