Response Tokenisation
Tokenisation is essential. When working with a stream of data, as provided by TCP and UDP, you almost certainly want to:
1. Wait for a complete response before processing it
2. Process one response at a time
Whilst this might seem to occur naturally most of the time, network contention, network errors and high data rates will trip you up. CBus is one system where I've often seen back-to-back messages returned in a single IO read.
Engine ships with two tokenisers to help you break up the incoming data: the default buffered tokeniser and the more advanced abstract tokeniser.

Default Tokeniser

Usage:
```ruby
class Example::Driver
    # Device driver helper
    tokenize delimiter: "\x0D"
end
```
Options:

| Option | Description |
| --- | --- |
| delimiter | sequence that marks the end of a message; supports strings and regular expressions |
| indicator | sequence that marks the start of a message |
| msg_length | can be used with an indicator if messages are always a fixed length |
| size_limit | prevents buffering from consuming all available memory if the end of a message is never detected |
| min_length | can help prevent false positives |
| encoding | defaults to ASCII-8BIT to avoid invalid characters when dealing with binary data |
Example:

```ruby
tokenize indicator: "\x02", delimiter: "\x03"
```

Given the data `"yu\x03\x02hello\x03\x02world\x03\x02how"`, the result would be:
- `yu\x03` would be discarded
- `hello` would be returned
- `world` would be returned
- `\x02how` would be buffered
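The behaviour above can be sketched in plain Ruby. This is a minimal illustration, not Engine's implementation; `tokenise` is a hypothetical helper that returns the complete messages plus any leftover bytes:

```ruby
# Minimal sketch of indicator/delimiter tokenisation (illustration only,
# not Engine's implementation). Returns complete messages and the
# remaining buffered bytes.
def tokenise(buffer, indicator:, delimiter:)
  messages = []
  loop do
    start = buffer.index(indicator)
    break if start.nil?
    buffer = buffer[start..-1] if start > 0   # discard bytes before the indicator
    stop = buffer.index(delimiter, indicator.bytesize)
    break if stop.nil?                        # incomplete message stays buffered
    messages << buffer[indicator.bytesize...stop]
    buffer = buffer[(stop + delimiter.bytesize)..-1]
  end
  [messages, buffer]
end

msgs, rest = tokenise("yu\x03\x02hello\x03\x02world\x03\x02how".b,
                      indicator: "\x02", delimiter: "\x03")
# msgs => ["hello", "world"], rest => "\x02how"
```

Note the use of `.b` to force ASCII-8BIT, matching the tokeniser's default encoding when handling binary data.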

Abstract Tokeniser

The primary use case for this tokeniser is variable-length messages, where the length can be determined from the message contents (commonly a length field in the header).
Usage:

```ruby
class Samsung::Displays::MdSeries
    tokenize indicator: "\xAA", callback: :check_length

    # Called by the Abstract Tokenizer
    def check_length(byte_str)
        # Check for minimum length
        return false if byte_str.bytesize <= 3
        response = str_to_array(byte_str)

        # data length byte + (header + checksum) == message length
        len = response[2] + 4

        if response.length >= len
            # return the length of this message (any excess will be buffered)
            return len
        else
            # false if the complete message hasn't arrived yet
            return false
        end
    end
end
```
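The callback can be exercised on its own. The sketch below mirrors the length check using plain `String#bytes` in place of Engine's `str_to_array` helper, fed with a hypothetical frame whose third byte declares the payload length:

```ruby
# Standalone version of the length check; str_to_array is an Engine
# helper, String#bytes is the plain-Ruby equivalent.
def check_length(byte_str)
  return false if byte_str.bytesize <= 3
  response = byte_str.bytes

  # data length byte + (header + checksum) == message length
  len = response[2] + 4
  response.length >= len ? len : false
end

# Hypothetical frame: byte 2 declares a 2-byte payload, so the
# complete message is 2 + 4 == 6 bytes long
frame = [0x11, 0x00, 0x02, 0x01, 0x02, 0x99].pack('C*')
check_length(frame)        # => 6 (complete message, excess would be buffered)
check_length(frame[0, 4])  # => false (still waiting on bytes)
```

Returning an integer tells the tokeniser how many bytes make up the current message; returning false asks it to keep buffering.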
Options:

| Option | Description |
| --- | --- |
| callback | callable code (proc, lambda, method, etc.) that returns either an integer or false |
| indicator | sequence that marks the start of a message (string or regex) |
| size_limit | prevents buffering from consuming all available memory if the end of a message is never detected |
| encoding | defaults to ASCII-8BIT to avoid invalid characters when dealing with binary data |

Further Reading

For a detailed overview of what these tokenisers are capable of, it is worth looking at their tests.