Strings on Soletta flows
To make it easy to deal with strings in flow-based programs, Soletta provides a family of node types (components) covering the most frequent string operations:
- comparisons
- conversions from / to other data types
- manipulation (concatenation, replacement, upper / lower case)
- regular expressions
By default, the Soletta™ Framework expects all strings in a flow to be UTF-8-encoded. To support that, the default build relies on (and links against) the ICU libraries. It is possible, though, to build Soletta without ICU. In that case, all nodes doing string operations will still work, provided that the strings in the flows are all ASCII-encoded. Soletta's string capabilities also include regular expressions, which rely on the PCRE library. If that dependency is not present, Soletta will still build, but the regular expression nodes will be non-functional: any input to them will only generate error packets as output.
Soletta has plenty of utility nodes dealing with strings in your flows, including:
- converter/string-to-blob
- converter/string-to-boolean
- converter/string-to-byte
- converter/string-to-empty
- converter/string-to-float
- converter/string-to-int
- converter/string-to-timestamp
- converter/boolean-to-string
- converter/byte-to-string
- converter/empty-to-string
- converter/float-to-string
- converter/int-to-string
- converter/timestamp-to-string
- converter/error
The most basic string node type is the constant. Use it to generate (packets of) string constants in your flow. Many Soletta nodes have string input ports that accept them, for instance:
file(constant/string:value="/some/path") OUT -> PATH reader(file/reader)
reader OUT -> IN _(console)
When comparing some string output with an expected value, one would also make use of string constants:
... OUT -> IN[0] cmp(string/compare)
value(constant/string:value="expected_value")
value OUT -> IN[1] cmp OUT -> IN _(console)
Beyond that basic usage, there are straightforward helper nodes that concatenate strings, check whether a string is empty, calculate its length, and change the case of a string packet's contents.
Let's take a look at the nodes that check if strings start or end with others:
value(constant/string:value="some_value")
value OUT -> IN _(string/starts-with:prefix="some") OUT -> IN _(console)
would output a true boolean packet. One could also shift the comparison starting point further from the string's start, as in:
value(constant/string:value="some_value")
value OUT -> IN starts_with(string/starts-with:prefix="ome_",start=1)
starts_with OUT -> IN _(console)
also getting a true packet in the output. The string/ends-with node works analogously.
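The prefix and suffix checks above can be modelled in a few lines of Python. This is only a sketch of the described behavior. It assumes that start simply moves the comparison point forward. The function names are hypothetical, and this is not Soletta's implementation.

```python
def starts_with(value: str, prefix: str, start: int = 0) -> bool:
    """Sketch of string/starts-with: does value, read from index start,
    begin with prefix?"""
    return value.startswith(prefix, start)


def ends_with(value: str, suffix: str) -> bool:
    """Sketch of string/ends-with: does value end with suffix?"""
    return value.endswith(suffix)


print(starts_with("some_value", "some"))     # True
print(starts_with("some_value", "ome_", 1))  # True
print(ends_with("some_value", "value"))      # True
```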
If one has to edit the contents of a string packet, the string/replace node can do the job:
value(constant/string:value="some_value")
value OUT -> IN replace(string/replace:from_string="some",to_string="other")
replace OUT -> IN _(console)
would output a string packet with other_value as content. By default, all occurrences of from_string will be replaced by to_string, but that can be limited with the extra option max_replace.
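Here is a minimal Python model of string/replace. It assumes that max_replace caps the number of replacements, counted from the start of the string. The function and its None default are illustrative, not Soletta code.

```python
def replace(value: str, from_string: str, to_string: str, max_replace=None) -> str:
    """Sketch of string/replace: replace all occurrences of from_string,
    or only the first max_replace ones when a limit is given."""
    count = -1 if max_replace is None else max_replace  # -1 means "no limit"
    return value.replace(from_string, to_string, count)


print(replace("some_value", "some", "other"))     # other_value
print(replace("a-a-a", "a", "b", max_replace=2))  # b-b-a
```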
Another common string operation is to retrieve a slice of it. That is what the string/slice node is made for:
value(constant/string:value="some_value")
value OUT -> IN _(string/slice:start=3,end=6) OUT -> IN _(console)
would output a string packet with e_v as content. The start and end indices work just like in Python and JavaScript: you may pass negative values to them, in order to index from the end of the string.
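Because the indices behave like Python's, a Python slice is a convenient way to predict the node's output. This is a sketch only. Treating an omitted end as "up to the end of the string" is an assumption, and the function name is hypothetical.

```python
def slice_string(value: str, start: int, end=None) -> str:
    """Sketch of string/slice using Python slicing; negative indices
    count from the end of the string."""
    return value[start:end]


print(slice_string("some_value", 3, 6))   # e_v
print(slice_string("some_value", -5))     # value
print(slice_string("some_value", 0, -6))  # some
```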
Now imagine that you want to split your string into others, e.g. to get all the words in a sentence, excluding the spaces between them. That's what the string/split node is for:
_(constant/string:value="some value") OUT -> IN split(string/split)
split OUT -> IN _(console)
would output a string packet with some as content. That is because the node's default separator is a single space, and the default output is the first substring produced by the split. You can tune this behavior with the following options:
value(constant/string:value="some+value")
value OUT -> IN _(string/split:separator="+",index=1) OUT -> IN _(console)
now outputting a string packet with value as content.
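The split-and-pick behavior can be sketched like this. The sketch assumes the node splits on every occurrence of separator and then selects the substring at index. Raising IndexError stands in for whatever error the real node reports, and the function name is hypothetical.

```python
def split(value: str, separator: str = " ", index: int = 0) -> str:
    """Sketch of string/split: split on separator, return substring index.
    An out-of-range index raises IndexError."""
    return value.split(separator)[index]


print(split("some value"))          # some
print(split("some+value", "+", 1))  # value
```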
If string/starts-with, string/ends-with and string/replace are not enough for one's search and replace needs on strings, then regular expression nodes can be used. The search one is straightforward:
value(constant/string:value="some value")
search(string/regexp-search:regexp="(.*)\\s(.*)",index=0)
value OUT -> IN search OUT -> IN _(console)
would output a string packet with some value as content. If the index option were changed to 1, one would get some as content instead. That illustrates the rule: index 0 always identifies the portion of the input string matched by the entire pattern, index 1 is the first capturing sub-pattern, and so on.
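To check which substring a given index selects, the same pattern can be tried with Python's re module. Python's engine is not PCRE, but it agrees with PCRE on a simple pattern like this one. The pattern is written as a raw string, with the single level of escaping the regex engine sees after the FBP parser has done its own unescaping. The function name is hypothetical.

```python
import re


def regexp_search(value: str, regexp: str, index: int = 0) -> str:
    """Sketch of string/regexp-search: index 0 is the whole match,
    index N is the N-th capturing group."""
    match = re.search(regexp, value)
    if match is None:
        raise ValueError("pattern did not match")
    return match.group(index)


pattern = r"(.*)\s(.*)"
for i in range(3):
    print(i, regexp_search("some value", pattern, i))
# 0 some value
# 1 some
# 2 value
```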
The string/regexp-replace node adds replacing capabilities, by means of regular expression back references:
value(constant/string:value="some value")
replace(string/regexp-replace:regexp="(.*)\\s(.*)",to="\\2")
value OUT -> IN replace OUT -> IN _(console)
would output a string packet with value as content, i.e., the contents of the second capturing group in the regular expression.
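The following is an equivalent replacement in Python, again using re as a stand-in for PCRE. The back reference \2 in the replacement refers to the second capturing group. The name regexp_replace is hypothetical.

```python
import re


def regexp_replace(value: str, regexp: str, to: str) -> str:
    """Sketch of string/regexp-replace: substitute matches of regexp with to,
    where to may contain back references such as \\2."""
    return re.sub(regexp, to, value)


print(regexp_replace("some value", r"(.*)\s(.*)", r"\2"))  # value
```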
Note that all regular expressions in Soletta nodes must follow Perl 5's syntax, and that any Unicode character can be matched too. The following excerpt would work as well:
value(constant/string:value="★ ☆")
replace(string/regexp-replace:regexp="(.*)\\s(.*)",to="\\2")
value OUT -> IN replace OUT -> IN _(console)
would output a string packet with ☆ as content. Note also that regular expression escape sequences must be escaped twice (e.g. \\s rather than \s), to get past the character unescaping done by Soletta's FBP parser.
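A toy model of one level of unescaping shows why the backslashes have to be doubled. The rule assumed here is a hypothetical simplification: a backslash followed by any character yields just that character. It is not a description of Soletta's actual FBP parser. It only shows that one level of backslashes is consumed before the pattern reaches the regex engine.

```python
import re


def fbp_unescape(text: str) -> str:
    """Toy model of one level of unescaping: a backslash followed by any
    character yields just that character (hypothetical simplification)."""
    result = []
    chars = iter(text)
    for ch in chars:
        if ch == "\\":
            result.append(next(chars, ""))  # keep only the escaped character
        else:
            result.append(ch)
    return "".join(result)


as_written = r"(.*)\\s(.*)"                # what appears in the .fbp file
as_received = fbp_unescape(as_written)
print(as_received)                         # (.*)\s(.*)
print(re.sub(as_received, r"\2", "★ ☆"))   # ☆
print(fbp_unescape(r"(.*)\s(.*)"))         # (.*)s(.*), the \s has been lost
```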
Besides constants, there is a string node that generates 16-byte (128-bit) version 4 UUIDs: string/uuid. It may be useful in any scenario where a unique ID is required.
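For reference, this is what a version 4 UUID looks like when generated with Python's standard uuid module. The canonical hyphenated text form used here is Python's. This page does not say whether string/uuid formats its output the same way.

```python
import uuid


def new_uuid() -> str:
    """Random (version 4) UUID in canonical 8-4-4-4-12 hex form."""
    return str(uuid.uuid4())


value = new_uuid()
parsed = uuid.UUID(value)
print(value)              # different on every run
print(len(parsed.bytes))  # 16
print(parsed.version)     # 4
```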
The next important operation one needs to do on strings is to convert them to other Soletta primitive (packet) types and vice versa.
Let's start with the string → other types direction.
Converting a string packet to a blob one will generate a (memory) blob with the exact bytes of the string (nul terminating byte included). Converting a string to a boolean value will work if the input is true or false, in any letter case. Other values will lead to errors in the flow. Converting a string to an empty packet will accept any string as input, as one might expect.
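The blob and boolean conversions described above can be modelled as follows. The sketch assumes the input is UTF-8 text and that anything other than true/false (case-insensitively) is rejected. The ValueError stands in for an error packet, and the function names are hypothetical.

```python
def string_to_blob(value: str) -> bytes:
    """Sketch of converter/string-to-blob: the string's bytes plus a NUL byte."""
    return value.encode("utf-8") + b"\0"


def string_to_boolean(value: str) -> bool:
    """Sketch of converter/string-to-boolean: "true"/"false" in any case."""
    lowered = value.lower()
    if lowered == "true":
        return True
    if lowered == "false":
        return False
    raise ValueError(f"not a boolean: {value!r}")  # an error packet in a flow


print(string_to_blob("abc"))       # b'abc\x00'
print(string_to_boolean("TRUE"))   # True
print(string_to_boolean("False"))  # False
```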
Now, converting a string packet to a floating point or integer one will accept the various notations for these numbers, e.g.:
str_to_float(converter/string-to-float)
float_str1(constant/string:value="3.1415")
float_str2(constant/string:value="2.24e10")
float_str1 OUT -> IN str_to_float
float_str2 OUT -> IN str_to_float
str_to_float OUT -> IN _(console)
would output float packets with 3.1415 and 22400000000.000000 as values, respectively. Special float constants understood by strtod() may be used as well, like NaN or INFINITY.
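Python's float() accepts much the same notations as strtod(), so it can be used to preview these conversions. It is an approximation rather than a model of Soletta's code. For instance, it rejects the hexadecimal floats that strtod() accepts.

```python
import math


def string_to_float(value: str) -> float:
    """Approximation of converter/string-to-float via Python's float()."""
    return float(value)


for text in ("3.1415", "2.24e10", "NaN", "INFINITY"):
    print(f"{string_to_float(text):f}")
# 3.141500
# 22400000000.000000
# nan
# inf
print(math.isnan(string_to_float("nan")))  # True
```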
When converting strings to bytes, the input string must have the two nibble characters (hexadecimal digits) in sequence, optionally preceded by 0x:
const_byte_str(constant/string:value=0x0e)
const_byte_str OUT -> IN str_to_byte(converter/string-to-byte)
str_to_byte OUT -> IN _(console)
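The following sketch shows the accepted byte syntax. It assumes exactly two hexadecimal digits are required and that only a lowercase 0x prefix is recognized, since the text only mentions 0x. The function name is hypothetical.

```python
import string


def string_to_byte(value: str) -> int:
    """Sketch of converter/string-to-byte: two hex digits, optional "0x"."""
    digits = value[2:] if value.startswith("0x") else value
    if len(digits) != 2 or not all(c in string.hexdigits for c in digits):
        raise ValueError(f"not a byte: {value!r}")  # an error packet in a flow
    return int(digits, 16)


print(string_to_byte("0x0e"))  # 14
print(string_to_byte("3e"))    # 62
```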
Finally, converting a string to a timestamp packet involves passing a timestamp format to the converter node:
time_str(constant/string:value="09/23/16 09:30:00")
converter(converter/string-to-timestamp:format="%D %T")
time_str OUT -> IN converter OUT -> IN _(console)
would output a timestamp packet with that date as value (time zone shifts may apply), because the input string matches the given timestamp format.
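The same input can be checked against the format in Python. POSIX strptime() defines %D as %m/%d/%y and %T as %H:%M:%S. The sketch spells those out because Python's strptime is not guaranteed to accept the %D and %T shorthands. Time zones are ignored here.

```python
from datetime import datetime

# %D == %m/%d/%y and %T == %H:%M:%S, spelled out for Python's strptime.
FORMAT = "%m/%d/%y %H:%M:%S"


def string_to_timestamp(value: str, fmt: str = FORMAT) -> datetime:
    """Parse value with fmt; raises ValueError if it does not match."""
    return datetime.strptime(value, fmt)


print(string_to_timestamp("09/23/16 09:30:00").isoformat())  # 2016-09-23T09:30:00
```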
We now jump to the other types → string direction.
The simplest value to convert to a string is an empty packet:
empty(constant/empty)
empty_to_str(converter/empty-to-string:output_value=Empty)
empty OUT -> IN empty_to_str OUT -> IN _(console)
would output a string packet with Empty as value.
If the source is a boolean packet, it goes like this:
true(constant/boolean:value=true)
true_to_str(converter/boolean-to-string:true_value="not false")
true OUT -> IN true_to_str OUT -> IN _(console)
would output a string packet with not false as value. If the true_value option were not used, one would get True instead (the default value). The same applies, analogously, to false values.
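Here is a sketch of the boolean conversion. The "True" default comes from the text above. The false_value name and its "False" default are assumed by analogy.

```python
def boolean_to_string(value: bool, true_value: str = "True",
                      false_value: str = "False") -> str:
    """Sketch of converter/boolean-to-string (false_value is assumed)."""
    return true_value if value else false_value


print(boolean_to_string(True, true_value="not false"))  # not false
print(boolean_to_string(True))                          # True
```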
The next conversion in complexity is byte to string, which is straightforward: one gets a string with the two nibble characters, prefixed with 0x, as in 0x3e.
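The byte-to-string direction can be sketched in Python. Based on the 0x3e example, the sketch assumes that hex digits are lowercase and always two wide. The function name is hypothetical.

```python
def byte_to_string(value: int) -> str:
    """Sketch of converter/byte-to-string: "0x" plus two lowercase hex digits."""
    if not 0 <= value <= 0xFF:
        raise ValueError(f"not a byte: {value}")
    return f"0x{value:02x}"


print(byte_to_string(0x3E))  # 0x3e
print(byte_to_string(14))    # 0x0e
```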
Soletta has special nodes doing integer/floating-point to string conversions. They accept a format string with a syntax very similar to Python's str.format(), except that there is no format recursion, no ! conversions, and the field names must be one of the integer/floating-point Soletta basic type sub-fields (val, min, max, step). Let's illustrate with examples:
value(constant/float:value=DBL_MAX)
converter(converter/float-to-string:format_spec="{:e}")
value OUT -> IN converter OUT -> IN _(console)
would output a string packet with 1.797693e+308 as value. That shows that, if no field name is used, the field addressed by default is val. The others may be addressed like so:
value(constant/float:value=3.1415)
converter(converter/float-to-string:format_spec="{val:07n} ★ {min:g} ★ {max:e} ★ {step:%}")
value OUT -> IN converter OUT -> IN _(console)
would output a string packet with 03.1415 ★ -1.79769e+308 ★ 1.797693e+308 ★ 0.000000% as value.
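The syntax follows Python's format specification mini-language, so Python's str.format() reproduces both outputs above. The default values for min, max and step used here (-DBL_MAX, DBL_MAX and DBL_MIN) are assumptions chosen to match the sample output, not documented defaults. Positional "{}" fields are mapped to val.

```python
import sys

DBL_MAX = sys.float_info.max
DBL_MIN = sys.float_info.min  # smallest positive normal double


def float_to_string(format_spec: str, val: float, min_: float = -DBL_MAX,
                    max_: float = DBL_MAX, step: float = DBL_MIN) -> str:
    """Render format_spec with str.format, exposing the four sub-fields by
    name; an unnamed field such as "{:e}" picks up val positionally."""
    return format_spec.format(val, val=val, min=min_, max=max_, step=step)


print(float_to_string("{:e}", DBL_MAX))
# 1.797693e+308
print(float_to_string("{val:07n} ★ {min:g} ★ {max:e} ★ {step:%}", 3.1415))
# 03.1415 ★ -1.79769e+308 ★ 1.797693e+308 ★ 0.000000%
```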
Converting a timestamp packet to a string goes as follows:
time(timestamp/time)
converter(converter/timestamp-to-string:format="%Y-%m-%dT%H:%M:%S")
_(constant/empty) OUT -> TRIGGER time
time OUT -> IN converter OUT -> IN _(console)
that outputs a string packet with something like 2015-09-25T13:39:34 as value. Note that the user must always select the desired format.
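The format string uses the usual strftime() directives, so Python's datetime.strftime() can be used to preview the result. A fixed datetime is used so the first output is predictable. The function name is hypothetical.

```python
from datetime import datetime

FORMAT = "%Y-%m-%dT%H:%M:%S"


def timestamp_to_string(moment: datetime, fmt: str = FORMAT) -> str:
    """Format a timestamp with strftime-style directives."""
    return moment.strftime(fmt)


print(timestamp_to_string(datetime(2015, 9, 25, 13, 39, 34)))  # 2015-09-25T13:39:34
print(timestamp_to_string(datetime.now()))                      # current local time
```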
The last string converter is used to extract the error code and the string message from an error packet:
const_int(constant/int:value=10) OUT -> DIVIDEND div(int/division)
const_zero(constant/int:value=0) OUT -> DIVISOR div
div ERROR -> IN conv_error(converter/error)
conv_error MESSAGE -> IN _(console)
would output a string packet with something like Numerical argument out of domain as contents.
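The message above is the C library's text for the EDOM errno code on glibc. The hypothetical sketch below mimics a division node that reports division by zero as such an error, carrying both the code and the message. The tuple shape is purely illustrative, and the message text varies between platforms.

```python
import errno
import os


def int_division(dividend: int, divisor: int):
    """Hypothetical model: ("ok", quotient) or ("error", code, message)."""
    if divisor == 0:
        return ("error", errno.EDOM, os.strerror(errno.EDOM))
    quotient = abs(dividend) // abs(divisor)
    # Truncate toward zero, as C integer division does.
    return ("ok", quotient if (dividend >= 0) == (divisor > 0) else -quotient)


kind, code, message = int_division(10, 0)
print(code == errno.EDOM, message)  # True, then the platform's EDOM text
```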
If there's a scenario where lots of string packets are produced, but the user wants to forward only those that do not repeat in sequence, there's a filter node for that: filter-repeated/string. Just pass the string packet stream through it and the output will contain only values that are not consecutive repeats.
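The filtering rule drops a packet only when it equals the one right before it. This is what itertools.groupby provides in Python. The sketch below is illustrative, not Soletta code.

```python
from itertools import groupby


def filter_repeated(stream):
    """Sketch of filter-repeated/string: yield a value only if it differs
    from the value immediately before it. Works lazily on any iterable."""
    return (value for value, _group in groupby(stream))


print(list(filter_repeated(["a", "a", "b", "b", "b", "a", "c", "c"])))
# ['a', 'b', 'a', 'c']
```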
There may also be a need, at some point in a flow, to choose which of the possible flow paths a given string packet is forwarded to (in other words, to select the packet routing). There's a node for that: switcher/string. The following example changes, once per second, the switcher output port that forwards the input packet. The selected input port is not changed (we stick to 0), but that already gives an idea of the node usage.
switcher(switcher/string)
accumulator(int/accumulator:setup_value=min:0|max:3)
accumulator OUT -> OUT_PORT switcher
accumulator OUT -> IN _(converter/int-to-string) OUT -> IN[0] switcher
accumulator OUT -> IN _(converter/int-to-string) OUT -> IN[1] switcher
accumulator OUT -> IN _(converter/int-to-string) OUT -> IN[2] switcher
accumulator OUT -> IN _(converter/int-to-string) OUT -> IN[3] switcher
_(timer:interval=1000) OUT -> INC accumulator
switcher OUT[0] -> IN _(console:prefix="out[0]: ")
switcher OUT[1] -> IN _(console:prefix="out[1]: ")
switcher OUT[2] -> IN _(console:prefix="out[2]: ")
switcher OUT[3] -> IN _(console:prefix="out[3]: ")
That would output something like:
out[0]: 0 (string)
out[1]: 1 (string)
out[2]: 2 (string)
out[3]: 3 (string)
out[0]: 0 (string)
out[1]: 1 (string)
...
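The routing above can be reproduced with a small simulation. It assumes that the accumulator emits its initial value 0 at startup and wraps back to 0 after 3. It also assumes that each value selects the output port before the matching input packet is forwarded. Only IN[0] is forwarded, because the input port stays at 0. The function is hypothetical.

```python
def simulate_switcher(ticks: int, in_port: int = 0, max_value: int = 3):
    """Hypothetical model of the switcher example above."""
    lines = []
    acc = 0  # accumulator value, assumed to start at 0
    for _ in range(ticks):
        inputs = {i: str(acc) for i in range(max_value + 1)}  # IN[0]..IN[3]
        out_port = acc                                        # OUT_PORT selection
        lines.append(f"out[{out_port}]: {inputs[in_port]} (string)")
        acc = 0 if acc == max_value else acc + 1              # wrap-around
    return lines


print("\n".join(simulate_switcher(6)))
```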
Say, now, that one has security restrictions on a given part of a Soletta flow and wants to isolate that part from others in a controlled way. This is what the unix-socket/string-writer and unix-socket/string-reader nodes are there for, at least for string packet exchange. Split the string-passing part of the flow of interest with these two nodes, control access to both sides with any file permission or security tool of your choice, and you're done:
timer(timer:interval=1000) OUT -> INC acc(int/accumulator)
acc OUT -> IN converter(converter/int-to-string)
socket_string_writer(unix-socket/string-writer:path=/tmp/string_socket)
converter OUT -> IN socket_string_writer
on one side, for example, and
socket_string_reader(unix-socket/string-reader:path=/tmp/string_socket)
socket_string_reader OUT -> IN _(console)
on the other side, would split this sequence of integer strings across two flows connected by that Unix socket. The output would be something like:
#anon:38:32 1 (string)
#anon:38:32 2 (string)
#anon:38:32 3 (string)
#anon:38:32 4 (string)
#anon:38:32 5 (string)
#anon:38:32 6 (string)
...
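The isolation idea can be sketched with Python's socket module on a Unix-like system: two parties talk only through a socket file whose permissions you control. The newline-terminated framing is an assumption made for the sketch. This page does not describe Soletta's own wire format between the two nodes, and the function names are hypothetical.

```python
import os
import socket
import tempfile


def send_strings(path, values):
    """Writer side: connect and send newline-terminated UTF-8 strings."""
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as sock:
        sock.connect(path)
        for value in values:
            sock.sendall(value.encode("utf-8") + b"\n")


def receive_strings(server):
    """Reader side: accept one connection and read until the writer closes."""
    conn, _ = server.accept()
    with conn:
        data = b""
        while True:
            chunk = conn.recv(4096)
            if not chunk:
                break
            data += chunk
    return data.decode("utf-8").splitlines()


def demo(values):
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "string_socket")
        with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as server:
            server.bind(path)
            os.chmod(path, 0o600)  # only the owning user may connect
            server.listen(1)
            send_strings(path, values)  # queued in the listen backlog
            return receive_strings(server)


print(demo(["1", "2", "3"]))  # ['1', '2', '3']
```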
All the nodes presented so far deal with flows exchanging string data on the same system/machine. If string packets are to be exchanged over the network, one can use the HTTP client/server nodes:
client(http-client/string:url=http://127.0.0.1/string)
server(http-server/string:path="/string",value="Hello from the server")
_(constant/empty) OUT -> GET client OUT -> IN _(console)
This would output a string packet with Hello from the server as contents. The client node can also do HTTP POST requests, if needed.
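The sketch below gives a self-contained picture of the same exchange. It serves "Hello from the server" at /string with Python's http.server and fetches it with urllib. It binds to an ephemeral port instead of the default HTTP port used in the flow, and it bypasses any configured proxy. It illustrates the HTTP exchange, not the Soletta nodes themselves.

```python
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

MESSAGE = "Hello from the server"


class StringHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path != "/string":
            self.send_error(404)
            return
        body = MESSAGE.encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):  # keep the demo quiet
        pass


def fetch_string() -> str:
    server = HTTPServer(("127.0.0.1", 0), StringHandler)  # port 0: any free port
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        url = f"http://127.0.0.1:{server.server_address[1]}/string"
        opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
        with opener.open(url) as response:
            return response.read().decode("utf-8")
    finally:
        server.shutdown()
        server.server_close()


print(fetch_string())  # Hello from the server
```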
All the samples shown so far have an initial string packet declared explicitly. To store string values from previous flow executions for later use, one can make use of the persistence/string node:
client(http-client/string:url=http://127.0.0.1/string)
_(constant/empty) OUT -> GET client
persist_string(persistence/string:storage="fs",name="string")
client OUT -> IN persist_string
persist_string OUT -> IN _(console)
shows a flow where a string value that's stored in a server is made to persist for future use. Thus, even if connectivity is down, previous values of that string will still be available. The storage option of the persistence node has fs as value, meaning that an ordinary file (naturally, in a file system) is used to store said data. There are more complex places of storage available, like EFI variables and non-volatile RAM. For those types of storage, check the documentation for more information.
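The behavior described for storage="fs" can be modelled as a value written to a file on every update and read back at start-up. The class below is a hypothetical sketch, not how Soletta stores its data. The file naming and the write-then-rename strategy are assumptions.

```python
import os
import tempfile


class PersistedString:
    """Hypothetical model of persistence/string with storage="fs"."""

    def __init__(self, directory: str, name: str):
        self.path = os.path.join(directory, name)

    def load(self):
        """Return the last stored value, or None if nothing was stored yet."""
        try:
            with open(self.path, encoding="utf-8") as f:
                return f.read()
        except FileNotFoundError:
            return None

    def store(self, value: str) -> str:
        tmp_path = self.path + ".tmp"
        with open(tmp_path, "w", encoding="utf-8") as f:
            f.write(value)
        os.replace(tmp_path, self.path)  # readers never see a partial file (POSIX)
        return value


with tempfile.TemporaryDirectory() as storage_dir:
    PersistedString(storage_dir, "string").store("Hello from the server")
    after_restart = PersistedString(storage_dir, "string")  # e.g. while offline
    print(after_restart.load())  # Hello from the server
```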