Strings on Soletta flows

To make it easy to handle strings in flow-based programs, Soletta provides a family of node types (components) covering the most frequent string operations:

comparisons
conversions from / to other data types
manipulation (concatenation, replacement, upper / lower case)
regular expressions

By default, the Soletta™ Framework expects all strings in a flow to be UTF-8-encoded. For that, the pristine build relies on (and links against) the ICU libraries. Builds without ICU are possible, though. In that case, all nodes doing string operations still work, provided that the strings in the flows are all ASCII-encoded. Soletta's string capabilities also include regular expressions, which rely on the PCRE library. If that dependency is not present, Soletta still builds, but all regular expression nodes become useless: any input on them only generates error packets as output.

Soletta has plenty of utility nodes dealing with strings on your flows, namely:

Constant
- constant/string
String
Converter
Filter-repeated
- filter-repeated/string
HTTP Server and HTTP Client
- http-server/string
- http-client/string
Persistence
- persistence/string
Switcher
- switcher/string
Unix-socket
- unix-socket/string-reader
- unix-socket/string-writer

General string utilities

The most basic string node type is the constant. Use it to generate (packets of) string constants in your flow. Plenty of Soletta nodes have string input ports that accept those, for instance:

file(constant/string:value="/some/path") OUT -> PATH reader(file/reader)
reader OUT -> IN _(console)

When comparing some string output with an expected value, one would also make use of string constants:

... OUT -> IN[0] cmp(string/compare)
value(constant/string:value="expected_value")
value OUT -> IN[1] cmp OUT -> IN _(console)

Beyond that basic usage, there are straightforward helper nodes that concatenate strings, check whether a string is empty, calculate its length and change the case of a string packet's contents.

Let's take a look at the nodes that check whether a string starts or ends with another one:

value(constant/string:value="some_value")
value OUT -> IN _(string/starts-with:prefix="some") OUT -> IN _(console)

would output a true boolean packet. One can also move the comparison's starting point away from the beginning of the string, as in:

value(constant/string:value="some_value")
value OUT -> IN starts_with(string/starts-with:prefix="ome_",start=1)
starts_with OUT -> IN _(console)

also getting a true packet as output. The string/ends-with node works analogously.
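
As a mental model only (this is not Soletta's implementation), the two checks map onto Python's str.startswith and str.endswith. The start option corresponds to the optional offset argument of startswith. Whether string/ends-with takes a similar offset option is not covered here, so the sketch leaves it out.

```python
def starts_with(value: str, prefix: str, start: int = 0) -> bool:
    # str.startswith accepts an optional offset, like the node's start option.
    return value.startswith(prefix, start)


def ends_with(value: str, suffix: str) -> bool:
    return value.endswith(suffix)


print(starts_with("some_value", "some"))     # True
print(starts_with("some_value", "ome_", 1))  # True
print(ends_with("some_value", "value"))      # True
```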

If one has to edit the contents of a string packet, the string/replace node can do the job:

value(constant/string:value="some_value")
value OUT -> IN replace(string/replace:from_string="some",to_string="other")
replace OUT -> IN _(console)

would output a string packet with other_value as content. By default, all occurrences of from_string are replaced by to_string, but the number of replacements can be limited with the extra option max_replace.
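
The replace semantics can be sketched with Python's str.replace, whose optional count argument plays the role of max_replace. How the node itself encodes "no limit" is not described here. Using None for that case is a choice of this sketch.

```python
def replace(value, from_string, to_string, max_replace=None):
    # None models the default "replace every occurrence" behavior;
    # an integer caps the number of replacements (str.replace's count).
    if max_replace is None:
        return value.replace(from_string, to_string)
    return value.replace(from_string, to_string, max_replace)


print(replace("some_value", "some", "other"))  # other_value
print(replace("a-a-a", "a", "b", 2))           # b-b-a
```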

Another common string operation is retrieving a slice of a string. That is what the string/slice node is for:

value(constant/string:value="some_value")
value OUT -> IN _(string/slice:start=3,end=6) OUT -> IN _(console)

would output a string packet with e_v as content. The start and end indices work just like Python/JavaScript slice indices: you may pass negative values to them, so that indexing starts from the end of the string.
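
Since the text compares the indices to Python slices, plain Python slicing shows how the start and end options behave, including negative values. It is a model of the semantics, not the node's code.

```python
value = "some_value"
print(value[3:6])    # e_v    (start=3, end=6)
print(value[-5:10])  # value  (negative start counts from the end)
print(value[0:-6])   # some   (negative end counts from the end)
```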

Now imagine that you want to split your string into substrings, e.g. get all words in a sentence, excluding the spaces between them. That's what the string/split node is for:

_(constant/string:value="some value") OUT -> IN split(string/split)
split OUT -> IN _(console)

would output a string packet with some as content. That is because the node's default separator is a single space, and by default it outputs the first substring resulting from the split. You can tune this behavior with the following options:

value(constant/string:value="some+value")
value OUT -> IN _(string/split:separator="+",index=1) OUT -> IN _(console)

now having as output a string packet with value as content.
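
The separator/index pair can be modeled as "split on every separator, then pick one piece." This is a minimal sketch of the behavior described above. Here an out-of-range index raises IndexError; how the node reports that case is not covered.

```python
def split(value: str, separator: str = " ", index: int = 0) -> str:
    # Split on every occurrence of separator and return the chosen piece.
    return value.split(separator)[index]


print(split("some value"))               # some
print(split("some+value", "+", index=1))  # value
```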

If string/starts-with, string/ends-with and string/replace are not enough for one's search and replace needs, regular expression nodes can be used. The search one is straightforward:

value(constant/string:value="some value")
search(string/regexp-search:regexp="(.*)\\s(.*)",index=0)
value OUT -> IN search OUT -> IN _(console)

would output a string packet with some value as content. If the index option were changed to 1, one would get some as content. This illustrates the rule: index 0 always identifies the portion of the input string matched by the entire pattern, index 1 the first capturing sub-pattern, and so on.
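
The index rule is the usual regular expression group numbering. Python's re module is not PCRE, but it numbers groups the same way for this pattern. The raw string avoids the extra escaping that FBP files need.

```python
import re

match = re.search(r"(.*)\s(.*)", "some value")
print(match.group(0))  # some value  (index=0: the whole match)
print(match.group(1))  # some        (index=1: first capturing group)
print(match.group(2))  # value       (index=2: second capturing group)
```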

The string/regexp-replace node adds replacing capabilities, by means of regular expression back references:

value(constant/string:value="some value")
replace(string/regexp-replace:regexp="(.*)\\s(.*)",to="\\2")
value OUT -> IN replace OUT -> IN _(console)

would output a string packet with value as content, i.e., the whole match is replaced by the second capturing group of the regular expression.

Note that all regular expressions in Soletta nodes must follow Perl 5's syntax, and that any Unicode character can be matched too. The following excerpt would work as well:

value(constant/string:value="★ ☆")
replace(string/regexp-replace:regexp="(.*)\\s(.*)",to="\\2")
value OUT -> IN replace OUT -> IN _(console)

with ☆ as content. Note, also, that regular expression escape sequences must be escaped twice (\\s instead of \s), because Soletta's FBP parser unescapes characters once before the pattern reaches the node.
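
The back-reference replacement, including the Unicode case, can be reproduced with Python's re.sub. This is a stand-in for PCRE, not Soletta's code. In Python source, a raw string needs only a single backslash, whereas the FBP examples above need two.

```python
import re

pattern = r"(.*)\s(.*)"
print(re.sub(pattern, r"\2", "some value"))  # value
print(re.sub(pattern, r"\2", "★ ☆"))         # ☆
```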

Besides fixed constants, there is a string node that generates 16-byte (128-bit) version 4 UUIDs: string/uuid. It may be useful in any scenario where a unique ID is required.
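
For reference, a version 4 UUID is 16 random bytes with the version and variant bits fixed. Python's uuid module generates one, as shown below. The exact textual form string/uuid emits (letter case, hyphens) is not described here, so the sketch only prints Python's canonical form.

```python
import uuid

u = uuid.uuid4()
print(u.version)     # 4
print(len(u.bytes))  # 16
print(str(u))        # random value in 8-4-4-4-12 hex form
```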

Converting to and from string packets

Another important operation on strings is converting them to other Soletta primitive (packet) types and vice versa.

Let's start with the string → other types direction.

Converting a string packet to a blob generates a (memory) blob with the exact bytes of the string (terminating NUL byte included). Converting a string to a boolean works if the input is either true or false, in any letter case; any other value leads to errors in the flow. Converting a string to an empty packet accepts any string as input, as one might expect.
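
The blob and boolean rules above can be modeled in a few lines. This is a hypothetical sketch, not Soletta's converter code. Raising ValueError stands in for the error packet the node would emit. Whether the real node tolerates surrounding whitespace is not covered.

```python
def string_to_blob(value: str) -> bytes:
    # The string's UTF-8 bytes followed by the terminating NUL byte.
    return value.encode("utf-8") + b"\0"


def string_to_boolean(value: str) -> bool:
    # Case-insensitive match against "true"/"false"; anything else is an error.
    lowered = value.lower()
    if lowered == "true":
        return True
    if lowered == "false":
        return False
    raise ValueError(f"not a boolean: {value!r}")


print(string_to_blob("abc"))      # b'abc\x00'
print(string_to_boolean("TRUE"))  # True
```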

Converting a string packet to a floating-point or integer one accepts the various notations for these numbers, e.g.:

str_to_float(converter/string-to-float)
float_str1(constant/string:value="3.1415")
float_str2(constant/string:value="2.24e10")
float_str1 OUT -> IN str_to_float
float_str2 OUT -> IN str_to_float
str_to_float OUT -> IN _(console)

would output float packets with 3.1415 and 22400000000.000000 as values, respectively. Special float constants understood by strtod(), like NaN or INFINITY, may be used as well.
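
Python's float() accepts the same decimal, exponent and case-insensitive NaN/infinity spellings shown above, so it gives a quick way to check inputs. It is not identical to strtod(): for example, strtod() also parses hexadecimal floats, which float() rejects.

```python
for text in ("3.1415", "2.24e10", "NaN", "INFINITY"):
    print(text, "->", float(text))

# "%f" formatting reproduces the console value quoted above.
print("%f" % float("2.24e10"))  # 22400000000.000000
```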

When converting a string to a byte, the input string must contain the two hexadecimal digits (nibbles) in sequence, optionally preceded by 0x:

const_byte_str(constant/string:value=0x0e)
const_byte_str OUT -> IN str_to_byte(converter/string-to-byte)
str_to_byte OUT -> IN _(console)
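
The accepted input shape can be expressed as a small validator: exactly two hex digits with an optional 0x prefix. This is a sketch under assumptions. Accepting upper-case hex digits is an assumption of the sketch. ValueError stands in for the node's error packet.

```python
import re


def string_to_byte(text: str) -> int:
    # Two hex digits, optionally preceded by "0x" (assumed: either digit case).
    match = re.fullmatch(r"(?:0x)?([0-9A-Fa-f]{2})", text)
    if match is None:
        raise ValueError(f"not a byte string: {text!r}")
    return int(match.group(1), 16)


print(string_to_byte("0x0e"))  # 14
print(string_to_byte("3e"))    # 62
```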

Finally, converting a string to a timestamp packet involves passing a timestamp format to the converter node:

time_str(constant/string:value="09/23/16 09:30:00")
converter(converter/string-to-timestamp:format="%D %T")
time_str OUT -> IN converter OUT -> IN _(console)

would output a timestamp packet with that date as value (time zone shifts may apply), because the input string matches the given timestamp format.
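
In POSIX strptime(), %D is shorthand for %m/%d/%y and %T for %H:%M:%S. Python's strptime does not document those shorthands, so this sketch spells them out to parse the same sample string. The result is a naive datetime, so no time zone shift is applied here.

```python
from datetime import datetime

# "%D %T" expanded into its POSIX equivalents.
parsed = datetime.strptime("09/23/16 09:30:00", "%m/%d/%y %H:%M:%S")
print(parsed)  # 2016-09-23 09:30:00
```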

We now jump to the other types → string direction.

The simplest value to convert to a string is an empty packet:

empty(constant/empty)
empty_to_str(converter/empty-to-string:output_value=Empty)
empty OUT -> IN empty_to_str OUT -> IN _(console)

would output a string packet with Empty as value.

If the source is a boolean packet, the conversion goes as:

true(constant/boolean:value=true)
true_to_str(converter/boolean-to-string:true_value="not false")
true OUT -> IN true_to_str OUT -> IN _(console)

would output a string packet with not false as value. Without the true_value option, one would get the default value, True, instead. False values are handled analogously.

Next in complexity is the byte to string conversion, which is straightforward: one gets a string with the two hexadecimal digits, prefixed with 0x, as in 0x3e.
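
The output shape is a zero-padded, two-digit hex rendering with a 0x prefix. The sketch below assumes lower-case digits, following the 0x3e example. It also assumes values below 0x10 keep their leading zero, following the "two nibble characters" wording.

```python
def byte_to_string(byte: int) -> str:
    # Two lower-case hex digits, zero-padded, prefixed with 0x (assumed).
    return "0x%02x" % byte


print(byte_to_string(0x3e))  # 0x3e
print(byte_to_string(14))    # 0x0e
```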

Soletta has quite special nodes for integer/floating-point to string conversions. They accept a format string whose syntax is very similar to Python's str.format(), except that there is no format recursion, no ! conversions, and the field names must be one of the sub-fields of Soletta's integer/floating-point basic types (val, min, max, step). Let's illustrate with examples:

value(constant/float:value=DBL_MAX)
converter(converter/float-to-string:format_spec="{:e}")
value OUT -> IN converter OUT -> IN _(console)

would output a string packet with 1.797693e+308 as value. This shows that, if no field name is used, the val field is addressed by default. The other fields may be addressed like so:

value(constant/float:value=3.1415)
converter(converter/float-to-string:format_spec="{val:07n} ★ {min:g} ★ {max:e} ★ {step:%}")
value OUT -> IN converter OUT -> IN _(console)

would output a string packet with 03.1415 ★ -1.79769e+308 ★ 1.797693e+308 ★ 0.000000% as value.
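
Because the syntax follows Python's format mini-language, plain str.format() reproduces both outputs. The assumptions are labeled in the code. Python's positional {} stands in for Soletta's implicit val field. min and max are taken as -DBL_MAX and DBL_MAX, and step as the smallest normal double, which is one value consistent with the 0.000000% shown. The 'n' type behaves like 'g' under Python's default C locale.

```python
import sys

DBL_MAX = sys.float_info.max

# Positional {} here plays the role of Soletta's implicit "val" field.
print("{:e}".format(DBL_MAX))  # 1.797693e+308

# Hypothetical field values for constant/float:value=3.1415.
fields = {
    "val": 3.1415,
    "min": -DBL_MAX,
    "max": DBL_MAX,
    "step": sys.float_info.min,  # assumption: a tiny positive step
}
line = "{val:07n} ★ {min:g} ★ {max:e} ★ {step:%}".format(**fields)
print(line)  # 03.1415 ★ -1.79769e+308 ★ 1.797693e+308 ★ 0.000000%
```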

Converting a timestamp packet to a string goes as follows:

time(timestamp/time)
converter(converter/timestamp-to-string:format="%Y-%m-%dT%H:%M:%S")
_(constant/empty) OUT -> TRIGGER time
time OUT -> IN converter OUT -> IN _(console)

that outputs a string packet with something like 2015-09-25T13:39:34 as value. As in the opposite direction, the user must always specify the desired format.
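
The format option above uses strftime()-style directives. Python's datetime.strftime gives the same rendering for a fixed sample instant. The timestamp here is hard-coded rather than read from the clock, so the output is deterministic.

```python
from datetime import datetime

stamp = datetime(2015, 9, 25, 13, 39, 34)
print(stamp.strftime("%Y-%m-%dT%H:%M:%S"))  # 2015-09-25T13:39:34
```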

The last string converter extracts the error code and the message string from an error packet:

const_int(constant/int:value=10) OUT -> DIVIDEND div(int/division)
const_zero(constant/int:value=0) OUT -> DIVISOR div
div ERROR -> IN conv_error(converter/error)
conv_error MESSAGE -> IN _(console)

would output a string packet with something like Numerical argument out of domain as contents.
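
The sample message is the text the GNU C library's strerror() returns for EDOM. Assuming (not confirmed here) that the division error packet carries EDOM as its code, the code/message pair can be reproduced with Python's errno and os.strerror. The message wording varies by platform, so no exact output is given.

```python
import errno
import os

# Assumption: the division-by-zero error packet carries EDOM.
code = errno.EDOM
message = os.strerror(code)
print(code, message)
```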

If a scenario produces lots of string packets, but the user wants to forward only those that do not repeat in sequence, there is a filter node for that: filter-repeated/string. Just pass the string packet stream through it, and consecutive repeated values are dropped from the output.
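
The filtering rule can be sketched as a streaming generator that remembers only the previous value. A value is dropped when it equals the one right before it, and passed on otherwise. This models the described behavior, not Soletta's source.

```python
def filter_repeated(packets):
    previous = object()  # sentinel that never equals a string
    for value in packets:
        if value != previous:
            yield value
        previous = value


print(list(filter_repeated(["a", "a", "b", "b", "a"])))  # ['a', 'b', 'a']
```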

At some point in a flow, one may also need to choose which of the possible flow paths a given string packet is forwarded to (in other words, to select the packet routing). There is a node for that: switcher/string. The following example changes, once per second, the switcher's output port that forwards its input packet. The input port is never changed (it stays at 0), but that already gives an idea of the node usage.

switcher(switcher/string)
accumulator(int/accumulator:setup_value=min:0|max:3)
accumulator OUT -> OUT_PORT switcher
accumulator OUT -> IN _(converter/int-to-string) OUT -> IN[0] switcher
accumulator OUT -> IN _(converter/int-to-string) OUT -> IN[1] switcher
accumulator OUT -> IN _(converter/int-to-string) OUT -> IN[2] switcher
accumulator OUT -> IN _(converter/int-to-string) OUT -> IN[3] switcher
_(timer:interval=1000) OUT -> INC accumulator
switcher OUT[0] -> IN _(console:prefix="out[0]: ")
switcher OUT[1] -> IN _(console:prefix="out[1]: ")
switcher OUT[2] -> IN _(console:prefix="out[2]: ")
switcher OUT[3] -> IN _(console:prefix="out[3]: ")

That would output something like:

out[0]: 0 (string)
out[1]: 1 (string)
out[2]: 2 (string)
out[3]: 3 (string)
out[0]: 0 (string)
out[1]: 1 (string)
...
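
The routing in the flow above can be simulated tick by tick. This is a simplified model. The accumulator counts 0 to 3 and wraps around, as the sample output shows. Each value both selects the output port and becomes the string packet on IN[0]. Packet ordering inside the real flow is not modeled.

```python
def run_switcher(ticks: int, max_value: int = 3):
    lines = []
    accumulator = 0
    for _ in range(ticks):
        out_port = accumulator     # accumulator OUT -> OUT_PORT switcher
        packet = str(accumulator)  # int-to-string OUT -> IN[0] switcher
        lines.append(f"out[{out_port}]: {packet} (string)")
        # Wrap from max back to min, matching the sample output.
        accumulator = 0 if accumulator == max_value else accumulator + 1
    return lines


print("\n".join(run_switcher(6)))
```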

Strings and security concerns

Now say that one has security restrictions on a given part of a Soletta flow and wants to isolate that part from the others in a controlled way. This is what the unix-socket/string-writer and unix-socket/string-reader nodes are for, at least for string packet exchange. Split the flow of interest with these two nodes at the point where strings pass, control access to the socket file with any file permission or security tool of your choice, and you're done:

timer(timer:interval=1000) OUT -> INC acc(int/accumulator)
acc OUT -> IN converter(converter/int-to-string)
socket_string_writer(unix-socket/string-writer:path=/tmp/string_socket)
converter OUT -> IN socket_string_writer

on one side, for example, and

socket_string_reader(unix-socket/string-reader:path=/tmp/string_socket)
socket_string_reader OUT -> IN _(console)

on the other side would split this sequence of integer strings across two flows connected by that Unix socket. The output would be something like:

#anon:38:32 1 (string)
#anon:38:32 2 (string)
#anon:38:32 3 (string)
#anon:38:32 4 (string)
#anon:38:32 5 (string)
#anon:38:32 6 (string)
...
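
The isolation idea relies on ordinary file permissions on the socket path. The Python sketch below shows a writer and a reader exchanging strings over a Unix socket restricted to its owner with mode 0600. The newline framing, the temporary path and the choice of having the reader bind the socket are all assumptions of this sketch. Soletta's actual wire format is not described here.

```python
import os
import socket
import tempfile


def demo(values):
    directory = tempfile.mkdtemp()
    path = os.path.join(directory, "string_socket")

    # Reader side: bind the socket and restrict it to the owner.
    reader = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    reader.bind(path)
    os.chmod(path, 0o600)
    reader.listen(1)

    # Writer side: connect and send newline-framed strings (assumed framing).
    writer = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    writer.connect(path)
    writer.sendall("".join(v + "\n" for v in values).encode("utf-8"))
    writer.close()

    # Reader side: accept the pending connection and read until EOF.
    conn, _ = reader.accept()
    data = b""
    while chunk := conn.recv(4096):
        data += chunk
    conn.close()
    reader.close()
    os.remove(path)
    os.rmdir(directory)
    return data.decode("utf-8").splitlines()


print(demo(["1", "2", "3"]))  # ['1', '2', '3']
```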

Strings over the network

All the nodes presented so far deal with flows exchanging string data on the same system/machine. If string packets are to be exchanged over the network, one would use the HTTP client/server nodes:

client(http-client/string:url=http://127.0.0.1/string)
server(http-server/string:path="/string",value="Hello from the server")
_(constant/empty) OUT -> GET client OUT -> IN _(console)

This would output a string packet with Hello from the server as content. The client node can also do HTTP POST requests, if needed.
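
To see the same exchange outside Soletta, Python's standard library can play both roles: a server answering GET /string with a fixed string, and a client fetching it. The port is assigned by the operating system; the flow's URL omits a port, and Soletta's default is not described here. The text/plain content type is a choice of this sketch.

```python
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


class StringHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path != "/string":
            self.send_error(404)
            return
        body = "Hello from the server".encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the example's output quiet


server = ThreadingHTTPServer(("127.0.0.1", 0), StringHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
url = f"http://127.0.0.1:{server.server_address[1]}/string"
with urllib.request.urlopen(url) as response:
    text = response.read().decode("utf-8")
server.shutdown()
server.server_close()
print(text)  # Hello from the server
```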

Persisting strings on non-volatile memory

All the samples shown so far declare their initial string packet explicitly. To store string values from previous flow executions for later use, one can make use of the persistence/string node:

client(http-client/string:url=http://127.0.0.1/string)
_(constant/empty) OUT -> GET client
persist_string(persistence/string:storage="fs",name="string")
client OUT -> IN persist_string
persist_string OUT -> IN _(console)

shows a flow where a string value fetched from a server is persisted for future use. Thus, even if connectivity is down, the previously stored value of that string is still available. The node's storage option is set to fs, meaning an ordinary file (naturally, in a file system) is used to store the data. There are more elaborate storage options, like EFI variables and non-volatile RAM; check the documentation for more information on those.
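
The idea behind file-based persistence can be sketched as follows: store the last value under a name, and on the next run load it back, falling back to a default when nothing has been stored yet. The file layout and the write-then-rename strategy are hypothetical. Where and how Soletta's fs backend stores values is not described here.

```python
import os
import tempfile


def store(directory: str, name: str, value: str) -> None:
    # Write to a temporary file, then atomically replace the old one.
    path = os.path.join(directory, name)
    tmp = path + ".tmp"
    with open(tmp, "w", encoding="utf-8") as f:
        f.write(value)
    os.replace(tmp, path)


def load(directory: str, name: str, default=None):
    try:
        with open(os.path.join(directory, name), encoding="utf-8") as f:
            return f.read()
    except FileNotFoundError:
        return default


with tempfile.TemporaryDirectory() as storage_dir:
    before = load(storage_dir, "string", "<nothing stored>")
    store(storage_dir, "string", "Hello from the server")
    after = load(storage_dir, "string")  # what a later run would see

print(before)  # <nothing stored>
print(after)   # Hello from the server
```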
