NAME

String::Utils - Provide some optimized string functions

SYNOPSIS

use String::Utils;

say abbrev(<yes no>).keys.sort;        # (n no y ye yes)

say after("foobar","foo");             # bar

say all-same("aaaaaa");                # "a"
say all-same("aaaaba");                # Nil
say all-same("");                      # Nil

say around("foobarbaz", "ob", "rb");   # foaz

say before("foobar","bar");            # foo

say between("foobarbaz","foo","baz");  # bar

say between-included("foobarbaz","oo","baz");  # oobarbaz

say chomp-needle("foobarbaz", "baz");  # foobar

say consists-of("aaabbcc", "abc");     # True
say consists-of("aaadbcc", "abc");     # False
say consists-of("", "abc");            # True

dd expand-tab("a\tbb\tccc",4);         # "a   bb  ccc"

say has-marks("foo👩🏽‍💻bar");             # False
say has-marks("fóöbar");               # True

say is-lowercase("foobar");            # True
say is-lowercase("FooBar");            # False
say is-lowercase("");                  # True

say is-sha1 "foo bar baz";             # False

say is-uppercase("FOOBAR");            # True
say is-uppercase("FooBar");            # False
say is-uppercase("");                  # True

say is-whitespace("\t \n");            # True
say is-whitespace("\ta\n");            # False
say is-whitespace("");                 # True

dd leading-whitespace(" \t foo");      # " \t "

say leaf <zip.txt zop.txt ff.txt>;     # .txt

say letters("//foo:bar");              # foobar

say ngram "foobar", 3;                 # foo oob oba bar

say nomark("élève");                   # eleve

say non-word "foobar";                 # False
say non-word "foo/bar";                # True

.say for paragraphs("a\n\nb");         # 0 => a␤2 => b␤
.say for paragraphs($path.IO.lines);   # …

my $string = "foo";
my $regex  = regexify($string, :ignorecase);
say "FOOBAR" ~~ $regex;                # ｢FOO｣

say replace("ab42cd", /\d+/, "666");   # ab666cd
say replace-all("a4b2c", /\d+/, "6");  # a6b6c

say root <abcd abce abde>;             # ab

say sha1("foo bar baz");               # C7567E8B39E...

say shorten("foobarbaz", 7);           # foo…baz

say stem "foo.tar.gz";                 # foo
say stem "foo.tar.gz", 1;              # foo.tar

say text-from-url $url, :verbose;      # ...

dd trailing-whitespace("bar \t ");     # " \t "

say word-at("foo bar baz", 5);         # (4 3 1)

say word-diff("maiden");               # (daimen damine maiden median)

use String::Utils <before after>;  # only import "before" and "after"

DESCRIPTION

String::Utils provides some simple string functions that are not (yet) provided by the core Raku Programming Language.

These functions are implemented without using regexes for speed.

SELECTIVE IMPORTING

use String::Utils <before after>;  # only import "before" and "after"

By default all utility functions are exported. But you can limit this to the functions you actually need by specifying the names in the use statement.

To prevent name collisions and/or import any subroutine with a more memorable name, one can use the "original-name:known-as" syntax. A colon in a specified string separates the name by which the subroutine is known in this distribution from the name by which it will be known in the lexical context in which the use statement is executed.

use String::Utils <root:common-start>;  # import "root" as "common-start"

say common-start <abcd abce abde>;  # ab

SUBROUTINES

abbrev

say abbrev(<yes no>).keys.sort;       # (n no y ye yes)

say abbrev(<foo bar baz>).keys.sort;  # (bar baz f fo foo)

Takes 0 or more strings as arguments, and returns a Map with all shortest possible unambiguous versions of the given strings as keys, and their associated original string as the value.

Inspired by Text::Abbrev module by KamilaBorowska.
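The mapping described above can be approximated in core Raku. This is only a sketch of the documented behaviour, not the module's implementation: the name my-abbrev is hypothetical, and the sketch does not try to settle what should happen when one word is a prefix of another.

```raku
# Hypothetical helper; not part of String::Utils.
sub my-abbrev(*@words) {
    my %seen;   # prefix => number of distinct words starting with it
    my %owner;  # prefix => a word starting with it
    for @words.unique -> $word {
        for 1 .. $word.chars -> $len {
            my $prefix = $word.substr(0, $len);
            %seen{$prefix}++;
            %owner{$prefix} = $word;
        }
    }
    # keep only the prefixes that belong to exactly one word
    my %result = %owner.grep({ %seen{.key} == 1 });
    %result.Map
}

say my-abbrev(<yes no>).keys.sort;       # (n no y ye yes)
say my-abbrev(<foo bar baz>).keys.sort;  # (bar baz f fo foo)
```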

after

say after("foobar","foo");   # bar

say "foobar".&after("foo");  # bar

say after("foobar","goo");   # Nil

Return the string after a given string, or Nil if the given string could not be found. The equivalent of the stringification of / <?after foo> .* /.
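As an illustration of how such a lookup can work without a regex, here is a minimal sketch using only the core index and substr methods. The name my-after is hypothetical, and the module's actual implementation may differ.

```raku
# Hypothetical helper; not part of String::Utils.
sub my-after(Str:D $haystack, Str:D $needle) {
    # index returns Nil when the needle is absent
    my $pos = $haystack.index($needle);
    $pos.defined
      ?? $haystack.substr($pos + $needle.chars)
      !! Nil
}

say my-after("foobar", "foo");  # bar
say my-after("foobar", "goo");  # Nil
```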

all-same

say all-same("aaaaaa");                # "a"
say all-same("aaaaba");                # Nil
say all-same("");                      # Nil

If the given string consists of one or more repetitions of a single character, returns that character. Else returns Nil.

around

say around("foobarbaz","ob","rb");     # foaz

say "foobarbaz".&around("ob","rb");    # foaz

say around("foobarbaz","goo","baz");   # foobarbaz

Return the string around two given strings (that is, with the bounding strings and everything between them removed), or the string itself if either of the bounding strings could not be found. The equivalent of .subst: / ob .*? rb /.

before

say before("foobar","bar");   # foo

say "foobar".&before("bar");  # foo

say before("foobar","baz");   # Nil

Return the string before a given string, or Nil if the given string could not be found. The equivalent of the stringification of / .*? <?before bar> /.

between

say between("foobarbaz","foo","baz");   # bar

say "foobarbaz".&between("foo","baz");  # bar

say between("foobarbaz","goo","baz");   # Nil

Return the string between two given strings, or Nil if either of the bounding strings could not be found. The equivalent of the stringification of / <?after foo> .*? <?before baz> /.

between-included

say between-included("foobarbaz","oo","baz");   # oobarbaz

say "foobarbaz".&between-included("oo","baz");  # oobarbaz

say between-included("foobarbaz","goo","baz");  # Nil

Return the string between two given strings including the given strings, or Nil if either of the bounding strings could not be found. The equivalent of the stringification of / oo .*? baz /.

chomp-needle

say chomp-needle("foobarbaz","baz");   # foobar

say "foobarbaz".&chomp-needle("baz");  # foobar

say chomp-needle("foobarbaz","bar");   # foobarbaz

Return the string without the given target string at the end, or the string itself if the target string is not at the end. The equivalent of .subst(/ baz $/).

consists-of

say consists-of("aaabbcc", "abc");     # True
say consists-of("aaadbcc", "abc");     # False
say consists-of("", "abc");            # True

Returns a Bool indicating whether the string given as the first positional argument only consists of characters given as the second positional argument, or is empty.
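The same check can be expressed in core Raku as a subset test on the characters. The following is a sketch under that assumption, with a hypothetical name; it is not how the module is implemented.

```raku
# Hypothetical helper; not part of String::Utils.
sub my-consists-of(Str:D $string, Str:D $chars) {
    # every character of $string must also occur in $chars;
    # the empty string yields the empty set, which is a subset of anything
    $string.comb.Set (<=) $chars.comb.Set
}

say my-consists-of("aaabbcc", "abc");  # True
say my-consists-of("aaadbcc", "abc");  # False
say my-consists-of("", "abc");         # True
```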

describe-Version

say $*VM.version;                    # v2025.06.5.gb.1.c.74.b.8.d.8
say describe-Version($*VM.version);  # 2025.06-5-gb1c74b8d8

The describe-Version subroutine takes a Version object and returns a string in the "git describe" format.

expand-tab

dd expand-tab("a\tbb\tccc",4);  # "a   bb  ccc"

Expand any tabs in a string (the first argument) to the given tab width (the second argument). If there are no tabs, then the given string will be returned unaltered.

If the tab width is zero or negative, any tabs will be removed from the string. If the tab width is one, each tab will be replaced by a single space.
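The rules above can be sketched in core Raku as follows. The name my-expand-tab is hypothetical, and the sketch treats the string as a single line (it does not reset the column after a newline), which may differ from what the module does.

```raku
# Hypothetical helper; not part of String::Utils.
sub my-expand-tab(Str:D $string, Int:D $width) {
    # zero or negative width: just drop the tabs
    return $string.subst("\t", "", :g) if $width <= 0;

    my @parts = $string.split("\t");
    my $last  = @parts.pop;  # text after the final tab needs no padding
    my $out   = "";
    for @parts -> $part {
        $out ~= $part;
        # pad with spaces up to the next multiple of $width
        $out ~= " " x ($width - $out.chars % $width);
    }
    $out ~ $last
}

say my-expand-tab("a\tbb\tccc", 4).raku;  # "a   bb  ccc"
say my-expand-tab("a\tbb\tccc", 0).raku;  # "abbccc"
say my-expand-tab("a\tbb\tccc", 1).raku;  # "a bb ccc"
```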

has-marks

say has-marks("foo👩🏽‍💻bar");             # False
say has-marks("fóöbar");               # True

Returns a Bool indicating whether the given string contains any alphanumeric characters with marks (accents).

is-lowercase

say is-lowercase("foobar");  # True
say is-lowercase("FooBar");  # False
say is-lowercase("");        # True

Returns a Bool indicating whether the string consists of just lowercase characters, or is empty.

is-sha1

say is-sha1 "abcd abce abde";  # False
say is-sha1 "356A192B7913B04C54574D18C28D46E6395428AB";  # True

Return a Bool indicating whether the given string is a SHA1 string (40 chars and only containing 0123456789ABCDEF).

is-uppercase

say is-uppercase("FOOBAR");  # True
say is-uppercase("FooBar");  # False
say is-uppercase("");        # True

Returns a Bool indicating whether the string consists of just uppercase characters, or is empty.

is-whitespace

say is-whitespace("\t \n");  # True
say is-whitespace("\ta\n");  # False
say is-whitespace("");       # True

Returns a Bool indicating whether the string consists of just whitespace characters, or is empty.

leading-whitespace

dd leading-whitespace("foo");      # ""
dd leading-whitespace(" \t foo");  # " \t "
dd leading-whitespace(" \t ");     # " \t "

Returns a Str containing any leading whitespace of the given string.

leaf

say leaf <zip.txt zop.txt ff.txt>;  # .txt

Return the common end of the given strings, or the empty string if no common string could be found. See also root.

letters

say letters("//foo:bar");  # foobar

Returns all of the alphanumeric characters in the given string as a string.

ngram

say ngram "foobar", 3;            # foo oob oba bar

say ngram "foobar", 4, :partial;  # foob ooba obar bar ar r

Return a sequence of substrings of the given size, while only moving up one position at a time in the original string. Optionally takes a :partial flag to also produce incomplete substrings at the end of the sequence.
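A sliding window like this can be sketched with the core substr method. The name my-ngram is hypothetical, and this is not necessarily how the module produces its sequence.

```raku
# Hypothetical helper; not part of String::Utils.
sub my-ngram(Str:D $string, Int:D $size, :$partial) {
    # last start position: the final character with :partial,
    # otherwise the last position where a full window still fits
    my $last = $partial
      ?? $string.chars - 1
      !! $string.chars - $size;
    # substr simply returns fewer characters near the end of the string
    (0 .. $last).map: { $string.substr($_, $size) }
}

say my-ngram("foobar", 3).join(" ");            # foo oob oba bar
say my-ngram("foobar", 4, :partial).join(" ");  # foob ooba obar bar ar r
```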

nomark

say nomark("élève");  # eleve

Returns the given string with any diacritics removed.

non-word

say non-word "foobar";   # False

say non-word "foo/bar";  # True

Returns a Bool indicating whether the string contains any non-word characters.

paragraphs

.say for paragraphs($path.IO.lines);   # …
.say for paragraphs("a\n\nb");         # 0 => a␤2 => b␤
.say for paragraphs("a\n\nb", 1);      # 1 => a␤3 => b␤

Lazily produces a Seq of Pairs with paragraphs from a Seq or string in which the key is the line number where the paragraph starts, and the value is the paragraph (without last trailing newline).

The optional second argument can be used to indicate the ordinal number of the first line in the string.

my class A is Pair { }
.say for paragraphs("a\n\nb", 1, :Pair(A));  # 1 => a␤␤3 => b␤

Also takes an optional named argument :Pair that indicates the class with which the objects should be created. This defaults to the core Pair class.

regexify

my $string = "foo";
my $regex  = regexify($string, :ignorecase);
say "FOOBAR" ~~ $regex;  # ｢FOO｣

Produce a Regex object from a given string and modifiers. This is similar to the / <$string> / syntax. Unlike that syntax, which interpolates the contents of the string each time the regex is executed, the Regex object returned by regexify is immutable.

The following modifiers are supported:

i / ignorecase

# accept haystack if "bar" is found, regardless of case
my $regex = regexify("bar", :i);  # or :ignorecase

Allow characters to match even if they are of mixed case.

smartcase

# accept haystack if "bar" is found, regardless of case
my &anycase = regexify("bar", :smartcase);

# accept haystack if "Bar" is found
my &exactcase = regexify("Bar", :smartcase);

If the needle is a string and does not contain any uppercase characters, then ignorecase semantics will be assumed.

m / ignoremark

# accept haystack if "bar" is found, regardless of any accents
my &anymark = regexify("bar", :m);  # or :ignoremark

Allow characters to match even if they have accents (or not).

smartmark

# accept haystack if "bar" is found, regardless of any accents
my &anymark = regexify("bar", :smartmark);

# accept haystack if "bår" is found
my &exactmark = regexify("bår", :smartmark);

If the needle is a string and does not contain any characters with accents, then ignoremark semantics will be assumed.

replace

say replace "ab42cd", / \d+ /, "666";  # ab666cd

Return the string with the first match of the specified regex replaced by the replacement string. Does not set $/, and can thus be up to 8x as fast as the equivalent .subst call.

replace-all

say replace-all "ab42cd88ef", / \d+ /, "666";  # ab666cd666ef

Return the string with all matches of the specified regex replaced by the replacement string. Does not set $/, and can thus be up to 5x as fast as the equivalent .subst call.

root

say root <abcd abce abde>;  # ab

Return the common beginning of the given strings, or the empty string if no common string could be found. See also leaf.
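Finding a common beginning can be sketched by shrinking a candidate length while walking over the strings. The name my-root is hypothetical, and the module's own implementation is likely more optimized.

```raku
# Hypothetical helper; not part of String::Utils.
sub my-root(*@strings) {
    return "" unless @strings;

    my $first = @strings[0];
    my $len   = $first.chars;  # length of the common start found so far
    for @strings -> $string {
        my $i = 0;
        # count the leading characters shared with the first string
        $i++ while $i < $len
          && $i < $string.chars
          && $first.substr($i, 1) eq $string.substr($i, 1);
        $len = $i;
    }
    $first.substr(0, $len)
}

say my-root(<abcd abce abde>);  # ab
say my-root(<foo bar>).raku;    # ""
```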

sha1

say sha1("foo bar baz");  # C7567E8B39E2428E38BF9C9226AC68DE4C67DC39

Returns a SHA1 of the given string. It should only be used for simple identification uses, as it can no longer reliably serve in any cryptographic use.

shorten

say shorten("foobarbaz", 3);           # f…z
say shorten("foobarbaz", 4);           # f…az
say shorten("foobarbaz", 7);           # foo…baz
say shorten("foobarbaz",10);           # foobarbaz
say shorten("f",1);                    # f

Returns a version of the given string shortened to at most the given number of characters. If this maximum is less than the number of characters in the string, then some characters from the start and from the end will be shown, connected by the "…" character.

Specifying a maximum of less than 3 will result in a Failure if the given string was more than 3 characters long.
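The examples above are consistent with reserving one position for the "…" and splitting the rest between the start and the end, with the end getting the extra character when the split is uneven. The sketch below follows that reading; the name my-shorten is hypothetical, and it assumes a maximum of at least 3 rather than reproducing the Failure behaviour.

```raku
# Hypothetical helper; not part of String::Utils.
# Assumes $max >= 3; the Failure case is not handled here.
sub my-shorten(Str:D $string, Int:D $max) {
    return $string if $string.chars <= $max;

    my $keep = $max - 1;      # room left after the "…" itself
    my $head = $keep div 2;   # characters kept from the start
    my $tail = $keep - $head; # characters kept from the end
    $string.substr(0, $head) ~ "…" ~ $string.substr(* - $tail)
}

say my-shorten("foobarbaz", 3);   # f…z
say my-shorten("foobarbaz", 4);   # f…az
say my-shorten("foobarbaz", 7);   # foo…baz
say my-shorten("foobarbaz", 10);  # foobarbaz
```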

stem

say stem "foo.tar.gz";     # foo
say stem "foo.tar.gz", 1;  # foo.tar
say stem "foo.tar.gz", *;  # foo

Return the stem of a string with all of its extensions removed. Optionally accepts a second argument indicating the number of extensions to be removed. This may be * (aka Whatever) to indicate to remove all extensions.

text-from-url

my $text = text-from-url $url, :verbose;

Returns the text found at the given URL, or Nil if fetching the text failed for some reason. Takes an optional :verbose named argument: if specified with a trueish value, any error output that was received is shown on STDERR. It defaults to False, which quietly returns Nil on error.

Assumes the curl command-line program is installed and a network connection is available.

trailing-whitespace

dd trailing-whitespace("bar");      # ""
dd trailing-whitespace("bar \t ");  # " \t "
dd trailing-whitespace(" \t ");     # " \t "

Returns a Str containing any trailing whitespace of the given string.

word-at

say word-at("foo bar baz", 5);  # (4 3 1)

Returns a List with the start position, the number of characters of the word, and the ordinal number of the word found in the given string at the given position, or directly before it (using .words semantics).

Returns Empty if no word could be found at the given position, or the position was out of range.

word-diff

# from /usr/share/dict/words
say word-diff("maiden");  # (daimen damine maiden median)

my @words = <ace ach act add aft zip>;
say word-diff("cat", :@words);             # (act)
say word-diff("cat", :@words, :1letters);  # (ace ach aft)
say word-diff("cat", :@words, :2letters);  # (add)
say word-diff("cat", :@words, :3letters);  # (zip)
say word-diff("cat", :@words, :4letters);  # ()

Returns a (lazy) List of words that differ from a given word by a number of letters, but with the same number of characters.

Apart from the word (passed as the only positional argument) it also takes these named arguments:

:letters

The number of letters that should be different. Defaults to 0, making it essentially look for anagrams. In that case, the original word may occur in the result list as well.

:words

The word list that should be used to search in. Defaults to the word list found in "/usr/share/dict/words". If specified, should be a list or an IO object. In the latter case it will be assumed to be a file that contains a single word on each line.
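The results shown in the examples are consistent with comparing the words as bags of letters. The sketch below follows that reading; it is an assumption about the semantics rather than the module's implementation, the name my-word-diff is hypothetical, and it requires an explicit word list instead of falling back to a dictionary file.

```raku
# Hypothetical helper; not part of String::Utils.
sub my-word-diff(Str:D $word, :@words!, Int:D :$letters = 0) {
    my $bag = $word.comb.Bag;
    @words.grep: {
        # same length, and exactly $letters letters of $word
        # that have no counterpart in the candidate
        .chars == $word.chars
          && ($bag (-) .comb.Bag).total == $letters
    }
}

my @words = <ace ach act add aft zip>;
say my-word-diff("cat", :@words);             # (act)
say my-word-diff("cat", :@words, :1letters);  # (ace ach aft)
say my-word-diff("cat", :@words, :2letters);  # (add)
```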

AUTHOR

Elizabeth Mattijsen liz@raku.rocks

Source can be located at: https://github.com/lizmat/String-Utils . Comments and Pull Requests are welcome.

If you like this module, or what I’m doing more generally, committing to a small sponsorship would mean a great deal to me!

COPYRIGHT AND LICENSE

This library is free software; you can redistribute it and/or modify it under the Artistic License 2.0.
