Kommons Text is a Kotlin Multiplatform Library that offers:
- the Unicode-aware string abstraction Text
- a couple of string operations
- regex operations such as the possibility to use glob patterns
This library is hosted on GitHub with releases provided on Maven Central.
- Gradle
  implementation("com.bkahlert.kommons:kommons-text:2.8.0")
- Maven
  <dependency>
    <groupId>com.bkahlert.kommons</groupId>
    <artifactId>kommons-text</artifactId>
    <version>2.8.0</version>
  </dependency>
Handling user input requires functions that handle Unicode correctly, unless you're not afraid of results like the following:
"👨🏾🦱".substring(0, 3) // "👨?", skin tone gone, curly hair gone
"👩👩👦👦".substring(1, 7) // "?👩?", wife gone, kids gone
Decode any string to a sequence / list of code points using String.asCodePointSequence / String.toCodePointList.
Decode any string to a sequence / list of graphemes using String.asGraphemeSequence / String.toGraphemeList.
Use truncate / truncateStart / truncateEnd to reduce the number of characters, code points, or graphemes.
Transliterations and transforms can be done using String.transform.
"a".asCodePoint().name // "LATIN SMALL LETTER A"
"a𝕓c̳🔤".toCharArray() // "a", "?", "?", "c", "̳", "?", "?"
"a𝕓c̳🔤".toCodePointList() // "a", "𝕓", "c", "̳", "🔤"
"a𝕓c̳🔤".toGraphemeList() // "a", "𝕓", "c̳", "🔤"
"a𝕓🫠🇩🇪👨🏾🦱👩👩👦👦".length // 27 (= number of Java chars)
"a𝕓🫠🇩🇪👨🏾🦱👩👩👦👦".asText(CodePoint).length // 16 (= number of Unicode code points)
"a𝕓🫠🇩🇪👨🏾🦱👩👩👦👦".asText(Grapheme).length // 6 (= visually perceivable units)
"a𝕓🫠🇩🇪👨🏾🦱👩👩👦👦".truncate(7.characters) // "a\uD835 … 👦"
"a𝕓🫠🇩🇪👨🏾🦱👩👩👦👦".truncate(7.codePoints) // "a𝕓 … 👦"
"a𝕓🫠🇩🇪👨🏾🦱👩👩👦👦".truncate(7.graphemes) // "a𝕓 … 👨🏾🦱👩👩👦👦"
"© А-З Ä-ö-ß".transform("de_DE", "de_DE-ASCII") // "(C) A-Z AE-oe-ss"
UTF-16 | Char (Java, JavaScript, Kotlin, ...) | Unicode Code Point | Unicode Grapheme Cluster |
---|---|---|---|
\u0061 | a (LATIN SMALL LETTER A) | a | a |
\uD835 \uDD53 | ? (HIGH SURROGATES D835) ? (LOW SURROGATES DD53) | 𝕓 (MATHEMATICAL DOUBLE-STRUCK SMALL B) | 𝕓 |
\uD83E \uDEE0 | ? (HIGH SURROGATES D83E) ? (LOW SURROGATES DEE0) | 🫠 (MELTING FACE EMOJI) | 🫠 |
\uD83C \uDDE9 | ? (HIGH SURROGATES D83C) ? (LOW SURROGATES DDE9) | [D] (REGIONAL INDICATOR SYMBOL LETTER D) | 🇩🇪 |
\uD83C \uDDEA | ? (HIGH SURROGATES D83C) ? (LOW SURROGATES DDEA) | [E] (REGIONAL INDICATOR SYMBOL LETTER E) | |
\uD83D \uDC68 | ? (HIGH SURROGATES D83D) ? (LOW SURROGATES DC68) | 👨 (MAN) | 👨🏾🦱 |
\uD83C \uDFFE | ? (HIGH SURROGATES D83C) ? (LOW SURROGATES DFFE) | 🏾 (EMOJI MODIFIER FITZPATRICK TYPE-5) | |
\u200D | [ZWJ] (ZERO WIDTH JOINER) | [ZWJ] (ZERO WIDTH JOINER) | |
\uD83E \uDDB1 | ? (HIGH SURROGATES D83E) ? (LOW SURROGATES DDB1) | 🦱 (EMOJI COMPONENT CURLY HAIR) | |
\uD83D \uDC69 | ? (HIGH SURROGATES D83D) ? (LOW SURROGATES DC69) | 👩 (WOMAN) | 👩👩👦👦 |
\u200D | [ZWJ] (ZERO WIDTH JOINER) | [ZWJ] (ZERO WIDTH JOINER) | |
\uD83D \uDC69 | ? (HIGH SURROGATES D83D) ? (LOW SURROGATES DC69) | 👩 (WOMAN) | |
\u200D | [ZWJ] (ZERO WIDTH JOINER) | [ZWJ] (ZERO WIDTH JOINER) | |
\uD83D \uDC66 | ? (HIGH SURROGATES D83D) ? (LOW SURROGATES DC66) | 👦 (BOY) | |
\u200D | [ZWJ] (ZERO WIDTH JOINER) | [ZWJ] (ZERO WIDTH JOINER) | |
\uD83D \uDC66 | ? (HIGH SURROGATES D83D) ? (LOW SURROGATES DC66) | 👦 (BOY) | |
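The relationship between the table's columns can be reproduced with the JVM standard library alone. The following is a minimal sketch, not how Kommons Text is implemented: the function names `codePointsOf`, `utf16UnitsOf`, and `graphemesOf` are hypothetical, and the grapheme part relies on `java.text.BreakIterator`, whose handling of emoji sequences (skin tones, ZWJ families, flags) depends on the JDK version. That dependency is one reason a dedicated library is useful.

```kotlin
import java.text.BreakIterator

/** Splits a string into its Unicode code points (hypothetical helper). */
fun codePointsOf(text: String): List<Int> {
    val result = mutableListOf<Int>()
    var index = 0
    while (index < text.length) {
        val codePoint = text.codePointAt(index)
        result.add(codePoint)
        index += Character.charCount(codePoint) // 1 char for BMP, 2 for a surrogate pair
    }
    return result
}

/** Computes the UTF-16 code units of one code point by hand (hypothetical helper). */
fun utf16UnitsOf(codePoint: Int): List<Int> {
    require(codePoint in 0..0x10FFFF) { "Not a Unicode code point: $codePoint" }
    if (codePoint < 0x10000) return listOf(codePoint)
    val offset = codePoint - 0x10000
    // The upper 10 bits go into the high surrogate, the lower 10 bits into the low surrogate.
    return listOf(0xD800 + (offset shr 10), 0xDC00 + (offset and 0x3FF))
}

/** Splits a string at the character boundaries reported by the JDK (hypothetical helper). */
fun graphemesOf(text: String): List<String> {
    val iterator = BreakIterator.getCharacterInstance()
    iterator.setText(text)
    val result = mutableListOf<String>()
    var start = iterator.first()
    var end = iterator.next()
    while (end != BreakIterator.DONE) {
        result.add(text.substring(start, end))
        start = end
        end = iterator.next()
    }
    return result
}

fun main() {
    val text = "a\uD835\uDD53c\u0333" // "a", double-struck b, "c" + combining double low line
    for (codePoint in codePointsOf(text)) {
        val units = utf16UnitsOf(codePoint).joinToString(" ") { "\\u%04X".format(it) }
        println("U+%04X -> %s".format(codePoint, units))
    }
    println(graphemesOf(text).size) // number of boundaries-delimited units found by the JDK
}
```

For example, `utf16UnitsOf(0x1FAE0)` yields `0xD83E` and `0xDEE0`, matching the 🫠 row of the table.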
- spaced / startSpaced / endSpaced: adds a space before and/or after a string if there isn't one already
- toIdentifier: creates an identifier from any string that resembles it
- LineSeparators: many extension functions to work with usual and exotic Unicode line breaks.
"string".quoted // "string"
"""{ bar: "baz" }""".quoted // "{ bar: \"baz\" }"
"""
line 1
"line 2"
""".quoted // "line1\n\"line2\""
"\u001B[1mbold \u001B[34mand blue\u001B[0m".ansiRemoved
// "bold and blue"
"\u001B[34m↗\u001B(B\u001B[m \u001B]8;;https://example.com\u001B\\link\u001B]8;;\u001B\\".ansiRemoved
// "↗ link"
"string".spaced // " string "
"bar".withPrefix("foo") // "foobar"
"foo bar".withPrefix("foo") // "foo bar"
"foo".withSuffix("bar") // "foobar"
"1👋 xy-z".toIdentifier() // "i__xy-z3"
randomString()
// returns "Ax-212kss0-xTzy5" (16 characters by default)
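The "add only if missing" behavior visible in the withPrefix, withSuffix, and spaced examples can be sketched in a few lines of plain Kotlin. This is an illustration under assumptions, not the library's code: the names `ensurePrefix`, `ensureSuffix`, and `ensureSpaced` are hypothetical, and the library functions may treat edge cases differently.

```kotlin
/** Prepends [prefix] unless the string already starts with it (hypothetical helper). */
fun String.ensurePrefix(prefix: String): String =
    if (startsWith(prefix)) this else prefix + this

/** Appends [suffix] unless the string already ends with it (hypothetical helper). */
fun String.ensureSuffix(suffix: String): String =
    if (endsWith(suffix)) this else this + suffix

/** Surrounds the string with single spaces where they are missing (hypothetical helper). */
fun String.ensureSpaced(): String = ensurePrefix(" ").ensureSuffix(" ")

fun main() {
    println("bar".ensurePrefix("foo"))           // foobar
    println("foo bar".ensurePrefix("foo"))       // foo bar
    println("foo".ensureSuffix("bar"))           // foobar
    println("[" + "string".ensureSpaced() + "]") // [ string ]
}
```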
Capitalize / decapitalize strings using capitalize / decapitalize, or manipulate the case style using toCasedString or any of its specializations.
"fooBar".capitalize() // "FooBar"
"FooBar".decapitalize() // "fooBar"
"FooBar".toCamelCasedString() // "fooBar"
"FooBar".toPascalCasedString() // "FooBar"
"FooBar".toScreamingSnakeCasedString() // "FOO_BAR"
"FooBar".toKebabCasedString() // "foo-bar"
"FooBar".toTitleCasedString() // "Foo Bar"
enum class FooBar { FooBaz }
FooBar::class.simpleCamelCasedName // "fooBar"
FooBar::class.simplePascalCasedName // "FooBar"
FooBar::class.simpleScreamingSnakeCasedName // "FOO_BAR"
FooBar::class.simpleKebabCasedName // "foo-bar"
FooBar::class.simpleTitleCasedName // "Foo Bar"
FooBar.FooBaz.camelCasedName // "fooBaz"
FooBar.FooBaz.pascalCasedName // "FooBaz"
FooBar.FooBaz.screamingSnakeCasedName // "FOO_BAZ"
FooBar.FooBaz.kebabCasedName // "foo-baz"
FooBar.FooBaz.titleCasedName // "Foo Baz"
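Case-style conversions like the ones above usually come down to two steps: split the name into words, then join the words with a style-specific rule. The following sketch shows that idea in plain Kotlin. The function names are hypothetical, and the word splitting is deliberately simple: a new word starts at an uppercase letter that follows a non-uppercase character, so a run such as "HTTPServer" stays one word. The library may split differently.

```kotlin
/** Splits a camelCase or PascalCase name into lowercase words (hypothetical helper). */
fun splitWords(name: String): List<String> {
    val words = mutableListOf<StringBuilder>()
    for ((index, char) in name.withIndex()) {
        val startsNewWord = index == 0 || (char.isUpperCase() && !name[index - 1].isUpperCase())
        if (startsNewWord) words.add(StringBuilder())
        words.last().append(char)
    }
    return words.map { it.toString().lowercase() }
}

fun toKebabCase(name: String): String = splitWords(name).joinToString("-")

fun toScreamingSnakeCase(name: String): String =
    splitWords(name).joinToString("_") { it.uppercase() }

fun toTitleCase(name: String): String =
    splitWords(name).joinToString(" ") { word -> word.replaceFirstChar { c -> c.uppercaseChar() } }

fun toCamelCase(name: String): String =
    splitWords(name)
        .mapIndexed { index, word ->
            if (index == 0) word else word.replaceFirstChar { c -> c.uppercaseChar() }
        }
        .joinToString("")

fun main() {
    println(toCamelCase("FooBar"))          // fooBar
    println(toScreamingSnakeCase("FooBar")) // FOO_BAR
    println(toKebabCase("FooBar"))          // foo-bar
    println(toTitleCase("FooBar"))          // Foo Bar
}
```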
Easily check edge cases with a fluent interface, just as requireNotNull does:
requireNotEmpty("abc") // passes and returns "abc"
requireNotBlank(" ") // throws IllegalArgumentException
checkNotEmpty("abc") // passes and returns "abc"
checkNotBlank(" ") // throws IllegalStateException
"abc".takeIfNotEmpty() // returns "abc"
" ".takeIfNotBlank() // returns null
"abc".takeUnlessEmpty() // returns "abc"
" ".takeUnlessBlank() // returns null
Regex can be authored as follows:
Regex("foo") + Regex("bar") // Regex("foobar")
Regex("foo") + "bar" // Regex("foobar")
Regex("foo") or Regex("bar") // Regex("foo|bar")
Regex("foo") or "bar" // Regex("foo|bar")
Regex.fromLiteralAlternates( // Regex("\\[foo\\]|bar\\?")
"[foo]", "bar?"
)
Regex("foo").optional() // Regex("(?:foo)?")
Regex("foo").repeatAny() // Regex("(?:foo)*")
Regex("foo").repeatAtLeastOnce() // Regex("(?:foo)+")
Regex("foo").repeat(2, 5) // Regex("(?:foo){2,5}")
Regex("foo").group() // Regex("(?:foo)")
Regex("foo").group("name") // Regex("(?<name>foo)")
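Combinators like these can be approximated by concatenating pattern strings. The sketch below uses only the Kotlin standard library; the names `then`, `either`, `optionalGroup`, and `literalAlternates` are hypothetical and chosen to avoid suggesting they are the library's API. Two simplifications are assumed: regex options of the operands are dropped, and literals are escaped with `Regex.escape`, which on the JVM quotes with `\Q...\E` rather than backslash-escaping each character, so the resulting pattern text differs from the one shown above even though it matches the same input.

```kotlin
/** Concatenates two patterns (hypothetical helper; regex options are not carried over). */
infix fun Regex.then(other: Regex): Regex = Regex(pattern + other.pattern)

/** Builds an alternation of two patterns (hypothetical helper). */
infix fun Regex.either(other: Regex): Regex = Regex("${pattern}|${other.pattern}")

/** Wraps the pattern in an optional non-capturing group (hypothetical helper). */
fun Regex.optionalGroup(): Regex = Regex("(?:$pattern)?")

/** Builds an alternation that matches each argument literally (hypothetical helper). */
fun literalAlternates(vararg literals: String): Regex =
    Regex(literals.joinToString("|") { Regex.escape(it) })

fun main() {
    println((Regex("foo") then Regex("bar")).pattern)         // foobar
    println((Regex("foo") either Regex("bar")).matches("bar")) // true
    println(Regex("foo").optionalGroup().matches(""))          // true
    println(literalAlternates("[foo]", "bar?").matches("bar?")) // true
    println(literalAlternates("[foo]", "bar?").matches("ba"))   // false
}
```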
Find matches more easily:
// get group by name
Regex("(?<name>ba.)")
.findAll("foo bar baz")
.mapNotNull { it.groups["name"]?.value } // "bar", "baz"
// get group value by name
Regex("(?<name>ba.)")
.findAll("foo bar baz")
.map { it.groupValue("name") } // "bar", "baz"
// find all values
Regex("(?<name>ba.)")
.findAllValues("foo bar baz") // "bar", "baz"
// match URLs / URIs
Regex.UrlRegex.findAll(/* ... */)
Regex.UriRegex.findAll(/* ... */)
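Helpers of this kind are thin wrappers around `Regex.findAll`. As a rough sketch of the idea, assuming nothing about the library's internals, the following hypothetical extensions `allValues` and `allGroupValues` map each match to its text or to the text of one group, addressed by index here to keep the example independent of named-group support.

```kotlin
/** Returns the matched text of every match (hypothetical helper). */
fun Regex.allValues(input: CharSequence): Sequence<String> =
    findAll(input).map { it.value }

/** Returns the text of the group with the given index for every match, skipping unmatched groups. */
fun Regex.allGroupValues(input: CharSequence, groupIndex: Int): Sequence<String> =
    findAll(input).mapNotNull { it.groups[groupIndex]?.value }

fun main() {
    println(Regex("ba.").allValues("foo bar baz").toList())               // [bar, baz]
    println(Regex("(ba)(.)").allGroupValues("foo bar baz", 2).toList())   // [r, z]
}
```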
Match multiline strings with simple glob patterns:
// matching within lines with wildcard
"foo.bar()".matchesGlob("foo.*") // ✅
// matching across lines with multiline wildcard
"""
foo
.bar()
.baz()
""".matchesGlob(
"""
foo
.**()
""".trimIndent() // ✅
)
"""
foo
.bar()
.baz()
""".matchesGlob(
"""
foo
.*()
""".trimIndent() // ❌ (* doesn't match across lines)
)
Or, you can use matchesCurly if you prefer SLF4J / Logback style wildcards {} and {{}}.
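The glob semantics shown above (`*` stays within a line, `**` may span lines) can be sketched by translating a glob into a regular expression. This is an illustration under assumptions rather than the library's implementation: `globToRegex` is a hypothetical name, every other character is matched literally, line separators are not normalized (a `\n` in the pattern will not match `\r\n` in the input), and the curly-brace variant is not covered.

```kotlin
/** Translates a simple glob into a Regex (hypothetical helper). */
fun globToRegex(glob: String): Regex {
    val pattern = StringBuilder()
    var index = 0
    while (index < glob.length) {
        when {
            // "**" matches any characters, including line breaks
            glob.startsWith("**", index) -> {
                pattern.append("[\\s\\S]*")
                index += 2
            }
            // "*" matches any characters except line breaks
            glob[index] == '*' -> {
                pattern.append("[^\\r\\n]*")
                index += 1
            }
            // everything else is matched literally
            else -> {
                pattern.append(Regex.escape(glob[index].toString()))
                index += 1
            }
        }
    }
    return Regex(pattern.toString())
}

fun main() {
    val text = "foo\n  .bar()\n  .baz()"
    println(globToRegex("foo.*").matches("foo.bar()")) // true
    println(globToRegex("foo\n  .**()").matches(text)) // true
    println(globToRegex("foo\n  .*()").matches(text))  // false
}
```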
Want to contribute? Awesome! The most basic way to show your support is to star the project or to raise issues. You can also support this project by making a PayPal donation to ensure this journey continues indefinitely!
Thanks again for your support, it's much appreciated! 🙏
MIT. See LICENSE for more details.