Kommons Text

About

Kommons Text is a Kotlin Multiplatform Library that offers:

the Unicode-aware string abstraction Text
a couple of string operations
regex operations such as the possibility to use glob patterns

Installation / setup

This library is hosted on GitHub with releases provided on Maven Central.

Gradle

implementation("com.bkahlert.kommons:kommons-text:2.8.0")

Maven

<dependency>
    <groupId>com.bkahlert.kommons</groupId>
    <artifactId>kommons-text</artifactId>
    <version>2.8.0</version>
</dependency>

Features

Unicode Operations

Handling user input requires functions that handle Unicode correctly, unless you're fine with results like the following:

"👨🏾‍🦱".substring(0, 3) // "👨?", skin tone gone, curly hair gone
"👩‍👩‍👦‍👦".substring(1, 7) // "?‍👩‍?", wife gone, kids gone

Decode any string to a sequence / list of code points using String.asCodePointSequence / String.toCodePointList.
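To illustrate what decoding to code points involves, here is a minimal sketch that uses only the Kotlin standard library. The helper name codePointsOf is hypothetical and not part of Kommons Text; the library's own decoding may differ in detail.

```kotlin
// Hypothetical helper (not part of Kommons Text): decodes UTF-16 chars into code point values.
fun codePointsOf(text: String): List<Int> {
    val result = mutableListOf<Int>()
    var i = 0
    while (i < text.length) {
        val high = text[i]
        val low = text.getOrNull(i + 1)
        if (high.isHighSurrogate() && low != null && low.isLowSurrogate()) {
            // A surrogate pair encodes one supplementary code point.
            result += 0x10000 + ((high.code - 0xD800) shl 10) + (low.code - 0xDC00)
            i += 2
        } else {
            // A BMP char (or an unpaired surrogate) maps to its own value.
            result += high.code
            i += 1
        }
    }
    return result
}

fun main() {
    val codePoints = codePointsOf("a\uD835\uDD53") // "a𝕓"
    println(codePoints.map { "U+" + it.toString(16).uppercase() }) // [U+61, U+1D553]
}
```

The string has three chars but only two code points, which is why char-based indexing can cut a character in half.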

Decode any string to a sequence / list of graphemes using String.asGraphemeSequence / String.toGraphemeList.

Use truncate/truncateStart/truncateEnd to reduce the number of characters, code points or graphemes.
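As a rough idea of what middle truncation by chars looks like, here is a sketch built on the standard library only. The function truncateMiddle, its parameters and the " … " marker are assumptions made for illustration, not the library's implementation.

```kotlin
// Hypothetical helper (not the library's implementation): char-based middle truncation.
fun truncateMiddle(text: String, maxLength: Int, marker: String = " … "): String {
    require(maxLength >= marker.length) { "maxLength must leave room for the marker" }
    if (text.length <= maxLength) return text
    val keep = maxLength - marker.length
    val head = (keep + 1) / 2 // the start gets the extra char when keep is odd
    val tail = keep - head
    return text.take(head) + marker + text.takeLast(tail)
}

fun main() {
    println(truncateMiddle("abcdefghij", 7)) // ab … ij
    println(truncateMiddle("short", 7))      // short
}
```

Because this sketch counts chars, it can split a surrogate pair or an emoji sequence, which is the problem that truncating by code points or graphemes avoids.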

Transliterations and transforms can be done using String.transform.

Examples

"a".asCodePoint().name     // "LATIN SMALL LETTER A"
"a𝕓c̳🔤".toCharArray()      // "a", "?", "?", "c", "̳", "?", "?"
"a𝕓c̳🔤".toCodePointList()  // "a", "𝕓", "c", "̳", "🔤"
"a𝕓c̳🔤".toGraphemeList()   // "a", "𝕓", "c̳", "🔤"

"a𝕓🫠🇩🇪👨🏾‍🦱👩‍👩‍👦‍👦".length                    // 27 (= number of Java chars)
"a𝕓🫠🇩🇪👨🏾‍🦱👩‍👩‍👦‍👦".asText(CodePoint).length  // 16 (= number of Unicode code points)
"a𝕓🫠🇩🇪👨🏾‍🦱👩‍👩‍👦‍👦".asText(Grapheme).length   //  6 (= visually perceivable units)

"a𝕓🫠🇩🇪👨🏾‍🦱👩‍👩‍👦‍👦".truncate(7.characters)  // "a\uD835 … 👦"
"a𝕓🫠🇩🇪👨🏾‍🦱👩‍👩‍👦‍👦".truncate(7.codePoints)  // "a𝕓 … ‍👦"
"a𝕓🫠🇩🇪👨🏾‍🦱👩‍👩‍👦‍👦".truncate(7.graphemes)   // "a𝕓 … 👨🏾‍🦱👩‍👩‍👦‍👦"

"© А-З Ä-ö-ß".transform("de_DE", "de_DE-ASCII")  // "(C) A-Z AE-oe-ss"

UTF-16 Char vs Code Point vs Grapheme Cluster

UTF-16	Char (Java, JavaScript, Kotlin, ...)	Unicode Code Point	Unicode Grapheme Cluster
\u0061	a (LATIN SMALL LETTER A)	a	a
\uD835 \uDD53	𝕓 (MATHEMATICAL DOUBLE-STRUCK SMALL B)	𝕓	𝕓
\uD83E \uDEE0	? (HIGH SURROGATES D83E) ? (LOW SURROGATES DEE0)	🫠 (MELTING FACE EMOJI)	🫠
\uD83C \uDDE9	? (HIGH SURROGATES D83C) ? (LOW SURROGATES DDE9)	[D] (REGIONAL INDICATOR SYMBOL LETTER D)	🇩🇪
\uD83C \uDDEA	? (HIGH SURROGATES D83C) ? (LOW SURROGATES DDEA)	[E] (REGIONAL INDICATOR SYMBOL LETTER E)
\uD83D \uDC68	? (HIGH SURROGATES D83D) ? (LOW SURROGATES DC68)	👨 (MAN)	👨🏾‍🦱
\uD83C \uDFFE	? (HIGH SURROGATES D83C) ? (LOW SURROGATES DFFE)	🏾 (EMOJI MODIFIER FITZPATRICK TYPE-5)
\u200D	[ZWJ] (ZERO WIDTH JOINER)	[ZWJ] (ZERO WIDTH JOINER)
\uD83E \uDDB1	? (HIGH SURROGATES D83E) ? (LOW SURROGATES DDB1)	🦱 (EMOJI COMPONENT CURLY HAIR)
\uD83D \uDC69	? (HIGH SURROGATES D83D) ? (LOW SURROGATES DC69)	👩 (WOMAN)	👩‍👩‍👦‍👦
\u200D	[ZWJ] (ZERO WIDTH JOINER)	[ZWJ] (ZERO WIDTH JOINER)
\uD83D \uDC69	? (HIGH SURROGATES D83D) ? (LOW SURROGATES DC69)	👩 (WOMAN)
\u200D	[ZWJ] (ZERO WIDTH JOINER)	[ZWJ] (ZERO WIDTH JOINER)
\uD83D \uDC66	? (HIGH SURROGATES D83D) ? (LOW SURROGATES DC66)	👦 (BOY)
\u200D	[ZWJ] (ZERO WIDTH JOINER)	[ZWJ] (ZERO WIDTH JOINER)
\uD83D \uDC66	? (HIGH SURROGATES D83D) ? (LOW SURROGATES DC66)	👦 (BOY)
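The first column of the table can be reproduced for any string with a few lines of standard-library Kotlin. The helper utf16Escapes is a hypothetical name used only for this sketch.

```kotlin
// Hypothetical helper: renders each UTF-16 char of a string as a \uXXXX escape.
fun utf16Escapes(text: String): String =
    text.map { "\\u" + it.code.toString(16).uppercase().padStart(4, '0') }
        .joinToString(" ")

fun main() {
    println(utf16Escapes("a"))                        // \u0061
    println(utf16Escapes("\uD83C\uDDE9\uD83C\uDDEA")) // \uD83C \uDDE9 \uD83C \uDDEA
}
```

The second call prints four chars for the two regional indicator code points that together render as a single flag.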

String Operations

spaced/startSpaced/endSpaced: adds a space before and/or after a string if there isn't already one
toIdentifier: creates an identifier that resembles the given string
LineSeparators: many extension functions to work with common and exotic Unicode line breaks
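The spacing behaviour described above could be approximated as follows. The names ending in Sketch are hypothetical re-implementations for illustration; the library's versions may treat empty strings or other whitespace differently.

```kotlin
// Hypothetical re-implementations; only a plain space (' ') is considered here.
fun String.startSpacedSketch(): String = if (startsWith(' ')) this else " $this"
fun String.endSpacedSketch(): String = if (endsWith(' ')) this else "$this "
fun String.spacedSketch(): String = startSpacedSketch().endSpacedSketch()

fun main() {
    println("[" + "string".spacedSketch() + "]")  // [ string ]
    println("[" + " string".spacedSketch() + "]") // [ string ]
}
```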

Examples

"string".quoted              // "string"
"""{ bar: "baz" }""".quoted  // "{ bar: \"baz\" }"

"""
line 1
"line 2"
""".quoted                   // "line 1\n\"line 2\""

"\u001B[1mbold \u001B[34mand blue\u001B[0m".ansiRemoved
// "bold and blue"

"\u001B[34m↗\u001B(B\u001B[m \u001B]8;;https://example.com\u001B\\link\u001B]8;;\u001B\\".ansiRemoved
// "↗ link"

"string".spaced              // " string "

"bar".withPrefix("foo")      // "foobar"
"foo bar".withPrefix("foo")  // "foo bar"
"foo".withSuffix("bar")      // "foobar"

"1👋 xy-z".toIdentifier()     // "i__xy-z3"

randomString()
// returns "Ax-212kss0-xTzy5" (16 characters by default)

Capitalize / decapitalize strings using capitalize/decapitalize or manipulate the case style using toCasesString or any of its specializations.

Examples

"fooBar".capitalize()    // "FooBar"
"FooBar".decapitalize()  // "fooBar"

"FooBar".toCamelCasedString()           // "fooBar"
"FooBar".toPascalCasedString()          // "FooBar"
"FooBar".toScreamingSnakeCasedString()  // "FOO_BAR"
"FooBar".toKebabCasedString()           // "foo-bar"
"FooBar".toTitleCasedString()           // "Foo Bar"

enum class FooBar { FooBaz }

FooBar::class.simpleCamelCasedName           // "fooBar"
FooBar::class.simplePascalCasedName          // "FooBar"
FooBar::class.simpleScreamingSnakeCasedName  // "FOO_BAR"
FooBar::class.simpleKebabCasedName           // "foo-bar"
FooBar::class.simpleTitleCasedName           // "Foo Bar"

FooBar.FooBaz.camelCasedName           // "fooBaz"
FooBar.FooBaz.pascalCasedName          // "FooBaz"
FooBar.FooBaz.screamingSnakeCasedName  // "FOO_BAZ"
FooBar.FooBaz.kebabCasedName           // "foo-baz"
FooBar.FooBaz.titleCasedName           // "Foo Baz"
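Case-style conversion boils down to splitting a name into words and joining them again. The following sketch shows the idea with hypothetical helpers (wordsOf, toKebabCase, toScreamingSnakeCase, toTitleCase); it is not the library's algorithm and treats every uppercase letter as a word start, so acronyms such as "HTTPServer" are split letter by letter.

```kotlin
// Hypothetical sketch: splits a camelCase or PascalCase name into lowercase words.
fun wordsOf(name: String): List<String> {
    val words = mutableListOf<StringBuilder>()
    for (ch in name) {
        // Start a new word at the beginning and at every uppercase letter.
        if (words.isEmpty() || ch.isUpperCase()) words += StringBuilder()
        words.last().append(ch.lowercaseChar())
    }
    return words.map { it.toString() }
}

fun toKebabCase(name: String): String = wordsOf(name).joinToString("-")

fun toScreamingSnakeCase(name: String): String =
    wordsOf(name).joinToString("_") { it.uppercase() }

fun toTitleCase(name: String): String =
    wordsOf(name).joinToString(" ") { word -> word.replaceFirstChar { it.uppercaseChar() } }

fun main() {
    println(toKebabCase("FooBar"))          // foo-bar
    println(toScreamingSnakeCase("FooBar")) // FOO_BAR
    println(toTitleCase("FooBar"))          // Foo Bar
}
```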

Easily check edge cases with a fluent interface, just as requireNotNull does.
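Helpers of this kind can be built on the standard library's require and takeIf. The sketch below uses hypothetical names (requireNotBlankSketch, takeIfNotBlankSketch) and a made-up error message; it only illustrates the pattern and is not the library's code.

```kotlin
// Hypothetical sketches, modeled on the standard library's requireNotNull.
fun requireNotBlankSketch(value: String): String {
    // require throws IllegalArgumentException when the condition is false.
    require(value.isNotBlank()) { "Required value was blank." }
    return value
}

fun String.takeIfNotBlankSketch(): String? = takeIf { it.isNotBlank() }

fun main() {
    println(requireNotBlankSketch("abc"))  // abc
    println("   ".takeIfNotBlankSketch())  // null
    val failure = runCatching { requireNotBlankSketch("   ") }.exceptionOrNull()
    println(failure is IllegalArgumentException) // true
}
```

Returning the checked value is what makes the call usable inline, for example as an argument to another function.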

Examples

requireNotEmpty("abc")  // passes and returns "abc"
requireNotBlank("   ")  // throws IllegalArgumentException
checkNotEmpty("abc")    // passes and returns "abc"
checkNotBlank("   ")    // throws IllegalStateException
"abc".takeIfNotEmpty()  // returns "abc"
"   ".takeIfNotBlank()  // returns null
"abc".takeUnlessEmpty() // returns "abc"
"   ".takeUnlessBlank() // returns null

Regex Operations

Regex can be authored as follows:

Regex("foo") + Regex("bar")      // Regex("foobar") 
Regex("foo") + "bar"             // Regex("foobar")

Regex("foo") or Regex("bar")     // Regex("foo|bar") 
Regex("foo") or "bar"            // Regex("foo|bar")

Regex.fromLiteralAlternates(     // Regex("\\[foo\\]|bar\\?")
    "[foo]", "bar?"
)

Regex("foo").optional()          // Regex("(?:foo)?") 
Regex("foo").repeatAny()         // Regex("(?:foo)*") 
Regex("foo").repeatAtLeastOnce() // Regex("(?:foo)+") 
Regex("foo").repeat(2, 5)        // Regex("(?:foo){2,5}") 

Regex("foo").group()             // Regex("(?:foo)") 
Regex("foo").group("name")       // Regex("(?<name>foo)")
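Operators like these can be expressed as small extensions over Regex.pattern. The following is a sketch under that assumption: the definitions are hypothetical re-implementations, they ignore regex options, and the library's own operators may behave differently.

```kotlin
// Hypothetical re-implementations for illustration only; regex options are not carried over.
operator fun Regex.plus(other: Regex): Regex = Regex(pattern + other.pattern)
infix fun Regex.or(other: Regex): Regex = Regex("$pattern|${other.pattern}")
fun Regex.optionalSketch(): Regex = Regex("(?:$pattern)?")
fun Regex.repeatSketch(min: Int, max: Int): Regex = Regex("(?:$pattern){$min,$max}")

fun main() {
    println((Regex("foo") + Regex("bar")).pattern)                // foobar
    println((Regex("foo") or Regex("bar")).pattern)               // foo|bar
    println(Regex("foo").optionalSketch().pattern)                // (?:foo)?
    println(Regex("foo").repeatSketch(2, 5).matches("foofoofoo")) // true
}
```

Wrapping the pattern in a non-capturing group before adding a quantifier keeps the quantifier applied to the whole expression rather than to its last character.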

Find matches more easily:

// get group by name
Regex("(?<name>ba.)")
    .findAll("foo bar baz")
    .mapNotNull { it.groups["name"]?.value } // "bar", "baz"

// get group value by name
Regex("(?<name>ba.)")
    .findAll("foo bar baz")
    .map { it.groupValue("name") }           // "bar", "baz"

// find all values
Regex("(?<name>ba.)")
    .findAllValues("foo bar baz")            // "bar", "baz"

// match URLs / URIs
Regex.UrlRegex.findAll(/* ... */)
Regex.UriRegex.findAll(/* ... */)

Match multiline strings with simple glob patterns:

// matching within lines with wildcard
"foo.bar()".matchesGlob("foo.*")  // ✅

// matching across lines with multiline wildcard
"""
foo
  .bar()
  .baz()
""".matchesGlob(
    """
    foo
      .**()
    """.trimIndent()              // ✅
)

"""
foo
  .bar()
  .baz()
""".matchesGlob(
    """
    foo
      .*()
    """.trimIndent()              // ❌ (* doesn't match across lines)
)
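One way to picture the difference between * and ** is to translate a glob into a regular expression. The helper globToRegex below is a hypothetical sketch, not how matchesGlob is implemented; it assumes * matches anything except line breaks and ** matches anything including line breaks.

```kotlin
// Hypothetical sketch of a glob-to-regex translation.
fun globToRegex(glob: String): Regex {
    val pattern = glob
        .split("**")
        .joinToString("[\\s\\S]*") { part ->          // ** may span line breaks
            part.split("*")
                .joinToString("[^\\r\\n]*") { Regex.escape(it) } // * stays within a line
        }
    return Regex(pattern)
}

fun main() {
    val text = "foo\n  .bar()\n  .baz()"
    println(globToRegex("foo.*").matches("foo.bar()")) // true
    println(globToRegex("foo\n  .**()").matches(text)) // true
    println(globToRegex("foo\n  .*()").matches(text))  // false
}
```

All literal parts are escaped with Regex.escape, so characters such as "." or "(" in the glob are matched verbatim.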

Or, you can use matchesCurly if you prefer SLF4J / Logback style wildcards {} and {{}}.

Contributing

Want to contribute? Awesome! The most basic way to show your support is to star the project or to raise issues. You can also support this project by making a PayPal donation to ensure this journey continues indefinitely!

Thanks again for your support, it's much appreciated! 🙏

License

MIT. See LICENSE for more details.
