Commit b5a2f5c

Add utf8-converter.
1 parent ace76c1 commit b5a2f5c

5 files changed: +1104 -0 lines changed

README.md

Lines changed: 1 addition & 0 deletions
@@ -4,6 +4,7 @@ A collection of useful algorithms written in Rust. Currently contains:
 
 - [`geo_filters`](crates/geo_filters): probabilistic data structures that solve the [Distinct Count Problem](https://en.wikipedia.org/wiki/Count-distinct_problem) using geometric filters.
 - [`bpe`](crates/bpe): fast, correct, and novel algorithms for the [Byte Pair Encoding Algorithm](https://en.wikipedia.org/wiki/Large_language_model#BPE) which are particularly useful for chunking of documents.
+- [`utf8-converter`](crates/utf8-converter): converts string positions between bytes, chars, UTF-16 code units, and line numbers. Useful when sending string indices across language boundaries.
 
 ## Background
 

crates/utf8-converter/Cargo.toml

Lines changed: 10 additions & 0 deletions
@@ -0,0 +1,10 @@
+[package]
+authors = ["The blackbird team <[email protected]>"]
+edition = "2021"
+name = "utf8-converter"
+version = "0.1.0"
+
+[dependencies]
+itertools = "0.13"
+rand = "0.8"
+rand_chacha = "0.3"

crates/utf8-converter/README.md

Lines changed: 12 additions & 0 deletions
@@ -0,0 +1,12 @@
+# UTF-8 Converter
+
+This crate converts string positions between Rust style (UTF-8 byte offsets) and styles used by other programming languages, as well as line numbers.
+
+## Usage
+
+Add this to your `Cargo.toml`:
+
+```toml
+[dependencies]
+utf8-converter = "0.1"
+```
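
The README describes the conversion but the visible part of this commit does not show the crate's API. The sketch below is a minimal illustration, using only the Rust standard library, of what converting a UTF-8 byte offset into the other position styles involves. `Position` and `convert_byte_offset` are hypothetical names chosen for this example and are not taken from `utf8-converter`.

```rust
/// Positions equivalent to one UTF-8 byte offset, expressed in other units.
/// Hypothetical type for illustration; not part of the utf8-converter crate.
#[derive(Debug, PartialEq, Eq)]
pub struct Position {
    /// Number of Unicode scalar values (`char`s) before the offset.
    pub chars: usize,
    /// Number of UTF-16 code units before the offset.
    pub utf16: usize,
    /// Zero-based number of the line that contains the offset.
    pub line: usize,
}

/// Converts a UTF-8 byte offset into char, UTF-16, and line positions.
/// Returns `None` if the offset is past the end of `text` or falls
/// inside a multi-byte character.
pub fn convert_byte_offset(text: &str, byte_offset: usize) -> Option<Position> {
    if !text.is_char_boundary(byte_offset) {
        return None;
    }
    let prefix = &text[..byte_offset];
    Some(Position {
        chars: prefix.chars().count(),
        // Characters outside the Basic Multilingual Plane take two UTF-16 units.
        utf16: prefix.chars().map(char::len_utf16).sum(),
        // Each newline before the offset starts a new line.
        line: prefix.bytes().filter(|&b| b == b'\n').count(),
    })
}
```

For the string `"a😀\nß"`, byte offset 6 (just after the newline) corresponds to char index 3, UTF-16 index 4, and line 1, because the emoji occupies four UTF-8 bytes but two UTF-16 code units. This sketch rescans the prefix on every call; a converter meant for repeated lookups would more plausibly precompute an index.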
