Tracking Issue for ByteStr/ByteString #134915

Open
1 of 3 tasks
joshtriplett opened this issue Dec 30, 2024 · 20 comments
Labels
C-tracking-issue: Category: An issue tracking the progress of sth. like the implementation of an RFC
T-libs-api: Relevant to the library API team, which will review and decide on the PR/issue.

Comments

@joshtriplett
Member

joshtriplett commented Dec 30, 2024

Feature gate: #![feature(bstr)]

This is a tracking issue for the ByteStr/ByteString types, which represent human-readable strings that are usually, but not always, UTF-8. Unlike &str/String, these types permit non-UTF-8 contents, making them suitable for user input, non-native filenames (as Path only supports native filenames), and other applications that need to round-trip whatever data the user provides.

This was approved in ACP rust-lang/libs-team#502.

Public API

// In core::bstr
#[repr(transparent)]
pub struct ByteStr(pub [u8]);

impl ByteStr {
    pub fn new<B: ?Sized + AsRef<[u8]>>(bytes: &B) -> &Self { ... }
}

impl Debug for ByteStr { ... }
impl Display for ByteStr { ... }
impl Deref for ByteStr { type Target = [u8]; ... }
impl DerefMut for ByteStr { ... }
// Other trait impls from bstr, including From impls

// In alloc::bstr
#[repr(transparent)]
pub struct ByteString(pub Vec<u8>);

impl Debug for ByteString { ... }
impl Display for ByteString { ... }
impl Deref for ByteString { type Target = Vec<u8>; ... }
impl DerefMut for ByteString { ... }
// Other trait impls from bstr, including From impls
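As a rough illustration of the shape of this API on stable Rust, here is a minimal sketch of a transparent wrapper over `[u8]`. `MyByteStr` is a hypothetical stand-in, not the real `core::bstr::ByteStr`; only the `new` signature and the `Deref` target are taken from the listing above, and the pointer cast is an assumption about how such a constructor could be written.

```rust
use std::ops::Deref;

/// Hypothetical stand-in for `ByteStr`; NOT the real `core::bstr` type.
#[repr(transparent)]
pub struct MyByteStr(pub [u8]);

impl MyByteStr {
    /// Same signature shape as `ByteStr::new` in the listing above.
    pub fn new<B: ?Sized + AsRef<[u8]>>(bytes: &B) -> &Self {
        let slice: &[u8] = bytes.as_ref();
        // SAFETY: `MyByteStr` is `#[repr(transparent)]` over `[u8]`, so a
        // `&[u8]` and a `&MyByteStr` have the same layout.
        unsafe { &*(slice as *const [u8] as *const MyByteStr) }
    }
}

impl Deref for MyByteStr {
    type Target = [u8];

    fn deref(&self) -> &[u8] {
        &self.0
    }
}

fn main() {
    // Anything that is `AsRef<[u8]>` can be viewed as a byte string.
    let from_str = MyByteStr::new("hello");
    let from_bytes = MyByteStr::new(b"hello\xFF"); // not valid UTF-8
    let owner = vec![0x66u8, 0x6f, 0x6f];
    let from_vec = MyByteStr::new(&owner);

    // Slice methods are reachable through `Deref`.
    assert!(from_str.starts_with(b"he"));
    assert_eq!(from_bytes.len(), 6);
    assert_eq!(from_vec.first(), Some(&b'f'));
}
```

Because the wrapper only borrows, constructing it costs nothing beyond the reference itself.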

Steps / History

Unresolved Questions

  • Should we call this BStr/BString, or ByteStr/ByteString? The former will be more familiar to users of the bstr crate in the ecosystem. The latter is more explicit, and avoids potential naming conflicts (making it easier to, for instance, add it to the prelude).
  • Should the Display impl use the Unicode replacement character, or do escaping like the Debug impl?
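To make the second question concrete, the two candidate behaviours can be approximated on stable Rust with plain functions over `&[u8]`. The function names are hypothetical, and the escaping format (`\xNN` for invalid bytes only) is an assumption for illustration rather than the exact output of the real `Debug` impl.

```rust
/// Option 1: replace each invalid UTF-8 sequence with U+FFFD.
fn display_lossy(bytes: &[u8]) -> String {
    String::from_utf8_lossy(bytes).into_owned()
}

/// Option 2 (sketch): keep valid UTF-8 as-is and escape invalid bytes.
fn display_escaped(bytes: &[u8]) -> String {
    let mut out = String::new();
    for chunk in bytes.utf8_chunks() {
        out.push_str(chunk.valid());
        for b in chunk.invalid() {
            out.push_str(&format!("\\x{b:02x}"));
        }
    }
    out
}

fn main() {
    // "café" in UTF-8 followed by one stray byte.
    let input = b"caf\xC3\xA9\xFF";
    assert_eq!(display_lossy(input), "café\u{FFFD}");
    assert_eq!(display_escaped(input), "café\\xff");
}
```

The lossy form is friendlier to read but cannot be mapped back to the original bytes; the escaped form keeps the information at the cost of noisier output.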

Footnotes

  1. https://std-dev-guide.rust-lang.org/feature-lifecycle/stabilization.html

@joshtriplett joshtriplett added the C-tracking-issue and T-libs-api labels Dec 30, 2024
@joshtriplett joshtriplett changed the title from "Tracking Issue for BStr/BString" to "Tracking Issue for ByteStr/ByteString" Dec 30, 2024
@joshtriplett
Member Author

In the course of implementing this, I'm addressing BurntSushi/bstr#190: both the ByteStr and ByteString types will implement Index and IndexMut.
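A minimal sketch of what "slicing yields the wrapper type" can look like, assuming a hypothetical `MyByteStr` newtype rather than the real implementation: range indexing returns `MyByteStr`, while indexing by a single `usize` still returns a `u8`.

```rust
use std::ops::{Index, Range};

/// Hypothetical stand-in for `ByteStr`; NOT the real `core::bstr` type.
#[repr(transparent)]
pub struct MyByteStr(pub [u8]);

impl MyByteStr {
    pub fn new(bytes: &[u8]) -> &Self {
        // SAFETY: `#[repr(transparent)]` guarantees identical layout.
        unsafe { &*(bytes as *const [u8] as *const MyByteStr) }
    }
}

// Slicing by a range produces another `MyByteStr`, not a bare `[u8]`.
impl Index<Range<usize>> for MyByteStr {
    type Output = MyByteStr;

    fn index(&self, range: Range<usize>) -> &MyByteStr {
        MyByteStr::new(&self.0[range])
    }
}

// Indexing by a single position produces one byte.
impl Index<usize> for MyByteStr {
    type Output = u8;

    fn index(&self, i: usize) -> &u8 {
        &self.0[i]
    }
}

fn main() {
    let s = MyByteStr::new(b"hello world");
    let word: &MyByteStr = &s[0..5];
    assert_eq!(&word.0[..], &b"hello"[..]);
    assert_eq!(s[4], b'o');
}
```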

joshtriplett added a commit to joshtriplett/rust that referenced this issue Jan 3, 2025
Approved ACP: rust-lang/libs-team#502
Tracking issue: rust-lang#134915

These types represent human-readable strings that are conventionally,
but not always, UTF-8. The `Debug` impl prints non-UTF-8 bytes using
escape sequences, and the `Display` impl uses the Unicode replacement
character.

This is a minimal implementation of these types and associated trait
impls. It does not add any helper methods to other types such as `[u8]`
or `Vec<u8>`.

I've omitted a few implementations of `AsRef`, `AsMut`, and `Borrow`,
when those would be the second implementation for a type (counting the
`T` impl), to avoid potential inference failures. We can attempt to add
more impls later in standalone commits, and run them through crater.

In addition to the `bstr` feature, I've added a `bstr_internals` feature
for APIs provided by `core` for use by `alloc` but not currently
intended for stabilization.

This API and its implementation are based *heavily* on the `bstr` crate
by Andrew Gallant (@BurntSushi).
joshtriplett added a commit to joshtriplett/rust that referenced this issue Jan 4, 2025
joshtriplett added a commit to joshtriplett/rust that referenced this issue Jan 4, 2025
Approved ACP: rust-lang/libs-team#502
Tracking issue: rust-lang#134915

These types represent human-readable strings that are conventionally,
but not always, UTF-8. The `Debug` impl prints non-UTF-8 bytes using
escape sequences, and the `Display` impl uses the Unicode replacement
character.

This is a minimal implementation of these types and associated trait
impls. It does not add any helper methods to other types such as `[u8]`
or `Vec<u8>`.

I've omitted a few implementations of `AsRef`, `AsMut`, `Borrow`, and
`From`, when those would be the second implementation for a type
(counting the `T` impl), to avoid potential inference failures. These
impls are important, but we can attempt to add them later in standalone
commits, and run them through crater.

In addition to the `bstr` feature, I've added a `bstr_internals` feature
for APIs provided by `core` for use by `alloc` but not currently
intended for stabilization.

This API and its implementation are based *heavily* on the `bstr` crate
by Andrew Gallant (@BurntSushi).
joshtriplett added a commit to joshtriplett/rust that referenced this issue Jan 4, 2025
joshtriplett added a commit to joshtriplett/rust that referenced this issue Jan 4, 2025
joshtriplett added a commit to joshtriplett/rust that referenced this issue Jan 4, 2025
joshtriplett added a commit to joshtriplett/rust that referenced this issue Jan 5, 2025
joshtriplett added a commit to joshtriplett/rust that referenced this issue Jan 5, 2025
Approved ACP: rust-lang/libs-team#502
Tracking issue: rust-lang#134915

These types represent human-readable strings that are conventionally,
but not always, UTF-8. The `Debug` impl prints non-UTF-8 bytes using
escape sequences, and the `Display` impl uses the Unicode replacement
character.

This is a minimal implementation of these types and associated trait
impls. It does not add any helper methods to other types such as `[u8]`
or `Vec<u8>`.

I've omitted a few implementations of `AsRef`, `AsMut`, `Borrow`,
`From`, and `PartialOrd`, when those would be the second implementation
for a type (counting the `T` impl) or otherwise may cause inference
failures. These impls are important, but we can attempt to add them
later in standalone commits, and run them through crater.

In addition to the `bstr` feature, I've added a `bstr_internals` feature
for APIs provided by `core` for use by `alloc` but not currently
intended for stabilization.

This API and its implementation are based *heavily* on the `bstr` crate
by Andrew Gallant (@BurntSushi).
bors added a commit to rust-lang-ci/rust that referenced this issue Jan 6, 2025
Implement `ByteStr` and `ByteString` types

Approved ACP: rust-lang/libs-team#502
Tracking issue: rust-lang#134915

These types represent human-readable strings that are conventionally,
but not always, UTF-8. The `Debug` impl prints non-UTF-8 bytes using
escape sequences, and the `Display` impl uses the Unicode replacement
character.

This is a minimal implementation of these types and associated trait
impls. It does not add any helper methods to other types such as `[u8]`
or `Vec<u8>`.

I've omitted a few implementations of `AsRef`, `AsMut`, and `Borrow`,
when those would be the second implementation for a type (counting the
`T` impl), to avoid potential inference failures. We can attempt to add
more impls later in standalone commits, and run them through crater.

In addition to the `bstr` feature, I've added a `bstr_internals` feature
for APIs provided by `core` for use by `alloc` but not currently
intended for stabilization.

This API and its implementation are based *heavily* on the `bstr` crate
by Andrew Gallant (`@BurntSushi`).

r? `@BurntSushi`
bors added a commit to rust-lang-ci/rust that referenced this issue Jan 7, 2025
joshtriplett added a commit to joshtriplett/rust that referenced this issue Jan 7, 2025
bors added a commit to rust-lang-ci/rust that referenced this issue Jan 7, 2025
joshtriplett added a commit to joshtriplett/rust that referenced this issue Jan 11, 2025
bors added a commit to rust-lang-ci/rust that referenced this issue Jan 12, 2025
bors added a commit to rust-lang-ci/rust that referenced this issue Jan 12, 2025
matthiaskrgr added a commit to matthiaskrgr/rust that referenced this issue Jan 23, 2025
GrigorenkoPV referenced this issue in evmar/n2 Jan 23, 2025
There was a lobste.rs thread about this that was pretty disappointing,
given that the purpose of this project is the big ideas and this is a
pretty irrelevant detail.  Moving this here means it's not the first thing
you see clicking around on the project, and also puts the strings discussion
in context of the larger question of encoding handling.
github-actions bot pushed a commit to rust-lang/miri that referenced this issue Jan 24, 2025
@clarfonthey
Contributor

clarfonthey commented Jan 24, 2025

This is mostly just a nit, but it feels appropriate that the Debug impl would emit a b"..." string rather than a "..." string, to more closely resemble Rust syntax. It still wouldn't include the ByteStr(...) wrapper but it makes at least a little bit of sense and shouldn't cause too many issues.

Thinking about it, we should probably do the same for CStr as well, prefixing the string with c.

I'd be fine making a PR if this seems reasonable, or not if it just feels too pedantic.
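The suggestion can be pictured with a small stable-Rust sketch. Both helper names are hypothetical, and `escape_ascii` is used here only as a convenient stand-in for the escaping logic: it escapes every non-ASCII byte, which may differ from what the real `Debug` impl does for valid UTF-8.

```rust
/// Sketch of a quoted form without a prefix: "...".
fn debug_plain(bytes: &[u8]) -> String {
    format!("\"{}\"", bytes.escape_ascii())
}

/// Sketch of the proposed form, resembling a Rust byte-string literal: b"...".
fn debug_with_prefix(bytes: &[u8]) -> String {
    format!("b\"{}\"", bytes.escape_ascii())
}

fn main() {
    let input = b"key=\xFF\n";
    println!("{}", debug_plain(input));
    println!("{}", debug_with_prefix(input));
    assert_eq!(debug_with_prefix(input), "b\"key=\\xff\\n\"");
}
```

With the prefix, the printed text can in principle be pasted back into Rust source as a byte-string literal.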

github-actions bot pushed a commit to rust-lang/rustc-dev-guide that referenced this issue Jan 27, 2025
github-actions bot pushed a commit to tautschnig/verify-rust-std that referenced this issue Feb 20, 2025
github-actions bot pushed a commit to tautschnig/verify-rust-std that referenced this issue Feb 20, 2025
github-actions bot pushed a commit to carolynzech/rust that referenced this issue Feb 20, 2025
github-actions bot pushed a commit to model-checking/verify-rust-std that referenced this issue Feb 20, 2025
tamird added a commit to tamird/rust that referenced this issue Mar 14, 2025
This produces a deref chain of `CStr` -> `BStr` -> `[u8]` which is
present in the Rust-for-Linux analogues of these types.

Link: rust-lang#134915
Link: Rust-for-Linux/linux#1075
Link: https://lore.kernel.org/all/20250221142816.0c015e9f@eugeo/
Link: Rust-for-Linux/linux#1146
tamird added a commit to tamird/rust that referenced this issue Mar 14, 2025
@HKalbasi
Member

What about having a struct ByteChar(u8) as well? With that, we can change b"str" and b'c' types to ByteStr and ByteChar over an edition.

@jhpratt
Member

jhpratt commented Mar 15, 2025

@HKalbasi see #110998, which is the closest you'll get realistically.

tamird added a commit to tamird/rust that referenced this issue Mar 16, 2025
This produces a deref chain of `CStr` -> `BStr` -> `[u8]` which is
present in the Rust-for-Linux analogues of these types.

Add `AsRef<ByteStr>` as well.

Link: rust-lang#134915
Link: Rust-for-Linux/linux#1075
Link: https://lore.kernel.org/all/20250221142816.0c015e9f@eugeo/
Link: Rust-for-Linux/linux#1146
@glandium
Contributor

I don't think this type is very useful without adding methods to make it closer in API to str than [u8]. I would go as far as saying it probably shouldn't deref to [u8], for two reasons:

  • Methods of [u8] will return &[u8], not ByteStr. As someone using the bstr crate, this is the single most annoying aspect of its API: that for everything you do with it, you don't get a BStr out.
  • [u8] has methods with the same name as str methods, but with a different API. (example: split).

Well, I guess the deref makes it convenient to use where something takes a &[u8], but I would argue that str not having a deref to [u8] would be an indicator that maybe ByteStr shouldn't.
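Both points can be seen with the standard `split` methods on stable Rust: `str::split` takes a pattern and yields `&str`, while `[u8]::split` takes a per-element predicate and yields `&[u8]`. The helper function names below are only for illustration.

```rust
/// `str::split` accepts a pattern (here a `char`) and yields `&str` pieces.
fn split_str(text: &str) -> Vec<&str> {
    text.split(',').collect()
}

/// `[u8]::split` accepts a closure over single elements and yields `&[u8]`.
fn split_bytes(bytes: &[u8]) -> Vec<&[u8]> {
    bytes.split(|b| *b == b',').collect()
}

fn main() {
    assert_eq!(split_str("a,b,c"), ["a", "b", "c"]);
    assert_eq!(
        split_bytes(b"a,b,c"),
        [&b"a"[..], &b"b"[..], &b"c"[..]]
    );
    // `bytes.split(',')` or `bytes.split(b",")` would not compile:
    // the slice method has no notion of a string pattern.
}
```

A wrapper that derefs to `[u8]` inherits the second form, so callers get plain slices back rather than the wrapper type.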

@thaliaarchi
Contributor

@glandium For your first point, both of these types slice to ByteStr (and if #138381 merges, will implement SliceIndex, so will have unchecked variants).

joshtriplett: In the course of implementing this, I'm addressing BurntSushi/bstr#190: both the ByteStr and ByteString types will implement Index and IndexMut.

@joshtriplett
Member Author

joshtriplett commented Mar 22, 2025 via email

@jplatte
Contributor

jplatte commented Mar 22, 2025

joshtriplett: I very much want to add many such methods, taking a cue from the bstr crate. The initial addition of the type aimed for minimalism, but that doesn't mean that's the final state or goal.

I think the problem is that with the Deref impl, no extra methods shadowing slice methods can be added after stabilization.
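The concern can be illustrated with a small sketch on stable Rust. `Wrapper` and its inherent `first` method are hypothetical; the point is only that an inherent method takes priority over a same-named method reached through `Deref`, so adding one later changes what existing calls resolve to.

```rust
use std::ops::Deref;

/// Hypothetical wrapper that derefs to `[u8]`.
struct Wrapper(Vec<u8>);

impl Deref for Wrapper {
    type Target = [u8];

    fn deref(&self) -> &[u8] {
        &self.0
    }
}

impl Wrapper {
    /// Imagine this method being added in a later release. It has the same
    /// name as `[u8]::first` but a different return type, and it wins lookup.
    fn first(&self) -> Option<char> {
        self.0.first().map(|b| *b as char)
    }
}

fn main() {
    let w = Wrapper(b"hi".to_vec());

    // Resolves to the inherent method. Code written earlier as
    // `let b: Option<&u8> = w.first();` would stop compiling.
    let c: Option<char> = w.first();
    assert_eq!(c, Some('h'));

    // The slice method is still reachable, but only by explicit deref.
    let b: Option<&u8> = (*w).first();
    assert_eq!(b, Some(&b'h'));
}
```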

@glandium
Contributor

I think the problem is that with the Deref impl, no extra methods shadowing slice methods can be added after stabilization.

This is exactly what I had in mind but for some reason didn't spell out, and since this is a tracking issue that mentions nothing other than the current implementation, it's hard to tell what the end goal is. Thankfully, @joshtriplett clarified, but updating the top comment would be welcome.

I very much want to add many such methods, taking a cue from the bstr crate.

I think it would be better to take a cue from str rather than from the bstr crate. One of my biggest gripes with the bstr crate is precisely that it doesn't match the str API, so much so that I wrote a similar crate that does (I never found the time to polish and publish it, though; maybe I should find that time somehow).

matthiaskrgr added a commit to matthiaskrgr/rust that referenced this issue Apr 5, 2025
…htriplett

Implement `SliceIndex` for `ByteStr`

Implement `Index` and `IndexMut` for `ByteStr` in terms of `SliceIndex`. Implement it for the same types that `&[u8]` supports (a superset of those supported for `&str`, which does not have `usize` and `ops::IndexRange`).

At the same time, move compare and index traits to a separate file in the `bstr` module, to give it more space to grow as more functionality is added (e.g., iterators and string-like ops). Order the items in `bstr/traits.rs` similarly to `str/traits.rs`.

cc `@joshtriplett`

`ByteStr`/`ByteString` tracking issue: rust-lang#134915
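
The index-type difference mentioned in this commit message can be seen on stable Rust with plain `[u8]` and `str`. This is only a sketch of the existing behavior those types have, not of the unstable `ByteStr` impls, and the helper names are illustrative.

```rust
/// `[u8]` accepts a `usize` index and yields a single byte.
fn byte_at(bytes: &[u8], i: usize) -> u8 {
    bytes[i]
}

/// `[u8]` also accepts ranges and yields a sub-slice.
fn byte_range(bytes: &[u8], start: usize, end: usize) -> &[u8] {
    &bytes[start..end]
}

/// `str` accepts ranges only; `text[0]` does not compile because
/// `str` cannot be indexed by `usize`.
fn str_range(text: &str, start: usize, end: usize) -> &str {
    &text[start..end]
}

fn main() {
    println!("{}", byte_at(b"hello", 1));
    println!("{:?}", byte_range(b"hello", 1, 3));
    println!("{}", str_range("hello", 1, 3));
}
```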
rust-timer added a commit to rust-lang-ci/rust that referenced this issue Apr 5, 2025
Rollup merge of rust-lang#138381 - thaliaarchi:bstr-sliceindex, r=joshtriplett

github-actions bot pushed a commit to model-checking/verify-rust-std that referenced this issue Apr 10, 2025
…htriplett

@kornelski
Contributor

Reposting for visibility rust-lang/libs-team#502 (comment)

Since ByteStr makes no assumptions about encoding, it can be useful for processing non-text binary data too. I suggest using byte-oriented Debug/Display instead of Unicode escapes or lossy replacement chars.

@joshtriplett
Member Author

@kornelski BStr is intended for usually-Unicode text, not for arbitrary binary data. You certainly can abuse it for something else, but that's not its intended purpose, so I don't think that usage motivates any changes to its implementation.

If you're looking for additional methods to work with binary data, I'd suggest proposing those for Vec<u8> or similar.

@kornelski
Contributor

find(pattern) has been proposed for [u8] on the internals forum. However, ByteStr already does exactly this in its implementation; only the high-level intention is different. I'm not sure if it makes sense to have exactly the same methods with different intentions behind them:

https://internals.rust-lang.org/t/idea-add-some-binary-data-methods/22799

@futile
Contributor

futile commented Apr 24, 2025

find(pattern) has been proposed for [u8] on the internals forum. However, ByteStr already does exactly this in its implementation; only the high-level intention is different. I'm not sure if it makes sense to have exactly the same methods with different intentions behind them:

Maybe it could make sense to implement that on [u8] instead of ByteStr then, and use the Deref from ByteStr -> [u8] (just assuming it exists here, not sure) for ByteStr's API?

@clarfonthey
Contributor

Deref is included; it's mentioned in the issue description. The goal is to offer as few methods as possible on ByteStr/ByteString, since they mostly exist to provide alternative Display and Debug implementations.

@tmandry
Member

tmandry commented May 21, 2025

Moving my comment from rust-lang/libs-team#550 (comment) to here, where it's more appropriate.

ByteStr was discussed in the meeting; its purpose is to explicitly opt into treating maybe-utf8 as utf8-or-replacement output, so ByteStr::new(value) is similar to calling value.display().

I'm not sure how much I buy this line of reasoning. The name ByteStr does not make it obvious the string is expected to be UTF8. We have several other kinds of Str types that are not UTF8, and the term "bytes" is often used to refer to binary data. While calling ByteStr::new() explicitly might cause some people to think twice about the invariants of the type, there are other ways to get one, like deserializing a struct that contains one.

This makes me think of a possible story where Rust didn't have a user's back:

  • Add String to a struct and deserialize the struct.
  • Encounter UTF8 errors in production and change the type to ByteString.
  • Later, someone else sees the field but needs a String and calls .to_string(), inadvertently roundtripping the value through a lossy path.

Again, this is made much worse by the fact that Display impls are inherently lossy, but the name to_string does not imply this.
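
The lossy round trip in that story can be sketched on stable Rust. `LossyBytes` below is a hypothetical stand-in whose `Display` uses `String::from_utf8_lossy`; it is an assumption about the general shape of a utf8-or-replacement `Display`, not the real `ByteString` implementation.

```rust
use std::fmt;

/// Hypothetical stand-in for a byte string with a lossy `Display`.
struct LossyBytes(Vec<u8>);

impl fmt::Display for LossyBytes {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        // Invalid UTF-8 sequences are replaced with U+FFFD.
        f.write_str(&String::from_utf8_lossy(&self.0))
    }
}

/// Returns true if going through `to_string()` preserves the original bytes.
fn round_trips(bytes: &[u8]) -> bool {
    let text = LossyBytes(bytes.to_vec()).to_string();
    text.as_bytes() == bytes
}

fn main() {
    println!("valid UTF-8 round-trips: {}", round_trips(b"hello"));
    println!("invalid UTF-8 round-trips: {}", round_trips(b"caf\xff"));
}
```

Nothing at the `to_string()` call site signals that the second case silently changed the data.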

@tmandry
Member

tmandry commented May 21, 2025

From rust-lang/libs-team#550 (comment) (@joshtriplett):

@tmandry The entire point of ByteStr is to be "conventionally UTF-8". The type exists to serve that role. It is already widely used in the ecosystem as "bstr::BStr", including the behavior of Display and Debug. (Also, the right place for ByteStr discussion is on its own tracking issue.)

I get that the point of ByteStr is to be conventionally UTF-8; my point is I don't think the name does a good job of conveying this role.

I agree that to_string is not a great name. That should never have been based on Display. But that's a universal problem, and would also apply if someone called to_string on some other type that didn't convert losslessly to a string.

Yes, and its existence has caused other string types not to implement Display. Given that we have to_string when implementing Display, I think we should strive to make it as clear as possible when a user is going down a potentially-lossy path.

@jhpratt
Member

jhpratt commented May 24, 2025

One nit: if the ByteStr naming is going to stick around instead of BStr, should the module still be named bstr? That seems unusual.

@clarfonthey
Contributor

clarfonthey commented May 24, 2025

I think it would make sense for the module to be named either byte_str or bytes.

Although, honestly, now that I've thought about it more, I'm kind of in favour of just adding a display method to [u8] and Vec<u8> over this. One of the bigger problems with bstr is that there tends to be a lot of conversion boilerplate: converting BString back to Vec<u8>, converting &[u8] to BStr, and then all these questions about whether an API should accept one variant over the other. If we just make this a display() method then we get the desired functionality without having to do any conversions, with one notable exception: there isn't a Debug flag for opting into String-style display, so, perhaps {:s?} could offer that.

That would likely require an RFC or at least an MCP for the latter, though. I'm not sure whether format! falls under lang or libs, but I assume lang at least wants a say.

Another benefit of {:s?} is that it lets nested types, like sets and maps containing bytes, just work properly, whereas this requires wrapping them first.
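
For illustration, the display-method idea could look roughly like the sketch below. Everything here is hypothetical: `DisplayLossyExt`, `display_lossy`, and `LossyDisplay` are invented names for an extension trait, not a std API, and the proposed `{:s?}` flag is not modeled at all.

```rust
use std::fmt;

/// Hypothetical adapter returned by the sketch's `display_lossy` method.
struct LossyDisplay<'a>(&'a [u8]);

impl fmt::Display for LossyDisplay<'_> {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        // Invalid UTF-8 sequences are replaced with U+FFFD.
        f.write_str(&String::from_utf8_lossy(self.0))
    }
}

/// Hypothetical extension trait standing in for a method on `[u8]`.
trait DisplayLossyExt {
    fn display_lossy(&self) -> LossyDisplay<'_>;
}

impl DisplayLossyExt for [u8] {
    fn display_lossy(&self) -> LossyDisplay<'_> {
        LossyDisplay(self)
    }
}

/// Formats any byte slice as text without converting it to a wrapper type.
fn render(bytes: &[u8]) -> String {
    bytes.display_lossy().to_string()
}

fn main() {
    let owned: Vec<u8> = vec![b'f', b'o', 0xff];
    // Works on `Vec<u8>` through auto-deref, with no wrapper conversion.
    println!("{}", owned.display_lossy());
    println!("{}", render(b"plain"));
}
```

Because the method is reached through auto-deref, callers keep their `Vec<u8>` and `&[u8]` values as they are and only opt into text rendering at the formatting site.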

@programmerjake
Member

programmerjake commented May 24, 2025

A downside of {:s?} is that you can't select which parts get displayed as byte strings and which as arrays without manually implementing Debug, e.g. if you had:

use std::collections::HashMap;

struct Directory {
    files: HashMap<Vec<u8>, Vec<u8>>,
}

fn example_dir() -> Directory {
    let files = HashMap::from_iter([(b"foo.mp4".to_vec(), include_bytes!("foo.mp4").to_vec())]);
    Directory { files }
}

you'd want to print the keys as text but not the values, since they're binary data, not text.
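
A manual `Debug` impl of the kind mentioned above might look like the following sketch. It is one possible approach, not something proposed in the thread: keys are rendered with `String::from_utf8_lossy` as a stand-in for byte-string formatting, values are summarized by length, and inline bytes replace `include_bytes!("foo.mp4")` so the example is self-contained.

```rust
use std::collections::HashMap;
use std::fmt;

struct Directory {
    files: HashMap<Vec<u8>, Vec<u8>>,
}

impl fmt::Debug for Directory {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        let mut map = f.debug_map();
        for (name, data) in &self.files {
            // Keys are shown as (lossy) text; values only as a byte count.
            map.entry(
                &String::from_utf8_lossy(name),
                &format_args!("<{} bytes>", data.len()),
            );
        }
        map.finish()
    }
}

fn example_dir() -> Directory {
    let mut files = HashMap::new();
    // Placeholder binary content standing in for the real file data.
    files.insert(b"foo.mp4".to_vec(), vec![0x00, 0x01, 0xfe, 0xff]);
    Directory { files }
}

fn main() {
    println!("{:?}", example_dir());
}
```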
