Skip to content

Add a CreateSpanCollation method on SqliteConnection #36225

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 4 commits into
base: main
Choose a base branch
from

Conversation

hahn-kev
Copy link

@hahn-kev hahn-kev commented Jun 11, 2025

closes #35236

Adds a new method to SqliteConnection which enables creating allocation free collation methods. Here's an example of using it:

var compareInfo = cultureProvider.GetCompareInfo(writingSystem);
connection.CreateSpanCollation(
    writingSystem,
    compareInfo,
    static (compareInfo, x, y) => compareInfo.Compare(x, y, CompareOptions.IgnoreCase)
);
  • I've read the guidelines for contributing and seen the walkthrough
  • I've posted a comment on an issue with a detailed description of how I am planning to contribute and got approval from a member of the team
  • The code builds and tests pass locally (also verified by our automated build checks)
  • Commit messages follow this format:
        Summary of the changes
        - Detail 1
        - Detail 2

        Fixes #bugnumber
  • Tests for the changes have been added (for bug fixes / features)
  • Code follows the same patterns and style as existing code in this repo

There are no tests for the existing CreateCollation method, so it's unclear how or even if this new version should be tested also.

- add new method CreateSpanCollation

Fixes dotnet#35236
@hahn-kev hahn-kev requested a review from a team as a code owner June 11, 2025 04:15
Copy link
Member

@roji roji left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See some comments. Also, this PR lacks tests.

/// Zero if <paramref name="s1"/> is equal to <paramref name="s2"/>;
/// Greater than zero if <paramref name="s1"/> follows <paramref name="s2"/>.
/// </returns>
public delegate int SpanDelegateCollation(in object? state, in ReadOnlySpan<char> s1, in ReadOnlySpan<char> s2);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should state be generic (T) instead of object?

/// Zero if <paramref name="s1"/> is equal to <paramref name="s2"/>;
/// Greater than zero if <paramref name="s1"/> follows <paramref name="s2"/>.
/// </returns>
public delegate int SpanDelegateCollation(in object? state, in ReadOnlySpan<char> s1, in ReadOnlySpan<char> s2);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why have in modifiers here? ReadOnlySpan is very small, so passing it in by reference doesn't seem to make sense, and in fact will make things slower, no?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure, I was copying Func, but looking at methods in CompareInfo none of them use the in modifier either so I'll drop it here.

/// <param name="state">State object passed to each invocation of the collation.</param>
/// <param name="comparison">Method that compares two char spans, using additional state.</param>
/// <seealso href="https://docs.microsoft.com/dotnet/standard/data/sqlite/collation">Collation</seealso>
public virtual void CreateSpanCollation<T>(string name, T state, SpanDelegateCollation? comparison)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need to have the special new name CreateSpanCollation rather than just an overload of CreateCollation?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That would make this a breaking change because now something like con.CreateCollation("name", null, (t, s1, s2) => s1.Length - s2.Length) is ambiguous now. I'm not sure the policy on those kinds of breaking changes so that's why I didn't use the same name.

#if NET5_0_OR_GREATER
delegate_collation? collation = comparison != null ? (v, s1, s2) =>
{
Span<char> s1Span = stackalloc char[Encoding.UTF8.GetCharCount(s1)];
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks unsafe: stackalloc should never be used with an unbounded size. Unless I'm missing something, a huge string in the database could cause an arbitrary stack allocation and potentially cause a stack overflow, no?

Regardless, this copies/decodes the SQLite-provided ReadOnlySpan<byte> to ReadOnlySpan<char>; the point of this PR is to provide better performance, so that's not great. You should be able to simply reinterpret-cast using MemoryMarshal.Cast<byte, char>(), but need to check that the memory is aligned first. I'm assuming that SQLite-provided memory will generally be aligned (this needs to be checked); if that's the case, we can fall back to inefficient copying for unaligned memory (if that's a rare case).

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you're right about the size issue. I'm not sure if there's an existing pattern but it seems like we should avoid a stack allocation if the size is over some limit? Not sure what that would be though.

As for copying, that's required as a char is utf16 and a byte is only 8 bits, so we can't just reinterpret a byte as a char because they have different sizes.

That said there's an argument for this whole method to use ReadOnlySpan<byte> and not ReadOnlySpan<char>, or we could make both? I'm not sure, it's not very easy to handle utf8 strings in dotnet and if someone is going down to that level then they can just call sqlite3_create_collation themselves. Though it would be pretty easy for CreateCollation char to be implemented on top of the utf8 version.

@roji roji changed the title add a CreateSpanCollation method on SqliteConnection Add a CreateSpanCollation method on SqliteConnection Jun 11, 2025
@hahn-kev
Copy link
Author

@roji, resolved some comments.

Sorry I didn't write tests at first, the test project wasn't loaded so I didn't find the tests which referenced CreateCollation. I've now copied the existing collation tests and rewritten them for the span version.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Support span based collations comparison functions in Sqlite
2 participants