-
Notifications
You must be signed in to change notification settings - Fork 3.3k
Add a CreateSpanCollation method on SqliteConnection #36225
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
- add new method CreateSpanCollation Fixes dotnet#35236
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See some comments. Also, this PR lacks tests.
/// Zero if <paramref name="s1"/> is equal to <paramref name="s2"/>; | ||
/// Greater than zero if <paramref name="s1"/> follows <paramref name="s2"/>. | ||
/// </returns> | ||
public delegate int SpanDelegateCollation(in object? state, in ReadOnlySpan<char> s1, in ReadOnlySpan<char> s2); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should state be generic (T
) instead of object?
/// Zero if <paramref name="s1"/> is equal to <paramref name="s2"/>; | ||
/// Greater than zero if <paramref name="s1"/> follows <paramref name="s2"/>. | ||
/// </returns> | ||
public delegate int SpanDelegateCollation(in object? state, in ReadOnlySpan<char> s1, in ReadOnlySpan<char> s2); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why have in
modifiers here? ReadOnlySpan
is very small, so passing it in by reference doesn't seem to make sense, and in fact will make things slower, no?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure, I was copying Func, but looking at methods in CompareInfo none of them use the in modifier either so I'll drop it here.
/// <param name="state">State object passed to each invocation of the collation.</param> | ||
/// <param name="comparison">Method that compares two char spans, using additional state.</param> | ||
/// <seealso href="https://docs.microsoft.com/dotnet/standard/data/sqlite/collation">Collation</seealso> | ||
public virtual void CreateSpanCollation<T>(string name, T state, SpanDelegateCollation? comparison) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need to have the special new name CreateSpanCollation
rather than just an overload of CreateCollation
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That would make this a breaking change because now something like con.CreateCollation("name", null, (t, s1, s2) => s1.Length - s2.Length)
is ambiguous now. I'm not sure the policy on those kinds of breaking changes so that's why I didn't use the same name.
#if NET5_0_OR_GREATER | ||
delegate_collation? collation = comparison != null ? (v, s1, s2) => | ||
{ | ||
Span<char> s1Span = stackalloc char[Encoding.UTF8.GetCharCount(s1)]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks unsafe: stackalloc should never be used with an unbounded size. Unless I'm missing something, a huge string in the database could cause an arbitrary stack allocation and potentially cause a stack overflow, no?
Regardless, this copies/decodes the SQLite-provided ReadOnlySpan<byte>
to ReadOnlySpan<char>
; the point of this PR is to provide better performance, so that's not great. You should be able to simply reinterpret-cast using MemoryMarshal.Cast<byte, char>()
, but need to check that the memory is aligned first. I'm assuming that SQLite-provided memory will generally be aligned (this needs to be checked); if that's the case, we can fall back to inefficient copying for unaligned memory (if that's a rare case).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you're right about the size issue. I'm not sure if there's an existing pattern but it seems like we should avoid a stack allocation if the size is over some limit? Not sure what that would be though.
As for copying, that's required as a char is utf16 and a byte is only 8 bits, so we can't just reinterpret a byte as a char because they have different sizes.
That said there's an argument for this whole method to use ReadOnlySpan<byte>
and not ReadOnlySpan<char>
, or we could make both? I'm not sure, it's not very easy to handle utf8 strings in dotnet and if someone is going down to that level then they can just call sqlite3_create_collation
themselves. Though it would be pretty easy for CreateCollation char to be implemented on top of the utf8 version.
@roji, resolved some comments. Sorry I didn't write tests at first, the test project wasn't loaded so I didn't find the tests which referenced CreateCollation. I've now copied the existing collation tests and rewritten them for the span version. |
closes #35236
Adds a new method to SqliteConnection which enables creating allocation free collation methods. Here's an example of using it:
There are no tests for the existing
CreateCollation
method, so it's unclear how or even if this new version should be tested also.