-
Notifications
You must be signed in to change notification settings - Fork 66
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Handle UTF-8 code points in StdString
#381
Conversation
Looking into the CI failures |
Thanks, to me this looks good, I'm CC'ing @kleinschmidt since he recently contributed to the string code as well. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't have any major objections to this; it may be surprising to users that you can't construct a StdString
from a String
with null bytes but uh that's also probably a pretty small set of users so I think this is okay
I do think we shoudl be documenting this a bit more carefully, since it's been a footgun for us quite a few times recently!
I completely agree. I'll add a docstring to Additionally, I'll validate this PR's changes against the Ray.jl repo which currently implements a custom conversion function to losslessly convert |
I took another look into |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is in good shape. For the sake of completeness, could you add a short description to the README as well?
Validating these changes against Ray.jl works well and revealed two things:
I don't think either of these need to be addressed as part of this PR. Examples for those who like them: julia> using CxxWrap, Serialization
julia> collect(ConstCxxRef(StdString("foo")))
ERROR: MethodError: no method matching size(::CxxWrap.StdLib.StdStringDereferenced)
Closest candidates are:
size(::Union{LinearAlgebra.Adjoint{T, var"#s972"}, LinearAlgebra.Transpose{T, var"#s972"}} where {T, var"#s972"<:(AbstractVector)})
@ LinearAlgebra ~/Development/Julia/aarch64/1.9/usr/share/julia/stdlib/v1.9/LinearAlgebra/src/adjtrans.jl:296
size(::Union{LinearAlgebra.Adjoint{T, var"#s972"}, LinearAlgebra.Transpose{T, var"#s972"}} where {T, var"#s972"<:(AbstractMatrix)})
@ LinearAlgebra ~/Development/Julia/aarch64/1.9/usr/share/julia/stdlib/v1.9/LinearAlgebra/src/adjtrans.jl:297
size(::Union{LinearAlgebra.QR, LinearAlgebra.QRCompactWY, LinearAlgebra.QRPivoted})
@ LinearAlgebra ~/Development/Julia/aarch64/1.9/usr/share/julia/stdlib/v1.9/LinearAlgebra/src/qr.jl:582
...
julia> x = sprint(serialize, StdString("foo"))
"7JL\x14\x04\0\0\x005\x10\x01\x12StdStringAllocated\x1f\v\xf4}Iz\xf0\x891\xae*Zʗ<\xa4\x15\x1f\x01\aCxxWrap\x01\x06StdLibD4\x10T\x1fN\x9bD\x01\0\0\0\x10\x01\aNothing\x1fN\x9bD\0\0\0\0\0\0\0\0"
julia> deserialize(IOBuffer(x))
Error showing value of type CxxWrap.StdLib.StdStringAllocated:
ERROR: C++ object of type NSt3__112basic_stringIcNS_11char_traitsIcEENS_9allocatorIcEEEE was deleted |
Should be good to go from my end. |
I have never tested serialization/deserialization with any C++ type, so that is a whole other can of worms. Thanks for these changes to you both! |
Update
StdString
Julia string interface to handle UTF-8 code points. For example on CxxWrap 0.14:This PR includes the following changes:
StdString
(https://en.m.wikipedia.org/wiki/C%2B%2B_string_handling#Standard_string_types)std::string (const char* s)
) separate fromStdString(::String)
.String
andStdString
.StdString
explaining the intended behaviour of the new constructors.