Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[JSI] Support non-copy jsi::String constructors #1597

Open
mrousavy opened this issue Jan 21, 2025 · 5 comments
Open

[JSI] Support non-copy jsi::String constructors #1597

mrousavy opened this issue Jan 21, 2025 · 5 comments
Labels
enhancement New feature or request

Comments

@mrousavy
Copy link
Contributor

Problem

Currently, all jsi::String factory methods (e.g. createFromAscii(...)) copy the given std::string/char*. This could cause unnecessary performance hits if the strings are large, or constexpr/statically known at compile-time

Solution

It would be cool to have a constructor/factory method that doesn't copy the given std::string/std::string_view/char*/char[N]/jsi::ArrayBuffer.

This would be beneficial in two situations:

  • For static/constexpr strings that do not change
  • For huge strings (like JSON strings) that are memory-managed elsewhere (aka "unsafely") to be passed to JS directly and synchronously, and then freed later. This avoids large copies in situations where performance could matter.

Additional Context

I don't know how this works internally in the Runtime, but is it even possible for the Hermes/JSC runtime to not have ownership on a string? I'm guessing strings are not represented using the C++ STL internally, so a conversion from std::string will always happen - but what about char*/char[N]?

@mrousavy mrousavy added the enhancement New feature or request label Jan 21, 2025
@neildhar
Copy link
Contributor

Such functionality doesn't currently exist, but we have discussed in the past. As you've described, the string provided through JSI is currently always copied.

One way this could be provided is to add a constructor of jsi::String that accepts a std::shared_ptr<jsi::Buffer> similar to what we have for jsi::ArrayBuffer. The jsi::Buffer implementation can then consume the string from any source you choose.

I'd be curious to hear more about your use case for such an API. For instance, what are the encodings that you would need it to support? Hermes internally only supports ASCII and UTF-16 strings, so supporting something like this for UTF-8 would be much more challenging.

@tmikov
Copy link
Contributor

tmikov commented Jan 22, 2025

Internally we support taking ownership of std::string and std::u16string, but this functionality isn't exposed through JSI. We could expose it, possibly using the Buffer API, as @neildhar mentioned. It would only make sense for very large strings (I haven't quantified it, but certainly over a 1 KB, perhaps even 16KB). Also note that for char strings there would be cost to validate that the contents are ASCII when "adopting" the string into the VM.

is it even possible for the Hermes/JSC runtime to not have ownership on a string

I can't speak for JSC, but in Hermes the question is moot, since "ownership" doesn't actually mean anything in C++. You control the pointer and the destructor of the jsi::Buffer or the deleter of std::shared_ptr<>. The only requirement for correctness is that the contents of the buffer must not change while the VM "believes" that it owns it.

@mrousavy
Copy link
Contributor Author

Hermes internally only supports ASCII and UTF-16 strings, so supporting something like this for UTF-8 would be much more challenging.

In my case I have a few ASCII constants that I'd love to micro-optimize.

I also had a scenario where I parsed a huge (around 10 KB) JSON file and passed it around between native and JS - I found a way to use ArrayBuffer for this instead, but it made the API a bit more annoying on the JS side.

@tmikov
Copy link
Contributor

tmikov commented Jan 22, 2025

In my case I have a few ASCII constants that I'd love to micro-optimize.

If they are constants they are likely small? Wouldn't it make more sense to add them to the VM once? The heap allocation of the Buffer and the shared_ptr is likely more expensive than copying of a small string.

I also had a scenario where I parsed a huge (around 10 KB) JSON file and passed it around between native and JS - I found a way to use ArrayBuffer for this instead, but it made the API a bit more annoying on the JS side.

Hmm. I have two questions here. First, 10KB doesn't sound that large. Are you passing many of these to the VM with high frequency? Second, how certain are you that it is ASCII? Usually it is risky to assume that a file is ASCII.

@mrousavy
Copy link
Contributor Author

Hm yea good point the strings aren't as big.

I thought about the use-case of embedding translation strings, but maybe it's better to keep those separate assets anyways.

And no, the JSON is not ASCII - I'm guessing it's UTF8.

I think you're right tho this is probably not a very big performance gain anyways, might've been a nice to have feature with the added risk of the unsafe ownership

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

3 participants