| 1 | +# TruffleString in TruffleRuby |
| 2 | + |
| 3 | +TruffleRuby uses `TruffleString` to represent Ruby Strings, but wraps them in either a `RubyString` or an `ImmutableRubyString` object. |
| 4 | + |
| 5 | +## Encodings Compatibility |
| 6 | + |
| 7 | +The notion of encoding compatibility is mostly the same in Ruby and TruffleString, but differs on one point: |
| 8 | +* An empty Ruby String is always considered compatible with any other Ruby String of any encoding. |
| 9 | +* TruffleString does not consider whether a string is empty, and only looks at the encodings and code ranges of the two strings. |
| 10 | + |
| 11 | +As a result, to use TruffleString equality nodes, one needs to: |
| 12 | +1. Compute the compatible encoding with `NegotiateCompatibleStringEncodingNode` or `Primitive.encoding_ensure_compatible_str`. |
| 13 | +2. Check if both sides are empty, and if so return true before using TruffleString equality nodes. |
| 14 | + |
| 15 | +`StringHelperNodes.StringEqualInternalNode` is a good example showing what is needed. |
| 16 | + |
| 17 | +An example that would throw without the empty check is comparing an empty ISO-2022-JP (a dummy, non-ASCII-compatible, fixed-width encoding) string with an empty US-ASCII string: |
| 18 | + |
| 19 | +```bash |
| 20 | +$ jt ruby -e '"".force_encoding("ISO-2022-JP") == ""' |
| 21 | +the given string is not compatible to the expected encoding "ISO_2022_JP", did you forget to convert it? (java.lang.IllegalArgumentException) |
| 22 | +``` |
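The two steps listed earlier can be modeled at the Ruby level. This is only a sketch: `strings_equal?` is a hypothetical helper, not TruffleRuby code, `Encoding.compatible?` stands in for the encoding negotiation, and the byte comparison stands in for the TruffleString equality node. The real logic lives in Java nodes such as `StringHelperNodes.StringEqualInternalNode`.

```ruby
# Hypothetical Ruby-level model of the two steps; not TruffleRuby's implementation.
def strings_equal?(a, b)
  # Step 1: negotiate a compatible encoding (nil means there is none).
  return false if Encoding.compatible?(a, b).nil?

  # Step 2: two empty strings are equal, whatever their encodings.
  # In TruffleRuby this check must come before the TruffleString equality node.
  return true if a.empty? && b.empty?

  # Stand-in for the TruffleString equality node: compare the raw bytes.
  a.bytes == b.bytes
end

iso   = "".dup.force_encoding("ISO-2022-JP")
ascii = "".dup.force_encoding("US-ASCII")

strings_equal?(iso, ascii) # the empty check makes this succeed
iso == ascii               # Ruby semantics: empty strings compare equal
```

The point of the sketch is the ordering: the empty check sits between the encoding negotiation and the comparison, so the comparison never sees two empty strings whose encodings it would reject.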
| 23 | + |
| 24 | +## Logical vs Physical Byte Offsets |
| 25 | + |
| 26 | +We categorize a byte offset into a `TruffleString` as either *logical* or *physical*. |
| 27 | +A physical byte offset includes the offset from the `InternalByteArray` (`InternalByteArray#getOffset()`). |
| 28 | +A logical byte offset does not include that and is the semantic byte offset from the start of the string. |
| 29 | +Physical offsets are difficult to use and error-prone, because they can be passed by mistake to a method that expects a logical offset. |
| 30 | +So avoid physical offsets as much as possible, and therefore avoid `InternalByteArray#getArray()`. |
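The difference between the two kinds of offset can be illustrated with a small model. `ByteView` below is a hypothetical stand-in written in Ruby for illustration; it is not the Truffle `InternalByteArray` API, and it only assumes the shape described above (a backing array plus a start offset).

```ruby
# Hypothetical model: the string's bytes live in `array`, starting at
# index `offset` and spanning `length` bytes.
ByteView = Struct.new(:array, :offset, :length) do
  # Translate a logical offset (0 = first byte of the string) to a physical one.
  def to_physical(logical)
    offset + logical
  end

  # Read a byte by logical offset, keeping the translation in one place.
  def byte_at(logical)
    unless logical >= 0 && logical < length
      raise IndexError, "logical offset #{logical} out of bounds"
    end
    array[to_physical(logical)]
  end
end

backing = "__hello".bytes            # backing array; the string starts at index 2
view = ByteView.new(backing, 2, 5)   # the string "hello"

view.byte_at(0)                # first byte of "hello"
backing[view.to_physical(0)]   # same byte, via an explicit physical offset
backing[0]                     # mistake: a logical offset used as a physical one reads "_"
```

Keeping callers on logical offsets, as `byte_at` does, means the translation happens in exactly one place and the last line's mistake cannot occur outside it.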
| 31 | + |
| 32 | +## Tests |
| 33 | + |
| 34 | +This is a good set of tests to run when touching String code: |
| 35 | +```bash |
| 36 | +jt test integration strict-encoding-checks |
| 37 | +``` |