Commit 885c9e8

committed
I've made it worse, but this is my progress..
1 parent 22657c4 commit 885c9e8

1 file changed

+79
-0
lines changed

user-guide/type-mapping/characters.qmd

Lines changed: 79 additions & 0 deletions
@@ -7,6 +7,85 @@ title: "Character Strings"
library(rextendr)
```

The standard type for a UTF-8 encoded string in Rust is `String`. An example of
instantiating such a type:

```{extendr, echo=TRUE}
let mut rust_string = String::new();
rust_string.push_str("Hello world!");
rust_string
```

A direct translation of this to R is:
```{r}
r_string <- "Hello world!"
r_string
```

Indeed, these are the same, as they contain the same UTF-8 bytes:

```{r}
charToRaw(r_string)
```

```{extendr}
let bytes = String::from("Hello world!");
let bytes = bytes.as_bytes().to_owned();
bytes
```
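
To compare the two outputs by eye, the Rust bytes can be printed in the same two-digit hexadecimal form that `charToRaw()` uses. The following is a minimal standalone sketch (plain Rust, outside of `rextendr`); the helper name `utf8_hex` is hypothetical and only serves the illustration.

```rust
/// Formats each UTF-8 byte of `text` as two lowercase hex digits.
/// (`utf8_hex` is a hypothetical helper, not part of extendr.)
fn utf8_hex(text: &str) -> Vec<String> {
    text.bytes().map(|byte| format!("{byte:02x}")).collect()
}

fn main() {
    // ASCII characters take exactly one byte each in UTF-8.
    println!("{}", utf8_hex("Hello world!").join(" "));
    // A non-ASCII character such as U+00E9 takes more than one byte.
    println!("{}", utf8_hex("\u{e9}").join(" "));
}
```

The first line printed should match the hex values that `charToRaw(r_string)` reports in R.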

A `character` vector in R could be compared to a `Vec<String>` in Rust. However, there is an important distinction, which we'll illustrate with an example.

```{extendr}
let states = ["Idaho", "Texas", "Maine"]; // five-letter US state names
// Concatenate the UTF-8 bytes of every name into one flat vector.
let b_states = states.into_iter().flat_map(|x| x.bytes()).collect::<Vec<u8>>();
b_states
```

And in R:

```{r}
# charToRaw(c("Idaho", "Texas", "Maine")) would only use the first element
vapply(c("Idaho", "Texas", "Maine"), charToRaw, FUN.VALUE = raw(5))
```
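
The `vapply()` call returns a 5 x 3 raw matrix with one column per state, and R stores matrices column by column, so its underlying data follows the same order as the flattened Rust vector. As a rough sketch of that correspondence (plain Rust, outside of `rextendr`; the helper name `flatten_bytes` is hypothetical), the flat bytes can be split back into five-byte columns:

```rust
/// Concatenates the UTF-8 bytes of all `words` into one flat vector.
/// (`flatten_bytes` is a hypothetical helper for this illustration.)
fn flatten_bytes(words: &[&str]) -> Vec<u8> {
    words.iter().flat_map(|word| word.bytes()).collect()
}

fn main() {
    let states = ["Idaho", "Texas", "Maine"];
    let flat = flatten_bytes(&states);

    // Every name is five bytes long, so each chunk of five bytes
    // corresponds to one column of the R matrix.
    for column in flat.chunks(5) {
        let name = std::str::from_utf8(column).expect("chunk is valid UTF-8");
        println!("{name}: {column:?}");
    }
}
```

This only works because all three names have the same byte length; with names of different lengths the chunk boundaries would no longer line up with the strings.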

But what about identity and permanence? Let us first look at an array of string slices with repeated values:

```{extendr}
let sample_states = ["Texas", "Maine", "Maine", "Idaho", "Maine", "Maine"];
sample_states.into_iter()
    .map(|x| format!("{:p}", x.as_ptr())).collect::<Vec<_>>()
```

And in R:

```{r}
sample_states <- c("Texas", "Maine", "Maine", "Idaho", "Maine", "Maine")
.Internal(inspect(sample_states))
```

Thus, `[&str]` and `character` behave similarly: entries with the same value share a memory address. Let's investigate `[String]`:

<!-- @co-authors: This was the only way I could write this code without rustc optimising it out.. -->

```{extendr}
[
    "Texas".to_string(),
    "Maine".to_string(),
    "Maine".to_string(),
    "Idaho".to_string(),
    "Maine".to_string(),
    "Maine".to_string(),
]
.iter()
.map(|x| format!("{:p}", x.as_ptr()))
.collect::<Vec<_>>()
```

The memory addresses of all the items are different, even for those entries that have the same value.

Thus, R's `character` more closely resembles `[&str]` than a container of `String`.
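
To make the contrast concrete, here is a minimal sketch of a pool that hands out one shared allocation per distinct string value, next to two independently allocated `String`s. The `Pool` type and its `intern` method are hypothetical names for this illustration; this is not how R or extendr implement their string handling, only a way to picture shared versus separate storage.

```rust
use std::collections::HashSet;
use std::rc::Rc;

/// A toy pool that keeps one shared allocation per distinct value.
/// (Hypothetical type for illustration only.)
#[derive(Default)]
struct Pool {
    seen: HashSet<Rc<str>>,
}

impl Pool {
    /// Returns the pooled copy of `value`, allocating it on first use.
    fn intern(&mut self, value: &str) -> Rc<str> {
        if let Some(existing) = self.seen.get(value) {
            return Rc::clone(existing);
        }
        let fresh: Rc<str> = Rc::from(value);
        self.seen.insert(Rc::clone(&fresh));
        fresh
    }
}

fn main() {
    let mut pool = Pool::default();
    let first = pool.intern("Maine");
    let second = pool.intern("Maine");
    // Both handles point at the same allocation.
    println!("pooled:   {:p} {:p}", first.as_ptr(), second.as_ptr());
    assert!(Rc::ptr_eq(&first, &second));

    // Two separate `String`s own separate heap buffers.
    let a = String::from("Maine");
    let b = String::from("Maine");
    println!("separate: {:p} {:p}", a.as_ptr(), b.as_ptr());
    assert_ne!(a.as_ptr(), b.as_ptr());
}
```

The pooled handles behave like the repeated entries of the `[&str]` array above, while the two `String`s behave like the entries of the `String` array.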

The R runtime performs [string interning](https://en.wikipedia.org/wiki/String_interning) on
all of its string elements. This means that whenever R encounters a new string,
it adds it to its internal string intern pool. Therefore, it is unsound to