diff --git a/_freeze/user-guide/type-mapping/characters/execute-results/html.json b/_freeze/user-guide/type-mapping/characters/execute-results/html.json
new file mode 100644
index 0000000..728e9f9
--- /dev/null
+++ b/_freeze/user-guide/type-mapping/characters/execute-results/html.json
@@ -0,0 +1,15 @@
+{
+  "hash": "4876597e0d5a972f10de82702b5692bc",
+  "result": {
+    "engine": "knitr",
+    "markdown": "---\ntitle: \"Character Strings\"\n---\n\n::: {.cell}\n\n:::\n\n\nThe R runtime performs [string interning](https://en.wikipedia.org/wiki/String_interning) on\nall of its string elements. This means that whenever R encounters a new string,\nit adds it to its internal string intern pool, and every later use shares that copy. Therefore, it is unsound to\naccess R strings mutably.\n\n::: {.callout-tip }\nA string intern pool can be thought of as a container that stores all distinct\nstrings, and then provides a lightweight, reference-counted handle to each one.\nAn example of such a string interner is the [`lasso`](https://crates.io/crates/lasso) crate.\n:::\n\nLet's look at a concrete example:\n\n\n::: {.cell}\n\n```{.rust .cell-code}\n#[extendr]\nfn hello_world() -> &'static str {\n \"Hello world!\"\n}\n```\n:::\n\n::: {.cell}\n\n```{.r .cell-code}\n.Internal(inspect(hello_world()))\n#> @11c4bd628 16 STRSXP g0c1 [] (len=1, tl=0)\n#> @119641448 09 CHARSXP g0c2 [REF(2),gp=0x60,ATT] [ASCII] [cached] \"Hello world!\"\n```\n:::\n\n\nThen, any time R encounters `\"Hello world!\"`, it retrieves it from the pool, rather\nthan re-instantiating it:\n\n\n::: {.cell}\n\n```{.r .cell-code}\n.Internal(inspect(\"Hello world!\"))\n#> @11b2f0780 16 STRSXP g0c1 [REF(2)] (len=1, tl=0)\n#> @119641448 09 CHARSXP g0c2 [MARK,REF(3),gp=0x60,ATT] [ASCII] [cached] \"Hello world!\"\n```\n:::\n\n\nThe `STRSXP` is different, due to R's clone semantics, but the underlying\nstring `CHARSXP` is the same. 
Thus, equality is determined by whether two strings\nhave the same pointer, rather than whether they have the same bytes.\n\nTherefore, `extendr` does not provide mutable access to an R string, because that would break\nthe assumption that all strings are immutable.",
+    "supporting": [],
+    "filters": [
+      "rmarkdown/pagebreak.lua"
+    ],
+    "includes": {},
+    "engineDependencies": {},
+    "preserve": {},
+    "postProcess": true
+  }
+}
\ No newline at end of file
diff --git a/user-guide/.gitignore b/user-guide/.gitignore
new file mode 100644
index 0000000..20b9ac0
--- /dev/null
+++ b/user-guide/.gitignore
@@ -0,0 +1 @@
+_drafts/*
diff --git a/user-guide/type-mapping/characters.qmd b/user-guide/type-mapping/characters.qmd
index 01df022..5998856 100644
--- a/user-guide/type-mapping/characters.qmd
+++ b/user-guide/type-mapping/characters.qmd
@@ -1,3 +1,177 @@
 ---
 title: "Character Strings"
 ---
+
+```{r}
+#| echo: false
+library(rextendr)
+```
+
+In Rust, the standard type for a UTF-8 encoded string is `String`. An example of
+instantiating such a type:
+
+```{extendr, echo=TRUE}
+let mut rust_string = String::new();
+rust_string.push_str("Hello world!");
+rust_string
+```
+
+A direct translation of this to R is:
+```{r}
+r_string <- "Hello world!"
+r_string
+```
+
+Indeed, these are the same, as they contain the same UTF-8 bytes:
+
+```{r}
+charToRaw(r_string)
+```
+
+```{extendr}
+let bytes = String::from("Hello world!");
+let bytes = bytes.as_bytes().to_owned();
+bytes
+```
+
+
+Let us investigate the addresses of these two identical snippets of data:
+
+```{extendrsrc}
+#[extendr]
+fn hello_world() -> &'static str {
+    let hello_world = "Hello world!";
+    rprintln!("Address of the Rust `hello_world`: {:p}", hello_world.as_ptr());
+    hello_world
+}
+```
+
+
+```{r}
+hello_world()
+```
+
+And the address of `hello_world`, once it is part of the R runtime:
+
+```{r}
+.Internal(inspect(hello_world()))
+```
+
+::: {.callout-note}
+The return type of `hello_world` need not be `&'static str`. 
The lifetime can be
+arbitrary, such as `fn hello_world<'a>() -> &'a str`.
+:::
+
+A `character`-vector in R could be compared to a `Vec<String>` in Rust. However, there is an important distinction, which we'll illustrate with an example.
+
+```{extendr}
+let states = ["Idaho", "Texas", "Maine"]; // 5 letter states in USA
+let b_states = states.into_iter().map(|x| x.as_bytes()).flatten().collect::<Vec<_>>();
+b_states
+```
+
+And in R:
+
+```{r}
+# charToRaw(c("Idaho", "Texas", "Maine")) would only use the first element
+vapply(c("Idaho", "Texas", "Maine"), charToRaw, FUN.VALUE = raw(5))
+```
+
+But what about identity and permanence? Let us first look at an array of string slices with repeated values:
+
+```{extendr}
+let sample_states = ["Texas", "Maine", "Maine", "Idaho", "Maine", "Maine"];
+sample_states.into_iter()
+    .map(|x| format!("{:p}", x.as_ptr())).collect::<Vec<_>>()
+```
+
+and in R:
+
+```{r}
+sample_states <- c("Texas", "Maine", "Maine", "Idaho", "Maine", "Maine")
+.Internal(inspect(sample_states))
+```
+
+Thus, `[&str]` and `character` behave similarly. Let's investigate `&[String]`:
+
+
+
+```{extendr}
+[
+    "Texas".to_string(),
+    "Maine".to_string(),
+    "Maine".to_string(),
+    "Idaho".to_string(),
+    "Maine".to_string(),
+    "Maine".to_string(),
+]
+.iter()
+.map(|x| format!("{:p}", x.as_ptr()))
+.collect::<Vec<_>>()
+```
+
+
+
+```{extendr, echo=FALSE, eval=FALSE}
+let sample_states = [
+    "Texas",
+    "Maine",
+    "Maine",
+    "Idaho",
+    "Maine",
+    "Maine",
+];
+let mut state_ptrs = Vec::with_capacity(sample_states.len());
+let mut state_strings = Vec::with_capacity(sample_states.len());
+for state in sample_states {
+    let mut x_string = String::with_capacity(5);
+    x_string.push_str(state);
+    state_ptrs.push(format!("{:p}", x_string.as_ptr()));
+    state_strings.push(x_string);
+}
+state_ptrs
+```
+
+The memory addresses of all the items are different, even for those entries that have the same value.
+
+Thus, R's `character` more closely resembles `[&str]` than a container of `String`.
+
+
+
+The R runtime performs [string interning](https://en.wikipedia.org/wiki/String_interning) on
+all of its string elements. This means that whenever R encounters a new string,
+it adds it to its internal string intern pool, and every later use shares that copy. Therefore, it is unsound to
+access R strings mutably.
+
+::: {.callout-tip }
+A string intern pool can be thought of as a container that stores all distinct
+strings, and then provides a lightweight, reference-counted handle to each one.
+An example of such a string interner is the [`lasso`](https://crates.io/crates/lasso) crate.
+:::
+
+Let's look at a concrete example:
+
+```{extendrsrc}
+#[extendr]
+fn hello_world() -> &'static str {
+    "Hello world!"
+}
+```
+
+```{r}
+.Internal(inspect(hello_world()))
+```
+
+Then, any time R encounters `"Hello world!"`, it retrieves it from the pool, rather
+than re-instantiating it:
+
+```{r}
+.Internal(inspect("Hello world!"))
+```
+
+The `STRSXP` is different, due to R's clone semantics, but the underlying
+string `CHARSXP` is the same. Thus, equality is determined by whether two strings
+have the same pointer, rather than whether they have the same bytes.
+
+Therefore, `extendr` does not provide mutable access to an R string, because that would break
+the assumption that all strings are immutable.
\ No newline at end of file