Merge pull request #32 from condekind/docs/string_allocations
New subsection in the string chapter about allocations
MarcoGorelli authored Aug 21, 2024
2 parents 86bea37 + 7f15423 commit 89f149f
Showing 4 changed files with 65 additions and 1 deletion.
32 changes: 31 additions & 1 deletion docs/stringify.md
@@ -108,4 +108,34 @@ fn pig_latinnify(inputs: &[Series]) -> PolarsResult<Series> {
```

Simpler, faster, and more memory-efficient.
Thinking about allocations can really make a difference!
_Thinking about allocations_ can really make a difference!

## So let's think about allocations!

If you have an elementwise function which produces `String` output, then chances are it does one of the following:

- Creates a new string. In this case, you can use `apply_into_string_amortized` to amortise the cost of allocating a new string for each input row,
  as we did above in `pig_latinnify`. This works by allocating a `String` upfront and then repeatedly rewriting it, instead of allocating a fresh one per row.
- Slices the original string. In this case, you can use `apply_values` with `Cow::Borrowed`, for example:

```rust
// Assumes `use std::borrow::Cow;` is in scope.

// Returns a slice of the input, so no new string is allocated.
fn remove_last_extension(s: &str) -> &str {
    match s.rfind('.') {
        Some(pos) => &s[..pos],
        None => s,
    }
}

#[polars_expr(output_type=String)]
fn remove_extension(inputs: &[Series]) -> PolarsResult<Series> {
    let s = &inputs[0];
    let ca = s.str()?;
    let out: StringChunked = ca.apply_values(|val| {
        // Borrow from the original value instead of building a new `String`.
        let res = Cow::Borrowed(remove_last_extension(val));
        res
    });
    Ok(out.into_series())
}
```
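
To see the slicing behaviour in isolation, here is a minimal standard-library sketch that does not depend on Polars. The `strip_extensions` helper is hypothetical and only stands in for the per-value closure passed to `apply_values`; it is not part of the plugin or of the Polars API.

```rust
use std::borrow::Cow;

// Same helper as in the plugin: returns a sub-slice of the input.
fn remove_last_extension(s: &str) -> &str {
    match s.rfind('.') {
        Some(pos) => &s[..pos],
        None => s,
    }
}

// Hypothetical stand-in for the `apply_values` closure: every result is a
// `Cow::Borrowed` pointing into the original input, so no `String` is built.
fn strip_extensions<'a>(names: &[&'a str]) -> Vec<Cow<'a, str>> {
    names
        .iter()
        .map(|&name| Cow::Borrowed(remove_last_extension(name)))
        .collect()
}
```

Called on `["requirements.txt", "Makefile", "pkg.tar.gz", "tmp.d"]`, this yields `requirements`, `Makefile`, `pkg.tar` and `tmp`. Only the last extension is removed, and a name with no dot is returned unchanged.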

There are low-level optimisations you can do to take things further, but, if in doubt, `apply_into_string_amortized` / `binary_elementwise_into_string_amortized` are probably good enough.
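
As a rough illustration of what "amortised" means here, the sketch below reuses one scratch `String` across all rows and appends each result to a single contiguous buffer with offsets. This is a simplified model built only on the standard library, not Polars' actual implementation; `apply_amortized` and `pig_latin_into` are hypothetical names chosen for this example.

```rust
use std::fmt::Write;

// Hypothetical per-row function: writes the pig-latin form of `value`
// into `output` rather than returning a freshly allocated `String`.
fn pig_latin_into(value: &str, output: &mut String) {
    if let Some(first) = value.chars().next() {
        let rest = &value[first.len_utf8()..];
        write!(output, "{}{}ay", rest, first).unwrap();
    }
}

// Simplified model of an amortised apply: one scratch buffer is cleared and
// reused for every row, and results are copied into one contiguous `data`
// buffer. `offsets[i]..offsets[i + 1]` marks row `i` within `data`.
fn apply_amortized<F>(inputs: &[&str], mut f: F) -> (String, Vec<usize>)
where
    F: FnMut(&str, &mut String),
{
    let mut scratch = String::new();
    let mut data = String::new();
    let mut offsets = vec![0];
    for &value in inputs {
        scratch.clear(); // keeps the capacity, so later rows rarely reallocate
        f(value, &mut scratch);
        data.push_str(&scratch);
        offsets.push(data.len());
    }
    (data, offsets)
}
```

For example, `apply_amortized(&["hello", "world"], pig_latin_into)` returns the buffer `"ellohayorldway"` with offsets `[0, 7, 14]`. The scratch buffer only grows when a row needs more capacity than any earlier row did.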
9 changes: 9 additions & 0 deletions minimal_plugin/__init__.py
@@ -67,6 +67,15 @@ def pig_latinnify(expr: IntoExpr) -> pl.Expr:
    )


def remove_extension(expr: IntoExpr) -> pl.Expr:
    return register_plugin_function(
        args=[expr],
        plugin_path=LIB,
        function_name="remove_extension",
        is_elementwise=True,
    )


def abs_i64_fast(expr: IntoExpr) -> pl.Expr:
    return register_plugin_function(
        args=[expr],
7 changes: 7 additions & 0 deletions run.py
@@ -36,3 +36,10 @@
result = df.with_columns(interpolate=mp.interpolate("a"))
print(result)


df = pl.DataFrame({
    'filename': [
        "requirements.txt", "Makefile", "pkg.tar.gz", "tmp.d"
    ],
})
print(df.with_columns(without_ext=mp.remove_extension('filename')))
18 changes: 18 additions & 0 deletions src/expressions.rs
@@ -114,6 +114,24 @@ fn pig_latinnify(inputs: &[Series]) -> PolarsResult<Series> {
    Ok(out.into_series())
}

fn remove_last_extension(s: &str) -> &str {
    match s.rfind('.') {
        Some(pos) => &s[..pos],
        None => s,
    }
}

#[polars_expr(output_type=String)]
fn remove_extension(inputs: &[Series]) -> PolarsResult<Series> {
    let s = &inputs[0];
    let ca = s.str()?;
    let out: StringChunked = ca.apply_values(|val| {
        let res = Cow::Borrowed(remove_last_extension(val));
        res
    });
    Ok(out.into_series())
}

#[polars_expr(output_type=Int64)]
fn abs_i64_fast(inputs: &[Series]) -> PolarsResult<Series> {
    let s = &inputs[0];