Commit 8939726
perf: Optimize NULL handling in `substr` (#21519)
## Which issue does this PR close?
- Closes #21518.
## Rationale for this change
Similar to other recent changes, `substr` currently checks for NULLs and
builds the result NULL bitmap on a per-row basis. It is faster to
instead compute the result NULL bitmap in bulk via bitwise AND.
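As a rough illustration of the bulk approach (not the code in this PR), the sketch below combines two validity bitmaps with a word-wise AND. It assumes a simplified representation: each bitmap is a slice of `u64` words in which a set bit means "non-NULL", and `None` means the input has no NULLs. The names `and_validity` and `is_valid` are hypothetical; arrow-rs offers a comparable operation as `NullBuffer::union`, though whether this PR uses it is not stated here.

```rust
/// Combine two validity bitmaps (bit set = row is non-NULL) in bulk.
/// `None` stands for "no NULLs at all" in this simplified, hypothetical model.
fn and_validity(a: Option<&[u64]>, b: Option<&[u64]>) -> Option<Vec<u64>> {
    match (a, b) {
        // Neither input has NULLs, so the result has none either.
        (None, None) => None,
        // Only one input has NULLs: its bitmap is the result.
        (Some(x), None) | (None, Some(x)) => Some(x.to_vec()),
        // Both have NULLs: a row is valid only if valid in both inputs.
        // Each AND handles 64 rows at once instead of one row per branch.
        (Some(x), Some(y)) => Some(x.iter().zip(y.iter()).map(|(l, r)| l & r).collect()),
    }
}

/// Check a single row against a combined bitmap.
fn is_valid(validity: Option<&[u64]>, row: usize) -> bool {
    match validity {
        None => true,
        Some(words) => (words[row / 64] >> (row % 64)) & 1 == 1,
    }
}
```

With the result bitmap computed up front, the per-row loop no longer needs to branch on NULLs or append validity bits one at a time.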
Benchmarks (ARM64):
```
- substr, no count, short strings/substr_large_string [size=1024]: 21.4µs → 20.9µs (-2.3%)
- substr, no count, short strings/substr_large_string [size=4096]: 83.1µs → 83.0µs (-0.1%)
- substr, no count, short strings/substr_string [size=1024]: 20.5µs → 19.8µs (-3.4%)
- substr, no count, short strings/substr_string [size=4096]: 78.8µs → 77.0µs (-2.3%)
- substr, no count, short strings/substr_string_view [size=1024]: 18.9µs → 16.1µs (-14.8%)
- substr, no count, short strings/substr_string_view [size=4096]: 74.0µs → 61.6µs (-16.8%)
- substr, scalar args, long strings/substr_large_string [size=1024]: 35.2µs → 34.0µs (-3.4%)
- substr, scalar args, long strings/substr_large_string [size=4096]: 140.6µs → 134.5µs (-4.3%)
- substr, scalar args, long strings/substr_string [size=1024]: 35.5µs → 33.8µs (-4.8%)
- substr, scalar args, long strings/substr_string [size=4096]: 138.9µs → 134.2µs (-3.4%)
- substr, scalar args, long strings/substr_string_view [size=1024]: 34.0µs → 31.0µs (-8.8%)
- substr, scalar args, long strings/substr_string_view [size=4096]: 132.0µs → 121.8µs (-7.7%)
- substr, scalar args, short strings/substr_string [size=1024]: 31.0µs → 29.2µs (-5.8%)
- substr, scalar args, short strings/substr_string [size=4096]: 120.8µs → 111.5µs (-7.7%)
- substr, scalar args, short strings/substr_string_view [size=1024]: 26.8µs → 23.1µs (-13.8%)
- substr, scalar args, short strings/substr_string_view [size=4096]: 101.6µs → 86.4µs (-14.9%)
- substr, scalar start, no count, long strings/substr_string [size=1024]: 34.5µs → 33.2µs (-3.8%)
- substr, scalar start, no count, long strings/substr_string [size=4096]: 134.4µs → 133.6µs (-0.6%)
- substr, scalar start, no count, long strings/substr_string_view [size=1024]: 32.9µs → 29.4µs (-10.6%)
- substr, scalar start, no count, long strings/substr_string_view [size=4096]: 126.1µs → 115.2µs (-8.6%)
- substr, scalar start, no count, short strings/substr_string [size=1024]: 20.9µs → 20.1µs (-3.8%)
- substr, scalar start, no count, short strings/substr_string [size=4096]: 80.1µs → 77.5µs (-3.2%)
- substr, scalar start, no count, short strings/substr_string_view [size=1024]: 19.9µs → 16.7µs (-16.1%)
- substr, scalar start, no count, short strings/substr_string_view [size=4096]: 74.4µs → 62.4µs (-16.1%)
- substr, short count, long strings/substr_large_string [size=1024]: 30.3µs → 28.4µs (-6.3%)
- substr, short count, long strings/substr_large_string [size=4096]: 117.1µs → 112.0µs (-4.4%)
- substr, short count, long strings/substr_string [size=1024]: 30.2µs → 28.3µs (-6.3%)
- substr, short count, long strings/substr_string [size=4096]: 118.0µs → 111.0µs (-5.9%)
- substr, short count, long strings/substr_string_view [size=1024]: 26.1µs → 22.8µs (-12.6%)
- substr, short count, long strings/substr_string_view [size=4096]: 101.5µs → 87.7µs (-13.6%)
- substr, with count, long strings/substr_large_string [size=1024]: 34.6µs → 32.8µs (-5.2%)
- substr, with count, long strings/substr_large_string [size=4096]: 136.7µs → 133.0µs (-2.7%)
- substr, with count, long strings/substr_string [size=1024]: 34.2µs → 32.7µs (-4.4%)
- substr, with count, long strings/substr_string [size=4096]: 136.6µs → 132.3µs (-3.1%)
- substr, with count, long strings/substr_string_view [size=1024]: 33.3µs → 30.3µs (-9.0%)
- substr, with count, long strings/substr_string_view [size=4096]: 129.1µs → 119.6µs (-7.4%)
```
## What changes are included in this PR?
* Implement the optimization: compute the result NULL bitmap in bulk instead of checking for NULLs row by row.
* Rename `make_and_append_view` to `append_view` and have callers handle NULLs. Folding NULL handling into `append_view` encourages per-row NULL computations, which should be avoided when possible.
* Mark `append_view` as never-inline. This avoids a performance regression on some of the `substr` microbenchmarks, where LLVM is a little too eager to inline a large-ish function into a hot loop.
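The last two points can be sketched as follows. This is a minimal illustration under stated assumptions, not the PR's implementation: the `append_view` here is a hypothetical stand-in that records a `(start, len)` pair rather than building a real string view, and `substr_views` is an invented caller. It shows the shape of the change: the helper carries `#[inline(never)]` and knows nothing about NULLs, while the caller writes a view for every row and leaves validity to a separately computed bitmap.

```rust
/// Hypothetical stand-in for `append_view`: no NULL handling inside.
/// `#[inline(never)]` asks the compiler to keep this out of the caller's hot loop.
#[inline(never)]
fn append_view(out: &mut Vec<(u32, u32)>, start: u32, len: u32) {
    out.push((start, len));
}

/// Hypothetical caller: drop the first `skip` bytes of each string, where
/// `lens` holds the byte length of each row in a contiguous data buffer.
/// Every row gets a view; NULL rows are masked by the validity bitmap,
/// which the caller computes in bulk elsewhere.
fn substr_views(lens: &[u32], skip: u32) -> Vec<(u32, u32)> {
    let mut out = Vec::with_capacity(lens.len());
    let mut offset = 0u32;
    for &len in lens {
        // Clamp so that skipping past the end yields an empty view.
        let s = skip.min(len);
        append_view(&mut out, offset + s, len - s);
        offset += len;
    }
    out
}
```

Whether `#[inline(never)]` helps depends on the size of the function and on the surrounding loop, so it is worth confirming with benchmarks, as the PR description does.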
## Are these changes tested?
Yes.
## Are there any user-facing changes?
No.

PR #21519 · 1 parent e8d217a · commit 8939726
3 files changed
Lines changed: 38 additions & 49 deletions