Skip to content

Optimize splitPart scalar function to reduce allocation#17660

Open
justahuman1 wants to merge 3 commits intoapache:masterfrom
justahuman1:split-part-malloc-fix
Open

Optimize splitPart scalar function to reduce allocation#17660
justahuman1 wants to merge 3 commits intoapache:masterfrom
justahuman1:split-part-malloc-fix

Conversation

@justahuman1
Copy link

@justahuman1 justahuman1 commented Feb 7, 2026

Optimized the splitPart function to use direct string traversal instead of allocating a full String[] array, significantly reducing memory pressure in high-throughput query scenarios.

Performance improvements:

  • Space complexity: O(n) → O(1)
  • Time complexity: O(n) for both implementations

Added JMH benchmarks to demonstrate improvements and regressions. The primary regression is in the backward index case so I added splitPartNegativeIdxSingleCharDelim (b4272e2) which increases complexity but makes the kernel much faster

Addresses #17362

JMH Results

Yeah, the common cases are more than 10x faster 🚀 . The original results showed a regression with negative_index, hence b4272e2. However, this code may be harder to manage so that is the trade-off. The below results are from all commits in the PR

Operation Type New (ns/op) Old (ns/op) Speedup (Old/New)
small_index 777.8 38,722.2 ≈ 49.8× faster
large_index 295,474.5 3,867,775.5 ≈ 13.1× faster
negative_index 13.2 38,635.4 ≈ 2,929× faster
large_negative_index 47,797.4 3,824,985.5 ≈ 80× faster
adjacent_delimiters 779.9 3,864.4 ≈ 5× faster
Benchmark                                       (_scenario)  Mode  Cnt           Score          Error  Units
BenchmarkSplitPart.testSplitPartNew             small_index  avgt   10         777.797 ±       22.025  ns/op
BenchmarkSplitPart.testSplitPartNew             large_index  avgt   10      295474.514 ±     2876.945  ns/op
BenchmarkSplitPart.testSplitPartNew          negative_index  avgt   10          13.198 ±        2.553  ns/op
BenchmarkSplitPart.testSplitPartNew    large_negative_index  avgt   10       47797.384 ±      465.119  ns/op
BenchmarkSplitPart.testSplitPartNew     adjacent_delimiters  avgt   10         779.944 ±       14.194  ns/op
BenchmarkSplitPart.testSplitPartNew             many_fields  avgt   10        1935.942 ±       59.342  ns/op
BenchmarkSplitPart.testSplitPartNew        multi_char_delim  avgt   10        4281.167 ±      102.780  ns/op
BenchmarkSplitPart.testSplitPartNew  large_multi_char_delim  avgt   10     1160607.923 ±    20648.116  ns/op
BenchmarkSplitPart.testSplitPartOld             small_index  avgt   10       38722.193 ±     1463.000  ns/op
BenchmarkSplitPart.testSplitPartOld             large_index  avgt   10     3867775.470 ±   107083.008  ns/op
BenchmarkSplitPart.testSplitPartOld          negative_index  avgt   10       38635.403 ±     1325.020  ns/op
BenchmarkSplitPart.testSplitPartOld    large_negative_index  avgt   10     3824985.465 ±    89641.937  ns/op
BenchmarkSplitPart.testSplitPartOld     adjacent_delimiters  avgt   10        3864.391 ±      101.146  ns/op
BenchmarkSplitPart.testSplitPartOld             many_fields  avgt   10     3814644.277 ±   132924.021  ns/op
BenchmarkSplitPart.testSplitPartOld        multi_char_delim  avgt   10      182955.787 ±     3254.768  ns/op
BenchmarkSplitPart.testSplitPartOld  large_multi_char_delim  avgt   10  1043425954.300 ± 29614093.332  ns/op

Allocation via -prof gc

     "org.apache.pinot.perf.BenchmarkSplitPart.testSplitPartNew:gc.alloc.rate.norm":  48.000 B/op
     "org.apache.pinot.perf.BenchmarkSplitPart.testSplitPartNew:gc.alloc.rate.norm":  48.001 B/op
     "org.apache.pinot.perf.BenchmarkSplitPart.testSplitPartNew:gc.alloc.rate.norm":  48.002 B/op
     "org.apache.pinot.perf.BenchmarkSplitPart.testSplitPartNew:gc.alloc.rate.norm":  48.007 B/op
     "org.apache.pinot.perf.BenchmarkSplitPart.testSplitPartNew:gc.alloc.rate.norm":  56.036 B/op
     "org.apache.pinot.perf.BenchmarkSplitPart.testSplitPartNew:gc.alloc.rate.norm":  56.043 B/op
     "org.apache.pinot.perf.BenchmarkSplitPart.testSplitPartNew:gc.alloc.rate.norm":  6616.001 B/op

     "org.apache.pinot.perf.BenchmarkSplitPart.testSplitPartOld:gc.alloc.rate.norm":  304.000 B/op
     "org.apache.pinot.perf.BenchmarkSplitPart.testSplitPartOld:gc.alloc.rate.norm":  3560.001 B/op
     "org.apache.pinot.perf.BenchmarkSplitPart.testSplitPartOld:gc.alloc.rate.norm":  6616.001 B/op
     "org.apache.pinot.perf.BenchmarkSplitPart.testSplitPartOld:gc.alloc.rate.norm":  760984.043 B/op
     "org.apache.pinot.perf.BenchmarkSplitPart.testSplitPartOld:gc.alloc.rate.norm":  760984.044 B/op
     "org.apache.pinot.perf.BenchmarkSplitPart.testSplitPartOld:gc.alloc.rate.norm":  760984.045 B/op
     "org.apache.pinot.perf.BenchmarkSplitPart.testSplitPartOld:gc.alloc.rate.norm":  760984.046 B/op

* TODO: Revisit if index should be one-based (both Presto and Postgres use one-based index, which starts with 1)
* @param input
* @param delimiter
* @param input the input String to be split into parts.
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

copied verbatim from the the overloaded splitPart function below to keep in sync.

@justahuman1 justahuman1 force-pushed the split-part-malloc-fix branch from 33336f1 to 5a2b4f1 Compare February 7, 2026 20:53
{"org.apache.pinot.common.function", ".", 3, 3, "common", "null"},
{"+++++", "+", 0, 100, "", ""},
{"+++++", "+", 1, 100, "null", "null"},
{"+++++org++apache++", "", 1, 100, "null", "null"},
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These were missing cases I added to validate identical behavior to previous version

@@ -0,0 +1,160 @@
/**
* Licensed to the Apache Software Foundation (ASF) under one
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can remove this if required. Added it for my own validation of the perf

@justahuman1 justahuman1 force-pushed the split-part-malloc-fix branch from 5a2b4f1 to 24644cf Compare February 7, 2026 21:38
@justahuman1 justahuman1 changed the title Update splitPart scalar function to reduce allocation Optimize splitPart scalar function to reduce allocation Feb 10, 2026
@codecov-commenter
Copy link

codecov-commenter commented Feb 13, 2026

Codecov Report

❌ Patch coverage is 92.98246% with 4 lines in your changes missing coverage. Please review.
✅ Project coverage is 63.33%. Comparing base (7eec9a7) to head (cb95a56).

Files with missing lines Patch % Lines
.../pinot/common/function/scalar/StringFunctions.java 92.98% 1 Missing and 3 partials ⚠️
Additional details and impacted files
@@             Coverage Diff              @@
##             master   #17660      +/-   ##
============================================
+ Coverage     63.29%   63.33%   +0.03%     
  Complexity     1466     1466              
============================================
  Files          3190     3190              
  Lines        191904   191955      +51     
  Branches      29390    29410      +20     
============================================
+ Hits         121474   121582     +108     
+ Misses        60940    60859      -81     
- Partials       9490     9514      +24     
Flag Coverage Δ
custom-integration1 100.00% <ø> (ø)
integration 100.00% <ø> (ø)
integration1 100.00% <ø> (ø)
integration2 0.00% <ø> (ø)
java-11 63.29% <92.98%> (+0.04%) ⬆️
java-21 63.24% <92.98%> (-0.02%) ⬇️
temurin 63.33% <92.98%> (+0.03%) ⬆️
unittests 63.33% <92.98%> (+0.03%) ⬆️
unittests1 55.60% <92.98%> (+<0.01%) ⬆️
unittests2 34.30% <0.00%> (+0.03%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@justahuman1
Copy link
Author

I will debug these tests and update the PR. I will also try to get the existing coverage to 100% so we can be confident before migrating

@Jackie-Jiang
Copy link
Contributor

Try rebasing. The test failures might not be related

Updated the splitPart function to use direct string traversal instead of
allocating a full String[] array, significantly reducing memory pressure
in high-throughput query scenarios.

Performance improvements:
  - Space complexity: O(n) → O(1)
  - Time complexity: O(n) for both implementations

Added JMH benchmarks to demonstrate improvements and regressions. The primary regression is in the backward index case with a large index value. This is the uncommon case and maybe worth the tradeoff since the memory allocation in the common case is now significantly reduced.
@justahuman1 justahuman1 force-pushed the split-part-malloc-fix branch from b4272e2 to cb95a56 Compare March 7, 2026 04:16
@justahuman1
Copy link
Author

Done. I also added full test coverage which uncovered a behavior change with this new implementation, let me know your thoughts @Jackie-Jiang

Behavioral difference from old implementation

The new implementation differs from StringUtils.splitByWholeSeparator for empty input strings only. The old approach returns an empty array for "", treating it as having 0 fields.
The new implementation treats "" as having 1 field (the empty string itself), which is consistent with how non-empty strings without a matching delimiter are handled:

input delimiter index old impl new impl
"" "." 0 "null" ""
"" "." -1 "null" ""
"" "::" -1 "null" ""
"a" "." 0 "a" "a" (same)

I feel that the new implementation is more correct, rather than returning null. wdyt?

Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR optimizes the splitPart scalar function in pinot-common to avoid allocating a full String[] split result, reducing memory pressure for high-throughput query workloads. It also adds a JMH benchmark and expands unit test coverage for additional edge cases.

Changes:

  • Reimplemented StringFunctions.splitPart(input, delimiter, index) using direct string traversal (including a specialized negative-index fast path for single-char delimiters).
  • Added JMH benchmarks to compare the new vs old splitPart implementations.
  • Added many new splitPart unit test cases covering edge scenarios (empty delimiter, consecutive delimiters, leading/trailing delimiters, multi-char delimiters, empty input).

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 3 comments.

File Description
pinot-common/src/main/java/org/apache/pinot/common/function/scalar/StringFunctions.java Replaces splitPart implementation with allocation-reduced traversal logic and adds a negative-index optimized helper.
pinot-common/src/test/java/org/apache/pinot/common/function/scalar/StringFunctionsTest.java Expands splitPart test matrix for more delimiter/index boundary cases.
pinot-perf/src/main/java/org/apache/pinot/perf/BenchmarkSplitPart.java Introduces a JMH benchmark to measure old vs new splitPart performance across scenarios.

Comment on lines +101 to +102
_delimiter = ":";
_input = buildString(50, repeatStr(_delimiter, 10));
Copy link

Copilot AI Mar 8, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the multi_char_delim scenario, _delimiter is set to ":" but _input is built using repeatStr(_delimiter, 10) as the delimiter between fields. This means the benchmark isn't actually exercising a multi-character delimiter for splitPart (it passes a single-char delimiter that happens to appear consecutively), so the results can be misleading. Consider setting _delimiter to the repeated string (or use something like "::") and building _input with that same _delimiter.

Suggested change
_delimiter = ":";
_input = buildString(50, repeatStr(_delimiter, 10));
_delimiter = repeatStr(":", 10);
_input = buildString(50, _delimiter);

Copilot uses AI. Check for mistakes.
Comment on lines +69 to +97
case "large_index":
// Extract 80th element from 100-field string
_delimiter = ",";
_input = buildString(10000, _delimiter);
_index = 790;
break;
case "negative_index":
// Extract last element from 100-field string
_delimiter = ",";
_input = buildString(100, _delimiter);
_index = -1;
break;
case "large_negative_index":
// Extract last element from 100-field string
_delimiter = ",";
_input = buildString(10000, _delimiter);
_index = -7242;
break;
case "adjacent_delimiters":
// String with many consecutive delimiters
_input = "field0+++field1+++field2+++field3";
_delimiter = "+";
_index = 1;
break;
case "many_fields":
// Extract 5th element from 1000-field string
_delimiter = ",";
_input = buildString(10000, _delimiter);
_index = 4;
Copy link

Copilot AI Mar 8, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Several scenario comments don't match the actual parameters (e.g., large_index/many_fields mention 100/1000 fields but build 10,000; large_index says 80th but uses index 790). Align the comments (or the parameters) so the benchmark scenarios remain self-explanatory.

Copilot uses AI. Check for mistakes.
start += delimLen;
}
// optimization for negative index with single-char delimiter since common case
// multi-char delimiter with negative index can be handled in future since less common and more complex
Copy link

Copilot AI Mar 8, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

-index will overflow when index == Integer.MIN_VALUE, causing incorrect behavior for that edge case (and potentially passing a negative value into the helper). Add an explicit guard for index == Integer.MIN_VALUE (e.g., return "null" early) or handle the absolute-value conversion without overflow.

Suggested change
// multi-char delimiter with negative index can be handled in future since less common and more complex
// multi-char delimiter with negative index can be handled in future since less common and more complex
if (index == Integer.MIN_VALUE) {
return "null";
}

Copilot uses AI. Check for mistakes.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants