-
Notifications
You must be signed in to change notification settings - Fork 483
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ORC-1767: [C++] Improve writing performance of encoded string column and support EncodedStringVectorBatch for StringColumnWriter #2010
Conversation
…dd/StringColumnWriter::add/CharColumnWriter::add
Thanks for opening a PR! Could you please create a JIRA for it? |
@wgtmac could you help review this pr, thanks very much |
@taiyang-li Hi, thank you for opening a PR! |
@ffacs I had measured about this. From flamegraph we could see that |
Oh yes, std::map takes |
@ffacs do you know where can I add google benchmark in orc ? |
I added a benchmark $ ./build_gcc/src/Common/benchmarks/orc_string_dictionary
2024-08-26T15:00:09+08:00
Running ./build_gcc/src/Common/benchmarks/orc_string_dictionary
Run on (32 X 2100 MHz CPU s)
CPU Caches:
L1 Data 32 KiB (x16)
L1 Instruction 32 KiB (x16)
L2 Unified 1024 KiB (x16)
L3 Unified 11264 KiB (x2)
Load Average: 6.35, 7.05, 6.55
------------------------------------------------------------------------------------------------------
Benchmark Time CPU Iterations
------------------------------------------------------------------------------------------------------
BM_writeStringDictionary<OldSortedStringDictionary, 10> 74094421 ns 74091713 ns 9
BM_writeStringDictionary<NewSortedStringDictionary, 10> 73200223 ns 73197770 ns 9
BM_writeStringDictionary<OldSortedStringDictionary, 100> 137097736 ns 137091909 ns 5
BM_writeStringDictionary<NewSortedStringDictionary, 100> 86304767 ns 86301101 ns 8
BM_writeStringDictionary<OldSortedStringDictionary, 1000> 237995798 ns 237986072 ns 3
BM_writeStringDictionary<NewSortedStringDictionary, 1000> 101562538 ns 101559063 ns 7
BM_writeStringDictionary<OldSortedStringDictionary, 10000> 397019465 ns 397002121 ns 2
BM_writeStringDictionary<NewSortedStringDictionary, 10000> 153278355 ns 153274828 ns 4
BM_writeStringDictionary<OldSortedStringDictionary, 100000> 787423737 ns 787397582 ns 1
BM_writeStringDictionary<NewSortedStringDictionary, 100000> 334958714 ns 334938666 ns 2
|
|
@ffacs any new requested changes ? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This path almost LGTM other than the typo. Thank you @taiyang-li !
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
…and support EncodedStringVectorBatch for StringColumnWriter Improve writing performance of encoded string column and support EncodedStringVectorBatch for StringColumnWriter. Performance was measured in ClickHouse#15 original tests. Closes apache#2010 from taiyang-li/apache_improve_dict_write. Lead-authored-by: taiyang-li <[email protected]> Co-authored-by: 李扬 <[email protected]> Signed-off-by: ffacs <[email protected]>
…and support EncodedStringVectorBatch for StringColumnWriter Improve writing performance of encoded string column and support EncodedStringVectorBatch for StringColumnWriter. Performance was measured in ClickHouse#15 original tests. Closes apache#2010 from taiyang-li/apache_improve_dict_write. Lead-authored-by: taiyang-li <[email protected]> Co-authored-by: 李扬 <[email protected]> Signed-off-by: ffacs <[email protected]>
What changes were proposed in this pull request?
Improve writing performance of encoded string column and support EncodedStringVectorBatch for StringColumnWriter.
Performance was measured in ClickHouse#15
Why are the changes needed?
How was this patch tested?
original tests.
Was this patch authored or co-authored using generative AI tooling?