-
Notifications
You must be signed in to change notification settings - Fork 66
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Draft] Optimize Hive hash
computation for nested types
#2720
base: branch-25.02
Are you sure you want to change the base?
[Draft] Optimize Hive hash
computation for nested types
#2720
Conversation
Signed-off-by: Yan Feng <[email protected]>
Signed-off-by: Yan Feng <[email protected]>
Hive hash
computation for nested typesHive hash
computation for nested types
Signed-off-by: Yan Feng <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you post the benchmark results here?
Maybe also how it failed on the string type and how to reproduce it if you need help on that issue.
#include <nvbench/nvbench.cuh> | ||
|
||
constexpr auto min_width = 10; | ||
constexpr auto max_width = 10; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this two be removed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The optimization effect is related to the schema, and I'm not sure how to write the benchmark for a good comparison.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The benchmark should support running either (or both) lists and structs. You can switch benchmark types using a boolean flag. That flag should not be constexpr
so we can compile all the code to verify syntax.
auto const bench_structs = false;
auto const data_profile = [&] {
if (bench_structs) {
...
.struct_types(...);
return ...;
} else {
...
.list_type(...);
return ...;
}();
250'000'000, | ||
500'000'000, | ||
1'000'000'000}); // 50MB, 100MB, 250MB, 500MB, 1GB | ||
//.add_int64_axis("list_depth", {1, 2, 4}); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
pls remember to remove those commented code before it is ready for review
Signed-off-by: Yan Feng <[email protected]>
Signed-off-by: Yan Feng <[email protected]>
Signed-off-by: Yan Feng <[email protected]>
@@ -419,12 +452,12 @@ class hive_device_row_hasher { | |||
top.update_cur_hash(single_level_list_hash); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If I am not wrong, these two lines
top.update_cur_hash(single_level_list_hash);
if (--stack_size > 0) { col_stack[stack_size - 1].update_cur_hash(top.get_hash()); }
equals to
if (--stack_size > 0) { col_stack[stack_size - 1].update_cur_hash(single_level_list_hash); }
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's not correct.
If the calculation is for a single-layer list
, removing top.update_cur_hash(single_level_list_hash);
will result in return col_stack[0].get_hash();
not giving the correct answer.
@@ -486,15 +526,77 @@ std::unique_ptr<cudf::column> hive_hash(cudf::table_view const& input, | |||
|
|||
check_nested_depth(input); | |||
|
|||
// `flattened_column_views` only contains nested columns and columns that result from flattening | |||
// nested columns | |||
std::vector<cudf::column_view> flattened_column_views; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The column view constructor will calculate null count which is time consuming.
The original approach does not need to calculate null count.
We may need to find a way to avoid this column view array.
Please help check if the contiguous_copy_column_device_views
is helpful?
// `input` to the index in `flattened_column_views` | ||
std::vector<cudf::size_type> nested_column_map; | ||
|
||
for (auto i = 0; i < input.num_columns(); i++) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
extract a function like get_flattened_device_views
Fixed the struct(string) type reports illegal memory access error: diff --git a/src/main/cpp/src/hive_hash.cu b/src/main/cpp/src/hive_hash.cu
index ca720b5..5e9bb35 100644
--- a/src/main/cpp/src/hive_hash.cu
+++ b/src/main/cpp/src/hive_hash.cu
@@ -22,6 +22,7 @@
#include <cudf/structs/structs_column_view.hpp>
#include <cudf/table/experimental/row_operators.cuh>
#include <cudf/table/table_device_view.cuh>
+#include <cudf/table/table_device_view.cuh>
#include <rmm/cuda_stream_view.hpp>
#include <rmm/exec_policy.hpp>
@@ -566,17 +567,9 @@ std::unique_ptr<cudf::column> hive_hash(cudf::table_view const& input,
}
}
- std::vector<cudf::column_device_view> device_flattened_column_views;
- device_flattened_column_views.reserve(flattened_column_views.size());
-
- std::transform(
- flattened_column_views.begin(),
- flattened_column_views.end(),
- std::back_inserter(device_flattened_column_views),
- [&stream](auto const& col) { return *cudf::column_device_view::create(col, stream); });
+ [[maybe_unused]] auto [device_view_owners, device_flattened_column_views] =
+ cudf::contiguous_copy_column_device_views<cudf::column_device_view>(flattened_column_views, stream);
- auto flattened_column_device_views =
- cudf::detail::make_device_uvector_async(device_flattened_column_views, stream, mr);
auto first_child_index_view =
cudf::detail::make_device_uvector_async(first_child_index, stream, mr);
auto nested_column_map_view =
@@ -594,7 +587,7 @@ std::unique_ptr<cudf::column> hive_hash(cudf::table_view const& input,
output_view.end<hive_hash_value_t>(),
hive_device_row_hasher<hive_hash_function, bool>(nullable,
*input_view,
- flattened_column_device_views.data(),
+ device_flattened_column_views,
first_child_index_view.data(),
nested_column_map_view.data()));
Please apply the above patch. |
Signed-off-by: Yan Feng <[email protected]>
- [&stream](auto const& col) { return *cudf::column_device_view::create(col, stream); }); Remember to never dereference the output from |
cudf::column_device_view _column; // the column has only one row | ||
hive_hash_value_t _cur_hash; // current hash value of the column | ||
int _idx_to_process; // the index of child or element to process next | ||
cudf::size_type _col_idx; // the column has only one row |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
cudf::size_type _col_idx; // the column has only one row | |
cudf::size_type _col_idx; // the column index in the flattened array |
auto first_child_index_view = | ||
cudf::detail::make_device_uvector_async(first_child_index, stream, mr); | ||
auto nested_column_map_view = | ||
cudf::detail::make_device_uvector_async(nested_column_map, stream, mr); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The memory resource mr
is only passed to the function that creates the the final output (or part of it). These variables are not returned thus we only pass in cudf::get_current_device_resource_ref()
.
Currently they are the same but this is required for future evolution of libcudf.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And please check if there are similar situations in the other places that need to be changed.
Main Optimization:
Flatten nested columns in advance, reducing the size of the
stack_frame
.Existing Issue:
string
.🤔string
, such as executing./build/build-in-docker package -Dtest=com.nvidia.spark.rapids.jni.HashTest#testHiveHashLists
,the following error occurs:
ai.rapids.cudf.CudaFatalException: parallel_for failed: cudaErrorIllegalAddress: an illegal memory access was encountered
.TODO:
Possible Further Optimization:
flattened_column_views
,first_child_index
andnested_column_map
to shared memory.xxhash64
does not require_cur_hash
, which presents a challenge for unification.Benchmark:
struct<BOOL8, INT32, FLOAT32>
list
with a depth oflist_depth
, with the basic type beingINT32