GH-47520: [C++][Tensor] Correct sparse tensor creation from dense tensor with negative zero #47586

andishgar · 2025-09-17T13:14:16Z

Rationale for this change

As described in case 1 of the linked issue.

What changes are included in this PR?

Handle negative zero in sparse tensor creation.

Are these changes tested?

Yes, I ran the relevant unit tests.

Are there any user-facing changes?

No.

This PR contains a "Critical Fix".
(If the changes fix either (a) a security vulnerability, (b) a bug that causes incorrect or invalid data to be produced, or (c) a bug that causes a crash—even when the API contract is upheld—please provide an explanation. If not, you can remove this.)

Reference: case 1

GitHub Issue: [C++][Tensor] Incorrect logic for creating arrow::SparseCOOTensor from Tensor via arrow::SparseCOOTensor::Make (illegal memory access and incorrect values) #47520

github-actions · 2025-09-17T13:14:42Z

⚠️ GitHub issue #47520 has been automatically assigned in GitHub to PR creator.

andishgar · 2025-09-22T12:19:57Z

@rok, could you review this?

rok

I've done a high-level pass and posted some questions. I want to do a deeper, more detailed pass later this week.

The templating approach seems more maintainable. We should make sure everything is tested.

cpp/src/arrow/tensor/converter.h

cpp/src/arrow/tensor/converter_internal.h

cpp/src/arrow/tensor/coo_converter.cc

cpp/src/arrow/tensor.h

cpp/src/arrow/tensor/coo_converter.cc

rok

Thanks for the update, @andishgar!
I read through the logic and things look good to me. It's especially nice that element size calculations are no longer done by us. Some minor comments, but overall I think this is practically ready to merge.
I would now ask @pitrou to do a pass, especially on the template part.
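To illustrate the point about templating and element sizes, here is a minimal sketch, not the code from this PR. `CooMatrix`, `DenseToCoo`, and `CooToDense` are hypothetical names, and the example is limited to a row-major 2-D matrix. Because the conversion is templated on the value type, indexing is typed and the compiler works out the element size; the zero test is a value comparison, so both `+0.0` and `-0.0` are skipped.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical COO container for illustration; not Arrow's SparseCOOTensor.
template <typename T>
struct CooMatrix {
  std::int64_t rows = 0;
  std::int64_t cols = 0;
  std::vector<std::int64_t> row_indices;
  std::vector<std::int64_t> col_indices;
  std::vector<T> values;
};

// Converts a row-major dense matrix to COO. Elements are read as T, so no
// manual byte offsets or element-size arithmetic are needed.
template <typename T>
CooMatrix<T> DenseToCoo(const std::vector<T>& dense, std::int64_t rows,
                        std::int64_t cols) {
  assert(static_cast<std::int64_t>(dense.size()) == rows * cols);
  CooMatrix<T> out;
  out.rows = rows;
  out.cols = cols;
  for (std::int64_t r = 0; r < rows; ++r) {
    for (std::int64_t c = 0; c < cols; ++c) {
      const T value = dense[static_cast<std::size_t>(r * cols + c)];
      // Value comparison: +0.0 and -0.0 both compare equal to T(0).
      if (value != T(0)) {
        out.row_indices.push_back(r);
        out.col_indices.push_back(c);
        out.values.push_back(value);
      }
    }
  }
  return out;
}

// Expands the COO form back to a row-major dense matrix filled with T(0).
template <typename T>
std::vector<T> CooToDense(const CooMatrix<T>& coo) {
  std::vector<T> dense(static_cast<std::size_t>(coo.rows * coo.cols), T(0));
  for (std::size_t i = 0; i < coo.values.size(); ++i) {
    const std::int64_t offset = coo.row_indices[i] * coo.cols + coo.col_indices[i];
    dense[static_cast<std::size_t>(offset)] = coo.values[i];
  }
  return dense;
}
```

In this sketch a round trip through `DenseToCoo` and `CooToDense` compares equal to the input by value, although a `-0.0` in the input comes back as `+0.0`.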

cpp/src/arrow/tensor/csx_converter.cc

cpp/src/arrow/tensor.cc

cpp/src/arrow/sparse_tensor_test.cc

cpp/src/arrow/tensor/csx_converter.cc

rok

I have no further comments. I'll add this to the release milestone and hopefully @pitrou has time to review it before release. (Many Arrow contributors are going to PyData Paris next week so I'm not sure how long things will take.)

rok · 2025-10-06T10:18:38Z

@pitrou could you perchance review this before the release? (We've already done multiple iterations.)

pitrou

Ok, I tried to read this PR but this is making many changes that do not seem related to the task of fixing a simple bug. Can we have a more minimal set of changes?

cpp/src/arrow/sparse_tensor_test.cc

andishgar · 2025-10-07T09:39:50Z

@pitrou Thanks for the feedback!

> Ok, I tried to read this PR but this is making many changes that do not seem related to the task of fixing a simple bug.
> Can we have a more minimal set of changes?

The issue is that the current zero-checking logic is incorrect across all sparse tensor creation paths, and that can lead to memory issues when tensors are created from negative zero values. So the fix touches several places that share this logic.

Do you think it would still make sense to split it further? (This PR is already a breakdown of a larger patch.)
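For readers unfamiliar with the negative-zero problem, the following is a minimal sketch of how two zero-checking rules can disagree. It assumes IEEE 754 doubles, and all helper names are hypothetical; it is not taken from Arrow's converters. A byte-wise check sees the sign bit of `-0.0` and reports the element as non-zero, while a value comparison treats it as zero. If one pass sized a buffer with one rule and another pass filled it with the other, the counts would not match, which is one plausible route to the memory issues described above.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: an element counts as zero only if all its bytes are 0.
// -0.0 has its sign bit set, so this rule reports it as non-zero.
inline bool IsZeroBytewise(const void* element, std::size_t element_size) {
  assert(element != nullptr);
  const auto* bytes = static_cast<const unsigned char*>(element);
  for (std::size_t i = 0; i < element_size; ++i) {
    if (bytes[i] != 0) return false;
  }
  return true;
}

// Hypothetical helper: compares by value, so +0.0 and -0.0 are both zero.
template <typename T>
bool IsZeroByValue(T value) {
  return value == T(0);
}

// Counts non-zero elements using the byte-wise rule.
inline std::int64_t CountNonZeroBytewise(const std::vector<double>& values) {
  std::int64_t count = 0;
  for (const double& v : values) {
    if (!IsZeroBytewise(&v, sizeof(v))) ++count;
  }
  return count;
}

// Counts non-zero elements using the value rule.
inline std::int64_t CountNonZeroByValue(const std::vector<double>& values) {
  std::int64_t count = 0;
  for (double v : values) {
    if (!IsZeroByValue(v)) ++count;
  }
  return count;
}
```

For an input such as `{0.0, -0.0, 1.5, 0.0}` the two counters disagree, so using a single value-based rule everywhere keeps the counting and filling passes consistent.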

pitrou · 2025-10-07T09:43:59Z

Ok, I'll take another look here.

cpp/src/arrow/tensor/converter.h

cpp/src/arrow/tensor/converter_internal.h

cpp/src/arrow/tensor/coo_converter.cc

cpp/src/arrow/tensor/converter.h

cpp/src/arrow/tensor/csf_converter.cc

cpp/src/arrow/tensor/csx_converter.cc

andishgar · 2025-10-07T19:12:26Z

@pitrou @rok
I’ve identified another issue while applying this change. It appears that NaN values are not handled correctly.
To clarify, the patches below demonstrate that NaN values are not properly managed during sparse tensor creation.
Would you prefer that I address this within the current pull request, or should I open a separate issue to track it?

diff --git a/cpp/src/arrow/sparse_tensor_test.cc b/cpp/src/arrow/sparse_tensor_test.cc
index c9c28a1..6aa5835 100644
--- a/cpp/src/arrow/sparse_tensor_test.cc
+++ b/cpp/src/arrow/sparse_tensor_test.cc
@@ -499,6 +499,8 @@ TYPED_TEST_P(TestFloatingSparseCOOTensorEquality, TestEquality) {
   ASSERT_OK_AND_ASSIGN(st4, SparseCOOTensor::Make(*this->tensor2_));
   EXPECT_FALSE(st4->Equals(*st4));                                  // same object
   EXPECT_TRUE(st4->Equals(*st4, EqualOptions().nans_equal(true)));  // same object
+  ASSERT_OK_AND_ASSIGN(auto my_tensor,st4->ToTensor());
+  ASSERT_TRUE(my_tensor->Equals(*this->tensor2_));
 
   std::vector<c_value_type> values5 = this->values2_;
   std::shared_ptr<SparseCOOTensor> st5;
@@ -955,6 +957,8 @@ TYPED_TEST_P(TestFloatingSparseCSRMatrixEquality, TestEquality) {
   ASSERT_OK_AND_ASSIGN(st4, SparseCSRMatrix::Make(*this->tensor2_));
   EXPECT_FALSE(st4->Equals(*st4));                                  // same object
   EXPECT_TRUE(st4->Equals(*st4, EqualOptions().nans_equal(true)));  // same object
+  ASSERT_OK_AND_ASSIGN(auto my_tensor,st4->ToTensor());
+  ASSERT_TRUE(my_tensor->Equals(*this->tensor2_));
 
   std::vector<c_value_type> values5 = this->values2_;
   std::shared_ptr<SparseCSRMatrix> st5;
@@ -1290,6 +1294,8 @@ TYPED_TEST_P(TestFloatingSparseCSCMatrixEquality, TestEquality) {
   ASSERT_OK_AND_ASSIGN(st4, SparseCSCMatrix::Make(*this->tensor2_));
   EXPECT_FALSE(st4->Equals(*st4));                                  // same object
   EXPECT_TRUE(st4->Equals(*st4, EqualOptions().nans_equal(true)));  // same object
+  ASSERT_OK_AND_ASSIGN(auto my_tensor,st4->ToTensor());
+  ASSERT_TRUE(my_tensor->Equals(*this->tensor2_));
 
   std::vector<c_value_type> values5 = this->values2_;
   std::shared_ptr<SparseCSCMatrix> st5;
@@ -1411,6 +1417,8 @@ TYPED_TEST_P(TestFloatingSparseCSFTensorEquality, TestEquality) {
   ASSERT_OK_AND_ASSIGN(st4, SparseCSFTensor::Make(*this->tensor2_));
   EXPECT_FALSE(st4->Equals(*st4));                                  // same object
   EXPECT_TRUE(st4->Equals(*st4, EqualOptions().nans_equal(true)));  // same object
+  ASSERT_OK_AND_ASSIGN(auto my_tensor,st4->ToTensor());
+  ASSERT_TRUE(my_tensor->Equals(*this->tensor2_));
 
   c_value_type values5[2][3][4][5] = {};
   std::copy_n(&this->values2_[0][0][0][0], this->length_ / sizeof(c_value_type),

pitrou · 2025-10-07T19:14:49Z

It depends, is the fix simple enough to be integrated here?

andishgar · 2025-10-07T19:23:52Z

> It depends, is the fix simple enough to be integrated here?
> What causes it?

@pitrou
I’m not sure about the cause. I noticed the error while running the tests after applying your suggestion for COOTensor.
After looking into it further, I found that the same issue also affects the other sparse tensor formats.

rok · 2025-10-07T19:34:45Z

@andishgar how is my_tensor created in your test?

andishgar · 2025-10-07T19:42:06Z

@rok
Here is the complete version of the test.

TYPED_TEST_P(TestFloatingSparseCOOTensorEquality, TestEquality) {
  using ValueType = TypeParam;
  using c_value_type = typename ValueType::c_type;
  static_assert(is_floating_type<ValueType>::value, "Float type is required");

  std::shared_ptr<SparseCOOTensor> st1, st2, st3;
  ASSERT_OK_AND_ASSIGN(st1, SparseCOOTensor::Make(*this->tensor1_));
  ASSERT_OK_AND_ASSIGN(st2, SparseCOOTensor::Make(*this->tensor2_));
  ASSERT_OK_AND_ASSIGN(st3, SparseCOOTensor::Make(*this->tensor1_));

  ASSERT_TRUE(st1->Equals(*st1));
  ASSERT_FALSE(st1->Equals(*st2));
  ASSERT_TRUE(st1->Equals(*st3));

  // sparse tensors with NaNs
  const c_value_type nan_value = static_cast<c_value_type>(NAN);
  this->values2_[13] = nan_value;
  EXPECT_TRUE(std::isnan(this->tensor2_->Value({1, 0, 1})));

  std::shared_ptr<SparseCOOTensor> st4;
  ASSERT_OK_AND_ASSIGN(st4, SparseCOOTensor::Make(*this->tensor2_));
  EXPECT_FALSE(st4->Equals(*st4));                                  // same object
  EXPECT_TRUE(st4->Equals(*st4, EqualOptions().nans_equal(true)));  // same object
  ASSERT_OK_AND_ASSIGN(auto my_tensor,st4->ToTensor());
  ASSERT_TRUE(my_tensor->Equals(*this->tensor2_));

  std::vector<c_value_type> values5 = this->values2_;
  std::shared_ptr<SparseCOOTensor> st5;
  std::shared_ptr<Buffer> buffer5 = Buffer::Wrap(values5);
  NumericTensor<ValueType> tensor5(buffer5, this->shape_);
  ASSERT_OK_AND_ASSIGN(st5, SparseCOOTensor::Make(tensor5));
  EXPECT_FALSE(st4->Equals(*st5));                                  // different memory
  EXPECT_TRUE(st4->Equals(*st5, EqualOptions().nans_equal(true)));  // different memory
}

rok · 2025-10-07T19:59:53Z

Does this only happen if you have a NaN in the tensor? By default, NaNs are not considered equal.
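As a small illustration of why the added `Equals` assertion fails once a NaN is present, here is a sketch of element-wise comparison with an opt-in NaN rule. `ValuesEqual` and `AllEqual` are hypothetical helpers; the `nans_equal` flag only mirrors the idea behind `EqualOptions().nans_equal(true)` and does not reproduce Arrow's implementation.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical element comparison. Under IEEE 754, NaN == NaN is false, so
// two NaNs only compare equal here when the caller opts in via nans_equal.
inline bool ValuesEqual(double a, double b, bool nans_equal) {
  if (nans_equal && std::isnan(a) && std::isnan(b)) return true;
  return a == b;
}

// Compares two buffers element by element; NaNs are unequal by default.
inline bool AllEqual(const std::vector<double>& lhs,
                     const std::vector<double>& rhs, bool nans_equal = false) {
  if (lhs.size() != rhs.size()) return false;
  for (std::size_t i = 0; i < lhs.size(); ++i) {
    if (!ValuesEqual(lhs[i], rhs[i], nans_equal)) return false;
  }
  return true;
}
```

With this rule a buffer containing a NaN is not even equal to itself unless `nans_equal` is set, which matches the behaviour of the `st4->Equals(*st4)` checks in the test above.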

andishgar · 2025-10-07T20:16:00Z

@rok You’re right — I added some extra code to handle validation, and the problem is solved.

rok · 2025-10-08T12:57:34Z

I'll look at this first thing tomorrow.

cpp/src/arrow/tensor/coo_converter.cc

cpp/src/arrow/tensor/csf_converter.cc

cpp/src/arrow/tensor.h

github-actions bot added the Component: C++ and awaiting review labels Sep 17, 2025

andishgar marked this pull request as draft September 17, 2025 13:15

andishgar marked this pull request as ready for review September 17, 2025 16:48

rok requested changes Sep 22, 2025

github-actions bot added the awaiting changes label and removed the awaiting review label Sep 22, 2025

andishgar force-pushed the resolve_negative_zero_in_sparse_tensor_creataion branch from 09c2259 to 8e7620f Compare September 23, 2025 14:38

github-actions bot added the awaiting change review label and removed the awaiting changes label Sep 23, 2025

andishgar requested a review from rok September 23, 2025 15:38

rok reviewed Sep 25, 2025

cpp/src/arrow/tensor/csx_converter.cc

cpp/src/arrow/tensor.cc (outdated)

cpp/src/arrow/sparse_tensor_test.cc (outdated)

rok requested a review from pitrou September 25, 2025 11:42

github-actions bot added the awaiting review, awaiting changes, and awaiting committer review labels and removed the awaiting change review, awaiting review, and awaiting changes labels Sep 25, 2025

andishgar commented Sep 26, 2025

cpp/src/arrow/tensor/csx_converter.cc

andishgar force-pushed the resolve_negative_zero_in_sparse_tensor_creataion branch from 8e7620f to c8c66e1 Compare September 26, 2025 08:41

andishgar requested a review from rok September 26, 2025 09:27

github-actions bot added the awaiting changes label and removed the awaiting committer review label Sep 26, 2025

rok approved these changes Sep 26, 2025

github-actions bot added the awaiting merge label and removed the awaiting changes and awaiting merge labels Sep 26, 2025

rok mentioned this pull request Oct 6, 2025

[C++][Tensor] Incorrect logic for creating arrow::SparseCOOTensor from Tensor via arrow::SparseCOOTensor::Make (illegal memory access and incorrect values) #47520

Open

pitrou requested changes Oct 7, 2025

cpp/src/arrow/sparse_tensor_test.cc

cpp/src/arrow/sparse_tensor_test.cc

cpp/src/arrow/sparse_tensor_test.cc

pitrou requested changes Oct 7, 2025

github-actions bot added the awaiting changes label and removed the awaiting change review label Oct 7, 2025

andishgar added 5 commits October 8, 2025 15:47

resolve negative zero

7995113

Apply rok suggestion

205e132

apply rok suggestion

e780cb8

Apply +0.0,-0.0,0.0 to relevant test cases

6241290

apply pitrou suggestion

9e98736

andishgar force-pushed the resolve_negative_zero_in_sparse_tensor_creataion branch from a90bd1e to 9e98736 Compare October 8, 2025 12:18

github-actions bot added the awaiting change review label and removed the awaiting changes label Oct 8, 2025

andishgar requested review from pitrou and rok October 8, 2025 12:52

pitrou reviewed Oct 8, 2025

cpp/src/arrow/tensor/coo_converter.cc

cpp/src/arrow/tensor/csf_converter.cc

cpp/src/arrow/tensor.h

github-actions bot added the awaiting changes label and removed the awaiting change review label Oct 8, 2025

andishgar mentioned this pull request Oct 8, 2025

[C++][IPC][Tensor] Incorrect SparseTensorIndexCSF IPC format #47613

Closed
