Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

makeCSR reads past the end of user-provided buffers #455

Closed
Infinoid opened this issue May 6, 2021 · 0 comments · Fixed by #463
Closed

makeCSR reads past the end of user-provided buffers #455

Infinoid opened this issue May 6, 2021 · 0 comments · Fixed by #463

Comments

@Infinoid
Copy link
Contributor

Infinoid commented May 6, 2021

makeCSR() (the pointer variant) accesses memory past the end of the pos array. This array is passed in by the application, and isn't necessarily padded up in a way that makes this safe. This was causing segfaults for us on Summit, though I no longer have a perfect reproducer for that. Maybe the end of the buffer was perfectly page-aligned and hit the end of the memory map?

Even without the segfaults, we can see these warnings from valgrind that demonstrate the problem:

==463332== Invalid read of size 8
==463332==    at 0x4B4F0A4: taco::TypedComponentRef::getAsIndex() const (typed_value.cpp:370)
==463332==    by 0x4B42DA6: taco::Index::getSize() const (index.cpp:61)
==463332==    by 0x402B02: taco::TensorBase taco::makeCSR<double>(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::vector<int, std::allocator<int> > const&, int*, int*, double*, taco::Array::Policy) (tensor.h:760)
==463332==    by 0x402505: main (test.cpp:43)
==463332==  Address 0x5227f6c is 32,764 bytes inside a block of size 32,768 alloc'd
==463332==    at 0x483950F: operator new[](unsigned long) (vg_replace_malloc.c:431)
==463332==    by 0x4023EB: main (test.cpp:22)
==463332== 
==463332== Invalid read of size 8
==463332==    at 0x4B4F0A7: taco::TypedComponentRef::getAsIndex() const (typed_value.cpp:370)
==463332==    by 0x4B42DA6: taco::Index::getSize() const (index.cpp:61)
==463332==    by 0x402B02: taco::TensorBase taco::makeCSR<double>(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::vector<int, std::allocator<int> > const&, int*, int*, double*, taco::Array::Policy) (tensor.h:760)
==463332==    by 0x402505: main (test.cpp:43)
==463332==  Address 0x5227f74 is 4 bytes after a block of size 32,768 alloc'd
==463332==    at 0x483950F: operator new[](unsigned long) (vg_replace_malloc.c:431)
==463332==    by 0x4023EB: main (test.cpp:22)

What is happening is this:

  1. app calls makeCSR()
  2. makeCSR() calls index::getSize()
  3. getSize() looks up the last element of pos using a via a TypedComponentRef
  4. The TypedComponentRef reads a 128-bit chunk of memory into a union type (*ptr)
  5. The TypedComponent base class then extracts and returns the 32 bits it wants from the copied data in the union type.

The compiler is splitting up the 128-bit read into two 64-bit reads, which is what valgrind is complaining about. The first 64-bit read is reading 32 bits of valid data and 32 bits of invalid data ("32,764 bytes inside a block of size 32,768"); the second read is completely invalid ("4 bytes after a block of size 32,768 alloc'd").

I think the memory accesses could be cleaned up by passing a pointer into TypedComponent::getAsIndex and changing mem.int32Value to mem->int32Value, but I don't know what impact that change would have on everything else.

Here is the reproducer I used to get the valgrind errors listed above.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

1 participant