Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[C++][Parquet] Reading corrupted encrypted Parquet files can cause a segfault #43070

Closed
adamreeve opened this issue Jun 26, 2024 · 2 comments
Closed
Assignees
Labels
Component: C++ Component: Parquet Critical Fix Bugfixes for security vulnerabilities, crashes, or invalid data. Type: bug
Milestone

Comments

@adamreeve
Copy link
Contributor

Describe the bug, including details regarding any error messages, version, and platform.

We experienced a segfault reading an encrypted Parquet file and traced this down to EVP_DecryptUpdate being called here with a negative value for the last parameter, which is the ciphertext length.

By debugging the problem I found that the length was being read from the file as zero, which became negative after subtracting length_buffer_length_, kNonceLength and kGcmTagLength.

This file was written with PyArrow but we suspect it may have become corrupted somehow. I would expect that this should result in a catchable exception rather than a segfault though.

Component(s)

C++, Parquet

@pitrou
Copy link
Member

pitrou commented Jun 27, 2024

Wow, this is horrid. Thanks for noticing this @adamreeve .

@pitrou pitrou added the Critical Fix Bugfixes for security vulnerabilities, crashes, or invalid data. label Jun 27, 2024
@pitrou pitrou added this to the 17.0.0 milestone Jun 27, 2024
mapleFU pushed a commit that referenced this issue Jul 4, 2024
… segfault (#43071)

### Rationale for this change

See #43070

### What changes are included in this PR?

Checks that the ciphertext length is at least enough to hold the length (if written), nonce and GCM tag for the GCM cipher type.

Also enforces that the input ciphertext length parameter is provided (is > 0) and verifies that the ciphertext size read from the file isn't going to cause reads beyond the end of the ciphertext buffer.

### Are these changes tested?

Yes I've added new unit tests for this.

### Are there any user-facing changes?

No
* GitHub Issue: #43070

Authored-by: Adam Reeve <[email protected]>
Signed-off-by: mwish <[email protected]>
@mapleFU
Copy link
Member

mapleFU commented Jul 4, 2024

Issue resolved by pull request 43071
#43071

@mapleFU mapleFU closed this as completed Jul 4, 2024
raulcd pushed a commit that referenced this issue Jul 4, 2024
… segfault (#43071)

### Rationale for this change

See #43070

### What changes are included in this PR?

Checks that the ciphertext length is at least enough to hold the length (if written), nonce and GCM tag for the GCM cipher type.

Also enforces that the input ciphertext length parameter is provided (is > 0) and verifies that the ciphertext size read from the file isn't going to cause reads beyond the end of the ciphertext buffer.

### Are these changes tested?

Yes I've added new unit tests for this.

### Are there any user-facing changes?

No
* GitHub Issue: #43070

Authored-by: Adam Reeve <[email protected]>
Signed-off-by: mwish <[email protected]>
zanmato1984 pushed a commit to zanmato1984/arrow that referenced this issue Jul 9, 2024
…revent segfault (apache#43071)

### Rationale for this change

See apache#43070

### What changes are included in this PR?

Checks that the ciphertext length is at least enough to hold the length (if written), nonce and GCM tag for the GCM cipher type.

Also enforces that the input ciphertext length parameter is provided (is > 0) and verifies that the ciphertext size read from the file isn't going to cause reads beyond the end of the ciphertext buffer.

### Are these changes tested?

Yes I've added new unit tests for this.

### Are there any user-facing changes?

No
* GitHub Issue: apache#43070

Authored-by: Adam Reeve <[email protected]>
Signed-off-by: mwish <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Component: C++ Component: Parquet Critical Fix Bugfixes for security vulnerabilities, crashes, or invalid data. Type: bug
Projects
None yet
Development

No branches or pull requests

3 participants