ParquetMetaData memory size is not reported accurately when encryption is enabled #8472

@alamb

Describe the bug
While working on #8470, I noticed that the API for reporting memory usage undercounts the memory actually used when encryption is enabled.

ParquetMetaData::memory_size is used for memory accounting in in-memory Parquet caches, and thus should be accurate.

To Reproduce
Specifically, this function

    pub fn memory_size(&self) -> usize {
        std::mem::size_of::<Self>()
            + self.file_metadata.heap_size()
            + self.row_groups.heap_size()
            + self.column_index.heap_size()
            + self.offset_index.heap_size()
    }

does not account for the heap allocations in the file_decryptor field:

    file_decryptor: Option<FileDecryptor>,

Expected behavior
ParquetMetaData::memory_size should report its actual heap allocation size, by implementing the HeapSize trait for FileDecryptor and all of its subfields.
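
A rough, self-contained sketch of what this could look like. Everything here is a stand-in for illustration: the HeapSize trait is redefined locally, the FileDecryptor fields (footer_key, aad_prefix, column_keys) are hypothetical and do not mirror the real struct, and Metadata is a stripped-down placeholder for ParquetMetaData.

```rust
use std::collections::HashMap;
use std::mem::size_of;

/// Local stand-in for the crate's `HeapSize` trait: bytes owned on the
/// heap, excluding the size of `Self` itself.
trait HeapSize {
    fn heap_size(&self) -> usize;
}

impl HeapSize for Vec<u8> {
    fn heap_size(&self) -> usize {
        // Count allocated capacity, not just the bytes in use.
        self.capacity()
    }
}

impl<T: HeapSize> HeapSize for Option<T> {
    fn heap_size(&self) -> usize {
        // `None` owns no heap memory.
        self.as_ref().map_or(0, |inner| inner.heap_size())
    }
}

/// Hypothetical, simplified decryptor. The real `FileDecryptor` has
/// different fields; these exist only to illustrate the accounting.
struct FileDecryptor {
    footer_key: Vec<u8>,
    aad_prefix: Vec<u8>,
    column_keys: HashMap<String, Vec<u8>>,
}

impl HeapSize for FileDecryptor {
    fn heap_size(&self) -> usize {
        // Approximation: counts the key and value buffers, but not the
        // hash table's own bucket storage.
        let column_keys: usize = self
            .column_keys
            .iter()
            .map(|(name, key)| name.capacity() + key.heap_size())
            .sum();
        self.footer_key.heap_size() + self.aad_prefix.heap_size() + column_keys
    }
}

/// Placeholder for `ParquetMetaData` with only two fields.
struct Metadata {
    footer: Vec<u8>,
    file_decryptor: Option<FileDecryptor>,
}

impl Metadata {
    fn memory_size(&self) -> usize {
        size_of::<Self>()
            + self.footer.heap_size()
            // The term that is currently missing from the real function.
            + self.file_decryptor.heap_size()
    }
}
```

With this shape, a Metadata value holding Some(FileDecryptor) reports a larger memory_size than one holding None, by at least the combined size of the key buffers.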
