
Memory consumption problem in large file parsing. #866

Closed
ChaithraMahadeva opened this issue Mar 25, 2021 · 5 comments

@ChaithraMahadeva

Hello,

I am trying to parse a Measurement Data Format (MDF) file of up to 4 TB and perform some operations on the parsed data. The application, which is built on the C++ parser generated by the Kaitai Struct compiler (target language Cpp-STL with C++11 compatibility), works fine except that it keeps consuming more memory: instances of the user-defined types from the .ksy (classes in the generated library) are created as parsing proceeds through the file and are only destroyed at the end.
The application performs well for files up to 4 TB (as I have tested on a few files), but the number of parsed objects grows with the file and memory usage increases linearly. The already-parsed objects held in memory eventually stop me from parsing further (once the maximum available memory is reached), since all of them are only destroyed at the end.

Is there a way to destroy objects that have already been parsed, so that I can manage my application's memory consumption before parsing actually ends? A solution would help me make the application robust enough to parse a file of any size, with as many components (user types defined in the .ksy) as the file contains.

I appreciate your help and suggestions.

Thank you.

Regards,
Chaithra

@KOLANICH

KOLANICH commented Mar 27, 2021

#65

> Is there a way to destroy objects that have already been parsed, so that I can manage my application's memory consumption before parsing actually ends?

You can try to patch the generated files manually, but I have not succeeded with that.

Here is a spec that suffers from similar issues (extracting the information about offsets from a 3 GiB Qt Windows installer takes 12 GiB of RAM, and manually patching it to free memory aggressively hasn't helped): kaitai-io/kaitai_struct_formats#314

@ChaithraMahadeva (Author)

I have tried patching the generated files manually, but I am still facing the same issue. Could you give me some ideas on how to destroy the objects created while parsing through the file, before parsing is completed?

@generalmimon (Member)

generalmimon commented Apr 23, 2021

@ChaithraMahadeva See #730 (comment) and #255

Also maybe @webbnh can help you with your specific needs in case what you find in the referenced issues isn't enough - he knows a thing or two about parsing large files in C++.

@ChaithraMahadeva (Author)

#730 (comment) and #255 seem to describe the problem of loading a large file at once (one with a repetitive structure). But the MDF files I am trying to parse are like linked lists: each block in the file is parsed only when I want it to be parsed. My problem is that I have to parse all the blocks of the file until I have enough information to process, which means the objects created for all those parsed blocks are destroyed only at the end, in the reverse order of their creation, because each block references the next one and that is how parsing proceeds.

I would now like to ask @webbnh whether there is any workaround to destroy a child object before destroying the parent object that references it. If the child object is destroyed first, the parent object sees a field missing and throws an exception while it is being destroyed. The library files compiled for C++11 hold these objects in unique pointers, and even the destructors cannot be modified.
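
To illustrate the ownership pattern I mean, here is a minimal sketch with hypothetical names (`block_t`, `payload`); the actual generated classes differ, but each parsed block owns the following one through a `std::unique_ptr`, so destroying a block also destroys everything it transitively owns:

```cpp
#include <memory>

// Hypothetical stand-in for a generated block type, not the actual
// generated code. Each parsed block owns the next one, so the whole
// chain is normally released only when the root object is destroyed,
// in the reverse order of creation.
struct block_t {
    int payload = 0;                 // stand-in for the block's parsed fields
    std::unique_ptr<block_t> next;   // owning link to the following block
};
```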

Advise me if my understanding is wrong, and suggest ways to address the problem by patching the source files generated by the Kaitai Struct compiler.

@ChaithraMahadeva (Author)

Hello,

My problem has been resolved. I called the reset() function on every unique pointer (object type) that had already been parsed, which released the memory and kept the destruction of the parent object safe as well.
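
For anyone facing the same issue, here is a minimal sketch of the idea, reusing the hypothetical `block_t` from the earlier sketch (not the actual generated code): `unique_ptr::reset()` destroys the owned object immediately and leaves the pointer null, so when the parent is destroyed later it sees a harmless empty field instead of a dangling one.

```cpp
#include <memory>
#include <utility>

// Same hypothetical block type as in the earlier sketch.
struct block_t {
    int payload = 0;
    std::unique_ptr<block_t> next;
};

void process(const block_t& b) { /* application-specific work on one block */ }

// Walk the chain and free each block as soon as it has been processed.
// Detaching `next` first keeps the rest of the chain alive; reset()
// then destroys only the current block and nulls the pointer.
void walk_and_free(std::unique_ptr<block_t> head) {
    std::unique_ptr<block_t> cur = std::move(head);
    while (cur) {
        process(*cur);
        std::unique_ptr<block_t> tail = std::move(cur->next);
        cur.reset();   // memory for this block is released here, not at the end
        cur = std::move(tail);
    }
}
```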

Thank you @KOLANICH and @generalmimon for your help.

Regards,
Chaithra Mahadeva
