
Very long process of getting repr of huge objects in torch #461

Open · seniorsolt opened this issue Aug 2, 2024 · 7 comments

@seniorsolt commented Aug 2, 2024

viztracer --tracer_entries 10000000 --ignore_frozen --ignore_c_function --log_func_args --max_stack_depth 30 -- scripts\train_ui.py

[screenshot: 2024-08-02_18-05-19]

It's not infinite recursion like in #338, just a very long process. Maybe an additional flag limiting the repr process would solve both issues.
The max stack depth option helps if it prevents entering these objects, but if we are already inside, it will not help us get out.
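For reference, a minimal sketch of the behavior being reported. `ExpensiveStorage` is a hypothetical stand-in for a torch storage object whose `__repr__` walks every element; it is not torch code, just an illustration of why logging function arguments becomes slow for such objects:

```python
# Minimal reproduction sketch (hypothetical stand-in, not torch code).
# Run under: viztracer --log_func_args -- repro.py
import time


class ExpensiveStorage:
    """Mimics a storage whose __repr__ iterates over all of its elements."""

    def __init__(self, n):
        self.data = list(range(n))

    def __repr__(self):
        # Formatting millions of elements takes seconds.
        return "ExpensiveStorage([" + ", ".join(str(x) for x in self.data) + "])"


def train_step(storage):
    # With --log_func_args, the tracer serializes this argument, which triggers __repr__.
    return len(storage.data)


if __name__ == "__main__":
    storage = ExpensiveStorage(2_000_000)
    start = time.time()
    repr(storage)  # the same work the tracer would trigger
    print(f"repr took {time.time() - start:.2f}s")
    train_step(storage)
```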

@gaogaotiantian (Owner)

VizTracer is not able to limit repr - repr can do whatever it wants. If repr itself takes too long, that's not something VizTracer can deal with. In the worst case, the whole repr process is written in C and VizTracer will not even be triggered. However, the tracer itself should be paused while the repr is being calculated.

@seniorsolt (Author) commented Aug 3, 2024

I didn't understand the part about repr in C, where you say that viztracer will not be triggered (invoked?). It's viztracer that calls repr, isn't it?

I think it is possible to implement this using execution isolation (running repr in a separate thread, process, or asynchronous task to limit execution with a timeout) - see the sketch below. However, it sounds like a lot of effort just for handling torch, which can simply be filtered out through exclude files.
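A rough sketch of that timeout idea, assuming a thread-based executor (all names here are illustrative, not VizTracer internals). Note the caveat in the final comment, which is part of why this approach is problematic:

```python
# Illustrative sketch only - not VizTracer code.
from concurrent.futures import ThreadPoolExecutor, TimeoutError

_executor = ThreadPoolExecutor(max_workers=1)


def bounded_repr(obj, timeout=0.1):
    """Return repr(obj), or a placeholder if it takes longer than `timeout` seconds."""
    future = _executor.submit(repr, obj)
    try:
        return future.result(timeout=timeout)
    except TimeoutError:
        # Caveat: the worker thread cannot be cancelled, so the slow repr
        # keeps running in the background even after we give up on it.
        return f"<repr of {type(obj).__name__} skipped: timeout>"
```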

Another option I can think of: use sys.getsizeof() before calling repr and skip large objects (a rough sketch of that idea is below).
Anyway, the issue is not that important; maybe the best solution is to do nothing. Let this issue stay here just for information.

Upd: One more idea - print a warning about an unusually long repr while processing <file.py>, to show the user which file they might want to exclude.
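And a sketch of the sys.getsizeof() idea mentioned above (again purely illustrative; as noted in the reply below, getsizeof is shallow and does not reflect how expensive the repr actually is):

```python
# Illustrative sketch only - not VizTracer code.
import sys


def size_limited_repr(obj, max_bytes=1 << 20):
    """Skip repr for objects whose shallow size exceeds max_bytes."""
    # Note: sys.getsizeof is shallow - a container holding millions of items,
    # or an object referencing a big buffer, may still report a small size.
    if sys.getsizeof(obj) > max_bytes:
        return f"<{type(obj).__name__} too large, repr skipped>"
    return repr(obj)
```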

@gaogaotiantian (Owner)

Yes, it's viztracer that calls repr, but it's the user that asks viztracer to call it. The time of repr has nothing to do with the size of the file/object - it only depends on how the object implements it. Running it in a separate executor is not a solution: there will either be racing issues, or it will get stuck (if not concurrent).

Overall, if an object decides to make its repr slow, there's nothing viztracer can do.

@seniorsolt (Author)

The time of repr does depend on the size of the object if the repr implementation iterates over all of its contents, as in the case of torch storage. However, I agree that's a corner case, and we can't tell whether the time of repr depends on size without invoking repr.

About running repr in a separate executor: why would there be a racing issue or a hang? And why don't we see it now? Can't we find an implementation that doesn't introduce new issues?

@seniorsolt (Author) commented Aug 10, 2024

I think I found one more option - using the base class method instead of the overloaded method of the torch tensor:
[screenshot of the proposed change]
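In other words, something along these lines - a sketch of the idea, assuming the goal is to bypass the subclass's expensive __repr__ (torch is assumed to be installed, as in the original report):

```python
import torch

t = torch.randn(10_000_000)

# The overloaded repr formats the tensor/storage data, which can be slow:
# repr(t)

# The base class method only reports the type and address, so it is O(1):
print(object.__repr__(t))  # e.g. "<torch.Tensor object at 0x7f...>"
```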

@gaogaotiantian (Owner)

> About running repr in a separate executor: why would there be a racing issue or a hang? And why don't we see it now? Can't we find an implementation that doesn't introduce new issues?

Concurrency is not a silver bullet for everything that's slow. VizTracer needs the result of repr at that point; it's not helpful to calculate the result in a separate thread/process because VizTracer will be blocked there waiting for the result.

Is it possible to just move forward without the result? Maybe. But first of all, it's just not worth it: you'd need to change a lot of the current structure to make it work. Then, what if the object changes during the process? That's the racing issue I talked about.

So, overall, doing it in a separate executor is infeasible and not worth it.

Also, we will not do anything special just for torch - it's not a TorchTracer. We could potentially provide a way to solve all similar issues at once (like I mentioned, use objprint, or provide a way to customize all of your repr requests), so any solution specific to torch is not a solution.
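For illustration, a customizable repr hook could look roughly like this. This is purely hypothetical, not an existing VizTracer option; objprint is the real library mentioned above, and safe_repr is an invented name:

```python
# Hypothetical sketch of a user-customizable repr - not a VizTracer API.
from objprint import objstr  # pip install objprint


def safe_repr(obj):
    # Users could register cheap representations for types they know are expensive.
    type_name = f"{type(obj).__module__}.{type(obj).__qualname__}"
    if type_name.startswith("torch."):
        return object.__repr__(obj)  # O(1) fallback
    return objstr(obj, depth=1)      # shallow, attribute-based string
```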

@seniorsolt (Author)

Okay, that sounds reasonable. Thank you for your time!
