We're using viztracer for lightweight tracing when training PyTorch models. We run in a datacenter where all of the clocks are synchronized to within a few milliseconds. Since viztracer uses only the monotonic clock during tracing (absolutely the correct choice), traces from different machines have wildly different timestamps. Since we can't force the traces to start at the same moment, the --align_combine feature gets them to within seconds of each other (some improvement!), but I think we can do better.
It would be great to have an option (or a change to the default) that calculates the offset between system time and monotonic time when the trace is saved, and shifts every timestamp by that difference. That would project the monotonic time into global time (+/- the error of the system clock), so traces from different machines could be compared directly once combined.
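To make the proposal concrete, here is a minimal sketch of the offset calculation using only the Python standard library. The helper names `wall_minus_monotonic_ns` and `to_wall_us` are hypothetical, not part of viztracer, and the sketch assumes the tracer's timestamps share an epoch with `time.monotonic_ns()`, which may not hold for viztracer's internal C clock.

```python
import time


def wall_minus_monotonic_ns(samples: int = 5) -> int:
    """Estimate (wall clock - monotonic clock) in nanoseconds.

    Each wall-clock read is bracketed by two monotonic reads; the sample
    with the tightest bracket is kept, since it was least disturbed by
    scheduling delays.
    """
    best_offset = 0
    best_width = None
    for _ in range(samples):
        m0 = time.monotonic_ns()
        wall = time.time_ns()
        m1 = time.monotonic_ns()
        width = m1 - m0
        if best_width is None or width < best_width:
            best_width = width
            # Pair the wall-clock read with the midpoint of the bracket.
            best_offset = wall - (m0 + m1) // 2
    return best_offset


def to_wall_us(monotonic_ts_us: float, offset_ns: int) -> float:
    """Project a monotonic timestamp (microseconds) onto wall-clock time.

    Assumption: monotonic_ts_us is measured from the same epoch as
    time.monotonic_ns().
    """
    return monotonic_ts_us + offset_ns / 1000
```

The offset would be measured once per process, at save time, and applied to every event. The result is only as accurate as the system clock synchronization between machines.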
If it's something you're interested in, I can look into making a PR.
What are you looking for to solve this issue? There are a couple of ways to do this.
Post-run edit. This would be the most straightforward way to solve the problem and you probably do not even need any changes from viztracer. Loop through events and do the offset as you want.
Have an option to pass in an offset, 0 by default, and apply it when saving the trace. This is not too bad, but there will be C code involved and the trace-saving part is .. hmm, not the cleanest code to follow.
Do it at run time: add the offset when getting the timestamp. This would probably be the easiest, as getts is already a function, but I don't want to do this because it hurts performance.
An even more interesting way: add the system clock (or its offset from the monotonic clock) to the metadata, and let the --combine command resolve it. Similar to --align_combine, but with a known offset.
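As a rough illustration of the post-run edit, and of combining with a known offset, the sketch below shifts every event in a loaded trace. It assumes the saved JSON follows the Chrome Trace Event layout (a top-level `traceEvents` list whose events carry `ts` in microseconds). `shift_events` and `combine_with_offsets` are hypothetical helpers, not viztracer APIs, and the sketch does not deal with pid collisions between machines.

```python
def shift_events(trace: dict, offset_us: float) -> dict:
    """Return a copy of trace with offset_us added to every event's 'ts'.

    Events without a 'ts' field (for example metadata events) are copied
    unchanged. The input trace is not modified.
    """
    shifted = dict(trace)
    shifted["traceEvents"] = [
        {**ev, "ts": ev["ts"] + offset_us} if "ts" in ev else dict(ev)
        for ev in trace.get("traceEvents", [])
    ]
    return shifted


def combine_with_offsets(traces_and_offsets) -> dict:
    """Merge several traces, shifting each one by its own known offset.

    traces_and_offsets is an iterable of (trace_dict, offset_us) pairs,
    where each offset is that machine's (wall - monotonic) difference.
    """
    combined = {"traceEvents": []}
    for trace, offset_us in traces_and_offsets:
        combined["traceEvents"].extend(
            shift_events(trace, offset_us)["traceEvents"]
        )
    return combined
```

In practice each trace would be read with `json.load`, shifted by the offset recorded for its machine, and written back with `json.dump` before viewing.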