-
-
Notifications
You must be signed in to change notification settings - Fork 379
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Skip invalid json files in ReportBuilder #384
Conversation
Codecov ReportAll modified and coverable lines are covered by tests ✅
❗ Your organization needs to install the Codecov GitHub app to enable full functionality. Additional details and impacted files@@ Coverage Diff @@
## master #384 +/- ##
=========================================
Coverage 100.00% 100.00%
=========================================
Files 21 21
Lines 2233 2249 +16
=========================================
+ Hits 2233 2249 +16 ☔ View full report in Codecov by Sentry. |
Seemd that there's not an easy way to get children on mac with pure python. I skipped find all subprocess on mac. |
Okay. I think the code to check existing pids on Windows is a bit too invasive. Bringing up another subprocess to run a cmd based command probably makes this more complicated than it is supposed to be, and the problem it solves is not as important. We can deal with The benefit to introduce such a large piece of fragile code is not significant enough - we still can't recover the data, we only know the process actually finished. I'm not saying it's not useful, but the code might be hard to maintain in a long term. The skipping of corrupted json report on the other hand, is much useful - it keeps the data from finished processes. There are some stuff that I'd like to change on the UI side, but overall I think that makes sense. So for now, could you remove the pid checking part, just leave the report piece? I think Ctrl+C to cancel is enough. However, there might be a cleaner way to do this, without relying on the cmdlets and Linux path for processes so it's cross-platform. We can check the size of the json report. If it has not changed for a while, then the process must be dead (at least we can warn the users). Or, we can at least list out the processes that we are waiting for (requires some techniques to refresh the line), and the user might realize it's stuck. For this PR, let's do one thing at a time. Do the json report skipping first, then let's do the feature to detect dead processes. |
Yes the current code to check existing pids is not suitable. For the children of multiprocessing. Maybe we can use If we use file size to check if the process is running,I'm worried that there's no json file if the child has not yet run to it's finalizer. |
Seems like you can use the API interface for WMI https://stackoverflow.com/a/568589. That's a much cleaner way to check the existing PIDs. Combining with the current So - get the pids to wait for from the (The windows api is not the fastest, you may either need some cache mechanism, or have a new function However, do this in a separate PR. We want to keep each PR as single purpose as possible. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Like I mentioned, in combine_json()
, if there's no valid json, but there are inputs (all of them invalid), we should generate a better message to the users.
src/viztracer/report_builder.py
Outdated
@@ -67,10 +67,16 @@ def load_jsons(self) -> None: | |||
self.jsons = [get_json(self.data)] | |||
elif isinstance(self.data, (list, tuple)): | |||
self.jsons = [] | |||
corrupted_jsons = [] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's call it self.invalid_json_paths
as this does not store json data and it might not be "corrupted" - could be not json at all.
We need this in the future because if there's no valid json files, we will raise an exception and the final message won't be printed.
valid_json_file = shutil.copy(valid_json_path, os.path.join(tmpdir, "valid.json")) | ||
rb = ReportBuilder([corrupted_json_file, valid_json_file], verbose=1) | ||
with io.StringIO() as s: | ||
rb.save(s) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should check for some printed messages here. This test only assures no exceptions raised.
src/viztracer/report_builder.py
Outdated
if not self.jsons and not self.invalid_json_paths: | ||
raise ValueError("Can't get report of nothing") | ||
elif not self.jsons: | ||
# print corrupted json files | ||
self.print_messages() | ||
raise ValueError("All json files are corrupted") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if not self.jsons and not self.invalid_json_paths: | |
raise ValueError("Can't get report of nothing") | |
elif not self.jsons: | |
# print corrupted json files | |
self.print_messages() | |
raise ValueError("All json files are corrupted") | |
if not self.jsons: | |
if not self.invalid_json_paths: | |
raise ValueError("Can't get report of nothing") | |
else: | |
raise ValueError("No valid json files found") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changing the exception message should be enough - this is a rather rare case, just separate that from "no files".
src/viztracer/report_builder.py
Outdated
except json.JSONDecodeError: | ||
self.invalid_json_paths.append(j) | ||
if len(self.invalid_json_paths) > 0: | ||
self.final_messages.append(("corrupted_json", {"paths": self.invalid_json_paths})) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do a full search on this PR and change all "corrupted" to "invalid".
src/viztracer/report_builder.py
Outdated
raise ValueError("Can't get report of nothing") | ||
elif not self.jsons: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do you hate nested if so much :) . You are evaluating self.jsons
twice (performance wise it's not a big deal, but logic wise it's not clean).
Just check self.jsons
first, and raise different errors based on self.invalid_json_paths
. Nested if works in many cases.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do you hate nested if so much :) . You are evaluating
self.jsons
twice (performance wise it's not a big deal, but logic wise it's not clean).Just check
self.jsons
first, and raise different errors based onself.invalid_json_paths
. Nested if works in many cases.
Sorry, I didn't notice that. And thank you for correcting my bad habit.
Good job on this, thanks! |
Now the
ReporterBuilder
will ignore the corrupted json and only combine those valid json. It will warn user if there's corrupted json file.When waiting for child, the main process will find all subprocess recursively and check if the subproceess has terminated.