-
-
Notifications
You must be signed in to change notification settings - Fork 379
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
JSONDecodeError in report_builder.py #364
Comments
Since I don't know what else to do (other than dig into the viztracer source code and debug there)... Here's more errors. I've seen many different JSON errors that all point to the file being improperly formatted in some way or another.
|
Interesting. Could you share the script (preferably minimal reproducible example) so I can check it out? For now the only thing I know is there's something wrong with the JSON built in the subprocess, which is not that helpful as this is a corner case. Also are you using the latest release version of VizTracer? |
The JSON file is generated manually by VizTracer to achieve better performance, which could have bugs when it comes to some combination of characters. I just need to figure out which. |
Another error:
I'm on viztracer 0.15.6, just installed from pip today and it seems up to date. I'll see about providing you a minimal reproducible example at my next convenience, but the code is complicated. If there's a way I can just grab the JSON that VizTracer is generating and send that, that would be much easier for me. |
I've also tried this on two different (Windows) computers to see if maybe only one computer had the issue, but it's happening on both. Also... I don't know if its related to viztracer but I'm having another issue with my code that I didn't notice before investigating this viztracer issue. I have a process that just won't die. The process writes to a PipeConnection, then closes it when its done and returns from the target function. The process is also a daemon. For both of those reasons (closes pipe and is daemon), the process should close when the main is done. And of course, when main is done, that's when viztracer says "waiting for child processes to close". I've tried using close(), kill(), and terminate() on the process and it just won't end. edit: I think this issue occurs with or without viztracer profiling the code. I just hadn't figured out which process was the one that was always not closing, forcing me to end the process. |
The JSON file that has issues is a temporary file, it's not easy (though not impossible) to find the file. If the process failed out, the temporary JSON files should be somewhere in your temporary directory. The better solution is still getting a minimum reproducible example so I can reproduce it somehow. However, if you figured out how the JSON file is corrupted, that could also work. |
stdout:
This is a Google Drive link: https://drive.google.com/drive/folders/1PUa6LD_Q9RoY0DoWpF0PJQWrejlV0CgE?usp=drive_link In that folder, there's the two raw files that were generated, associated with the stack trace exception above. Just in case Google Drive is fucky with the files (which I don't think it would be), and for your convenience, I also added a .zip (compressed by Windows, not 7z) and a .7z (compressed by... 7z). If you have 7z, great, and I trust 7z won't have altered the data. If you don't, the .zip should still contain the raw data but if anything would fuck with the binary data, it would be Windows. Here are the checksums for those files (calculated by 7z on the raw files): Name: result_10880.json Name: result_12684.json I can also tell you... if the files are meant to have a standard encoding (like UTF-8) or are even meant to be ASCII... there's clearly an issue with one or both of them because Notepad++ gets very unhappy when I try to view them. NP++ just hanged on me and I had to force close it as well. When I manage to open one and scroll down a ways, there's corrupted text/unicode spam everywhere. I can't copy the text into this comment though. When I do, it appears normal. |
The file requires access. |
Apologies. Thought I made it public. I gave you access (including edit access in case you want to send files back to me for testing). |
Your file |
Every time I run viztracer, I have to do Ctrl+C to "skip" the "waiting for child processes to close". That seems like a logical reason why there wouldn't be a graceful exit. The problem now is... I've got to figure out why Python won't close this process when it absolutely should be. Its target function returns. The pipe it writes to is emptied and then closed (on both ends). Doesn't make any sense and I can't even terminate() it... Is there any way the viztracer process is interfering with the ability for my child process to close? How is it converting the process into JSON? Is my child process itself being abruptly ended the issue, or is it a viztracer process that's being interrupted? |
I identified the problem and found the solution. By adding a time.sleep() at the end of the main process's inline code (the last line of code run by the entire program) of at least 2 seconds (1 second isn't enough), it works. It seems like the process does get closed properly and it just takes a couple seconds for garbage collection or something to mark the process as terminated... whatever viztracer relies on to check if there are active child processes. But if the process ends after the main.py returns, viztracer won't detect that. It just hangs on "waiting for child processes to close...". After typing that out... I realized what the proper solution here should be: Manually calling a "kill" or "end" method of the class that generated the process which joins the process (and threads) to block until the process (and threads) are closed. I was hoping I wouldn't need to do that so I could move on with processing and let GC take care of it, but this does seem more proper and obviously the ability to I tested this lazily (just by joining the process directly from the main thread via the class's reference, doesn't matter) and viztracer works just fine. It actually looks visually differently than it did when I was testing it the other day. The webpage looks very clean and easier to use. There must have been an older version of it that I was accessing somehow (or you just updated it). In either case, it looks pretty sexy now and I'm glad I found the solution. I'm not sure if you would consider this behavior a bug of viztracer or not, but I have learned better programming habits from this (properly cleaning up threads and processes when they are done). Thanks for being with me on the journey. If you need anything else from me (for help in fixing this bug), feel free to ask. I don't think I need any more help though as I identified the issue and found the solution (which I should have been doing to begin with). Thanks man! If you don't need anything from me either, feel free to close the issue. |
The way VizTracer works is to add a hook when the subprocess ends, which:
However, if you try to For multi-processing to work, you have to ensure all the processes end gracefully, that's just by definition. So yes, having a "join" in the main process makes a lot of sense and it's a good habit. VizTracer is not able to deal with such situations on its own, if the user forcefully kill the subprocess, there's nothing VizTracer can do and it just simply won't work. |
Sorry to revive this, but I'm having what I believe to be a similar issue. Sometimes when multiprocessing workers use shared data, vizviewer gets stuck thinking a child process is still running when it isn't. So it gets stuck on "Wait for child processes to finish, Ctrl+C to skip" (even though we now know pressing Ctrl+C will cut off the stream and leave a string that json can't decode). import multiprocessing as mp
import time
def worker(currentMin, timeout=float('inf'), startTime=0):
while True:
# it only works when this next line is commented
x = currentMin.value
if time.perf_counter() - startTime > timeout:
#-- exit early because of timeout
break
def multiprocess(timeout=float('inf')):
t0 = time.perf_counter()
with mp.Pool() as pool:
with mp.Manager() as manager:
integer = manager.Value('i', 1)
for _ in range(100):
pool.apply_async(worker, (integer, timeout, t0))
# end all the processes
pool.close()
pool.join()
if __name__ == '__main__':
val = multiprocess(timeout=10)
print(mp.active_children()) # [] I am using viztracer 0.16.0 on python 3.11.6. Thanks |
Which OS are you using? I can't repro this with the code provided on Linux. |
I'm on Windows 10. |
I've repro this on Windows, seemed that the saving process was terminated by |
Hmm, I don't think there's an easy way to prevent that. Windows kills processes without signals to catch, so many multiprocessing stuff won't work on Windows. |
I‘ve tried to use signal to fix this, but it's really hard to do. I think there may be some imperfect solution for this. The current Do you think it's necessary to warn users when they terminate a subprocss on windows? I found that there won't be any trace data for the terminated process. |
Yeah we can combine only the valid ones and send a warning about the invalid ones to users. Increasing the timeout is not a good idea because that won't "fix" the problem - there's no time guarantee for the trace saving to be done. The method is a bit intrusive as well. I don't think the benefit worth the cost. You can't warn users, that's the issue. There's no hook you can use when a process is terminated - otherwise we can stop the termination and save the trace. You'll have the same thing on Linux for SIGKILL. Those are OS level signals that mandate the process to be killed immediately. You can do something smart in multi-process case, when you have a main process as a monitor - that's what we are doing now for waiting for other sub-processes to finish. You can maybe check whether the process that is being waited exists while waiting for it, and somehow deal with it if it does not. |
Tried running this in command line in multiple different ways and can't get it to work. VizTracer does work when used in-line in python but in-line doesn't work for multiprocessing (which I need it to).
edit: Figured out how to format it better.
The text was updated successfully, but these errors were encountered: