Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Tests] Add more logging for runtime test timeouts #109998

Open
wants to merge 4 commits into
base: main
Choose a base branch
from

Conversation

mdh1418
Copy link
Member

@mdh1418 mdh1418 commented Nov 20, 2024

The runtime testing infrastructure collects dumps of the test process and all of its corerun child processes whenever the runtime test times out.

foreach (var child in FindChildProcessesByName(process, "corerun"))
{
string crashDumpPath = Path.Combine(Path.GetFullPath(crashDumpFolder), string.Format("crashdump_{0}.dmp", child.Id));
Console.WriteLine($"Attempting to collect crash dump: {crashDumpPath}");
if (CollectCrashDump(child, crashDumpPath, outputWriter))
{
Console.WriteLine("Collected crash dump: {0}", crashDumpPath);
}
else
{
Console.WriteLine("Failed to collect crash dump");
}
}

However, there have been instances where a runtime test times out due to waiting for a subprocess to finish, yet there are logs indicating that the subprocess has finished. See #104670 (comment). Its not clear whether 1) the subprocess is indeed hanging (yet no crashdump is collected) or 2) the subprocess completed yet failed to signal back to the parent process (no crashdump to collect).

This PR aims to clarify, if the Id of the subprocess that the main test process is waiting for is known, whether that Id is part of the list of active processes at timeout.

Copy link
Contributor

Tagging subscribers to this area: @hoyosjs
See info in area-owners.md if you want to be subscribed.

@@ -642,6 +642,8 @@ public static bool TryPrintStackTraceFromDmp(string dmpFile, TextWriter outputWr
// The children are sorted in the order they should be dumped
static unsafe IEnumerable<Process> FindChildProcessesByName(Process process, string childName)
{
Console.WriteLine($"Finding all child processes of '{process.ProcessName}' (ID: {process.Id}) with name '{childName}'");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Everywhere else in this code I see an outputWriter being used but in a few spots I also see Console.WriteLine() being used. Can you tell what the distinction is? Its not clear to me when we want to use one and when we use the other.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe the outputWriter + errorWriter are used to write each test's output.txt + error.txt.
I haven't yet pinpointed the logic for how each helix workitem's console.<id>.log is created, but it looks to me that the console.<id>.log will stitch together each individually generated xunit test wrapper's error.txt and output.txt in addition to the Console.WriteLines used.

I think the output.txt would be used to validate the test result, error.txt would be used to capture any test specific failures, and any other failures that show up in console.<id>.log are probably non test-specific (i.e. infrastructure or timeout?) failures.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe the outputWriter + errorWriter are used to write each test's output.txt + error.txt.

Sounds reasonable to me. As long as the output will be visible in the CI output I'm happy but hopefully there is some owner for this code who can confirm we are following the intended convention :)

@@ -788,6 +791,12 @@ public int RunTest(string executable, string outputFile, string errorFile, strin
executable, timeout, (environmentVar != null) ? " from variable " : "", (environmentVar != null) ? TIMEOUT_ENVIRONMENT_VAR : "",
startTime.ToString(), endTime.ToString());

Console.WriteLine("Timed Out with Active Processes:");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The log above said we timed out so I don't think we need to repeat that again. Perhaps output like this:

cmdLine: blablabla Timed Out (timeout in milliseconds: XYZ, start: A, end: B)
Collecting diagnostic information...
Snapshot of processes currently running:
   ID    ParentId  CmdLine
   145   9001      Foo.exe arg1 arg2
   1458  145       Blah.exe something something
   10632 2056      corerun.exe SomeTest.dll --whatever arg more_args
   ...

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we already have a straightforward way to grab the ParentIds and the CommandLine args of the process?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was hoping that either the existing enumeration code that handles GetChildren() would have it or you could get it from Process.GetProcesses(). If neither of them provide an easy option feel free to simplify it as much as needed. Maybe it would just be ID and ProcessName as a simpler alternative.

Copy link
Member

@noahfalk noahfalk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks fine to me, but hopefully at least one of the past editors/owners of this file will also be able to take a look. My knowledge of it is minimal.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
Status: No status
Development

Successfully merging this pull request may close these issues.

2 participants