.NET crashes with exit code 80131506 and kills powershell.exe & CoreCycler #56

Open
Grzywax opened this issue Aug 31, 2023 · 11 comments
Labels
help wanted (Extra attention is needed), under investigation (Checking out the issue)

Comments

@Grzywax

Grzywax commented Aug 31, 2023

Hi, after two weeks of tuning I have my curve dialed in well enough that CoreCycler can run for a long time without errors. Since then it regularly just stops. No error, nothing. It can happen on iteration 14, 17, or 18; usually after at least 10 iterations, and I have never passed 20 iterations in one go. What could be the reason? Are unstable cores still causing trouble?

here is the log:
+ Resuming the stress test process
+ Resumed: True
+ 11:33:57 - Getting new log file entries
+ Getting new log entries starting at position 151591 / Line 5744
+ The new log file entries:
+ - [Line 5745] Self-test 14336K passed!
+ New file position: 151617 / Line 5745
+ 11:33:57 - Checking CPU usage: 3.12%
+
+ 11:33:58 - Tick 5 of max 36
+ Remaining max runtime: 312s
+ 11:34:07 - Suspending the stress test process for 1000 milliseconds
+ Suspended: True
+ Resuming the stress test process
+ Resumed: True
+ 11:34:09 - Getting new log file entries
+ Getting new log entries starting at position 151617 / Line 5745
+ The new log file entries:
+ - [Line 5746] Self-test 15360K passed!
+ New file position: 151643 / Line 5746
+ 11:34:09 - Checking CPU usage: 3.12%
+
+ 11:34:10 - Tick 6 of max 36
+ Remaining max runtime: 300s
+ 11:34:19 - Suspending the stress test process for 1000 milliseconds
+ Suspended: True
+ Resuming the stress test process

Here is a screenshot:
image

And here's the Prime95 log from around the time CoreCycler stopped:
Self-test 8960K passed!
[Thu Aug 31 11:29:22 2023]
Self-test 14336K passed!
Self-test 9216K passed!
Self-test 9600K passed!
Self-test 10240K passed!
[Thu Aug 31 11:30:36 2023]
Self-test 10752K passed!
Self-test 11200K passed!
Self-test 15360K passed!
Self-test 11520K passed!
[Thu Aug 31 11:31:57 2023]
Self-test 12288K passed!
Self-test 12800K passed!
Self-test 13440K passed!
[Thu Aug 31 11:33:09 2023]
Self-test 13824K passed!
Self-test 16000K passed!
Self-test 14336K passed!
Self-test 15360K passed!
[Thu Aug 31 11:34:30 2023]
Self-test 16000K passed!
Self-test 16384K passed!
Self-test 17920K passed!
Self-test 16384K passed!
[Thu Aug 31 11:35:40 2023]

@sp00n
Owner

sp00n commented Aug 31, 2023

At first I thought you meant it freezes, but it seems the script just exits? That's a first, I've never seen this being reported. 😮
Is there any common denominator in the log files, e.g. it always happens when resuming the stress test program, etc.?
Does the Windows Event Log maybe say something about this?
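
One way to look for such a common denominator is to compare the last action each log recorded before the script died. A minimal Python sketch, assuming the log lines look like the excerpts in this thread (an optional `+ ` prefix, then an optional `HH:MM:SS - ` timestamp); the function names are made up for illustration and are not part of CoreCycler:

```python
import re
from collections import Counter
from typing import Iterable

# Optional "+ " prefix and optional "HH:MM:SS - " timestamp, as seen in the excerpts.
_PREFIX = re.compile(r"^\s*\+?\s*(?:\d{2}:\d{2}:\d{2}\s*-\s*)?")


def last_action(log_text: str) -> str:
    """Return the final non-empty log line with prefix and timestamp stripped."""
    for line in reversed(log_text.splitlines()):
        stripped = _PREFIX.sub("", line).strip()
        if stripped:
            return stripped
    return ""


def tally_last_actions(logs: Iterable[str]) -> Counter:
    """Count how often each final action occurs across several log texts."""
    return Counter(last_action(text) for text in logs)


# Tails of two aborted runs, shortened from the excerpts in this thread.
run_a = "+ Suspended: True\n+ Resuming the stress test process\n"
run_b = "+ 14:07:01 - Suspending the stress test process for 1000 milliseconds\n"
print(tally_last_actions([run_a, run_b]))
```

If most aborted runs end on the same action, that step is the first place to look.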

@Grzywax
Author

Grzywax commented Aug 31, 2023

Yes, my previous issue was with "freezing", but since I deactivated Quick Edit it has never happened again.

But this one is different. I'm not able to run CoreCycler unattended for more than a day and a half. My theory is that some of the "best" cores still have a curve value that is too aggressive, so they clock too high in ultra-light tasks. When CoreCycler suspends the stress test, the best core is able to clock high, and if that core is running the CoreCycler script at that moment, it errors out and the script exits.

I did not check the Windows logs (no time for that deep dive ;] ).

But next time it happens I can post the log fragment where it stopped, to see if it's the same step.

@sp00n
Owner

sp00n commented Aug 31, 2023

If you haven't deleted the old log files, they should still be in your logs directory.

It's been a long time since I've let CoreCycler run for that long. That was back in the early days of writing it, so I can think of various issues with such long runtimes. Maybe a buffer overflow somewhere, maybe the log files become too large, maybe there's a memory leak... So checking the Windows Event Log might help determine what's going on; maybe there's an entry in there pointing in the right direction.

@Grzywax
Author

Grzywax commented Sep 1, 2023

So today it happened again.

Here is the output:
image

Here is the log:
+ 14:06:27 - Getting new log file entries
+ Getting new log entries starting at position 120928 / Line 4431
+ The new log file entries:
+ - [Line 4432] Self-test 16384K passed!
+ New file position: 120954 / Line 4432
+ 14:06:27 - Checking CPU usage: 3.12%
+
+ 14:06:28 - Tick 21 of max 36
+ Remaining max runtime: 120s
+ 14:06:37 - Suspending the stress test process for 1000 milliseconds
+ Suspended: True
+ Resuming the stress test process
+ Resumed: True
+ 14:06:39 - Getting new log file entries
+ No file size change for the log file
+ 14:06:39 - Checking CPU usage: 3.12%
+
+ 14:06:40 - Tick 22 of max 36
+ Remaining max runtime: 108s
+ 14:06:49 - Suspending the stress test process for 1000 milliseconds
+ Suspended: True
+ Resuming the stress test process
+ Resumed: True
+ 14:06:51 - Getting new log file entries
+ Getting new log entries starting at position 120954 / Line 4432
+ The new log file entries:
+ - [Line 4433] [Fri Sep 1 14:06:41 2023]
+ - [Line 4434] Self-test 18432K passed!
+ New file position: 121008 / Line 4434
+ 14:06:51 - Checking CPU usage: 3.12%
+
+ 14:06:52 - Tick 23 of max 36
+ Remaining max runtime: 96s
+ 14:07:01 - Suspending the stress test process for 1000 milliseconds

And in the Windows log I got this:
01/09/2023 14:07:01
Faulting application name: powershell.exe, version: 10.0.19041.2913, time stamp: 0xcb0b8e31
Faulting module name: clr.dll, version: 4.8.9167.0, time stamp: 0x648f6bcc
Exception code: 0xc0000005
Fault offset: 0x000000000009d843
Faulting process id: 0x1864
Faulting application start time: 0x01d9dc29fa19d58f
Faulting application path: C:\WINDOWS\System32\WindowsPowerShell\v1.0\powershell.exe
Faulting module path: C:\Windows\Microsoft.NET\Framework64\v4.0.30319\clr.dll
Report Id: efda66dd-b2e7-4684-bc0d-dea89c5d6be7
Faulting package full name:
Faulting package-relative application ID:
Application
Application Error
ID: 1000
level:Error

and this one:
01/09/2023 14:07:01
Application: powershell.exe
Framework Version: v4.0.30319
Description: The process was terminated due to an internal error in the .NET Runtime at IP 00007FFD4252D843 (00007FFD42490000) with exit code 80131506.
Source .NET Runtime
Event ID: 1023
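
The two events appear to describe the same crash site. Assuming the value in parentheses in the .NET Runtime event is the load address of the faulting module (an assumption based on the usual layout of this message), subtracting it from the instruction pointer should give the fault offset reported in the Application Error event. A small Python check; the function name is made up for illustration:

```python
def fault_offset(instruction_pointer: str, module_base: str) -> int:
    """Offset of the faulting instruction relative to the module base (hex strings in)."""
    return int(instruction_pointer, 16) - int(module_base, 16)


# Values copied from the .NET Runtime event above.
offset = fault_offset("00007FFD4252D843", "00007FFD42490000")
print(hex(offset))  # 0x9d843, same as "Fault offset: 0x000000000009d843"
```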

At the same second there was a service logon:
Special privileges assigned to new logon.

Subject:
Security ID: SYSTEM
Account Name: SYSTEM
Account Domain: NT AUTHORITY
Logon ID: 0x3E7

Privileges: SeAssignPrimaryTokenPrivilege
SeTcbPrivilege
SeSecurityPrivilege
SeTakeOwnershipPrivilege
SeLoadDriverPrivilege
SeBackupPrivilege
SeRestorePrivilege
SeDebugPrivilege
SeAuditPrivilege
SeSystemEnvironmentPrivilege
SeImpersonatePrivilege
SeDelegateSessionUserImpersonatePrivilege

I also get these errors sometimes. It is 100% not the cores being stressed that fail, but something else. It lasts from a second to a minute and can mark one or several cores in a row as bad:
image

@Grzywax
Author

Grzywax commented Sep 1, 2023

I would say it could be some kind of leak. It looks like an overflow of some sort.

@sp00n
Owner

sp00n commented Sep 1, 2023

image
https://stackoverflow.com/questions/4367664/application-crashes-with-internal-error-in-the-net-runtime
This response suggests that it might be a hardware issue, so it could be related to overclocking (of the CPU, the RAM, or both).
Other responses say that, at least at some point, it was a bug in the .NET Framework itself. Maybe you could try updating your .NET installation and see if that helps.
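
For reference, the exit code in the event is an HRESULT printed in hex without the `0x` prefix. 0x80131506 is defined in the .NET headers as COR_E_EXECUTIONENGINE, and exception code 0xC0000005 is the NTSTATUS value STATUS_ACCESS_VIOLATION. A minimal Python sketch that splits the HRESULT into its fields; the helper name and the lookup table are made up for illustration:

```python
KNOWN_CODES = {
    0x80131506: "COR_E_EXECUTIONENGINE (fatal internal CLR error)",
    0xC0000005: "STATUS_ACCESS_VIOLATION",
}


def decode_hresult(value: int) -> dict:
    """Split a 32-bit HRESULT into its severity, facility and code fields."""
    return {
        "failure": bool((value >> 31) & 1),  # top bit set means failure
        "facility": (value >> 16) & 0x7FF,   # 11-bit facility; 0x13 is FACILITY_URT (.NET runtime)
        "code": value & 0xFFFF,              # low 16 bits
    }


exit_code = int("80131506", 16)  # the event log prints the code in hex without a prefix
fields = decode_hresult(exit_code)
print(fields, KNOWN_CODES.get(exit_code, "unknown"))
```

The code only says that the runtime itself hit a fatal error; it does not say whether the cause was hardware or software.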

You could also try to disable the suspend functionality by setting suspendPeriodically = 0 in your config. It will potentially decrease the usefulness of the stress test by removing the simulated "load switches", but I've noticed that there is a potential memory leak when calling this function, so for long runtimes it may actually hit a limit where something breaks.

If you decide to test either of these, please let me know the results.

@Grzywax
Author

Grzywax commented Sep 2, 2023

While tuning the CPU I have the RAM at JEDEC settings, 4800 (those are 6000 CL30 sticks), so it's not that. CPU - sure - it's in the middle of tuning ;]

Yesterday I updated the .NET packages, but today it stopped again with the same errors:

CoreCycler log:

             + 11:28:52 - Checking CPU usage: 3.12%
             + 
             + 11:28:53 - Tick 11 of max 36
             +            Remaining max runtime: 240s
             + 11:29:02 - Suspending the stress test process for 1000 milliseconds
             +            Suspended: True
             +            Resuming the stress test process
             +            Resumed: True
             + 11:29:04 - Getting new log file entries
             +            No file size change for the log file
             + 11:29:04 - Checking CPU usage: 3.12%
             + 
             + 11:29:05 - Tick 12 of max 36
             +            Remaining max runtime: 228s
             + 11:29:14 - Suspending the stress test process for 1000 milliseconds

Win:

Application: powershell.exe
Framework Version: v4.0.30319
Description: The process was terminated due to an internal error in the .NET Runtime at IP 00007FFAF31DCE9E (00007FFAF3140000) with exit code 80131506.

Win:
Faulting application name: powershell.exe, version: 10.0.19041.2913, time stamp: 0xcb0b8e31
Faulting module name: clr.dll, version: 4.8.9181.0, time stamp: 0x64b85478
Exception code: 0xc0000005
Fault offset: 0x000000000009ce9e
Faulting process id: 0x1ce8
Faulting application start time: 0x01d9dd64af7bee5c
Faulting application path: C:\WINDOWS\System32\WindowsPowerShell\v1.0\powershell.exe
Faulting module path: C:\Windows\Microsoft.NET\Framework64\v4.0.30319\clr.dll
Report Id: 2c63bc0d-a169-4e03-a9b7-fe2ee2731fe9
Faulting package full name:
Faulting package-relative application ID:
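
The `Faulting application start time` values in these events look like Windows FILETIME values (100-nanosecond ticks since 1601-01-01 UTC). Under that assumption they can be converted to a timestamp, which shows how long powershell.exe had been running before each crash. A minimal Python sketch; the function name is made up for illustration:

```python
from datetime import datetime, timedelta, timezone

FILETIME_EPOCH = datetime(1601, 1, 1, tzinfo=timezone.utc)


def filetime_to_utc(hex_value: str) -> datetime:
    """Convert a hex FILETIME string (100 ns ticks since 1601) to a UTC datetime."""
    ticks = int(hex_value, 16)  # int() accepts the "0x" prefix for base 16
    return FILETIME_EPOCH + timedelta(microseconds=ticks // 10)


# Start times copied from the two Application Error events in this thread.
for start in ("0x01d9dc29fa19d58f", "0x01d9dd64af7bee5c"):
    print(start, "->", filetime_to_utc(start).isoformat())
```

The result is in UTC, while the timestamps in the CoreCycler log are local time, so the time zone has to be accounted for before comparing them.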

@sp00n
Owner

sp00n commented Jun 7, 2024

Closing, as to me this looks like a hardware / overclock / undervolt issue.
If you or anyone else sees this happening, let me know. But I guess I have no way to debug this without access to the faulting system.

sp00n closed this as completed Jun 7, 2024
@sp00n
Owner

sp00n commented Jun 20, 2024

I've now had that .NET crash with exit code 80131506 as well, which simply kills the powershell.exe and therefore also CoreCycler.
Also while running Prime95.
I have no real angle to investigate this though, so let's see if it happens again.

sp00n reopened this Jun 20, 2024
sp00n added the help wanted and under investigation labels Jun 20, 2024
sp00n changed the title from "Corecycler just stopps and Prime95 continues to run same core" to ".NET crashes with exit code 80131506 and kills powershell.exe & CoreCycler" Jun 20, 2024
@sp00n
Owner

sp00n commented Jun 22, 2024

I'm not making much progress. I thought I had solved it by refactoring some of my code, and it ran fine for over 7 hours where before it would stop within an hour, but then it returned.
I'm also not ruling out it's connected to the Remote Desktop Connection I'm using. Or it's indeed an unstable undervolt, although I haven't seen an error message for quite some time now.

I managed to get a crash dump when it happened, but I'm not really good at understanding it. The crash seems to be due to a Garbage Collector error where it tries to read memory from address 0 (i.e. null).

ExceptionAddress: 00007ffd3b357fba (clr!WKS::gc_heap::find_first_object+0x00000000000000ea)
   ExceptionCode: c0000005 (Access violation)
  ExceptionFlags: 00000001
NumberParameters: 2
   Parameter[0]: 0000000000000000
   Parameter[1]: 0000000000000000
Attempt to read from address 0000000000000000

Or, as another analysis tool phrases it:
In powershell.exe.5240.dmp the assembly instruction at clr!WKS::gc_heap::find_first_object+ea in C:\Windows\Microsoft.NET\Framework64\v4.0.30319\clr.dll from Microsoft Corporation has caused an access violation exception (0xC0000005) when trying to read from memory location 0x00000000 on thread [7]
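
For an access violation (0xC0000005), the two exception parameters have a fixed meaning: the first is the access type (0 = read, 1 = write, 8 = execute), the second is the address that was accessed. A minimal Python sketch that reads those fields out of the exception record text above; the function names are made up for illustration:

```python
ACCESS_TYPES = {0: "read", 1: "write", 8: "execute (DEP)"}

# Exception record copied from the crash dump output above.
RECORD = """\
ExceptionAddress: 00007ffd3b357fba (clr!WKS::gc_heap::find_first_object+0x00000000000000ea)
   ExceptionCode: c0000005 (Access violation)
  ExceptionFlags: 00000001
NumberParameters: 2
   Parameter[0]: 0000000000000000
   Parameter[1]: 0000000000000000
Attempt to read from address 0000000000000000
"""


def parse_exception_record(text: str) -> dict:
    """Collect the 'key: value' lines of the record; lines without a colon are skipped."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def describe_access_violation(fields: dict) -> str:
    """Describe the faulting access, or return an empty string for other exception codes."""
    code = int(fields["ExceptionCode"].split()[0], 16)
    if code != 0xC0000005:
        return ""
    access = int(fields["Parameter[0]"], 16)   # access type
    address = int(fields["Parameter[1]"], 16)  # address that was accessed
    return f"{ACCESS_TYPES.get(access, 'unknown')} at 0x{address:016x}"


print(describe_access_violation(parse_exception_record(RECORD)))
```

This agrees with the "Attempt to read from address 0000000000000000" line, i.e. a null read inside the garbage collector.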

My experience with crash dumps and debugging with crash dumps is close to zero, so not sure how to progress from here.

@sp00n
Owner

sp00n commented Aug 10, 2024

Since refactoring parts of the code I haven't seen this happen anymore, so I hope the issue has been fixed or at least diminished.
