How can I debug an internal error in the .NET Runtime?

C#.Net

C# Problem Overview


I am trying to debug some work that processes large files. The code itself works, but there are sporadic errors reported from the .NET Runtime itself. For context, the processing here is a 1.5GB file (loaded into memory once only) being processed and released in a loop, deliberately to try to reproduce this otherwise unpredictable error.

My test fragment is basically:

try {
    byte[] data =File.ReadAllBytes(path);
    for(int i = 0 ; i < 500 ; i++)
    {
        ProcessTheData(data); // deserialize and validate

        // force collection, for tidiness
        GC.Collect(GC.MaxGeneration, GCCollectionMode.Forced);
        GC.WaitForPendingFinalizers();
    }
} catch(Exception ex) {
    Console.WriteLine(ex.Message);
    // some more logging; StackTrace, recursive InnerException, etc
}

(with some timing and other stuff thrown in)

The loop will process fine for an non-deterministic number of iterations fully successfully - no problems whatsoever; then the process will terminate abruptly. The exception handler is not hit. The test does involve a lot of memory use, but it saw-tooths very nicely during each iteration (there is not an obvious memory leak, and I have plenty of headroom - 14GB unused primary memory at the worst point in the saw-tooth). The process is 64-bit.

The windows error-log contains 3 new entries, which (via exit code 80131506) suggest an Execution Engine error - a nasty little critter. A related answer, suggests a GC error, with a "fix" to disable concurrent GC; however this "fix" does not prevent the issue.

Clarification: this low-level error does not hit the CurrentDomain.UnhandledException event.

Clarification: the GC.Collect is there only to monitor the saw-toothing memory, to check for memory leaks and to keep things predictable; removing it does not make the problem go away: it just makes it keep more memory between iterations, and makes the dmp files bigger ;p

By adding more console tracing, I have observed it faulting during each of:

  • during deserialization (lots of allocations, etc)
  • during GC (between a GC "approach" and a GC "complete", using the GC notification API)
  • during validation (just foreach over some of the data) - curiously just after a GC "complete" during the validation

So lots of different scenarios.

I can obtain crash-dump (dmp) files; how can I investigate this further, to see what the system is doing when it fails so spectacularly?

C# Solutions


Solution 1 - C#

If you have memory dumps, I'd suggest using WinDbg to look at them, assuming that you're not doing that already.

Trying running the comment !EEStack (mixed native and managed stack trace), and see if there's anything that might jump out in the stack trace. In my test program, I found this one of the times as my stack trace where a FEEE happened (I was purposefully corrupting the heap):

0:000> !EEStack

Thread 0 Current frame: ntdll!NtWaitForSingleObject+0xa Child-SP RetAddr Caller, Callee 00000089879bd3d0 000007fc586610ea KERNELBASE!WaitForSingleObjectEx+0x92, calling ntdll!NtWaitForSingleObject 00000089879bd400 000007fc5869811c KERNELBASE!RaiseException+0x68, calling ntdll!RtlRaiseException [...] 00000089879bec80 000007fc49109cf6 clr!WKS::gc_heap::gc1+0x96, calling clr!WKS::gc_heap::mark_phase 00000089879becd0 000007fc49109c21 clr!WKS::gc_heap::garbage_collect+0x222, calling clr!WKS::gc_heap::gc1 00000089879bed10 000007fc491092f1 clr!WKS::GCHeap::RestartEE+0xa2, calling clr!Thread::ResumeRuntime 00000089879bed60 000007fc4910998d clr!WKS::GCHeap::GarbageCollectGeneration+0xdd, calling clr!WKS::gc_heap::garbage_collect 00000089879bedb0 000007fc4910df9c clr!WKS::GCHeap::Alloc+0x31b, calling clr!WKS::GCHeap::GarbageCollectGeneration 00000089879bee00 000007fc48ff82e1 clr!JIT_NewArr1+0x481

Since this could be related to heap corruption from the garbage collector, I would try the !VerifyHeap command. At least you could make sure that the heap is intact (and your problem lies elsewhere) or discover that your issue might actually be with the GC or some P/Invoke routines corrupting it.

If you find that the heap is corrupt, I might try and discover how much of the heap is corrupted, which you might be able to do via !HeapStat. That might just show the entire heap corrupt from a certain point, though.

It's difficult to suggest any other methods to analyze this via WinDbg, since I have no real clue about what your code is doing or how it's structured.

I suppose if you find it to be an issue with the heap and thus meaning it could be GC weirdness, I would look at the CLR GC events in Event Tracing for Windows.


If the minidumps you're getting aren't cutting it and you're using Windows 7/2008R2 or later, you can use Global Flags (gflags.exe) to attach a debugger when the process terminates without an exception, if you're not getting a WER notification.

In the Silent Process Exit tab, enter the name of the executable, not the full path to it (ie. TestProgram.exe). Use the following settings:

  • Check Enable Silent Process Exit Monitoring
  • Check Launch Monitor Process
  • For the Monitor Process, use {path to debugging tools}\cdb.exe -server tcp:port=5005 -g -G -p %e.

And apply the settings.

When your test program crashes, cdb will attach and wait for you to connect to it. Start WinDbg, type Ctrl+R, and use the connection string: tcp:port=5005,server=localhost.

You might be able to skip using remote debugging and instead use {path to debugging tools}\windbg.exe %e. However, the reason I suggested remote instead, was because WerFault.exe, which I believe is what reads the registry and launches the monitor process, will start the debugger in Session 0.

You can make session 0 interactive and connect to the window station, but I can't remember how that's done. It's also inconvenient, because you'd have to switch back and forth between sessions if you need to access any of your existing windows you've had open.

Solution 2 - C#

Tools->Debugging->General->Enable .Net Framework Debugging

Tools->IntelliTace-> IntelliTaceEbents And Call Information

Tools->IntelliTace-> Set StorIntelliTace Recordings in this directory

and choose a directory

should allow you to step INTO .net code and trace every single function call. I tried it on a small sample project and it works

after each debug session it suppose to create a recording of the debug session. it the set directory even if CLR dies if im not mistaken

this should allow you to get to the extact call before CLR collapsed.

Solution 3 - C#

Try writing a generic exception handler and see if there is an unhandled exception killing your app.

    AppDomain currentDomain = AppDomain.CurrentDomain;
    currentDomain.UnhandledException += new UnhandledExceptionEventHandler(MyExceptionHandler);

static void MyExceptionHandler(object sender, UnhandledExceptionEventArgs e) {
        Console.WriteLine(e.ExceptionObject.ToString());
        Console.WriteLine("Press Enter to continue");
        Console.ReadLine();
        Environment.Exit(1);

Solution 4 - C#

I usually invesitgate memory related problems with Valgrind and gdb.

If you run your things on Windows, there are plenty of good alternatives such as verysleepy for callgrind as suggested here:
https://stackoverflow.com/questions/413477/is-there-a-good-valgrind-substitute-for-windows

If you really want to debug internal errors of the .NET runtime, you have the problem that there is no source for neither the class libraries nor the VM.

Since you can't debug what you don't have, I suggest that (apart from decompiling the .NET framework libraries in question with ILSpy, and adding them to your project, which still doesn't cover the vm) you could use the mono runtime.
There you have both the source of the class libraries as well as of the VM.
Maybe your program works fine with mono, then your problem would be solved, at least as long as it's only a one-time-processing task.

If not, there is an extensive FAQ on debugging, including GDB support
http://www.mono-project.com/Debugging

Miguel also has this post regarding valgrind support:
http://tirania.org/blog/archive/2007/Jun-29.html

In addition to that, if you let it run on Linux, you can also use strace, to see what's going on in the syscalls. If you don't have extensive winforms usage or WinAPI calls, .NET programs usually work fine on Linux (for problems regarding file system case-sensitivity, you can loopmount a case-insensitive file system and/or use MONO_IOMAP).

If you're Windows centric person, this post says the closest thing Windows has is WinDbg's Logger.exe, but ltrace information is not as extensive.

Mono sourcecode is available here:
http://download.mono-project.com/sources/

You are probably interested in the sources of the latest mono version
http://download.mono-project.com/sources/mono/mono-3.0.3.tar.bz2

If you need framework 4.5, you'll need mono 3, you can find precompiled packages here
https://www.meebey.net/posts/mono_3.0_preview_debian_ubuntu_packages/

If you want to make changes to the sourcecode, this is how to compile it:
http://ubuntuforums.org/showthread.php?t=1591370

Solution 5 - C#

There are .NET exceptions which can not be caught. Check out: http://msdn.microsoft.com/en-us/magazine/dd419661.aspx.

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content TypeOriginal AuthorOriginal Content on Stackoverflow
QuestionMarc GravellView Question on Stackoverflow
Solution 1 - C#Christopher CurrensView Answer on Stackoverflow
Solution 2 - C#NahumView Answer on Stackoverflow
Solution 3 - C#DhawalkView Answer on Stackoverflow
Solution 4 - C#Stefan SteigerView Answer on Stackoverflow
Solution 5 - C#Dalibor ČarapićView Answer on Stackoverflow