How to detect and debug multi-threading problems?

Multithreading, Debugging, Language Agnostic

Multithreading Problem Overview


This is a follow-up to this question, where I didn't get any input on this point. Here is the brief question:

Is it possible to detect and debug problems coming from multi-threaded code?

Often we have to tell our customers: "We can't reproduce the problem here, so we can't fix it. Please tell us the steps to reproduce the problem, then we'll fix it." It's a rather nasty answer if I know that it is a multithreading problem, but mostly I don't know that. How do I find out whether a problem is a multithreading issue, and how do I debug it?

I'd like to know if there are any special logging frameworks, debugging techniques, code inspectors, or anything else that helps solve such issues. General approaches are welcome. If an answer is language-specific, please keep it to .NET and Java.

Multithreading Solutions


Solution 1 - Multithreading

Threading/concurrency problems are notoriously difficult to replicate, which is one of the reasons why you should design to avoid them, or at least to minimize their likelihood. This is why immutable objects are so valuable. Try to isolate mutable objects to a single thread, and then carefully control the exchange of mutable objects between threads. Design around handing objects over from one thread to another, rather than sharing them. Where objects must be shared, use fully synchronized control objects (which are easier to reason about), and avoid having a synchronized object use other objects that must also be synchronized; that is, try to keep them self-contained. Your best defense is a good design.
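
To make the immutability advice concrete, here is a minimal Java sketch. The class names Settings and SettingsHolder are hypothetical. The shared data is an immutable snapshot, and the only mutable piece is a single reference that is swapped atomically with AtomicReference.updateAndGet. A reader holding an old snapshot can never see it change underneath it.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicReference;

class ImmutableSnapshotDemo {
    public static void main(String[] args) throws InterruptedException {
        SettingsHolder holder = new SettingsHolder();
        Thread[] writers = new Thread[8];
        for (int i = 0; i < writers.length; i++) {
            String key = "worker-" + i;
            writers[i] = new Thread(() -> holder.put(key, "ready"));
            writers[i].start();
        }
        for (Thread t : writers) {
            t.join();
        }
        System.out.println(holder.current().size()); // 8: no update is lost
    }
}

// Immutable value: final field, defensive copy, read-only view.
final class Settings {
    private final Map<String, String> values;

    Settings(Map<String, String> values) {
        this.values = Collections.unmodifiableMap(new HashMap<>(values));
    }

    String get(String key) {
        return values.get(key);
    }

    int size() {
        return values.size();
    }

    // "Changing" a setting builds a new snapshot; this one stays untouched.
    Settings with(String key, String value) {
        Map<String, String> copy = new HashMap<>(values);
        copy.put(key, value);
        return new Settings(copy);
    }
}

// The single point of shared mutable state: one atomically swapped reference.
final class SettingsHolder {
    private final AtomicReference<Settings> current =
            new AtomicReference<>(new Settings(Collections.emptyMap()));

    Settings current() {
        return current.get();
    }

    void put(String key, String value) {
        // updateAndGet re-applies the side-effect-free function if another thread
        // swapped the reference first, so concurrent puts are not lost.
        current.updateAndGet(s -> s.with(key, value));
    }
}
```

Because every Settings instance is frozen at construction, threads can pass snapshots around freely. Only the holder needs any coordination.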

Deadlocks are the easiest to debug, if you can get a stack trace while the system is deadlocked. Given the trace (most thread-dump tools report deadlocks directly), it's easy to pinpoint the cause and then reason about the code to work out why it happens and how to fix it. With lock-based deadlocks, the problem almost always comes down to acquiring the same locks in different orders.
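
A similar deadlock check can also be run from inside a Java process through the standard java.lang.management API. The sketch below deliberately deadlocks two daemon threads by taking two locks in opposite orders. The thread and lock names are hypothetical. It then calls ThreadMXBean.findDeadlockedThreads() to report which thread is blocked on which lock.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.concurrent.CountDownLatch;

class DeadlockDetectorDemo {
    public static void main(String[] args) {
        System.out.println("deadlocked threads found: " + provokeAndDetect());
    }

    // Deadlocks two daemon threads (opposite lock order), then asks the JVM
    // which threads are deadlocked and on which locks.
    static int provokeAndDetect() {
        Object lockA = new Object();
        Object lockB = new Object();
        CountDownLatch bothHoldFirstLock = new CountDownLatch(2);

        startDaemon("worker-1", () -> {
            synchronized (lockA) {
                arriveAndWait(bothHoldFirstLock);
                synchronized (lockB) { /* never reached */ }
            }
        });
        startDaemon("worker-2", () -> {
            synchronized (lockB) {
                arriveAndWait(bothHoldFirstLock);
                synchronized (lockA) { /* never reached */ }
            }
        });

        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        long[] ids = null;
        // Poll for up to about 5 seconds; findDeadlockedThreads returns null if none.
        for (int attempt = 0; attempt < 100 && ids == null; attempt++) {
            sleepQuietly(50);
            ids = mx.findDeadlockedThreads();
        }
        if (ids == null) {
            return 0;
        }
        for (ThreadInfo info : mx.getThreadInfo(ids)) {
            if (info != null) {
                System.out.println(info.getThreadName() + " is blocked on " + info.getLockName()
                        + " held by " + info.getLockOwnerName());
            }
        }
        return ids.length;
    }

    private static void startDaemon(String name, Runnable body) {
        Thread t = new Thread(body, name);
        t.setDaemon(true); // lets the JVM exit although these threads never finish
        t.start();
    }

    // Each thread signals that it holds its first lock, then waits for the other.
    private static void arriveAndWait(CountDownLatch latch) {
        latch.countDown();
        try {
            latch.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    private static void sleepQuietly(long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
```

In a real application, you would run only the detection part, for example from a watchdog thread or a diagnostic endpoint. You would then log the ThreadInfo details when a deadlock is found.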

Livelocks are harder; being able to observe the system while it is in the error state is your best bet there.

Race conditions tend to be extremely difficult to replicate, and are even harder to identify from manual code review. With these, the path I usually take, besides extensive testing to replicate, is to reason about the possibilities, and try to log information to prove or disprove theories. If you have direct evidence of state corruption you may be able to reason about the possible causes based on the corruption.

The more complex the system, the harder it is to find concurrency errors and to reason about its behavior. Make use of tools like JVisualVM and profilers that can connect remotely; they can be a lifesaver if you can connect to a system in an error state and inspect the threads and objects.

Also, beware of differences in behavior that depend on the number of CPU cores, pipelines, bus bandwidth, etc. Changes in hardware can affect your ability to replicate the problem. Some problems will only show on single-core CPUs, others only on multi-core CPUs.

One last thing: try to use the concurrency objects distributed with the system libraries; e.g., in Java, java.util.concurrent is your friend. Writing your own concurrency control objects is hard and fraught with danger; leave it to the experts, if you have a choice.

Solution 2 - Multithreading

I thought that the answer you got to your other question was pretty good. But I'll emphasize these points.

Only modify shared state in a critical section (Mutual Exclusion)

Acquire locks in a set order and release them in the opposite order.

Use pre-built abstractions whenever possible (like the classes in java.util.concurrent)
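
The first two points combine in the classic bank-transfer example. This is a minimal sketch that assumes every Account (a hypothetical class) has a unique id, which defines one global lock order. Whichever direction a transfer goes, the lower id is locked first. Two opposite transfers therefore cannot deadlock. The balance is only modified while both locks are held.

```java
import java.util.concurrent.locks.ReentrantLock;

class LockOrderingDemo {
    public static void main(String[] args) throws InterruptedException {
        Account a = new Account(1, 1_000);
        Account b = new Account(2, 1_000);
        // Opposite directions: a naive "lock from, then lock to" could deadlock here.
        Thread t1 = new Thread(() -> { for (int i = 0; i < 10_000; i++) Account.transfer(a, b, 1); });
        Thread t2 = new Thread(() -> { for (int i = 0; i < 10_000; i++) Account.transfer(b, a, 1); });
        t1.start();
        t2.start();
        t1.join();
        t2.join();
        System.out.println(a.balance() + b.balance()); // 2000
    }
}

final class Account {
    private final long id;                          // assumed unique: defines the lock order
    private final ReentrantLock lock = new ReentrantLock();
    private long balance;                           // only modified while holding `lock`

    Account(long id, long balance) {
        this.id = id;
        this.balance = balance;
    }

    long balance() {
        lock.lock();
        try {
            return balance;
        } finally {
            lock.unlock();
        }
    }

    static void transfer(Account from, Account to, long amount) {
        // Lock the account with the lower id first, whatever the direction.
        Account first = from.id < to.id ? from : to;
        Account second = (first == from) ? to : from;
        first.lock.lock();
        try {
            second.lock.lock();
            try {
                from.balance -= amount;             // critical section: both locks held
                to.balance += amount;
            } finally {
                second.lock.unlock();               // release in the opposite order
            }
        } finally {
            first.lock.unlock();
        }
    }
}
```

ReentrantLock is used so that a transfer from an account to itself simply re-enters the same lock.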

Also, some analysis tools can detect some potential issues. For example, FindBugs can find some threading issues in Java programs. Such tools can't find all problems (they aren't silver bullets) but they can help.

As vanslly points out in a comment to this answer, studying well-placed logging output can also be very helpful, but beware of Heisenbugs.

Solution 3 - Multithreading

For Java there is a verification tool called javapathfinder (Java PathFinder), which I find useful for debugging and verifying multithreaded applications against potential race conditions and deadlocks in the code.
It works fine with both the Eclipse and NetBeans IDEs.

[2019] The GitHub repository: https://github.com/javapathfinder

Solution 4 - Multithreading

When I have reports of problems that are hard to reproduce, I always find them by reading code, preferably by pair code reading, so you can discuss threading semantics and locking needs. When we do this based on a reported problem, I find we always nail one or more problems fairly quickly. I think it's also a fairly cheap technique for solving hard problems.

Sorry for not being able to tell you to press ctrl+shift+f13, but I don't think there's anything like that available. But just thinking about what the reported issue actually is usually gives a fairly strong sense of direction in the code, so you don't have to start at main().

Solution 5 - Multithreading

In addition to the other good answers you already got: Always test on a machine with at least as many processors / processor cores as the customer uses, or as there are active threads in your program. Otherwise some multithreading bugs may be hard to impossible to reproduce.
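
One way to push a test machine's cores as hard as possible is a stress harness that parks all worker threads behind a start gate and then releases them together. This is a minimal sketch. The names StressDemo and runConcurrently are hypothetical. The thread count is taken from Runtime.availableProcessors(), in the spirit of this answer. The deliberately unsynchronized counter shows the kind of lost update such a run can expose.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicInteger;

class StressDemo {
    // Starts `threads` workers that all block on a start gate, then releases
    // them at once to maximise overlapping execution.
    static void runConcurrently(int threads, int iterations, Runnable action)
            throws InterruptedException {
        CountDownLatch startGate = new CountDownLatch(1);
        CountDownLatch done = new CountDownLatch(threads);
        for (int t = 0; t < threads; t++) {
            new Thread(() -> {
                try {
                    startGate.await();
                    for (int i = 0; i < iterations; i++) {
                        action.run();
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                } finally {
                    done.countDown();
                }
            }).start();
        }
        startGate.countDown(); // release all workers together
        done.await();          // wait for every worker to finish
    }

    static int racyCounter = 0; // deliberately unsynchronized

    public static void main(String[] args) throws InterruptedException {
        // At least as many threads as this machine has cores.
        int threads = Math.max(2, Runtime.getRuntime().availableProcessors());
        int iterations = 100_000;
        AtomicInteger safeCounter = new AtomicInteger();

        runConcurrently(threads, iterations, () -> racyCounter++);
        runConcurrently(threads, iterations, safeCounter::incrementAndGet);

        // On a multi-core machine the racy count usually ends up below `expected`;
        // with fewer cores the lost updates may be much rarer.
        int expected = threads * iterations;
        System.out.println("expected " + expected + ", racy " + racyCounter
                + ", atomic " + safeCounter.get());
    }
}
```

The racy result is non-deterministic by nature. Only the atomic counter is guaranteed to match the expected total.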

Solution 6 - Multithreading

Apart from crash dumps, another technique is extensive run-time logging, where each thread logs what it's doing.

The first question when an error is reported, then, might be, "Where's the log file?"

Sometimes you can see the problem in the log file: "This thread is detecting an illegal/unexpected state here ... and look, this other thread was doing that just before and/or just after this."

If the log file doesn't say what's happening, then apologise to the customer, add enough extra logging statements to the code, give the new code to the customer, and say that you'll fix it after it happens one more time.

Solution 7 - Multithreading

Sometimes multithreaded solutions cannot be avoided. If there is a bug, it needs to be investigated in real time, which is nearly impossible with most tools like Visual Studio. The only practical solution is to write traces, although the tracing itself should:

  1. not add any delay
  2. not use any locking
  3. be multithreading safe
  4. trace what happened in the correct sequence.

This sounds like an impossible task, but it can be easily achieved by writing the trace into memory. In C#, it would look something like this:

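using System.Threading;

public class MemoryTracer {
  // Must be a power of two so that the index can be wrapped with a bit mask.
  public const int MaxMessages = 0x100;
  string[] messages = new string[MaxMessages];
  int messagesIndex = -1;

  public void Trace(string message) {
    // Interlocked.Increment hands every caller a unique index without taking a lock.
    int thisIndex = Interlocked.Increment(ref messagesIndex);
    // Wrap around instead of running past the end of the array (which would throw
    // IndexOutOfRangeException); the buffer keeps the latest MaxMessages entries.
    messages[thisIndex & (MaxMessages - 1)] = message;
  }
}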

The method Trace() is thread-safe, non-blocking, and can be called from any thread. On my PC, it takes about 2 microseconds to execute, which should be fast enough.

Add Trace() instructions wherever you think something might go wrong, let the program run, wait until the error happens, stop the trace and then investigate the trace for any errors.

A more detailed description of this approach, which also collects thread and timing information, recycles the buffer, and outputs the trace nicely, can be found at: CodeProject: Debugging multithreaded code in real time
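
For the Java side of the question, the same idea can be sketched with the thread name and a timestamp added to each entry, and with the buffer recycled. This is a minimal sketch, not the CodeProject code. MemoryTrace is a hypothetical class. AtomicLong.getAndIncrement takes the role of Interlocked.Increment. An AtomicReferenceArray ring buffer keeps the most recent entries. It still allocates one small object per call, so it is cheap but not entirely free.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.concurrent.atomic.AtomicLong;
import java.util.concurrent.atomic.AtomicReferenceArray;

class TraceDemo {
    public static void main(String[] args) throws InterruptedException {
        MemoryTrace trace = new MemoryTrace(1024);
        Runnable worker = () -> {
            for (int i = 0; i < 3; i++) {
                trace.trace("step " + i);
            }
        };
        Thread a = new Thread(worker, "A");
        Thread b = new Thread(worker, "B");
        a.start();
        b.start();
        a.join();
        b.join();
        // Stop tracing first, then inspect entries in the order slots were claimed.
        for (MemoryTrace.Entry e : trace.snapshot()) {
            System.out.println(e);
        }
    }
}

// Lock-free ring buffer: one atomic increment per call claims a unique slot.
final class MemoryTrace {
    static final class Entry {
        final long seq;       // global order in which slots were claimed
        final long nanoTime;  // timestamp taken just after claiming the slot
        final String thread;
        final String message;

        Entry(long seq, long nanoTime, String thread, String message) {
            this.seq = seq;
            this.nanoTime = nanoTime;
            this.thread = thread;
            this.message = message;
        }

        @Override
        public String toString() {
            return seq + " " + nanoTime + " [" + thread + "] " + message;
        }
    }

    private final AtomicLong next = new AtomicLong();
    private final AtomicReferenceArray<Entry> slots;
    private final int mask;

    MemoryTrace(int capacity) {
        if (capacity <= 0 || Integer.bitCount(capacity) != 1) {
            throw new IllegalArgumentException("capacity must be a power of two");
        }
        slots = new AtomicReferenceArray<>(capacity);
        mask = capacity - 1;
    }

    void trace(String message) {
        long seq = next.getAndIncrement();
        Entry entry = new Entry(seq, System.nanoTime(), Thread.currentThread().getName(), message);
        slots.set((int) (seq & mask), entry); // overwrites the oldest entry once full
    }

    // Meant to be called after tracing has stopped; returns entries sorted by seq.
    List<Entry> snapshot() {
        List<Entry> result = new ArrayList<>();
        for (int i = 0; i < slots.length(); i++) {
            Entry entry = slots.get(i);
            if (entry != null) {
                result.add(entry);
            }
        }
        result.sort(Comparator.comparingLong(x -> x.seq));
        return result;
    }
}
```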

Solution 8 - Multithreading

A little chart with some techniques to keep in mind when debugging multithreaded code. The chart is growing; please leave comments and tips to be added. (update file at this link)

Multithreaded debugging chart

Solution 9 - Multithreading

Visual Studio allows you to inspect the call stack of each thread, and you can switch between them. It is by no means enough to track all kinds of threading issues, but it is a start. A lot of improvements for multithreaded debugging are planned for the upcoming VS2010.

I have used WinDbg + SOS for threading issues in .NET code. You can inspect locks (sync blocks), thread call stacks, etc.

Solution 10 - Multithreading

Tess Ferrandez's blog has good examples of using WinDbg to debug deadlocks in .NET.

Solution 11 - Multithreading

assert() is your friend for detecting race conditions. Whenever you enter a critical section, assert that the invariant associated with it holds (protecting such invariants is what critical sections are for). Unfortunately, the check might be expensive and thus not suitable for use in a production environment.
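
In Java, the counterpart of assert() is the assert statement. The JVM evaluates it only when started with -ea, so an expensive invariant check can stay in the code and remain switched off in production. Here is a minimal sketch with a hypothetical Inventory class whose invariant is that total equals the sum of all bins. The sketch checks the invariant on entry to each critical section and, as an extra, again on exit.

```java
import java.util.Arrays;

class InvariantDemo {
    public static void main(String[] args) throws InterruptedException {
        Inventory inventory = new Inventory(4);
        Thread[] workers = new Thread[4];
        for (int w = 0; w < workers.length; w++) {
            int bin = w;
            workers[w] = new Thread(() -> {
                for (int i = 0; i < 1_000; i++) {
                    inventory.add(bin, 1);
                    inventory.move(bin, (bin + 1) % 4);
                }
            });
            workers[w].start();
        }
        for (Thread t : workers) {
            t.join();
        }
        System.out.println(inventory.total() + " " + inventory.invariantHolds()); // 4000 true
    }
}

// Hypothetical example: `total` must always equal the sum of all bins.
final class Inventory {
    private final int[] bins;
    private int total;

    Inventory(int binCount) {
        bins = new int[binCount];
    }

    synchronized void add(int bin, int amount) {
        assert invariantHolds() : "invariant broken before add";
        bins[bin] += amount;
        total += amount;
        assert invariantHolds() : "invariant broken after add";
    }

    synchronized void move(int from, int to) {
        assert invariantHolds() : "invariant broken before move";
        if (bins[from] > 0) {
            bins[from]--;
            bins[to]++;
        }
        assert invariantHolds() : "invariant broken after move";
    }

    synchronized int total() {
        return total;
    }

    // O(number of bins): cheap here, but checks like this can get expensive,
    // which is why `assert` statements only run when the JVM is started with -ea.
    synchronized boolean invariantHolds() {
        return Arrays.stream(bins).sum() == total;
    }
}
```

If someone later adds a method that updates bins without holding the monitor, a run with -ea is likely to fail at the next assert. It will fail close to the corruption rather than far away from it.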

Solution 12 - Multithreading

I implemented the tool vmlens to detect race conditions in Java programs at runtime. It implements an algorithm called Eraser.
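
The core idea of Eraser is lockset refinement. Each shared variable keeps a candidate set of locks. On every access, that set is intersected with the locks the accessing thread holds. If the set becomes empty, no single lock consistently protects the variable. The toy Java sketch below shows only that refinement step, and it is not vmlens's implementation. The variable and lock names are hypothetical. The caller reports held locks explicitly instead of the program being instrumented. Eraser's extra states for initialization and read-only sharing are left out.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

class LocksetDemo {
    public static void main(String[] args) {
        LocksetChecker checker = new LocksetChecker();
        // Three accesses to the same variable by threads holding different locks.
        checker.access("counter", Set.of("L1", "L2"));
        checker.access("counter", Set.of("L1"));
        checker.access("counter", Set.of("L2"));
        System.out.println(checker.racyVariables()); // [counter]
    }
}

// Toy version of Eraser's lockset refinement (no initialization/read-shared states).
final class LocksetChecker {
    // C(v): the locks that have been held on every access to v so far.
    private final Map<String, Set<String>> candidates = new HashMap<>();
    private final Set<String> racy = new HashSet<>();

    // Called on every access to `variable` with the locks the accessing thread holds.
    void access(String variable, Set<String> locksHeld) {
        Set<String> c = candidates.get(variable);
        if (c == null) {
            // First access: C(v) starts as the locks held right now.
            candidates.put(variable, new HashSet<>(locksHeld));
            return;
        }
        c.retainAll(locksHeld); // C(v) := C(v) intersect locksHeld
        if (c.isEmpty()) {
            racy.add(variable); // no single lock protected every access
        }
    }

    Set<String> racyVariables() {
        return racy;
    }
}
```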

Solution 13 - Multithreading

Develop code the way that Princess recommended for your other question (Immutable objects, and Erlang-style message passing). It will be easier to detect multi-threading problems, because the interactions between threads will be well defined.
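
Here is a minimal Java sketch of Erlang-style message passing, built around a hypothetical CounterActor. The counter is owned by a single thread. Other threads only put messages into its mailbox, which is a LinkedBlockingQueue, and a CompletableFuture carries the reply. Only one thread ever touches count, so there is nothing to lock and no data race on it. Every interaction with it is an explicit message.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.LinkedBlockingQueue;

class ActorDemo {
    public static void main(String[] args) throws Exception {
        CounterActor counter = CounterActor.start();
        Thread[] clients = new Thread[4];
        for (int i = 0; i < clients.length; i++) {
            clients[i] = new Thread(() -> {
                for (int j = 0; j < 1_000; j++) {
                    counter.increment();
                }
            });
            clients[i].start();
        }
        for (Thread t : clients) {
            t.join();
        }
        System.out.println(counter.get()); // 4000
        counter.stop();
    }
}

final class CounterActor {
    private interface Message {}
    private static final class Increment implements Message {}
    private static final class Stop implements Message {}
    private static final class Get implements Message {
        final CompletableFuture<Long> reply = new CompletableFuture<>();
    }

    private final BlockingQueue<Message> mailbox = new LinkedBlockingQueue<>();
    private long count; // confined to the actor thread; never locked

    private CounterActor() {}

    static CounterActor start() {
        CounterActor actor = new CounterActor();
        Thread t = new Thread(actor::loop, "counter-actor");
        t.setDaemon(true);
        t.start();
        return actor;
    }

    void increment() {
        mailbox.add(new Increment());
    }

    void stop() {
        mailbox.add(new Stop());
    }

    long get() throws InterruptedException, ExecutionException {
        Get request = new Get();
        mailbox.add(request);
        return request.reply.get(); // waits until the actor has processed the request
    }

    private void loop() {
        try {
            while (true) {
                Message m = mailbox.take(); // messages are handled strictly one at a time
                if (m instanceof Increment) {
                    count++;
                } else if (m instanceof Get) {
                    ((Get) m).reply.complete(count);
                } else if (m instanceof Stop) {
                    return;
                }
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
```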

Solution 14 - Multithreading

I faced a threading issue that gave the SAME wrong result every time instead of behaving unpredictably, because the other conditions (memory, scheduler, processing load) were more or less the same on each run.

From my experience, I can say that the HARDEST PART is recognizing that it is a threading issue, and the BEST SOLUTION is to review the multithreaded code carefully. Just by looking carefully at the threading code, you should try to figure out what can go wrong. Other approaches (thread dumps, profilers, etc.) come second to that.

Solution 15 - Multithreading

Narrow down on the functions that are being called, and rule out what could and could not be to blame. When you find sections of code that you suspect may be causing the issue, add lots of detailed logging / tracing to it. Once the issue occurs again, inspect the logs to see how the code executed differently than it does in "baseline" situations.

If you are using Visual Studio, you can also set breakpoints and use the Parallel Stacks window. Parallel Stacks is a huge help when debugging concurrent code, and gives you the ability to switch between threads to debug them independently. More info:

https://docs.microsoft.com/en-us/visualstudio/debugger/using-the-parallel-stacks-window?view=vs-2019

https://docs.microsoft.com/en-us/visualstudio/debugger/walkthrough-debugging-a-parallel-application?view=vs-2019

Solution 16 - Multithreading

I use GNU gdb with a simple command script that stops at a breakpoint and then keeps stepping with next in an endless loop:

$ more gdb_tracer

b func.cpp:2871
r
#c
while (1)
next
#step
end

Solution 17 - Multithreading

The best thing I can think of is to stay away from multithreaded code whenever possible. It seems there are very few programmers who can write bug-free multithreaded applications, and I would argue that there are no coders able to write bug-free large multithreaded applications.

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content type: original author (original content on Stackoverflow)
Question: MicSim
Solution 1 - Multithreading: Lawrence Dol
Solution 2 - Multithreading: Greg Mattes
Solution 3 - Multithreading: bLaXjack
Solution 4 - Multithreading: krosenvold
Solution 5 - Multithreading: mghie
Solution 6 - Multithreading: ChrisW
Solution 7 - Multithreading: Peter Huber
Solution 8 - Multithreading: Mouze
Solution 9 - Multithreading: Brian Rasmussen
Solution 10 - Multithreading: Sean
Solution 11 - Multithreading: zvrba
Solution 12 - Multithreading: Thomas Krieger
Solution 13 - Multithreading: Sean
Solution 14 - Multithreading: Kuldeep Tiwari
Solution 15 - Multithreading: iliketocode
Solution 16 - Multithreading: BNR
Solution 17 - Multithreading: max