How can I tell where my python script is hanging?

Python Problem Overview


I'm debugging my Python program and have hit a bug that makes the program hang, as if in an infinite loop. I had a problem with an infinite loop before, but when it hung I could kill the program and Python printed a helpful exception telling me where the program was when I sent the kill command. Now, however, when the program hangs and I press Ctrl-C, it does not abort but keeps running. Is there any tool I can use to locate the hang? I'm new to profiling, but as far as I know a profiler can only give you information about a program that has completed successfully. Or can you use a profiler to debug such hangs?

Python Solutions


Solution 1 - Python

Let's assume that you are running your program as:

python YOURSCRIPT.py

Try running your program as:

python -m trace --trace YOURSCRIPT.py

Then be patient while a lot of output is printed to the screen. If you have an infinite loop, the output will go on forever (the halting problem), and the lines that keep repeating show where the loop is. If the output stops somewhere, you are most likely stuck on I/O or in a deadlock.
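If the full trace is too noisy, the standard library's trace module can also be driven from Python so that some frames are skipped. The following is a minimal sketch, not part of the original answer: main() is a hypothetical stand-in for your program's entry point, and the ignoredirs values follow the example in the trace documentation.

```python
import sys
import trace


def main():
    # Hypothetical entry point standing in for the code that hangs.
    total = 0
    for n in range(3):
        total += n
    return total


# Print each line as it executes, but skip modules that live under the
# interpreter's install prefixes.
tracer = trace.Trace(
    count=False,
    trace=True,
    ignoredirs=[sys.prefix, sys.exec_prefix],
)

if __name__ == "__main__":
    # runfunc() runs main() under the tracer and returns its result.
    result = tracer.runfunc(main)
    print("main() returned", result)
```

In a virtual environment the standard library lives under sys.base_prefix, so that directory may need to be added to ignoredirs as well.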

Solution 2 - Python

I wrote a module, hanging_threads.py, that prints out threads that hang for longer than 10 seconds in one place.

Run:

python -m pip install hanging_threads

Add this to your code:

from hanging_threads import start_monitoring
start_monitoring(seconds_frozen=10, test_interval=100)

Here is an example output:

--------------------    Thread 5588     --------------------
  File "C:\python33\lib\threading.py", line 844, in _exitfunc
        t.join()
  File "C:\python33\lib\threading.py", line 743, in join
        self._block.wait()
  File "C:\python33\lib\threading.py", line 184, in wait
        waiter.acquire()

This occurs at the exit of the main thread when you forget to set another thread as daemon.
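A minimal sketch of the fix that the example output hints at, assuming a hypothetical background_poll worker: marking the helper thread as a daemon, or stopping and joining it explicitly, lets the interpreter exit instead of blocking in t.join() at shutdown.

```python
import threading
import time


def background_poll(stop_event):
    # Hypothetical helper: loops until asked to stop.
    while not stop_event.is_set():
        time.sleep(0.1)


stop_event = threading.Event()

# daemon=True means interpreter shutdown will not wait for this thread.
worker = threading.Thread(
    target=background_poll,
    args=(stop_event,),
    daemon=True,
    name="poller",
)
worker.start()

# ... main-thread work would go here ...

# A cleaner shutdown than relying on daemon=True alone:
# signal the worker and wait briefly for it to finish.
stop_event.set()
worker.join(timeout=1.0)
```

Giving the thread a name, as done here, also makes it easier to recognise in any stack dump.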

Solution 3 - Python

Wow! 5 answers already and nobody has suggested the most obvious and simple:

  1. Try to find a reproducible test case that causes the hanging behavior.
  2. Add logging to your code. This can be as basic as print("**010"), print("**020"), etc., peppered through the major areas.
  3. Run the code. See where it hangs. Can't understand why? Add more logging. (I.e., if it hangs between **020 and **030, go and add **023, **025, **027, etc.)
  4. Goto 3.
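The steps above can be sketched with the standard logging module instead of bare print calls. This is only an illustration: run_pipeline and the logger name "hang_hunt" are hypothetical, and the numbered prefixes mirror the **010 / **020 markers.

```python
import logging

log = logging.getLogger("hang_hunt")  # hypothetical logger name


def run_pipeline(items):
    # Hypothetical function standing in for the code under suspicion.
    log.debug("010 start, %d items", len(items))
    total = 0
    for index, item in enumerate(items):
        log.debug("020 processing item %d", index)
        total += item
    log.debug("030 done")
    return total


if __name__ == "__main__":
    # Timestamps and thread names show how long each step took and
    # which thread logged the last message before the hang.
    logging.basicConfig(
        level=logging.DEBUG,
        format="%(asctime)s %(threadName)s %(message)s",
    )
    run_pipeline([1, 2, 3])
```

The last message printed before the program goes quiet brackets the hang; add more checkpoints between it and the next expected one.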

Solution 4 - Python

If your program is too big and complex to be viable for single stepping with pdb or printing every line with the trace module then you could try a trick from my days of 8-bit games programming. From Python 2.5 onwards pdb has the ability to associate code with a breakpoint by using the commands command. You can use this to print a message and continue running:

(Pdb) commands 1
(com) print("*** Breakpoint 1 ***")
(com) continue
(com) end
(Pdb)

This will print a message and carry on running when breakpoint 1 is hit. Define similar commands for a few other breakpoints.

You can use this to do a kind of binary search of your code. Attach breakpoints at key places in the code and run it until it hangs. You can tell from the last message which was the last breakpoint it hit. You can then move the other breakpoints and re-run to narrow down the place in the code where it hangs. Rinse and repeat.

Incidentally, on the 8-bit micros (Commodore 64, Spectrum, etc.) you could poke a value into a register location to change the colour of the border round the screen. I used to set up a few breakpoints to do this with different colours, so when the program ran it would give a psychedelic rainbow display until it hung, then the border would change to a single colour that told you what the last breakpoint was. You could also get a good feel for the relative performance of different sections of code from the amount of each colour in the rainbow. Sometimes I miss that simplicity in these newfangled "Windows" machines.

Solution 5 - Python

From Python 3.3 on there is a built-in faulthandler module. To print a stack trace for all threads when a normally fatal signal occurs:

import faulthandler
faulthandler.enable()

For a process that is hung, it is more useful to set up faulthandler to print stack traces on demand. This can be done with:

import faulthandler
import signal
faulthandler.register(signal.SIGUSR1.value)

Then once the process becomes hung you can send a signal to trigger the printing of the stack trace:

$ python myscript.py &
[1] <pid> 
$ kill -s SIGUSR1 <pid>

This signal won't kill the process, and you can send it multiple times to see stack traces at different points in the execution.

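Note that signal.SIGUSR1.value requires Python 3.5 or later, where the signal constants became enum members. On older versions, signal.SIGUSR1 is already a plain integer and can be passed directly, or you can just hardcode the signal number (10 on most common Linux architectures).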
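On platforms without SIGUSR1 (Windows, for example), or when sending signals is inconvenient, faulthandler can also dump tracebacks on a timer. A minimal sketch, assuming a hypothetical hang_watchdog helper that is not part of the original answer; faulthandler.dump_traceback_later and faulthandler.cancel_dump_traceback_later are the standard-library calls it wraps.

```python
import contextlib
import faulthandler
import sys


@contextlib.contextmanager
def hang_watchdog(timeout, file=None):
    """Dump all thread stacks every `timeout` seconds while the block runs."""
    if file is None:
        file = sys.stderr
    # The file must stay open until the watchdog is cancelled.
    faulthandler.dump_traceback_later(timeout, repeat=True, file=file)
    try:
        yield
    finally:
        # Stop the watchdog once the guarded block finishes.
        faulthandler.cancel_dump_traceback_later()
```

Wrapping the suspect call, for example "with hang_watchdog(10): main()" where main is your own entry point, prints a stack trace every 10 seconds for as long as the call has not returned, so a hang shows up as the same trace repeated.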

faulthandler.dump_traceback can be used together with threading.enumerate to narrow down hanging threads: list the threads that have daemon=False and match them to the dumped traces by their hex ID, hex(t.ident).
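One way this combination might look is sketched below. The helper names non_daemon_threads and dump_with_thread_names are hypothetical; only faulthandler.dump_traceback and threading.enumerate come from the standard library.

```python
import faulthandler
import sys
import threading


def non_daemon_threads():
    """Return {hex(ident): name} for every live thread with daemon=False."""
    return {
        hex(t.ident): t.name
        for t in threading.enumerate()
        if not t.daemon and t.ident is not None
    }


def dump_with_thread_names(file=None):
    """Write the non-daemon thread list, then the stacks of all threads."""
    if file is None:
        file = sys.stderr
    for ident, name in non_daemon_threads().items():
        print(f"non-daemon thread {name!r}: ident {ident}", file=file)
    # faulthandler writes straight to the file descriptor, so flush first
    # to keep the two outputs in order.
    file.flush()
    faulthandler.dump_traceback(file=file, all_threads=True)
```

When matching IDs by eye, keep in mind that faulthandler may print thread IDs padded with leading zeros, while hex() does not pad.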

Solution 6 - Python

Multithreaded dæmon; using pyrasite to inspect a running program

I had a multithreaded dæmon that would sometimes get stuck after hours, sometimes after weeks. Running it through a debugger would not be feasible and perhaps not even helpful, as debugging multithreaded or multiprocess programs can be painful. Running it through trace might fill up gigabytes if not terabytes before it would get stuck. The second time the dæmon appeared to hang, I wanted to know right away where it was, without restarting it, adding inspection code, running it through a debugger, and waiting for hours, days, or weeks for it to hang again under circumstances yet to be investigated.

I was rescued by pyrasite, which lets the user connect to a running Python process and interactively inspect frames (example inspired by this gist):

$ pyrasite-shell 1071  # 1071 is the Process ID (PID)
Pyrasite Shell 2.0
Connected to '/opt/pytroll/pytroll_inst/miniconda3/envs/pytroll-py38/bin/python3.8 /opt/pytroll/pytroll_inst/miniconda3/envs/pytroll-py38/bin/satpy_launcher.py -n localhost /opt/pytroll/pytroll_inst/config/trollflow2.yaml'                                                                                               
Python 3.8.6 | packaged by conda-forge | (default, Dec 26 2020, 05:05:16)
[GCC 9.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
(DistantInteractiveConsole)

>>> import sys
>>> sys._current_frames()
{139652793759488: <frame at 0x7f034b2c9040, file '<console>', line 1, code <module>>, 139653520578368: <frame at 0x7f034b232ac0, file '/opt/pytroll/pytroll_inst/miniconda3/envs/pytroll-py38/lib/python3.8/site-packages/pyresample/spherical.py', line 112, code __init__>}

The first frame is not informative; that's our own pyrasite shell. The second frame, however, reveals that currently our script is stuck in the module pyresample.spherical in line 112. We can use the traceback module to get a full traceback:

>>> import traceback
>>> traceback.print_stack(list(sys._current_frames().values())[1])
  File "/opt/pytroll/pytroll_inst/miniconda3/envs/pytroll-py38/bin/satpy_launcher.py", line 80, in <module>
    main()
  File "/opt/pytroll/pytroll_inst/miniconda3/envs/pytroll-py38/bin/satpy_launcher.py", line 75, in main
    run(prod_list, topics=topics, test_message=test_message,
  File "/opt/pytroll/pytroll_inst/miniconda3/envs/pytroll-py38/lib/python3.8/site-packages/trollflow2/launcher.py", line 152, in run
    proc.start()
  File "/opt/pytroll/pytroll_inst/miniconda3/envs/pytroll-py38/lib/python3.8/multiprocessing/process.py", line 121, in start
    self._popen = self._Popen(self)
  File "/opt/pytroll/pytroll_inst/miniconda3/envs/pytroll-py38/lib/python3.8/multiprocessing/context.py", line 224, in _Popen
    return _default_context.get_context().Process._Popen(process_obj)
  File "/opt/pytroll/pytroll_inst/miniconda3/envs/pytroll-py38/lib/python3.8/multiprocessing/context.py", line 277, in _Popen
    return Popen(process_obj)
  File "/opt/pytroll/pytroll_inst/miniconda3/envs/pytroll-py38/lib/python3.8/multiprocessing/popen_fork.py", line 19, in __init__
    self._launch(process_obj)
  File "/opt/pytroll/pytroll_inst/miniconda3/envs/pytroll-py38/lib/python3.8/multiprocessing/popen_fork.py", line 75, in _launch
    code = process_obj._bootstrap(parent_sentinel=child_r)
  File "/opt/pytroll/pytroll_inst/miniconda3/envs/pytroll-py38/lib/python3.8/multiprocessing/process.py", line 315, in _bootstrap
    self.run()
  File "/opt/pytroll/pytroll_inst/miniconda3/envs/pytroll-py38/lib/python3.8/multiprocessing/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "/opt/pytroll/pytroll_inst/miniconda3/envs/pytroll-py38/lib/python3.8/site-packages/trollflow2/launcher.py", line 268, in process
    cwrk.pop('fun')(job, **cwrk)
  File "/opt/pytroll/pytroll_inst/miniconda3/envs/pytroll-py38/lib/python3.8/site-packages/trollflow2/plugins/__init__.py", line 403, in covers
    cov = get_scene_coverage(platform_name, start_time, end_time,
  File "/opt/pytroll/pytroll_inst/miniconda3/envs/pytroll-py38/lib/python3.8/site-packages/trollflow2/plugins/__init__.py", line 425, in get_scene_coverage
    return 100 * overpass.area_coverage(area_def)
  File "/opt/pytroll/pytroll_inst/miniconda3/envs/pytroll-py38/lib/python3.8/site-packages/trollsched/satpass.py", line 242, in area_coverage
    inter = self.boundary.contour_poly.intersection(area_boundary)
  File "/opt/pytroll/pytroll_inst/miniconda3/envs/pytroll-py38/lib/python3.8/site-packages/pyresample/spherical.py", line 494, in intersection
    return self._bool_oper(other, -1)
  File "/opt/pytroll/pytroll_inst/miniconda3/envs/pytroll-py38/lib/python3.8/site-packages/pyresample/spherical.py", line 475, in _bool_oper
    inter, edge2 = edge1.get_next_intersection(narcs2, inter)
  File "/opt/pytroll/pytroll_inst/miniconda3/envs/pytroll-py38/lib/python3.8/site-packages/pyresample/spherical.py", line 326, in get_next_intersection
    return None, None
  File "/opt/pytroll/pytroll_inst/miniconda3/envs/pytroll-py38/lib/python3.8/site-packages/pyresample/spherical.py", line 298, in intersection
    return None
  File "/opt/pytroll/pytroll_inst/miniconda3/envs/pytroll-py38/lib/python3.8/site-packages/pyresample/spherical.py", line 264, in intersections
    return (SCoordinate(lon, lat),
  File "/opt/pytroll/pytroll_inst/miniconda3/envs/pytroll-py38/lib/python3.8/site-packages/pyresample/spherical.py", line 62, in cross2cart
    return res
  File "/opt/pytroll/pytroll_inst/miniconda3/envs/pytroll-py38/lib/python3.8/site-packages/pyresample/spherical.py", line 112, in __init__
    self.cart = np.array(cart)

and we can use all the power of Python's introspection to inspect the stack and reconstruct the circumstances in which it got stuck.
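Picking the right frame by list index gets awkward when there are several threads. A small helper along the following lines could be pasted into the pyrasite shell to label every stack with its thread name. It is a sketch only: format_all_stacks is a hypothetical name, and it relies on sys._current_frames, which is a CPython implementation detail.

```python
import sys
import threading
import traceback


def format_all_stacks():
    """Return one labelled stack trace per running thread."""
    # Map thread idents to names so each stack can be labelled.
    names = {t.ident: t.name for t in threading.enumerate()}
    chunks = []
    for ident, frame in sys._current_frames().items():
        chunks.append(f"--- thread {names.get(ident, '?')} ({ident}) ---")
        chunks.append("".join(traceback.format_stack(frame)))
    return "\n".join(chunks)


if __name__ == "__main__":
    print(format_all_stacks())
```

Each frame object also exposes f_locals, so the local variables at the point of the hang can be inspected in the same session.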

Solution 7 - Python

You could also try http://code.activestate.com/recipes/576515-debugging-a-running-python-process-by-interrupting/ . It should work as long as the Python process doesn't have signals masked, which is normally the case even if Ctrl-C doesn't work.
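The general idea of interrupting a running process with a signal can be sketched as follows. This is not the recipe's code, just a minimal illustration under stated assumptions: print_stack_on_signal and install_stack_dump are hypothetical names, and SIGUSR1 is assumed to be available (it is not on Windows).

```python
import signal
import sys
import traceback


def print_stack_on_signal(signum, frame):
    """Signal handler: show where the main thread was interrupted."""
    print(f"signal {signum} received; current stack:", file=sys.stderr)
    traceback.print_stack(frame, file=sys.stderr)


def install_stack_dump(signum=None):
    """Register the handler; returns False if no suitable signal exists."""
    if signum is None:
        signum = getattr(signal, "SIGUSR1", None)
    if signum is None:
        return False
    # Must be called from the main thread.
    signal.signal(signum, print_stack_on_signal)
    return True
```

Call install_stack_dump() once at startup, then send the signal with kill -s SIGUSR1 <pid>. Python-level handlers only run between bytecode instructions in the main thread, so this will not fire while the main thread is blocked inside a long-running C call.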

Solution 8 - Python

If your program is a bit too complex to simply trace all the functions, you can try running it and manually attaching a tracer program like lptrace to it. It works a bit like strace: it prints every function call your program makes. Here's how to call it:

python lptrace -p $STUCK_PROGRAM_PID

Note that lptrace requires gdb to run.

Solution 9 - Python

Nothing like the good old pdb

import pdb
pdb.run('my_method()',globals(),locals())

Then just hit (n) to go to the next command or (s) to step into a call. See the docs for the full reference. Follow your program step by step, and you'll probably figure it out fast enough.

Solution 10 - Python

It's easier to prevent these hang-ups than it is to debug them.

First: it is very, very hard to write a for loop that won't terminate. Very hard.

Second: it is relatively easy to get stuck in a while loop.

The first pass is to check every while loop to see if it must be a while loop. Often you can replace while constructs with for, and you'll correct your problem by rethinking your loop.

If you cannot replace a while loop with for, then you simply have to prove that the expression in the while statement must change every time through the loop. This isn't that hard to prove.

  1. Look at the condition in the loop. Call this T.

  2. Look at all the logic branches in the body of the loop. Is there any way to get through the loop without making a change to the condition, T?

    • Yes? That's your bug. That logic path is wrong.

    • No? Excellent, that loop must terminate.
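The check above can be illustrated with a small, made-up example. Both functions are hypothetical: count_even_buggy has one branch that leaves the condition i < len(values) unchanged, and count_even is the same logic rewritten as a for loop.

```python
def count_even_buggy(values):
    """Hangs on the first odd value: 'continue' skips the increment."""
    i = 0
    evens = 0
    while i < len(values):
        if values[i] % 2:
            continue  # bug: i never changes on this branch
        evens += 1
        i += 1
    return evens


def count_even(values):
    """Same logic as a for loop: the iteration itself guarantees progress."""
    evens = 0
    for value in values:
        if value % 2:
            continue
        evens += 1
    return evens


if __name__ == "__main__":
    print(count_even([1, 2, 3, 4]))
```

The buggy version works on input without odd numbers, which is why a bug like this can survive casual testing.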

Solution 11 - Python

Haven't used it myself, but I've heard that the Eric IDE is good and has a good debugger. That's also the only IDE I know of that has a debugger for Python.

Solution 12 - Python

If your program has more than one thread, it could be ignoring ctrl-c because the one thread is wired up to the ctrl-c handler, but the live (runaway?) thread is deaf to it. The GIL (global interpreter lock) in CPython means that normally only one thread can actually be running at any one time. I think I solved my (perhaps) similar problem using this

Solution 13 - Python

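import threading

# Python 2 only: Thread._Thread__stop() is a private method that does not exist in Python 3.
i = 0
for t in threading.enumerate():
    if i != 0:  # and t.getName() != 'Thread-1':
        print t.getName()
        t._Thread__stop()  # force-stop every thread except the first one listed
    i += 1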

Once you know the names of the threads, start re-executing your script and filter them down, excluding threads from being aborted one at a time (as in the commented-out name check). The i != 0 condition prevents the main thread from being aborted.

I suggest going through and naming all your threads; such as: Thread(target=self.log_sequence_status, name='log status')

This code should be placed at the end of the main program that starts up the run-away process

Solution 14 - Python

Wow! It seems you added so much code in one go without testing it that you can't say which code was added just before the program started to hang (the most likely cause of the problem).

Seriously, you should code in small steps and test each one individually (ideally doing TDD).

As for your exact problem of finding what Python code is running when Ctrl-C does not work, I will hazard a guess: did you use a bare except: that catches all exceptions indiscriminately? If you did so in a loop (and the loop continues after handling the exception), that is a very likely reason why Ctrl-C does not work: the KeyboardInterrupt is caught by that except clause. Change it to except Exception: and it should no longer be caught. (There are other possible reasons for Ctrl-C not working, such as thread management as another poster suggested, but I believe the above is more likely.)

> exception KeyboardInterrupt
>
> Raised when the user hits the interrupt key (normally Control-C or Delete). During execution, a check for interrupts is made regularly. Interrupts typed when a built-in function input() or raw_input() is waiting for input also raise this exception. The exception inherits from BaseException so as to not be accidentally caught by code that catches Exception and thus prevent the interpreter from exiting.
>
> Changed in version 2.5: Changed to inherit from BaseException.
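The difference between the two except forms can be seen in a small sketch. All three function names are hypothetical, and press_ctrl_c simply raises KeyboardInterrupt to stand in for the user pressing Ctrl-C.

```python
def press_ctrl_c():
    # Stands in for the user pressing Ctrl-C while the task runs.
    raise KeyboardInterrupt


def swallow_everything(task):
    try:
        task()
    except:  # noqa: E722  (bare except also catches KeyboardInterrupt)
        return "swallowed"
    return "ok"


def swallow_errors_only(task):
    try:
        task()
    except Exception:
        # KeyboardInterrupt derives from BaseException, not Exception,
        # so it is not caught here and propagates to the caller.
        return "swallowed"
    return "ok"


if __name__ == "__main__":
    print(swallow_everything(press_ctrl_c))
```

Inside a loop, the first form turns every Ctrl-C into just another handled error, so the loop carries on.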

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content Type | Original Author | Original Content on Stackoverflow
Question | Johnny | View Question on Stackoverflow
Solution 1 - Python | dhruvbird | View Answer on Stackoverflow
Solution 2 - Python | User | View Answer on Stackoverflow
Solution 3 - Python | dkamins | View Answer on Stackoverflow
Solution 4 - Python | Dave Kirby | View Answer on Stackoverflow
Solution 5 - Python | fivef | View Answer on Stackoverflow
Solution 6 - Python | gerrit | View Answer on Stackoverflow
Solution 7 - Python | Daira Hopwood | View Answer on Stackoverflow
Solution 8 - Python | Karim H | View Answer on Stackoverflow
Solution 9 - Python | Ofri Raviv | View Answer on Stackoverflow
Solution 10 - Python | S.Lott | View Answer on Stackoverflow
Solution 11 - Python | Matti Lyra | View Answer on Stackoverflow
Solution 12 - Python | phlip | View Answer on Stackoverflow
Solution 13 - Python | Justin Catterson | View Answer on Stackoverflow
Solution 14 - Python | kriss | View Answer on Stackoverflow