catching stdout in realtime from subprocess

PythonSubprocessStdout

Python Problem Overview


I want to subprocess.Popen() rsync.exe in Windows, and print the stdout in Python.

My code works, but it doesn't catch the progress until a file transfer is done! I want to print the progress for each file in real time.

Using Python 3.1 now since I heard it should be better at handling IO.

import subprocess, time, os, sys
 
cmd = "rsync.exe -vaz -P source/ dest/"
p, line = True, 'start'
 
 
p = subprocess.Popen(cmd,
                     shell=True,
                     bufsize=64,
                     stdin=subprocess.PIPE,
                     stderr=subprocess.PIPE,
                     stdout=subprocess.PIPE)
 
for line in p.stdout:
    print(">>> " + str(line.rstrip()))
    p.stdout.flush()

Python Solutions


Solution 1 - Python

Some rules of thumb for subprocess.

  • Never use shell=True. It needlessly invokes an extra shell process to call your program.
  • When calling processes, arguments are passed around as lists. sys.argv in python is a list, and so is argv in C. So you pass a list to Popen to call subprocesses, not a string.
  • Don't redirect stderr to a PIPE when you're not reading it.
  • Don't redirect stdin when you're not writing to it.

Example:

import subprocess, time, os, sys
cmd = ["rsync.exe", "-vaz", "-P", "source/" ,"dest/"]

p = subprocess.Popen(cmd,
                     stdout=subprocess.PIPE,
                     stderr=subprocess.STDOUT)

for line in iter(p.stdout.readline, b''):
    print(">>> " + line.rstrip())

That said, it is probable that rsync buffers its output when it detects that it is connected to a pipe instead of a terminal. This is the default behavior - when connected to a pipe, programs must explicitly flush stdout for realtime results, otherwise standard C library will buffer.

To test for that, try running this instead:

cmd = [sys.executable, 'test_out.py']

and create a test_out.py file with the contents:

import sys
import time
print ("Hello")
sys.stdout.flush()
time.sleep(10)
print ("World")

Executing that subprocess should give you "Hello" and wait 10 seconds before giving "World". If that happens with the python code above and not with rsync, that means rsync itself is buffering output, so you are out of luck.

A solution would be to connect direct to a pty, using something like pexpect.

Solution 2 - Python

I know this is an old topic, but there is a solution now. Call the rsync with option --outbuf=L. Example:

cmd=['rsync', '-arzv','--backup','--outbuf=L','source/','dest']
p = subprocess.Popen(cmd,
                     stdout=subprocess.PIPE)
for line in iter(p.stdout.readline, b''):
    print '>>> {}'.format(line.rstrip())

Solution 3 - Python

Depending on the use case, you might also want to disable the buffering in the subprocess itself.

If the subprocess will be a Python process, you could do this before the call:

os.environ["PYTHONUNBUFFERED"] = "1"

Or alternatively pass this in the env argument to Popen.

Otherwise, if you are on Linux/Unix, you can use the stdbuf tool. E.g. like:

cmd = ["stdbuf", "-oL"] + cmd

See also here about stdbuf or other options.

Solution 4 - Python

On Linux, I had the same problem of getting rid of the buffering. I finally used "stdbuf -o0" (or, unbuffer from expect) to get rid of the PIPE buffering.

proc = Popen(['stdbuf', '-o0'] + cmd, stdout=PIPE, stderr=PIPE)
stdout = proc.stdout

I could then use select.select on stdout.

See also https://unix.stackexchange.com/questions/25372/

Solution 5 - Python

for line in p.stdout:
  ...

always blocks until the next line-feed.

For "real-time" behaviour you have to do something like this:

while True:
  inchar = p.stdout.read(1)
  if inchar: #neither empty string nor None
    print(str(inchar), end='') #or end=None to flush immediately
  else:
    print('') #flush for implicit line-buffering
    break

The while-loop is left when the child process closes its stdout or exits. read()/read(-1) would block until the child process closed its stdout or exited.

Solution 6 - Python

Your problem is:

for line in p.stdout:
    print(">>> " + str(line.rstrip()))
    p.stdout.flush()

the iterator itself has extra buffering.

Try doing like this:

while True:
  line = p.stdout.readline()
  if not line:
     break
  print line

Solution 7 - Python

You cannot get stdout to print unbuffered to a pipe (unless you can rewrite the program that prints to stdout), so here is my solution:

Redirect stdout to sterr, which is not buffered. '<cmd> 1>&2' should do it. Open the process as follows: myproc = subprocess.Popen('<cmd> 1>&2', stderr=subprocess.PIPE)
You cannot distinguish from stdout or stderr, but you get all output immediately.

Hope this helps anyone tackling this problem.

Solution 8 - Python

To avoid caching of output you might wanna try pexpect,

child = pexpect.spawn(launchcmd,args,timeout=None)
while True:
    try:
        child.expect('\n')
        print(child.before)
    except pexpect.EOF:
        break

PS : I know this question is pretty old, still providing the solution which worked for me.

PPS: got this answer from another question

Solution 9 - Python

    p = subprocess.Popen(command,
                                bufsize=0,
                                universal_newlines=True)

I am writing a GUI for rsync in python, and have the same probelms. This problem has troubled me for several days until i find this in pyDoc.

> If universal_newlines is True, the file objects stdout and stderr are opened as text files in universal newlines mode. Lines may be terminated by any of '\n', the Unix end-of-line convention, '\r', the old Macintosh convention or '\r\n', the Windows convention. All of these external representations are seen as '\n' by the Python program.

It seems that rsync will output '\r' when translate is going on.

Solution 10 - Python

Change the stdout from the rsync process to be unbuffered.

p = subprocess.Popen(cmd,
                     shell=True,
                     bufsize=0,  # 0=unbuffered, 1=line-buffered, else buffer-size
                     stdin=subprocess.PIPE,
                     stderr=subprocess.PIPE,
                     stdout=subprocess.PIPE)

Solution 11 - Python

I've noticed that there is no mention of using a temporary file as intermediate. The following gets around the buffering issues by outputting to a temporary file and allows you to parse the data coming from rsync without connecting to a pty. I tested the following on a linux box, and the output of rsync tends to differ across platforms, so the regular expressions to parse the output may vary:

import subprocess, time, tempfile, re

pipe_output, file_name = tempfile.TemporaryFile()
cmd = ["rsync", "-vaz", "-P", "/src/" ,"/dest"]

p = subprocess.Popen(cmd, stdout=pipe_output, 
                     stderr=subprocess.STDOUT)
while p.poll() is None:
    # p.poll() returns None while the program is still running
    # sleep for 1 second
    time.sleep(1)
    last_line =  open(file_name).readlines()
    # it's possible that it hasn't output yet, so continue
    if len(last_line) == 0: continue
    last_line = last_line[-1]
    # Matching to "[bytes downloaded]  number%  [speed] number:number:number"
    match_it = re.match(".* ([0-9]*)%.* ([0-9]*:[0-9]*:[0-9]*).*", last_line)
    if not match_it: continue
    # in this case, the percentage is stored in match_it.group(1), 
    # time in match_it.group(2).  We could do something with it here...

Solution 12 - Python

if you run something like this in a thread and save the ffmpeg_time property in a property of a method so you can access it, it would work very nice I get outputs like this: output be like if you use threading in tkinter

input = 'path/input_file.mp4'
output = 'path/input_file.mp4'
command = "ffmpeg -y -v quiet -stats -i \"" + str(input) + "\" -metadata title=\"@alaa_sanatisharif\" -preset ultrafast -vcodec copy -r 50 -vsync 1 -async 1 \"" + output + "\""
process = subprocess.Popen(command, stdout=subprocess.PIPE, stderr=subprocess.STDOUT, universal_newlines=True, shell=True)
for line in self.process.stdout:
    reg = re.search('\d\d:\d\d:\d\d', line)
    ffmpeg_time = reg.group(0) if reg else ''
    print(ffmpeg_time)

Solution 13 - Python

In Python 3, here's a solution, which takes a command off the command line and delivers real-time nicely decoded strings as they are received.

Receiver (receiver.py):

import subprocess
import sys

cmd = sys.argv[1:]
p = subprocess.Popen(cmd, stdout=subprocess.PIPE)
for line in p.stdout:
    print("received: {}".format(line.rstrip().decode("utf-8")))

Example simple program that could generate real-time output (dummy_out.py):

import time
import sys

for i in range(5):
	print("hello {}".format(i))
	sys.stdout.flush()	
	time.sleep(1)

Output:

$python receiver.py python dummy_out.py
received: hello 0
received: hello 1
received: hello 2
received: hello 3
received: hello 4

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content TypeOriginal AuthorOriginal Content on Stackoverflow
QuestionJohn AView Question on Stackoverflow
Solution 1 - PythonnoskloView Answer on Stackoverflow
Solution 2 - PythonElvinView Answer on Stackoverflow
Solution 3 - PythonAlbertView Answer on Stackoverflow
Solution 4 - PythonLingView Answer on Stackoverflow
Solution 5 - PythonIBueView Answer on Stackoverflow
Solution 6 - PythonzviadmView Answer on Stackoverflow
Solution 7 - PythonErikView Answer on Stackoverflow
Solution 8 - PythonNithinView Answer on Stackoverflow
Solution 9 - PythonxmcView Answer on Stackoverflow
Solution 10 - PythonWillView Answer on Stackoverflow
Solution 11 - PythonMikeGMView Answer on Stackoverflow
Solution 12 - PythonerfanView Answer on Stackoverflow
Solution 13 - PythonwatsonicView Answer on Stackoverflow