Reading from a frequently updated file

Tags: Python, File IO, Generator, Fopen

Python Problem Overview


I'm currently writing a program in Python on a Linux system. The objective is to read a log file and execute a bash command upon finding a particular string. The log file is being constantly written to by another program.

My question: if I open the file using the open() function, will my Python file object be updated as the actual file gets written to by the other program, or will I have to reopen the file at timed intervals?

UPDATE: Thanks for the answers so far. I should perhaps have mentioned that the file is being written to by a Java EE app, so I have no control over when data gets written to it. I currently have a program that reopens the file every 10 seconds and tries to read from the byte position in the file that it last read up to. For the moment it just prints out the string that's returned. I was hoping that the file would not need to be reopened, and that the read command would somehow have access to the data written to the file by the Java app.

#!/usr/bin/env python3
import time

fileBytePos = 0
while True:
    # Reopen the log, jump to where the previous pass stopped, read the rest
    with open('./server.log', 'r') as inFile:
        inFile.seek(fileBytePos)
        data = inFile.read()
        print(data)
        fileBytePos = inFile.tell()
        print(fileBytePos)
    time.sleep(10)

Thanks for the tips on pyinotify and generators. I'm going to have a look at these for a nicer solution.
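The question's end goal, running a command whenever a particular string shows up in the log, is independent of how new lines are obtained. A minimal sketch, assuming a hypothetical helper run_on_match() that accepts any iterable of lines (for example one of the follow() generators in the answers below) and an argument list for subprocess.run; a bash one-liner would be passed as ["bash", "-c", "..."].

```python
import subprocess
import sys


def run_on_match(lines, needle, command):
    """Run `command` once for every line that contains `needle`.

    Hypothetical helper: `lines` is any iterable of strings and `command`
    is an argument list for subprocess.run, e.g. ["bash", "-c", "..."].
    Returns the exit codes of the commands that were started.
    """
    exit_codes = []
    for line in lines:
        if needle in line:
            # check=False: a failing command should not stop the log watcher
            result = subprocess.run(command, check=False)
            exit_codes.append(result.returncode)
    return exit_codes


if __name__ == "__main__":
    sample = ["INFO started\n", "ERROR OutOfMemory\n", "INFO done\n"]
    # sys.executable keeps the demo portable; swap in your own bash command
    codes = run_on_match(sample, "OutOfMemory",
                         [sys.executable, "-c", "print('matched')"])
    print(codes)
```

Because the helper only consumes an iterable, the same matching logic works whether the lines come from a reopen-and-seek loop like the one above or from a generator that keeps the file open.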

Python Solutions


Solution 1 - Python

I would recommend looking at David Beazley's Generator Tricks for Systems Programmers, especially Part 5: Processing Infinite Data. It shows how to write the Python equivalent of a tail -f logfile command that reports new lines in real time.

# follow.py
#
# Follow a file like tail -f.

import os
import time

def follow(thefile):
    # Start at the end of the file so only newly appended lines are reported
    thefile.seek(0, os.SEEK_END)
    while True:
        line = thefile.readline()
        if not line:
            # Nothing new yet: wait briefly instead of spinning the CPU
            time.sleep(0.1)
            continue
        yield line

if __name__ == '__main__':
    with open("run/foo/access-log", "r") as logfile:
        for line in follow(logfile):
            print(line, end='')
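The follow() generator above never returns, and if the writer has only written part of a line, readline() hands that fragment over as if it were a full line. A sketch of a variant with two hypothetical additions, poll_interval and idle_timeout (the latter so the generator can stop, which also makes it testable), that buffers partial lines until their newline arrives, chained with a small grep() filter in the same generator-pipeline style:

```python
import os
import time


def follow(thefile, poll_interval=0.1, idle_timeout=None):
    """Yield complete lines appended to `thefile`, like tail -f.

    Partial lines (no trailing newline yet) are buffered until the rest
    arrives. If `idle_timeout` is set (hypothetical parameter), stop after
    that many seconds without new data; None means follow forever.
    """
    thefile.seek(0, os.SEEK_END)  # skip what is already in the file
    pending = ""
    last_data = time.monotonic()
    while True:
        chunk = thefile.readline()
        if chunk:
            last_data = time.monotonic()
            pending += chunk
            if pending.endswith("\n"):
                yield pending
                pending = ""
            continue
        if idle_timeout is not None and time.monotonic() - last_data >= idle_timeout:
            if pending:
                yield pending  # hand over an unterminated last line before stopping
            return
        time.sleep(poll_interval)


def grep(pattern, lines):
    """Pass through only the lines that contain `pattern`."""
    for line in lines:
        if pattern in line:
            yield line
```

Usage is the same as before, for example `for line in grep("ERROR", follow(logfile)): ...`; leaving idle_timeout at None keeps the tail -f behaviour.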

Solution 2 - Python

"An interactive session is worth 1000 words"

>>> f1 = open("bla.txt", "wt")
>>> f2 = open("bla.txt", "rt")
>>> f1.write("bleh")
4
>>> f2.read()
''
>>> f1.flush()
>>> f2.read()
'bleh'
>>> f1.write("blargh")
6
>>> f1.flush()
>>> f2.read()
'blargh'

In other words, yes: a single open() call is enough, as long as the writer flushes its data to the file.
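The session shows that the reader only sees what the writer has flushed. When you control the writing side (not the case for the Java app in the question), opening it with buffering=1 selects line buffering in text mode, so each newline triggers a flush. A minimal check, using a hypothetical demo() helper and a temporary file:

```python
import os
import tempfile


def demo(buffering):
    """Write one line without calling flush() and return what a second,
    independent reader of the same file sees at that moment."""
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "bla.txt")
        with open(path, "w", buffering=buffering) as writer, open(path, "r") as reader:
            writer.write("bleh\n")
            return reader.read()


# Default buffering (-1): the line is still sitting in the writer's buffer
print(repr(demo(-1)))  # ''
# buffering=1 (line buffering, text mode only): flushed at every newline
print(repr(demo(1)))   # 'bleh\n'
```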

Solution 3 - Python

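Here is a slightly modified version of Jeff Bauer's answer that survives the log file being replaced: when the path starts pointing to a different file (a different inode), it reopens the path. This is very useful if your file is being processed by logrotate.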

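import os
import time

def follow(name):
    current = open(name, "r")
    curino = os.fstat(current.fileno()).st_ino
    while True:
        # Drain everything currently available in the open file
        while True:
            line = current.readline()
            if not line:
                break
            yield line

        # If the path now refers to a different inode (e.g. logrotate moved
        # the old file away and created a new one), switch to the new file
        try:
            if os.stat(name).st_ino != curino:
                new = open(name, "r")
                current.close()
                current = new
                curino = os.fstat(current.fileno()).st_ino
                continue
        except OSError:
            # The new file may not exist yet; try again after sleeping
            pass
        time.sleep(1)


if __name__ == '__main__':
    fname = "test.log"
    for l in follow(fname):
        print("LINE: {}".format(l), end="")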
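Checking only the inode covers logrotate's "create" style, where the old file is renamed and a new one appears, but not "copytruncate", where the same file is truncated in place and keeps its inode. A minimal sketch of a hypothetical helper check_rotation() that handles both. It assumes an ASCII or UTF-8 log opened in text mode, so that tell() at a line boundary matches the byte size reported by os.stat(), and it cannot notice a truncated file that has already grown past the old read position.

```python
import os


def check_rotation(name, current):
    """Return a file object positioned to keep reading the log at `name`.

    * "create" rotation: the inode behind the path changed, so reopen it.
    * "copytruncate" rotation: same inode, but the file is now shorter
      than our read position, so rewind to the start.
    """
    try:
        st = os.stat(name)
    except FileNotFoundError:
        return current  # new file not created yet; keep the old handle for now
    if st.st_ino != os.fstat(current.fileno()).st_ino:
        current.close()
        return open(name, "r")
    # tell() on a text file is an opaque cookie; for UTF-8 at a line
    # boundary it equals the byte offset, which this comparison assumes
    if st.st_size < current.tell():
        current.seek(0)
    return current
```

In the follow() loop above, a call such as `current = check_rotation(name, current)` would take the place of the inode comparison.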

Solution 4 - Python

Since you're targeting a Linux system, you can use pyinotify to be notified when the file changes.

There's also this trick, which may work fine for you. It uses file.seek to do what tail -f does.

Solution 5 - Python

I am no expert here, but I think you will have to use some kind of observer pattern to passively watch the file and then fire off an event that reopens the file when a change occurs. As for how to actually implement this, I have no idea.

I do not think that open() will open the file in real time as you suggest.

Solution 6 - Python

If you have the code reading the file running in a while loop:

import time
f = open('/tmp/workfile', 'r')
while True:
    line = f.readline()
    if not line:
        time.sleep(0.1)  # at end of file: pause instead of busy-looping
        continue
    if "ONE" in line:
        print("Got it")

and another program appends to that same file, you will get the print as soon as "ONE" is appended (and flushed) to the file. You can then take whatever action you want. In short, you don't have to reopen the file at regular intervals.

>>> f = open('/tmp/workfile', 'a')
>>> f.write("One\n")
4
>>> f.close()
>>> f = open('/tmp/workfile', 'a')
>>> f.write("ONE\n")
4
>>> f.close()

Solution 7 - Python

I have a similar use case, and I have written the following snippet for it. It is not the most efficient approach (it rereads the whole file every period), but it gets the job done and is easy to understand.

import time

def reading_log_files(filename):
    # Read the whole file and return its lines without trailing newlines
    with open(filename, "r") as f:
        data = f.read().splitlines()
    return data


def log_generator(filename, period=1):
    data = reading_log_files(filename)
    while True:
        time.sleep(period)
        new_data = reading_log_files(filename)
        # Yield only the lines that appeared since the previous read
        yield new_data[len(data):]
        data = new_data


if __name__ == '__main__':
    x = log_generator("/path/to/log/file.log")
    for lines in x:
        print(lines)
        # lines will be a list of new lines added at the end

Hope you find this useful

Solution 8 - Python

It depends on what exactly you want to do with the file. There are two possible use cases:

  1. Reading appended contents from a continuously updated file, such as a log file.
  2. Reading contents from a file that is overwritten continuously (such as the network statistics files on *nix systems).

Since other answers already cover scenario #1 in detail, I would like to help those who need scenario #2. Basically, you need to reset the file position to 0 with seek(0) (or to whichever position you want to read from) before each subsequent call to read().

Your code could look something like the function below.

import time

def generate_network_statistics(iface='wlan0'):
    stats_dir = '/sys/class/net/' + iface + '/statistics/'
    with open(stats_dir + 'rx_bytes', 'r') as rx, \
         open(stats_dir + 'tx_bytes', 'r') as tx, \
         open('/proc/uptime', 'r') as uptime:
        while True:
            # Each read returns the current value; seek(0) rewinds for the next pass
            receive = int(rx.read())
            rx.seek(0)
            transmit = int(tx.read())
            tx.seek(0)
            # /proc/uptime holds two floats, e.g. "350735.47 234388.90"
            uptime_seconds = float(uptime.read().split()[0])
            uptime.seek(0)
            print("Receive: %i, Transmit: %i, Uptime: %.0f s" % (receive, transmit, uptime_seconds))
            time.sleep(1)
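The rewind-then-read pattern from this answer can be factored into a reusable generator for any file that is rewritten in place, not only sysfs and procfs entries. A minimal sketch with a hypothetical poll() helper; count is an added parameter so the loop can end (None keeps polling forever, like the original):

```python
import time


def poll(f, interval=1.0, count=None):
    """Yield the full current contents of `f` every `interval` seconds.

    seek(0) before each read() makes every pass start at the beginning,
    which suits files that are overwritten rather than appended to.
    `count` limits the number of reads (None = forever).
    """
    n = 0
    while count is None or n < count:
        f.seek(0)
        yield f.read()
        n += 1
        if count is None or n < count:
            time.sleep(interval)
```

For the network counters this would read as `for value in poll(rx): receive = int(value)`, keeping the parsing separate from the polling.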

Solution 9 - Python

Keep the file handle open even when reading returns an empty string at the end of the file, and try reading again after sleeping for a while.

import time

syslog = '/var/log/syslog'
sleep_time_in_seconds = 1

try:
    with open(syslog, 'r', errors='ignore') as f:
        while True:
            # Iteration stops at the current end of file; after sleeping,
            # the next for loop resumes from the same position
            for line in f:
                print(line.strip())
                # do whatever you want to do on the line
            time.sleep(sleep_time_in_seconds)
except OSError as e:
    print('Cannot open the file {}. Error: {}'.format(syslog, e))

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content type: original author
Question: JimS
Solution 1 - Python: Jeff Bauer
Solution 2 - Python: jsbueno
Solution 3 - Python: Andrew Druchenko
Solution 4 - Python: nmichaels
Solution 5 - Python: Adam Pointer
Solution 6 - Python: w00t
Solution 7 - Python: noob_coder
Solution 8 - Python: Dheeraj Pb
Solution 9 - Python: Nasimuddin Ansari