What is the use of buffering in python's built-in open() function?

Python 2.7

Python Problem Overview


Python Documentation : https://docs.python.org/2/library/functions.html#open

open(name[, mode[, buffering]])  

The above documentation says "The optional buffering argument specifies the file’s desired buffer size: 0 means unbuffered, 1 means line buffered, any other positive value means use a buffer of (approximately) that size (in bytes). A negative buffering means to use the system default. If omitted, the system default is used."
When I use

filedata = open("file.txt", "r", 0)  

or

filedata = open("file.txt", "r", 1)  

or

filedata = open("file.txt", "r", 2)

or

filedata = open("file.txt", "r", -1) 

or

filedata = open("file.txt", "r")

The output does not change: each call shown above prints the file contents at the same speed.
output:

> Mr. Bean is a British television programme series of fifteen 25-minute episodes written by Robin Driscoll and starring Rowan Atkinson as the title character. Different episodes were also written by Robin Driscoll and Richard Curtis, and one by Ben Elton. Thirteen of the episodes were broadcast on ITV, from the pilot on 1 January 1990, until "Goodnight Mr. Bean" on 31 October 1995. A clip show, "The Best Bits of Mr. Bean", was broadcast on 15 December 1995, and one episode, "Hair by Mr. Bean of London", was not broadcast until 2006 on Nickelodeon.

Then how is the buffering parameter in the open() function useful? What value of that buffering parameter is best to use?

Python Solutions


Solution 1 - Python

Enabling buffering means that you're not directly interfacing with the OS's representation of a file, or its file system API. Instead, a chunk of data is read from the raw OS file stream into a buffer until it is consumed, at which point more data is fetched into the buffer. In terms of objects, you get a BufferedIOBase object wrapping an underlying RawIOBase (which represents the raw file stream).

What is the benefit of this? Well, interfacing with the raw stream might have high latency, because the operating system has to fool around with physical objects like the hard disk, and this may not be acceptable in all cases. Let's say you want to read three letters from a file every 5ms and your file is on a crusty old hard disk, or even a network file system. Instead of trying to read from the raw file stream every 5ms, it is better to load a bunch of bytes from the file into a buffer in memory, then consume it at will.

The buffer size you choose will depend on how you're consuming the data. For the example above, a buffer size of 1 char would be awful, 3 chars would be alright, and any large multiple of 3 chars that doesn't cause a noticeable delay for your users would be ideal.
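As a concrete illustration of the layering described above, here is a minimal sketch (Python 3 shown; the file path, contents, and buffer size are made up for the example) that opens a file with an explicit buffer size and reads a few characters. The wrapped BufferedReader is visible via the text wrapper's buffer attribute:

```python
import os
import tempfile

# Create a sample file so the sketch is self-contained
# (path and contents are arbitrary, for illustration only).
path = os.path.join(tempfile.mkdtemp(), "sample.txt")
with open(path, "w") as f:
    f.write("abcdef" * 100)

# Open with an explicit buffer size. A large multiple of the
# 3-char read means most read() calls are served from memory.
with open(path, "r", buffering=3 * 1024) as f:
    buffer_type = type(f.buffer).__name__  # the wrapped buffered object
    chunk = f.read(3)                      # comes out of the buffer

print(buffer_type)  # BufferedReader
print(chunk)        # abc
```

Each read(3) after the first is satisfied from the in-memory buffer rather than by going back to the operating system.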

Solution 2 - Python

You can also check the default buffer size by reading the read-only DEFAULT_BUFFER_SIZE attribute of the io module.

import io
print(io.DEFAULT_BUFFER_SIZE)

As described here

Solution 3 - Python

Buffering is the process of holding a chunk of a file in temporary memory so that I/O happens in larger pieces rather than byte by byte. In Python, different values can be given: if buffering is set to 0, buffering is off; setting it to 1 enables line buffering; any larger positive value requests a buffer of roughly that many bytes.
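These values can be checked directly. A small sketch (Python 3, where buffering=0 is only allowed in binary mode; Python 2 accepted the same integers more loosely), using a throwaway file path made up for the example:

```python
import io
import os
import tempfile

# A throwaway binary file to open with different buffering values.
path = os.path.join(tempfile.mkdtemp(), "demo.bin")
with open(path, "wb") as f:
    f.write(b"data")

raw = open(path, "rb", buffering=0)        # 0: unbuffered raw stream
buffered = open(path, "rb", buffering=-1)  # -1: system default buffer

is_raw = isinstance(raw, io.RawIOBase)              # no buffer layer
is_buffered = isinstance(buffered, io.BufferedIOBase)

print(is_raw, is_buffered)  # True True
raw.close()
buffered.close()
```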

Solution 4 - Python

What is perhaps important from a practical point of view is that the buffering parameter determines when the data you are sending to the stream is actually saved to disk.

When you open a file without the buffering parameter and write some data to it, you will see the data is written only after the with open(...) as foo: block is exited (or when the file's close() method is called), or when some system-determined default buffer size is reached. But if you set the buffering parameter, the data is written out as soon as the buffer fills.

Thus, using e.g. open('file.txt', 'w', buffering=1) is a useful thing to do when you have a long-running application that is sending data to a file, and you want it saved after each line rather than only when the application quits. Otherwise a crash, a power outage, etc. could cause the data to be lost.
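A minimal sketch of this behaviour (Python 3; the temporary file path is made up for the example): with buffering=1, a completed line is flushed as soon as the newline is written, so a separate reader sees it before the writer closes the file:

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "log.txt")

# Line-buffered writer: each newline triggers a flush to the OS.
writer = open(path, "w", buffering=1)
writer.write("first line\n")  # flushed immediately on the newline

# A separate reader already sees the line, before writer.close().
with open(path) as reader:
    contents = reader.read()

print(repr(contents))  # 'first line\n'
writer.close()
```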

See also: https://stackoverflow.com/questions/3167494/how-often-does-python-flush-to-a-file

Solution 5 - Python

With buffering set to -1, my file write took 13 minutes. With buffering set to 2**10, the same write took 7 seconds. So the purpose of buffering is to speed up your program.
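A rough benchmark along those lines can be sketched as follows (the loop count and file path are illustrative, absolute timings depend entirely on the machine, and in Python 3 buffering=0 requires binary mode):

```python
import os
import tempfile
import time

path = os.path.join(tempfile.mkdtemp(), "bench.bin")

def timed_write(buffering):
    """Write 10,000 single bytes with the given buffering setting."""
    start = time.time()
    with open(path, "wb", buffering=buffering) as f:
        for _ in range(10000):
            f.write(b"x")  # one OS-level write per call when unbuffered
    return time.time() - start

unbuffered = timed_write(0)      # every byte goes straight to the OS
buffered = timed_write(2 ** 10)  # writes coalesce into 1 KiB chunks
print("unbuffered: %.4fs, buffered: %.4fs" % (unbuffered, buffered))
```

On spinning disks or network file systems the gap between the two timings grows dramatically, which is where differences like 13 minutes versus 7 seconds come from.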

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

| Content Type | Original Author | Original Content on Stackoverflow |
| --- | --- | --- |
| Question | Srivishnu | View Question on Stackoverflow |
| Solution 1 - Python | Asad Saeeduddin | View Answer on Stackoverflow |
| Solution 2 - Python | N Randhawa | View Answer on Stackoverflow |
| Solution 3 - Python | joel.t.mathew | View Answer on Stackoverflow |
| Solution 4 - Python | Simimic | View Answer on Stackoverflow |
| Solution 5 - Python | John Abraham | View Answer on Stackoverflow |