How does fread really work?


C Problem Overview


The declaration of fread is as follows:

size_t fread(void *ptr, size_t size, size_t nmemb, FILE *stream);

The question is: is there a difference in read performance between these two calls to fread?

char a[1000];
  1. fread(a, 1, 1000, stdin);
  2. fread(a, 1000, 1, stdin);

Will it read 1000 bytes at once each time?

C Solutions


Solution 1 - C

There may or may not be any difference in performance. There is a difference in semantics.

fread(a, 1, 1000, stdin);

attempts to read 1000 data elements, each of which is 1 byte long.

fread(a, 1000, 1, stdin);

attempts to read 1 data element which is 1000 bytes long.

They're different because fread() returns the number of data elements it was able to read, not the number of bytes. If it reaches end-of-file (or an error condition) before reading the full 1000 bytes, the first version has to indicate exactly how many bytes it read; the second just fails and returns 0.

In practice, it's probably just going to call a lower-level function that attempts to read 1000 bytes and indicates how many bytes it actually read. For larger reads, it might make multiple lower-level calls. The computation of the value to be returned by fread() is different, but the expense of the calculation is trivial.

There may be a difference if the implementation can tell, before attempting to read the data, that there isn't enough data to read. For example, if you're reading from a 900-byte file, the first version will read all 900 bytes and return 900, while the second might not bother to read anything. In both cases, the file position indicator is advanced by the number of characters successfully read, i.e., 900.

But in general, you should probably choose how to call it based on what information you need from it. Read a single data element if a partial read is no better than not reading anything at all. Read in smaller chunks if partial reads are useful.
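The difference in return values can be checked with a small experiment. The sketch below is only an illustration: the helper name fread_result and the use of tmpfile() as a stand-in for stdin are assumptions, not part of the original answer.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: writes file_len bytes to a temporary file, rewinds it,
   and returns whatever fread(dst, size, nmemb, f) reports.
   Returns (size_t)-1 if the setup fails or the request does not fit. */
size_t fread_result(size_t file_len, size_t size, size_t nmemb)
{
    char src[1000];
    char dst[1000];
    size_t result;
    FILE *f;

    if (file_len > sizeof src || size * nmemb > sizeof dst)
        return (size_t)-1;

    f = tmpfile();
    if (f == NULL)
        return (size_t)-1;

    memset(src, 'x', sizeof src);
    if (fwrite(src, 1, file_len, f) != file_len) {
        fclose(f);
        return (size_t)-1;
    }
    rewind(f);

    /* The count returned is in elements, not bytes. */
    result = fread(dst, size, nmemb, f);
    fclose(f);
    return result;
}
```

With a 900-byte file, fread_result(900, 1, 1000) should report 900 elements, while fread_result(900, 1000, 1) should report 0, because no complete 1000-byte element is available.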

Solution 2 - C

According to the specification, the two may be treated differently by the implementation.

If your file is shorter than 1000 bytes, fread(a, 1, 1000, stdin) (read 1000 elements of 1 byte each) will still copy all the bytes until EOF. On the other hand, the result of fread(a, 1000, 1, stdin) (read one 1000-byte element) stored in a is indeterminate, because there is not enough data to finish reading the 'first' (and only) 1000-byte element.

Of course, some implementations may still copy the 'partial' element into as many bytes as needed.

Solution 3 - C

That would be an implementation detail. In glibc, the two are identical in performance, as fread is implemented basically as (ref. http://sourceware.org/git/?p=glibc.git;a=blob;f=libio/iofread.c):

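/* Simplified sketch: the real code goes through the stream's buffer and locks the stream. */
size_t fread (void* buf, size_t size, size_t count, FILE* f)
{
    size_t bytes_requested = size * count;
    if (bytes_requested == 0)   /* also avoids dividing by a zero size below */
        return 0;
    size_t bytes_read = read(f->fd, buf, bytes_requested);  /* stand-in for the buffered read */
    return bytes_read / size;   /* whole elements only; a partial element is not counted */
}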

Note that neither the C nor the POSIX standard guarantees that a complete object of size size is read every time. If a complete object cannot be read (e.g. stdin only has 999 bytes but you've requested size == 1000), the value of the partially read element is indeterminate (C99 §7.19.8.1/2).

Edit: See the other answers about POSIX.
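When a full object may not be available, one option is to request bytes and then ask the stream why the read came up short. The sketch below assumes hypothetical names (read_status, read_block); feof and ferror are the standard calls for telling end-of-file from an error.

```c
#include <stdio.h>

enum read_status { READ_FULL, READ_SHORT_EOF, READ_ERROR };

/* Hypothetical helper: reads up to cap bytes into buf, stores the byte
   count in *got, and classifies the outcome. */
enum read_status read_block(FILE *f, void *buf, size_t cap, size_t *got)
{
    *got = fread(buf, 1, cap, f);
    if (*got == cap)
        return READ_FULL;
    if (ferror(f))
        return READ_ERROR;
    /* A short count without an error means end-of-file was reached. */
    return READ_SHORT_EOF;
}
```

For a stream holding 999 bytes and cap == 1000, this should return READ_SHORT_EOF with *got set to 999, so the caller knows exactly how much of the buffer is valid.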

Solution 4 - C

In Minix, fread calls getc internally. The number of getc calls is simply size*nmemb, so it depends only on the product of the two arguments. Both fread(a, 1, 1000, stdin) and fread(a, 1000, 1, stdin) will therefore call getc 1000 (= 1000*1) times. Here is the simple implementation of fread from Minix:

size_t fread(void *ptr, size_t size, size_t nmemb, register FILE *stream)
{
    register char *cp = ptr;
    register int c;
    size_t ndone = 0;
    register size_t s;

    if (size)
        while (ndone < nmemb) {
            s = size;
            do {
                if ((c = getc(stream)) != EOF)
                    *cp++ = c;
                else
                    return ndone;
            } while (--s);
            ndone++;
        }

    return ndone;
}
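To make the call count observable, the same loop shape can be instrumented with a counter. This is a sketch for illustration only; counting_fread and getc_calls are hypothetical names, not part of Minix.

```c
#include <stdio.h>

size_t getc_calls;   /* number of getc calls made so far (hypothetical counter) */

/* Same loop shape as the Minix code above, with a call counter added. */
size_t counting_fread(void *ptr, size_t size, size_t nmemb, FILE *stream)
{
    char *cp = ptr;
    size_t ndone = 0;

    if (size == 0)
        return 0;

    while (ndone < nmemb) {
        size_t s = size;
        do {
            int c = getc(stream);
            getc_calls++;
            if (c == EOF)
                return ndone;      /* a partial element is not counted */
            *cp++ = (char)c;
        } while (--s);
        ndone++;
    }
    return ndone;
}
```

On a stream with at least 1000 bytes available, both counting_fread(a, 1, 1000, f) and counting_fread(a, 1000, 1, f) should leave getc_calls at 1000; only the return value (1000 versus 1) differs.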

Solution 5 - C

There may be no performance difference, but those calls are not the same.

  • fread returns the number of elements read, so those calls will return different values.
  • If an element cannot be completely read, its value is indeterminate:

> If an error occurs, the resulting value of the file position indicator for the stream is indeterminate. If a partial element is read, its value is indeterminate. (ISO/IEC 9899:TC2 7.19.8.1)

There's not much difference in the glibc implementation, which just multiplies the element size by the number of elements to determine how many bytes to read and divides the amount read by the member size in the end. But the version specifying an element size of 1 will always tell you the correct number of bytes read. However, if you only care about completely read elements of a certain size, using the other form saves you from doing a division.
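The point about saving a division can be sketched with fixed-size records. The struct record type and both helper names below are assumptions made up for this illustration.

```c
#include <stdio.h>

struct record {          /* hypothetical fixed-size record */
    int id;
    double value;
};

/* Asks for whole records: the return value is already a record count. */
size_t read_records(FILE *f, struct record *out, size_t max)
{
    return fread(out, sizeof *out, max, f);
}

/* Same request expressed in bytes: the caller divides to get a record
   count, which discards any trailing partial record. */
size_t read_records_bytewise(FILE *f, struct record *out, size_t max)
{
    size_t bytes = fread(out, 1, max * sizeof *out, f);
    return bytes / sizeof *out;
}
```

If the stream holds two and a half records, both helpers should report 2; the bytewise form additionally has the exact byte count available before the division, should the caller need it.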

Solution 6 - C

One more passage from http://pubs.opengroup.org/onlinepubs/000095399/functions/fread.html is notable:

The fread() function shall read into the array pointed to by ptr up to nitems elements whose size is specified by size in bytes, from the stream pointed to by stream. For each object, size calls shall be made to the fgetc() function and the results stored, in the order read, in an array of unsigned char exactly overlaying the object.

In short, in both cases the data is accessed through fgetc().

Solution 7 - C

I wanted to clarify the answers here. fread performs buffered IO. The actual read block sizes fread uses are determined by the C implementation being used.

All modern C libraries will have the same performance with the two calls:

fread(a, 1, 1000, file);
fread(a, 1000, 1, file);

Even something like:

for (int i = 0; i < 1000; i++)
  a[i] = fgetc(file);

should result in the same disk access patterns, although the fgetc loop would be slower because it makes more calls into the standard C library and, in some cases, may require the disk to perform additional seeks that would otherwise have been optimized away.

Getting back to the difference between the two forms of fread: the former returns the actual number of bytes read. The latter returns 0 if fewer than 1000 bytes are available, otherwise it returns 1. In practice, the buffer is filled with the same data in both cases, i.e. the contents of the file up to 1000 bytes, although the standard leaves the value of a partial element indeterminate.

In general, you probably want to keep the 2nd parameter (size) set to 1 such that you get the number of bytes read.
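With size set to 1, the return value can be accumulated directly as a byte count. The following is a minimal sketch under that assumption; count_bytes is a hypothetical helper name.

```c
#include <stdio.h>

/* Hypothetical helper: counts the bytes remaining in f, reading up to
   1000 at a time. Returns -1 if a read error occurred. */
long count_bytes(FILE *f)
{
    char a[1000];
    long total = 0;
    size_t n;

    /* With size == 1, the element count fread returns is a byte count. */
    while ((n = fread(a, 1, sizeof a, f)) > 0)
        total += (long)n;

    return ferror(f) ? -1 : total;
}
```

Had the loop used fread(a, sizeof a, 1, f) instead, a final chunk shorter than 1000 bytes would be reported as 0 elements and its length would be lost.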

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content Type | Original Author | Original Content on Stackoverflow
Question | Roman Byshko | View Question on Stackoverflow
Solution 1 - C | Keith Thompson | View Answer on Stackoverflow
Solution 2 - C | ArjunShankar | View Answer on Stackoverflow
Solution 3 - C | kennytm | View Answer on Stackoverflow
Solution 4 - C | Neel Basu | View Answer on Stackoverflow
Solution 5 - C | Artefacto | View Answer on Stackoverflow
Solution 6 - C | Jeegar Patel | View Answer on Stackoverflow
Solution 7 - C | Clarus | View Answer on Stackoverflow