Best way to read a large file into a byte array in C#?

Tags: C#, .Net, Byte array, Binary Data

C# Problem Overview


I have a web server which will read large binary files (several megabytes) into byte arrays. The server could be reading several files at the same time (for different page requests), so I am looking for the most optimized way to do this without taxing the CPU too much. Is the code below good enough?

public byte[] FileToByteArray(string fileName)
{
    byte[] buff = null;
    FileStream fs = new FileStream(fileName, 
                                   FileMode.Open, 
                                   FileAccess.Read);
    BinaryReader br = new BinaryReader(fs);
    long numBytes = new FileInfo(fileName).Length;
    buff = br.ReadBytes((int) numBytes);
    return buff;
}

C# Solutions


Solution 1 - C#

Simply replace the whole thing with:

return File.ReadAllBytes(fileName);

However, if you are concerned about memory consumption, you should not read the whole file into memory at once. You should read it in chunks.
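
As a sketch of what chunked processing can look like, the file is read a fixed-size buffer at a time below; the process callback is a hypothetical stand-in for whatever consumes the data:

using System;
using System.IO;

public static void ReadInChunks(string fileName, Action<byte[], int> process)
{
    byte[] buffer = new byte[81920]; // stays under the 85 KB large-object threshold
    using (FileStream fs = File.OpenRead(fileName))
    {
        int bytesRead;
        while ((bytesRead = fs.Read(buffer, 0, buffer.Length)) > 0)
        {
            process(buffer, bytesRead); // only the first bytesRead bytes are valid
        }
    }
}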

Solution 2 - C#

I might argue that the answer here generally is "don't". Unless you absolutely need all the data at once, consider using a Stream-based API (or some variant of reader / iterator). That is especially important when you have multiple parallel operations (as suggested by the question) to minimise system load and maximise throughput.

For example, if you are streaming data to a caller:

Stream dest = ...
using(Stream source = File.OpenRead(path)) {
    byte[] buffer = new byte[2048];
    int bytesRead;
    while((bytesRead = source.Read(buffer, 0, buffer.Length)) > 0) {
        dest.Write(buffer, 0, bytesRead);
    }
}
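
On .NET 4 and later, the same copy loop can be written with the built-in Stream.CopyTo, which performs the buffered copy internally:

using(Stream source = File.OpenRead(path)) {
    source.CopyTo(dest); // copies in chunks; the whole file is never in memory at once
}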

Solution 3 - C#

I would simply use this:

byte[] file = System.IO.File.ReadAllBytes(fileName);

Solution 4 - C#

Your code can be factored to this (in lieu of File.ReadAllBytes):

public byte[] ReadAllBytes(string fileName)
{
    byte[] buffer = null;
    using (FileStream fs = new FileStream(fileName, FileMode.Open, FileAccess.Read))
    {
        buffer = new byte[fs.Length];
        int offset = 0;
        // Read is not guaranteed to fill the buffer in one call, so loop until done
        while (offset < buffer.Length)
        {
            int read = fs.Read(buffer, offset, buffer.Length - offset);
            if (read == 0)
                throw new EndOfStreamException();
            offset += read;
        }
    }
    return buffer;
}

Note the Int32.MaxValue limitation that the Read method places on file size; in other words, you can only read a 2 GB chunk at once.

Also note that the FileStream constructor has an overload whose last argument is a buffer size.

I would also suggest reading about FileStream and BufferedStream.

As always, a simple sample program that profiles each approach will show you which is fastest.

Also your underlying hardware will have a large effect on performance. Are you using server based hard disk drives with large caches and a RAID card with onboard memory cache? Or are you using a standard drive connected to the IDE port?
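
For instance, a minimal profiling harness might look like the sketch below; the file path is a placeholder, and you would add one timed block per strategy you want to compare:

using System;
using System.Diagnostics;
using System.IO;

class ReadBenchmark
{
    static void Main()
    {
        const string path = @"C:\temp\test.bin"; // placeholder: substitute a real large file

        var sw = Stopwatch.StartNew();
        byte[] data = File.ReadAllBytes(path);
        sw.Stop();

        Console.WriteLine("Read {0} bytes in {1} ms", data.Length, sw.ElapsedMilliseconds);
    }
}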

Solution 5 - C#

Depending on the frequency of operations, the size of the files, and the number of files you're looking at, there are other performance issues to take into consideration. One thing to remember is that each of your byte arrays will be released at the mercy of the garbage collector. If you're not caching any of that data, you could end up creating a lot of garbage and losing most of your performance to % Time in GC.

If the chunks are larger than 85 KB, you'll be allocating on the Large Object Heap (LOH), which requires a collection of all generations to free (this is very expensive, and on a server will stop all execution while it's going on). Additionally, if you have a ton of objects on the LOH, you can end up with LOH fragmentation (the LOH is not compacted by default), which leads to poor performance and OutOfMemoryExceptions. You can recycle the process once you hit a certain point, but I don't know whether that's a best practice.

The point is that you should consider the full life cycle of your app before simply reading all the bytes into memory the fastest way possible, or you might be trading short-term performance for overall performance.
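
One way to mitigate the allocation churn described above is to reuse buffers rather than allocate a fresh array per request. A minimal sketch, assuming .NET Core or the System.Buffers package is available, using ArrayPool<byte>; the process callback is a hypothetical consumer:

using System;
using System.Buffers;
using System.IO;

public static void ProcessFilePooled(string fileName, Action<byte[], int> process)
{
    // Rent may hand back a larger array than requested; that is fine here
    byte[] buffer = ArrayPool<byte>.Shared.Rent(81920);
    try
    {
        using (FileStream fs = File.OpenRead(fileName))
        {
            int bytesRead;
            while ((bytesRead = fs.Read(buffer, 0, buffer.Length)) > 0)
            {
                process(buffer, bytesRead);
            }
        }
    }
    finally
    {
        ArrayPool<byte>.Shared.Return(buffer); // return the buffer to the pool for reuse
    }
}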

Solution 6 - C#

I'd say BinaryReader is fine, but can be refactored to this, instead of all those lines of code for getting the length of the buffer:

public byte[] FileToByteArray(string fileName)
{
    byte[] fileData = null;

    using (FileStream fs = File.OpenRead(fileName)) 
    { 
        using (BinaryReader binaryReader = new BinaryReader(fs))
        {
            fileData = binaryReader.ReadBytes((int)fs.Length); 
        }
    }
    return fileData;
}

This should be better than using .ReadAllBytes(), since comments on the top .ReadAllBytes() answer report problems with files larger than 600 MB, and a BinaryReader is meant for this sort of thing. Also, putting it in a using statement ensures the FileStream and BinaryReader are closed and disposed.

Solution 7 - C#

In case 'a large file' means beyond the 4 GB limit, the following code logic is appropriate. The key issue to notice is the long data type used with the Seek method, because a long can address positions beyond the 2^32 boundary. The code first processes the file in chunks of 1 GB; after all the whole 1 GB chunks are processed, the leftover (< 1 GB) bytes are processed. I use this code to calculate the CRC of files beyond 4 GB in size (using https://crc32c.machinezoo.com/ for the CRC32C calculation in this example).

private uint Crc32CAlgorithmBigCrc(string fileName)
{
    uint hash = 0;
    byte[] buffer = null;
    FileInfo fileInfo = new FileInfo(fileName);
    long fileLength = fileInfo.Length;
    int blockSize = 1024000000;
    int blocks = (int)(fileLength / blockSize);
    // Cast to long before multiplying: blocks * blockSize overflows Int32 for files over ~2 GB
    int restBytes = (int)(fileLength - ((long)blocks * blockSize));
    long offsetFile = 0;
    bool firstBlock = true;
    using (FileStream fs = new FileStream(fileName, FileMode.Open, FileAccess.Read))
    {
        using (BinaryReader br = new BinaryReader(fs))
        {
            while (blocks > 0)
            {
                blocks -= 1;
                fs.Seek(offsetFile, SeekOrigin.Begin);
                buffer = br.ReadBytes(blockSize);
                if (firstBlock)
                {
                    firstBlock = false;
                    hash = Crc32CAlgorithm.Compute(buffer);
                }
                else
                {
                    // Carry the running CRC forward into each subsequent block
                    hash = Crc32CAlgorithm.Append(hash, buffer);
                }
                offsetFile += blockSize;
            }
            if (restBytes > 0)
            {
                fs.Seek(offsetFile, SeekOrigin.Begin);
                buffer = br.ReadBytes(restBytes);
                hash = firstBlock
                    ? Crc32CAlgorithm.Compute(buffer)
                    : Crc32CAlgorithm.Append(hash, buffer);
            }
        }
    }
    return hash;
}

Solution 8 - C#

Overview: if your image is added as an embedded resource (Build Action = Embedded Resource), use Assembly.GetExecutingAssembly to retrieve the jpg resource as a stream, then read the binary data in the stream into a byte array.

public byte[] GetAImage()
{
    var assembly = Assembly.GetExecutingAssembly();
    var resourceName = "MYWebApi.Images.X_my_image.jpg";

    using (Stream stream = assembly.GetManifestResourceStream(resourceName))
    {
        byte[] bytes = new byte[stream.Length];
        int offset = 0;
        // Read may return fewer bytes than requested, so loop until the buffer is full
        while (offset < bytes.Length)
        {
            int read = stream.Read(bytes, offset, bytes.Length - offset);
            if (read == 0)
                throw new EndOfStreamException();
            offset += read;
        }
        return bytes;
    }
}

Solution 9 - C#

Use the BufferedStream class in C# to improve performance. A buffer is a block of bytes in memory used to cache data, thereby reducing the number of calls to the operating system. Buffers improve read and write performance.

See the following for a code example and additional explanation: http://msdn.microsoft.com/en-us/library/system.io.bufferedstream.aspx
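
A minimal sketch of wrapping a FileStream in a BufferedStream is shown below; the 64 KB buffer size is an arbitrary illustrative choice. Note that FileStream already maintains an internal buffer of its own, so measure before assuming a gain:

using System.IO;

public static byte[] ReadWithBufferedStream(string path)
{
    using (FileStream fs = File.OpenRead(path))
    using (BufferedStream bs = new BufferedStream(fs, 64 * 1024))
    using (MemoryStream ms = new MemoryStream())
    {
        bs.CopyTo(ms); // BufferedStream batches the underlying reads
        return ms.ToArray();
    }
}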

Solution 10 - C#

Use ReadAsync and await the result rather than blocking on it with .Result:

int bytesRead = await responseStream.ReadAsync(buffer, 0, buffer.Length);
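
In context, a complete asynchronous read might look like the following sketch (on .NET Core 2.0 and later, File.ReadAllBytesAsync does the same thing in one call):

using System.IO;
using System.Threading.Tasks;

public static async Task<byte[]> FileToByteArrayAsync(string fileName)
{
    using (var fs = new FileStream(fileName, FileMode.Open, FileAccess.Read,
                                   FileShare.Read, 4096, useAsync: true))
    {
        byte[] buffer = new byte[fs.Length];
        int offset = 0;
        // ReadAsync may return fewer bytes than requested, so loop until the buffer is full
        while (offset < buffer.Length)
        {
            int read = await fs.ReadAsync(buffer, offset, buffer.Length - offset);
            if (read == 0)
                throw new EndOfStreamException();
            offset += read;
        }
        return buffer;
    }
}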

Solution 11 - C#

I would recommend trying the Response.TransmitFile() method, followed by Response.Flush() and Response.End(), for serving your large files.
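
A sketch of what that looks like in an ASP.NET Web Forms page or handler; the content type, file name, and path are placeholders:

Response.ContentType = "application/octet-stream";
Response.AppendHeader("Content-Disposition", "attachment; filename=large.bin");
Response.TransmitFile(Server.MapPath("~/files/large.bin")); // IIS streams the file; it is never buffered in managed memory
Response.Flush();
Response.End();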

Solution 12 - C#

If you're dealing with files above 2 GB, you'll find that the above methods fail.

It's much easier just to hand the stream off to MD5 and allow that to chunk your file for you:

private byte[] ComputeFileHash(string filename)
{
    using (MD5 md5 = MD5.Create())
    using (FileStream fs = new FileStream(filename, FileMode.Open, FileAccess.Read))
    {
        // ComputeHash pulls the stream through in small chunks internally,
        // so the whole file is never held in memory at once
        return md5.ComputeHash(fs);
    }
}

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content Type     | Original Author | Original Content on Stackoverflow
Question         | Tony_Henrich    | View Question on Stackoverflow
Solution 1 - C#  | mmx             | View Answer on Stackoverflow
Solution 2 - C#  | Marc Gravell    | View Answer on Stackoverflow
Solution 3 - C#  | Powerlord       | View Answer on Stackoverflow
Solution 4 - C#  | user113476      | View Answer on Stackoverflow
Solution 5 - C#  | Joel            | View Answer on Stackoverflow
Solution 6 - C#  | vapcguy         | View Answer on Stackoverflow
Solution 7 - C#  | Menno de Ruiter | View Answer on Stackoverflow
Solution 8 - C#  | Golden Lion     | View Answer on Stackoverflow
Solution 9 - C#  | Todd Moses      | View Answer on Stackoverflow
Solution 10 - C# | Disha Sharma    | View Answer on Stackoverflow
Solution 11 - C# | Dave            | View Answer on Stackoverflow
Solution 12 - C# | elaverick       | View Answer on Stackoverflow