HttpWebRequest & Native GZip Compression

C#, .Net, Stream, Gzip, Http Compression

C# Problem Overview


When requesting a page with Gzip compression I am getting a lot of the following errors:

> System.IO.InvalidDataException: The CRC in GZip footer does not match the CRC calculated from the decompressed data

I am decompressing with the native GZipStream and would like to resolve this. Is there a workaround, or another (free?) GZip library that handles this properly?

I am verifying that the webResponse ContentEncoding is GZIP.

Update 5/11: A simplified snippet

//Caller
public void SOSampleGet(string url) 
{
    // Initialize the WebRequest.
    webRequest = (HttpWebRequest)WebRequest.Create(url);
    webRequest.Method = WebRequestMethods.Http.Get;
    webRequest.KeepAlive = true;
    webRequest.Accept = "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8";
    webRequest.Headers.Add("Accept-Encoding", "gzip,deflate");
    webRequest.Referer = WebUtil.GetDomain(url);

    HttpWebResponse webResponse = (HttpWebResponse)webRequest.GetResponse();    

    using (Stream stream = GetStreamForResponse(webResponse, READTIMEOUT_CONST))
    {
        //use stream
    }
}

//Method
private static Stream GetStreamForResponse(HttpWebResponse webResponse, int readTimeOut)
{
    Stream stream;
    switch (webResponse.ContentEncoding.ToUpperInvariant())
    {
        case "GZIP":
            stream = new GZipStream(webResponse.GetResponseStream(), CompressionMode.Decompress);
            break;
        case "DEFLATE":
            stream = new DeflateStream(webResponse.GetResponseStream(), CompressionMode.Decompress);
            break;
        
        default:
            stream = webResponse.GetResponseStream();
            stream.ReadTimeout = readTimeOut;
            break;
    }
    return stream;
}

C# Solutions


Solution 1 - C#

What about the HttpWebRequest.AutomaticDecompression property, available since .NET 2.0? Simply add:

webRequest.AutomaticDecompression = DecompressionMethods.GZip | DecompressionMethods.Deflate;

It also adds gzip,deflate to the Accept-Encoding header, so you do not need to set that header yourself.

See http://msdn.microsoft.com/en-us/library/system.net.httpwebrequest.automaticdecompression.aspx
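To make the suggestion concrete, here is a minimal sketch of the question's request with automatic decompression switched on. The helper names `CreateRequest` and `Download` are hypothetical. The manual Accept-Encoding header and the GZipStream wrapper are left out on the assumption that the framework now handles both. The body is assumed to be UTF-8 text.

```csharp
using System.IO;
using System.Net;

public static class AutoDecompressSample
{
    // Hypothetical helper: builds the request but does not send it.
    public static HttpWebRequest CreateRequest(string url)
    {
        var request = (HttpWebRequest)WebRequest.Create(url);
        request.Method = WebRequestMethods.Http.Get;
        // Ask the framework to negotiate and undo gzip/deflate itself.
        request.AutomaticDecompression =
            DecompressionMethods.GZip | DecompressionMethods.Deflate;
        return request;
    }

    // Hypothetical helper: sends the request and returns the body as text.
    public static string Download(string url)
    {
        HttpWebRequest request = CreateRequest(url);
        using (var response = (HttpWebResponse)request.GetResponse())
        using (Stream stream = response.GetResponseStream())
        using (var reader = new StreamReader(stream))
        {
            // The stream is expected to be decompressed already,
            // so it is not wrapped in a GZipStream here.
            return reader.ReadToEnd();
        }
    }
}
```

With this in place, a switch on ContentEncoding like the one in the question should no longer be needed for gzip or deflate responses.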

Solution 2 - C#

For .NET Core things are a little more involved. A GZipStream is needed because HttpWebRequest there doesn't have an AutomaticDecompression property (as of writing). See my answer here: https://stackoverflow.com/a/44508724/2421277

Code from answer:

var req = WebRequest.CreateHttp(uri);

/*
 * Headers
 */
// Advertise only gzip, since that is the only encoding decompressed below.
req.Headers[HttpRequestHeader.AcceptEncoding] = "gzip";

/*
 * Execute
 */
try
{
    using (var resp = await req.GetResponseAsync())
    using (var str = resp.GetResponseStream())
    using (var gsr = new GZipStream(str, CompressionMode.Decompress))
    using (var sr = new StreamReader(gsr))
    {
        string s = await sr.ReadToEndAsync();
    }
}
catch (WebException ex) when (ex.Response != null)
{
    // The server answered with an error status; read its body for diagnostics.
    using (var response = (HttpWebResponse)ex.Response)
    using (var sr = new StreamReader(response.GetResponseStream()))
    {
        string respStr = sr.ReadToEnd();
        int statusCode = (int)response.StatusCode;

        string errorMsg = $"Request ({uri}) failed ({statusCode}) with error: {respStr}";
    }
}
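As an alternative that avoids the manual GZipStream, HttpClientHandler exposes its own AutomaticDecompression property, which is available on .NET Core. This is a different technique from the code in the linked answer, shown only as a sketch. `CreateHandler` and `GetTextAsync` are hypothetical names.

```csharp
using System.Net;
using System.Net.Http;
using System.Threading.Tasks;

public static class HttpClientGzipSample
{
    // Hypothetical helper: a handler that decompresses gzip/deflate responses.
    public static HttpClientHandler CreateHandler()
    {
        return new HttpClientHandler
        {
            AutomaticDecompression =
                DecompressionMethods.GZip | DecompressionMethods.Deflate
        };
    }

    // Hypothetical helper: fetches the body as a string.
    public static async Task<string> GetTextAsync(string uri)
    {
        using (var handler = CreateHandler())
        using (var client = new HttpClient(handler))
        {
            // The handler decompresses the body before it is turned into a string.
            return await client.GetStringAsync(uri);
        }
    }
}
```

The client is created inline only to keep the sketch short. A long-running application would normally share one HttpClient instance across requests.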

Solution 3 - C#

Are you flushing and closing the stream? Try wrapping your GZipStream in a using statement so that it is always disposed.

Solution 4 - C#

I found some sample code that shows the entire request/response for GZip encoded pages. It uses GZipStream.

http://www.know24.net/blog/Decompress+GZip+Deflate+HTTP+Responses.aspx

Solution 5 - C#

See my comment above, but this is usually a symptom of a corrupted file. If the site is your own, replace the file you are trying to access.
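One way to check whether a payload is corrupted is to save the raw response bytes and decompress them offline. The sketch below assumes the bytes are already in memory. `TryDecompress` and `Compress` are hypothetical helpers, not part of the framework.

```csharp
using System.IO;
using System.IO.Compression;

public static class GzipCheck
{
    // Hypothetical helper: returns true if the bytes decompress cleanly as GZIP.
    public static bool TryDecompress(byte[] compressed, out byte[] data)
    {
        try
        {
            using (var input = new MemoryStream(compressed))
            using (var gzip = new GZipStream(input, CompressionMode.Decompress))
            using (var output = new MemoryStream())
            {
                // Copying to the end makes the decoder process the whole payload.
                gzip.CopyTo(output);
                data = output.ToArray();
                return true;
            }
        }
        catch (InvalidDataException)
        {
            // Same exception type as the one reported in the question.
            data = null;
            return false;
        }
    }

    // Hypothetical helper: builds a known-good GZIP payload for comparison.
    public static byte[] Compress(byte[] data)
    {
        using (var output = new MemoryStream())
        {
            using (var gzip = new GZipStream(output, CompressionMode.Compress))
            {
                gzip.Write(data, 0, data.Length);
            }
            // ToArray remains valid after the GZipStream has closed the MemoryStream.
            return output.ToArray();
        }
    }
}
```

If the saved bytes fail here as well, the problem is more likely in the payload itself than in how the response stream was read.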

Solution 6 - C#

The native GZipStream can read a compressed GZIP (RFC 1952) stream, but it can't handle the ZIP file format.

From http://www.geekpedia.com/tutorial190_Zipping-files-using-GZipStream.html:

> The disadvantage of using the GZipStream class over a 3rd party product is that it has limited capabilities. One of the limitations is that you cannot give a name to the file that you place in the archive. When GZipStream compresses the file into a ZIP archive, it takes the sequence of bytes from that file and uses compression algorithms that create a smaller sequence of bytes. The new sequence of bytes is put into the new ZIP file. When you open the ZIP file you will open the archived file itself; most popular ZIP extractors (WinZip, WinRar, etc.) will show you the content of the ZIP as a file that has the same as the archive itself.


>> EDIT: The above note is incorrect. GZipStream does not produce a ZIP file. It is not a "Single file ZIP stream". It is a GZIP Stream. They are different things. There's no guarantee that tools that handle ZIP archives will handle a .gz file.


For an implementation that can read ZIP archives, as opposed to single-file ZIP streams, try #ziplib (SharpZipLib, formerly NZipLib).
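The difference between the two formats is visible in the first bytes of the data. A GZIP stream (RFC 1952) starts with 0x1F 0x8B, while a ZIP archive starts with the ASCII letters "PK". The sketch below checks only those leading bytes; `LooksLikeGzip`, `LooksLikeZip` and `GzipBytes` are hypothetical helpers.

```csharp
using System.IO;
using System.IO.Compression;
using System.Text;

public static class FormatSniffer
{
    // GZIP member header: ID1 = 0x1F, ID2 = 0x8B (RFC 1952).
    public static bool LooksLikeGzip(byte[] bytes)
    {
        return bytes.Length >= 2 && bytes[0] == 0x1F && bytes[1] == 0x8B;
    }

    // ZIP record signatures begin with the ASCII letters 'P' and 'K'.
    public static bool LooksLikeZip(byte[] bytes)
    {
        return bytes.Length >= 2 && bytes[0] == (byte)'P' && bytes[1] == (byte)'K';
    }

    // Hypothetical helper: compresses text with GZipStream for inspection.
    public static byte[] GzipBytes(string text)
    {
        byte[] raw = Encoding.UTF8.GetBytes(text);
        using (var output = new MemoryStream())
        {
            using (var gzip = new GZipStream(output, CompressionMode.Compress))
            {
                gzip.Write(raw, 0, raw.Length);
            }
            return output.ToArray();
        }
    }
}
```

Running `LooksLikeGzip` on the output of `GzipBytes` should return true and `LooksLikeZip` false, which is a quick way to confirm which format a downloaded payload actually uses.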

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

| Content Type | Original Author | Original Content on Stackoverflow |
| --- | --- | --- |
| Question | Pat | View Question on Stackoverflow |
| Solution 1 - C# | Eugene | View Answer on Stackoverflow |
| Solution 2 - C# | pim | View Answer on Stackoverflow |
| Solution 3 - C# | Matthew Whited | View Answer on Stackoverflow |
| Solution 4 - C# | Mike L | View Answer on Stackoverflow |
| Solution 5 - C# | MichaelICE | View Answer on Stackoverflow |
| Solution 6 - C# | Andomar | View Answer on Stackoverflow |