When compressing and encrypting, should I compress first, or encrypt first?

Tags: Encryption, Compression, Performance, AES, Zlib

Encryption Problem Overview


If I were to AES-encrypt a file, and then ZLIB-compress it, would the compression be less efficient than if I first compressed and then encrypted?

In other words, should I compress first or encrypt first, or does it matter?

Encryption Solutions


Solution 1 - Encryption

Compress first. Once you encrypt the file, you will have a stream of effectively random data, which will not be compressible. The compression process depends on finding compressible patterns in the data.
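A quick way to see this is to compress patterned data and random data of the same size. The following is a minimal sketch using Python's standard zlib module. The output of os.urandom is used as a stand-in for ciphertext (an assumption, on the grounds that good ciphertext should look random); it is not real AES output, and the sample sentence is arbitrary.

```python
import os
import zlib


def compression_ratio(data: bytes) -> float:
    """Return compressed size divided by original size (lower is better)."""
    return len(zlib.compress(data)) / len(data)


# Highly patterned input: the same sentence repeated many times.
patterned = b"The quick brown fox jumps over the lazy dog. " * 2000

# Stand-in for ciphertext (assumption): random bytes of the same length.
random_like = os.urandom(len(patterned))

print(f"patterned:   {compression_ratio(patterned):.3f}")
print(f"random-like: {compression_ratio(random_like):.3f}")
```

The patterned input should shrink to a small fraction of its size, while the random-like input should come out at roughly its original size or slightly larger.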

Solution 2 - Encryption

Compression before encryption is surely more space-efficient, but at the same time it is less secure. That's why I would disagree with the other answers.

Most compression algorithms use "magic" file headers and that could be used for statistical attacks.

For example, there is the CRIME attack on SSL/TLS, which exploits compression applied before encryption.
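In the case of CRIME, the leak comes from the compressed length rather than from file headers: the length depends on the plaintext, and encryption generally does not hide length. Below is a toy sketch of that effect, assuming an attacker who can inject text into the same compressed message as a secret. The secret, the request layout, and both guesses are invented for illustration, and the cipher step is left out because many cipher modes preserve length up to a constant or padding.

```python
import zlib

# Hypothetical secret that the attacker wants to recover.
SECRET = b"Cookie: session=7f3a9c2e1b"


def observed_length(injected: bytes) -> int:
    """Length an eavesdropper would see for one message.

    The message mixes attacker-controlled text with the secret and is
    compressed before encryption. Only the compressed length is modelled;
    the encryption step is omitted in this sketch.
    """
    message = b"GET /?q=" + injected + b"\r\n" + SECRET + b"\r\n"
    return len(zlib.compress(message))


right_guess = b"Cookie: session=7f3a9c2e1b"
wrong_guess = b"Cookie: session=q8w5z0k4mh"

print("right guess:", observed_length(right_guess))
print("wrong guess:", observed_length(wrong_guess))
```

The right guess duplicates the secret, so the compressor can replace more of it with a back-reference and the message comes out shorter. A real attack refines the guess step by step; this sketch only shows that the length difference exists.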

Solution 3 - Encryption

If your encryption algorithm is any good (and AES, with a proper chaining mode, is good), then no compressor will be able to shrink the encrypted text. Or, if you prefer it the other way round: if you succeed in compressing some encrypted text, then it is high time to question the quality of the encryption algorithm…

That is because the output of an encryption system should be indistinguishable from purely random data, even by a determined attacker. A compressor is not a malicious attacker, but it works by trying to find non-random patterns which it can represent with fewer bits. The compressor should not be able to find any such pattern in encrypted text.

So you should compress data first, then encrypt the result, not the other way round. This is what is done in, e.g., the OpenPGP format.
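To compare the two orders side by side, here is a rough sketch built only on the Python standard library. The cipher is a toy XOR keystream derived from SHA-256, chosen purely so that the example runs without third-party packages; it is an assumption of this sketch, it is NOT AES, and it must not be used for real security. The key and plaintext are hypothetical.

```python
import hashlib
import zlib


def toy_stream_cipher(key: bytes, data: bytes) -> bytes:
    """XOR data with a SHA-256 counter keystream.

    Illustration only: this is NOT AES. Applying it twice with the
    same key restores the input.
    """
    keystream = bytearray()
    counter = 0
    while len(keystream) < len(data):
        block = hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        keystream.extend(block)
        counter += 1
    # zip() stops at len(data), so surplus keystream bytes are ignored.
    return bytes(a ^ b for a, b in zip(data, keystream))


key = b"hypothetical demo key"
plaintext = b"compress first, then encrypt; " * 3000

compress_then_encrypt = toy_stream_cipher(key, zlib.compress(plaintext))
encrypt_then_compress = zlib.compress(toy_stream_cipher(key, plaintext))

print("original:             ", len(plaintext))
print("compress then encrypt:", len(compress_then_encrypt))
print("encrypt then compress:", len(encrypt_then_compress))

# Reverse the recommended order to recover the data: decrypt, then decompress.
recovered = zlib.decompress(toy_stream_cipher(key, compress_then_encrypt))
```

With this input, compressing first should give a much smaller result, while compressing the encrypted bytes should leave them at about their original size.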

Solution 4 - Encryption

Compress first. If you encrypt first, your data turns into (essentially) a stream of random bits. Random bits are incompressible because compression looks for patterns in the data, and a random stream, by definition, has no patterns.

Solution 5 - Encryption

Of course it matters. It's generally better to compress first and then to encrypt.

zlib uses Huffman coding and LZ77 compression. Both work best on data such as plain text, where symbol frequencies are uneven and strings repeat, so the compression ratio will be better if compression is performed on the plain text.
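One way to make the symbol-frequency point concrete is to measure the Shannon entropy of the byte histogram, which bounds how well a per-symbol coder such as Huffman can do. This is a rough sketch: the sample text is arbitrary, and os.urandom output stands in for ciphertext (an assumption, not real AES output).

```python
import math
import os
from collections import Counter


def byte_entropy(data: bytes) -> float:
    """Shannon entropy of the byte histogram, in bits per byte (max 8)."""
    counts = Counter(data)
    total = len(data)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


text = b"It's generally better to compress first and then to encrypt. " * 1000
random_like = os.urandom(len(text))  # stand-in for ciphertext (assumption)

print(f"plain text:  {byte_entropy(text):.2f} bits/byte")
print(f"random-like: {byte_entropy(random_like):.2f} bits/byte")
```

A value close to 8 bits per byte leaves a Huffman coder almost nothing to save; plain text sits well below that.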

Encryption can follow compression even if the compressed result already appears to be "encrypted". Compressed data is easily detected as such: a ZIP file, for instance, usually starts with PK.
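As an illustration of how recognizable compressed data is, the sketch below checks for the "PK" signature at the start of a ZIP archive and for the two-byte header of a raw zlib stream as defined in RFC 1950. The function names are hypothetical, and the zlib check is only a heuristic, since unrelated data can occasionally pass it.

```python
import io
import zipfile
import zlib


def looks_like_zlib(data: bytes) -> bool:
    """Check the 2-byte zlib header defined in RFC 1950."""
    if len(data) < 2:
        return False
    cmf, flg = data[0], data[1]
    uses_deflate = (cmf & 0x0F) == 8            # compression method 8 = deflate
    checksum_ok = ((cmf << 8) | flg) % 31 == 0  # header check bits
    return uses_deflate and checksum_ok


def looks_like_zip(data: bytes) -> bool:
    """ZIP archives begin with the signature bytes 'PK'."""
    return data[:2] == b"PK"


zlib_stream = zlib.compress(b"hello world")

# Build a small ZIP archive in memory.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("hello.txt", "hello world")
zip_bytes = buffer.getvalue()

print(looks_like_zlib(zlib_stream), looks_like_zip(zip_bytes))
```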

zlib doesn't provide encryption natively. That's why I've implemented ZeusProtection. The source code is also available on GitHub.

Solution 6 - Encryption

From a practical perspective, I think you should compress first, simply because many files are already compressed. For example, video encoding usually involves heavy compression. If you encrypt such a video file and then compress it, it has been compressed twice. Not only will the second pass get a dismal compression ratio, but it will also take a great deal of resources for large files or streams. As Thomas Pornin and Ferruccio stated, compressing encrypted files may have little effect anyway because of their randomness.

I think the best and simplest policy may be to compress files only as needed beforehand (using a whitelist or blacklist), then encrypt them regardless.
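Such a policy could look roughly like the sketch below. The extension blacklist, the function name, and the fallback rule of keeping the compressed form only when it is smaller are all assumptions made for illustration; the encryption step itself is left to whatever library is in use.

```python
import os
import zlib
from typing import Tuple

# Hypothetical blacklist of extensions that are usually already compressed.
ALREADY_COMPRESSED = {".zip", ".gz", ".jpg", ".png", ".mp3", ".mp4", ".mkv"}


def prepare_for_encryption(filename: str, data: bytes) -> Tuple[bool, bytes]:
    """Return (was_compressed, payload) to hand to the encryption step."""
    extension = os.path.splitext(filename)[1].lower()
    if extension in ALREADY_COMPRESSED:
        return False, data
    compressed = zlib.compress(data)
    # Keep the compressed form only if it actually saves space.
    if len(compressed) < len(data):
        return True, compressed
    return False, data


notes = b"to do: compress, then encrypt\n" * 100
was_compressed, payload = prepare_for_encryption("notes.txt", notes)
print(was_compressed, len(notes), len(payload))
```

The returned flag would need to be stored alongside the encrypted payload so that the reader knows whether to decompress after decrypting.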

Attributions

All content for this solution is sourced from the original question on Stack Overflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content Type | Original Author | Original Content on Stack Overflow
Question | seisatsu | View Question on Stack Overflow
Solution 1 - Encryption | Ferruccio | View Answer on Stack Overflow
Solution 2 - Encryption | maxbublis | View Answer on Stack Overflow
Solution 3 - Encryption | Thomas Pornin | View Answer on Stack Overflow
Solution 4 - Encryption | Cameron Skinner | View Answer on Stack Overflow
Solution 5 - Encryption | mihaipopescu | View Answer on Stack Overflow
Solution 6 - Encryption | Victor Stoddard | View Answer on Stack Overflow