# What is the computer science definition of entropy?

Computer ScienceInformation TheoryEntropyData Compression## Computer Science Problem Overview

I've recently started a course on data compression at my university. However, I find the use of the term "entropy" as it applies to computer science rather ambiguous. As far as I can tell, it roughly translates to the "randomness" of a system or structure.

What is the proper definition of computer science "entropy"?

## Computer Science Solutions

## Solution 1 - Computer Science

Entropy can mean different things:

> In computing, entropy is the > randomness collected by an operating > system or application for use in > cryptography or other uses that > require random data. This randomness > is often collected from hardware > sources, either pre-existing ones such > as mouse movements or specially > provided randomness generators.

> In information theory, entropy is a > measure of the uncertainty associated > with a random variable. The term by > itself in this context usually refers > to the Shannon entropy, which > quantifies, in the sense of an > expected value, the information > contained in a message, usually in > units such as bits. Equivalently, the > Shannon entropy is a measure of the > average information content one is > missing when one does not know the > value of the random variable

**Entropy in data compression**

Entropy in data compression may denote the randomness of the data that you are inputing to the compression algorithm. The more the entropy, the lesser the compression ratio. That means the more random the text is, the lesser you can compress it.

> Shannon's entropy represents an > absolute limit on the best possible > lossless compression of any > communication: treating messages to be > encoded as a sequence of independent > and identically-distributed random > variables, Shannon's source coding > theorem shows that, in the limit, the > average length of the shortest > possible representation to encode the > messages in a given alphabet is their > entropy divided by the logarithm of > the number of symbols in the target > alphabet.

## Solution 2 - Computer Science

My favorite definition, with a more practical focus, is found in Chapter 1 of the excellent book The Pragmatic Programmer: From Journeyman to Master by Andrew Hunt and David Thomas:

> **Software Entropy**
>
> While software development is immune
> from almost all physical laws, entropy
> hits us hard. Entropy is a term from
> physics that refers to the amount of
> "disorder" in a system. Unfortunately,
> the laws of thermodynamics guarantee
> that the entropy in the universe tends
> toward a maximum. When disorder
> increases in software, programmers
> call it "software rot."
>
> There are many factors that can
> contribute to software rot. The most
> important one seems to be the
> psychology, or culture, at work on a
> project. Even if you are a team of
> one, your project's psychology can be
> a very delicate thing. Despite the
> best laid plans and the best people, a
> project can still experience ruin and
> decay during its lifetime. Yet there
> are other projects that, despite
> enormous difficulties and constant
> setbacks, successfully fight nature's
> tendency toward disorder and manage to
> come out pretty well.
>
> ...
>
> ...
>
> **A broken window.**
>
> One broken window, left unrepaired for
> any substantial length of time,
> instills in the inhabitants of the
> building a sense of abandonment—a
> sense that the powers that be don't
> care about the building. So another
> window gets broken. People start
> littering. Graffiti appears. Serious
> structural damage begins. In a
> relatively short space of time, the
> building becomes damaged beyond the
> owner's desire to fix it, and the
> sense of abandonment becomes reality.
>
> The "Broken Window Theory" has
> inspired police departments in New
> York and other major cities to crack
> down on the small stuff in order to
> keep out the big stuff. It works:
> keeping on top of broken windows,
> graffiti, and other small infractions
> has reduced the serious crime level.
>
> Tip 4
>
> Don't Live with Broken Windows
>
>
> Don't leave "broken windows" (bad
> designs, wrong decisions, or poor
> code) unrepaired. Fix each one as soon
> as it is discovered. If there is
> insufficient time to fix it properly,
> then board it up. Perhaps you can
> comment out the offending code, or
> display a "Not Implemented" message,
> or substitute dummy data instead. Take
> some action to prevent further damage
> and to show that you're on top of the
> situation.

Text taken from: http://pragprog.com/the-pragmatic-programmer/extracts/software-entropy

## Solution 3 - Computer Science

I always encountered entropy in the sense of Shannon Entropy.

From http://en.wikipedia.org/wiki/Information_entropy:

In information theory, entropy is a measure of the uncertainty associated with a random variable. The term by itself in this context usually refers to the Shannon entropy, which quantifies, in the sense of an expected value, the information contained in a message, usually in units such as bits. Equivalently, the Shannon entropy is a measure of the average information content one is missing when one does not know the value of the random variable.

## Solution 4 - Computer Science

from University of Mexico

> The information theoretic notion of > Entropy is a generalization of the > physical notion. There are many ways > to describe Entropy. It is a measure > of the randomness of a random > variable. It is also a measure of the > amount of information a random > variable or stochastic process > contains. It is also a lower bound on > the amount a message can be > compressed. And finally it is the > average number of yes/no questions > that need to be asked about an random > entity to determine its value.

Equation for Entropy in a sample application for probability calculation:

> it is the sum over all values of a rv > of the probability of that value times > the log of that prob(i.e. > p(x)logp(x)). This equation can be > derived from first principles of the > properties of information.

## Solution 5 - Computer Science

Here is a great alternate explanation for *entropy* in information theory.

> Entropy is a measure of **uncertainty involved in making a
> prediction**.

We can also describe entropy as how *surprised* we would be if we get an outcome after we made our initial prediction.

Lets say we have a bent coin that gives us a head 99% of the time and a tail 1% of the time. Since there is only a one percent chance of getting a tail, we would be very surprised if we actually get a tail. On the other hand, it won't be too surprising if we got a head as we already have a 99 percent chance of getting a head.

lets assume that we have a function called `Surprise(x)`

that would give us the amount of surprise for each outcome; then we can average the amount of surprise on a probability distribution. This average amount of surprise could also be used as a measure for how uncertain we are. This uncertainty is called * entropy*.

UPDATE:

I made this visualization to describe relationship between entropy and confidence of the predicted class in an animal image classifier model (machine learning). Here the **entropy** is used as **a measure of how confident the classifier model is in its prediction**.

The diagrams show a comparison of entropy values of predictions from two classifier models. The diagram on the right predicts image of a horse with a relatively high confidence (lower entropy) while the classifier on the left can not really distinguish (higher entropy) whether it's a Horse, a Cow, or a Giraffe.

## Solution 6 - Computer Science

In terms of compression and information theory, the entropy of a source is the average amount of information (in bits) that symbols from the source can convey. Informally speaking, the more unlikely a symbol is, the more surprise its appearance brings.

If your source has two symbols, say `A`

and `B`

, and they are equally likely, then each symbol conveys the same amount of information (one bit). A source with four equally likely symbols conveys two bits per symbol.

For a more interesting example, if your source has three symbols, `A`

, `B`

, and `C`

, where the first two are twice as likely as the third, then the third is more surprising but is also less likely. There's a net entropy of 1.52 for this source, as calculated below.

You calculate entropy as the "average surprise", where the "surprise" for each symbol is its probability times the negative binary log of the probability:

```
binary
symbol weight probability log surprise
A 2 0.4 -1.32 0.53
B 2 0.4 -1.32 0.53
C 1 0.2 -2.32 0.46
total 5 1.0 1.52
```

The negative of the binary log is used (of course) because logs of values between 0 and 1 (exclusive) are negative.

## Solution 7 - Computer Science

### Super SIMPLE definition

The word entropy can be defined in one sentence:

#### "The amount of information needed to describe a system."

Imagine for an example the expansion of the universe: From the beginning, all matter was collected in a small point before the big bang, so we could have described the system with "all matter is within one point." While today significantly more information is required to describe the system (the Universe, that is), one would need to describe all planetary positions, their movement, what's on them etc..
In terms of information theory, the definition also works: E.g: The more letters you add to a password (the system), the more information is needed to describe the password. Then you can measure it in different units, eg bits or characters, like
"hello" = 5 characters entropy = 40 bits of entropy (if charsize is 8 bits).

From this also comes that the more information you have the more ways you can arrange that information in. If you have 40 bits there are 2^40 different ways they can be arranged. If we are talking passwords here then the more possible arrangements of the information (bits) the longer it is going to take cracking (with brute force or dictionary attacks).

## Solution 8 - Computer Science

In simpler words, Entropy defines randomness. It’s more like how unpredictable something is. In more technical words, “In computing, entropy is the randomness collected by an operating system or application for use in cryptography or other uses that require random data. This randomness is often collected from hardware sources, either pre-existing ones such as mouse movements or specially provided randomness generators.” as defined by wikipedia.

One can now easily conclude the meaning of entropy in respect to a file as the measurement of the how much disordered the bytes are in a file. There are various units used for defining entropy like nat, shannon or hartley. Well, most common unit used is Shannon. The range of values a file’s entropy must come in as per Shannon’s algorithm is 0 to 8. So, when the entropy value is zero, one can say the outcome is certain. On contrary, when the entropy value is 8, the outcome is most unpredictable it could be. The formula given by Shannon to measure randomness in outcome of events is:

```
Entropy = ∑ pi log(1/pi)
```

where *i* is the event with probability *pi*.

This equation will always result in between 0 to 8.

For more information, go through the link: https://www.talentcookie.com/2016/02/file-entropy-in-malware-analysis/

## Solution 9 - Computer Science

Entropy is like a hash code for virus researchers as well. Less entropy you get, it would mean that it is likely encrypted or compressed code which could be potentially be a virus.

A standard binary would have a higher entropy than a compressed or encrypted one.

## Solution 10 - Computer Science

Entropy has many meanings typically in Computer Science. It depends on the context. In security entropy means how much randomality you place, for instance when you generate a private key many applications ask you to move the mouse around to generate entropy. This generates entropy by taking the "human" element of randomality and adds it to the hashing process of generating the key.

Now there is also a defnition for software engineering of entropy. This definition represents out of date code, or code that has had many developers writing it. Typically used in reference to when it is near time to refactor your software project. "The code for this project has an enourmous amount of entropy because many of the individuals who maintained it are not on the project currently".

Here is a third example usage that I remembered too. In the topic of simulated annealing (as far as computer science is concerned), entropy is described as how much decay has happened during the evaluation of the algorithm.

I guess to answer your question though, there is not a concrete definition of the word 'entropy' except for the ones that you can find in a dictionary. How computer science tends to apply that term depends on the context of the term being used and what it is being applied to.

## Solution 11 - Computer Science

It's easy to make a big deal out of entropy. To my mind it is a pretty simple and useful concept.

Basically it quantifies what, on average, you will learn from an event, like flipping a coin, taking a branch instruction, or indexing an array.

Like a comparison operation in the middle of a search algorithm has a certain probability P of taking one branch, and 1-P of taking the other.

Suppose P is 1/2, as it is in a binary search. Then if you take that branch, you know 1 bit more than you did before, because log(2/1), base 2, is 1. On the other hand, if you take the other branch you also learn 1 bit.

To get the average amount of information you will learn, multiply what you learn on the first branch times the probability you take that branch, plus what you learn on the second branch times the probability of that branch.

1/2 times 1 bit, plus 1/2 times 1 bit, is 1/2 bit plus 1/2 bit, or total 1 bit of entropy. That's what you can expect to learn on average from that decision.

On the other hand, suppose you are doing linear search in a table of 1024 entries.

On the first == test, the probability of YES is 1/1024, so the entropy of YES at that decision is

```
1/1024 times log(1024/1)
```

or 1/1024 * 10 = about 1/100 bit.

So if the answer is YES, you learn 10 bits, but the chance of that is about 1 in a thousand.

On the other hand, NO is much more likely. It's entropy is

```
1023/1024 * log(1024/1023)
```

or roughly 1 times roughly zero = about zero.

Add the two together, and on average you will learn about 1/100 of a bit on that decision.

That's why linear search is slow. The entropy (how much you can expect to learn) at each decision is too small, since you're going to have to learn 10 bits to find the entry in the table.

## Solution 12 - Computer Science

Entropy in computer science commonly refers to how random a string of bits is. The following question is about making that precise:

https://stackoverflow.com/questions/2979174/how-do-i-compute-the-approximate-entropy-of-a-bit-string

## Solution 13 - Computer Science

In simple words if you know the probabilities of symbols in the langauge ,one can compute the average information content of symbol in the language .

Or

The entropy of a language is a measure of the information content of an average symbol in the language

Consider a fair coin ;

There are two symbols ,each with probability 1/2 so entropy is calculated as

h =-(1/2*log1/2 +1/2*log1/2)=1

## Solution 14 - Computer Science

entropy refers to the extent where a software is reshaped occasionally basing on customer requirements hence the cost for reshaping it to meet customer reqrments becomes maximum.

## Solution 15 - Computer Science

I've heard people misuse the thermodynamic definitions of entropy w.r.t CS.

E.g. Entropy is definitely increasing in this system.

When what they mean is this code is getting worse and worse!