I don't think Shannon has had the credit he deserves. He should be right up there, near Darwin and Einstein, among the few greatest scientists mankind has ever had the chance to have. Yet, he's hardly known by the public. More than an explanation of Shannon's ideas, this article is a tribute to him.
Bits
Now, to understand how important his ideas are, let's go back in time and consider telecommunications in the 1940s. Back then, the telephone network was quickly developing, both in North America and Europe. The two networks then got connected. But when a message was sent across the Atlantic Ocean, it couldn't be read at the other end.
Why? What happened?
http://www.science4all.org/article/shannons-information-theory/
As the message travelled across the Atlantic Ocean, it got weaker and weaker. Eventually, it was so weak that it was unreadable. Imagine the message was the logo of Science4All. The following figure displays what happened:
At that time, it seemed impossible to get rid of the noise. There really seemed to be a fundamental limit to communication over long distances. No matter how you amplify the message, the noise will still be much bigger than the message once it arrives in Europe. But then came Claude Shannon…
What did Shannon do?
Wonders! Among these wonders was an amazingly simple solution to communication. The idea comes from the observation that all messages can be converted into binary digits, better known as bits. For instance, using the PNG format, the logo of Science4All can be digitized into bits as follows:

Bits are not to be confused with bytes. A byte equals 8 bits. Thus, 1,000 bytes equal 8,000 bits.

This digitization of messages has revolutionized our world in a way that we too often forget to be fascinated by.
What do bits change about the communication problem?
Now, instead of simply amplifying the message, we can read it before amplifying it. Because the digitized message is a sequence of 0s and 1s, it can be read and repeated exactly. By replacing simple amplifiers with readers and amplifiers (known as regenerative repeaters), we can now easily get messages across the Atlantic Ocean. And all over the world, as displayed below:

This figure is just a representation. The noise rather occurs on the bits. It sort of makes bits take values around 0 and 1. The reader then considers that a value like 0.1 equals 0, and repeats and amplifies 0 instead of 0.1.
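To see why regeneration beats plain amplification, here is a toy simulation (the noise model and all numbers are made up for illustration): each hop adds random noise to every value, and a regenerative repeater snaps each value back to 0 or 1 before passing it on, so the noise never accumulates.

```python
import random

random.seed(0)

def noisy_hop(signal, noise_level=0.2):
    """One leg of the journey: each value picks up random noise."""
    return [s + random.uniform(-noise_level, noise_level) for s in signal]

def regenerate(signal):
    """A regenerative repeater reads each value and snaps it back to 0 or 1."""
    return [1 if s > 0.5 else 0 for s in signal]

message = [0, 1, 1, 0, 1, 0, 0, 1]

# Analog-style: pass the signal along without reading it -> noise piles up hop after hop.
analog = message[:]
for _ in range(10):
    analog = noisy_hop(analog)

# Digital-style: read and regenerate at every repeater -> noise is wiped out each hop.
digital = message[:]
for _ in range(10):
    digital = regenerate(noisy_hop(digital))

print(digital == message)  # True: the bits survive the 10 hops intact
```

Since each hop's noise stays below the 0.5 reading threshold, the regenerated bits are always recovered exactly, no matter how many hops they travel.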
Now, on the first page of his article, Shannon clearly says that the idea of bits is J. W. Tukey's. But, in a sense, this digitization is just an approximation of Shannon's more fundamental concept of bits. This more fundamental concept is the quantification of information, and is sometimes referred to as Shannon's bits.
Shannon's Bits
Obviously, the most important concept of Shannon's information theory is information. Although we all seem to have an idea of what information is, it's nearly impossible to define it clearly. And, surely enough, the definition given by Shannon seems to come out of nowhere. But it works fantastically.

What's the definition?
According to Shannon's brilliant theory, the concept of information strongly depends on the context. For instance, my full first name is Lê Nguyên. But in western countries, people simply call me Lê. Meanwhile, in Vietnam, people rather use my full first name. Somehow, the word "Lê" is not enough to identify me in Vietnam, as it's a common name over there. In other words, the word "Lê" carries less information in Vietnam than in western countries. Similarly, if you talk about "the man with hair", you are not giving away a lot of information, unless you are surrounded by soldiers who nearly all have their hair cut.
But what is a context in mathematical terms?
A context corresponds to what messages you expect. More precisely, the context is defined by the probability of the messages. In our example, calling someone Lê is much less likely in western countries than in Vietnam. Thus, the context of messages in Vietnam strongly differs from that of western countries.
OK… So now, what's information?

Well, we said that the information of "Lê" is greater in western countries…
So the rarer the message, the more information it has?
Yes! If p is the probability of the message, then its information is related to 1/p. But this is not how Shannon quantified it, as this quantification would not have nice properties. Shannon's great idea was to define information rather as the number of bits required to write the number 1/p. This number is its logarithm in base 2, which we denote log₂(1/p).
If you're uncomfortable with logarithms, read my article on these mathematical operators. You don't need a full understanding of logarithms to read through the rest of this article though. If you do know about logarithms, you have certainly noticed that, more often than not, Shannon's number of bits is not a whole number.
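As a quick numerical illustration of the definition log₂(1/p), here is a short sketch; the two probabilities below are made-up, purely illustrative values, not numbers from the article:

```python
from math import log2

# Hypothetical probabilities of the name "Lê" in each context
# (illustrative numbers only):
p_vietnam = 1 / 16    # a common name there
p_western = 1 / 1024  # a rare name there

# Information = number of bits needed to write 1/p, i.e. log2(1/p).
info_vietnam = log2(1 / p_vietnam)  # 4.0 bits
info_western = log2(1 / p_western)  # 10.0 bits: the rarer the message, the more information

print(info_vietnam, info_western)
```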
Now, this means that it would require more bits to digitize the word "Lê" in western countries than in Vietnam, as displayed below:
Wow! Indeed! But can you explain how Shannon's theory is applied to telecommunications?

Yes! As Shannon put it in his seminal paper, telecommunication cannot be thought of in terms of the information of a particular message. Indeed, a communication device has to be able to work with any information of the context. This led Shannon to (re)define the fundamental concept of entropy, which talks about the information of a context.
There's a funny story about the coining of the term "entropy", which Shannon first wanted to call an "uncertainty function". But John von Neumann gave him the following advice:

You should call it entropy, for two reasons. In the first place your uncertainty function has been used in statistical mechanics under that name, so it already has a name. In the second place, and more important, no one really knows what entropy really is, so in a debate you will always have the advantage.
Shannon's Entropy

In 1877, Ludwig Boltzmann shook the world of physics by defining the entropy of gases, which greatly confirmed the atomic theory. He defined the entropy more or less as the logarithm of the number of microstates which correspond to a macrostate. For instance, a macrostate would say that a set of particles has a certain volume, pressure, mass and temperature. Meanwhile, a microstate defines the position and velocity of every particle.
Find out more about entropy in thermodynamics with my article on the second law.
The brilliance of Shannon was to focus on the essence of Boltzmann's idea and to provide the broader framework in which to define entropy.

What's Shannon's definition of entropy?
Shannon's entropy is defined for a context and equals the average amount of information provided by the messages of the context. Since each message occurs with probability p and has information log₂(1/p), the average amount of information is the sum over all messages of p log₂(1/p). This is explained in the following figure, where each color stands for a possible message of the context:

In the case of a continuous probability with a density function f, the entropy can be defined as the integral of f log₂(1/f). This provides a powerful understanding of information.
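The average p log₂(1/p) can be sketched in a few lines of code; the two coin distributions below are just illustrative examples:

```python
from math import log2

def entropy(probabilities):
    """Shannon entropy: the average information sum of p * log2(1/p), in bits."""
    return sum(p * log2(1 / p) for p in probabilities if p > 0)

# A fair coin carries exactly 1 bit per toss; a biased coin carries less,
# since its outcome is more predictable.
print(entropy([0.5, 0.5]))  # 1.0
print(entropy([0.9, 0.1]))  # about 0.469
```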
Shannon's Equivocation

By considering conditional probabilities, Shannon defined conditional entropy, also known as Shannon's equivocation. Let's consider the entropy of a message conditional to its introduction. For any given introduction, the message can be described with a conditional probability. This defines an entropy conditional to the given introduction. Now, the conditional entropy is the average of this entropy, when the given introduction follows the probabilistic distribution of introductions. Roughly said, the conditional entropy is the average added information of the message given its introduction.
It's getting complicated…

I know! But if you manage to get your head around that, you'll understand many of Shannon's greatest ideas.
Does this definition even match common sense?

Yes! Common sense says that the added information of a message given its introduction should not be larger than the information of the message. This translates into saying that the conditional entropy should be lower than the non-conditional entropy. This is a theorem proven by Shannon! In fact, he went further and quantified this sentence: the entropy of a message is the sum of the entropy of its introduction and the entropy of the message conditional to its introduction!
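This last sentence is the chain rule H(introduction, message) = H(introduction) + H(message | introduction). Here is a small numerical check of it on a made-up joint distribution of introductions and messages (the probabilities are purely illustrative):

```python
from math import log2

def H(dist):
    """Shannon entropy of a list of probabilities, in bits."""
    return sum(p * log2(1 / p) for p in dist if p > 0)

# Hypothetical joint probabilities P(introduction, message) for a toy 2x2 context.
joint = {("hi", "yes"): 0.4, ("hi", "no"): 0.1,
         ("hey", "yes"): 0.2, ("hey", "no"): 0.3}

# Marginal distribution of the introduction alone.
p_intro = {}
for (intro, _), p in joint.items():
    p_intro[intro] = p_intro.get(intro, 0) + p

# H(message | introduction): average, over introductions, of the entropy
# of the message's conditional distribution given that introduction.
h_cond = sum(p_i * H([joint[(i, m)] / p_i for m in ("yes", "no")])
             for i, p_i in p_intro.items())

# Chain rule: H(intro, message) = H(intro) + H(message | intro)
lhs = H(list(joint.values()))
rhs = H(list(p_intro.values())) + h_cond
print(abs(lhs - rhs) < 1e-12)  # True
```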
I'm lost!

Fortunately, everything can be more easily understood with a figure. The amounts of information of the introduction and the message can be drawn as circles. Because they are not independent, they have some mutual information, which is the intersection of the circles. The conditional entropies correspond to what's missing from the mutual information to retrieve the entire entropies:
As you can see, in the second case, the conditional entropies are nil. Indeed, once we know the result of the sensor, the coin no longer provides any information. Thus, on average, the conditional information of the coin is zero. In other words, the conditional entropy is nil.

Wow… This formalism really is powerful for talking about information!
It surely is! In fact, it's so powerful that some of the weirdest phenomena of quantum mechanics, like the mysterious entanglement, might be explainable with a generalization of information theory known as quantum information theory.

I don't know much about quantum information theory, but I'd love to know more. If you can, please write an article on that topic!
As it turns out, the decrease of entropy when we consider concatenations of letters and words is a common feature of all human languages, and of dolphin languages too! This has led seekers of extraterrestrial intelligence to search for electromagnetic signals from outer space which share this feature, as explained in this brilliant video by Art of the Problem:
In some sense, researchers equate intelligence with the mere ability to decrease entropy. What an interesting thing to ponder!
Shannon's Capacity

Let's now talk about communication! A communication consists in sending symbols through a channel to some other end. Now, we usually consider that this channel can carry a limited amount of information every second. Shannon calls this limit the capacity of the channel. It is measured in bits per second, although nowadays we rather use units like megabits per second (Mbit/s) or megabytes per second (MB/s).
Why would channels have capacities?
The channel usually uses a physical measurable quantity to send a message. This can be the pressure of air in the case of oral communication. For longer telecommunications, we use the electromagnetic field. The message is then encoded by mixing it into a high-frequency signal. The frequency of the signal is the limit, as using messages with higher frequencies would profoundly modify the fundamental frequency of the signal. But don't bother too much with these details. What's of concern to us here is that a channel has a capacity.
Can you provide an example?
Sure. Imagine there was a gigantic telecommunication network spread all over the world to exchange data, like texts and images. Let's call it the Internet. How fast can we download images from the servers of the Internet to our computers? Using the basic format called bitmap, or BMP, we can encode images pixel per pixel. The encoded images are then decomposed into a certain number of bits. The average rate of transfer is then deduced from the average size of encoded images and the channel's capacity:
In the example, using bitmap encoding, the images can be transferred at the rate of 5 images per second. In the webpage you are currently looking at, there are about a dozen images. This means that more than 2 seconds would be required for the webpage to be downloaded onto your computer. That's not very fast…
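The arithmetic behind such a rate can be sketched as follows; the channel capacity and image dimensions below are assumptions chosen to reproduce a rate of 5 images per second, not figures from the article:

```python
# Illustrative numbers only (the article's figure is not reproduced here).
capacity_bits_per_s = 10_000_000  # a hypothetical 10 Mbit/s channel
width, height, bits_per_pixel = 500, 500, 8

# A bitmap encodes every pixel at a fixed cost, so its size is just the product.
bitmap_size_bits = width * height * bits_per_pixel  # 2,000,000 bits per image

# Transfer rate = channel capacity / size of one encoded image.
images_per_second = capacity_bits_per_s / bitmap_size_bits
print(images_per_second)  # 5.0
```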
Can't we transfer images faster?
Yes, we can. The capacity cannot be exceeded, but the encoding of images can be improved. Now, what Shannon proved is that we can come up with encodings such that the average size of the images nearly matches Shannon's entropy! With these nearly optimal encodings, an optimal rate of image file transfer can be reached, as displayed below:

This formula is called Shannon's fundamental theorem for noiseless channels. It is basically a direct application of the concept of entropy.
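Shannon's theorem only asserts that near-optimal encodings exist. One concrete scheme that gets close to the entropy, and reaches it exactly when probabilities are powers of 1/2, is Huffman coding; the article does not cover it, but it makes the statement tangible. A minimal sketch:

```python
import heapq
from math import log2

def huffman_lengths(probs):
    """Code length (in bits) per symbol for a Huffman code over `probs`."""
    # Each heap entry: (total probability, tie-breaker, {symbol: depth}).
    heap = [(p, i, {i: 0}) for i, p in enumerate(probs)]
    heapq.heapify(heap)
    tie = len(probs)
    while len(heap) > 1:
        # Repeatedly merge the two least likely subtrees; every symbol
        # inside them gets one bit deeper.
        p1, _, d1 = heapq.heappop(heap)
        p2, _, d2 = heapq.heappop(heap)
        merged = {s: depth + 1 for s, depth in {**d1, **d2}.items()}
        heapq.heappush(heap, (p1 + p2, tie, merged))
        tie += 1
    return heap[0][2]

probs = [0.5, 0.25, 0.125, 0.125]  # illustrative message probabilities
lengths = huffman_lengths(probs)
avg_length = sum(p * lengths[i] for i, p in enumerate(probs))
H = sum(p * log2(1 / p) for p in probs)
print(avg_length, H)  # both 1.75: for these dyadic probabilities the code is optimal
```

For probabilities that are not powers of 1/2, the average Huffman length exceeds the entropy by less than one bit per symbol, which is what "nearly optimal" means here.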
Noiseless channels? What do you mean?

I mean that we have so far assumed that the received data is identical to what's sent! This is not the case in actual communication. As opposed to what we discussed in the first section of this article, even bits can be badly communicated.
Shannon's Redundancy

In actual communication, it's possible that 10% of the bits get flipped.

Does this mean that only 90% of the information gets through?
No! The problem is that we don't know which bits got flipped. In that case, the information that gets through is thus less than 90%.
So how did Shannon cope with noise?

His amazing insight was to consider that the received deformed message is still described by a probability, which is conditional on the sent message. This is where the language of equivocation, or conditional entropy, is essential. In the noiseless case, given a sent message, the received message is certain. In other words, the conditional probability is reduced to a probability 1 that the received message is the sent message. In Shannon's powerful language, this all beautifully boils down to saying that the conditional entropy of the received message is nil. Or, even more precisely, the mutual information equals both the entropy of the received message and that of the sent message. Just like the sensor detecting the coin in the above example.
What about the general case?
The relevant information received at the other end is the mutual information. This mutual information is precisely the entropy communicated by the channel. Shannon's revolutionary theorem says that we can provide the missing information by sending a correction message whose entropy is the conditional entropy of the sent message given the received message. This correction message is known as Shannon's redundancy.

This fundamental theorem is described in the following figure, where the word "entropy" can be replaced by "average information":

I'm skipping over some technical details here, as I just want to show you the main idea of redundancy. To be accurate, I should talk in terms of entropies per second with an optimal encoding.
Shannon proved that by adding redundancy with enough entropy, we can reconstruct the information perfectly almost surely (with a probability as close to 1 as desired). This is another of Shannon's earthshaking ideas. Quite often, the redundant message is sent along with the message, and guarantees that, almost surely, the message will be readable once received. It's like having to read an article again and again to finally retrieve its information.
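A crude way to add redundancy, far less efficient than what Shannon's theorem allows, is to repeat each bit and take a majority vote at the other end. Here is a toy sketch with a made-up channel that flips each bit with probability 10%:

```python
import random

random.seed(1)

def noisy_channel(bits, flip_prob=0.1):
    """Flip each bit independently with probability flip_prob."""
    return [b ^ 1 if random.random() < flip_prob else b for b in bits]

def encode(bits, n=3):
    """Crude redundancy: repeat every bit n times."""
    return [b for b in bits for _ in range(n)]

def decode(bits, n=3):
    """Majority vote over each group of n received copies."""
    return [1 if sum(bits[i:i + n]) > n // 2 else 0
            for i in range(0, len(bits), n)]

message = [random.randint(0, 1) for _ in range(1000)]
received = decode(noisy_channel(encode(message)))
errors = sum(m != r for m, r in zip(message, received))
print(errors / len(message))  # roughly 0.028 in theory, down from 0.10
```

The price is a threefold increase in message size; Shannon's theorem shows that much smarter codes can drive the error rate down with far less redundancy.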
So redundancy is basically repeating the message?
There are smarter ways to do it, as my students sometimes remind me by asking me to re-explain a reasoning differently. Shannon worked on that later, and managed other remarkable breakthroughs. Similarly to the theorems I have mentioned above, Shannon's theorem for noisy channels provides a limit to the minimum quantity of redundancy required to almost surely retrieve the message. In practice, this limit is hard to reach though, as it depends on the probabilistic structure of the information.
Does Shannon's theorem explain why the English language is so redundant?

Yes! Redundancy is essential in common languages, as we don't actually catch most of what's said. But, because English is so redundant, we can guess what's missing from what we've heard. For instance, whenever you hear "I l*v* cake", you can easily fill in the blanks. What's particularly surprising is that we actually do most of this reconstitution without even being aware of it! You don't believe me? Check out the McGurk effect, explained here by Myles Power and Alex Dainis:
It wouldn't surprise me to find out that languages are nearly optimized for oral communication in Shannon's sense. Although there definitely are other factors coming into play, which may explain, for instance, why the French language is so much more redundant than English…
Let's Conclude

What I've presented here are just a few of Shannon's fundamental ideas for messages with discrete probabilities. Claude Shannon then moved on to generalize these ideas to discuss communication using actual electromagnetic signals, whose probabilities now have to be described using probability density functions. Although this doesn't affect the profound fundamental ideas of information and communication, it does lead to a much more complex mathematical study. Once again, Shannon's work is fantastic. But, instead of trusting me, you should probably rather listen to his colleagues, who have inherited his theory, in this documentary by UCTV:
Shannon did not only write the 1948 paper. In fact, his first major breakthrough came back when he was a Master's student at MIT. His thesis is by far the most influential Master's thesis of all time, as it shows how exploiting Boolean algebra could enable the production of machines that would compute anything. In other words, in his Master's thesis, Shannon drew the blueprints of computers! Shannon also made crucial progress in cryptography and artificial intelligence.
I can only invite you to go further and learn more. This is what's commonly called "opening your mind". I'm going to conclude with this, but in Shannon's language: increase the entropy of your thoughts!
More on Science4All
Entropy and the Second Law of Thermodynamics
Conditional Probabilities: Know what you Learn
More Elsewhere
Episode 2: Language of Coins (Information Theory) on Art of the Problem.
What is Information? Part 2a Information Theory on Cracking the Nutshell.
Without Shannon's information theory there would have been no internet on The Guardian.
The Legacy of Entropy on Everything about Data Analytics.