Communication 101: Information Theory Made REALLY SIMPLE

Claude Shannon’s 1948 paper “A Mathematical Theory of Communication” is the paper that made the digital world we live in possible. Scientific American called it “The Magna Carta of the Information Age.”

Shannon defined modern digital communication and determined things like how much information can be transmitted over a telephone line, the effects of noise on the signal, and the measures you have to take to get a perfect signal on the other end. It made the Internet possible.

Trouble is, it’s tough reading – college-level material for engineers and math geeks. HOWEVER, Shannon’s concepts are simple and easy to explain. In just a few minutes you’ll understand them, and you’ll see that any 7th grader can easily grasp them.

1. This is what a communication system looks like:

[Figure: diagram of a communication system]

An encoder receives input and encodes a message according to the rules of the code.

The code is transmitted across a communications channel.

The code is decoded by the decoder, also according to a fixed set of rules and the message is understood on the other side.
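
If it helps to see those three steps as a tiny program, here is a minimal sketch. The two-letter code table and the function names are my own illustrative assumptions, and the channel is assumed to be noise-free.

```python
# Hypothetical code table that the encoder and decoder have agreed on in advance.
CODE_TABLE = {"A": "1000001", "B": "1000010"}
DECODE_TABLE = {bits: letter for letter, bits in CODE_TABLE.items()}

def encode(message: str) -> list[str]:
    """Encoder: turn each letter into its code word, following the table."""
    return [CODE_TABLE[letter] for letter in message]

def channel(code_words: list[str]) -> list[str]:
    """Channel: a noise-free channel simply delivers what it was given."""
    return list(code_words)

def decode(code_words: list[str]) -> str:
    """Decoder: look each code word up in the same table."""
    return "".join(DECODE_TABLE[word] for word in code_words)

print(decode(channel(encode("ABBA"))))  # ABBA
```

Notice that the decoder only works because it uses the same table as the encoder.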

2. What is a Code?

The dictionary defines code as “a system of symbols for communication.”

I define Information as: “Communication between an encoder and decoder using agreed-upon symbols.”

Here, we are interested in digital codes. The most important thing in the system is the code itself.

Here are three examples and one non-example: The Genetic Code, ASCII, the U.S. ZIP code, and sunlight:

[Figure: three codes (the Genetic Code, ASCII, the U.S. ZIP code) and one non-code (sunlight)]

As you see here, a code is a sequence of symbols that has specific meaning. DNA has a four-letter alphabet; the letters are A – C – G – T. Three letters come together to form a word called a “triplet” or “Codon.” The Codon GGG is an instruction to make Glycine.

It’s very important to understand that GGG (three Guanines in a row) are not themselves Glycine. They are symbolic instructions to MAKE Glycine.

In other words, the message and the medium are not the same thing.

The message is stored in Adenine, Cytosine, Guanine and Thymine, and those bases operate as symbols.

When strung together these symbols form instructions to build proteins; to place those proteins in specific locations; and to build assemblies such as bones, arms, eyes and blood vessels.

Interesting side note: Three other codons, GGT, GGC and GGA, also code for Glycine. Most amino acids can be represented by more than one codon (anywhere from one to six, depending on the amino acid). This is called “redundancy,” and it is a clever mechanism for reducing the effect of copying errors. It means that some copying errors (GGC instead of GGG) will not cause a problem.

This is very, very important – it’s why DNA has survived over 3 billion years intact. More about redundancy in a minute.
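
Here is a small sketch of that redundancy at work. The table is deliberately partial (the four Glycine codons plus one Glutamic acid codon for contrast), and the function name is my own assumption.

```python
# Partial codon table, written with DNA letters (T instead of RNA's U).
# Only the entries needed for this illustration are included.
CODON_TABLE = {
    "GGT": "Glycine",
    "GGC": "Glycine",
    "GGA": "Glycine",
    "GGG": "Glycine",
    "GAG": "Glutamic acid",
}

def translate(codon: str) -> str:
    """Decode one codon; codons outside this small table come back as 'unknown'."""
    return CODON_TABLE.get(codon, "unknown")

original = "GGG"
silent_error = "GGC"   # third letter miscopied
harmful_error = "GAG"  # second letter miscopied

print(translate(original))       # Glycine
print(translate(silent_error))   # Glycine
print(translate(harmful_error))  # Glutamic acid
```

The miscopied third letter changes nothing, while the miscopied second letter changes the instruction.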

ASCII is the standard computer mapping between 1’s and 0’s and the English alphabet.

A = 1000001

B = 1000010

a = 1100001

b = 1100010

…and so on for the whole English alphabet. When you punch the letter “A” on your keyboard, the keyboard sends a 1000001 to the computer. On the other side, 1000001 is interpreted as “A” and displayed on the screen.

The 1’s and 0’s themselves are in turn coded into a series of electrical impulses or magnetic bits on a hard drive. 1000001 might be stored on your hard disk as North / South spots of magnetism, i.e. NSSSSSN.
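
You can try both mappings yourself. This is a minimal sketch that assumes 7-bit ASCII and the simple convention 1 = N, 0 = S. Real disks use more elaborate encodings, and the function names are mine.

```python
def to_bits(letter: str) -> str:
    """Letter -> 7-bit ASCII code, written as a string of 1's and 0's."""
    return format(ord(letter), "07b")

def to_magnetic(bits: str) -> str:
    """1's and 0's -> North / South spots (illustrative convention only)."""
    return bits.replace("1", "N").replace("0", "S")

def from_magnetic(spots: str) -> str:
    """North / South spots -> 1's and 0's -> letter."""
    bits = spots.replace("N", "1").replace("S", "0")
    return chr(int(bits, 2))

print(to_bits("A"))               # 1000001
print(to_magnetic(to_bits("A")))  # NSSSSSN
print(from_magnetic("NSSSSSN"))   # A
```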

The letter A is not an electrical impulse, nor is it a magnetic field. The electrical impulse or magnetic field represents A. This is what is meant when we say “The message is independent of the medium it’s stored in.”

You can write the word “dog” with ink on a piece of paper but a dog is not ink or paper, and ink and paper are not a dog.

A long string of DNA like ACGGGTCTTTAAGATG… might build a claw, but that string of letters itself is not a claw and the claw is not a string of letters.

One of the most common questions I get about DNA and codes is:

“Why isn’t sunlight a code? Why isn’t radioactive decay a code? Why isn’t H2O a code?”

Sunlight is not a code because sunlight is just a stream of photons. There is no encoder in the sun. A photon does not symbolically represent some other thing. The sun does not send out digital streams of photons that obey the rules of a code.

The photon IS sunlight, it does not SAY sunlight. It does not give instructions for making sunlight. It doesn’t have any instructions at all. It’s just a photon. It represents nothing other than itself.

Now somebody will naturally say, “Yeah, but I can look at sunlight through a spectrum analyzer and recognize that it’s sunlight.” Yes, if you intelligently build a spectrum analyzer you can recognize patterns in the spectrum. However the spectrum is not a sequence of digital symbols.

Sunlight is not a code and neither is a rainbow. A rainbow is a spectrum of light and it represents nothing other than itself.

Radioactive decay is just atomic particles decomposing. There is no symbolic relationship. Same with water. Water is water.

The word “H2O” is a code HUMANS have devised to describe water. But as a molecule it just has two atoms that we call Hydrogen and one atom that we call Oxygen. That’s it. The molecule doesn’t represent anything other than itself.

3. Noise and Information Entropy

One of the most important things in a communication system is noise and how well the system deals with it. Noise destroys information. When we add noise to the communication system, here’s what it looks like:

Noise introduces uncertainty as to what the original message was. It was originally 1000001 (“A”) but the receiver doesn’t know that. It might think the message was 1000100 and give you a letter “D” instead.
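
Here is a rough sketch of that corruption. Flipping two bits of 1000001 gives 1000100, which is the ASCII code for an uppercase “D”. The function names and the flip-probability model are illustrative assumptions, not part of Shannon’s paper.

```python
import random

def flip(bit: str) -> str:
    """Turn a 1 into a 0 and a 0 into a 1."""
    return "1" if bit == "0" else "0"

def flip_bits(bits: str, positions: list[int]) -> str:
    """Flip the bits at the given positions (0 is the leftmost bit)."""
    return "".join(flip(b) if i in positions else b for i, b in enumerate(bits))

def noisy_channel(bits: str, flip_probability: float, rng: random.Random) -> str:
    """Flip each bit independently with the given probability."""
    return "".join(flip(b) if rng.random() < flip_probability else b for b in bits)

sent = "1000001"                     # "A"
received = flip_bits(sent, [4, 6])   # noise hits two bits
print(received)                      # 1000100
print(chr(int(received, 2)))         # D
```

The receiver has no way to tell from 1000100 alone that an “A” was sent.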

When Claude Shannon worked out the math, he found something very surprising: The formula for uncertainty in an information system (which is what noise adds) has the same mathematical form as the formula for entropy in thermodynamics.
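
For readers who want to see the resemblance, these are the standard textbook forms of the two formulas (written out by me for reference, not quoted from Shannon’s paper). In both, p_i is the probability of the i-th symbol or state. They differ only in the constant k_B and the base of the logarithm.

```latex
% Shannon entropy of a message source, measured in bits
H = -\sum_{i} p_i \log_2 p_i

% Gibbs entropy of a physical system in statistical thermodynamics
S = -k_B \sum_{i} p_i \ln p_i
```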

Entropy is the irreversible process of useful energy becoming useless energy. The heat coming out of the exhaust pipe in your car is cooler and a whole lot less useful than the heat inside your engine, and that process is irreversible.

All audio engineers know that noise is also irreversible. Once it’s in your recording, you can’t get it back out. It’s in there for good. All you can do is try to disguise it.

Also, noise NEVER improves a signal. There are a few very narrow applications in digital signal processing where noise can be put to good use (e.g. dither and noise shaping) but noise always destroys information. It never creates it.

Shannon measured information in bits, the exact same way that we measure the size of computer files. So one thing that confuses a lot of people is that when you add noise to a signal, it adds bit information to the signal and the signal appears to have more information. In one sense it does – if you add noise to a signal, the signal does contain more data. But you can’t separate the noise out of the signal once it’s in.

Once your signal is lost it’s gone forever.

Also, there is no such thing anywhere in engineering or computer science as a percentage of the time that noise “accidentally” improves a signal.

Nor is there an “optimum” level of noise that you would want in a signal. The ideal amount of noise to have in a signal is ZERO.

Shannon pointed out that the best way to combat noise was through redundancy: Extra letters or numbers in the signal that help you fill in the blanks if there are missing letters.

For example “The quick brown fox jumps over the lazy dog” is still somewhat readable even if about a quarter of the letters are missing:

“Th q ic br wn fo jum s ove the l zy dog”

That’s because the English language is about 50% redundant. You can usually figure out what the original sentence was as long as at least half the letters are still there.
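
The simplest form of engineered redundancy is to send every bit several times and take a majority vote on the other end. The sketch below assumes three copies per bit. It is the most basic error-correcting scheme, chosen here for clarity, and the function names are mine.

```python
from collections import Counter

def add_redundancy(bits: str, copies: int = 3) -> str:
    """Repeat every bit `copies` times before sending."""
    return "".join(b * copies for b in bits)

def remove_redundancy(bits: str, copies: int = 3) -> str:
    """Take a majority vote within each group of `copies` received bits."""
    groups = [bits[i:i + copies] for i in range(0, len(bits), copies)]
    return "".join(Counter(group).most_common(1)[0][0] for group in groups)

def flip_at(bits: str, positions: list[int]) -> str:
    """Simulate noise by flipping the bits at the given positions."""
    return "".join(
        ("1" if b == "0" else "0") if i in positions else b
        for i, b in enumerate(bits)
    )

sent = add_redundancy("1000001")
damaged = flip_at(sent, [1, 10])   # noise hits two different groups
print(sent)                        # 111000000000000000111
print(damaged)                     # 101000000010000000111
print(remove_redundancy(damaged))  # 1000001
```

The vote only works while at most one bit per group of three is damaged. Two errors in the same group will outvote the correct bit.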

4. Layers of Communication

Think about this for a second:

Right now, you are seeing a detailed 2-dimensional image on your computer screen. It’s an image of my blog on your monitor.

But…

ALL the information you are looking at originally came into your computer on a wire or wireless network, via a 1-dimensional stream of 1’s and 0’s.

How does a single 1D stream of 1’s and 0’s get turned into a 2D or even 3D object?

The answer is: Layers.

One of the essential aspects of communication systems is that the codes, the encoders and decoders have layers.

Layers operate like this – multiple encoders and decoders cascaded together:

Information is encoded from the top layer down, and it is decoded from the bottom layer up.

Here’s what I mean by this:

6 LAYERS OF ENCODING – TOP DOWN:

Meaning

is expressed by sentences

Which are made of words

Which are made of letters

Which are made of 1’s and 0’s

Which are sent on wire via electrons.

6 LAYERS OF DECODING – BOTTOM UP:

Your computer reads the 1’s and 0’s from the wire

And decodes them into letters

The letters form words

The words form sentences

The sentences give

Meaning.

That’s a linguistic explanation of layers.
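
The same layers can be sketched in a few lines of code. This assumes plain 7-bit ASCII text with single spaces between words, and it leaves out the top layer (meaning) and the bottom layer (electrons), which a program like this cannot represent. The function names are my own.

```python
def encode_sentence(sentence: str) -> str:
    """Top down: sentence -> words -> letters -> 1's and 0's."""
    words = sentence.split(" ")
    letters = list(" ".join(words))  # the spaces travel as characters too
    return "".join(format(ord(ch), "07b") for ch in letters)

def decode_bits(bitstream: str) -> str:
    """Bottom up: 1's and 0's -> letters -> words -> sentence."""
    chunks = [bitstream[i:i + 7] for i in range(0, len(bitstream), 7)]
    letters = [chr(int(chunk, 2)) for chunk in chunks]
    words = "".join(letters).split(" ")
    return " ".join(words)

stream = encode_sentence("A dog")
print(len(stream))          # 35
print(decode_bits(stream))  # A dog
```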

Here’s a more mechanical version of communication layers:

You type a message into your keyboard

The keyboard encodes the message into ASCII characters

Which go into an email

Which is encoded into a TCP/IP packet

Which is transported across the Internet via copper & fiber

The packet is read from the copper & fiber

The email is decoded from the packet

The ASCII characters are turned into letters on a screen

Which your friend reads on her computer.

Now I want you to notice something that’s vitally important and overlooked by almost everyone:

Let’s say you create a Microsoft Word document. You can edit text in Microsoft Word. But if you edit the file in a plain text editor that’s NOT MS Word, you just see garbage. Then if you try to modify and save it, you’ll just wreck the file.

If you want to successfully edit an email message, you have to edit it in an email program.

If you want to change one of the information layers to make it say something different, you have to be IN that layer to change it.

If you skip layers and try to make changes you’ll just wreck everything.

This is why, in computer systems, there are error checking mechanisms in almost EVERY layer. Most people don’t even know they’re there. There’s an error checking mechanism between the keyboard and the PC. There are error checking mechanisms on the hard drives, in TCP/IP packets, in your Wi-Fi and in your email program.
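
To give a feel for how one of these checks works, here is a minimal sketch that attaches a CRC-32 checksum to a packet and verifies it on arrival. Each real protocol uses its own scheme. CRC-32 from Python’s zlib module stands in for all of them here, and the packet layout is a made-up illustration.

```python
import zlib

def make_packet(payload: bytes) -> bytes:
    """Sender: append a 4-byte CRC-32 checksum to the payload."""
    checksum = zlib.crc32(payload)
    return payload + checksum.to_bytes(4, "big")

def read_packet(packet: bytes) -> bytes:
    """Receiver: recompute the checksum and reject the packet on a mismatch."""
    payload = packet[:-4]
    stored = int.from_bytes(packet[-4:], "big")
    if zlib.crc32(payload) != stored:
        raise ValueError("checksum mismatch: packet was corrupted")
    return payload

packet = make_packet(b"hello")
print(read_packet(packet))  # b'hello'

# Flip a single bit in the first byte to simulate noise on the wire.
corrupted = bytes([packet[0] ^ 0x01]) + packet[1:]
try:
    read_packet(corrupted)
except ValueError as error:
    print(error)  # checksum mismatch: packet was corrupted
```

Note that a checksum like this only detects the damage. Repairing it takes either a resend or extra redundancy in the packet.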

There are also extensive error checking systems in DNA to make sure the data isn’t corrupted, because it only takes a very small injury to the packet to irreparably damage the whole thing, to the point where it cannot be decoded at all. Even a tiny flaw in a strand of DNA can cause a birth defect, for example.

So let’s bring this back to biology.

These very simple ideas that Claude Shannon introduced have profound and far-reaching implications for Origin of Life Research and Evolutionary Theory:

1. All communication systems rely on prior agreement between encoder and decoder, otherwise no communication can take place.

This agreement between the two sides must be made in advance. It is abstract just like the symbols of the encoding / decoding table. The agreement begins as an immaterial idea, just like the information itself is immaterial.

All communication systems that we know of are designed. There are no known exceptions to this. And if you stop and think about communication, it’s abstract by its very nature. It’s symbolic. Symbols are immaterial. The symbols have to be chosen in advance. The meaning of the message is independent of the medium that carries it.

The very existence of information overturns the materialistic worldview. Materialistic philosophy has no explanation for the existence of information.

2. Communication systems possess something that matter and energy alone do not possess: Information and symbols. Information is an entity on par with matter and energy. Norbert Wiener, the MIT mathematician and father of Cybernetics, said, “Information is information, not matter or energy. No materialism which does not admit this can survive at the present day.”

3. All communication systems are implicitly purposeful. In DNA, the purpose of GGG is to manufacture Glycine. In ASCII, the purpose of 1000001 is to transmit the letter “A.” In the ZIP code, the purpose of 68450 is to make sure letters get to Tecumseh, Nebraska.

4. In communication theory, noise is always your enemy. It is NEVER your friend. Random mutation is noise. Noise is information entropy which is the irreversible destruction of information. Therefore random mutations by definition cannot be the source of new information in evolution. There has to be a different explanation for evolution.

(People constantly say to me: “But once in a while noise could introduce a beneficial mutation.” To that I say, “Try it. Prove it.” People who’ve actually done this with any real-life information system know better. Communication engineers definitely know better.)

5. Information is always created top-down, not bottom-up. Encoding is top-down and decoding is bottom-up. The highest layer is intent. The lowest layer is the matter or energy that stores the information, e.g. a CD, the bases in DNA, or radio waves in a Wi-Fi network. (Electrical Engineers call this the “Physical Layer.” The term is taken from the OSI 7-Layer Model for computer networking.)

6. Because information is created top-down, existing information has to be decoded first before it can be edited or changed in any beneficial way. Edits have to take place within the layer that they are intended to affect. Edits made on the wrong layer, or noise added, only destroy the information packet.

This means that within the genome, “cellular genetic engineering” must also be done top-down, not bottom-up. This completely overturns the traditional Darwinian assumption of random mutation. Random mutations ALWAYS destroy Internet packets and they always destroy DNA.

Beneficial mutations are engineered by the genome via intelligent algorithm, not random mutation.

Communication theory proves that living things were designed; that they are purposeful (teleological); that the information in DNA operates top-down, not bottom-up; and that evolution is internally directed by the cell and the genome, not by external damage from the environment.

This turns all the former assumptions of materialistic biology upside down. Everything we know about the information age, computers and the Internet shows us that living things are designed and evolve according to an internal genetic program, not random chance.

Perry Marshall