Is DNA a Code?


  1. Code is defined as the rules of communication between an encoder (a “writer” or “speaker”) and a decoder (a “reader” or “listener”) using agreed-upon symbols.
  2. DNA’s definition as a literal code (and not a figurative one) is nearly universal in the entire body of biological literature since the 1960s.
  3. DNA code has much in common with human language and computer languages.
  4. DNA transcription is an encoding / decoding mechanism isomorphic with Claude Shannon’s 1948 model: The sequence of base pairs is encoded into messenger RNA which is decoded into proteins.
  5. Information theory terms and ideas applied to DNA are not metaphorical, but in fact quite literal in every way. In other words, the information theory argument for design is not based on analogy at all. It is direct application of mathematics to DNA, which by definition is a code.

Direct comparison between communication systems in Electrical Engineering and the DNA communication system:

Above: Claude Shannon’s communication model (From The Mathematical Theory of Communication, University of Illinois Press, 1998).

Above: Hubert Yockey’s DNA communication channel model. Notice that it contains the exact same components as Shannon’s – the two systems are isomorphic. My thesis is that communication systems of this type are always, without exception, products of design. (From Hubert Yockey, Information Theory, Evolution, and the Origin of Life, Cambridge University Press, 2005.)

Please Define What You Mean By “Code.”

This question hinges on the definition of “code” and whether it is metaphorical when applied to DNA or whether it is technically identical to its use in human language. Until this question is addressed, this is nothing more than an empty assertion.

Mr. Marshall is begging the question until he provides a definition of “code” that circumscribes his examples but rules out stuff like the other examples [bee waggle dances, bird songs, whale songs, ant communication by pheromone].

I define “Coded information” as a system of symbols used by an encoding and decoding mechanism, which transmits a message that is independent of the communication medium.

Examples of code include English, Chinese, computer languages, music, mating calls and radio signals. Codes always involve a system of symbols that represent ideas or plans. Other examples include, yes, bee waggle dances, bird songs, whale songs, and ant communication by pheromone.

Since all the above are derivatives of DNA, my challenge to naturalists is to cite a single example of coded information that occurs naturally – outside the realm of life, outside the realm of DNA. All you need is one example.

“DNA is not merely a molecule with a pattern; it is a code, a language, and an information storage mechanism.” FALSE: DNA is only analogous to code.


The book Information Theory, Evolution, and the Origin of Life is written by Hubert Yockey, the foremost living specialist in bioinformatics. The publisher is Cambridge University Press. Yockey rigorously demonstrates that the coding process in DNA is identical to the coding process and mathematical definitions used in Electrical Engineering. This is not subjective; it is not debatable or even controversial. It is a brute fact:

“Information, transcription, translation, code, redundancy, synonymous, messenger, editing, and proofreading are all appropriate terms in biology. They take their meaning from information theory (Shannon, 1948) and are not synonyms, metaphors, or analogies.” (Hubert P. Yockey, Information Theory, Evolution, and the Origin of Life, Cambridge University Press, 2005)

The photograph below was taken on Saturday, March 3, 2007 at the Evolving Planet Exhibit at the Field Museum of Natural History in Chicago, Illinois:

According to the Field Museum, DNA base pairs are “codes, or instructions, that specify the characteristics of an organism, from a body’s sex to the color of a pea” and says “the discovery of DNA’s structure unlocked the chemical code to heredity.” (Notice that the words code and codes are not in quotation marks but are used quite literally.)

In human language, symbols are arbitrary. In DNA they are fixed by chemistry. This is a very big difference. By that criterion many other things are codes too: the spatial distribution of the sizes of pebbles below a rapids, the pattern and orientation of sand dunes, the layers inside a hailstone, and tree rings. All contain the transformation of one representation (time, for example) into another (tree rings). Tree rings also encode information about local climate in their varying widths. Mr. Marshall must give a formal definition of “code” rather than a series of examples that conceal significant differences.

The definition of code I have provided is sufficient and applies whether the code is arbitrary or not. Again, I define “Coded Information” as a system of symbols used by an encoding and decoding mechanism, which transmits a message representing an idea or plan.

If there are pebbles below a rapids, there are pebbles below a rapids. There is no coded information associated with them – unless you measure their size, in which case you have created information to describe the pebbles, based on your chosen symbols and units of measurement. The same goes for the orientation of sand dunes and the layers of a hailstone. Those objects represent only themselves; there is no encoding and decoding mechanism within these material objects, such as there is in DNA. If someone says the layers of a hailstone are an encoding mechanism, I reply that there is no convention of symbols, nor is there a decoding mechanism.

The information in DNA is independent of the communication medium insofar as every strand of DNA in your body represents a complete plan for your body, even though the DNA strand itself is only a sequence of symbols made up of chemicals (A, G, C, T). We could store a CAD drawing of a hard drive on the same model of hard drive, but the medium and the message are two distinctly different things. Such symbolic relationships only exist within the realm of living things; they do not occur naturally.

If you disagree, all you need is one example.

DNA is subject to random mutation. What intelligently created code experiences random mutation?

All communication systems are subject to mutations, following the laws of probability. That’s why Ethernet and TCP/IP have error correction and redundancy features. DNA has error correction and redundancy features as well. Note that mutation, noise and entropy are all the exact same thing in communication theory – and they are all undesirable.
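
To make the idea of redundancy concrete, here is a minimal Python sketch of one of the simplest error-correcting schemes: a threefold repetition code decoded by majority vote. It is an illustration only. The function names are hypothetical, and real protocols such as Ethernet and TCP/IP rely on checksums and retransmission rather than this scheme.

```python
from collections import Counter


def encode_repetition(bits: str, n: int = 3) -> str:
    """Repeat every bit n times (n should be odd so a majority always exists)."""
    return "".join(bit * n for bit in bits)


def decode_repetition(received: str, n: int = 3) -> str:
    """Recover each bit by majority vote over its block of n copies."""
    blocks = (received[i:i + n] for i in range(0, len(received), n))
    return "".join(Counter(block).most_common(1)[0][0] for block in blocks)


message = "1011"
sent = encode_repetition(message)

# Simulate channel noise: flip a single bit inside the second block.
corrupted = sent[:4] + "1" + sent[5:]

# The majority vote outvotes the flipped bit and restores the message.
recovered = decode_repetition(corrupted)
print(message, sent, corrupted, recovered)
```

The price of the correction is bandwidth: three symbols are sent for every one symbol of message, which is the trade-off every redundant code makes in some form.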

The analogy between language and DNA has problems. Perry says that a specific DNA strand completely specifies a particular cat. But I do not agree with this, because environment has a role to play.

Environment most definitely has a role to play in the development of any biological organism. It also plays a role with your computer – someone could walk into the room and smash it with a big rock. Just because your monitor is broken or the blue pixels don’t work doesn’t mean the picture of Aunt Mildred on the hard drive isn’t a code. The computer can crash, or a printer can freeze while it’s printing the picture; that doesn’t change the fact that the JPG of Aunt Mildred uniquely specifies an image of Aunt Mildred.

Likewise, a strand of Aunt Mildred’s DNA uniquely specifies that she is a woman, that she has an Rh-negative blood type, two arms, two legs, etc.

I just read the recent Scientific American article on mRNA splicing – and I’d like to know what man-made code contains several different messages, obtained by excising introns in different ways?

The sentence “You have a green light” could mean:
- We’re sitting at a stoplight and the light just turned green
- You’re holding a green light bulb in your hand
- Your proposal just got accepted
Only one sentence, three completely different meanings, depending entirely on the context.

Where’s the encoding process in DNA? What process leads from the code to its interpretation?

Earlier, I replied that the genetic recombination of DNA from male and female is an example of an encoding process; and this does fit the dictionary definition of encoding. Yockey also describes the encoding of DNA to mRNA in the reference above (see the figures from Shannon’s book and Yockey’s book). I think his example is better than mine.

There is no encoding / decoding in DNA, it’s just a passive molecule. The ‘decoding’ occurs in transcription, which is initiated by RNA polymerase, which produces mRNA, and then mRNA is translated by ribosomes to polypeptide chains (proteins).

Which is to say the process is entirely biochemical, no less mechanical and no more intelligent than the flow of water.

You have described the encoding / decoding process in DNA exactly as Yockey has; he maps it to the Shannon communication model. This happens without the intervention of intelligence, just like your computer getting automatic virus updates. But water flows and hailstones and sand dunes do not encode and decode information in this way. Only intelligently designed systems map 1:1 to Shannon’s model.

When you quoted all those textbooks to say “DNA is a code” I think you confused the genetic code (which is a code) with DNA (which just happens to be what the genetic code is made of).

The proper formal terminology is “The pattern of base pairs in DNA is a code.” From the beginning I’ve been very clear about the difference between the message and the medium. The molecule itself is the medium; the ordering of the base pairs defines the code. The question that naturalism can’t answer is where the code came from. The question of where the molecule and chemicals came from is an important one, but outside the scope of this thread.

DNA is a molecule, not a code. If you narrowly consider a mathematical definition of code, then DNA is obviously not a code any more than gravity is. There is no source, no receiver, no probability space, no unique mapping, and most of all, no letters or alphabets in DNA. It’s a freaking molecule, not an abstract concept.

Go ahead and show me, by analogy, how DNA is like a code, then I will just as easily show you how gravity (or any other natural phenomenon) is like a code.

The statement that DNA is an encoding / decoding mechanism is not an analogy. See Yockey above, and the demonstration that DNA is isomorphic with an Electrical Engineering communication system. No naturalistic examples (ice, rocks etc.) match Shannon’s model, however.

You can make a code out of rocks and sticks. However, it is completely incorrect to call the rocks and sticks a code. You can make a code out of ones and zeros. But it’s still not correct to call the 1s and 0s a code.

Correct. If you spell your name on your driveway with rocks and sticks, you have created a code. The rocks and sticks do not weigh any more than they did before, but now they possess coded information whereas they didn’t before. This illustrates the immaterial nature of information.

You say DNA contains a message, but you haven’t defined the decoder. What is this decoder that DNA ‘communicates’ with? Are you saying that:

god = Encoder

DNA = Message transmission medium (yet you say DNA is the code itself)

? = Decoder?

Again, read Yockey on the application of DNA’s code for the growth of an organism, pp. 33-35. And Yockey acknowledges that the source of the original encoding is unknown, a question which naturalistic theories cannot answer.

DNA is a replicator, not a decoder or transmitting medium.

That is quite incorrect. Read Yockey, or my quote above, where he describes the encoding, transmission and decoding within the cell and its analogue to a recording tape.

Communication is meaningful only if transmitted between two entities that understand the code. If DNA is the transmitting medium, then there ought to be a decoder at the other end. Where is it?

Re-quoting Yockey: “Figure 5.2 describes the DNA-mRNA-proteome communication system to show its isomorphism with the standard communication system of the communication engineer. The genome, or the ensemble of genetic messages, is generated by a stationary Markov process and recorded in the DNA sequence, which is isomorphic with the tape in a tape-recording machine (Turing, 1936).”

In what way is it symbolic? What plan does DNA contain? What idea does it express?

A strand of DNA in a skin cell that falls from your body contains a plan for a human being (you), even though neither the skin cell nor the strand of DNA are human beings. This is what I specifically mean by the phrase “independent of the communication medium.” A book represents more than paper and ink, because it contains plans and ideas and instructions via coded information. Even if the topic of the book is paper or ink chemistry, or instructions for printing books, it still contains plans and ideas independent of the paper and ink it’s printed on.

A snowflake contains no coded information because it symbolically represents nothing (no plan, no idea, no instructions) other than itself, and because there is no encoding / decoding mechanism and no system of symbols.

Because decoding goes from DNA to protein, encoding would have to go from protein to DNA. But that mechanism is not known to exist.

Encoding in fact goes from the “soup” made of four bases to DNA. Quoting Wikipedia: “DNA is encoded with four interchangeable “building blocks”, called “bases”, which can be abbreviated A, T, C, and G; each base “pairs up” with only one other base: A+T, T+A, C+G and G+C; that is, an “A” on one strand of double-stranded DNA will “mate” properly only with a “T” on the other, complementary strand. Replication is performed by splitting (unzipping) the double strand down the middle via relatively trivial chemical reactions, and recreating the “other half” of each new single strand by drowning each half in a “soup” made of the four bases.”
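
The pairing rule quoted above (A with T, C with G) is simple enough to sketch in a few lines of Python. The names below are hypothetical, and the sketch ignores strand direction and all of the enzymes involved in real replication; it only shows that one strand fully determines its complement.

```python
# Pairing rule from the quoted passage: A pairs with T, C pairs with G.
PAIRING = {"A": "T", "T": "A", "C": "G", "G": "C"}


def complement(strand: str) -> str:
    """Return the base-by-base complement of a DNA strand."""
    return "".join(PAIRING[base] for base in strand.upper())


original = "ATGCCGTA"
partner = complement(original)   # the "other half" rebuilt from free bases
rebuilt = complement(partner)    # complementing again recovers the original
print(original, partner, rebuilt)
```

Because the mapping is its own inverse, either half of the double strand is enough to reconstruct the other, which is what makes copying by “unzipping” possible.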

Common dictionary definitions:

CODE: 3a: A system of symbols for communication 4: Genetic Code
(Webster’s 9th collegiate dictionary)

GENETIC CODE: the biochemical instructions that translate the genetic information present as a linear sequence of nucleotide triplets in messenger RNA into the correct linear sequence of amino acids for the synthesis of a particular peptide chain or protein. Cf. codon, translation
(Random House Unabridged Dictionary, emphasis mine)

DNA: A nucleic acid that carries the genetic information in the cell and is capable of self-replication and synthesis of RNA
(Houghton Mifflin Dictionary, emphasis mine)

DNA: Genetics.
deoxyribonucleic acid: an extremely long macromolecule that is the main component of chromosomes and is the material that transfers genetic characteristics in all life forms, constructed of two nucleotide strands coiled around each other in a ladderlike arrangement with the sidepieces composed of alternating phosphate and deoxyribose units and the rungs composed of the purine and pyrimidine bases adenine, guanine, cytosine, and thymine: the genetic information of DNA is encoded in the sequence of the bases and is transcribed as the strands unwind and replicate. Cf. base pair, gene, genetic code, RNA. (Random House Unabridged Dictionary, emphasis mine.)


Snowflakes and tornados and sand dunes and water molecules do not contain coded information because there is no system of symbols, no encoding / decoding mechanism, no transmission of a message (plan, idea or instructions) that is independent of the communication medium. In other words, these things represent nothing other than themselves.

DNA contains coded information because it is a system of symbols used by an encoding and decoding mechanism which transmits a message (a plan, an idea, instructions for assembling a complete organism). The plans, ideas and instructions are independent of the communication medium, because the DNA molecule represents something other than itself.

You are erroneous to state DNA carries symbolic representations of information–the DNA doesn’t know or care what symbols we assign to it.

Does DNA care what symbols we give it? Not at all. The 1’s and 0’s on your computer’s hard drive don’t care what you call them either, but they still symbolically represent something other than a voltage or magnetic field. They represent Excel spreadsheets and programs and photos of Uncle Herman and Aunt Mildred. And maybe other people too.

The decoding of the human genome is the interpretation of DNA’s base pairs, mapping them to specific biological functions. The reason we can make that genome map is because a direct relationship between genetic code and creature actually does exist; it’s not just our imagination. DNA codes for specific characteristics, which are discoverable and definable. We decode the genome because cracking the genetic code has utility. It enables us to change the code (thereby changing the creature), understand how biology works, achieve specified goals.

My bad – I had said: “Common dictionary definitions: CODE: 3a: A system of symbols for communication 4: Genetic Code (Webster’s 9th collegiate dictionary)…”

3a: is non-metaphorical 4: is metaphorical. The authors of the dictionary understand something which you seem not to realize: the difference between literal and figurative use of a term.

I omitted an important detail: Definition number 4 of CODE in Webster’s dictionary says “4 :GENETIC CODE” (the dictionary uses ALL CAPS, which I failed to carry over when I typed this in the first time). According to the explanatory chart in the beginning, all caps indicates the two terms are synonymous cross references. The explanatory notes on page 21 say, “A synonymous cross-reference indicates that a definition at the entry cross-referred to can be substituted as a definition for the entry or the sense or subsense in which the cross-reference appears.”

Therefore, according to the dictionary, DNA contains code.

A water molecule does not contain coded information because it represents nothing other than itself. Through chaos, water can form steam or clouds or condensation or snowflakes, depending entirely on the conditions. But unlike DNA, water molecules contain no code or instructions which specify in advance what any of these larger forms will take. Furthermore, we know from chaos theory that it’s sometimes impossible to determine these forms in advance. That’s why one snowflake is different from the next, even though the water molecules are identical.

Chaos produces patterns (snowflakes, sand dunes, stalactites, hurricanes, tornados) naturally, but chaos does not produce symbols or coded information.

From definition 4 alone, DNA is a code. When you shift to definition 3, a system of symbols for communication obviously has a designer. A designer is a creator, so when you capitalize his name, the designer of DNA is the Creator, or God.

This is a conflation, which is a great source for humor, as one can mine the language for interesting uses, but it is invalid as a logical proof.

This is neither conflation nor fallacy of equivocation, because both 3 and 4 in the dictionary are consistent with my original definition of coded information as “A system of symbols used by an encoding and decoding mechanism, which transmits a message that is independent of the communication medium.” DNA and computer languages alike fit this definition, while purely physical phenomena like snowflakes do not.

Earlier, Faldage said, “It doesn’t even matter how we define “codes” as long as we don’t change definitions in mid syllogism.” He’s right, and I have been very careful to maintain a consistent definition of the word “code.” My references to the dictionary are likewise consistent. As a result we observe that, so far as anyone knows, coded information only exists in the realm of conscious minds and living things; there is no purely materialistic explanation for its origin.

You have not shown that DNA qualifies as a code under definition 3. Your mere assertion won’t cut it. The chemical interactions in DNA are not a message in any conventional sense. No communication has taken place. Nothing there is in any way symbolic. Our descriptions may be symbolic but the chemistry is not.

“The idea of encoding, of the accurate representation of one thing by another, occurs in other contexts as well. Geneticists believe that the whole plan for a human body is written out in the chromosomes of the germ cell. Some assert that the ‘text’ consists of an orderly linear arrangement of four different units, or ‘bases’ in the DNA forming the chromosome. This text in turn produces an equivalent text in RNA, and by means of this RNA text proteins made up of sequences of 20 amino acids are synthesized. Some cryptanalytic effort has been spent in an effort to determine how the 4 character message of RNA is re-encoded into the 20 character code of the protein. Actually, geneticists have been led to such considerations by the existence of information theory. The study of the transmission of information has brought about a new general understanding of the problems of encoding, an understanding which is important to any sort of encoding, whether it be the encoding of cryptography or the encoding of genetic information.”

(John R. Pierce, An Introduction to Information Theory: Symbols, Signals and Noise, 2nd edition, 1980)

Mr. Marshall is uninformed about evo-devo and epigenetics. The environment always affects growth so there is no 1:1 mapping from DNA to phenotype. DNA is not a code, it’s a recipe. Since there is no 1:1 mapping, the metaphor breaks down.

From Hubert Yockey:

“The genetic code has many of the properties of codes in general, specifically the Morse Code, the Universal Product Bar Code, ASCII, and the US Postal Code. I shall explain the relation of these codes to the genetic code in the following discussion. Every code, as the term is used in this book, can be regarded as a channel with an input alphabet A and an output alphabet B.

“Here is the formal definition of a code:

Given a source with probability space [Omega, A, p(A)] and a receiver with probability space [Omega, B, p(B)], then a unique mapping of the letters of alphabet A onto letters of alphabet B is called a code.
Here p(A) is the probability vector of the elements of alphabet A and p(B) is the probability vector of the elements of alphabet B. (Perlwitz, Burks and Waterman, 1988)

“Nature has extended the primary four-letter alphabet to the six-bit, 64 member alphabet of the genetic code. Each amino acid except Tryptophan and Methionine has more than one codon. Thus, the genetic code is redundant (not degenerate). The sloppy terminology designating the genetic code as degenerate is responsible for most of the misunderstanding of the genetic information processing system.

“The genetic code is distinct and uniquely decodable, because the single Methionine codon AUG, and sometimes the Leucine codons UUG and CUG, serve as a starting signal for the protein sequence and performs the same function as the long frame bars at the beginning of the postal message in the ZIP+4 code and the Universal Product Code. The codons UGA, UAA and UAG function usually as non-sense and stop the translation of the protein from the mRNA and initiate the release of the protein sequence from the mRNA (Maeshiro and Kimura, 1998). They perform the same function as the long frame bar at the end of the postal bar code message (Bertram, 2001). Remember that non-sense does not mean nonsense or foolishness. Code letters are called non-sense because they have been given no sense or meaning assignment in the receiving alphabet.”

(From Hubert Yockey, Information Theory, Evolution, and the Origin of Life, Cambridge University Press, 2005)
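
The properties Yockey lists (a 64-member codon alphabet, redundancy, a start codon and three stop codons) can be checked directly against the standard genetic code table. The sketch below is a minimal Python illustration, assuming the standard table written in a commonly used compact one-letter listing, with `*` marking a stop codon; the variable names are hypothetical.

```python
import math
from collections import defaultdict
from itertools import product

# Standard genetic code, one letter per codon, with codons enumerated in
# U, C, A, G order at each of the three positions ("*" marks a stop codon).
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): amino
    for codon, amino in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# Four bases in three positions give 4**3 = 64 codons, i.e. 6 bits per codon.
bits_per_codon = math.log2(len(CODON_TABLE))

# Group codons by the amino acid (or stop signal) they map to.
codons_for = defaultdict(list)
for codon, amino in CODON_TABLE.items():
    codons_for[amino].append(codon)

stop_codons = sorted(codons_for.pop("*"))
single_codon_acids = sorted(a for a, c in codons_for.items() if len(c) == 1)

print(len(CODON_TABLE), bits_per_codon)
print(stop_codons)          # the three stop codons named in the quote
print(single_codon_acids)   # M (Methionine) and W (Tryptophan)
```

The many-to-one grouping in `codons_for` is the redundancy the quote refers to: 61 codons are shared among 20 amino acids, and only Methionine and Tryptophan have a single codon each.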

If, as you have already inferred, DNA uniquely determines any phenotype characteristic at all, then it does qualify as a code. As Yockey states in this one example (and there are others), it does.

Your point 1 (“DNA is a code”) is wrong. You are not familiar with DNA and its function. Saying that DNA is a code or a language, etc., is a misinterpretation. Scientists and biologists use these terms in an analogous way to explain to nonscientific people what DNA does. DNA of itself does nothing but act something like a template, but not exactly like a template. It does not code, nor does it encode, nor does it decode. mRNA, in RNA and other proteins do “communication like” activities.

These words are communication LIKE. This is not really communication but something similar. I won’t attempt to explain this since it takes a number of book sized documents to describe the process adequately.

“Information, transcription, translation, code, redundancy, synonymous, messenger, editing, and proofreading are all appropriate terms in biology. They take their meaning from information theory (Shannon, 1948) and are not synonyms, metaphors, or analogies.”

(From Hubert Yockey, Information Theory, Evolution, and the Origin of Life, Cambridge University Press, 2005).

The word ‘code’ is misleading because DNA is actually a template – not just to copy itself, but to make proteins.


“The genome is sometimes called a ‘blueprint’ by people who have never seen a blueprint. Blueprints, no longer used, were two-dimensional, a poor metaphor indeed, for the linear and digital sequence of nucleotides in the genome. The linear structure of DNA and mRNA is often referred to as a template. A template is two-dimensional, it is not subject to mutations, nor can it reproduce itself. This is a poor metaphor as anyone who has used a jigsaw will be aware. One must be careful not to make a play on words.”

(From Hubert Yockey, Information Theory, Evolution, and the Origin of Life, Cambridge University Press, 2005).

Perry unfortunately has been misled by the word code (and probably by anti-evolutionary material about Information Theory).

Yockey’s work is far from being anti-evolutionary material about information theory; Yockey is in fact an evolutionist.

DNA is not communicating with anything else, it’s just making copies of itself and controlling the cell. There is no receiver and no message.

“Figure 5.2 [see top of this web page – Perry Marshall] describes the DNA-mRNA-proteome communication system to show its isomorphism with the standard communication system of the communication engineer. The genome, or the ensemble of genetic messages, is generated by a stationary Markov process and recorded in the DNA sequence, which is isomorphic with the tape in a tape-recording machine (Turing, 1936).

“The decoding of the genetic message from the DNA alphabet to the mRNA alphabet is called transcription in molecular biology. mRNA plays the role of the channel, which communicates the genetic message to the ribosomes, which serve as the decoder. The genetic message is decoded by the ribosomes from the 64 letter mRNA alphabet to the 20 letter alphabet of the proteome. This decoding process is called translation in molecular biology… (Ribosomes) act like the reading head on a tape machine (Turing, 1936). The protein molecule, which is the destination, is also a tape. Thus, the one-dimensional genetic message is recorded in a sequence of amino acids, which folds up to become a 3-dimensional active protein molecule. One is reminded of the linear signals that fold up to show a 2-dimensional picture on the television screen.”

(From Hubert Yockey, Information Theory, Evolution, and the Origin of Life, Cambridge University Press, 2005)
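
The chain Yockey describes (DNA sequence, mRNA channel, ribosome as decoder, protein as destination) can be sketched in Python. This is a deliberately simplified illustration under stated assumptions: the input is treated as the coding strand, so transcription reduces to replacing T with U; translation starts at the first AUG and ends at the first stop codon; and the function names and the example sequence are invented for the demonstration.

```python
from itertools import product

# Standard genetic code in compact form ("*" marks a stop codon), with
# codons enumerated in U, C, A, G order at each position.
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): amino
    for codon, amino in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def transcribe(coding_strand: str) -> str:
    """DNA alphabet -> mRNA alphabet (assumption: input is the coding strand)."""
    return coding_strand.upper().replace("T", "U")


def translate(mrna: str) -> str:
    """mRNA alphabet -> amino acid alphabet, reading triplets from the first AUG."""
    start = mrna.find("AUG")
    if start == -1:
        return ""  # no start codon, nothing to decode
    protein = []
    for i in range(start, len(mrna) - 2, 3):
        amino = CODON_TABLE[mrna[i:i + 3]]
        if amino == "*":  # stop codon ends the message
            break
        protein.append(amino)
    return "".join(protein)


dna = "GGATGGCCTGGAAATAAGC"   # hypothetical example sequence
mrna = transcribe(dna)
protein = translate(mrna)
print(dna, mrna, protein)
```

Here the example decodes to the four-residue chain M-A-W-K (Methionine, Alanine, Tryptophan, Lysine); the bases before AUG and after UAA are never read.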

DNA’s communication is not symbolic, it’s pure chemistry. There’s no higher level code. If I create a self-replicating machine that makes copies of itself using the resources around it, that doesn’t require a code.

Instructions, by definition, require a mapping from probability space A to probability space B. Therefore any set of specific instructions is necessarily a code.

My self-replicating machine would not be in communication with any other machines or any people.

Parts of the machine must still communicate with other parts, to read and carry out the instructions. Therefore, communication is taking place.

DNA does not specify the geometry of a plant, or the shape of the optical centers of a cat, and it cannot control the events leading to the death of an organism.

DNA uniquely specifies what kind of plant it is, and that a cat is a cat. A sequence of symbols only has to uniquely determine one thing in order to qualify as a code.

Perry, you have completely missed the point. All the texts you quote are allegorical or else analogies. From John R. Pierce to the end you are using texts that are making a comparison, not expressing the actual idea. Yes, DNA can be compared to a code but these are just analogies for uneducated lay people.

This objection has already been answered. Please carefully re-read my previous post, quoting Yockey: “Information, transcription, translation, code, redundancy, synonymous, messenger, editing, and proofreading are all appropriate terms in biology. They take their meaning from information theory (Shannon, 1948) and are not synonyms, metaphors, or analogies.”

There are thousands of cats, thousands of oak trees, and DNA doesn’t uniquely determine any particular cat or oak tree.

The DNA for a particular cat uniquely determines that one cat in the exact same sense that a crime scene investigator uses DNA to uniquely identify one criminal. Any particular strand of DNA, by itself, uniquely specifies many characteristics. This is well-established. Gravity, by itself, uniquely specifies nothing in advance, and by all formal definitions in information theory, cannot be defined as a code. To say otherwise is to conflate two completely different things. Gravity is not a code. It’s a force.

You literalist, don’t you know that when biology textbooks use words like “code”, they’re just dumbing down complex ideas for unsophisticated people?

I went through a literal stack of biology textbooks at the Oak Park, IL library yesterday, just to make 100% sure my terminology is consistent with scientific convention. The words “code” and “symbol” are not used metaphorically in any of them. Science textbooks and papers are written very literally, and the term genetic code in biology is just as literal as the word protein. When you look up these definitions in biology textbooks, they don’t say DNA is a “code” with quotation marks, or that it contains “information” with quotation marks, or that it is like a code, or that it contains something like information. They say that DNA is the basis for genetic code, and that it contains real, measurable information, and that the code uniquely determines real proteins.

“The problem of how a sequence of four things (nucleotides) can determine a sequence of twenty things (amino acids) is known as the ‘coding’ problem.” –Francis Crick

A review of the literature quickly shows that this terminology has been standard for over 40 years. See Yockey’s earlier statement about this. This is not a matter of opinion, this is a matter of rigorous definitions from information theory. (HRG, I should also point out that you have been insisting that DNA is not literally a code, while at the very same time arguing that gravity is literally a code.)

DNA is a set of instructions only in the same sense that chemistry itself is a set of instructions. All molecules know or decode is the laws of physics.

The bits and bytes on your hard drive don’t “know” anything either, they simply obey the laws of physics. It’s a purely electro-mechanical process. But they still have to be programmed to do what they do. Computer programs don’t emerge naturally, they are designed. The information on your computer cannot be reduced to so many pounds of magnetic material or silicon. Similarly, you and I cannot be reduced to so many pounds of carbon atoms. A book cannot be reduced to paper and ink.

Sorry, but I just don’t agree that DNA is a code in the same sense that Morse Code, language etc. are codes.

You’ll have to take that up with Yockey, Shannon, and the biology textbooks and publishers of the journals. I have thoroughly demonstrated, based on precise definitions and authoritative sources, that DNA is a code in the same sense as other codes.

Let’s make a 1:1 comparison between DNA and English: DNA can express many organisms (cats, dogs, humans) and so can English (books, poems, songs, speeches).

All specific DNA strands specify specific phenotypes. And all English sentences specify particular books, poems, songs, speeches, conversations etc.

With DNA, male and female contribute half the genetic material to create a new individual. Compare that to human language and it’s absurd: The Gettysburg Address and King’s I Have A Dream speech become “The Gettysburg Dream” or “I Have an Address.”

That is correct. Also note that it takes a pretty elegant process to produce a new document that is a beautiful, functional combination of its two parent documents.

DNA has an error rate of 1 error per genome per 1000 replications, so tell me, what language has x error/paragraph/x copies?

Every packet of data that has ever traversed the Internet, every sentence spoken in a noisy room. Actually, the error rate of DNA is usually lower than that of man-made communication systems – it has built-in mechanisms to minimize errors.

Let’s assume Perry’s assertion ‘DNA is a code’ is correct. He says since all codes we know the origin of are designed, then DNA is designed. I don’t know what this logical fallacy is called, but I think it’s a sweeping generalization that’s unwarranted.

The reasoning in my syllogism is identical to the reasoning through which science has concluded that “Matter and energy cannot be created or destroyed.” I don’t believe one can prove conservation of matter and energy from a prior governing mathematical principle; we can only recognize that no exception to this law has ever been physically observed. But a useful difference between matter/energy and coded information is that we create coded information every day, and we also observe that only minds create coded information.

Even identical twins are not 100% identical, therefore DNA doesn’t uniquely specify a phenotype.

We observe that DNA uniquely specifies certain characteristics (i.e. blood type, male or female, number of legs, etc). Therefore according to formal mathematical definition, DNA is a code.

To fit the formal definition of a code, DNA need only uniquely specify one or more characteristics (male/female, blood type, etc). It does, therefore by Perlwitz’s definition DNA is a code. The fact that you refer to them as identical twins (even though they obviously are not absolutely identical in the fullest possible degree) is an everyday example of the fact that their DNA uniquely specifies much of what they do have in common. Sex, blood type, number of arms and legs, and for monozygotic identical twins, a very very long list of other things.

Perry, I wish you’d stop quoting text books and definitions and actually start having a discussion with us.

Unfortunately, it is not possible for us to have a productive discussion without a proper definition of terms. Since the extensive papers, dictionaries and mathematical definitions I have cited thus far have not been sufficient to persuade some in this forum that DNA is literally a code, I am happy to provide you with further support for my thesis, and more textbook definitions so that all can agree:

The genetic code is a set of 64 base triplets (nucleotide bases, read in blocks of three). A codon is a base triplet in mRNA. Different combinations of codons specify the amino acid sequence of different polypeptide chains, start to finish.
-Cell Biology and Genetics, Starr and Taggart, Wadsworth Publishing, 1995
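
To make the number 64 concrete, here is a minimal Python sketch (my own illustration, not taken from the textbook) that enumerates every possible base triplet. The names `BASES` and `all_codons` are hypothetical; the only fact assumed is that mRNA uses the four bases U, C, A and G.

```python
from itertools import product

# The four RNA bases; a codon is any ordered triplet of them.
BASES = "UCAG"

def all_codons():
    """Return every possible base triplet, in a fixed order."""
    return ["".join(triplet) for triplet in product(BASES, repeat=3)]

codons = all_codons()
print(len(codons))   # 64, i.e. 4 ** 3
print(codons[:4])    # ['UUU', 'UUC', 'UUA', 'UUG']
```

Four symbols read three at a time give 4 × 4 × 4 = 64 possible code words, which is the figure the definition above quotes.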

Genetic Code: The sequence of nucleotides, coded in triplets (codons) along the messenger RNA, that determines the sequence of amino acids in protein synthesis. The DNA sequence of a gene can be used to predict the mRNA sequence, and the genetic code can in turn be used to predict the amino acid sequence.
-50 years of DNA, Clayton and Dennis, Nature Publishing, 2003
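
The two prediction steps in that definition can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not a bioinformatics tool: the input is assumed to be the coding strand (so the mRNA is the same sequence with T replaced by U), translation is assumed to start at the first base, and the toy gene `ATGCTGAAATAA` is made up for the example. The table is the standard genetic code written with one-letter amino acid symbols.

```python
from itertools import product

BASES = "UCAG"
# Standard genetic code, one letter per amino acid, '*' = stop.
# Ordered to match product(BASES, repeat=3): UUU, UUC, UUA, UUG, UCU, ...
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

def transcribe(coding_strand):
    """Predict the mRNA from the DNA coding strand (T becomes U)."""
    return coding_strand.upper().replace("T", "U")

def translate(mrna):
    """Read codons from position 0 until a stop codon or the end."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)

dna = "ATGCTGAAATAA"      # hypothetical toy gene
mrna = transcribe(dna)
print(mrna)               # AUGCUGAAAUAA
print(translate(mrna))    # MLK
```

The point of the sketch is that both steps are table lookups: given the sequence and the code, the output is determined.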

“The unique mark of a living organism, shared with no other known entity, is its possession of a genetic program that specifies that organism’s chemical makeup. The program has two essential and related features: first, it is ‘read’ by the organism, and the instructions embodied therein expressed, second, it is replicated with high fidelity whenever the organism reproduces….DNA carries genetic specificity. This structure immediately suggests that genetic specificity, the “information” that distinguishes one gene from another, resides in the sequence of nucleotides.

“Genetic information flows in linear fashion from the sequence of bases in DNA to that of amino acids in proteins. The parallel with letters and words is inescapable… the quantity of information transmitted can be estimated with the aid of algorithms derived from wartime researches on the fidelity of communications.”

“The most compelling instance of biochemical unity is, of course, the genetic code. Not only is DNA the all but universal carrier of genetic information (with RNA viruses the sole exception), the table of correspondences that relates a particular triplet of nucleotides to a particular amino acid is universal. There are exceptions, but they are rare and do not challenge the rule.”

-The Way of the Cell, Franklin M. Harold, Oxford University Press, 2001
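
As a rough illustration of what “estimating the quantity of information” can mean, the sketch below computes Shannon’s entropy, in bits per symbol, from the base frequencies of a sequence. This is only the simplest possible estimate (it assumes each base is independent of its neighbors, which real genomes do not satisfy), and the function name and test strings are my own.

```python
from collections import Counter
from math import log2

def entropy_bits_per_symbol(sequence):
    """Shannon entropy H = sum(p * log2(1/p)) over observed symbol frequencies."""
    counts = Counter(sequence)
    total = len(sequence)
    return sum((n / total) * log2(total / n) for n in counts.values())

# Four equally frequent bases carry the maximum of 2 bits per base.
print(entropy_bits_per_symbol("ACGT" * 25))   # 2.0
# A sequence of one repeated base carries no information at all.
print(entropy_bits_per_symbol("AAAAAAAA"))    # 0.0
```

With four symbols the ceiling is log2(4) = 2 bits per base; any bias in the frequencies lowers the figure.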

“A code is a set of rules governing the order of symbols in communication. This defines a code, regardless of the nature of the symbols, be they alphabetic letters, voice sounds, dots and dashes, DNA bases, amino acids, nerve impulses, or what have you. Codes are generally expressed as binary relations or as geometric correspondences between a domain and a counterdomain; one speaks of mapping in the latter case. Thus, in the International Morse Code, 52 symbols consisting of sequences of dots and dashes map on 52 symbols of the alphabet, numbers and punctuation marks; or in the genetic code, 61 of the possible symbol triplets of the RNA domain map on a set of 20 symbols of the polypeptide counterdomain.

“In intercellular communication the domains and counterdomains are the signal molecules and their receptors, and the code is like the base-pair rules of the first-tier code of the DNA, a simple rule between pairs of molecules of matching surfaces.

Why There are no Double-Entendres in Biological Communication: The basic information for the encoding in intercellular communication (a high-class encoding complying with Shannon’s Second Theorem) is all concentrated in the interacting molecular surfaces. And this information is what makes the communications unambiguous. We can now define an unambiguous communication: a communication in which each incoming message or signal at a receiver (or retransmitter) stage is encoded in only one way; or, stated in terms of mapping, a communication in which there is a strict one to one mapping of domains, so that for every element in the signal domain there is only one element in the counterdomain.

“The table in Figure 7.9 tells us at a glance that a given amino acid may have more than one coding triplet: UUA, UUG, CUU, CUC, CUA, CUG, for instance, are all synonyms for leucine. A code of this sort is said to be “degenerate.” That is OK despite the epithet, so long as the information flow goes in the convergent direction, as it normally does. The counterdomain here consists of only one element, and so a given triplet codes for no more than one amino acid. Thus, there is synonymity, but no ambiguity in the communications ruled by the genetic code.”

-The Touchstone of Life: Molecular Information, Cell Communication and the Foundations of Life, by Werner R. Loewenstein, Oxford University Press, 1999
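
Loewenstein’s numbers are easy to check. The sketch below builds the standard codon table and inverts it; the variable names are mine, and the table is written with one-letter amino acid symbols and `*` for stop. Because the table is a Python dictionary, each codon can have only one value, which is exactly the “no ambiguity” condition; the inverted dictionary shows the synonymity.

```python
from collections import defaultdict
from itertools import product

BASES = "UCAG"
# Standard genetic code in the order UUU, UUC, UUA, UUG, UCU, ... ('*' = stop).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# Invert the mapping: amino acid -> all codons that specify it.
synonyms = defaultdict(list)
for codon, aa in CODON_TABLE.items():
    synonyms[aa].append(codon)

sense_codons = [c for c, aa in CODON_TABLE.items() if aa != "*"]
amino_acids = [aa for aa in synonyms if aa != "*"]

print(len(sense_codons))       # 61 triplets in the RNA domain
print(len(amino_acids))        # 20 symbols in the polypeptide counterdomain
print(sorted(synonyms["L"]))   # ['CUA', 'CUC', 'CUG', 'CUU', 'UUA', 'UUG']
```

The mapping is many-to-one in the direction codon → amino acid, so reading it forward never requires a choice.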

“(George) Gamow devised a scheme, illustrated by means of playing cards, that involved sets of three adjacent nucleotides per amino acid unit (“triplet” code) in a sequence of overlapping triplets. That proposal spurred Francis Crick and his colleagues to examine the coding problem more critically and to use knowledge gained from genetic experiments to test the possible validity of Gamow’s scheme and its variants. By 1961 they had concluded that the nucleotides of each triplet did not belong to any other triplet (“nonoverlapping” code); that sets of triplets are arranged in continuous linear sequence starting at a fixed point on a polynucleotide chain, without breaks (“commaless” code), thus determining how a long sequence is to be read off as triplets; and that more than one triplet can code for a particular amino acid (“degenerate” code).”

-Proteins, Enzymes, Genes: The Interplay of Chemistry and Biology, Joseph S. Fruton, Yale University Press 1999
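
The difference between Gamow’s overlapping scheme and the nonoverlapping, commaless reading that Crick’s group settled on can be shown with a short sketch. The function names and the nine-base example sequence are hypothetical, chosen only for illustration.

```python
def nonoverlapping_triplets(seq, start=0):
    """Read consecutive triplets from a fixed starting point, sharing no bases."""
    return [seq[i:i + 3] for i in range(start, len(seq) - 2, 3)]

def overlapping_triplets(seq):
    """Read a triplet at every position, so neighbors share two bases."""
    return [seq[i:i + 3] for i in range(len(seq) - 2)]

seq = "AUGGCCUAA"  # hypothetical example sequence

print(nonoverlapping_triplets(seq))            # ['AUG', 'GCC', 'UAA']
print(nonoverlapping_triplets(seq, start=1))   # ['UGG', 'CCU']
print(overlapping_triplets(seq)[:3])           # ['AUG', 'UGG', 'GGC']
```

Because there are no “commas” marking where one triplet ends, the starting point alone fixes the reading: shifting it by one base yields a completely different set of code words.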

“The genome of any organism could from then on be understood in a detailed way undreamt of 20 years earlier. It had been revealed as the full complement of instructions embodied in a series of sets of three DNA nitrogenous bases. The totality of these long sequences were the instructions for the construction, maintenance, and functioning of every living cell. The genome was a dictionary of code words, now translated, that determined what the organism could do. It was the control center of the cell. Differences among organisms were the result of differences among parts of these genome sequences.”

-The Human Genome Project: Cracking the Genetic Code of Life, by Thomas F. Lee, Plenum Press, 1991

“The three-nucleotide, or triplet code, was widely adopted as a working hypothesis. Its existence, however, was not actually demonstrated until the code was finally broken…

“With a knowledge of the genetic code, we can turn our attention to the question of how the information encoded in the DNA and transcribed into mRNA is subsequently translated into a specific sequence of amino acids in a polypeptide chain. The answer to this question is now understood in great detail… instructions for protein synthesis are encoded in sequences of nucleotides in the DNA molecule.”

-Biology, 5th Edition, by Curtis & Barnes, Worth Publishers, 1989

Information only exists in our imagination, but it does not have any independent, objective existence outside of the human mind.

The coded information in the sequence of base pairs directs the growth of a zygote into the form of a full adult. This information is just as real as the adult it codes for, and does not require an external observer to form a mental abstraction of it in order to be real.

Information theory and codes are mere abstractions and not reality.

If your computer is set to turn the monitor off after 30 minutes of non-use, then the monitor turns off. The information that commands this to happen is just as real as the monitor, and just as real as the event of the monitor turning off. It cannot perform this task successfully without the information. Information is a real entity, not merely a human abstraction.

TCP/IP is a code with no physical presence. So it does not interact with its environment. A program that uses TCP/IP may get sensory data from a piece of hardware, but it’s ridiculous to say the code does this all by itself. All codes require a channel. TCP/IP could be written down on paper, and it can be used as a language between you and me. But DNA by comparison can only operate within the laws of physics in a cell. The code is not symbolic. And it has no semantical aspects. Your analogy is false.

TCP/IP affects its environment because it brought you the message on the screen that you are reading right now. A hypothetical DNA strand with a completely randomized sequence of base pairs contains no semantical content and builds no real organism; a DNA strand for a mouse has semantical instructions for the building of a mouse. Semantics = meaning, and DNA ostensibly has semantical content.

Your code definition is not general enough to be used in this discussion because DNA and molecules can’t communicate symbolically. They only operate according to the brute laws of physics.

Your computer only knows physics too, but it still communicates symbolically. The data on your computer cannot be explained purely in terms of the materials your computer is made of, which is as good an illustration as any of why purely materialistic interpretations fail. Many years ago a discussion about this topic would seem hopelessly abstract to most people, but now we live in the information age. We all know exactly what information is and we all understand that information is the entity that defines living things, man-made things, and all designs.

You have to use Shannon’s general definition of a code and the expanded definitions, not only Perlwitz’s.

My original definition from Post #28 will suffice. And Perlwitz’s fits with it nicely.

You’re saying the transmission of a message is independent of the communication medium, and that’s nonsense. If you have cable TV, you use your remote control to turn off the cable. If you have a dish, you use your remote control to turn off the dish. You use your remote to change TV channels. Are you going to try and tell me that the TV show is independent of the communication medium?

I didn’t say the transmission of the message was independent of the medium. I said the message was independent of the medium. It’s the same television show regardless of which TV you watch it on, and whether it came to you through the antenna or the cable company. Whether the screen is a tube or LCD. The message is distinct from the medium that carries the message. An image of Barbara Walters is not glass, it’s not air, not wire.
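
Here is a small sketch of the same point in Python. One message is sent through two unrelated symbol systems, Morse code and 8-bit binary, and comes out the same at the other end. The function names are mine, and the Morse table is deliberately cut down to the three letters the example needs.

```python
# Only the Morse symbols needed for this example.
MORSE = {"A": ".-", "D": "-..", "N": "-."}
MORSE_INVERSE = {signal: letter for letter, signal in MORSE.items()}

def to_morse(text):
    return " ".join(MORSE[ch] for ch in text)

def from_morse(signal):
    return "".join(MORSE_INVERSE[s] for s in signal.split(" "))

def to_bits(text):
    return " ".join(format(ord(ch), "08b") for ch in text)

def from_bits(signal):
    return "".join(chr(int(byte, 2)) for byte in signal.split(" "))

message = "DNA"
print(to_morse(message))   # -.. -. .-
print(to_bits(message))    # 01000100 01001110 01000001

# Two different carriers, one message.
print(from_morse(to_morse(message)) == from_bits(to_bits(message)))   # True
```

The dots and dashes have nothing physically in common with the ones and zeros, yet the decoded message is identical.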

You don’t understand, Perry. Yes, computers obey the laws of physics but they are further directed by our design in them. Bits and bytes operate according to Boolean logic, not just physics. This is artificial and of course it’s evidence of design. However DNA has no such artificial constraints, therefore it is not intelligently designed.

DNA obeys a host of rules as well, and as Yockey points out, these rules obey the laws of physics but are not derivable from them.

Quote from Schneider, developer of the EV program:
There are many codes in molecular biology besides the genetic code since every molecular machine has its own code.

It’s good to see now, after six weeks, that everyone finally agrees that DNA really does contain a code. Nonetheless, any machine we say has “coded information” must contain an encoding/decoding mechanism to apply here.

If there’s still anyone who asserts that DNA is not a code, take up this issue with the authors and publishers of these books – Oxford University Press, Yale University Press, Francis Crick, George Gamow, etc. I have presented not only volumes of material evidence that DNA is a code, I have also provided proof based on formal mathematical definitions.

I agree with you, and Yockey, that the process of transcription, translation and protein folding is equivalent to the decoding of a code.

Indeed. And technically speaking, this is not even a matter of opinion. Yockey puts this forth quite rigorously, based on logic, Shannon’s work, and formal definitions.

On the other hand Yockey’s representation of transcription as an encoding process is more of a stretch. But still it’s not unreasonable. It’s not like cryptography where the message is scrambled and then de-scrambled. But this doesn’t explain the origin of the code in the first place. So if the evolution of information is a stochastic process, then why isn’t mutation a sufficient explanation for the origin of the code?

Because pure stochastic processes have never been observed to create codes and coding systems, and only in exceedingly rare circumstances do they improve them. In general, information theory says that random mutation (= noise = entropy) only degrades information. Observe that both man-made digital communication systems (TCP/IP and Ethernet for example) and DNA have extensive redundancy and error-correction mechanisms to prevent mutations, noise, and information entropy.
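
To show what redundancy buys in a man-made system, here is a toy repetition code in Python. This is not how TCP/IP or Ethernet actually protect data (they rely on checksums and retransmission), and it is not a model of DNA repair; it is just the simplest example I can give of redundancy correcting noise. All names are hypothetical.

```python
def encode(bits, copies=3):
    """Repeat every bit `copies` times (a simple repetition code)."""
    return [b for b in bits for _ in range(copies)]

def decode(received, copies=3):
    """Recover each bit by majority vote over its block of copies."""
    blocks = (received[i:i + copies] for i in range(0, len(received), copies))
    return [1 if sum(block) * 2 > copies else 0 for block in blocks]

message = [1, 0, 1, 1]
sent = encode(message)

one_error = sent.copy()
one_error[4] ^= 1                      # noise flips a single bit
print(decode(one_error) == message)    # True: the redundancy absorbs it

two_errors = sent.copy()
two_errors[3] ^= 1                     # two flips inside the same block
two_errors[4] ^= 1
print(decode(two_errors) == message)   # False: the message is degraded
```

The redundancy tolerates a limited amount of noise; once the noise exceeds what the scheme was built for, the message is corrupted.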

Yockey explains how DNA contains code, but he doesn’t go so far as to call this a higher level communication as in two humans communicating via a consciously created code.

He doesn’t have to. My earlier reply regarding automatic virus updates covers this. In fact that’s one of my major points: A coding system can operate for eons without any conscious intervention. But direct observation tells us that only conscious minds create coding systems in the first place.

Perry, you say: This is consistent with my definition of “coded information” as “a system of symbols used by an encoding and decoding mechanism, which transmits a message that is independent of the communication medium.”


The transmission of a code begins with a sender’s message that is changed to a different format by the sender, transmitted, received, decoded (returned to its original form) and read. I don’t know of any example of this that was not made by a human.

Dogs barking and mating calls of birds fit this definition. My computer automatically getting virus updates from the Internet also fits this definition. So do the viruses themselves.

Yockey says the genome is generated by a stationary Markov process. Doesn’t this suggest that the origin was random as well?

The probabilistic element he is referring to is the fact that external factors (noise/mutation) can corrupt the encoding/decoding process, as we all well know. That’s true in any communication system. Note that this does not describe the origin of the information, only the corruption of information after it exists.


How is an atheist who says DNA only appears to be a code, any different than a Young Earth Creationist who says the universe only appears to be 14 billion years old?


Additional references from peer-reviewed standard scientific literature, explaining why DNA is literally and not figuratively a code:

“biological information is distinctive because it possesses a type of causal efficacy [23,24]—it is the information that determines the current state and hence the dynamics (and therefore also the future state(s)). In this paper, we postulate that it is the transition to context-dependent causation—mediated by the onset of information control—that is the key defining characteristic of life. We therefore identify the transition from non-life to life with a fundamental shift in the causal structure of the system, specifically a transition to a state in which algorithmic information gains direct, context-dependent, causal efficacy over matter.”

“In terms of computer language, in living systems chemistry corresponds to hardware and information (e.g. genetic and epigenetic) to software [27].”

“Explaining the chemical substrate of life and claiming it as a solution to life’s origin is like pointing to silicon and copper as an explanation for the goings-on inside a computer.”

“The instructional, or algorithmic, nature of biological information was long ago identified as a key property, and an early attempt to formalize it was made by von Neumann. He approached the problem by asking whether it was possible to build a machine that could construct any physical system, including itself. Identifying the parallels between biological systems, such as the human nervous system, and computers, and drawing inspiration from Turing’s work on universal computation, von Neumann [56] sought a formalism that would include both natural and artificial systems.”

“A key feature of Turing machines is that both the state of the machine and the current symbol on the tape being read in, are necessary to determine the future evolution of the system. As such, the algorithm encoded on the tape plays a prominent role in the time evolution of the state of the machine. At least superficially, this appears to be very similar to the case presented by biological systems where the update rules change in response to information read-out from the current state (as we discuss below, both are an example of top-down causation via information control). However, it is not obvious exactly how Turing’s very abstract formalism might map onto biological systems. This was the problem von Neumann wished to solve.”

“By analogy with Turing’s universal machine, he therefore devised an abstraction called a universal-constructor (UC), a machine capable of taking materials from its host environment to build any possible physical structure (consistent with the available resources and the laws of physics) including itself. An important feature of UCs is that they operate on universality classes. In principle, an UC is capable of constructing any object within a given universality class (including itself, if it is a member of the relevant class). An example of such a universality class relevant to biological systems is the set of all possible sequences composed of the natural set of 20 amino acids found in proteins. The relevant UC in this case is the translation machinery of modern life, including the ribosome and associated tRNAs along with an array of protein assistants. This system can, in principle, construct any possible peptide sequence composed of the coded amino acids (with minor variations across the tree of life as to what constitutes a coded amino acid [58]).”

The algorithmic origins of life
Sara Imari Walker
and Paul C. W. Davies
Interface Focus (published by the Royal Society)
Published: 06 February 2013

Crick, F. (1968). “The Origin of the Genetic Code.” Journal of Molecular Biology, 38, 367–379.

Crick, F. (1962, December 11). “Nobel Lecture: On the Genetic Code.” Retrieved from

Ji, S. (1999). “The Linguistics of DNA: Words, Sentences, Grammar, Phonetics, and Semantics.” Annals of the New York Academy of Sciences, 870, 411–417.

Andrianantoandro, E., Basu, S., Karig, D. K., & Weiss, R. (2006, May 16). “Synthetic Biology: New Engineering Rules for an Emerging Discipline.” Molecular Systems Biology, 2:2006.0028.

Perez, J.-C. (2010). “Codon Populations in Single-Stranded Whole Human Genome DNA Are Fractal and Fine-Tuned by the Golden Ratio 1.618.” Interdisciplinary Sciences, 2(3), 228–240.

Sadovsky, M. G. (2006). “Information Capacity of Nucleotide Sequences and Its Applications.” Bulletin of Mathematical Biology, 68, 785–806.

Yockey, H. P. (2000). “Origin of Life on Earth and Shannon’s Theory of Communication.” Computers & Chemistry, 24, 105–123.

Searls, D. B. (2002). “The Language of Genes.” Nature, 420, 211–217.

Witzany, G. (2008). “Bio-Communication of Bacteria and Their Evolutionary Roots in Natural Genome Editing Competences of Viruses.” Open Evolution Journal, 2, 44–54.

Shannon, C. E. (1948). “A Mathematical Theory of Communication.” Bell System Technical Journal, 27, 379–423.

Origin of life on earth and Shannon’s theory of communication
Hubert P. Yockey

Yockey’s 2005 book, published by Cambridge University Press, draws significantly from this paper: