The Origin of Information: How to Solve It
Technology Prize for Origin of Information
$100,000 For Initial Discovery
$5 million USD if Patentable
Non-Disclosure Agreements Required
From the book Evolution 2.0: Breaking the Deadlock Between Darwin and Design (BenBella)
Natural Code LLC is a Private Equity Investment group formed to identify a naturally occurring code. Our mission is to discover, develop and commercialize core principles of nature which give rise to information, consciousness and intelligence.
Natural Code LLC will pay the researcher $100,000 for the initial discovery of such a code. If the newly discovered process is defensibly patentable, we will secure the patent(s). Once patents are granted, we will pay the full prize amount to the discoverer in exchange for the rights. Our investment group will locate or develop commercial applications for the technology.
The discoverer will retain a small percentage of ongoing ownership of the technology. Prize amount as of August 2017 is $5 million. The prize caps at $10 million.
Code is absolutely necessary for replication and for life. Code is necessary for cells to have instructions to build themselves; code is necessary for reproduction. Code that has the ability to change itself is required before any kind of evolution can occur.
So… where did the information in DNA come from? This is one of the most important and valuable questions in the history of science. Currently, no one knows the answer.
“Information” is defined as digital communication between an encoder and a decoder, using agreed-upon symbols. To date, no one has shown an example of a naturally occurring encoding / decoding system, i.e. one that has demonstrably come into existence without a designer.
To solve this problem is far more than an object of abstract religious or philosophical discussion. It would demonstrate a mechanism for producing coding systems, thus opening up new channels of scientific discovery.
Such a find would have sweeping implications for Artificial Intelligence research. This would provide a solution to the most perplexing problem currently faced by the Origin Of Life field, namely the origin of coded information.
How could the genetic code (or any coding system) come into being? An answer would represent a landmark discovery in the history of science and alter our fundamental understanding of the universe.
Essential Components of a Communication System (after Shannon, 1948):
DNA matches the pattern in the above diagram. Cosmic Fingerprints is seeking discovery and proof of a naturally occurring code, which also matches this pattern. The following specification defines the criteria for identifying a naturally occurring code. All of these criteria must be met to qualify for the prize:
1. Humans can design the experiment, with all manner of state-of-the-art laboratory equipment, ideal conditions etc. They just can’t cheat: the submitted system cannot be pre-programmed with any form of code whatsoever.
2. Since the origin of DNA is unknown, the submitted system cannot be a direct derivative of DNA or produced by a living organism. Bee waggles, dogs barking, RNA strands and mating calls of birds don’t count. Such codes are products of animal intelligence, genetically hard-coded and/or instinctual.
3. The origin of the submitted system must be documented such that its process of origin can be observed in nature and/or duplicated in a real-world laboratory according to the scientific method.
4. The submitted system must be digital, not analog.
5. The submitted system must have the three integral components of communication functioning together: encoder, code, decoder.
6. The message passed between encoder and decoder must be a sequence of symbols from a finite alphabet.
7. A symbol is a group of k bits considered as a unit. We refer to this unit as a message symbol m_i (i = 1, 2, …, M) from a finite symbol set or alphabet. The size of the alphabet M is M = 2^k, where k is the number of bits in the symbol. For a binary symbol, k = 1, M = 2. For a quaternary symbol in DNA, k = 2, M = 4.
8. A character is a group of n symbols considered as a unit. We refer to this unit as a message character c_i (i = 1, 2, …, C) from a finite word set or vocabulary. The maximum size of the character set C is C = M^n. For a standard computer byte, M = 2, n = 8, C = 256. For a triplet group of quaternary symbols in DNA, M = 4, n = 3, C = 64.
9. The submitted system must be labeled with values of both encoding table and decoding table filled out.
10. For the submitted system, it must be possible to objectively determine whether encoding and decoding have been carried out correctly. For example when you press the “A” key on the keyboard, a letter “A” is supposed to appear on the screen and there is an observable correspondence between the two. In defining biological gender, a combination of X and Y chromosomes should correspond to male, while XX should correspond to female. For any given system, a procedure should exist for determining whether input correctly corresponds to output.
All submissions must explicitly demonstrate that their solution conforms to each of the above ten criteria in order to qualify.
(Above definitions adapted from Digital Communications: Fundamentals and Applications by Bernard Sklar, page 13, Prentice Hall, 2nd edition, 2001)
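The arithmetic in criteria 7 and 8 can be checked directly. The following is a minimal Python sketch, not part of the prize specification; the function names are illustrative assumptions, and only the formulas M = 2^k and C = M^n come from the definitions above.

```python
import math


def alphabet_size(k: int) -> int:
    """Number of distinct symbols M when each symbol carries k bits (M = 2^k)."""
    return 2 ** k


def character_set_size(m: int, n: int) -> int:
    """Maximum number of distinct characters C built from n symbols (C = M^n)."""
    return m ** n


def bits_per_character(m: int, n: int) -> float:
    """Bits carried by one character: n symbols times log2(M) bits per symbol."""
    return n * math.log2(m)


# Standard computer byte: binary symbols (k = 1), 8 symbols per character.
byte_m = alphabet_size(1)
byte_c = character_set_size(byte_m, 8)

# DNA codon: quaternary symbols (k = 2), 3 symbols per character.
dna_m = alphabet_size(2)
dna_c = character_set_size(dna_m, 3)

print(byte_m, byte_c)  # 2 256
print(dna_m, dna_c)    # 4 64
```

By the same formulas, one DNA triplet carries 3 × 2 = 6 bits, which `bits_per_character(4, 3)` returns as `6.0`.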
Isomorphism between Shannon’s Communication System and DNA:
Example Communication Systems:
Example #1: The ASCII Code
Keyboard > ASCII > Computer Screen: When you press the letter “A” on the keyboard, the letter is encoded into ASCII and decoded by the computer and a letter “A” appears on the screen. ASCII characters contain 7 symbols, so n = 7. The ASCII character set C is 2^7 or 128 characters.
Encoding tables for ASCII (letter on keyboard > binary code):
|Input (letter on keyboard)||Encoded Message|
The complete ASCII table is available at http://en.wikipedia.org/wiki/Ascii#ASCII_printable_characters
Decoding tables for ASCII (binary code > letter on screen or printer):
|Encoded Message||Output (displayed as an arrangement of pixels on screen or printer)|
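Because the ASCII mapping is fully specified, both tables for Example #1 can be generated rather than listed by hand. The following is a minimal Python sketch under that assumption; the function names `encode_ascii` and `decode_ascii` are illustrative and do not come from the original text. It also shows the kind of input/output check that criterion 10 asks for.

```python
def encode_ascii(text: str) -> list[str]:
    """Encode each character as a 7-bit binary string (the encoding table)."""
    codes = []
    for ch in text:
        code_point = ord(ch)
        if code_point > 127:
            raise ValueError(f"{ch!r} is not an ASCII character")
        codes.append(format(code_point, "07b"))
    return codes


def decode_ascii(codes: list[str]) -> str:
    """Decode 7-bit binary strings back into characters (the decoding table)."""
    return "".join(chr(int(code, 2)) for code in codes)


encoded = encode_ascii("A")
print(encoded)                # ['1000001']
print(decode_ascii(encoded))  # A

# Criterion 10: an objective check that decoding recovers the input.
message = "DNA"
assert decode_ascii(encode_ascii(message)) == message
```

Each encoded entry has n = 7 binary symbols, so the table has at most 2^7 = 128 rows, matching the character set size stated above.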
Example #2: The Genetic Code
Nucleotides > mRNA > Proteins: Base pairs are grouped into codons and encoded (transcribed) into messenger RNA, then decoded (translated) by the ribosomes into proteins. The DNA symbol unit is a nucleotide, forming a 4 letter alphabet of Adenine, Cytosine, Guanine, or Thymine. Each base pair contains k = 2 bits of information. A character consists of n = 3 symbol units. Character set C is 4^3 which is 64 characters. DNA’s redundancy scheme maps these 64 characters to 20 amino acids plus stop signals.
Encoding tables for DNA (base pairs > mRNA):
|Nucleotides (Input)||Amino Acid (Encoded Message)|
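The codon-to-amino-acid mapping for Example #2 can be sketched in Python as follows. This is an illustrative simplification, not the author's method: it reads codons directly from the DNA coding strand (T in place of U) and skips the mRNA step. The table is the standard genetic code in single-letter amino acid notation, with `*` marking a stop codon. The input sequence is a hypothetical one chosen to spell the Angiotensin 2 peptide listed in this document; it is not the actual gene sequence.

```python
from itertools import product

BASES = "TCAG"

# Standard genetic code, ordered TTT, TTC, TTA, TTG, TCT, ... ("*" = stop).
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)

# Map all 4^3 = 64 codons to their amino acid (or stop signal).
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna: str) -> str:
    """Decode a coding-strand DNA sequence into a peptide, stopping at a stop codon."""
    dna = dna.upper()
    if len(dna) % 3 != 0:
        raise ValueError("sequence length must be a multiple of 3")
    peptide = []
    for i in range(0, len(dna), 3):
        amino_acid = CODON_TABLE[dna[i:i + 3]]
        if amino_acid == "*":
            break
        peptide.append(amino_acid)
    return "".join(peptide)


print(len(CODON_TABLE))                       # 64
print(len(set(CODON_TABLE.values()) - {"*"}))  # 20
print(translate("GATCGTGTTTATATTCATCCTTTT"))   # DRVYIHPF
```

The redundancy is visible in the table itself: 64 codons map onto 20 amino acids, with three codons (TAA, TAG, TGA) reserved as stop signals.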
The complete genetic code chart is available at http://en.wikipedia.org/wiki/Genetic_code#RNA_codon_table
Decoding tables for DNA (amino acids > proteins):
|Amino Acid Sequence (encoded message)||Output: Peptide/Protein* (organism name)||#**|
|MRTGNAN||Microcin C7 (EC)||7|
|DRVYIHPF||Angiotensin 2 (HS)||8|
|RPKPQQFFGLM||Substance P (HS)||11|
|GGAGHVPEYFVGIGTPISFYG||Microcin J25 (EC)||21|
|RSCCPCYWGGCPWGQNCYPEGCSGPKV||Neurotoxin 3 (AS)||27|
|APLEPVYPGDNATPEQMAQYAADLRRYINMLTRPRY||Pancreatic Hormone (HS)||36|
|KCNTATCATQRLANFLVHSSNNFGAILSSTNVGSNTY||Islet amyloid polypeptide (HS)||37|
|CTPGSRKYDGCNWCTCSSGGAWICTLKYCPPSSGGGLTFA||Serine protease inhibitor 3 (SG)||40|
|DDGLCYEGTNCGKVGKYCCSPIGKYCVCYDSKAICNKNCT||Pollen allergen Amb t 5 (AT)||40|
|VGIGGGGGGGGGGSCGGQGGGCGGCSNGCSGGNGGSGGSGSHI||Microcin B17 (EC)||43|
|ATYNGKCYKKDNICKYKAQSGKTAICKCYVKKCPRDGAKCEFDSYKGKCYC||Antifungal protein (AG)||51|
|GIVEQCCTSICSLYQLENYCN-FVNQHLCGSHLVEALYLVCGERGFFYTPKT||Insulin A-B chains (HS)||51|
|DIPEVVVSLAWDESLAPKHPGSRKNMACYCRIPACIAGERRYGTCIYQGRLWAFCC||Neutrophil defensin 1 (HS)||56|
|CSSNAKIDQLSSDVQTLNAKVDQLSNDVNAMRSDVQAAKDDAARANQRLDNMATKYRK||Major outer membrane lipoprotein (EC)||58|
|RPDFCLEPPYTGPCKARIIRYFYNAKAGLCQTFVYGGCRAKRNNFKSAEDCMRTCGGA||Pancreatic trypsin inhibitor (BT)||58|
|EEYVGLSANQCAVPAKDRVDCGYPHVTPKECNNRGCCFDSRIPGVPWCFKPLQEAECTF||Trefoil factor 3 (HS)||59|
|IRCFITPDITSKDCPNGHVCYTKTWCDAFCSIRGKRVDLGCAATCPTVKTGVDIQCCSTDNCNPFPTRKRP||Long neurotoxin 1 (NK)||71|
*Mature form; **#: number of amino acids. Source: http://www.uniprot.org
This is only a partial listing of the simplest proteins. There are about a million known proteins, many of them extremely complex. More information on protein structures is available at http://www.uniprot.org and http://www.ncbi.nlm.nih.gov/.
Both ASCII and DNA are formal communication systems according to Shannon’s model because they encode and decode messages using a system of symbols. DNA is not like a communication system, or analogous to a communication system; it is formally defined as a communication system.
“Information, transcription, translation, code, redundancy, synonymous, messenger, editing, and proofreading are all appropriate terms in biology. They take their meaning from information theory (Shannon, 1948) and are not synonyms, metaphors, or analogies.” (Hubert P. Yockey, Information Theory, Evolution, and the Origin of Life, Cambridge University Press, 2005).
Similar tables are easily made for other codes and communication systems, like HTML, bar codes, postal codes, Morse code, computer file formats and programming languages.
Miller-Urey Experiment and the Origin of Life
The 1953 “Miller-Urey” experiment*** produced organic compounds from gases thought to be present in earth’s early atmosphere. It is widely cited in textbooks as an explanation of how early life was formed in the ocean.
This experiment only attempted to explain where a handful of the chemicals came from, and it certainly didn’t begin to explain how replication got started. Still, it provided useful insights.
If the Miller-Urey experiment had produced encoding, decoding, and information transmission as defined here, it would most certainly qualify as meeting this challenge. Prize money will be awarded to the first person who demonstrates a naturally occurring communication system that meets the engineering specification outlined in this document.
Submissions must be identical in format to the above examples of ASCII and DNA. Submissions must include a definition of all symbols, alphabet and the associated encoding / decoding tables.
If you have a submission, the first step is to visit the official prize website.
***Miller, Stanley L. (May 1953). “Production of Amino Acids Under Possible Primitive Earth Conditions” Science 117 (3046): 528.