In a recent debate with PZ Myers, a prominent old-school Darwinist, my rival asserted that “90 percent of our DNA is junk.”
I insist very little of it is junk. I have no doubt some is broken, corrupted, redundant or whatever – but if the broken stuff were fixed, we would all be healthier.
We know exactly what 3% of it does – it codes for proteins.
In 2012, the ENCODE project team announced that at least 80% of our DNA has some discernible function. Exactly what most of that does remains a mystery.
Meanwhile, some parts of DNA are clearly far more essential than others. One experiment deleted 1% of a mouse genome with no obvious defect.
80/20, the Pareto Principle, offers us a very useful shortcut for predicting how useful each part of our DNA is. 80/20 applies to practically all complex systems. The 80/20 Curve (www.8020curve.com) allows you to calculate, quickly and easily, how things are going to stack up.
If you have 10,000 files on your hard drive and they take up 10 gigabytes, how big is each file? 80/20 tells you with remarkable accuracy. If you have 1,000 customers and your company has a million dollars revenue, how much money do you get from each customer?
80/20 will tell you that too, and when you compare it to your sales reports, you’ll find it’s scary accurate. Ditto with populations of countries.
This is the basis of my book 80/20 Sales & Marketing. The Appendix shows a range of examples, from donations to a church to wealth in the Forbes 400 to dairy production in Wisconsin.
So what does the 80/20 curve predict about DNA? It says 20% of the coding sequences do 80% of the work. To get a more exact answer, you go to “specify the total output of all members” and enter:
Number of members: 100
Total output of all members: 100
What these numbers are saying is simply that 100% of the DNA (100 members) collectively gives us 100% of the DNA’s function; the total adds up to 100%. Now the tool will generate a curve that tells you the relative contribution of different parts.
Enter the parameters and you get a curve like this:
Mouse over the curve and you get to see the contribution of each percentile section.
- The top 1%, or the 99th percentile, contributes 14.2%.
- The 98th percentile contributes 7.8%.
- The 97th percentile contributes 5.5%.
Add those three together, and it says the part of DNA we fully understand, the 3% that codes for proteins, tells 27.6% of the story.
So almost three fourths of the story lies elsewhere, in the “non-coding” regions.
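The tool's formula isn't spelled out here, so treat the following as a rough sketch rather than the tool's actual method. It assumes each member's output falls off as a power of its rank, with exponent ln 4 / ln 5 (the rank exponent that corresponds to an 80/20 split in an idealized, continuous Pareto distribution). The name `rank_shares` and the exponent choice are my assumptions for illustration.

```python
import math

# Assumed rank exponent: the value that matches an 80/20 split for an
# idealized continuous Pareto distribution. This is a guess at the model,
# not the documented formula behind 8020curve.com.
EXPONENT = math.log(4) / math.log(5)


def rank_shares(members: int, total: float = 100.0) -> list[float]:
    """Split `total` across `members`, weighting rank r by r ** -EXPONENT."""
    weights = [rank ** -EXPONENT for rank in range(1, members + 1)]
    scale = total / sum(weights)
    return [weight * scale for weight in weights]


# Same inputs as above: 100 members, total output of 100.
shares = rank_shares(members=100, total=100.0)
top_three = shares[:3]

for label, share in zip(("99th", "98th", "97th"), top_three):
    print(f"{label} percentile: {share:.1f}%")
print(f"Top 3% combined: {sum(top_three):.1f}%")
```

Under that assumption the printed values round to the 14.2%, 7.8%, 5.5% and 27.6% quoted above. The 27.6% total comes from adding the unrounded shares, which is why it differs slightly from the sum of the three rounded figures.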
You can use the feature “By rank” on the bottom left of 8020curve.com and it will predict how much function comes from the least important 50% of the genome:
In other words, the least vital half of our DNA contributes 18%. The most vital half of our DNA contributes 82% of function. The vital stuff really is vital. The tiniest error in a Hox gene spells disaster. The most common form of cystic fibrosis comes from a single missing codon.
80/20 also predicts that the bottom 1% of our DNA contributes 0.27%. In other words, some parts are nearly negligible.
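The "By rank" numbers can be approximated with a short Python sketch. It is not the tool's documented method: it assumes output falls off with rank as r^(-ln 4 / ln 5), and the helper name `share_of_ranks` is hypothetical.

```python
import math

# Assumed rank exponent (a guess; the tool's actual formula is not stated).
EXPONENT = math.log(4) / math.log(5)


def share_of_ranks(first: int, last: int, members: int = 100) -> float:
    """Percent of total output from ranks first..last (rank 1 = most important)."""
    weights = [rank ** -EXPONENT for rank in range(1, members + 1)]
    return 100.0 * sum(weights[first - 1:last]) / sum(weights)


bottom_half = share_of_ranks(51, 100)   # least important 50 of 100 segments
top_half = share_of_ranks(1, 50)        # most important 50 of 100 segments
bottom_one = share_of_ranks(100, 100)   # the single least important segment

print(f"Least vital half: {bottom_half:.0f}%")
print(f"Most vital half:  {top_half:.0f}%")
print(f"Bottom 1%:        {bottom_one:.2f}%")
```

With these assumptions the output rounds to 18%, 82% and 0.27%, in line with the figures above.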
My experience with 80/20 suggests that all of the bottom 10% will be even less useful than the curve predicts. The bottom ten segments are each probably less than 0.1% of the function. Since most systems can at least work with 80% of their parts, I’m guessing you could lose half your DNA and still stay alive.
Now of course you wouldn’t want that. Any more than you want to have your tonsils taken out or get your appendix removed. (Those were labeled “vestigial organs” for a long time. It turns out they aren’t so useless after all.) You could even survive with one kidney (like my dad did) but it would suck.
We have good reasons to believe that mysterious portions of DNA provide redundancy. For example, the genetic code itself is roughly 3:1 redundant. There are 61 codons that specify just 20 amino acids, so on average about three codons generate the same amino acid instead of one (individual amino acids have anywhere from one to six). This turns out to be a highly optimized scheme (optimal to the tune of one in a million) for minimizing copying errors. We know that cells can replace damaged DNA with segments copied from other chromosomes.
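The 3:1 figure is easy to check as an average. Here is a minimal Python sketch that builds the standard genetic code from its usual compact one-letter listing and counts the codons per amino acid. The variable names are mine, and the table is the standard code only; some organisms and organelles use variant tables.

```python
from collections import Counter
from itertools import product

BASES = "TCAG"
# Standard genetic code with codons ordered TTT, TTC, TTA, TTG, TCT, ...
# One-letter amino acid codes; '*' marks a stop codon.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

# Map each of the 64 three-letter codons to its amino acid (or stop).
codon_table = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# Count how many codons encode each amino acid, ignoring stop codons.
codons_per_amino_acid = Counter(
    amino_acid for amino_acid in codon_table.values() if amino_acid != "*"
)
sense_codons = sum(codons_per_amino_acid.values())
average = sense_codons / len(codons_per_amino_acid)

print(f"{sense_codons} sense codons for {len(codons_per_amino_acid)} amino acids")
print(f"Average redundancy: {average:.2f} codons per amino acid")
print("Codons per amino acid:", dict(sorted(codons_per_amino_acid.items())))
```

The per-amino-acid counts show that the redundancy is uneven: leucine, serine and arginine have six codons each, while methionine and tryptophan have only one.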
We know for example that salamanders often have many duplications in their genomes and have huge amounts of extra DNA. Same is true for onions. So what’s going on here?
As I describe in chapter 16 of Evolution 2.0, hybrids (as in horse + donkey = mule) double the number of chromosomes. Then a process called Hybrid Dysgenesis deals with instability in the genome and removes sequences that are unnecessary. The organism and its descendants begin to use re-arranged segments of the genome for other purposes. Over time the genome gets smaller and functions are added.
A similar thing happens with retroviruses. It now looks very much like mammal placentas were built using large pieces of code from viruses; and viruses are actually a MAJOR source of biodiversity.
Anyone who proclaims your genome is 90% junk is at the lazy end of the science community. They are lying on hammocks, making egotistical pronouncements about things they barely understand. Junk DNA is junk science.
Mostly the only guys still trying to defend “Junk DNA” are atheists doing damage control. The creationists had been insisting it’s useful since the 1970s. ENCODE put egg on the Darwinists’ faces, and evidence continues to flow in the wrong direction for the Junk DNA crowd.
I have yet to meet anybody who is willing to voluntarily delete 90% of their DNA.
I’m with Eibi Nevo, the Israeli scientist who said, “The future of evolutionary biology lies in better understanding regulation of the non-coding areas that have been wrongly or unjustifiably called junk DNA.”
I estimate we’ll spend the next 100 years figuring out what it all does. Every year the percentage of so-called “junk” will decline until nobody cares to talk about it anymore.
How useful any one piece of DNA is spans a huge range… from a hybrid where half the chromosomes are mostly switched off, to non-coding DNA whose function is mysterious.
Since the definition of “junk” is: