Weasel program

The weasel program, Dawkins' weasel, or the Dawkins weasel is a thought experiment and a variety of computer simulations illustrating it.

The thought experiment was formulated by Richard Dawkins, and the first simulation was written by him; various other implementations of the program have been written by others. Dawkins added a disclaimer noting that the experiment was not intended to show how real evolution works, but to show the improvement gained by a selection mechanism in an evolutionary process. Dawkins acknowledged that to some extent his model is "misleading in important ways".

Overview
In chapter 3 of his book The Blind Watchmaker, Dawkins gave the following introduction to the program, referencing the well-known infinite monkey theorem:

"I don't know who it was first pointed out that, given enough time, a monkey bashing away at random on a typewriter could produce all the works of Shakespeare. The operative phrase is, of course, given enough time. Let us limit the task facing our monkey somewhat. Suppose that he has to produce, not the complete works of Shakespeare but just the short sentence 'Methinks it is like a weasel', and we shall make it relatively easy by giving him a typewriter with a restricted keyboard, one with just the 26 (capital) letters, and a space bar. How long will he take to write this one little sentence?"

The scenario is staged to produce a string of gibberish letters, with each letter in a sequence of 28 characters selected at random. The number of possible combinations in this random sequence is 27^28, or about 10^40, so the probability that the monkey will produce any given sequence is extremely low. Any particular sequence of 28 characters could be selected as a "target" phrase, all equally as improbable as Dawkins's chosen target, "METHINKS IT IS LIKE A WEASEL".

A computer program could be written to carry out the actions of Dawkins's hypothetical monkey, continuously generating combinations of 26 letters and spaces at high speed. Even at the rate of millions of combinations per second, it is unlikely, even given the entire lifetime of the universe to run, that the program would ever produce the phrase "METHINKS IT IS LIKE A WEASEL".
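Dawkins did not publish his code, so the following is only an illustration, in Python, of the single-step procedure described above:

```python
import random

ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZ "   # 26 capital letters plus the space bar
TARGET = "METHINKS IT IS LIKE A WEASEL"    # 28 characters

def random_phrase(length=len(TARGET)):
    """One attempt by the monkey: 28 uniformly random characters."""
    return "".join(random.choice(ALPHABET) for _ in range(length))

# Single-step selection: every attempt is independent, so the chance of
# success per attempt is 1 in 27^28.
print(f"one chance in {27 ** len(TARGET):.3e}")  # one chance in 1.197e+40
```

Even at a million attempts per second, a search at these odds would not be expected to finish within the lifetime of the universe.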

Dawkins intends this example to illustrate a common misunderstanding of evolutionary change: the idea that DNA sequences or organic compounds such as proteins are the result of atoms randomly combining to form more complex structures. Under that assumption, any particular sequence of amino acids in a protein would be extraordinarily improbable (this is known as Hoyle's fallacy). Rather, evolution proceeds by hill climbing, as in adaptive landscapes.

Dawkins then goes on to show that a process of cumulative selection can take far fewer steps to reach any given target. In Dawkins's words:

"We again use our computer monkey, but with a crucial difference in its program. It again begins by choosing a random sequence of 28 letters, just as before ... it duplicates it repeatedly, but with a certain chance of random error – 'mutation' – in the copying. The computer examines the mutant nonsense phrases, the 'progeny' of the original phrase, and chooses the one which, however slightly, most resembles the target phrase, METHINKS IT IS LIKE A WEASEL."

By repeating the procedure, a randomly generated sequence of 28 letters and spaces will be gradually changed each generation. The sequences progress through each generation:


 * Generation 01:  WDLTMNLT DTJBKWIRZREZLMQCO P
 * Generation 02:  WDLTMNLT DTJBSWIRZREZLMQCO P
 * Generation 10:  MDLDMNLS ITJISWHRZREZ MECS P
 * Generation 20:  MELDINLS IT ISWPRKE Z WECSEL
 * Generation 30:  METHINGS IT ISWLIKE B WECSEL
 * Generation 40:  METHINKS IT IS LIKE I WEASEL
 * Generation 43:  METHINKS IT IS LIKE A WEASEL

Dawkins continues:

"The exact time taken by the computer to reach the target doesn't matter. If you want to know, it completed the whole exercise for me, the first time, while I was out to lunch. It took about half an hour. (Computer enthusiasts may think this unduly slow. The reason is that the program was written in BASIC, a sort of computer baby-talk. When I rewrote it in Pascal, it took 11 seconds.) Computers are a bit faster at this kind of thing than monkeys, but the difference really isn't significant. What matters is the difference between the time taken by cumulative selection, and the time which the same computer, working flat out at the same rate, would take to reach the target phrase if it were forced to use the other procedure of single-step selection: about a million million million million million years. This is more than a million million million times as long as the universe has so far existed."

Algorithm
Richard Dawkins did not provide the source code for his program, but from his description it behaves as a simple genetic algorithm without crossover: start from a random 28-character string, make many copies with a small per-character chance of mutation, keep the copy that best matches the target, and repeat until the target is reached. The character set is the 26 uppercase letters plus the blank space.

Dawkins may have used another strategy to create new generations, for example crossing over: choosing the best-scoring strings and generating new strings by recombining them. This would only alter the rate of convergence of the algorithm.
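A minimal sketch of such an algorithm in Python follows; the population size and mutation rate here are our own assumptions, since Dawkins gives no figures:

```python
import random

ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZ "
TARGET = "METHINKS IT IS LIKE A WEASEL"
POP_SIZE = 100        # offspring per generation (assumed)
MUTATION_RATE = 0.05  # per-character chance of a copying error (assumed)

def score(phrase):
    """Number of characters matching the target at the same position."""
    return sum(a == b for a, b in zip(phrase, TARGET))

def mutate(phrase):
    """Copy the phrase, with a small chance of error at each position."""
    return "".join(
        random.choice(ALPHABET) if random.random() < MUTATION_RATE else c
        for c in phrase
    )

def weasel():
    """Cumulative selection: breed a brood each generation, keep the best."""
    parent = "".join(random.choice(ALPHABET) for _ in range(len(TARGET)))
    generation = 0
    while parent != TARGET:
        generation += 1
        parent = max((mutate(parent) for _ in range(POP_SIZE)), key=score)
    return generation

print(weasel())  # prints the generation count; typically a few hundred here
```

With these settings the sketch typically reaches the target within a few hundred generations; Dawkins's own run reached it in 43, so his parameters were presumably different.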

Criticism
Dawkins's "weasel program" has been the subject of much debate. A minor problem with Dawkins's algorithm is that it was designed without regard to the actual rates of mutation and natural selection in nature, resulting in a convergence rate much faster than could occur in reality. Intelligent design proponent William Dembski states that choosing a prespecified target sequence, as Dawkins does, is deeply teleological. Dembski has criticized the assumption that the intermediate stages of such a progression will be selected for by evolutionary principles, and asserts that many genes that are useful in tandem would not have arisen independently.

Pre-selected goal
The first problem with the algorithm is that it has a pre-selected goal. Royal Truman pointed out that "Once a letter falls into place, Dawkin's program ensures it won't mutate away". The examples Dawkins gives in his book The Blind Watchmaker and in the magazine New Scientist do seem to lead the reader to this conclusion, but it is not strictly true of the algorithm: because this is a genetic algorithm that keeps the best phrase based on an overall score for the whole sentence, a letter that is already in the right place can still mutate away, provided enough other letters fall into place that the overall score of the phrase improves. In practice, however, the algorithm is balanced so that the probability of a correct letter mutating into a wrong one while all the other letters stay the same is very small.
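A small, deterministic example (our own construction, not Dawkins's code) shows how a letter already in place can mutate away while the phrase as a whole still scores better and is therefore kept:

```python
TARGET = "METHINKS IT IS LIKE A WEASEL"

def score(phrase):
    """Overall score: characters matching the target, position by position."""
    return sum(a == b for a, b in zip(phrase, TARGET))

parent = "METHINKS " + "Q" * 19       # first nine characters already correct
child = "ZETHINKS IT IS " + "Q" * 13  # the 'M' mutated away, but ' IT IS ' fell into place

print(score(parent), score(child))  # 9 14 -- the child wins despite the reversion
```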

But in Darwinian evolution, nature does not have any goal; when one fixes a goal, he is in fact playing the role of God. The changes that supposedly lead to an evolutionary process are caused by random mutations, genetic drift and the process of natural or sexual selection. To fix a goal is to make the process of natural selection completely control the final form of the phrase created, in each of its features. This actually complicates the work of evolutionists, because for every gene of every creature they must find a cause, based on natural selection, to explain the appearance of that trait.

Perfect natural selection
Jonathan Sarfati points out that in Dawkins's program, natural selection is perfect:

"a slightly closer match is the only one that is selected to reproduce for the next generation; it is as if anything else is a lethal genetic combination."

Unrealistic advantages to evolution
Walter ReMine points out that in his experiment, Dawkins disallows recessive mutations, ordinary epistasis, and polygenic and pleiotropic effects, which would normally slow down evolution.

In his article The probability of a nucleotide sequence to be formed by random mutations, the theoretical physicist Ulrich Utiger lists other unrealistic characteristics of Dawkins's program. Furthermore, he shows that, despite these characteristics favoring natural selection, the number of generations the weasel algorithm needs to produce the genetic divergence between the Homo and Pan genera greatly exceeds the number of generations provided by empirical data. In his view, this makes natural selection a null hypothesis that should be dismissed by the scientific community.

Intermediate forms
Another flaw in the argument Dawkins made when using this algorithm is the assumption that all intermediate forms of life are viable and reproduce as well as the others. Timothy G. Standish, professor of biology at Andrews University in Berrien Springs, Michigan, pointed out that "Changing even one amino acid in a protein can alter its function dramatically". He continues: "This idea of natural selection fixing amino acids as it constructs functional proteins is also unsupported by the data. Cells do not churn out large pools of random proteins on which natural selection can then act. If anything, precisely the opposite is true. Cells only produce the proteins they need to make at that time. Making other proteins, even unneeded functional ones, would be a wasteful thing for cells to do, and in many cases, could destroy the ability of the cell to function."

Now let us look at this issue from the completely opposite side. Suppose that the only viable forms were:


 * MWR SWTNUZMLDCLEUBXTQHNZVJQF (the first string) and
 * METHINKS IT IS LIKE A WEASEL (the target string).

All other intermediate forms would be nonviable and would leave no offspring. In this extreme case (the extreme opposite of what Dawkins proposed), the chance of evolving from one to the other (with a 5% rate of change per character) would be 0.05^28, or about 3.725 x 10^-37: an extremely low probability, and the algorithm would run for a huge number of years without achieving the result.
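The quoted figure can be checked directly (all 28 characters would have to change to the right value in a single step):

```python
p = 0.05 ** 28   # every one of the 28 characters changes correctly at once
print(p)         # about 3.725e-37

# Expected attempts before success is 1/p; even at a million attempts per
# second, that is on the order of 8.5e22 years.
years = (1 / p) / 1e6 / (365.25 * 24 * 3600)
print(f"{years:.1e}")
```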

Of course, to be fair, neither of the two alternatives accurately depicts what happens in nature. But how can we quantify the number of viable and nonviable intermediate forms? This is an important issue: the more viable intermediate forms there are, the greater the probability that intermediate forms will appear and the lower the run time of the algorithm. Is using an extreme case to illustrate a possible phenomenon, then, a scientific approach?

Another question that follows from this reasoning is a paradox. The greater the number of viable intermediate forms, the fewer the mutations needed to reach the next one, and therefore the less time the algorithm needs to advance one step; but a large number of intermediate forms does not fit very well with the fossil record. Conversely, the smaller the number of intermediate forms, the more mutations are needed to reach the next one, and therefore the longer the algorithm takes to advance one step.

Les Ey and Don Batten take a very similar approach, which they call the "Error Catastrophe model".

Transitions between viable beings
Another related issue is what happens when one needs to jump from one viable form to another. Suppose that only one word in the sentence "METHINKS IT IS LIKE A WEASEL" were to be changed, and the new phrase formed had to be grammatically correct. Through single-letter mutations, invalid English sentences would be formed along the way before the sentence could finally be changed into a grammatically correct form.

Experiment
To illustrate the problem of the intermediate forms, we developed an experiment based on the algorithm available in EvoAlgo.java, choosing the Java implementation to perform the test. On each run of the program we withdrew one more letter as a possibility at each position. The program code segment changed is available here. On the first run, all combinations were allowed.

On the second run, the blank space was not allowed in the first position, the letter "A" was not allowed in the second position, the letter "B" was not allowed in the third position, and so on.

On the third run, the letters "Z " ("Z" and the blank) were not allowed in the first position, the letters " A" (the blank and "A") were not allowed in the second position, the letters "AB" ("A" and "B") were not allowed in the third position, and so on.

On the seventh run, the letters "UVWXYZ " were not allowed in the first position, the letters "VWXYZ A" were not allowed in the second position, the letters "WXYZ AB" were not allowed in the third position, and so on up to the 28th position.

It is important to note that, besides the letters permitted, up to two more are added back (if they are not already in the set): the letter at that position in the target sentence and the letter at that position in the current sentence. For instance, in the case of 10 letters excluded, the number of letters allowed can vary from 17 to 19, depending on whether the current letter and/or the target letter for that position falls in the excluded set; so in this case 8 to 10 letters are effectively excluded at each position.
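The construction of the per-position alphabets described above can be sketched as follows; the rotating exclusion window is our reading of the description (the modified EvoAlgo.java segment is not reproduced here), and the function names are our own:

```python
ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZ "
TARGET = "METHINKS IT IS LIKE A WEASEL"

def excluded(position, n):
    """The n consecutive characters (cyclically) excluded at a 1-based
    position: position 1's window ends at the space, position 2's at 'A',
    position 3's at 'B', and so on, matching the pattern in the text."""
    end = (25 + position) % 27          # index of the last excluded character
    return {ALPHABET[(end - i) % 27] for i in range(n)}

def allowed(position, n, current):
    """Allowed characters at a position, with the target letter and the
    current letter always re-admitted."""
    chars = set(ALPHABET) - excluded(position, n)
    chars.add(TARGET[position - 1])     # the target letter stays reachable
    chars.add(current[position - 1])    # the current letter stays legal
    return chars
```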

The results are recorded in the table below:

It is very easy to see that as we increase the number of disallowed letters, the number of generations and the processing time required increase exponentially. For instance, with thirteen letters allowed per position the number of valid sentences is 13^28 (~1.55 x 10^31) in a universe of 27^28 (~1.19 x 10^40) possible sentences.
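The counts quoted above can be checked directly (13 versus 27 possibilities at each of the 28 positions):

```python
valid = 13 ** 28   # thirteen letters allowed per position
total = 27 ** 28   # the full 27-character alphabet
print(f"{valid:.3e} of {total:.3e} sentences, a fraction of about {valid / total:.1e}")
```

Only about one sentence in a billion remains valid, which is why the restricted runs take so many more generations.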



The last execution of the program (with 17 letters not allowed) took 63,659,913 generations to achieve the desired result, with a running time of about 14 hours. With even fewer letters allowed, the execution would take much more time (exponential growth).

Fixed length
Another flaw in Dawkins's algorithm is that it keeps the number of characters fixed at each step. This is unlike what occurs in nature, and unlike even what would be expected of monkeys typing on a typewriter: why would a monkey hit the exact number of letters (28) in each attempt? In nature, chromosomes undergo insertions, deletions and inversions, which can result in copies of greater or shorter length. For instance:

ME THANKS YOU VERY MUCH IT ISNT LIKE IN NATURE or

NOT CONVINCED ME, THANKS
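A mutation operator that allows substitutions, insertions, deletions and inversions, so that the length of the copy can drift, might be sketched as follows (our own sketch; Dawkins's program has no such operator, and the rates chosen are illustrative):

```python
import random

ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZ "

def mutate_indel(phrase, rate=0.05):
    """Copy a phrase with per-character substitutions plus occasional
    insertions and deletions, so the copy's length can change."""
    out = []
    for c in phrase:
        r = random.random()
        if r < rate:                 # substitution
            out.append(random.choice(ALPHABET))
        elif r < rate + 0.01:        # deletion: drop this character
            continue
        elif r < rate + 0.02:        # insertion: keep it, add a random extra
            out.append(c)
            out.append(random.choice(ALPHABET))
        else:                        # faithful copy
            out.append(c)
    # occasional inversion of a random substring
    if len(out) > 2 and random.random() < 0.05:
        i, j = sorted(random.sample(range(len(out)), 2))
        out[i:j] = reversed(out[i:j])
    return "".join(out)
```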

The smallest living being
The bacteria with the smallest number of genes have approximately 500 genes. Even the smallest living thing must perform certain functions before it can reproduce: obtaining food or materials, synthesizing genetic material and procuring energy. Karp lists some functions common to cells:


 * Cells possess a genetic program and means to use it.
 * Cells are capable of producing more of themselves.
 * Cells acquire and utilize energy.
 * Cells carry out a variety of chemical reactions.
 * Cells engage in mechanical activities.
 * Cells are capable of self-regulation.

This requires a minimum number of genes. Recent theoretical and experimental work on the so-called "minimal complexity" required to sustain the simplest possible living organism suggests a lower bound of some 250-400 genes and their corresponding proteins. To jump from inanimate matter to the simplest form of life there are no intermediate steps. Thus, for the simplest living being, the Dawkins algorithm does not even apply; and in this particular case, the criticism that the infinite monkey theorem would require an unimaginably huge amount of time does apply.

Parallel with DNA
Although Dawkins did not draw this parallel in his example, when one considers DNA in place of letters of the alphabet, more problems arise. Here we will use an example based on the one given in I. L. Cohen's book. There are 20 common amino acids (there are more than 20, but for the sake of simplicity let us consider the common ones) that are read through the RNA genetic code, and we can represent each of them by a letter of the alphabet. Each amino acid is specified by a set of three nucleotides called a codon, or triplet. Suppose we have an RNA sequence that reads:

AUG UCU UAC AAG GCC GUG UAA

This message translates to:

AUG - start reading, UCU - attach amino acid Serine, UAC - attach amino acid Tyrosine, AAG - attach amino acid Lysine, GCC - attach amino acid Alanine, GUG - attach amino acid Valine, UAA - stop production.

Assume now that the fifth nucleotide (a Cytosine) is destroyed. The message will now translate very differently from before, even though only one nucleotide has changed from one RNA to the other:

AUG UUU ACA AGG CCG UGU AA_

This message translates now to:

AUG - start reading, UUU - attach amino acid Phenylalanine, ACA - attach amino acid Threonine, AGG - attach amino acid Arginine, CCG - attach amino acid Proline, UGU - attach amino acid Cysteine, AA_ - depends upon the next nucleotide.
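The frameshift above can be reproduced mechanically with a small codon table (only the codons appearing in this example; the full genetic code has 64):

```python
# Partial RNA codon table: just the codons used in the example.
CODON_TABLE = {
    "AUG": "Met",  # Methionine, also the start signal
    "UCU": "Ser", "UAC": "Tyr", "AAG": "Lys", "GCC": "Ala", "GUG": "Val",
    "UAA": "STOP",
    "UUU": "Phe", "ACA": "Thr", "AGG": "Arg", "CCG": "Pro", "UGU": "Cys",
}

def translate(rna):
    """Read the RNA three letters at a time; incomplete codons show as '?'."""
    return [CODON_TABLE.get(rna[i:i + 3], "?") for i in range(0, len(rna), 3)]

original = "AUGUCUUACAAGGCCGUGUAA"
shifted = original[:4] + original[5:]   # delete the 5th nucleotide (the C)

print(translate(original))  # ['Met', 'Ser', 'Tyr', 'Lys', 'Ala', 'Val', 'STOP']
print(translate(shifted))   # ['Met', 'Phe', 'Thr', 'Arg', 'Pro', 'Cys', '?']
```

A single deletion shifts every downstream codon, so every amino acid after the deletion point changes.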

Returning to the example of Hamlet's phrase: a change in one letter (for instance, the letter 'A' of the word 'AT'), assuming each letter were encoded by three nucleotides, could change the whole rest of the sentence. For instance:

ME THINKS AT IS LIKE A UEASEL would change, for instance, to:

ME THINKS ZUAJTAMJLFABAWFBTFMA

This example shows how easy it is to ruin a code with a simple change.