Evolution simulation result

Paul C. Anagnostopoulos

So, as some of you know, I have rewritten Tom Schneider's Ev program in Java. Ev simulates the evolution of a genetic control mechanism to demonstrate that the information content of a genome can increase by evolution. Here is the link:

http://www.lecb.ncifcrf.gov/~toms/papers/ev/evj/

We've been hip-deep in a conversation with a creationist about the probabilities involved in evolution of the control mechanism. He keeps hollering about 4^n, where n is the size of the genome. That is, indeed, a naive fit for the number of generations required to evolve a perfect control mechanism. However, we've been getting data that fits kn^2 (if the number of mutations per genome is fixed) or kn^1 (if the number of mutations varies with the size of the genome).
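
Just to make the scale of that difference concrete, here is a throwaway Java snippet (not part of EvJ; k = 10 is an arbitrary constant, purely for illustration):

[code]
import java.math.BigInteger;

// Throwaway comparison of 4^n against k*n^2 for a modest 256-base genome.
// Not part of EvJ; k = 10 is an arbitrary constant chosen for illustration.
public class ScalingCheck {
    public static void main(String[] args) {
        int n = 256;
        BigInteger fourToN = BigInteger.valueOf(4).pow(n);   // exhaustive-search scale
        long kNSquared = 10L * n * n;
        System.out.println("4^n has " + fourToN.toString().length() + " decimal digits");
        System.out.println("k*n^2 = " + kNSquared);
    }
}
[/code]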

So we were running a series of simulations with a population of 96 creatures and 1 mutation per 256 bases, merrily generating data that fit kn^1, until we hit a genome of size 32,768 bases. Suddenly it's taking forever to evolve a perfect creature. The creationist goes wild, telling us we've finally hit the 4^n curve. However, since it took only about 40,000 generations to evolve perfect creatures with genomes of 23,101 bases, it didn't seem likely that it should suddenly skyrocket out of sight at 32,768.

Then it hit me: genetic load. A population of 96 may simply not be enough to support such large genomes with 128 mutations per creature per generation. To test this hypothesis, I'm running a series of simulations with a population of 12. I'm now at a genome size of 8,192 and it appears that it will not converge on a perfect creature. Stay tuned.

Is this really a genetic load issue? I think what is happening is that the population is so low that there is a high probability of damaging the genomes of every creature on every generation, so that they can never evolve to a perfect state. The number of binding errors dropped from 16 (the number of sites) to around 4 and now it just hovers around 4.
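
To put a rough number on that intuition, here is a back-of-the-envelope Java sketch (not Ev's actual code; the 400-base functional region is an assumed figure, just for illustration):

[code]
// Back-of-the-envelope check of the genetic-load idea; this is not Ev code,
// and the 400-base functional region (gene + binding sites) is an assumption
// made purely for illustration.
public class LoadSketch {
    public static void main(String[] args) {
        int G = 32_768;          // genome size in bases
        int m = G / 256;         // 1 mutation per 256 bases -> 128 mutations
        int functional = 400;    // assumed size of the functional region
        int population = 96;

        // Chance that one creature takes no hit in its functional region.
        double pEscape = Math.pow(1.0 - (double) functional / G, m);
        // Chance that at least one creature in the population escapes damage
        // this generation (treating creatures as independent).
        double pSomeoneEscapes = 1.0 - Math.pow(1.0 - pEscape, population);

        System.out.printf("P(one creature undamaged)     = %.4f%n", pEscape);
        System.out.printf("P(at least 1 of %d undamaged) = %.4f%n",
                population, pSomeoneEscapes);
    }
}
[/code]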

Obscure, perhaps, but fascinating nonetheless.

~~ Paul
 
Paul, you have a huge problem if you have more than an average of one mutation per critter. You must have a population wherein most of the critters are unmodified, and a few of them (per-capita) per generation are mutated.
 
I played with the program, and I am waaay too uneducated to understand what is going on. Any specific page on your site where I could find an explanation?
 
jj said:
Paul, you have a huge problem if you have more than an average of one mutation per critter. You must have a population wherein most of the critters are unmodified, and a few of them (per-capita) per generation are mutated.
Nah. The number of mutations varies with the length of the genome, more or less (at least for point mutations).

Anyway, Ev lets you specify either a fixed number of mutations per genome or 1 mutation per b bases. The genome size vs. generations data fits kn^2 when using the first method, kn^1 when using the second.
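
For anyone following along, here is a minimal sketch of the difference between those two settings (illustrative only, not the EvJ source):

[code]
// Minimal sketch of the two mutation settings described above (illustrative
// only; this is not the EvJ source).
public class MutationPolicy {
    enum Mode { FIXED_PER_GENOME, ONE_PER_B_BASES }

    // How many point mutations a genome of a given size gets per generation.
    static int mutationsFor(int genomeSize, Mode mode, int fixedCount, int b) {
        return mode == Mode.FIXED_PER_GENOME
                ? fixedCount          // constant, regardless of genome size
                : genomeSize / b;     // scales with genome size
    }

    public static void main(String[] args) {
        // With 1 mutation per 256 bases, a 32,768-base genome gets 128 mutations.
        System.out.println(mutationsFor(32_768, Mode.ONE_PER_B_BASES, 1, 256));
    }
}
[/code]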

~~ Paul
 
Sorry, I haven't followed the thread or math yet, but why do you use a population of 96?
Is this not well below accepted standards (or real life situations) of a viable genetic population?
Just links will do for this newbie.
Thanks
 
CP489, read this:

http://www.lecb.ncifcrf.gov/~toms/papers/ev/evj/evj-guide.html

But let me give a quick explanation. Each creature has a single chromosome of a given length. On it are (a) a gene that produces a "protein" that binds to the DNA, (b) a specified number of sites that should be bound, and (c) junk DNA everywhere else.

On each generation, the gene is tested against every position on the chromosome and mistakes are counted. It is a mistake if it binds to a position that is not a binding site, or if it does not bind to a position that is a binding site. The creatures are sorted by their mistake counts and the best half replicates and replaces the worst half. Mutations are applied on each generation, of course.
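
In rough Java, one generation as just described looks something like this (a sketch, not the actual EvJ classes; countMistakes() is only a placeholder here):

[code]
import java.util.Arrays;
import java.util.Comparator;
import java.util.Random;

// Rough sketch of one Ev-style generation as described above; the real EvJ
// classes look different, and countMistakes() is only a placeholder here.
class Creature {
    byte[] chromosome;   // bases encoded as 0..3
    int mistakes;        // binding errors counted this generation
}

public class GenerationSketch {
    static final Random RNG = new Random();

    static void oneGeneration(Creature[] pop, int mutationsPerCreature) {
        // 1. Score every creature by counting its binding mistakes.
        for (Creature c : pop) {
            c.mistakes = countMistakes(c);
        }
        // 2. Sort so the best creatures (fewest mistakes) come first.
        Arrays.sort(pop, Comparator.comparingInt(c -> c.mistakes));
        // 3. The best half replicates and replaces the worst half.
        int half = pop.length / 2;
        for (int i = 0; i < half; i++) {
            pop[half + i].chromosome = pop[i].chromosome.clone();
        }
        // 4. Apply point mutations to every creature.
        for (Creature c : pop) {
            for (int m = 0; m < mutationsPerCreature; m++) {
                c.chromosome[RNG.nextInt(c.chromosome.length)] = (byte) RNG.nextInt(4);
            }
        }
    }

    static int countMistakes(Creature c) {
        return 0;  // placeholder for the weight-matrix scan described below
    }
}
[/code]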

If you run the standard model, you can watch the mistake count of the best creature decrease slowly toward zero. There is a box you can check to pause when the mistake count first equals zero.

The Genetic Sequence tab shows the chromosome. The gene is at the beginning. It is actually a weighting matrix and a threshold (in blue). The binding sites are arrayed throughout the rest of the chromosome (in red or green).
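
Here is a sketch of how such a weight-matrix test could work (the encoding is illustrative and may not match EvJ's internals):

[code]
// Sketch of the weight-matrix test described above. The encoding is
// illustrative and may differ from EvJ's internals: weights[offset][base]
// is the score for seeing base (0..3) at that offset within the window,
// and a window "binds" when the summed score meets the threshold.
public class BindingSketch {
    static boolean binds(byte[] chromosome, int start,
                         int[][] weights, int threshold) {
        int score = 0;
        for (int offset = 0; offset < weights.length; offset++) {
            score += weights[offset][chromosome[start + offset]];
        }
        return score >= threshold;
    }

    // A mistake is a bound non-site or an unbound site. isSite[pos] marks
    // positions where a binding site starts (same length as the chromosome).
    static int countMistakes(byte[] chromosome, boolean[] isSite,
                             int[][] weights, int threshold) {
        int mistakes = 0;
        for (int pos = 0; pos + weights.length <= chromosome.length; pos++) {
            if (binds(chromosome, pos, weights, threshold) != isSite[pos]) {
                mistakes++;
            }
        }
        return mistakes;
    }
}
[/code]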

Once a perfect creature has evolved, uncheck the Perform selection box and watch it deteriorate.

~~ Paul
 
Pchams said:
Sorry, I haven't followed the thread or math yet, but why do you use a population of 96?
Is this not well below accepted standards (or real life situations) of a viable genetic population?
It is small, but sufficient to evolve the control mechanism. The higher the population, the faster the evolution, but the slower the simulation.

~~ Paul
 
Thanks bunches, I'll come back to this this weekend when I'm not operating on 22 hours with no sleep.
 
OK, I'm no programmer. I have played around with it a bit. But not in a long time. So, I won't be offended when you laugh at what I'm gonna suggest :)

Could there be a problem with the (dang, I forget the term I'm looking for)... the number of digits(?) being used for the calculations (I'm thinking single/double precision type thing :-)

From 32,767 to 32,768 the binary number goes from 15 to 16 digits (32,767 Dec = 111111111111111 Bin, 32,768 Dec = 1000000000000000 Bin). Could that be causing the program (script?) to have problems? Perhaps with one of the derived numbers?
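
For what it's worth, 32,768 is exactly where a signed 16-bit integer rolls over, which is easy to demonstrate (an illustrative snippet, not code from EvJ):

[code]
// 32,768 is exactly where a signed 16-bit integer (Java's short) rolls over,
// while a 32-bit int handles it fine. Illustrative snippet, not from EvJ.
public class OverflowCheck {
    public static void main(String[] args) {
        int genomeSize = 32_768;
        short asShort = (short) genomeSize;   // overflows to -32768
        System.out.println("as int:    " + genomeSize);
        System.out.println("as short:  " + asShort);
        System.out.println("short max: " + Short.MAX_VALUE);  // 32767
    }
}
[/code]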

I tried to find the Java source, to scan over and see if I could determine anything, but I'm too stupid to even find it.

I wouldn't think there would be a problem of this nature until you hit 65,536 and go over to 17 binary digits. But well, I hada ask :blush:

I know nothing about Java script writing, and little about any others. But I know I ran into a few problems with double/single precision numbers in my days of writing BASIC programs at home, for fun (and anguish ;)

Anyway, for some reason that number (32,768) rang a bell in my feeble mind, and when I checked and saw what happens in binary there, well, I just had to ask. Sorry :)
 
Guy said:
From 32,767 to 32,768 the binary number goes from 15 to 16 digits (32,767 Dec = 111111111111111 Bin, 32,768 Dec = 1000000000000000 Bin). Could that be causing the program (script?) to have problems? Perhaps with one of the derived numbers?
Yes, I thought of that. As far as I can tell, there are no 16-bit integers where there should be 32-bit integers.

But to test my genetic load hypothesis, I'm running a model with a population of 12. The generations increased reasonably through a genome size of 5,775, then went through the roof at size 8,192 (800,000 generations). I'm now running a size of 11,550. When I'm done with a population of 12, I'll try 24.

Anyway, for some reason that number (32,768) rang a bell in my feeble mind, and when I checked and saw what happens in binary there, well, I just had to ask. Sorry.
Your mind is not that feeble, since its suspicion matched that of my extraordinary mind. :D

~~ Paul
 
hehe cool, I don't feel so dumb now ;)
 
With a population of 12, a genome of size 11,550 bases took 2,820,000 generations to evolve a perfect creature. I'm now running a genome size of 16,384. I'm interested to see if I can get to a genome size where a perfect creature can never evolve.

~~ Paul
 
I'm thinking that the population is way too small and that the mutation rate is way too high.

Consider:

A 100% mutation rate is equivalent to a completely random search, which is obviously bad. The higher the mutation rate is, the more information gets lost during Reproduction.

I realize that there is some debate in the GA world about how important Mutation is, but one thing is still certain: The important "work" is done by Crossover because that is how a population gets out of local maxima. If you use Mutation to do it, your mutation rate is in theory too high.

Think of Mutation like a hill climbing operation. You can increase the rate at which the population will climb hills by increasing the Mutation rate, but you can only go so far with it. Also, there are much better (more efficient) hill climbing algorithms although for your purposes I am guessing that you want to stick with simulating evolution.

Think of Crossover like a hill choosing operation. If Mom and Dad are on different hills, then Crossover is going to produce a member who is on neither hill... thereby "examining" new territory in the search space. The overall efficiency of the Crossover operation is controlled by population diversity, which quite frankly means that a bigger population is better.
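
For readers who haven't seen it, single-point crossover in a generic GA looks something like this (a textbook sketch; Ev itself replicates asexually and does not use crossover):

[code]
import java.util.Random;

// Textbook single-point crossover, as described above. Note that Ev itself
// replicates asexually, so this is a generic GA illustration, not Ev code.
// Assumes mom and dad have the same length.
public class CrossoverSketch {
    static byte[] crossover(byte[] mom, byte[] dad, Random rng) {
        byte[] child = new byte[mom.length];
        int cut = rng.nextInt(mom.length);                         // crossover point
        System.arraycopy(mom, 0, child, 0, cut);                   // head from Mom
        System.arraycopy(dad, cut, child, cut, dad.length - cut);  // tail from Dad
        return child;  // may sit on a "hill" that neither parent occupies
    }
}
[/code]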



The practical use of GA's is for finding the right hill, and when population diversity decreases to a certain point (they are all sitting on one or only a few hills), the practical computer scientist turns to algorithms better suited to hill climbing (gradient descent and so forth).
 
But why is there a sudden jump in the number of generations required to evolve a perfect creature? As the genome size is increased (by a factor of sqrt(2) in this case), the number of mutations per genome is increased by the same factor, so the probability of a mutation on any given base stays the same. Perhaps there really is no sudden increase, but rather the curve is fairly flat at the beginning and then rises sharply.

Or, perhaps it's a question of the sheer number of mutations. Even though they should be uniformly spread over the chromosome, there is some chance they bunch at the gene and/or binding sites. If there are enough of them, they trash the gene/sites completely.

Clearly the "actual curve" involves both the genome size and the population. Seems to me the number of generations is proportional to the genome size and inversely proportional to the population. So after I collect more data, I'll try fitting genome/population vs. generations.

I should not forget that whatever this turns out to be may have nothing to do with real evolution.

~~ Paul
 
Paul said:
But why is there a sudden jump in the number of generations required to evolve a perfect creature? As the genome size is increased (by a factor of sqrt(2) in this case), the number of mutations per genome is increased by the same factor, so the probability of a mutation on any given base stays the same. Perhaps there really is no sudden increase, but rather the curve is fairly flat at the beginning and then rises sharply.

I believe the question you should concern yourself with is the probability that the mutations applied to a new member of the population are beneficial.

Increasing the size of the genome doesn't necessarily mean you should increase the number of mutations.

Imagine a genome that is (N = 2^32 = 4,294,967,296) in size... how many mutations will you be doing?

A million of them, maybe? That certainly doesn't sound like a good idea. The probability that a million mutations will make it a better genome is extremely, extremely small. To make this point obvious, imagine that the pre-mutation genome is precisely 1 change away from perfection...
 
Rockoon said:
Increasing the size of the genome doesn't necessarily mean you should increase the number of mutations.

Imagine a genome that is (N = 2^32 = 4,294,967,296) in size... how many mutations will you be doing?
In nature, it appears that the number of point mutations to a genome varies with its size. Certainly the rate I'm using is much too high, but it allows the simulations to proceed at a reasonable rate.

However, I have been completely ignoring another factor: the binding site information capacity. These simulations use a 6-base binding site, so its information capacity (Rcapacity) is 12 bits. The number of bits of information needed to just match the binding sites (Rfrequency) varies as the genome size, ranging from 4 bits for 256 bases to 11 bits for 32,768 bases. These sudden jumps in generations that I'm seeing occur around the point where the evolved information (Rsequence) is 9.5--11 bits. Clearly a perfect creature cannot evolve at all if Rfrequency > Rcapacity, but what happens as Rfrequency approaches Rcapacity?
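
For reference, here is a small sketch of those two quantities computed directly (assuming the 16 binding sites used in these runs; Rsequence has to be measured from the evolved sites, so it isn't computed here):

[code]
// The two quantities above, computed directly. Assumes the 16 binding sites
// used in these runs; Rsequence must be measured from the evolved sites, so
// it is not computed here.
public class InfoCapacity {
    // A site of `width` bases can carry at most 2 bits per base.
    static double rCapacity(int width) {
        return 2.0 * width;
    }

    // Bits needed just to pick out numSites positions among genomeSize.
    static double rFrequency(int genomeSize, int numSites) {
        return Math.log((double) genomeSize / numSites) / Math.log(2);
    }

    public static void main(String[] args) {
        System.out.println("Rcapacity (6-base site) = " + rCapacity(6));           // 12.0
        System.out.println("Rfrequency at G=256     = " + rFrequency(256, 16));    // ~4 bits
        System.out.println("Rfrequency at G=32,768  = " + rFrequency(32_768, 16)); // ~11 bits
    }
}
[/code]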

~~ Paul
 
Paul said:
In nature, it appears that the number of point mutations to a genome varies with its size. Certainly the rate I'm using is much too high, but it allows the simulations to proceed at a reasonable rate.

There are multiple ways to skin a cat. Are you sure that you've chosen the best method?

My experience with GA's is from an applied/practical standpoint: problem solving rather than theoretical research into evolution. A GA, to me, is just another search algorithm that happens to be fairly well suited to large search spaces with lots of local maxima.

Often I run a GA without *ANY* mutation at all. It still converges quickly given a large initial (random) population.

Paul said:
However, I have been completely ignoring another factor: the binding site information capacity. These simulations use a 6-base binding site, so its information capacity (Rcapacity) is 12 bits. The number of bits of information needed to just match the binding sites (Rfrequency) varies as the genome size, ranging from 4 bits for 256 bases to 11 bits for 32,768 bases. These sudden jumps in generations that I'm seeing occur around the point where the evolved information (Rsequence) is 9.5--11 bits. Clearly a perfect creature cannot evolve at all if Rfrequency > Rcapacity, but what happens as Rfrequency approaches Rcapacity?

~~ Paul

You are using a bunch of terms I don't grok here... "binding site" and so forth... sounds to me like you are trying to simulate actual evolution, and I am afraid I really don't have any experience with that. In computer science a GA is nothing more than a search function and simply doesn't mess around with complexities that don't actually aid in the search.
 
