Tomatoville® Gardening Forums



Forum area for discussing hybridizing tomatoes in technical terms and information pertinent to trait/variety specific long-term (1+ years) growout projects.

September 17, 2015   #1
crmauch
Tomatovillian™
Join Date: May 2013
Location: Honey Brook, PA Zone 6b
Posts: 399

Numbers of Plants (Probabilities)

I thought I'd post some calculations on the number of plants needed to recover a set of genes. These calculations are most useful for recovering recessive genes, but can be used to "verify" that your dominant gene is pure breeding.
(Carol Deppe's book "Breed Your Own Vegetable Varieties" has a nice chart you can use, but I thought that being able to figure out your own probabilities, and knowing how many plants you need to grow for a given probability of success, would be useful.)

This distinguishes between Chance and Probability. Chance is the percentage likelihood of an outcome in a single trial (one plant, one coin flip), while Probability is the likelihood that the outcome turns up at least once over a given number of trials. For example, there is a 50% Chance of getting tails when you flip a coin, but you could flip several times and get only heads; how many flips would it take to have a 99% Probability of getting at least one tails?

(My assumptions about which genes are dominant and which are recessive may not be completely accurate; they are for example only.)

Single Gene

Let's say you know you have a cross between a red tomato (RR) and a yellow tomato (rr). The resulting new plant is red (Rr), and you want to grow out its seeds and get a yellow (rr) tomato.

Selfing the (Rr) tomato will yield 3 genotypes and 2 phenotypes. The genotypes are RR, Rr, and rr; the phenotypes are red and yellow. From a Punnett Square, you'd know you have a 25% chance of getting a yellow in any one plant. If you grew 4 plants, what would your probability be of having a yellow show up?

The equation is:

$$P(\text{yellow}) = 1 - \left(\text{Chance}(\text{red})\right)^{n}$$

where n is the number of plants.
So if we plug in the numbers: P(yellow) = 1 - (Chance(red))^n = 1 - (0.75)^4 ≈ 68% probability of getting a yellow.
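As a quick check of the arithmetic above, here is a minimal Python sketch of the same formula. The function name `prob_at_least_one` is just an illustrative choice, and, like the post, it assumes every plant is an independent draw with the same per-plant chance.

```python
def prob_at_least_one(chance: float, n: int) -> float:
    """Probability that at least one of n plants shows an outcome whose
    per-plant chance is `chance`: 1 - (1 - chance)**n.

    Assumes every plant is an independent draw with the same chance.
    """
    if not 0.0 <= chance <= 1.0:
        raise ValueError("chance must be between 0 and 1")
    if n < 0:
        raise ValueError("n must be a non-negative number of plants")
    return 1.0 - (1.0 - chance) ** n


if __name__ == "__main__":
    # Selfed Rr plant: each seedling has a 1-in-4 chance of being yellow (rr).
    for plants in (1, 2, 4, 6, 10):
        print(f"{plants:>2} plants: {prob_at_least_one(0.25, plants):.1%}")
```

The 4-plant line reproduces the 68% figure (68.4% before rounding).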

Now suppose you know the probability you want, but need to know how many plants to grow to reach it.

Then the equation becomes:

$$n = \frac{\ln\left(1 - P(\text{yellow})\right)}{\ln\left(\text{Chance}(\text{red})\right)}$$

(If your calculator has a base-10 log() function instead of the natural log ln, you can use that instead; just use it on both the top and bottom of the equation, since the choice of base cancels out in the ratio.)

So if we wanted an 80% probability of getting a yellow, it would work out like this:

80%: n = ln(1 - P(yellow)) / ln(Chance(red)) = ln(1 - 0.8)/ln(0.75) ≈ 5.6, so 6 plants (rounding up, since you can't grow part of a plant)

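Likewise, for:

95%: n ~ 10 plants
99%: n ~ 16 plants

(These "~" figures, like the ones further down, drop the fractional part of n; for 95% the formula actually gives about 10.4. To be sure of reaching at least the stated probability, round up instead, e.g. 11 plants for 95%.)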
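To get the number of plants directly, the second equation can be wrapped in a small function. This is a sketch (the name `plants_needed` is made up for illustration); it rounds n up, so it reports the smallest whole number of plants that actually reaches the target probability, which is why it can come out one plant higher than the "~" figures in the post.

```python
import math


def plants_needed(chance: float, target: float) -> int:
    """Smallest whole number of plants n with 1 - (1 - chance)**n >= target.

    chance: per-plant chance of the wanted outcome (0.25 for a yellow from Rr)
    target: desired probability of seeing it at least once (e.g. 0.95)
    """
    if not (0.0 < chance < 1.0 and 0.0 < target < 1.0):
        raise ValueError("chance and target must be strictly between 0 and 1")
    n = math.log(1.0 - target) / math.log(1.0 - chance)
    # Round up (you can't grow part of a plant); the tiny tolerance keeps
    # floating-point noise from bumping an exact whole number up by one.
    return math.ceil(n - 1e-9)


if __name__ == "__main__":
    for target in (0.80, 0.95, 0.99, 0.999):
        raw = math.log(1.0 - target) / math.log(0.75)
        print(f"{target:.1%}: n = {raw:.1f} -> grow {plants_needed(0.25, target)} plants")
```

For the single-gene yellow this gives 6, 11, 17 and 25 plants for 80%, 95%, 99% and 99.9%.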

How would you use this to verify that your tomato is pure-breeding for red?

99.9%: n ~ 24 plants (If you grew 24 plants from your selfed red tomato and no yellows showed up, you could be fairly certain that no hidden yellow gene lurks in its genome.)
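Seen the other way round, the same numbers say how likely a hidden carrier is to slip through. A minimal sketch, assuming the red plant is selfed and a single recessive gene with the 1-in-4 ratio above (the function name is illustrative). Note that this is the chance of being fooled if the plant really is Rr, not the chance that it is Rr; the latter would also depend on how likely it was to be a carrier to begin with.

```python
def chance_carrier_missed(n_plants: int, recessive_chance: float = 0.25) -> float:
    """If the red parent is actually Rr, the chance that all n selfed
    seedlings still come up red, so the hidden r goes unnoticed."""
    return (1.0 - recessive_chance) ** n_plants


if __name__ == "__main__":
    for n in (6, 10, 16, 24):
        print(f"{n:>2} all-red seedlings: {chance_carrier_missed(n):.2%} "
              "chance an Rr parent slipped through")
```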

Two Genes

[Note that determinate is usually written as sp and indeterminate as sp+, but I'm going to use I for indeterminate and i for determinate.]

So assume your starting tomato is a red indeterminate (RrIi) carrying the recessive yellow and determinate genes, and you want to recover a yellow determinate (rrii):

There are 16 combinations on the Punnett Square (9 distinct genotypes) and 4 phenotypes (red, indet; red, det; yellow, indet; yellow, det), with a 1/16 (0.0625) chance of a yellow determinate.

Number of plants for a certain probability:
80%: n ~ 24 plants
95%: n ~ 46 plants
99%: n ~ 71 plants
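The 1/16 comes from multiplying the two single-gene chances (1/4 yellow × 1/4 determinate), which is valid only if the genes assort independently (no linkage). A sketch of that step plus the plant count, using Python's standard `fractions` module to keep the ratios exact; the helper names are illustrative.

```python
import math
from fractions import Fraction


def combined_chance(*chances: Fraction) -> Fraction:
    """Chance that one plant shows every wanted trait at once.
    Assumes independent assortment (unlinked genes)."""
    result = Fraction(1)
    for c in chances:
        result *= c
    return result


def plants_needed(chance: float, target: float) -> int:
    """Smallest whole n with 1 - (1 - chance)**n >= target (rounded up)."""
    n = math.log(1.0 - target) / math.log(1.0 - chance)
    return math.ceil(n - 1e-9)


if __name__ == "__main__":
    rrii = combined_chance(Fraction(1, 4), Fraction(1, 4))  # yellow AND determinate
    print(f"chance of rrii: {rrii} = {float(rrii)}")
    for target in (0.80, 0.95, 0.99):
        print(f"{target:.0%}: grow {plants_needed(float(rrii), target)} plants")
```

Rounded up, this gives 25, 47 and 72 plants, one more than the truncated figures in the list above.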

Three Genes

Things kind of 'blow up' at 3 genes, where the number of plants required to 'recover' all 3 genes gets very large. Let's say we want a yellow, determinate, potato-leaf tomato. (The standard gene symbol for potato leaf is c, where C is regular leaf.) So we'd be looking for rriicc. There are 64 combinations on the Punnett Square and only a 1/64 (0.015625) chance of getting all three recessives in one plant.

Number of plants for a certain probability:
60%: n ~ 58 plants
70%: n ~ 76 plants
80%: n ~ 102 plants
90%: n ~ 146 plants
95%: n ~ 190 plants
99%: n ~ 292 plants
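One way to see why things 'blow up': under the same independence assumption, each extra recessive gene multiplies the per-plant chance by 1/4, so the number of plants needed grows roughly fourfold per gene. A sketch (illustrative helper name) that rounds up and also reproduces the three-gene table:

```python
import math


def plants_needed(chance: float, target: float) -> int:
    """Smallest whole n with 1 - (1 - chance)**n >= target (rounded up)."""
    n = math.log(1.0 - target) / math.log(1.0 - chance)
    return math.ceil(n - 1e-9)


if __name__ == "__main__":
    # Plants needed for a 95% probability as the number of recessive genes grows,
    # assuming unlinked genes, each with a 1-in-4 chance in a selfed plant.
    for genes in range(1, 5):
        chance = 0.25 ** genes
        print(f"{genes} gene(s), chance 1/{4 ** genes}: {plants_needed(chance, 0.95)} plants")

    # The three-gene (rriicc, 1/64) table, rounded up instead of truncated.
    for target in (0.60, 0.70, 0.80, 0.90, 0.95, 0.99):
        print(f"{target:.0%}: {plants_needed(1 / 64, target)} plants")
```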

Unless you have a lot of land and time, or are a professional breeder, it's probably much easier to breed for 2 characteristics and then backcross or outcross for the 3rd.

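Note that you can also spread the search over time: grow 30 plants this year, 30 the next, and so on, until you reach the probability you're looking for (and hopefully you'll hit your result along the way). This works as long as each year's plants come from the same segregating seed lot, so that every plant has the same chance.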
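Spreading the search over several seasons works because the misses multiply: after N plants in total, the chance of never seeing the target is still (1 - chance)^N, however the plants are split across years. A sketch, assuming every season's plants come from the same segregating seed lot (the 30-plants-a-year schedule and the helper name are only illustrations):

```python
from typing import List


def cumulative_probability(chance: float, plants_per_season: List[int]) -> List[float]:
    """Running probability of at least one success after each season.

    Assumes all plants come from the same segregating seed lot, so each
    has the same independent per-plant chance.
    """
    results = []
    total = 0
    for n in plants_per_season:
        total += n
        results.append(1.0 - (1.0 - chance) ** total)
    return results


if __name__ == "__main__":
    # Hunting for the 1-in-64 rriicc plant, 30 plants per year.
    for year, p in enumerate(cumulative_probability(1 / 64, [30] * 7), start=1):
        print(f"after year {year} ({30 * year} plants): {p:.1%}")
```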

Further Reading:
These two related sites about probability were useful to me in developing this:
http://montessorimuddle.org/2015/03/...probabilities/
and
http://montessorimuddle.org/2013/05/...-are-the-odds/

This all assumes 'regular' Mendelian genetics with simple dominance, no linkage, etc. There are at least 12 exceptions to standard Mendelian genetics:

http://anthro.palomar.edu/mendel/mendel_3.htm

(There's an interesting note at the bottom about a plant that can reacquire genes its ancestors had but that are not in its own DNA!)

Hope this was helpful/useful.

BTW, on the coin-flip question: if you want a 99% probability of getting at least one tails, how many flips would you need? (Of course, you might get it sooner.)

n = ln(1 - 0.99)/ln(0.5) ≈ 6.6, so 7 flips
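The coin-flip answer can be checked the same way: the raw n is about 6.6, and rounding up to 7 is what actually clears 99%, since 6 flips only reach about 98.4%. A tiny sketch (the function name is illustrative):

```python
import math


def flips_needed(target: float, chance: float = 0.5) -> int:
    """Smallest number of flips giving at least `target` probability of one or more tails."""
    n = math.log(1.0 - target) / math.log(1.0 - chance)
    return math.ceil(n - 1e-9)


if __name__ == "__main__":
    n = flips_needed(0.99)
    print(f"{n} flips: {1 - 0.5 ** n:.2%} chance of at least one tails")
    print(f"{n - 1} flips: {1 - 0.5 ** (n - 1):.2%}")
```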

Last edited by crmauch; September 17, 2015 at 03:52 PM. Reason: misplaced decimal (Incorrect calculations)

September 17, 2015   #2
Fusion_power
Tomatovillian™
Join Date: Feb 2006
Location: Alabama
Posts: 2,250

Nice writeup. Linkage can be both helpful and harmful. Many of the paste genes are on chromosome 5. If you are selecting for paste tomatoes, most of the required traits will travel together on that chromosome. This means that chromosome 5 linkage might turn a calculation for 5 genes into something more like a calculation for 1 gene. On the other hand, if two genes are tightly linked and one of them is wanted while the other must be excluded, you may wind up having to grow thousands of plants to get just one with the desired crossover.
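To put rough numbers on the "thousands of plants" point, here is a sketch under textbook assumptions that are not in the post: a selfed F1 carrying the wanted and unwanted alleles on the same chromosome (coupling phase), with recombination fraction r between the two loci. Each gamete then carries the wanted allele without the unwanted one with chance r/2, and an F2 plant has at least one such chromosome with chance 1 - (1 - r/2)^2. The r values are hypothetical, the helper names are illustrative, and a plant found this way will usually still need to be selfed again to get the crossover chromosome homozygous.

```python
import math


def crossover_plant_chance(r: float) -> float:
    """Chance that a selfed F2 plant carries at least one chromosome with the
    wanted allele but not the linked unwanted one.

    Assumes a coupling-phase F1 and recombination fraction r (0 < r <= 0.5,
    where 0.5 means unlinked); each gamete carries that chromosome with chance r / 2.
    """
    gamete = r / 2.0
    return 1.0 - (1.0 - gamete) ** 2


def plants_needed(chance: float, target: float) -> int:
    """Smallest whole n with 1 - (1 - chance)**n >= target (rounded up)."""
    n = math.log(1.0 - target) / math.log(1.0 - chance)
    return math.ceil(n - 1e-9)


if __name__ == "__main__":
    # Hypothetical recombination fractions, from loosely to very tightly linked.
    for r in (0.10, 0.01, 0.001):
        c = crossover_plant_chance(r)
        print(f"r = {r}: per-plant chance {c:.4%}, {plants_needed(c, 0.95)} F2 plants for 95%")
```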

September 17, 2015   #3
bower
Tomatovillian™
Join Date: Feb 2012
Location: Newfoundland, Canada
Posts: 6,794

Pretty cool calcs, Chris.
My gut tells me that 6 is a good number for nailing the one in four.
With 6 plants, the chance of two recessives is also nearly 1 in 3, a fair chance if you'd really like to find more than one. The chance of finding 2 recessives in 6 plants is as good as the chance of failing to find one recessive in 4 plants.
With a dozen plants, you have a little better than 50% chance of finding two recessives. So for gamblers or optimists who don't have much space, the dozen is the magic number where your chance of finding two is better than your chance of failure.

One thing that surprised me is that the p(fail) for growing 4 plants to find 1 in 4 is .3164, but the p(fail) for growing 16 plants to find 1 in 16 is .3560 (as jotted from my extremely basic calculator using your first equation).
This hardly seems fair. My gut (again) tells me their odds should be equal, or if anything, skewed the other way. Why? Because four is such a small sample, the chance of not living up to its formally calculated odds seems more likely in reality, I think. ?????

September 18, 2015   #4
crmauch
Tomatovillian™
Join Date: May 2013
Location: Honey Brook, PA Zone 6b
Posts: 399

Quote:
Originally Posted by bower
Pretty cool calcs, Chris.
Thanks!


Quote:
Originally Posted by bower
My gut tells me that 6 is a good number for nailing the one in four.
With 6 plants, the chance of two recessives is also nearly 1 in 3, a fair chance if you'd really like to find more than one. The chance of finding 2 recessives in 6 plants is as good as the chance of failing to find one recessive in 4 plants.
I have to work the numbers to understand (I'm slow that way):

(Finding 1 recessive in 6 plants)
P = 1 - (0.75)^6 = 82%
Those are pretty good odds!


(Finding a 1-in-16 double recessive in 6 plants)
P = 1 - (0.9375)^6 = 32%
I agree with your statement, but I really like my probabilities above 50%.


Quote:
Originally Posted by bower
With a dozen plants, you have a little better than 50% chance of finding two recessives. So for gamblers or optimists who don't have much space, the dozen is the magic number where your chance of finding two is better than your chance of failure.
P = 1 - (0.9375)^12 = 54% -- Yep. Correct.

Quote:
Originally Posted by bower
One thing that surprised me is that the p(fail) for growing 4 plants to find 1 in 4 is .3164, but the p(fail) for growing 16 plants to find 1 in 16 is .3560 (as jotted from my extremely basic calculator using your first equation).
P(f) = 1 - (1 - (0.75)^4) = 0.3164

P(f) = 1 - (1 - (0.9375)^16) = 0.3561

Again you are correct.

Quote:
Originally Posted by bower
This hardly seems fair. My gut (again) tells me their odds should be equal, or if anything, skewed the other way. Why? because four is such a small sample, the chance of not living up to its formally calculated odds is more likely in reality, I think. ?????
'Artoo, you know better than to trust a strange gut!'
I do think you have a point that 'sampling error' could be a factor at small sample sizes.
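The gap bower noticed follows from the formula itself: growing k plants to find a 1-in-k outcome fails with probability (1 - 1/k)^k, and that value creeps up toward 1/e ≈ 0.368 as k grows, so the small one-in-four case is actually the more favorable one. A short sketch that prints the trend (the function name is illustrative):

```python
import math


def fail_chance(k: int) -> float:
    """Probability of no success when growing k plants for a 1-in-k outcome."""
    return (1.0 - 1.0 / k) ** k


if __name__ == "__main__":
    for k in (4, 16, 64, 256, 1024):
        print(f"1 in {k:>4}, {k:>4} plants: p(fail) = {fail_chance(k):.4f}")
    print(f"limit as k grows: 1/e = {1 / math.e:.4f}")
```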

September 25, 2015   #5
BreedersUnited
Tomatovillian™
Join Date: May 2015
Location: North Dakota
Posts: 10

So here's my question:

This is a useful equation, but what I'm interested in is this: if you have a sense of allele frequencies in a population (based on previously reported data) and you can hypothesize gene action (dominance vs. epistasis), can you still use this equation to determine the number of plants needed?

September 26, 2015   #6
carolyn137
Moderator Emeritus
Join Date: Jan 2006
Location: Upstate NY, zone 4b/5a
Posts: 21,169

I am extremely allergic to math and always have been. All the kids take calculus in high school now, but I was supposed to take it in my freshman year at college, and it wasn't until I was planning my senior year that my advisor told me I could not graduate unless I took it. So there I was, a senior in freshman calculus. I passed the darn course minimally and then burned the book in a wonderful flaming ceremony.

All that to say: I think many here know who Keith Mueller is, Fusion for sure, and many others because of the varieties Keith has bred. I think Fusion would agree with me that Keith probably knows more tomato genetics than most folks. He got his MS degree with Dr. Randy Gardner in NC and has continued to delve into tomato genetics through the years.

And I do think the way he presents the numbers of plants to set out, and the probabilities, might help others reading this who may also hate math. Kudos to you, Chris, you do.

First, his home page:

http://www.kdcomm.net/~tomato/

On that first page he lists the varieties he's bred, which many of you will know, and what else he's doing:

http://www.kdcomm.net/~tomato/releases/

Next, his way of illustrating segregation, which has helped many; from this page, follow on to the next pages, which deal with F generations (Fgens) and probabilities.

http://kdcomm.net/~tomato/gene/genes.html

Click on Segregation at the bottom of the above page and keep going from there to the next page, etc.

Keith's website is a treasure trove of information, so when you have time, click on Culture, on how to make crosses, and on all the other links he gives that might interest you.

I hope the above helps someone somewhere.

Carolyn

September 26, 2015   #7
BreedersUnited
Tomatovillian™
Join Date: May 2015
Location: North Dakota
Posts: 10

Thanks, Carolyn, for the above links, but for me that information is fairly repetitive and doesn't really answer my question.

I work in other crops that are primarily outcrossing, and while I'm no population geneticist, I find myself applying many of its principles in my work. I'm more interested in determining the number of plants that need to be screened if you have a sense of allele frequencies (whole populations), rather than just evaluating progenies from one cross.

The traits of interest are both dominant (single-gene) and quantitatively inherited.

September 26, 2015   #8
carolyn137
Moderator Emeritus
Join Date: Jan 2006
Location: Upstate NY, zone 4b/5a
Posts: 21,169

Quote:
Originally Posted by BreedersUnited
Thanks, Carolyn, for the above links, but for me that information is fairly repetitive and doesn't really answer my question.

I work in other crops that are primarily outcrossing, and while I'm no population geneticist, I find myself applying many of its principles in my work. I'm more interested in determining the number of plants that need to be screened if you have a sense of allele frequencies (whole populations), rather than just evaluating progenies from one cross.

The traits of interest are both dominant (single-gene) and quantitatively inherited.
No, it doesn't answer your question, and I'm sorry I didn't note that it was meant for those who had posted above your question and for anyone else reading but not posting.

I didn't know the answer to your question and hope that others will be able to answer you.

Carolyn

September 26, 2015   #9
bower
Tomatovillian™
Join Date: Feb 2012
Location: Newfoundland, Canada
Posts: 6,794

Hi BreedersUnited,
Welcome to T'ville.

I'm not a population geneticist either, but I believe the probability equation should be the same for an outcrossing population as it is for an inbred population. If you know your allele frequency, that's the same as knowing by Mendelian ratios what frequency is expected in a tomato cross, as in the example.
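bower's point can be made concrete with a sketch. Assuming Hardy-Weinberg proportions and a single gene, a population where the trait allele has frequency q shows a recessive trait in q^2 of plants and a dominant trait in 1 - (1 - q)^2 of plants; that phenotype frequency then takes the place of the Mendelian chance in Chris's formula. With q = 0.5, as in an F2, it gives back the familiar 1/4 and 3/4. The function names and example frequencies are illustrative.

```python
import math


def phenotype_frequency(q: float, dominant: bool) -> float:
    """Expected share of plants showing the trait under Hardy-Weinberg
    proportions, where q is the frequency of the trait allele."""
    if not 0.0 <= q <= 1.0:
        raise ValueError("allele frequency must be between 0 and 1")
    if dominant:
        return 1.0 - (1.0 - q) ** 2  # q^2 + 2q(1 - q): at least one copy
    return q ** 2                     # two copies needed


def plants_needed(chance: float, target: float) -> int:
    """Smallest whole n with 1 - (1 - chance)**n >= target (rounded up)."""
    n = math.log(1.0 - target) / math.log(1.0 - chance)
    return math.ceil(n - 1e-9)


if __name__ == "__main__":
    for q in (0.5, 0.2, 0.05):
        for dominant in (True, False):
            freq = phenotype_frequency(q, dominant)
            kind = "dominant" if dominant else "recessive"
            print(f"q = {q}, {kind}: frequency {freq:.4f}, "
                  f"{plants_needed(freq, 0.95)} plants for 95%")
```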

It may be a different story for QTLs, though. In tomato, at least, where QTLs are involved there are usually multiple genes involved, and we don't necessarily know how many; we may have an idea based on whatever has been published lately. The picture keeps changing as more research is done. Obviously, the more complex the genetics of the trait, the more plants you would need to grow to find the trait sought; fruit size is one such QTL-governed trait that was recently discussed here, for example.

If you said more about the crops you're working with, the QTLs involved, what traits you're screening for, and what kind of 'screening'... maybe there's someone who can give you a better answer.

September 26, 2015   #10
BreedersUnited
Tomatovillian™
Join Date: May 2015
Location: North Dakota
Posts: 10

Thanks bower.

I think the assumption here is that the population is in HWE. I know that's not true: assortative mating occurs readily based on flowering time, and migration occurs because it's wind-pollinated. So ASSUMING that the population is in HWE, the equation should be:

n = log(1 - prob. of success) / log(1 - frequency of success)

I would be evaluating seedlings germinated from these populations, meaning that I have to take that into account. So...

n = log(1 - prob. of success) / log[1 - (allele freq. x F)].

F is then equal to the degree of inbreeding amongst the progeny. Making the assumption that everyone has equal chances and the plants are random-mating, this should be relatively low.

Last edited by BreedersUnited; September 26, 2015 at 07:47 PM.
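For comparison with the F term above, the usual textbook expression (Wright's inbreeding coefficient F) for the frequency of recessive homozygotes is q^2 + Fpq, with p = 1 - q; at F = 0 it reduces to the Hardy-Weinberg q^2, and inbreeding raises it. A sketch using that textbook form, not the "allele freq. x F" term in the post, with illustrative names and a hypothetical allele frequency:

```python
import math


def recessive_homozygote_freq(q: float, f: float = 0.0) -> float:
    """Expected frequency of aa plants with allele frequency q and inbreeding
    coefficient f, using the textbook q^2 + f*p*q (f = 0 is Hardy-Weinberg)."""
    if not (0.0 <= q <= 1.0 and 0.0 <= f <= 1.0):
        raise ValueError("q and f must be between 0 and 1")
    p = 1.0 - q
    return q * q + f * p * q


def plants_needed(chance: float, target: float) -> int:
    """Smallest whole n with 1 - (1 - chance)**n >= target (rounded up)."""
    n = math.log(1.0 - target) / math.log(1.0 - chance)
    return math.ceil(n - 1e-9)


if __name__ == "__main__":
    q = 0.1  # hypothetical allele frequency taken from earlier data
    for f in (0.0, 0.05, 0.25):
        freq = recessive_homozygote_freq(q, f)
        print(f"F = {f}: aa frequency {freq:.4f}, {plants_needed(freq, 0.95)} plants for 95%")
```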

September 26, 2015   #11
bower
Tomatovillian™
Join Date: Feb 2012
Location: Newfoundland, Canada
Posts: 6,794

Quote:
Originally Posted by BreedersUnited
Thanks bower.

I think the assumption here is that the population is in HWE. I know that's not true: assortative mating occurs readily based on flowering time, and migration occurs because it's wind-pollinated. So ASSUMING that the population is in HWE, the equation should be:

n = log(1 - prob. of success) / log(1 - frequency of success)

I would be evaluating seedlings germinated from these populations, meaning that I have to take that into account. So...

n = log(1 - prob. of success) / log[1 - (allele freq. x F)].

F is then equal to the degree of inbreeding amongst the progeny. Making the assumption that everyone has equal chances and the plants are random-mating, this should be relatively low.
So, if I understand correctly, you are just using Hardy-Weinberg as a means of estimating the allele frequencies? Or are you testing for HWE to assess the degree of inbreeding?
I don't know much about it, but as far as I can tell, an HWE test requires a large sample size. And there can be problems even with large samples, for example:
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC1199378/
A formula is given involving sample size n in the background section:
http://www.scielo.br/scielo.php?pid=...pt=sci_arttext
It's been so long since I studied Statistics, I can barely read that stuff.
So many years in the surplus information storage space, aka my 'gut', have been lossy.

September 26, 2015   #12
BreedersUnited
Tomatovillian™
Join Date: May 2015
Location: North Dakota
Posts: 10

No, I have data from prior studies that I will utilize to determine allele frequencies of the populations I'm sampling from. My primary assumption is that the population is in HWE and the allele frequencies have not changed since the study was published.

Given that information, I will also assume that the individuals in question are not inbred (or only slightly inbred), and thus I should be able to determine the number of plants to be screened based on the parameters of gene action and allele frequencies.

Does that make sense?

September 26, 2015   #13
bower
Tomatovillian™
Join Date: Feb 2012
Location: Newfoundland, Canada
Posts: 6,794

For sure, that makes sense to me.

What I like about Chris's formulas is that they're simple and, besides giving a number of plants, also give the percent confidence for that sample size.

For practical reasons we have to start with a reasonable lower limit to the number of plants. If the results aren't as you expected, or cause any concern, you can always go to a larger sample size to double check.

Then again, an expert may tell you that you have to follow some completely different formula. I hope not.

Would love to hear about your tests and results if you ever have time to tell us about it...

September 27, 2015   #14
Fusion_power
Tomatovillian™
Join Date: Feb 2006
Location: Alabama
Posts: 2,250

There is an article from IFAS about a breeding program to move tomato mosaic virus tolerance into an elite breeding line. Long story short, 20,000 plants were not enough to break the linkage.

So my answer to BreedersUnited's question is that doing the math is only effective if you understand the genetics involved.

Breeding for multiple low effect genes rapidly runs into either linkage or else the number of plants becomes larger than most growers can support.

I know a guy who is currently working in the breeding industry who could probably help. He is a skilled population geneticist. I'll have to decide if I want to bring him into this discussion. I don't know if he would even be interested in participating on this forum.

September 27, 2015   #15
BreedersUnited
Tomatovillian™
Join Date: May 2015
Location: North Dakota
Posts: 10

Quote:
Originally Posted by Fusion_power
There is an article from IFAS about a breeding program to move tomato mosaic virus tolerance into an elite breeding line. Long story short, 20,000 plants were not enough to break the linkage.

So my answer to BreedersUnited's question is that doing the math is only effective if you understand the genetics involved.

Breeding for multiple low effect genes rapidly runs into either linkage or else the number of plants becomes larger than most growers can support.

I know a guy who is currently working in the breeding industry who could probably help. He is a skilled population geneticist. I'll have to decide if I want to bring him into this discussion. I don't know if he would even be interested in participating on this forum.
I get what you're saying about numbers, and I've run several permutations on how many plants might be needed based on the different traits of interest and the number of genes involved.

What I'm looking for is confirmation of my hypothesis about how the equation can be used. I would like to validate it, but I don't want to head off on a wild-goose chase if I can somehow circumvent the issue altogether.

I would first look at single gene traits and afterwards expand my screens to more complex, polygenic traits.

All times are GMT -4.


★ Tomatoville® is a registered trademark of Commerce Holdings, LLC ★ All Content ©2022 Commerce Holdings, LLC ★