INTRODUCTION TO ASSOCIATION MAPPING

We have a set of inbred lines or varieties. We have genotyped them with a large set of markers. We also have phenotypic data on the lines for several traits. And now what?

We will take advantage of linkage disequilibrium (LD) to identify genetic regions associated with our trait of interest. Association mapping is also called linkage disequilibrium mapping.

Identify associations between markers and phenotypes without the need to develop specific mapping populations.
[Figure: genotypes (alleles A and B) of Lines 1 to 16 at markers ordered by map distance, shown next to each line's disease severity (scale 5 to 10).]
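The simplest form of this idea is a single-marker test. For each marker, it compares the phenotype of lines carrying one allele with the phenotype of lines carrying the other. Below is a minimal sketch. It assumes inbred lines scored as allele A or B and uses a least-squares regression of the phenotype on a 0/1 marker code. The function name, the marker names and the disease scores are hypothetical and are not the data in the figure. Real analyses also account for population structure and kinship.

```python
import math

def single_marker_test(calls, phenotypes):
    """Least-squares regression of phenotype on one marker (allele A = 0, B = 1).

    Lines with a missing call (None) are skipped.
    Returns (allele effect, t statistic, r squared).
    """
    pairs = [(1.0 if c == "B" else 0.0, float(y))
             for c, y in zip(calls, phenotypes) if c is not None]
    n = len(pairs)
    if n < 3:
        raise ValueError("need at least three genotyped lines")
    x_mean = sum(x for x, _ in pairs) / n
    y_mean = sum(y for _, y in pairs) / n
    sxx = sum((x - x_mean) ** 2 for x, _ in pairs)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in pairs)
    syy = sum((y - y_mean) ** 2 for _, y in pairs)
    if sxx == 0:
        return 0.0, 0.0, 0.0  # monomorphic marker: nothing to test
    effect = sxy / sxx              # mean of B lines minus mean of A lines
    ss_model = effect * sxy         # variation explained by the marker
    rss = max(syy - ss_model, 0.0)  # residual sum of squares
    se = math.sqrt(rss / (n - 2) / sxx)
    if se > 0:
        t = effect / se
    else:
        t = 0.0 if effect == 0 else math.copysign(math.inf, effect)
    r2 = ss_model / syy if syy > 0 else 0.0
    return effect, t, r2

# Hypothetical disease severity scores for 8 lines
severity = [9, 8, 9, 10, 5, 4, 5, 6]
markers = {
    "marker_1": ["A", "A", "A", "A", "B", "B", "B", "B"],
    "marker_2": ["A", "B", "A", "B", "A", "B", "A", "B"],
}
for name, calls in markers.items():
    effect, t, r2 = single_marker_test(calls, severity)
    print(f"{name}: effect = {effect:+.2f}, t = {t:.2f}, r2 = {r2:.2f}")
```

In this toy data, marker_1 separates resistant from susceptible lines and gets a large t statistic. marker_2 splits both groups evenly and shows no effect.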

The definition of linkage disequilibrium is very simple: it is the "non-random association of alleles at different loci".
[Figure: alleles A/a at Locus 1 and B/b at Locus 2, shown in equilibrium and in disequilibrium.]

Equilibrium: a random-mating population with loci segregating independently.
Disequilibrium: a non-random-mating population, with LD due to selection, mutation, drift/sampling, or population structure.
[Figure: Loci 1 to 5 in equilibrium and in disequilibrium.]

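How do we measure LD? LD is measured with a parameter called D. If alleles at different loci are not inherited independently, then $P_{AB} \neq P_A P_B$ and

$$D_{AB} = P_{AB} - P_A P_B$$

where $P_A$ and $P_B$ are allele frequencies and $P_{AB}$ is the haplotype frequency.

Standardized measures of LD are $D'$ and $r^2$:

$$D' = \frac{|D_{AB}|}{D_{\max}}, \qquad D_{\max} = \begin{cases} \min(P_A P_B,\; P_a P_b) & \text{for } D_{AB} < 0 \\ \min(P_A P_b,\; P_a P_B) & \text{for } D_{AB} > 0 \end{cases}$$

$$r^2 = \frac{D_{AB}^2}{P_A P_a P_B P_b}$$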

Example with 30 lines genotyped at Locus 1 (alleles A, a) and Locus 2 (alleles B, b):
[Figure: genotypes of Lines 1 to 30 at Locus 1 and Locus 2.]
Allele frequencies: $P_A = 10/30$, $P_a = 20/30$, $P_B = 15/30$, $P_b = 15/30$
Haplotype frequencies: $P_{AB} = 9/30$, $P_{aB} = 6/30$, $P_{Ab} = 1/30$, $P_{ab} = 14/30$
$D_{AB} = P_{AB} - P_A P_B = 9/30 - (10/30)(15/30) \approx 0.13$
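The sketch below recomputes the 30-line example and adds the two standardized measures. It assumes the standard definitions: Lewontin's $D'$, reported as an absolute value, and $r^2 = D^2 / (P_A P_a P_B P_b)$. The function name `ld_stats` is hypothetical. The counts are the haplotype counts listed above. The $D'$ and $r^2$ values come from this calculation and are not stated on the slide.

```python
def ld_stats(p_AB, p_A, p_B):
    """D, |D'| and r^2 for two biallelic loci.

    p_AB: frequency of haplotype AB; p_A, p_B: frequencies of alleles A and B.
    """
    p_a, p_b = 1.0 - p_A, 1.0 - p_B        # frequencies of alleles a and b
    d = p_AB - p_A * p_B                    # D_AB = P_AB - P_A * P_B
    if d > 0:
        d_max = min(p_A * p_b, p_a * p_B)
    else:
        d_max = min(p_A * p_B, p_a * p_b)
    d_prime = abs(d) / d_max if d_max > 0 else 0.0
    denom = p_A * p_a * p_B * p_b
    r2 = d * d / denom if denom > 0 else 0.0
    return d, d_prime, r2

# Haplotype counts from the 30-line example
n = 30
counts = {"AB": 9, "aB": 6, "Ab": 1, "ab": 14}
p_AB = counts["AB"] / n
p_A = (counts["AB"] + counts["Ab"]) / n   # 10/30
p_B = (counts["AB"] + counts["aB"]) / n   # 15/30
d, d_prime, r2 = ld_stats(p_AB, p_A, p_B)
print(f"D = {d:.2f}, D' = {d_prime:.2f}, r2 = {r2:.2f}")  # D = 0.13, D' = 0.80, r2 = 0.32
```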

[Figure: LD decay in two-row spring barley, chromosome 5H; x-axis: distance (bp).]

Extent of LD (Flint-Garcia et al., Annu. Rev. Plant Biol. 2003. 54:357–74):
Outcrossing species
  Humans: 80 kb (Europeans), 5 kb (Nigerians)
  Cattle: > 10 cM
  Maize: 1 kb (diverse maize), 1.5 kb (diverse inbred lines), > 100 kb (elite lines)
Selfing species
  Arabidopsis: 250 kb
  Barley: up to 100 kb

Factors that increase LD:
- mutation
- mating system (self-pollination)
- population structure
- admixture
- relatedness (kinship)
- small founder population size or genetic drift
- selection (natural, artificial, and balancing)
Factors that decrease LD:
- high recombination and mutation rate
- recurrent mutations
- outcrossing

Mutation provides the original material for producing polymorphisms that will be in LD. When a new allele b appears at Locus 2 on a gamete carrying allele A at Locus 1, A and b will appear together.
[Figure: haplotypes at Locus 1 (A/a) and Locus 2 (B/b) before and after the mutation.]

Mating system: LD generally decays more rapidly in outcrossing species than in selfing species, where individuals are likely to be homozygous. In selfing species, most recombination occurs between identical haplotypes because of high individual homozygosity, so these events do not reduce LD. Selfing therefore reduces the rate at which LD breaks down: when loci are closely linked in a selfing population, they remain in high LD for many generations.
[Figure: LD under selfing with little or no recombination versus outcrossing with high recombination; panel labels: Outcrossing = 0.00, Selfing = 0.99, Little recombination = 0.05, High recombination = 0.5.]
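The effect of the mating system can be sketched numerically, assuming an infinite population with no drift. Under random mating, $D$ shrinks by a factor $(1 - c)$ each generation. For partial selfing, the sketch uses an approximation: the recombination fraction $c$ is replaced by $c(1 - F)$, where $F = s/(2 - s)$ is the equilibrium inbreeding coefficient for selfing rate $s$. The idea is that recombination only breaks LD in heterozygous individuals. The function name is hypothetical. The selfing and recombination values are the ones labeled in the figure.

```python
def ld_remaining(c, s, generations):
    """Fraction of the initial D left after a number of generations (rough sketch).

    c: recombination fraction between the two loci.
    s: selfing rate (0 = random outcrossing, close to 1 = almost complete selfing).
    Random mating: D_t = D_0 * (1 - c)**t. For partial selfing we ASSUME the
    approximation c_eff = c * (1 - F), with F = s / (2 - s), because
    recombination only breaks LD in heterozygous individuals.
    """
    f = s / (2.0 - s)
    c_eff = c * (1.0 - f)
    return (1.0 - c_eff) ** generations

for label, s in (("outcrossing", 0.0), ("selfing", 0.99)):
    for c in (0.05, 0.5):
        left = ld_remaining(c, s, generations=10)
        print(f"{label:<11} c = {c:<4} D after 10 generations = {left:.3f} x D_0")
```

With nearly complete selfing, even unlinked loci (c = 0.5) keep most of their LD after 10 generations. Under outcrossing, the same LD is essentially gone.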

Drift / sampling: in small populations, genetic drift results in the loss of rare allelic combinations, which increases LD. Sampling increases or reduces certain allelic combinations by chance.
Selection: strong selection at a locus is expected to reduce diversity and increase LD in the surrounding region, and alleles at loci flanking the selected gene can be carried to fixation along with it. Selection can also cause LD between unlinked loci, a typical result of co-selection of loci during breeding for multiple traits.
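The sampling effect can be shown with a short simulation. Lines are drawn from a population in which the two loci are in linkage equilibrium, so the true D is 0. The code then measures how much |D| appears in the sample by chance. This is a sketch of a single round of sampling, not drift accumulating over generations. The function name, the allele frequencies (0.5), the sample sizes and the seed are arbitrary choices for illustration.

```python
import random

def sample_abs_d(n_lines, p_A=0.5, p_B=0.5, rng=None):
    """|D| in a random sample of n_lines from a population in linkage equilibrium.

    Alleles at the two loci are drawn independently, so the true D is 0;
    any LD in the sample is produced by sampling alone.
    """
    if rng is None:
        rng = random.Random()
    n_A = n_B = n_AB = 0
    for _ in range(n_lines):
        has_A = rng.random() < p_A
        has_B = rng.random() < p_B
        n_A += has_A
        n_B += has_B
        n_AB += int(has_A and has_B)
    return abs(n_AB / n_lines - (n_A / n_lines) * (n_B / n_lines))

rng = random.Random(42)
for n in (10, 30, 300):
    mean_abs_d = sum(sample_abs_d(n, rng=rng) for _ in range(1000)) / 1000
    print(f"{n:3d} lines: mean |D| = {mean_abs_d:.3f}")
```

The smaller the sample, the larger the chance LD between loci that are truly independent.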

What information do we need for an association mapping analysis?
Genotypic:
- linkage disequilibrium decay
- number of markers and marker density
- quality of the data: missing values, minor allele frequency
Phenotypic:
- quantitative or qualitative traits
- heritability of the trait, repeatability
Population:
- structure
- kinship

Genotypic information: linkage disequilibrium decay. The power of detection is strongly influenced by the LD between the QTL and the marker.
[Figure: two panels of r2 against physical distance, with ticks at 10 kb and 100 kb.]

Marker density: the extent of LD indicates the expected r2 at a given distance. It is therefore important to choose an adequate marker density to increase the power of detection; the faster LD decays, the more closely spaced the markers must be.
[Figure: two panels of r2 against physical distance, with ticks at 10 kb and 100 kb.]
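One way to turn an LD decay curve into a marker-density target is sketched below. First, find the distance at which the average r2 drops below a threshold. Then place roughly one marker per such interval. The r2 threshold of 0.2, the "one marker per decay distance" rule, the genome size and both decay curves are assumptions for illustration. The function names are hypothetical.

```python
import math

def ld_decay_distance(curve, threshold=0.2):
    """Distance at which mean r^2 first falls below `threshold`.

    curve: (distance_kb, mean_r2) pairs sorted by distance, e.g. averages of
    pairwise r^2 in distance bins. Uses linear interpolation between points.
    Returns None if r^2 never drops below the threshold.
    """
    for (d0, r0), (d1, r1) in zip(curve, curve[1:]):
        if r0 >= threshold > r1:
            return d0 + (r0 - threshold) * (d1 - d0) / (r0 - r1)
    return None

def markers_needed(genome_kb, decay_kb):
    """Rule of thumb (assumed): roughly one marker per LD-decay distance."""
    return math.ceil(genome_kb / decay_kb)

# Hypothetical decay curves: fast decay (~10 kb scale) and slow decay (~100 kb scale)
fast = [(0, 0.8), (5, 0.4), (10, 0.15), (20, 0.05)]
slow = [(0, 0.9), (50, 0.5), (100, 0.3), (200, 0.1)]
genome_kb = 100_000  # hypothetical genome size
for name, curve in (("fast", fast), ("slow", slow)):
    decay = ld_decay_distance(curve)
    print(f"{name}: r2 < 0.2 after {decay:.0f} kb -> about {markers_needed(genome_kb, decay)} markers")
```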

Quality of the data, number of individuals: with small sample sizes, the probability of a significant association between a marker and a QTL arising by chance is high.
[Figure: genotypes (alleles A and B) of Lines 1 to 16 at markers ordered by map distance, with disease severity (scale 5 to 10).]

The same comparison with only 8 lines:
[Figure: genotypes of Lines 1 to 8 at the same markers, with disease severity (scale 5 to 10).]

Quality of the data: minor allele frequency.
[Figure: genotypes of Lines 1 to 26 at Locus 1 (a/A) and Locus 2 (b/B), with the heading date of each line.]
Locus 1: average heading date for allele b = 78.8; for allele B = 152.
Locus 2: average heading date for allele b = 87.7; for allele B = 89.3.
Two loci can be completely unlinked and still show high LD.
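A common safeguard is to compute the minor allele frequency (MAF) of every marker and drop markers below a cut-off before testing. Estimates for an allele carried by only one or two lines rest on very few observations. The sketch assumes biallelic markers scored in inbred lines. It uses a cut-off of 0.05, which is a frequently used value but an assumption here. The function names and the marker data are hypothetical.

```python
def minor_allele_frequency(calls):
    """MAF of a biallelic marker scored in inbred lines (None = missing)."""
    observed = [c for c in calls if c is not None]
    alleles = set(observed)
    if len(alleles) < 2:
        return 0.0  # monomorphic or no data
    freq = observed.count(min(alleles)) / len(observed)
    return min(freq, 1.0 - freq)

def filter_by_maf(markers, min_maf=0.05):
    """Keep markers whose MAF reaches min_maf (0.05 is an assumed cut-off)."""
    return {name: calls for name, calls in markers.items()
            if minor_allele_frequency(calls) >= min_maf}

# Hypothetical panel of 26 lines
markers = {
    "locus_1": ["b"] * 25 + ["B"],       # allele B carried by a single line
    "locus_2": ["b"] * 13 + ["B"] * 13,
}
for name, calls in markers.items():
    print(name, round(minor_allele_frequency(calls), 3))
print("kept:", sorted(filter_by_maf(markers)))
```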

Quality of the data: missing data.
[Figure: genotypes at Locus 1 and Locus 2 and heading date for Lines 1 to 26; missing genotype calls shown as "-".]
Locus 1: average heading date for allele b = 76.2; for allele B = 102.8.
Locus 2: average heading date for allele b = 87.7; for allele B = 89.3.
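In the simplest handling, lines with a missing call are left out of that marker's allele averages, and markers with a low call rate are dropped. Imputation is a common alternative and is not shown. The sketch makes that bookkeeping explicit. The 0.8 call-rate threshold, the function name and the genotype and heading-date values are hypothetical.

```python
def allele_means(calls, phenotypes):
    """Mean phenotype for each allele, skipping lines with a missing call (None).

    Returns ({allele: mean phenotype}, call rate).
    """
    groups = {}
    for call, y in zip(calls, phenotypes):
        if call is not None:
            groups.setdefault(call, []).append(y)
    n_called = sum(len(ys) for ys in groups.values())
    means = {allele: sum(ys) / len(ys) for allele, ys in groups.items()}
    return means, n_called / len(calls)

MIN_CALL_RATE = 0.8  # hypothetical threshold for keeping a marker

# Hypothetical marker: two late-heading lines were not called
calls = ["b", None, "B", "b", None, "B"]
heading = [60, 150, 152, 58, 151, 62]
means, call_rate = allele_means(calls, heading)
print(means, f"call rate = {call_rate:.2f}", "keep" if call_rate >= MIN_CALL_RATE else "drop")
```

Leaving out the two uncalled lines removes two late-heading values, so both allele averages are based on a different set of lines than the full panel.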

Phenotypic information
Quantitative or qualitative traits: one or more QTL may be involved. The larger the effect of the QTL, the higher the power of detection. Quantitative traits usually involve many genes of small effect. Epistatic traits pose an additional problem.
Heritability of the trait, repeatability: $h^2 = V_G / V_P$, the ratio of genotypic to phenotypic variance (broad-sense heritability).
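With replicated trials, $V_G$ and $V_P$ can be estimated from a one-way analysis of variance of lines and replicates, since $E[MS_{lines}] = V_E + rV_G$. The plot-basis ratio $V_G/(V_G + V_E)$ is also the intraclass correlation between replicates of the same line, which is one common way to estimate repeatability. The sketch assumes a balanced trial with the same number of replicates per line. It ignores environments and blocks. The function name and the scores are hypothetical.

```python
def heritability_from_trial(trial):
    """Variance components from a balanced lines x replicates trial (one-way ANOVA).

    trial: {line: [replicate values]}; every line must have the same number r of replicates.
    Returns (V_G, V_E, H2 on a single-plot basis, H2 on a line-mean basis).
    """
    values = list(trial.values())
    g, r = len(values), len(values[0])
    means = [sum(v) / r for v in values]
    grand = sum(means) / g
    ms_lines = r * sum((m - grand) ** 2 for m in means) / (g - 1)
    ms_error = sum((y - m) ** 2 for v, m in zip(values, means) for y in v) / (g * (r - 1))
    v_g = max(0.0, (ms_lines - ms_error) / r)   # E[MS_lines] = V_E + r * V_G
    v_e = ms_error
    h2_plot = v_g / (v_g + v_e)        # intraclass correlation between replicates
    h2_mean = v_g / (v_g + v_e / r)    # heritability of line means over r replicates
    return v_g, v_e, h2_plot, h2_mean

# Hypothetical scores for three lines, two replicates each
trial = {"line_1": [10, 12], "line_2": [20, 22], "line_3": [30, 32]}
v_g, v_e, h2_plot, h2_mean = heritability_from_trial(trial)
print(f"V_G = {v_g:.1f}, V_E = {v_e:.1f}, H2 (plot) = {h2_plot:.3f}, H2 (mean) = {h2_mean:.3f}")
```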

The problem of epistatic traits
VRN1 and VRN2 are located on different chromosomes. There is no association between the individual genes (VRN1 or VRN2) and heading date; however, late heading occurs only when haplotype Ac is present.

Line  VRN1  VRN2  Heading date
1     a     c     62
2     A     c     152
3     a     c     59
4     a     c     58
5     A     D     60
6     a     c     60
7     a     D     57
8     a     c     64
9     A     c     151
10    a     D     59
11    a     D     58
12    a     c     152
13    a     c     60
14    A     c     151
15    a     c     58
16    A     c     149
17    A     D     64
18    a     c     58
19    A     c     154
20    a     c     58
21    a     D     63
22    a     c     60
23    A     c     153
24    a     c     58
25    a     c     57
26    a     c     64

Population structure: the classic example of confounding by population structure. A study of type 2 diabetes in two Native American tribes from Arizona found a correlation between a haplotype at the immunoglobulin G locus and reduced diabetes. However, further analysis showed that individuals with diabetes had a lower proportion of European ancestry, and that the haplotype associated with reduced diabetes was more prevalent in Europeans. When the analysis was restricted to individuals with similar European ancestry, the association was no longer detected (Knowler WC, et al. 1988. Am. J. Hum. Genet. 43:520–26).

Population structure: similar structure exists in plants. The breeding history of many important crop species and limited gene flow have created complex stratification within the germplasm. Different geographic origins of the germplasm cause population structure, because natural selection usually tends to fix alleles at many loci related to adaptation. The intended use of the crop, its growth habit, and certain morphological traits can also create structure. This is a common cause of spurious associations.
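A noise-free, hypothetical example shows how structure alone can create a spurious association. The two subpopulations differ both in the frequency of a marker allele and in mean heading date, but the marker has no effect inside either group. Pooling all lines produces a large allele difference. Comparing alleles within each subpopulation removes it, which is the idea behind adding subpopulation terms to the model. The function names, the counts and the phenotypes are illustrative.

```python
def mean(values):
    return sum(values) / len(values)

def allele_difference(rows):
    """Mean phenotype of lines carrying allele A minus lines carrying allele a."""
    with_A = [y for _, allele, y in rows if allele == "A"]
    with_a = [y for _, allele, y in rows if allele == "a"]
    return mean(with_A) - mean(with_a)

# Hypothetical panel: (subpopulation, allele at a marker unlinked to the trait, heading date).
# Inside each subpopulation the marker has no effect; the subpopulations differ in
# both allele frequency and mean heading date (e.g. different geographic origin).
panel = (
    [("pop1", "A", 100)] * 16 + [("pop1", "a", 100)] * 4
    + [("pop2", "A", 60)] * 4 + [("pop2", "a", 60)] * 16
)
pooled = allele_difference(panel)
within = {pop: allele_difference([row for row in panel if row[0] == pop])
          for pop in ("pop1", "pop2")}
print("pooled A - a:", pooled)
print("within subpopulations:", within)
```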

How can we allocate individuals to sub-populations? First, we need to know in advance how many sub-populations there are. If this is unknown, it can be estimated: the allocation process is repeated for different possible numbers of sub-populations, and the best-fitting number is selected.
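The allocation step itself can be sketched as a likelihood-based assignment test. This is not the algorithm used by STRUCTURE, and it does not choose the number of sub-populations. It assumes that the number of sub-populations and each one's allele frequencies are already known. It also assumes that loci are independent within a sub-population and that each inbred line carries one allele per locus. Each line goes to the sub-population under which its genotype is most likely. The function name and the frequencies are hypothetical.

```python
import math

def assign_line(alleles, pop_freqs):
    """Assign an inbred line to the subpopulation under which its genotype is most likely.

    alleles: 0/1 allele code at each locus.
    pop_freqs: {subpopulation: frequency of allele 1 at each locus}, all strictly between 0 and 1.
    Loci are assumed independent within each subpopulation.
    """
    def log_likelihood(freqs):
        return sum(math.log(f if x == 1 else 1.0 - f) for x, f in zip(alleles, freqs))

    scores = {pop: log_likelihood(freqs) for pop, freqs in pop_freqs.items()}
    best = max(scores, key=scores.get)
    return best, scores

# Hypothetical allele frequencies at four loci in two subpopulations
pop_freqs = {"pop1": [0.9, 0.8, 0.9, 0.2], "pop2": [0.1, 0.3, 0.2, 0.7]}
for line in ([1, 1, 1, 0], [0, 0, 0, 1]):
    best, scores = assign_line(line, pop_freqs)
    print(line, "->", best)
```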

The computer program STRUCTURE uses computationally intensive methods to partition individuals into populations. Many individuals or lines will not belong uniquely to one population but will be descendants of crosses between two or more ancestral populations, so STRUCTURE also estimates the proportion of ancestry attributable to each population.
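To illustrate what an ancestry proportion means, the sketch estimates the proportions for a single inbred line with a simple expectation-maximization (EM) loop. This is not STRUCTURE's Bayesian MCMC method. It assumes a simplified model in which the sub-population allele frequencies are known and fixed. Under that model, each locus is inherited from sub-population k with probability q[k]. The function name and the frequencies are hypothetical.

```python
def ancestry_proportions(alleles, pop_freqs, iterations=200):
    """EM estimate of ancestry proportions q for one inbred line.

    Assumed model: at each locus the allele is inherited from subpopulation k
    with probability q[k]; subpopulation allele frequencies are known and fixed.
    alleles: 0/1 allele codes; pop_freqs: {subpopulation: frequency of allele 1 per locus}.
    """
    pops = list(pop_freqs)
    q = {k: 1.0 / len(pops) for k in pops}       # start from equal proportions
    for _ in range(iterations):
        totals = {k: 0.0 for k in pops}
        for locus, x in enumerate(alleles):
            lik = {k: pop_freqs[k][locus] if x == 1 else 1.0 - pop_freqs[k][locus]
                   for k in pops}
            denom = sum(q[k] * lik[k] for k in pops)
            for k in pops:
                totals[k] += q[k] * lik[k] / denom   # E-step: P(locus came from k)
        q = {k: totals[k] / len(alleles) for k in pops}  # M-step: average over loci
    return q

# Hypothetical frequencies: allele 1 is common in pop1 at the first five loci
# and common in pop2 at the last five loci
freqs = {"pop1": [0.9] * 5 + [0.1] * 5, "pop2": [0.1] * 5 + [0.9] * 5}
print(ancestry_proportions([1] * 5 + [0] * 5, freqs))  # looks like pure pop1
print(ancestry_proportions([1] * 10, freqs))           # half pop1-like, half pop2-like alleles
```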

The effect of kinship:

$$y = X\beta + Qv + Zu + e$$

- $X\beta$ includes all fixed effects: population means, environments, and marker allele effects.
- $Q$ is a subpopulation incidence matrix; $v$ are estimates of subpopulation mean effects.
- There is a degree of relatedness not captured by population structure: $u$ is the polygenic effect generated by other loci unlinked to the one being tested, and its covariance is modeled through the kinship matrix. $Z$ relates observations to lines.
- $e$ is the residual.
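Once the variance components are known, the fixed effects in this model can be estimated by generalized least squares. These include the marker effect and the subpopulation effects. The sketch shows only that estimation step and is not a full mixed-model package. It assumes one observation per line ($Z$ = identity), $\mathrm{Var}(u) = \sigma_g^2 K$ and $\mathrm{Var}(e) = \sigma_e^2 I$. It also assumes both variances are given; in practice they are estimated, for example by REML. $Q$ has one column fewer than the number of subpopulations because the intercept is in $X$. It uses NumPy. The function name, the kinship matrix, the variances and the phenotypes are hypothetical.

```python
import numpy as np

def gls_fixed_effects(y, X, Q, K, var_g, var_e):
    """Fixed effects for y = X b + Q v + u + e with known variance components.

    Assumptions: Z = identity (one record per line), Var(u) = var_g * K,
    Var(e) = var_e * I. Returns (estimates, standard errors) for [X | Q] columns.
    """
    W = np.hstack([X, Q])
    V = var_g * K + var_e * np.eye(len(y))   # phenotypic covariance among lines
    V_inv_W = np.linalg.solve(V, W)
    V_inv_y = np.linalg.solve(V, y)
    info = W.T @ V_inv_W
    coef = np.linalg.solve(info, W.T @ V_inv_y)
    se = np.sqrt(np.diag(np.linalg.inv(info)))
    return coef, se

# Hypothetical example: 6 lines, two subpopulations, noise-free phenotype
marker = np.array([0, 1, 0, 1, 0, 1], dtype=float)
pop2 = np.array([0, 0, 0, 1, 1, 1], dtype=float)
X = np.column_stack([np.ones(6), marker])        # intercept + marker allele
Q = pop2[:, None]                                # pop1 is absorbed in the intercept
block = np.full((3, 3), 0.5) + 0.5 * np.eye(3)   # kinship 0.5 among lines of a subpopulation
K = np.block([[block, np.zeros((3, 3))], [np.zeros((3, 3)), block]])
y = 50 + 5 * marker + 30 * pop2
coef, se = gls_fixed_effects(y, X, Q, K, var_g=2.0, var_e=1.0)
print("intercept, marker effect, subpopulation effect:", coef.round(3))
```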