Recombination Mapping SNP mapping


Trait Mapping: Recombination Mapping, SNP Mapping. BIO520 Bioinformatics, Jim Lund

Why do we care about variations? They underlie phenotypic differences, cause inherited diseases, and allow tracking of human history (ancient and modern).

Traits. Mendelian: single locus, few alleles; high penetrance, high expressivity; e.g. color, enzyme, molecular, genetic diseases (CF, hemophilia…). Quantitative: multiple alleles, multilocus; variable penetrance and expressivity; epistasis, environmental effects; e.g. blood pressure, weight, IQ...

Traits: how do we find their basis? Association of variance in the trait with variance in a gene; genetic linkage.

Basic Concepts. [Figure: cross of Parent 1 × Parent 2, each carrying haplotypes A B and a b, with the possible offspring haplotypes (A B, a b, A b, a B).] High LD -> no recombination (r² = 1): SNP1 “tags” SNP2. Low LD -> recombination: many possibilities.

Mapping Issues. Need many arbitrary, polymorphic markers for a dense map; molecular markers: RFLP, STS, SNP. Need many progeny: 100 progeny for a 1 cM map, 1000 for a 0.1 cM map (100 kb in mouse). Map distance varies (the ratio of kb/cM is not constant): centromere suppression, inversion suppression.
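The progeny rule of thumb above can be checked with a small calculation. This is a minimal sketch, assuming one informative meiosis per progeny and distances small enough that 1 cM corresponds to a recombination fraction of 0.01; the function names are illustrative and not from the slides.

```python
def expected_recombinants(n_progeny: int, distance_cm: float) -> float:
    """Expected number of recombinant progeny between two markers.

    Assumes one informative meiosis per progeny and a distance small
    enough that 1 cM equals a recombination fraction of 0.01.
    """
    return n_progeny * distance_cm / 100.0


def progeny_for_one_recombinant(distance_cm: float) -> float:
    """Number of progeny needed to expect a single recombinant."""
    return 100.0 / distance_cm


# 100 progeny at 1 cM: about one recombinant expected.
print(expected_recombinants(100, 1.0))
# 0.1 cM resolution: about 1000 progeny needed.
print(progeny_for_one_recombinant(0.1))
```

Because only about one recombinant is expected at these sample sizes, the figures are lower bounds on what a real cross would need.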

Genetic crosses. Model organisms, e.g. fungi: no problem. Humans: it is a rare woman who will bear >5, >10 children, and controlled breeding is problematic.

Alternate Mapping. Pedigree analyses: likelihood estimation; the original method, now less common. Population-based mapping: association studies, linkage disequilibrium.

Pedigree Analysis: Likelihood Method (LOD scores). A LOD of 3-4 corresponds to odds of 1000:1 to 10,000:1 in favor of linkage and a genome-wide p-value of p < .05. Hard to extend to <1 cM. According to Lander and Kruglyak (1995), the genome-wide false-positive rate, $\alpha_T^*$, is related to the pointwise false-positive rate, $\alpha_T$, by the equation $\alpha_T^* = [C + 9.2\,\rho G T]\,\alpha_T$, where T is the threshold lod score, C = 23 is the number of chromosomes, and G = 33 is the total length of the genome in Morgans. The parameter ρ measures the crossover rate and takes different values depending on the relationship being studied, so the formula cannot be simply applied to complex pedigrees. For affected sib pairs the formula suggests genome-wide lod score thresholds of 3.6 for IBD testing and 4.0 for IBS testing. Note that the formula applies strictly only to large samples and to stringent thresholds.
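To make the LOD score concrete, here is a minimal sketch for the simplest case only: phase-known, fully informative meioses, where the likelihood is θ^k (1-θ)^(n-k) for k recombinants in n meioses. Real pedigree programs handle unknown phase and missing data; the function names and the grid search are illustrative assumptions, not the method used in the lecture.

```python
import math


def lod_score(theta: float, n_meioses: int, n_recombinants: int) -> float:
    """LOD = log10[L(theta) / L(0.5)] for phase-known meioses."""
    if not 0.0 < theta <= 0.5:
        raise ValueError("theta must be in (0, 0.5]")
    k, n = n_recombinants, n_meioses
    # Log-likelihood under linkage at recombination fraction theta.
    log_l_theta = k * math.log10(theta) + (n - k) * math.log10(1.0 - theta)
    # Log-likelihood under no linkage (theta = 0.5).
    log_l_null = n * math.log10(0.5)
    return log_l_theta - log_l_null


def max_lod(n_meioses: int, n_recombinants: int, steps: int = 500):
    """Grid search over theta; returns (best LOD, best theta)."""
    grid = [0.5 * i / steps for i in range(1, steps + 1)]
    return max((lod_score(t, n_meioses, n_recombinants), t) for t in grid)


# 2 recombinants in 20 meioses: LOD peaks near theta = 0.1 at about 3.2.
best_lod, best_theta = max_lod(20, 2)
print(round(best_lod, 2), best_theta)
```

With these hypothetical counts the maximum LOD just clears the traditional threshold of 3 but falls short of the genome-wide thresholds quoted above.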

Cloning Human Genes. Strategies: positional, positional/candidate, candidate only, functional. Quantitative traits!

Complex diseases: association mapping. Disease gene: D, d. Marker: M, m. M is associated with D if the probability of an individual having the disease given that they have allele M is much greater than the probability of having the disease given that they have allele m, written as P(D|M) > P(D|m). Linkage between the gene and marker increases the likelihood of association. Association can be caused by: causation, population subdivision, statistical artifact, linkage disequilibrium. [Figure: disease locus D among markers M1 M2 M3 M4 M5 M6.]
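The comparison P(D|M) > P(D|m) can be sketched from a 2×2 table of counts. The example below is an illustration under stated assumptions: the counts are made up, each individual is classed by a single allele, and a Pearson chi-square test with 1 degree of freedom stands in for whatever test a real study would use.

```python
import math


def association_test(d_M: int, n_M: int, d_m: int, n_m: int):
    """Compare disease frequency in carriers of allele M vs allele m.

    d_M of n_M carriers of M are affected; d_m of n_m carriers of m are.
    Returns (P(D|M), P(D|m), chi-square, p-value). Assumes every expected
    cell count is positive.
    """
    p_d_given_M = d_M / n_M
    p_d_given_m = d_m / n_m
    table = [[d_M, n_M - d_M], [d_m, n_m - d_m]]
    rows = [n_M, n_m]
    total = n_M + n_m
    cols = [d_M + d_m, total - d_M - d_m]
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = rows[i] * cols[j] / total
            chi2 += (table[i][j] - expected) ** 2 / expected
    # Upper tail of a chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return p_d_given_M, p_d_given_m, chi2, p_value


# Hypothetical counts: 60 of 200 M carriers and 30 of 200 m carriers affected.
print(association_test(60, 200, 30, 200))
```

A small p-value shows only that an association exists; as the slide notes, population subdivision or a statistical artifact can produce one without linkage.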

Association Mapping. Pedigree sampled; many meioses (>10⁴). Resolution: 10⁻⁵ Morgans (kilobases). Limited by number of markers. [Figure: marker M and disease locus D separated by recombination fraction r, over 2N generations.]

Gene Mapping & the single mutation case. [Figure: disease mutation D and marker M on a chromosome at time t and now.]

Complicating factors. [Figure legend: major disease-causing mutation; minor disease-causing mutation; + has the disease.] Oversampled; non-genetic cause; incomplete penetrance.

Alzheimer's & Apolipoprotein E

Definition of QTL? A quantitative trait locus (QTL) is the location of one or more loci that affect a trait measured on a quantitative (linear) scale. Examples of quantitative traits are blood pressure and grain yield (measured on a balance). These traits are typically affected by more than one gene, and also by the environment. Thus, mapping QTL is not as simple as mapping a single gene that affects a qualitative trait (such as an inborn error of metabolism). QTL is the acronym for Quantitative Trait Loci, genes which underlie quantitative traits (Geldermann, 1975). http://gnome.agrenv.mcgill.ca/tinker/pgiv/whatis.htm

QTLs: interesting traits. Heritability often ~0.5. Traits like: heart disease, depression, type II diabetes, high blood pressure, arthritis. Most diseases!

QTLs: simple problems. Multiple testing: 30,000 markers tested at P-value = 0.01 give 299 false hits and 1 real one; correct for multiple testing. 2 QTLs near one another: “ghost” QTL between them.
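The arithmetic behind the multiple-testing problem is short enough to write out. This sketch uses a Bonferroni correction as one example; the slide only says "correct for multiple testing", so the choice of Bonferroni is an assumption. It is conservative when markers are linked.

```python
def expected_hits_under_null(n_markers: int, alpha: float) -> float:
    """Expected number of markers passing threshold alpha by chance alone."""
    return n_markers * alpha


def bonferroni_threshold(n_markers: int, family_alpha: float = 0.05) -> float:
    """Per-marker threshold that keeps the family-wise error near family_alpha."""
    return family_alpha / n_markers


# 30,000 markers at 0.01: about 300 hits expected, which the slide counts
# as 299 false hits plus 1 real one.
print(expected_hits_under_null(30_000, 0.01))
# Per-marker threshold for a 5% genome-wide error rate.
print(bonferroni_threshold(30_000))
```

Permutation testing, covered on a later slide, is an alternative that accounts for correlation between linked markers.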

Factors that lead to success in mapping QTLs: simple, easily quantified trait; genes of major effect; distinct chromosomal loci; well-defined map; large numbers of progeny (inbred, outbred).

Significance Thresholds by Permutation (Churchill and Doerge, 1994). H0: there is no QTL in the tested interval. H1: there is a QTL in the tested interval.
1. Permute the data (create the null hypothesis).
2. Perform interval mapping.
3. Repeat (1) and (2) many times.
4. Choose threshold.
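The permutation procedure can be sketched in a few lines. To keep the example self-contained, it swaps interval mapping for a much simpler single-marker statistic (the absolute difference in mean phenotype between two genotype classes); that substitution, the 0/1 genotype coding, the function names, and the simulated data are all assumptions made for illustration.

```python
import random


def marker_statistic(genotypes, phenotypes):
    """Absolute difference in mean phenotype between genotype classes 0 and 1."""
    g0 = [p for g, p in zip(genotypes, phenotypes) if g == 0]
    g1 = [p for g, p in zip(genotypes, phenotypes) if g == 1]
    if not g0 or not g1:
        return 0.0
    return abs(sum(g1) / len(g1) - sum(g0) / len(g0))


def genome_scan_max(marker_genotypes, phenotypes):
    """Largest statistic over all markers (one list of genotypes per marker)."""
    return max(marker_statistic(g, phenotypes) for g in marker_genotypes)


def permutation_threshold(marker_genotypes, phenotypes,
                          n_perm=1000, alpha=0.05, seed=0):
    """Genome-wide threshold: (1 - alpha) quantile of the permuted maxima."""
    rng = random.Random(seed)
    shuffled = list(phenotypes)
    null_max = []
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # step 1: break the genotype-phenotype link
        null_max.append(genome_scan_max(marker_genotypes, shuffled))  # step 2
    null_max.sort()
    index = min(n_perm - 1, int((1 - alpha) * n_perm))
    return null_max[index]  # step 4: choose the threshold


# Simulated data: 10 markers, 60 individuals, a strong effect at marker 3.
sim = random.Random(1)
genos = [[sim.randint(0, 1) for _ in range(60)] for _ in range(10)]
phen = [3.0 * genos[3][i] + sim.gauss(0, 1) for i in range(60)]
observed = genome_scan_max(genos, phen)
threshold = permutation_threshold(genos, phen)
print(observed, threshold)
```

Taking the maximum over the whole scan in each permutation is what makes the threshold genome-wide rather than pointwise.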

Human SNPs. About 10 million SNPs exist in human populations where the rarer SNP allele has a frequency of at least 1%. A set of associated SNP alleles in a region of a chromosome is called a "haplotype". SNPs are arranged in groups, and SNPs within groups show little recombination. Nonrandom association of SNPs results in only a few common haplotypes, and these patterns capture most of the variation in a region. The HapMap will describe the common patterns of genetic variation in humans. The HapMap Project will identify the associations between SNPs and identify the SNPs that tag them (tagSNPs).

SNP identification methods: pairwise sequence comparison; deep resequencing; high-throughput mismatch detection methods: denaturing high-performance liquid chromatography (DHPLC), single-strand conformational polymorphism (SSCP).

HapMap. Blocks of adjacent SNPs that show little recombination are called haplotype blocks. Mean haplotype block length is tens of kb. The HapMap project started by examining 270 individuals from 4 ethnic groups and is now expanding to a more comprehensive sample. Characterization of haplotype blocks means that fewer SNPs will need to be typed: 500,000 SNPs will identify 90% of haplotype blocks.

HapMap Glossary. LD (linkage disequilibrium): for a pair of SNP alleles, a measure of deviation from random association (i.e., a measure of lack of recombination); measured by D’, r², LOD. Phased haplotypes: estimated distribution of SNP alleles; alleles transmitted from Mom are in the same chromosome haplotype, while Dad’s form the paternal haplotype. Tag SNPs: minimum SNP set to identify a haplotype; r² = 1 indicates two SNPs are redundant, so each one perfectly “tags” the other.
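The LD measures named in the glossary follow from haplotype and allele frequencies using the standard definitions D = p(AB) - p(A)p(B), D’ = |D| / Dmax, and r² = D² / [p(A)(1-p(A))p(B)(1-p(B))]. The sketch below assumes phased haplotype frequencies are already known; the function name and example frequencies are illustrative.

```python
def ld_measures(p_AB: float, p_A: float, p_B: float):
    """Return (D, |D'|, r^2) for two biallelic SNPs.

    p_AB is the frequency of the A-B haplotype; p_A and p_B are the
    frequencies of allele A at SNP1 and allele B at SNP2.
    """
    if not (0.0 < p_A < 1.0 and 0.0 < p_B < 1.0):
        raise ValueError("both SNPs must be polymorphic")
    d = p_AB - p_A * p_B
    # Largest |D| possible given the allele frequencies and the sign of D.
    if d >= 0:
        d_max = min(p_A * (1 - p_B), (1 - p_A) * p_B)
    else:
        d_max = min(p_A * p_B, (1 - p_A) * (1 - p_B))
    d_prime = abs(d) / d_max if d_max > 0 else 0.0
    r2 = d * d / (p_A * (1 - p_A) * p_B * (1 - p_B))
    return d, d_prime, r2


# Only haplotypes A-B (30%) and a-b (70%) exist: each SNP tags the other.
print(ld_measures(0.30, 0.30, 0.30))
# Alleles associate at random: no LD.
print(ld_measures(0.15, 0.30, 0.50))
```

In the first case both D’ and r² come out at 1 (up to floating-point rounding), matching the glossary's statement that r² = 1 means the two SNPs are redundant.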

HapMap Project, HapMap International Consortium.
Phase 1: 269 samples (4 panels); genotyping centers: HapMap International Consortium; unique QC+ SNPs: 1.1 M; reference: Nature (2005) 437:p1299.
Phase 2: 270 samples; genotyping centers: Perlegen; unique QC+ SNPs: 3.8 M (phase I+II); reference: Nature (2007) 449:p851.
Phase 3: 1,115 samples (11 panels); genotyping centers: Broad & Sanger; unique QC+ SNPs: 1.6 M (Affy 6.0 & Illumina 1M); reference: Draft Rel. 1 (May 2008).

Phase 3 Samples * Population is made of family trios

SNP databases. dbSNP (NCBI): 12 million human SNPs, 5 million validated SNPs; SNP frequency information; mapped to the current genome build. http://www.ncbi.nlm.nih.gov/SNP/get_html.cgi?whichHtml=overview HapMap (haplotypes).

How to use markers to find disease? Starting point: a genome-wide, dense SNP marker map. Problem: genotyping cost precludes using millions of markers simultaneously for an association study. Question: how to select from all available markers a subset that captures most mapping information (marker selection, marker prioritization). The answer depends on the patterns of allelic association (haplotypes) in the human genome.

The promise for medical genetics. Within blocks, a small number of SNPs are sufficient to distinguish the few common haplotypes, so significant marker reduction is possible. [Figure: chromosome divided into blocks, with example sequences CACTACCGA, CACGACTAT, TTGGCGTAT.] If the block structure is a general feature of human variation structure, whole-genome association studies will be possible at a reduced genotyping cost. This motivated the HapMap project (Gibbs et al., Nature 2003).
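The marker-reduction idea can be illustrated by searching for the smallest set of positions that tells the common haplotypes apart. This sketch treats the three sequences shown on the slide as three common haplotypes of a single block, which is an assumption about the figure, and uses brute-force search, which is practical only for short blocks. The function name is illustrative.

```python
from itertools import combinations


def minimal_tag_positions(haplotypes):
    """Smallest set of positions whose alleles distinguish all haplotypes.

    Tries subsets in order of increasing size and returns the first that
    gives every distinct haplotype a unique signature.
    """
    length = len(haplotypes[0])
    if any(len(h) != length for h in haplotypes):
        raise ValueError("haplotypes must have equal length")
    n_distinct = len(set(haplotypes))
    for size in range(1, length + 1):
        for positions in combinations(range(length), size):
            signatures = {tuple(h[i] for i in positions) for h in haplotypes}
            if len(signatures) == n_distinct:
                return positions
    return tuple(range(length))


block = ["CACTACCGA", "CACGACTAT", "TTGGCGTAT"]
print(minimal_tag_positions(block))  # -> (0, 3)
```

Here 2 of the 9 positions are enough to identify which of the three haplotypes a chromosome carries, so the other 7 need not be genotyped.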

The promise for medical genetics. Discover genes contributing to complex diseases. Use these markers to test for inherited disease risk. Find SNPs associated with drug side effects. Make drugs safer. Rescue drugs abandoned due to significant side effects.

Pathway of Drug Development. [Figure: pipeline from lead or target (clinical candidate) through animal model testing (toxicity, efficacy), pre-clinical, Phase I (toxicity), Phase II (efficacy), and Phase III (efficacy) to NDA (new drug application). Cost and candidate-count labels, in the order extracted: $100M, 2000, $0.5M, 100, $0.5M, 20, $5M, 3, $50M, 2, 1.]

Why pharmacogenomics? Where do you find the next profitable drug? The 19/20 drugs that failed AFTER phase 1, but are still efficacious! How do you decrease the cost of clinical trials? Don’t enroll people of the “wrong” genotype! Only give drugs to patients likely to benefit and at a low genetic risk of side effects!