Cladistic Clustering of Haplotypes in Association Analysis. Jung-Ying Tzeng, Aug 27, 2004. Department of Statistics & Bioinformatics Research Center, North Carolina State University.

Presentation transcript:

1 Cladistic Clustering of Haplotypes in Association Analysis Jung-Ying Tzeng Aug 27, 2004 Department of Statistics & Bioinformatics Research Center North Carolina State University

2 Simple Disorder vs. Complex Disorder Peltonen and McKusick (2001). Science

3 Complex Disorders
• Liability genes = genes containing variants that increase disease liability
• Goal: look for such genes
• Rely more on epidemiological evidence: association analysis
• Case-control studies
• Detect liability genes by searching for association between disease status and genetic variants

4 Genetic Markers
• Instead of studying whole DNA sequences, we look at a subset of them: genetic markers
• SNP: Single Nucleotide Polymorphism
  - Pro: dense; bp
  - Con: binary variants (resolved by considering adjacent SNPs jointly)

5 Haplotype-based Association Analysis
• Haplotype = marker sequence (e.g. TCTC, CACA)
• Haplotype-based association analysis
[Figure: table of haplotype counts with rows Case and Control and columns Hap 1, Hap 2, Hap 3, ..., Hap k]
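
The case-by-haplotype table on this slide is the kind of data a test of homogeneity works on. As a minimal sketch (the slides do not say which statistic was used, so the Pearson chi-square here is an assumption, and the counts are made up for illustration), the statistic and its degrees of freedom for a 2 x k table can be computed as follows. Note how the degrees of freedom grow with the number of haplotypes k.

```python
def chi_square_homogeneity(case_counts, control_counts):
    """Pearson chi-square statistic and degrees of freedom for a 2 x k table."""
    if len(case_counts) != len(control_counts):
        raise ValueError("both groups need one count per haplotype")
    n_case = sum(case_counts)
    n_control = sum(control_counts)
    n_total = n_case + n_control
    stat = 0.0
    used = 0  # number of haplotypes observed in at least one group
    for a, b in zip(case_counts, control_counts):
        col = a + b
        if col == 0:
            continue  # haplotype unobserved in both groups
        used += 1
        for observed, row in ((a, n_case), (b, n_control)):
            expected = row * col / n_total
            stat += (observed - expected) ** 2 / expected
    return stat, used - 1


# Hypothetical counts for k = 4 haplotypes (illustrative, not from the slides).
cases = [120, 150, 90, 40]
controls = [100, 170, 100, 30]
stat, df = chi_square_homogeneity(cases, controls)
print(f"chi-square = {stat:.3f} on {df} degrees of freedom")
```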

6 Haplotype-based Association Analysis
• Problem: findings are not replicable
  - Under-powered (Lohmueller et al. 2003; Neale and Sham 2004)
• Solution:
  1. Use large samples (Lohmueller et al. 2003)
  2. Reduce the dimension of the parameter space

7 Dimensionality
• Haplotype distribution within a block (Daly et al. 2001, Nature Genetics)
• Method I: Truncating (tag SNPs)

8 Method II: Clustering (Molitor et al. 2003; Durrant et al. 2004)
• Evolutionary tree of haplotypes
• Minimize the haplotype distance within clusters

9 Method II: Clustering

10 Method II: Clustering

11 Method III: Cladistic Grouping (Templeton 1995; Seltman et al. 2003)
• Observed Hap = { 000, 001, 010, 100, 110, 101, 011, 111 }
[Figure: cladogram]
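
A cladogram links haplotypes that differ by a single mutation. As a rough sketch of that building block (the function names are hypothetical, and treating one-step neighbors as haplotypes at Hamming distance 1 is an assumption about how the slide's cladogram was drawn), the one-step neighbors of each observed haplotype can be listed like this:

```python
def hamming(a, b):
    """Number of marker positions at which two haplotypes differ."""
    return sum(x != y for x, y in zip(a, b))


def one_step_neighbors(haplotypes):
    """Map each haplotype to the haplotypes exactly one mutation away."""
    return {h: [g for g in haplotypes if hamming(h, g) == 1] for h in haplotypes}


# The observed haplotype set listed on the slide.
observed = ["000", "001", "010", "100", "110", "101", "011", "111"]
neighbors = one_step_neighbors(observed)
print(neighbors["110"])
```

With three binary markers, every haplotype has exactly three one-step neighbors, so the cladogram for this set is the edge graph of a cube.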

12 Desired Features
• Include all samples
• Incorporate both haplotype distance and age
  - High frequency → ancient (Crandall & Templeton 1995)
  - Low frequency → young
• Allow uncertainty in inferring the underlying evolutionary relationship

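13 Proposed Approach: Cladistic Clustering
• Possible Hap = { 000, 001, 010, 100, 110, 101, 011, 111 }
• Clades: $\pi_{(2)}$: { 110 }; $\pi_{(1)}$: { 000, 010, 111, 100 }; $\pi_{(0)}$: { 001, 011, 101 }
• $B_{(2)}$ allocates clade 2 to clade 1; $B_{(1)}$ allocates clade 1 to clade 0
• Recursion: $\pi_{(i)}^{*T} = \pi_{(i)}^{T} + \pi_{(i+1)}^{*T} B_{(i+1)}$, with $\pi^{*}$ equal to $\pi$ for the outermost clade
• Overall: $\pi^{*T} = \pi^{T} B = \pi_{(0)}^{T} I + \pi_{(1)}^{T} B_{(1)} + \pi_{(2)}^{T} B_{(2)} B_{(1)}$
[Figure: clade diagram with allocation weights p, 1-p, q1, q2, 1-q1-q2]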
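
The regrouping on this slide folds each outer clade into the next inner one until only the major nodes remain. The sketch below illustrates that one-step recursion with the three clades from the slide. The frequencies and allocation weights are made up for illustration, and the function name `regroup` and the dictionary layout are hypothetical, not the author's implementation.

```python
def regroup(pi_by_clade, B_by_clade):
    """Fold outer clades into the major nodes (clade 0), one step at a time.

    pi_by_clade[i] maps each haplotype in clade i to its frequency.
    B_by_clade[i] maps each haplotype in clade i to {haplotype in clade i-1: weight}.
    """
    last = len(pi_by_clade) - 1
    carried = dict(pi_by_clade[last])  # outermost clade: regrouped = original
    for i in range(last, 0, -1):
        inner = dict(pi_by_clade[i - 1])  # start from the clade's own frequencies
        for hap, mass in carried.items():
            for target, weight in B_by_clade[i][hap].items():
                inner[target] += mass * weight  # add the allocated mass
        carried = inner
    return carried


# Hypothetical frequencies (sum to 1) for the clades named on the slide.
pi_by_clade = [
    {"001": 0.30, "011": 0.25, "101": 0.20},               # clade 0: major nodes
    {"000": 0.08, "010": 0.06, "100": 0.05, "111": 0.04},  # clade 1
    {"110": 0.02},                                         # clade 2
]
# Hypothetical allocation weights; each row sums to 1.
B_by_clade = {
    1: {"000": {"001": 1.0}, "010": {"011": 1.0}, "100": {"101": 1.0},
        "111": {"011": 0.5, "101": 0.5}},
    2: {"110": {"010": 0.4, "100": 0.3, "111": 0.3}},
}
pi_star = regroup(pi_by_clade, B_by_clade)
print(pi_star)
```

Because every allocation row sums to 1, the regrouped frequencies still sum to 1: mass is moved, never lost.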

14 Issues
1. Determine major nodes $\pi_{(0)}$
2. Construct conditional allocating matrix $B_{(i)}$

15 Conditional Allocating Matrix $B_{(i)}$
• Clades: $\pi_{(2)}$: { 110 }; $\pi_{(1)}$: { 000, 010, 100, 111 }; $\pi_{(0)}$: { 001, 011, 101 }
• $B_{(2)} = C$, with entries c in [0, 1] set according to the likelihood of a one-step movement
• $\pi_{(1)}^{*T} = \pi_{(2)}^{T} B_{(2)} + \pi_{(1)}^{T}$
[Matrix entries not recoverable from the transcript]
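
One row of a conditional allocating matrix says how a haplotype's frequency is shared among haplotypes in the next inner clade. The transcript does not preserve how the weights are computed, so the sketch below makes a loud assumption: weight goes only to one-step neighbors (Hamming distance 1) and is proportional to each neighbor's frequency. The function name and the frequencies are hypothetical.

```python
def hamming(a, b):
    """Number of marker positions at which two haplotypes differ."""
    return sum(x != y for x, y in zip(a, b))


def allocation_row(hap, inner_freqs):
    """Weights for moving `hap` onto its one-step neighbors in the inner clade.

    Assumption (not from the slides): weights are proportional to the
    neighbors' frequencies and are normalized to sum to 1.
    """
    neighbors = {g: f for g, f in inner_freqs.items() if hamming(hap, g) == 1}
    total = sum(neighbors.values())
    if total == 0:
        raise ValueError(f"{hap} has no one-step neighbor in the inner clade")
    return {g: f / total for g, f in neighbors.items()}


# Hypothetical frequencies for the clade { 000, 010, 100, 111 }.
clade_1 = {"000": 0.08, "010": 0.06, "100": 0.05, "111": 0.04}
row = allocation_row("110", clade_1)
print(row)
```

Here 000 receives no weight because it is two mutations away from 110.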

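16 Conditional Allocating Matrix $B_{(i)}$
• $B_{(1)}$: allocation matrix with one row per clade-1 haplotype (rows for 111 and 010 are legible; entries not recoverable from the transcript)
• $\pi^{*T} = \pi_{(0)}^{T} + \pi_{(1)}^{T} B_{(1)} + \pi_{(2)}^{T} B_{(2)} B_{(1)}$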

17 Determine $\pi_{(0)}$
• Information criteria
• Net Information (Shannon's information content)
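
The transcript does not preserve the definition of net information, so it is not reproduced here. As background only, the sketch below computes the standard Shannon quantities it is built on: the information content of an outcome with probability p, and the entropy of a frequency vector. The function names are hypothetical.

```python
import math


def information_content(p):
    """Shannon information content, in bits, of an outcome with probability p."""
    return -math.log2(p)


def entropy(freqs):
    """Expected information content, in bits, of a frequency vector."""
    return sum(p * information_content(p) for p in freqs if p > 0)


# Four equally frequent haplotypes carry 2 bits each.
print(information_content(0.25), entropy([0.25, 0.25, 0.25, 0.25]))
```

Rare haplotypes have high information content individually but contribute little to the total because they are weighted by their frequency.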

18 Net Information and $\pi_{(0)}$

19 Association Analysis Based on $\pi^{*}$
• Coalescent simulation (Hudson 2002):
  - Prevalence = 0.01
  - Relative risk = 2
  - Frequencies of liability allele = (0.1, 0.3, 0.5)
  - Location of liability allele = (hot spot, blocky, very blocky)
  - Draw 200 cases and 200 controls
• Test of homogeneity based on $\pi^{*}_{cs}$ (cases) and $\pi^{*}_{cn}$ (controls)
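
To get a feel for the effect sizes these settings imply, the sketch below computes how often the liability allele is expected on case and control chromosomes. It is not the slides' coalescent simulation. It assumes a simplified model in which a chromosome carrying the allele has its disease risk multiplied by the relative risk, and the function name is hypothetical.

```python
def case_control_frequencies(p, relative_risk, prevalence):
    """Liability-allele frequency among case and control chromosomes.

    Assumes a simplified per-chromosome model: carriers have `relative_risk`
    times the risk of non-carriers, and overall risk equals `prevalence`.
    """
    denom = p * relative_risk + (1.0 - p)
    base_risk = prevalence / denom            # risk for a non-carrier
    carrier_risk = relative_risk * base_risk  # risk for a carrier
    case_freq = p * carrier_risk / prevalence
    control_freq = p * (1.0 - carrier_risk) / (1.0 - prevalence)
    return case_freq, control_freq


# Settings from the slide: prevalence 0.01, relative risk 2.
for p in (0.1, 0.3, 0.5):
    case_freq, control_freq = case_control_frequencies(p, 2.0, 0.01)
    print(f"p = {p}: cases {case_freq:.4f}, controls {control_freq:.4f}")
```

With a rare disease, the control frequency stays close to the population frequency, so nearly all of the contrast comes from the cases.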

20 Power and Type I error
[Figure panels: Gene Pelc; Gene IL01RB]

21 Summary
• Provide a mechanism of cladistic clustering by $\pi^{*T} = \pi^{T} B$
  - Combines the ideas of truncating and clustering
  - Based on evolutionary relationships without reconstructing the cladogram
  - Incorporates haplotype frequencies and distance in cluster assignment
  - One-step conditional regrouping can accommodate multiple-step regrouping: self-repeating, algebraically multiplicative
  - Reserves $\pi_{(0)}$ based on information criteria
• $\pi^{*}$ increases test efficiency
  - Increased power even for large samples and haplotypes in block regions

22 End of Slides

23 Approach
• Two stages:
  - Stage I (Where): identify the susceptibility regions across the genome (multiple testing problem). Approaches based on haplotype similarity.
  - Stage II (Which): determine and pinpoint the specific liability variants. Study individual effects of groups of haplotypes.

24 Strategies of Reducing Degrees of Freedom
I. Haplotype Similarity (Van Der Meulen and te Meerman 1997; Bourgain et al ; Tzeng et al. 2003ab)
• Search for extra haplotype sharing among cases
• Pro: 1 degree of freedom
• Con: does not study individual haplotype effects
• Usage: good for genome screening

25 Strategies of Reducing Degrees of Freedom
II. Haplotype Tagging (Johnson et al. 2001)
[Figure: table of haplotypes with frequencies (%) reduced to tag-SNP haplotypes 1 ACG, 2 .A., 3 T.., 4 ..A, 5 TA.; remaining entries not recoverable from the transcript]
• Pro: efficiently captures the major diversity
• Con: discards rare haplotypes

26 Strategies of Reducing Degrees of Freedom
III. Haplotype Clustering (Molitor et al. 2003; Seltman et al. 2001, 2003; Durrant et al. 2004)
• Similar haplotypes induce similar liability effects
• Cluster haplotypes and perform analysis based on clusters of haplotypes
• Pro: incorporates all data
• Con: may cluster two major haplotypes in the same group

27 Approach
• Two stages:
  - Stage I (Where): identify the susceptibility regions across the genome (multiple testing problem). Approaches based on haplotype similarity.
  - Stage II (Which): determine and pinpoint the specific liability variants. Study individual effects of groups of haplotypes.

28 Haplotype Grouping
• Focus on Stage II
• Combine the pros of haplotype tagging and clustering

29 Power and Type I error
[Figure panels: Gene Pelc; Gene IL01RB]