Parallel Genehunter: Implementation of a linkage analysis package for distributed memory architectures Michael Moran CMSC 838T Presentation May 9, 2003.



CMSC 838T – Presentation

Introduction
- Goals
  - Link genes to specific loci in the genome
  - Decrease time and memory requirements through parallelization
- Motivation
  - Locate genes for specific phenotypes
  - Test for inherited diseases and risk factors
  - Gene therapy

Talk Overview
- Introduction
- Talk Overview
- Genetic Linkage Problem
- Previous Work
- Parallel Genehunter
- Evaluation
- Observations

Genetic Linkage Problem
- Sexual reproduction
  - Offspring are created by two haploid gametes
  - Gametes are produced from diploid/polyploid cells during meiosis

Genetic Linkage Problem
- Recombination occurs in two ways
  1. Random segregation of chromatids
     - 2 x 23 human chromosomes => $2^{23}$ possible haploid combinations
     - Genes on different chromosomes recombine with probability 1/2

Genetic Linkage Problem
- Recombination occurs in two ways
  1. Random segregation of chromatids
  2. Crossover between homologous pairs of chromosomes
     - Genes on the same chromosome recombine with a probability that depends on their distance and location on the chromosome

Genetic Linkage Problem
Given
- This model of recombination
- Data for a particular pedigree (family)
  - Phenotype information for each individual
  - Genetic markers for each individual
- Recombination frequencies for each pair of markers
Can we apply probabilistic methods to
- Reconstruct the inheritance patterns
- Link phenotypes to the markers

Previous Work
- Fisher, Haldane, Smith, Morton ( )
  - Methods to infer genetic maps using maximum likelihood estimators
- Elston, Stewart (1971)
  - Genetic linkage algorithm
    - Linear in pedigree size
    - Exponential in number of markers
- Lander, Green (1987)
  - Genetic linkage algorithm
    - Linear in number of markers
    - Exponential in pedigree size

Previous Work
- Genehunter (2001)
  - Implementation of Lander & Green
  - Analyzes a pedigree containing n non-founders
  - The inheritance of a gene by one non-founder can be summarized by two bits
  - The entire pedigree's inheritance pattern can be summarized by 2n bits
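The two-bits-per-non-founder encoding can be sketched as follows. This is a minimal illustration, not Genehunter's actual data structure: the bit layout (paternal meiosis first, then maternal) and the helper names `inheritance_vectors` and `to_index` are assumptions made for the example.

```python
from itertools import product


def inheritance_vectors(n_nonfounders):
    """Enumerate every inheritance pattern as a tuple of 2n bits.

    Assumed layout (illustrative only): bit 2k describes the paternal
    meiosis of non-founder k and bit 2k+1 the maternal one;
    0 = grandpaternal allele transmitted, 1 = grandmaternal allele.
    """
    return product((0, 1), repeat=2 * n_nonfounders)


def to_index(bits):
    """Pack a bit tuple into an integer index into a probability vector."""
    index = 0
    for b in bits:
        index = (index << 1) | b
    return index


vectors = list(inheritance_vectors(2))
print(len(vectors))            # 16 patterns for n = 2, i.e. 2**(2*2)
print(to_index((0, 1, 1, 0)))  # 6
```

Packing each pattern into an integer means a probability vector over all patterns is simply an array of length 2^(2n), indexed by that integer.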

Previous Work
- 3 steps of Genehunter:
  - Step 1: For each marker, calculate the probability of each possible inheritance pattern. Store the probabilities in a vector of size $2^{2n}$
  - Example
    - 0: grandfather's chromatid
    - 1: grandmother's chromatid
    - Pr([0,0]) = .5
    - Pr([0,1]) = .5
    - Pr([1,0]) = 0
    - Pr([1,1]) = 0

Previous Work
- 3 steps of Genehunter:
  - Step 2: For each marker, calculate the probability of each inheritance pattern conditional on all of the markers to the left and to the right
    - For two markers' inheritance vectors, each disagreeing bit requires a crossover event
    - The probability of transitioning between inheritance vectors $i$, $j$ differing in $d$ bits is $\theta^{d} (1 - \theta)^{2n - d}$, where $\theta$ is the recombination fraction between the two markers
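A small sketch of this transition probability, assuming the standard Lander-Green form theta^d * (1 - theta)^(N - d) for N-bit inheritance vectors, where theta is the recombination fraction between the two markers. The function names are hypothetical and chosen only for this illustration.

```python
def hamming_distance(i, j):
    """Number of bit positions in which inheritance vectors i and j differ."""
    return bin(i ^ j).count("1")


def transition_probability(i, j, theta, n_bits):
    """Probability of moving from pattern i to pattern j between two markers.

    Each differing bit needs a crossover (probability theta); each matching
    bit needs no crossover (probability 1 - theta). n_bits plays the role
    of 2n in the slides.
    """
    d = hamming_distance(i, j)
    return theta ** d * (1.0 - theta) ** (n_bits - d)


# Two differing bits out of two, theta = 0.1: roughly 0.01.
print(transition_probability(0b00, 0b11, 0.1, 2))

# Sanity check: from any fixed pattern, the probabilities over all targets sum to 1.
print(sum(transition_probability(0b01, j, 0.1, 2) for j in range(4)))
```

Because the probability depends only on the XOR of the two patterns, the full transition matrix has the structure that makes the transform-based shortcut in step 2 possible.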

Previous Work
- 3 steps of Genehunter:
  - Step 2: For each marker, calculate the probability of each inheritance pattern conditional on all of the markers to the left and to the right
    - $M_{i,j}$ = cost of transitioning between inheritance vectors $i$ and $j$
    - $P_1$, $P_2$ = probability vectors over every inheritance pattern given markers 1 and 2, respectively
    - $P_{2|1} = P_2 \cdot (M P_1)$
    - Calculate the probabilities of each marker's inheritance conditional on all others by Markov chain or FFT convolution
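The update P_{2|1} = P_2 (M P_1) can be sketched directly with a dense matrix. This is an illustration under stated assumptions: the outer product is read as element-by-element (the result must again be a vector over patterns), M uses the standard theta^d * (1 - theta)^(N - d) transition form, and the final renormalization is added here for readability rather than taken from the slides. The helper names are hypothetical.

```python
def transition_matrix(theta, n_bits):
    """Dense 2^N x 2^N matrix with M[i][j] = theta^d * (1 - theta)^(N - d),
    where d is the number of bits in which i and j differ."""
    size = 1 << n_bits
    matrix = []
    for i in range(size):
        row = []
        for j in range(size):
            d = bin(i ^ j).count("1")
            row.append(theta ** d * (1.0 - theta) ** (n_bits - d))
        matrix.append(row)
    return matrix


def condition_on_left(p1, p2, theta, n_bits):
    """Return P_{2|1}: propagate p1 through M, then weight by p2 element-wise.

    Normalizing the result to sum to 1 is an assumption of this sketch.
    """
    m = transition_matrix(theta, n_bits)
    size = len(p1)
    propagated = [sum(m[i][j] * p1[j] for j in range(size)) for i in range(size)]
    combined = [a * b for a, b in zip(p2, propagated)]
    total = sum(combined)
    return [c / total for c in combined]


# Marker 1 uses the step-1 example values; marker 2 is uninformative.
p1 = [0.5, 0.5, 0.0, 0.0]
p2 = [0.25, 0.25, 0.25, 0.25]
print(condition_on_left(p1, p2, theta=0.1, n_bits=2))
```

The dense matrix costs O(2^(2N)) work per marker pair, which is the cost the convolution approach is meant to avoid.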

Previous Work
- 3 steps of Genehunter:
  - Step 3: For each marker, calculate the probability of the unknown gene being located at specific locations
    - Hypothesizes that the phenotype has a gene located at a particular location. By default, tries 5 evenly spaced locations between each consecutive pair of markers
    - Calculates $P_D$, the probabilities of each inheritance pattern based on this phenotype (as in step 1)
    - For a location $x$ between markers $i$ and $i+1$, $p = P_D \cdot P_{x|1 \ldots i} \cdot P_{x|i+1 \ldots m}$
- Space requirement: $O(2^{2n})$, or $O(2^{2n-f})$ exploiting symmetry of $f$ founders
- Time requirement: $O(m 2^{2n})$, or $O(m 2^{2n-f})$ with $f$ founders

Parallel Genehunter
- Approach
  - Parallelize the 3 Genehunter steps separately
  - Divide each $2^{2n}$-sized marker vector evenly among the $P$ processors
    - Allows greater distribution of memory than assigning $O(m/P)$ entire vectors to each processor

Parallel Genehunter
- Parallelization of step 1
  - For each marker, calculate the probability of each possible inheritance pattern
  - Each processor calculates the probabilities for a particular $2^{2n}/P$ inheritance patterns for every marker
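The even split of the pattern space can be sketched as a block partition of the index range. This is a minimal illustration: the contiguous-block layout and the name `local_range` are assumptions, and the sketch requires the processor count to divide the vector length evenly.

```python
def local_range(rank, n_procs, n_bits):
    """Indices of the inheritance patterns owned by processor `rank`.

    Assumes a contiguous block layout and that n_procs divides 2**n_bits
    (for example, a power-of-two processor count).
    """
    size = 1 << n_bits
    if size % n_procs != 0:
        raise ValueError("n_procs must divide the vector length evenly")
    chunk = size // n_procs
    return range(rank * chunk, (rank + 1) * chunk)


# 4 processors sharing a 16-entry vector: each owns 4 consecutive patterns.
for rank in range(4):
    print(rank, list(local_range(rank, n_procs=4, n_bits=4)))
```

Every processor evaluates step 1 for its own block of patterns at every marker, so no communication is needed in this step.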

Parallel Genehunter
- Parallelization of step 2
  - For each marker, calculate the probability of each inheritance pattern conditional on all of the markers to the left and to the right
  - FFT convolution
    - As in serial Genehunter, the $2^{2n} \times 2^{2n}$ matrix-vector multiplication is replaced by FFT-based convolution:
      1. 2 forward 1D FFTs on $2^{2n}$-length vectors
      2. Element-by-element multiplication
      3. Inverse FFT
    - Each 1D FFT is equivalent to a 2D FFT on a $P \times 2^{2n}/P$ matrix
    - There are well-known distributed algorithms for this FFT using all-to-all communication
  - Dot product in $P_{2|1} = P_2 \cdot (M P_1)$
    - Trivially parallelized: each processor has the same portion of each vector
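The three convolution steps can be sketched serially as follows. Two assumptions are made here and are not confirmed by the slides: the transition kernel has the standard theta^d * (1 - theta)^(N - d) form, and the transform is the Walsh-Hadamard transform, which is the Fourier transform for bit vectors combined by XOR. The names `fwht`, `transition_kernel`, and `xor_convolve` are hypothetical.

```python
def fwht(values):
    """Unnormalized fast Walsh-Hadamard transform (length must be a power of 2)."""
    a = list(values)
    h = 1
    while h < len(a):
        for start in range(0, len(a), 2 * h):
            for k in range(start, start + h):
                x, y = a[k], a[k + h]
                a[k], a[k + h] = x + y, x - y
        h *= 2
    return a


def transition_kernel(theta, n_bits):
    """kernel[k] = transition probability between patterns whose XOR is k."""
    kernel = []
    for k in range(1 << n_bits):
        d = bin(k).count("1")
        kernel.append(theta ** d * (1.0 - theta) ** (n_bits - d))
    return kernel


def xor_convolve(kernel, vec):
    """Compute out[i] = sum_j kernel[i ^ j] * vec[j] in O(N 2^N) time."""
    fk, fv = fwht(kernel), fwht(vec)          # 1. two forward transforms
    prod = [x * y for x, y in zip(fk, fv)]    # 2. element-by-element product
    n = len(vec)
    return [x / n for x in fwht(prod)]        # 3. inverse transform


p1 = [0.5, 0.5, 0.0, 0.0]
print(xor_convolve(transition_kernel(0.1, 2), p1))  # approx [0.45, 0.45, 0.05, 0.05]
```

In a distributed setting the butterfly stages that pair entries owned by different processors are where the all-to-all communication would arise; this sketch keeps everything in one process.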

Parallel Genehunter
- Parallelization of step 3
  - For each marker, calculate the probability of the unknown gene being located at specific locations
  - Computing $P_{x|1 \ldots i}$ and $P_{x|i+1 \ldots m}$
    - FFTs parallelized as in step 2
  - Final dot product $p = (P_D \cdot P_{x|1 \ldots i} \cdot P_{x|i+1 \ldots m})$
    - Parallelized as in step 2
      - Each processor holds the same portion of each vector
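The final product can be sketched as local partial sums followed by a sum reduction. This is a single-process simulation under assumptions: the scalar p is taken to be the sum over patterns of the three vectors' element-wise product, blocks are contiguous, and the function names are hypothetical. A real run would replace the final `sum` with a reduction across processors.

```python
def local_partial(p_d, p_left, p_right, lo, hi):
    """Partial sum over the patterns lo..hi-1 owned by one processor."""
    return sum(p_d[k] * p_left[k] * p_right[k] for k in range(lo, hi))


def distributed_likelihood(p_d, p_left, p_right, n_procs):
    """Simulate P processors that each hold the same slice of all three vectors.

    Assumes n_procs divides the vector length evenly.
    """
    chunk = len(p_d) // n_procs
    partials = [
        local_partial(p_d, p_left, p_right, r * chunk, (r + 1) * chunk)
        for r in range(n_procs)
    ]
    return sum(partials)  # stands in for a sum reduction across processors


p_d = [0.1, 0.2, 0.3, 0.4]
p_left = [0.45, 0.45, 0.05, 0.05]
p_right = [0.25, 0.25, 0.25, 0.25]
print(distributed_likelihood(p_d, p_left, p_right, n_procs=2))
```

Because every processor owns the same index range of each vector, the only communication is the single scalar reduction at the end.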

Evaluation
- Experimental environment
  - Input data sets
    - 51-family-member pedigree
    - {19, 21, 24}-bit data sets (# bits = $2n - f$)
  - Computing facilities
    - Cplant cluster (Sandia National Laboratories)
      - DEC Alpha EV6 processors
      - Myrinet connection
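To give a feel for the problem sizes, the arithmetic below estimates the memory for one probability vector at each bit count. It assumes 8-byte floating-point entries, which the slides do not state, and the function names are hypothetical.

```python
def vector_bytes(n_bits, bytes_per_entry=8):
    """Bytes needed for one vector with 2**n_bits entries.

    bytes_per_entry=8 assumes double precision (an assumption of this sketch).
    """
    return (1 << n_bits) * bytes_per_entry


def per_processor_bytes(n_bits, n_procs, bytes_per_entry=8):
    """Bytes per processor when the vector is split evenly across n_procs."""
    return vector_bytes(n_bits, bytes_per_entry) // n_procs


for bits in (19, 21, 24):
    mib = vector_bytes(bits) / 2**20
    mib_64 = per_processor_bytes(bits, 64) / 2**20
    print(f"{bits}-bit: {mib:g} MiB per vector, {mib_64:g} MiB on each of 64 processors")
```

Under these assumptions a 24-bit problem needs 128 MiB per vector, and several such vectors per marker, which is why splitting each vector across processors matters more than splitting markers.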

Evaluation
- Runtimes for the 19-, 21-, and 24-bit problems


Observations
- Pro: Performs the Genehunter computation exactly
- Pro: Effective for "multipoint linkage" of phenotypes
- Con: Old-fashioned compared to protein-based methods (?)
- Pro: Distributes memory requirements
- Pro: More computers allow larger feasible inputs
- Con: Experiments based on 1 pedigree
- Pro: Efficient parallelization up to 32 or 64 processors
- Con: Allows pedigrees to grow by only 3 or 4 individuals in equal time

References
- Genetic Recombination
  - Dr. Craig Woodworth, Genetic Recombination in Eukaryotes, Lecture Notes, (
- Genehunter
  - K. Markianos, M.J. Daly, & L. Kruglyak. Efficient Multipoint Linkage Analysis Through Reduction of Inheritance Space. American Journal of Human Genetics 68,
- Parallel Genehunter
  - G. Conant, S. Plimpton, W. Old, A. Wagner, P. Fain, & G. Heffelfinger. Parallel Genehunter: Implementation of a Linkage Analysis Package for Distributed-Memory Architectures, Proceedings of the First IEEE Workshop on High Performance Computational Biology, International Parallel and Distributed Computing Symposium, 2002.

Questions?