Parallel Genehunter: Implementation of a linkage analysis package for distributed memory architectures Michael Moran CMSC 838T Presentation May 9, 2003.


1 Parallel Genehunter: Implementation of a linkage analysis package for distributed memory architectures Michael Moran CMSC 838T Presentation May 9, 2003

2 Introduction
- Goals
  - Link genes to specific loci in the genome
  - Decrease time and memory requirements through parallelization
- Motivation
  - Locate genes for specific phenotypes
  - Test for inherited diseases and risk factors
  - Gene therapy

3 Talk Overview
- Introduction
- Talk Overview
- Genetic Linkage Problem
- Previous Work
- Parallel Genehunter
- Evaluation
- Observations

4 Genetic Linkage Problem
- Sexual reproduction
  - Offspring are created from two haploid gametes
  - Gametes are produced from diploid/polyploid cells during meiosis
(www.blc.arizona.edu/courses/181gh/rick/genetics1/)

5 Genetic Linkage Problem
- Recombination occurs in two ways
  1. Random segregation of chromatids
     - 2 x 23 human chromosomes => 2^23 possible haploid combinations
     - Genes on different chromosomes recombine with probability 1/2
(www.gen.umn.edu/faculty_staff/hatch/1131/)

6 Genetic Linkage Problem
- Recombination occurs in two ways
  1. Random segregation of chromatids
  2. Crossover between homologous pairs of chromosomes
     - Genes on the same chromosome recombine with a probability that depends on their distance and location on the chromosome
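As a rough illustration of how recombination probability can depend on distance, the sketch below uses Haldane's map function, one common model. The slides do not say which map function Genehunter uses, so the choice of model and the function names here are assumptions.

```python
import math


def haldane_recombination_fraction(distance_morgans: float) -> float:
    """Haldane's map function: theta = (1 - exp(-2d)) / 2, with d in Morgans.

    Assumes crossovers occur independently along the chromosome.
    """
    if distance_morgans < 0:
        raise ValueError("map distance must be non-negative")
    return 0.5 * (1.0 - math.exp(-2.0 * distance_morgans))


def haldane_map_distance(theta: float) -> float:
    """Inverse map: d = -ln(1 - 2*theta) / 2, valid for 0 <= theta < 0.5."""
    if not 0.0 <= theta < 0.5:
        raise ValueError("recombination fraction must lie in [0, 0.5)")
    return -0.5 * math.log(1.0 - 2.0 * theta)
```

Under this model, closely spaced loci recombine rarely, while widely separated loci approach the probability 1/2 that applies to genes on different chromosomes.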

7 Genetic Linkage Problem
Given
- This model of recombination
- Data for a particular pedigree (family)
  - Phenotype information for each individual
  - Genetic markers for each individual
- Recombination frequencies for each pair of markers
Can we apply probabilistic methods to
- Reconstruct the inheritance patterns
- Link phenotypes to the markers

8 Previous Work
- Fisher, Haldane, Smith, Morton (1935-1955)
  - Methods to infer genetic maps using maximum likelihood estimators
- Elston, Stewart (1971): Genetic Linkage Algorithm
  - Linear in pedigree size
  - Exponential in number of markers
- Lander, Green (1987): Genetic Linkage Algorithm
  - Linear in number of markers
  - Exponential in pedigree size

9 Previous Work
- Genehunter (2001)
  - Implementation of Lander & Green
  - Analyzes a pedigree containing n non-founders
  - The inheritance of a gene by one non-founder can be summarized by two bits
  - The entire pedigree's inheritance pattern can be summarized by 2n bits
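The 2n-bit summary can be pictured as an integer index into a table of inheritance patterns. The sketch below is only an illustration: the bit ordering (one bit for each parent's meiosis per non-founder, most significant bit first) and all function names are assumptions, not Genehunter's actual layout.

```python
from itertools import product


def pack_inheritance_vector(bits):
    """Pack a sequence of 2n bits into one integer, first bit most significant."""
    value = 0
    for bit in bits:
        if bit not in (0, 1):
            raise ValueError("each entry must be 0 or 1")
        value = (value << 1) | bit
    return value


def unpack_inheritance_vector(index, n_nonfounders):
    """Recover the 2n bits of an inheritance vector from its integer index."""
    width = 2 * n_nonfounders
    return [(index >> (width - 1 - k)) & 1 for k in range(width)]


def all_inheritance_vectors(n_nonfounders):
    """Enumerate all 2^(2n) inheritance vectors for n non-founders."""
    return list(product((0, 1), repeat=2 * n_nonfounders))
```

For example, with two non-founders there are 2^4 = 16 patterns, and the pattern [0, 1, 1, 0] maps to index 6 under this ordering.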

10 Previous Work
- 3 steps of Genehunter:
  - Step 1: For each marker, calculate the probability of each possible inheritance pattern. Store the probabilities in a vector of size 2^{2n}.
  - Example (0: grandfather's chromatid, 1: grandmother's chromatid):
    Pr([0,0]) = 0.5, Pr([0,1]) = 0.5, Pr([1,0]) = 0, Pr([1,1]) = 0

11 Previous Work
- 3 steps of Genehunter:
  - Step 2: For each marker, calculate the conditional probability of each inheritance pattern, conditional on all of the markers to the left and to the right
  - For two markers' inheritance vectors, each disagreeing bit requires a crossover event
  - The probability of transitioning between inheritance vectors i and j depends on the number of bits d in which they differ
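Under the standard Lander-Green model, this transition probability can be written as follows. The symbol \theta (the recombination fraction between the two adjacent markers) is introduced here as an assumption, as is the independence of the 2n meioses: each differing bit contributes a factor \theta and each agreeing bit a factor 1 - \theta.

```latex
M_{i,j} = \theta^{d}\,(1-\theta)^{2n-d},
\qquad d = \text{number of bits in which } i \text{ and } j \text{ differ}
```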

12 Previous Work
- 3 steps of Genehunter:
  - Step 2: For each marker, calculate the conditional probability of each inheritance pattern, conditional on all of the markers to the left and to the right
  - M_{i,j} = cost (transition probability) of moving between inheritance vectors i and j
  - P_1, P_2 = probability vectors over every inheritance pattern given markers 1 and 2, respectively
  - P_{2|1} = P_2 (M P_1)
  - Calculate the probabilities of each marker's inheritance conditional on all others by Markov chain or FFT convolution
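A minimal sketch of why a transform can replace the matrix-vector product M P_1: if M_{i,j} depends only on the bits in which i and j differ, then M P_1 is a convolution indexed by XOR. The code below compares the direct O(N^2) product with a Walsh-Hadamard transform, which is the Fourier transform for XOR-indexed vectors. The recombination fraction `theta`, the reading of P_2 (M P_1) as an element-wise product, and all function names are assumptions for illustration, not Genehunter's code.

```python
def popcount(x):
    """Number of set bits in a non-negative integer."""
    return bin(x).count("1")


def transition_kernel(n_bits, theta):
    """kernel[v] = theta^d * (1 - theta)^(n_bits - d), where d = popcount(v)."""
    return [theta ** popcount(v) * (1.0 - theta) ** (n_bits - popcount(v))
            for v in range(1 << n_bits)]


def direct_product(p1, kernel):
    """(M p1)[i] = sum_j kernel[i XOR j] * p1[j], computed in O(N^2)."""
    size = len(p1)
    return [sum(kernel[i ^ j] * p1[j] for j in range(size)) for i in range(size)]


def walsh_hadamard(vec):
    """Unnormalized fast Walsh-Hadamard transform, O(N log N), N a power of 2."""
    out = list(vec)
    half = 1
    while half < len(out):
        for start in range(0, len(out), 2 * half):
            for k in range(start, start + half):
                a, b = out[k], out[k + half]
                out[k], out[k + half] = a + b, a - b
        half *= 2
    return out


def fast_product(p1, kernel):
    """Same result as direct_product: two forward transforms, an
    element-by-element multiplication, and one inverse transform."""
    size = len(p1)
    spectrum = [a * b for a, b in zip(walsh_hadamard(p1), walsh_hadamard(kernel))]
    return [x / size for x in walsh_hadamard(spectrum)]


def condition_on_neighbor(p2, p1, kernel):
    """Unnormalized P_{2|1}: element-wise product of P_2 with (M P_1)."""
    return [a * b for a, b in zip(p2, fast_product(p1, kernel))]
```

For a vector of length N = 2^{2n}, the direct product costs O(N^2) operations while the transform-based version costs O(N log N).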

13 Previous Work
- 3 steps of Genehunter:
  - Step 3: For each marker, calculate the probability of the unknown gene being located at specific locations
  - Hypothesizes that the phenotype has a gene at a particular location. By default, tries 5 evenly spaced locations between each consecutive pair of markers
  - Calculates P_D, the probabilities of each inheritance pattern based on this phenotype (as in step 1)
  - For a location x between markers i and i+1, p = P_D P_{x|1...i} P_{x|i+1...m}
- Space requirement: O(2^{2n}); O(2^{2n-f}) exploiting symmetry of f founders
- Time requirement: O(m 2^{2n}); O(m 2^{2n-f}) with f founders
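Reading the product in step 3 as a sum over inheritance patterns (an interpretation, consistent with the later description of it as a final dot product), the score for a candidate location x can be written out as below. The index v over inheritance patterns is notation introduced here.

```latex
p(x) = \sum_{v} P_D[v]\; P_{x \mid 1 \ldots i}[v]\; P_{x \mid i+1 \ldots m}[v],
\qquad v \in \{0, 1, \ldots, 2^{2n-f} - 1\}
```

Each of the three vectors has 2^{2n-f} entries, which is where the O(2^{2n-f}) space bound comes from; repeating the work across m markers gives the O(m 2^{2n-f}) time bound.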

14 Parallel Genehunter
- Approach
  - Parallelize the 3 Genehunter steps separately
  - Divide each 2^{2n}-sized marker vector evenly among the P processors
    - Allows greater distribution of memory than assigning O(m/P) entire vectors to each processor

15 Parallel Genehunter
- Parallelization of step 1
  - For each marker, calculate the probability of each possible inheritance pattern
  - Each processor calculates the probabilities for its own 2^{2n} / P inheritance patterns for every marker
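One way to picture this division of work is a contiguous block partition of the pattern indices. The sketch below assumes P divides the vector length evenly and that each processor owns one contiguous block; the slides do not specify the actual layout, and the function names are hypothetical.

```python
def local_range(rank, n_procs, n_bits):
    """Indices of the inheritance patterns owned by processor `rank`.

    Assumes a contiguous block layout with n_procs dividing 2^n_bits.
    """
    size = 1 << n_bits
    if not 0 <= rank < n_procs:
        raise ValueError("rank out of range")
    if size % n_procs != 0:
        raise ValueError("n_procs must divide the vector length")
    chunk = size // n_procs
    return range(rank * chunk, (rank + 1) * chunk)


def owner_of(index, n_procs, n_bits):
    """Rank of the processor that owns a given pattern index."""
    chunk = (1 << n_bits) // n_procs
    return index // chunk
```

Because every marker's vector is split the same way, each processor holds the same slice of every vector and step 1 needs no communication.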

16 Parallel Genehunter
- Parallelization of step 2
  - For each marker, calculate the conditional probability of each inheritance pattern, conditional on all of the markers to the left and to the right
  - FFT convolution
    - As in serial Genehunter, the 2^{2n} x 2^{2n} matrix-vector multiplication is replaced by FFT-based convolution:
      1. 2 forward 1D FFTs on 2^{2n}-length vectors
      2. Element-by-element multiplication
      3. Inverse FFT
    - Each 1D FFT is equivalent to a 2D FFT on a P x 2^{2n}/P matrix
    - There are well-known distributed algorithms for this FFT using all-to-all communication
  - Dot product in P_{2|1} = P_2 (M P_1)
    - Trivially parallelized: each processor has the same portion of each vector
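The 1D-to-2D equivalence can be checked on a single machine. The sketch below uses a Walsh-Hadamard transform as the transform (an assumption: it is the Fourier transform for XOR-indexed vectors, where the equivalence holds exactly with no twiddle factors) and simulates P processors as rows of a P x N/P matrix. It is not the distributed algorithm from the paper, only an illustration of the decomposition.

```python
def walsh_hadamard(vec):
    """Unnormalized fast Walsh-Hadamard transform; length must be a power of 2."""
    out = list(vec)
    half = 1
    while half < len(out):
        for start in range(0, len(out), 2 * half):
            for k in range(start, start + half):
                a, b = out[k], out[k + half]
                out[k], out[k + half] = a + b, a - b
        half *= 2
    return out


def two_stage_transform(vec, n_procs):
    """Transform `vec` viewed as an n_procs x (len(vec) / n_procs) matrix."""
    size = len(vec)
    if n_procs <= 0 or n_procs & (n_procs - 1) or size % n_procs:
        raise ValueError("n_procs must be a power of 2 that divides len(vec)")
    chunk = size // n_procs
    # Stage 1: each simulated processor transforms its own contiguous chunk.
    # No communication is needed for this stage.
    rows = [walsh_hadamard(vec[r * chunk:(r + 1) * chunk]) for r in range(n_procs)]
    # Stage 2: transform down each column, i.e. across processors.
    # On a real cluster this is where the all-to-all exchange would occur.
    for c in range(chunk):
        column = walsh_hadamard([rows[r][c] for r in range(n_procs)])
        for r in range(n_procs):
            rows[r][c] = column[r]
    return [x for row in rows for x in row]
```

The two-stage result matches the single transform of the whole vector, so only the cross-processor stage requires communication.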

17 Parallel Genehunter
- Parallelization of step 3
  - For each marker, calculate the probability of the unknown gene being located at specific locations
  - Computing P_{x|1...i} and P_{x|i+1...m}
    - FFTs parallelized as in step 2
  - Final dot product p = (P_D P_{x|1...i} P_{x|i+1...m})
    - Parallelized as in step 2
    - Each processor holds the same portion of each vector
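Because every processor holds the same slice of all three vectors, the final dot product reduces to local partial sums followed by one global sum. The sketch below simulates that pattern in a single process; the contiguous block layout and the function names are assumptions, and a real implementation would use a message-passing reduction instead of the Python loop.

```python
def partial_score(p_d, p_left, p_right, rank, n_procs):
    """Partial sum over the slice of patterns owned by one simulated processor."""
    if not (len(p_d) == len(p_left) == len(p_right)):
        raise ValueError("vectors must have equal length")
    if len(p_d) % n_procs != 0:
        raise ValueError("n_procs must divide the vector length")
    chunk = len(p_d) // n_procs
    lo, hi = rank * chunk, (rank + 1) * chunk
    return sum(p_d[k] * p_left[k] * p_right[k] for k in range(lo, hi))


def reduced_score(p_d, p_left, p_right, n_procs):
    """Combine the per-processor partial sums (stands in for a global reduction)."""
    return sum(partial_score(p_d, p_left, p_right, rank, n_procs)
               for rank in range(n_procs))
```

Each processor touches only its own entries, so the only communication is the single reduction of P scalar values.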

18 Evaluation
- Experimental environment
  - Input data sets
    - 51-family-member pedigree
    - {19, 21, 24}-bit data sets (# bits = 2n - f)
  - Computing facilities
    - Cplant cluster (Sandia National Laboratories)
      - DEC Alpha EV6 processors
      - Myrinet connection
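To get a feel for these problem sizes, the sketch below estimates the memory for one probability vector. It assumes 8-byte floating-point entries, which the slides do not state, and the function names are hypothetical.

```python
def vector_bytes(n_bits, bytes_per_entry=8):
    """Bytes for one vector of 2^n_bits entries (8-byte entries assumed)."""
    return (1 << n_bits) * bytes_per_entry


def per_processor_bytes(n_bits, n_procs, bytes_per_entry=8):
    """Bytes per processor when one vector is split evenly over n_procs."""
    if (1 << n_bits) % n_procs != 0:
        raise ValueError("n_procs must divide the vector length")
    return vector_bytes(n_bits, bytes_per_entry) // n_procs
```

Under that assumption, one 24-bit vector takes 128 MiB in total, or 2 MiB per processor when split across 64 processors; a full run keeps several such vectors per marker.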

19 Evaluation
- Runtimes
  [Figure: runtimes for the 19-, 21-, and 24-bit problems]

20 Evaluation
- Runtimes
  [Figure: runtimes for the 19-, 21-, and 24-bit problems]

21 Observations
- Pro: Performs the Genehunter computation exactly
- Pro: Effective for "multipoint linkage" of phenotypes
- Con: Old-fashioned compared to protein-based methods (?)
- Pro: Distributes memory requirements
- Pro: More computers allow larger feasible inputs
- Con: Experiments based on 1 pedigree
- Pro: Efficient parallelization up to 32 or 64 processors
- Con: Allows pedigrees to grow by only 3 or 4 individuals in equal time

22 References
- Genetic Recombination: Dr. Craig Woodworth, Genetic Recombination in Eukaryotes, Lecture Notes (www.clarkson.edu/class/by214/powerpoint)
- Genehunter: K. Markianos, M.J. Daly, & L. Kruglyak. Efficient Multipoint Linkage Analysis Through Reduction of Inheritance Space. American Journal of Human Genetics 68, 2001.
- Parallel Genehunter: G. Conant, S. Plimpton, W. Old, A. Wagner, P. Fain, & G. Heffelfinger. Parallel Genehunter: Implementation of a Linkage Analysis Package for Distributed-Memory Architectures. Proceedings of the First IEEE Workshop on High Performance Computational Biology, International Parallel and Distributed Computing Symposium, 2002.

23 Questions?

