Comparative Genome Maps CSCI 7000-005: Computational Genomics Debra Goldberg

1 Comparative Genome Maps CSCI 7000-005: Computational Genomics Debra Goldberg debg@hms.harvard.edu

2 What is a comparative map?

3 Why construct comparative maps?
• Identify & isolate genes
  - Crops: drought resistance, yield, nutrition...
  - Human: disease genes, drug response, ...
• Infer ancestral relationships
• Discover principles of evolution
  - Chromosome
  - Gene family
• “key to understanding the human genome”

4 Why automate?
• Time consuming, laborious
  - Needs to be redone frequently
• Codify a common set of principles
• Nadeau and Sankoff: warn of “arbitrary nature of comparative map construction”

5 Definitions
• Marker: identifiable chromosomal locus
• Homology: genes with a common ancestor
• Homeology: chromosomal regions derived from a common ancestral linkage group
• Synteny: loci on the same chromosome
• Colinearity: syntenic regions with conserved gene order

6 Input/Output
• Input:
  - genetic maps of 2 species
  - marker/gene correspondences (homologs)
• Output:
  - a comparative map
  - homeologies identified

7 Map construction
Go from this to this
[Figure: Maize 1 (target), Rice (base); rice segments 3S, 8L, 10L, 3L. Wilson et al. Genetics 1999]

8 Chromosome labeling
[Figure: Maize 1 and Rice, with rice segments 3S, 8L, 10L, 3L. Maize 1 (target), Rice (base). Wilson et al. Genetics 1999]

9 A natural model?
[Figure: Maize 1 and Rice, with rice segments 3S, 8L, 10L, 3L. Maize 1 (target), Rice (base). Wilson et al. Genetics 1999]

10 Scoring
[Figure: segments 10L and 3L, annotated with penalties s and m]

11 Assumptions
• Accept published marker order
• All linkage groups of base are unique
• Simplistic homeology criteria
• At least one homeologous region

12 A natural model?

16 Dynamic programming
• $l_i$ = location of homolog to marker $i$
• $S[i,a]$ = penalty (score) for an optimal labeling of the submap from marker $i$ to the end, when labeling begins with label $a$
[Figure: markers 1...i...n, with label a at marker i]

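17 Recurrence relation
\[ S[n,a] = m\,\delta(a, l_n) \]
\[ S[i,a] = m\,\delta(a, l_i) + \min_{b \in L} \bigl( S[i+1,b] + s\,\delta(a,b) \bigr) \]
Here $\delta(x,y) = 0$ if $x = y$ and 1 otherwise, $m$ is the penalty for a mismatched marker, and $s$ is the penalty for a segment boundary.
[Figure: labels a, b over markers i, i+1, ..., n with homolog locations $l_i$, $l_{i+1}$, ..., $l_n$]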

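18 Problem with linear model
s = 2
• a-b-c motif: markers aaabbbccc, labeling abc, score: 2s = 4
• a-b-a motif: markers aaabbbaaa, labeling a, score: 3m = 3
In the a-b-a case the single label a with three mismatches (3m = 3) is cheaper than the a-b-a labeling (2s = 4), so the b segment is not reported.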
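
The linear recurrence from slide 17 can be checked against the two motifs above. The following is a minimal Python sketch, not the author's implementation. The function name `linear_label_cost` and the encoding of homolog locations as one character per marker are hypothetical. δ is read as a 0/1 inequality indicator, m = 1 is inferred from "3m = 3", and taking the minimum over the first label is an assumption about how the final score is read off.

```python
def linear_label_cost(homologs, labels, s=2, m=1):
    """Optimal labeling penalty under the linear model.

    homologs[i] is the base-genome location of the homolog of marker i.
    s = segment boundary penalty, m = mismatched marker penalty.
    """
    # Base case: S[n, a] = m * delta(a, l_n)
    S = {a: m * (a != homologs[-1]) for a in labels}
    # Fill S[i, a] from the second-to-last marker back to the first.
    for i in range(len(homologs) - 2, -1, -1):
        S = {
            a: m * (a != homologs[i])
               + min(S[b] + s * (a != b) for b in labels)
            for a in labels
        }
    # Assumption: the map may begin with whichever label is cheapest.
    return min(S.values())


print(linear_label_cost("aaabbbccc", "abc"))  # 2s = 4
print(linear_label_cost("aaabbbaaa", "abc"))  # 3m = 3, the b segment is lost
```

Only the previous column of the table is kept, so the sketch returns the optimal score but not the labeling itself; recovering the labeling would need back-pointers.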

19 The stack model
• Segment at top of the stack can be:
  - pushed (remembered), later popped
  - replaced
• Push and replace cost s; pop is free.
[Figure: stack of segments; labels shown: b b b f e d c a c]
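
The cost rule on this slide can be made concrete with a small simulation. This is an illustrative sketch only: the function `stack_cost` and the tuple encoding of operations are hypothetical, the first segment is assumed to be free, and s = 2 is borrowed from the linear-model example.

```python
def stack_cost(first_label, ops, s=2):
    """Replay push/replace/pop operations and return (cost, final stack).

    ops is a list of ("push", label), ("replace", label) or ("pop",).
    Push and replace each cost s; pop is free.
    """
    stack = [first_label]  # assumption: the opening segment costs nothing
    cost = 0
    for op in ops:
        if op[0] == "push":
            stack.append(op[1])      # remember the segment underneath
            cost += s
        elif op[0] == "replace":
            stack[-1] = op[1]        # forget the current top segment
            cost += s
        elif op[0] == "pop":
            stack.pop()              # return to the remembered segment
        else:
            raise ValueError(f"unknown operation: {op[0]}")
    return cost, stack


# a-b-a: push b, then pop back to a. One boundary is paid for, not two.
print(stack_cost("a", [("push", "b"), ("pop",)]))
# a-b-c: there is nothing to return to, so both boundaries cost s.
print(stack_cost("a", [("replace", "b"), ("replace", "c")]))
```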

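20 Scoring
[Figure: scoring example with segments 9L and 7L: one segment boundary penalty s, a “free” pop, and three mismatch penalties m. Markers shown, with locations (order as extracted): uaz265a (7L), isu136 (2L), isu151 (7L), rz509b (7L), cdo59c (7L), rz698c (9L), bcd1087a (9L), rz206b (9L), bcd1088c (9L), csu40 (3S), cdo786a (9L), csu154 (7L), isu113a (7L), csu17 (7L), cdo337 (3L), rz530a (7L)]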

21 Dynamic programming
$S[i,j,a]$ = score for an optimal labeling of the submap from marker $i$ to marker $j$ when labeling begins with label $a$, i.e., marker $i$ is labeled $a$
[Figure: markers 1...i...j...n, with label a at marker i]

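22 Recurrence relation
\[ S[i,i,a] = m\,\delta(a, l_i) \]
\[ S[i,j,a] = \min \begin{cases}
m\,\delta(a, l_i) + \min_{b \in L} \bigl( S[i+1,j,b] + s\,\delta(a,b) \bigr) \\
\min_{i<k<j} \bigl( S[i,k,a] + S[k+1,j,a] \bigr)
\end{cases} \]
[Figure: label a at marker i followed by label b at marker i+1; label a at marker i and again at marker k+1]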
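
The interval recurrence above can be written directly as a memoized recursion. This is a minimal Python sketch under stated assumptions, not the author's code: the name `stack_label_cost` and the one-character-per-marker encoding are hypothetical, δ is a 0/1 inequality indicator, s = 2 and m = 1 follow the linear-model example, and the final score is taken as the minimum over the first label.

```python
from functools import lru_cache


def stack_label_cost(homologs, labels, s=2, m=1):
    """Optimal labeling penalty under the stack model (markers 0..n-1)."""
    n = len(homologs)

    @lru_cache(maxsize=None)
    def S(i, j, a):
        mismatch = m * (a != homologs[i])
        if i == j:                       # S[i, i, a] = m * delta(a, l_i)
            return mismatch
        # Option 1: marker i is labeled a, marker i+1 starts with label b.
        best = mismatch + min(S(i + 1, j, b) + s * (a != b) for b in labels)
        # Option 2: split at k; both halves begin with a, which models
        # returning to segment a by free pops.
        for k in range(i + 1, j):
            best = min(best, S(i, k, a) + S(k + 1, j, a))
        return best

    # Assumption: the map may begin with whichever label is cheapest.
    return min(S(0, n - 1, a) for a in labels)


print(stack_label_cost("aaabbbaaa", "abc"))  # a-b-a now costs s = 2
print(stack_label_cost("aaabbbccc", "abc"))  # a-b-c still costs 2s = 4
```

On the a-b-a motif the split option pays for one boundary instead of two, so the b segment is kept rather than being absorbed as three mismatches.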

23 Results: infers evolutionary events
[Figure: Wilson et al. map vs. stack-model map; Maize 1 (target), Rice (base)]

24 Problem: Incomplete input
• Gene order not always fully resolved.
• Co-located genes can be ordered to give most parsimonious labeling.
[Figure: 8p, 19p = 8p, 19p]

25 The reordering algorithm
• Uses a compression scheme
  - Within a megalocus, group genes by location of related gene.
  - Order these groups
  - First, last groups interact with nearby genes
  - Any ordering of internal groups is equally parsimonious

26 The reordering algorithm

28 Definitions
• $\delta$ extended to distance to a set $A$ of labels:
\[ \delta(a, A) = \begin{cases} 0 & \text{if } a \in A \\ 1 & \text{otherwise} \end{cases} \]
• $S$ = the set of indices of supernode start elements
• For simplicity, call supernode $i \in S$

29 Definitions
For $i \in S$:
• $n_i$ = # markers in $i$
• $n_i(a)$ = # markers in $i$ with a homolog on $a$
• $l_i$ = set of labels matching markers in $i$:
\[ l_i = \{ a \in L \mid n_i(a) \ge 1 \} \]

30 Definitions
• $p_i(c)$ gives mismatched marker and segment boundary penalties for label $c$
\[ p_i(c) = \begin{cases} s & \text{if } m\,n_i(c) \ge s \\ m\,n_i(c) & \text{if } m\,n_i(c) \le s \end{cases} \]
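
The per-supernode quantities from slides 28 to 30 are simple counts, and a short Python sketch may make them easier to follow. Everything named here is hypothetical (`delta`, `supernode_stats`, `hidden_penalty`, and the example labels), and $p_i(c)$ is read as the cheaper of opening a segment for label c (cost s) or leaving its markers mismatched (cost m per marker), which is an interpretation of the two cases on slide 30.

```python
from collections import Counter


def delta(a, label_set):
    """Extended delta: 0 if label a is in the set, 1 otherwise."""
    return 0 if a in label_set else 1


def supernode_stats(homolog_labels):
    """Return (n_i, n_i(.), l_i) for the markers of one supernode.

    homolog_labels lists, for each marker in the supernode, the
    base-genome location of its homolog.
    """
    counts = Counter(homolog_labels)   # n_i(a) for every label a
    n_i = len(homolog_labels)          # number of markers in the supernode
    l_i = set(counts)                  # labels a with n_i(a) >= 1
    return n_i, counts, l_i


def hidden_penalty(n_i_c, s=2, m=1):
    """p_i(c): pay one boundary s, or m per marker, whichever is less."""
    return min(s, m * n_i_c)


# A made-up supernode: three markers with homologs on 8p, one on 19p.
n_i, counts, l_i = supernode_stats(["8p", "19p", "8p", "8p"])
print(n_i, dict(counts), sorted(l_i))
print(hidden_penalty(counts["8p"]), hidden_penalty(counts["19p"]))
print(delta("8p", l_i), delta("3q", l_i))
```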

31 Definitions
$p(i,a,b)$ gives the total mismatched marker and segment boundary penalties attributed to “hidden markers”:
\[ p(i,a,b) = \begin{cases}
\sum_{c \ne a,b} p_i(c) + m\,\varepsilon_i(a,b) & \text{for } i \in S,\ a \ne b \\
\sum_{c \ne a} m\,n_i(c) + m\,\varepsilon_i(a,b) & \text{for } i \in S,\ a = b \\
0 & \text{otherwise}
\end{cases} \]

32 Definitions
For $i \in S$:
• $\delta_i(a,b)$ = # labels in $\{a,b\}$ without matching marker in $i$
\[ \delta_i(a,b) = \delta(a, l_i) + \delta(b, l_i), \qquad \delta_i(a,b) \in \{0,1,2\} \]

33 Definitions
• $\varepsilon_i(a,b)$ corrects for mismatched marker penalties assigned twice for the same marker: in the recurrence and in $p(i,a,b)$
• For example:
  - $\varepsilon_i(a,b) = 0$ if $\delta_i(a,b) = 0$ (if a, b are both represented in supernode)
  - $\varepsilon_i(a,a) = -2$ if $\delta_i(a,a) > 0$ (if a is not represented in supernode)

34 Recurrence relation
\[ S[i,i,a] = m\,\delta(a, l_i) \]
\[ S[i,j,a] = \min \begin{cases}
m\,\delta(a, l_i) + \min_{b \in L} \bigl( S[i+1,j,b] + s\,\delta(a,b) + p(i,a,b) \bigr) \\
\min_{i<k<j,\ k \in S} \bigl( S[i,k,a] + S[k+1,j,a] \bigr)
\end{cases} \]

35 Results: Fewer mismatches
[Figure: stack vs. reordering; Mouse 5 (target), Human (base)]

36 Results: Mismatches placed between segments
[Figure: stack vs. reordering; Mouse 8 (target), Human (base)]

37 Results: Detects new segments
[Figure: stack vs. reordering; Mouse 13 (target), Human (base)]

38 Summary
• Finds optimal comparative map
  - Arranges markers in most parsimonious way
• First algorithm to use megalocus data
• Fast, objective, simple to use
• Biologically meaningful results

39 Summary
• Global view
• Biologically meaningful results
  - Provides testable hypotheses
• Robust
  - not species-specific
  - high/low resolution, genetic/physical maps
  - stable to errors in marker order

40 Future Directions
• Algorithmic extensions
  - 3rd species
  - polyploidy
  - search for ancient duplications
• Deduce history of evolutionary events
  - makes genome rearrangement measures tractable and robust
  - infer common ancestor

41 Future Directions
• Block-segmental sequence comparisons
  - non-local sequence alignment
  - protein domains
• 2D block-segmental comparisons
  - comparison of regulatory networks
  - image processing

42 Acknowledgments
• Jon Kleinberg
• Susan McCouch
• Chris Pelkie
• Sandra Harrington
• Sam Cartinhour
• Dave Schneider
• NSF
• AAUW
• David and Lucile Packard Foundation
• USDA
• Cooperative State Research Education and Extension Service
• ONR

