Comparative Genome Maps CSCI 7000-005: Computational Genomics Debra Goldberg

Comparative Genome Maps CSCI 7000-005: Computational Genomics Debra Goldberg debg@hms.harvard.edu

What is a comparative map?

Why construct comparative maps?
- Identify and isolate genes
    - Crops: drought resistance, yield, nutrition...
    - Human: disease genes, drug response, ...
- Infer ancestral relationships
- Discover principles of evolution
    - Chromosome
    - Gene family
- "Key to understanding the human genome"

Why automate?
- Time-consuming and laborious
    - Needs to be redone frequently
- Codify a common set of principles
- Nadeau and Sankoff warn of the "arbitrary nature of comparative map construction"

Definitions
- Marker: an identifiable chromosomal locus
- Homology: genes with a common ancestor
- Homeology: chromosomal regions derived from a common ancestral linkage group
- Synteny: loci on the same chromosome
- Colinearity: syntenic regions with conserved gene order

Input/Output
- Input:
    - genetic maps of 2 species
    - marker/gene correspondences (homologs)
- Output:
    - a comparative map
    - homeologies identified

Map construction
[Figure: two panels, "go from this to this": Maize 1 (target) against Rice (base), with rice segments 3S, 8L, 10L, 3L. Wilson et al., Genetics 1999.]

Chromosome labeling
[Figure: Maize 1 (target) labeled with Rice (base) segments 3S, 8L, 10L, 3L. Wilson et al., Genetics 1999.]

A natural model?
[Figure: Maize 1 (target) labeled with Rice (base) segments 3S, 8L, 10L, 3L. Wilson et al., Genetics 1999.]

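Scoring
[Figure: segments 10L and 3L, annotated with the segment boundary penalty s and the mismatched marker penalty m.]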

Assumptions
- Accept published marker order
- All linkage groups of base are unique
- Simplistic homeology criteria
- At least one homeologous region

A natural model?

Dynamic programming
- $l_i$ = location of the homolog to marker $i$
- $S[i,a]$ = penalty (score) for an optimal labeling of the submap from marker $i$ to the end (marker $n$), when the labeling begins with label $a$
[Figure: markers 1 ... i ... n, with label a starting at marker i.]

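Recurrence relation
$$S[n,a] = m\,\delta(a, l_n)$$
$$S[i,a] = m\,\delta(a, l_i) + \min_{b \in L}\bigl(S[i+1,b] + s\,\delta(a,b)\bigr)$$
Here $\delta(x,y)$ is 0 if $x = y$ and 1 otherwise, $m$ is the mismatched marker penalty, $s$ is the segment boundary penalty, and $L$ is the set of labels.
[Figure: marker i labeled a and marker i+1 labeled b, with homolog locations $l_i$, $l_{i+1}$, ..., $l_n$.]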
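The linear-model recurrence can be sketched in a few lines of Python. This is a minimal illustration, not the author's implementation: the function name `linear_labeling_score` and the representation of the map as a sequence of homolog locations are assumptions made here, and markers are indexed from 0 rather than 1.

```python
def linear_labeling_score(homologs, labels, s, m):
    """Minimum total penalty of a labeling under the linear model.

    homologs: sequence where homologs[i] is l_i, the location of the
              homolog of marker i (hypothetical input representation).
    labels:   the label set L.
    s, m:     segment boundary and mismatched marker penalties.
    """
    def delta(x, y):
        return 0 if x == y else 1

    # Base case: S[n, a] = m * delta(a, l_n)
    S = {a: m * delta(a, homologs[-1]) for a in labels}

    # Fill S[i, a] for i = n-1 down to the first marker.
    for i in range(len(homologs) - 2, -1, -1):
        S = {
            a: m * delta(a, homologs[i])
            + min(S[b] + s * delta(a, b) for b in labels)
            for a in labels
        }

    # The optimal labeling may begin with any label.
    return min(S.values())
```

Only the previous row of the table is kept, so the sketch uses O(|L|) memory and O(n |L|^2) time; recovering the labeling itself would require storing back-pointers as well.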

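Problem with linear model (s = 2, m = 1)
- a-b-c motif: markers aaabbbccc, labeled abc, score 2s = 4
- a-b-a motif: markers aaabbbaaa, labeled a, score 3m = 3
In the a-b-a case, labeling the map aba would cost 2s = 4, so the linear model prefers to treat the three b markers as mismatches and the b segment is lost.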
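The arithmetic in this example can be checked by scoring explicit labelings. The helper below is a hypothetical sketch (the name `score_labeling` and the string encoding of markers and labels are assumptions): it charges m for every marker whose homolog location differs from its label and s for every change of label between adjacent markers.

```python
def score_labeling(homologs, labeling, s, m):
    """Linear-model score of one explicit labeling.

    homologs: homolog location of each marker, e.g. "aaabbbaaa".
    labeling: label assigned to each marker, same length as homologs.
    """
    mismatches = sum(1 for h, a in zip(homologs, labeling) if h != a)
    boundaries = sum(1 for a, b in zip(labeling, labeling[1:]) if a != b)
    return m * mismatches + s * boundaries


# a-b-c motif: two segment boundaries, no mismatches.
abc = score_labeling("aaabbbccc", "aaabbbccc", s=2, m=1)
# a-b-a motif, labeled entirely a: three mismatches, no boundaries.
all_a = score_labeling("aaabbbaaa", "aaaaaaaaa", s=2, m=1)
# a-b-a motif, labeled aba: two boundaries, no mismatches.
aba = score_labeling("aaabbbaaa", "aaabbbaaa", s=2, m=1)
```

With s = 2 and m = 1 the all-a labeling is cheaper than the aba labeling, which is why the linear model fails to report the inserted b segment.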

The stack model
- The segment at the top of the stack can be:
    - pushed (remembered) and later popped
    - replaced
- Push and replace cost s; pop is free.
[Figure: a stack of segment labels.]

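Scoring
[Figure: markers labeled with segments 7L and 9L, annotated with one segment boundary penalty s, three mismatched marker penalties m, and a "free" pop.]
Markers (location of homolog):
- uaz265a (7L)
- isu136 (2L)
- isu151 (7L)
- rz509b (7L)
- cdo59c (7L)
- rz698c (9L)
- bcd1087a (9L)
- rz206b (9L)
- bcd1088c (9L)
- csu40 (3S)
- cdo786a (9L)
- csu154 (7L)
- isu113a (7L)
- csu17 (7L)
- cdo337 (3L)
- rz530a (7L)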

Dynamic programming
$S[i,j,a]$ = score for an optimal labeling of the submap from marker $i$ to marker $j$, when the labeling begins with label $a$ (i.e., marker $i$ is labeled $a$).
[Figure: markers 1 ... i ... j ... n, with label a starting at marker i.]

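Recurrence relation
$$S[i,i,a] = m\,\delta(a, l_i)$$
$$S[i,j,a] = \min
\begin{cases}
m\,\delta(a, l_i) + \min_{b \in L}\bigl(S[i+1,j,b] + s\,\delta(a,b)\bigr) \\
\min_{i<k<j}\bigl(S[i,k,a] + S[k+1,j,a]\bigr)
\end{cases}$$
[Figure: the two cases. Either marker i is labeled a and marker i+1 is labeled b, or the submap splits at k and both parts begin with label a.]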
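A memoized sketch of the stack-model recurrence follows. It is an illustration under assumptions rather than the author's code: the name `stack_labeling_score` is hypothetical, markers are indexed from 0, and the final answer is taken as the minimum of S[first, last, a] over all starting labels a.

```python
from functools import lru_cache


def stack_labeling_score(homologs, labels, s, m):
    """Minimum total penalty of a labeling under the stack model."""
    homologs = tuple(homologs)
    labels = tuple(labels)
    n = len(homologs)

    def delta(x, y):
        return 0 if x == y else 1

    @lru_cache(maxsize=None)
    def S(i, j, a):
        # Marker i is labeled a; pay m if its homolog lies elsewhere.
        mismatch = m * delta(a, homologs[i])
        if i == j:
            return mismatch
        # Case 1: continue with label b at marker i+1, paying s if b != a.
        best = mismatch + min(
            S(i + 1, j, b) + s * delta(a, b) for b in labels
        )
        # Case 2: split at k; the second part resumes label a for free,
        # which models popping back to a remembered segment.
        for k in range(i + 1, j):
            best = min(best, S(i, k, a) + S(k + 1, j, a))
        return best

    return min(S(0, n - 1, a) for a in labels)
```

For the a-b-a motif with s = 2 and m = 1, `stack_labeling_score("aaabbbaaa", "ab", 2, 1)` pays a single push of cost s, so the inserted b segment is now cheaper to keep than to treat as three mismatches. The table has O(n^2 |L|) entries, and the split case makes the sketch cubic in the number of markers.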

Results: infers evolutionary events
[Figure: Maize 1 (target), Rice (base); the labeling of Wilson et al. compared with the stack algorithm.]

Problem: Incomplete input
- Gene order is not always fully resolved.
- Co-located genes can be ordered to give the most parsimonious labeling.
[Figure: co-located markers with homologs on 8p and 19p, shown in two equivalent orders.]

The reordering algorithm
- Uses a compression scheme
    - Within a megalocus, group genes by the location of the related gene.
    - Order these groups:
        - The first and last groups interact with nearby genes.
        - Any ordering of the internal groups is equally parsimonious.

The reordering algorithm

Definitions
- $\delta$ is extended to the distance from a label $a$ to a set $A$ of labels: $\delta(a, A) = 0$ if $a \in A$, 1 otherwise
- $S$ = the set of indices of supernode start elements
- For simplicity, refer to a supernode by its index $i \in S$

Definitions
For $i \in S$:
- $n_i$ = number of markers in $i$
- $n_i(a)$ = number of markers in $i$ with a homolog on $a$
- $l_i$ = set of labels matching markers in $i$: $l_i = \{a \in L \mid n_i(a) \ge 1\}$
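The per-supernode quantities are simple counts, sketched below. The function names `supernode_stats` and `delta_set`, and the representation of a supernode as a list of homolog locations, are assumptions for illustration only.

```python
from collections import Counter


def supernode_stats(homolog_labels):
    """Return (n_i, n_i(.), l_i) for one supernode.

    homolog_labels: the homolog location of each marker in the
                    supernode, e.g. ["8p", "19p", "8p"] (hypothetical).
    """
    counts = Counter(homolog_labels)   # counts[a] is n_i(a)
    n_i = len(homolog_labels)          # number of markers in the supernode
    l_i = set(counts)                  # labels a with n_i(a) >= 1
    return n_i, counts, l_i


def delta_set(a, A):
    """Set version of delta: 0 if label a is in A, 1 otherwise."""
    return 0 if a in A else 1


n_i, counts, l_i = supernode_stats(["8p", "19p", "8p"])
```

Because `Counter` returns 0 for missing keys, `counts[a]` can be read for any label a without first checking membership in `l_i`.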

Definitions
- $p_i(c)$ gives the mismatched marker and segment boundary penalties for label $c$:
$$p_i(c) =
\begin{cases}
s & \text{if } m\,n_i(c) \ge s \\
m\,n_i(c) & \text{if } m\,n_i(c) < s
\end{cases}$$
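Read as a choice between two costs, this definition charges whichever is cheaper: one segment boundary for label c, or one mismatch per marker of c in the supernode. The sketch below assumes that reading (the comparison signs are not legible in the slide text), and the name `hidden_label_penalty` is hypothetical.

```python
def hidden_label_penalty(n_c, s, m):
    """p_i(c) under the assumed reading: min(s, m * n_i(c)).

    n_c: n_i(c), the number of markers in supernode i with a homolog on c.
    s:   segment boundary penalty.
    m:   mismatched marker penalty.
    """
    return min(s, m * n_c)


# With s = 2 and m = 1, a single marker of c is cheaper as a mismatch,
# while three markers of c are cheaper as their own segment.
single = hidden_label_penalty(1, s=2, m=1)
triple = hidden_label_penalty(3, s=2, m=1)
```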

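Definitions
$p(i,a,b)$ gives the total mismatched marker and segment boundary penalties attributed to "hidden markers":
$$p(i,a,b) =
\begin{cases}
\sum_{c \ne a,b} p_i(c) + m\,\varepsilon_i(a,b) & \text{for } i \in S,\ a \ne b \\
\sum_{c \ne a} m\,n_i(c) + m\,\varepsilon_i(a,b) & \text{for } i \in S,\ a = b \\
0 & \text{otherwise}
\end{cases}$$
where $\varepsilon_i(a,b)$ is the correction term for mismatched marker penalties that would otherwise be counted twice.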

Definitions
For $i \in S$:
- $\gamma_i(a,b)$ = number of labels in $\{a,b\}$ without a matching marker in $i$
$$\gamma_i(a,b) = \delta(a, l_i) + \delta(b, l_i), \qquad \gamma_i(a,b) \in \{0,1,2\}$$

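Definitions
- $\varepsilon_i(a,b)$ corrects for mismatched marker penalties assigned twice for the same marker: once in the recurrence and once in $p(i,a,b)$
- For example, with $\gamma_i(a,b)$ the number of labels in $\{a,b\}$ without a matching marker in $i$:
    - $\varepsilon_i(a,b) = 0$ if $\gamma_i(a,b) = 0$ (a and b are both represented in the supernode)
    - $\varepsilon_i(a,a) = -2$ if $\gamma_i(a,a) > 0$ (a is not represented in the supernode)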

Recurrence relation
$$S[i,i,a] = m\,\delta(a, l_i)$$
$$S[i,j,a] = \min
\begin{cases}
m\,\delta(a, l_i) + \min_{b \in L}\bigl(S[i+1,j,b] + s\,\delta(a,b) + p(i,a,b)\bigr) \\
\min_{i<k<j,\ k \in S}\bigl(S[i,k,a] + S[k+1,j,a]\bigr)
\end{cases}$$

Results: Fewer mismatches
[Figure: stack vs. reordering labelings; Mouse 5 (target), Human (base).]

Results: Mismatches placed between segments
[Figure: stack vs. reordering labelings; Mouse 8 (target), Human (base).]

Results: Detects new segments
[Figure: stack vs. reordering labelings; Mouse 13 (target), Human (base).]

Summary
- Finds optimal comparative map
    - Arranges markers in the most parsimonious way
- First algorithm to use megalocus data
- Fast, objective, simple to use
- Biologically meaningful results

Summary
- Global view
- Biologically meaningful results
    - Provides testable hypotheses
- Robust
    - not species-specific
    - high/low resolution, genetic/physical maps
    - stable to errors in marker order

Future Directions
- Algorithmic extensions
    - 3rd species
    - polyploidy
    - search for ancient duplications
- Deduce history of evolutionary events
    - makes genome rearrangement measures tractable and robust
    - infer common ancestor

Future Directions
- Block-segmental sequence comparisons
    - non-local sequence alignment
    - protein domains
- 2D block-segmental comparisons
    - comparison of regulatory networks
    - image processing

Acknowledgments
- Jon Kleinberg
- Susan McCouch
- Chris Pelkie
- Sandra Harrington
- Sam Cartinhour
- Dave Schneider
- NSF
- AAUW
- David and Lucile Packard Foundation
- USDA
- Cooperative State Research Education and Extension Service
- ONR
