Of Sea Urchins, Birds and Men Algorithmic Functions of Computational Biology – Course 1 Professor Istrail.

Presentation transcript:

1 Of Sea Urchins, Birds and Men Algorithmic Functions of Computational Biology – Course 1 Professor Istrail

2 Darwin’s Finches 2 and Coco

3 The Father of All Dot Plots: The Human Genome

4 The Synteny Problem
- Between distant species: can reveal function; conservation reveals selective pressure
- Between near species: conservation reveals evolutionary history
- Between similar or the same species: recent events in subpopulations; phenotypic differences

5 Matching, Chaining, Extension: Matching Phase, Chaining Phase, Extension Phase
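A minimal sketch of the three phases for two DNA strings, assuming exact k-mer seeds (matching), an O(n²) dynamic program that keeps the best chain of non-overlapping seeds increasing in both sequences (chaining), and ungapped extension to the right of a seed (extension). The function names and the scoring (one point per seeded base) are illustrative choices, not the method used for the whole-genome comparisons on these slides.

```python
from collections import defaultdict


def find_seeds(x: str, y: str, k: int) -> list[tuple[int, int]]:
    """Matching phase: every (i, j) with x[i:i+k] == y[j:j+k]."""
    index = defaultdict(list)
    for j in range(len(y) - k + 1):
        index[y[j:j + k]].append(j)
    return [(i, j)
            for i in range(len(x) - k + 1)
            for j in index.get(x[i:i + k], [])]


def chain_seeds(seeds: list[tuple[int, int]], k: int) -> list[tuple[int, int]]:
    """Chaining phase: highest-scoring chain of non-overlapping seeds
    that increase in both x and y (simple O(n^2) dynamic programming)."""
    seeds = sorted(seeds)
    if not seeds:
        return []
    best = [k] * len(seeds)   # score of the best chain ending at each seed
    prev = [-1] * len(seeds)  # back-pointer to the previous seed in that chain
    for s, (i, j) in enumerate(seeds):
        for t in range(s):
            pi, pj = seeds[t]
            if pi + k <= i and pj + k <= j and best[t] + k > best[s]:
                best[s] = best[t] + k
                prev[s] = t
    s = max(range(len(seeds)), key=best.__getitem__)
    chain = []
    while s != -1:
        chain.append(seeds[s])
        s = prev[s]
    return chain[::-1]


def extend_right(x: str, y: str, i: int, j: int) -> int:
    """Extension phase (ungapped): length of the exact match starting at (i, j)."""
    n = 0
    while i + n < len(x) and j + n < len(y) and x[i + n] == y[j + n]:
        n += 1
    return n


x = y = "ACGTACGTTTGCA"
chain = chain_seeds(find_seeds(x, y, 4), 4)
print(chain)  # [(0, 0), (4, 4), (8, 8)]
```

Note that the off-diagonal seeds (0, 4) and (4, 0), caused by the repeated word ACGT, are discarded by chaining because they cannot be part of a longer colinear chain.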

6 Dot Plots 101
- a, b, c, d stand for letters; A, B, C, D stand for words
- Where letters match, put a dot
- Where words match, put a line (words can be reverse-complemented)

7 Dot Plots 101
- When words line up
- Reversed
- Misplaced
- Something gained (relative to horizontal)
- Something lost (relative to horizontal)
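The dot-plot rules of slides 6 and 7 can be sketched in a few lines, assuming a fixed word length k. Each cell compares a word of x (columns) with a word of y (rows); the glyphs are arbitrary choices for this sketch: '\' marks a forward match, '/' a match to the reverse complement, 'X' both. Words that line up appear as a '\' diagonal, a reversed segment as a '/' anti-diagonal, and a gained or lost segment shifts the diagonal sideways.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(s: str) -> str:
    """Reverse complement of a DNA word."""
    return s.translate(COMPLEMENT)[::-1]


def dot_plot(x: str, y: str, k: int) -> list[str]:
    """One text row per word of y, one column per word of x."""
    rows = []
    for j in range(len(y) - k + 1):
        word = y[j:j + k]
        rc = revcomp(word)
        row = []
        for i in range(len(x) - k + 1):
            fwd = x[i:i + k] == word  # words line up
            rev = x[i:i + k] == rc    # words match after reverse complementing
            row.append("X" if fwd and rev else "\\" if fwd else "/" if rev else ".")
        rows.append("".join(row))
    return rows


for row in dot_plot("GATTACA", revcomp("GATTACA"), 3):
    print(row)
```

Comparing GATTACA with its own reverse complement prints a single anti-diagonal, the signature of a reversal.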

8 Some large reversals in GP

9 NCBI has more of the centromere than anyone else (or is that N’s?)

10 Many reversals in GP; a piece of the end is re-ordered to the middle; Celera assemblies are boringly good.

11 Again, everyone misses the first 10 Mb (or are those N’s?) of NCBI31

12 Rube Goldberg’s Innovation: GENOMIC REGULATORY SYSTEMS. Mixed character of the problem: continuous mathematics and discrete mathematics

13 Open window (A) and fly kite (B). String (C) lifts small door (D) allowing moths (E) to escape and eat red flannel shirt (F). As weight of shirt becomes less, shoe (G) steps on switch (H) which heats electric iron (I) and burns hole in pants (J). Smoke (K) enters hole in tree (L), smoking out opossum (M) which jumps into basket (N), pulling rope (O) and lifting cage (P), allowing woodpecker (Q) to chew wood from pencil (R), exposing lead. Emergency knife (S) is always handy in case opossum or the woodpecker gets sick and can't work. Rube Goldberg’s Pencil Sharpener invention

14 A Tale of Two Networks: Sea Urchin and Drosophila

15 A Proposal for a Nobel Prize: “Programs built into the DNA of every animal.” (Eric H. Davidson, Genomic Regulatory Systems). One gene, 30 years of study, 300 docs and postdocs

16 The Dogma

17 Genomic Regulatory Regions

18 TF Binding Site Complexity

19 Genome Complexity: 1 billion DNA bases, 20,000 genes

20 cis-Regulatory Module Complexity: 200,000 cis-modules

21 The DNA program that regulates the expression of endo16 in the sea urchin: THE FIRST GENE

22 THE FIRST NETWORK

23 The View from the Genome

24 The View from the Nucleus

25 Building Protein-DNA Assemblies
- Inter-cis-module linkage
- Insulation
- Communication
- cis-module
- DNA
- Cooperativity
- Linear-amp
- Gates
- Potentiality

26 The Building Blocks
- Protein
- Free energy
- DNA
- Protein-DNA binding (free energy)
Free energy is the “GLUE”.
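To make "free energy is the glue" concrete, here is a textbook thermodynamic sketch, not taken from the slides: a standard binding free energy ΔG° (negative means favorable) sets a dissociation constant K_d = exp(ΔG°/RT), and a single site is occupied a fraction [P]/([P] + K_d) of the time. It assumes one independent site, no cooperativity, and free protein concentration equal to total; the ΔG° value below is hypothetical.

```python
import math

R_KCAL = 1.987e-3  # gas constant in kcal/(mol*K)


def dissociation_constant(delta_g_kcal: float, temp_k: float = 298.0) -> float:
    """K_d in mol/L from a standard binding free energy in kcal/mol."""
    return math.exp(delta_g_kcal / (R_KCAL * temp_k))


def occupancy(protein_molar: float, kd_molar: float) -> float:
    """Fraction of time a single site is bound: [P] / ([P] + K_d)."""
    return protein_molar / (protein_molar + kd_molar)


kd = dissociation_constant(-12.0)  # hypothetical ΔG° of -12 kcal/mol, about 1.6 nM
print(f"K_d = {kd:.2e} M, occupancy at 1 nM = {occupancy(1e-9, kd):.2f}")
```

A stronger (more negative) ΔG° lowers K_d exponentially, so small changes in free energy translate into large changes in occupancy.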

27 Information Processing

28 Boolean Circuit [Figure: circuit carrying 0/1 values]
- Synchronous input and output
- Completely defined gates

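29 Boolean Circuit vs. Boolinear Circuit [Figure: the Boolean circuit carries 0/1 values; the boolinear circuit carries continuous values such as 1.4, 0.5 and 1.1]
- Boolean circuit: synchronous input and output; completely defined gates
- Boolinear circuit: asynchronous input and output; incompletely defined gates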
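The contrast on slide 29 can be illustrated with a toy pair of gates. The boolinear gate here is a hypothetical construction for illustration only, not the course's definition: a repressor input acts as a Boolean switch (present when above a threshold), while the activator's continuous level passes through with linear amplification. The threshold and gain values are assumptions.

```python
def boolean_and(a: int, b: int) -> int:
    """Completely defined Boolean gate: inputs and output are 0 or 1."""
    return a & b


def boolinear_gate(activator: float, repressor: float,
                   threshold: float = 1.0, gain: float = 2.0) -> float:
    """Hypothetical mixed gate: Boolean switch on the repressor,
    linear amplification of the continuous activator level."""
    if repressor >= threshold:   # repressor present: output switched off
        return 0.0
    return gain * activator      # otherwise: linear amplification


print(boolean_and(1, 0), boolinear_gate(0.5, 0.2), boolinear_gate(1.4, 1.1))
```

The Boolean gate can only ever output 0 or 1, whereas the mixed gate outputs a continuous level whenever its switch is open.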

30 [Figure: OR, AND and NOT gates with 0/1 values] IF (x1 = 1 AND x2 = 1) THEN … GTAGGATTAAG … CATCCTAATTC … GTATCTAGAAG …
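One way to read "IF (x1 = 1 AND x2 = 1) THEN …" next to DNA text is as logic over the presence of binding sites. The sketch below assumes that each input x_i is 1 when a site occurs on either strand; the two site strings are arbitrary placeholders chosen for the demo, not sites from the slide.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

# Hypothetical site sequences, chosen only for illustration.
SITE_X1 = "GGATTA"
SITE_X2 = "ATCTAG"


def has_site(seq: str, site: str) -> int:
    """1 if the site occurs on either strand of seq, else 0."""
    rc = site.translate(COMPLEMENT)[::-1]
    return int(site in seq or rc in seq)


def module_output(seq: str) -> int:
    """IF (x1 = 1 AND x2 = 1) THEN output 1, else 0."""
    x1 = has_site(seq, SITE_X1)
    x2 = has_site(seq, SITE_X2)
    return x1 & x2


print(module_output("CCGGATTACCATCTAGCC"), module_output("CCGGATTACC"))
```

OR and NOT follow the same pattern (`x1 | x2`, `1 - x1`).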

31 Web page: http://www.its.caltech.edu/~chyuh/cathy-mirsky-info.html (Caltech, Davidson Lab, October 2004)

32 Introduction: SNPs, Haplotypes

33 Single Nucleotide Polymorphism (SNP)
A SNP is a position in a genome at which two or more different bases occur in the population, each with a frequency >1%.
GATTTAGATCGCGATAGAG
GATTTAGATCTCGATAGAG
- The most abundant type of polymorphism
- The two alleles at the site are G and T
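The definition translates directly into a column scan over aligned sequences. This is a minimal sketch assuming the sequences are already aligned and equal in length; with only two sequences the ">1%" threshold is trivially met, and real frequency estimates need many individuals. Columns are 0-based, so column 10 is the 11th base.

```python
from collections import Counter


def snp_sites(aligned: list[str], min_freq: float = 0.01) -> dict[int, dict[str, float]]:
    """Return {column: {allele: frequency}} for columns where at least
    two different bases each occur with frequency > min_freq."""
    n = len(aligned)
    sites = {}
    for col in range(len(aligned[0])):
        counts = Counter(seq[col] for seq in aligned)
        freqs = {base: c / n for base, c in counts.items() if c / n > min_freq}
        if len(freqs) >= 2:
            sites[col] = freqs
    return sites


print(snp_sites(["GATTTAGATCGCGATAGAG", "GATTTAGATCTCGATAGAG"]))
# {10: {'G': 0.5, 'T': 0.5}}
```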

34 [Slide graphic: a long stretch of raw genomic DNA sequence, followed by rows of four-base blocks such as tctc, gaga and gcgc]
The human genome contains ~3 billion base pairs arranged in 46 chromosomes. Two individuals are 99.9% the same, i.e., they differ at ~3 million base pairs. SNPs occur once every ~600 bp. The average gene in the human genome spans ~27 kb, so there are ~50 SNPs per gene.

35 SNP, Haplotype [Figure: sequences from two individuals (G C T C G A C A A C A G G T T C G T C A A C A G) and the haplotypes C A G and T T G]

36 Mutations. Infinite Sites Assumption: each site mutates at most once.

37 Haplotype Pattern [Figure: haplotypes written as letters (C A G T T T G A C A T G C T G T) and as 0/1 patterns (0 0 1 1 0 1 0 0 1 0 0 1)]
At each SNP site, label the two alleles 0 and 1. The choice of which allele is 0 and which is 1 is arbitrary.

38 Recombination [Figure: aligned sequences G T T C G A C T A T T A G T T C G A C A A C A T A C G T A T C T A T T A]

39 Recombination [Figure: the aligned sequences from slide 38]
The two alleles are linked, i.e., they are “traveling together”. Recombination disrupts the linkage.
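The infinite sites assumption (slide 36) gives a simple signature of recombination. Without recombination, two SNPs can show at most three of the four possible 0/1 combinations (00, 01, 10, 11); seeing all four implies at least one recombination, or a violation of infinite sites. This is the classical four-gamete test, a standard technique not described on the slides; the sketch assumes biallelic SNPs coded as 0/1 strings.

```python
from itertools import combinations


def four_gamete_violations(haplotypes: list[str]) -> list[tuple[int, int]]:
    """0-based pairs of SNP columns in which all four gametes 00, 01, 10, 11 occur."""
    n_sites = len(haplotypes[0])
    pairs = []
    for a, b in combinations(range(n_sites), 2):
        gametes = {(h[a], h[b]) for h in haplotypes}
        if len(gametes) == 4:
            pairs.append((a, b))
    return pairs


print(four_gamete_violations(["000", "011", "110"]))      # []
print(four_gamete_violations(["00", "01", "10", "11"]))   # [(0, 1)]
```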

40 Variations in Chromosomes Within a Population [Figure labels: common ancestor; emergence of variations over time; time → present; disease mutation; linkage disequilibrium (LD)]

41 [Figure labels: disease-causing mutation; 2,000 generations ago; 1,000 generations ago; time = present; extent of linkage disequilibrium]
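Why the extent of LD shrinks with the age of the mutation follows from a standard population-genetics result, not stated on the slide: in a large random-mating population the LD coefficient between two loci decays by a factor (1 − c) each generation, where c is their recombination fraction, so D_t = D_0 (1 − c)^t. The value c = 0.001 below is a hypothetical example.

```python
def ld_remaining(c: float, generations: int) -> float:
    """Fraction of the initial LD coefficient D left after `generations`
    at recombination fraction c: (1 - c) ** generations."""
    return (1.0 - c) ** generations


for gens in (1000, 2000):
    print(gens, round(ld_remaining(0.001, gens), 3))
# 1000 0.368
# 2000 0.135
```

For a fixed c, twice as many generations leaves the square of the remaining fraction, so only markers at smaller c (closer to the mutation) stay in strong LD.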

42 A Data Compression Problem
- Select SNPs to use in an association study: we would like to associate single nucleotide polymorphisms (SNPs) with disease.
- Very large number of candidate SNPs: chromosome-wide studies, whole-genome scans. For cost effectiveness, select only a subset.
- Closely spaced SNPs are highly correlated: the closer two SNPs are, the less likely it is that a recombination has occurred between them.

43 Disease Associations

44 Association Studies [Figure labels: disease / responder vs. control / non-responder; allele 0, allele 1] Marker A is associated with the phenotype. Marker A: Allele 0 = [symbol], Allele 1 = [symbol]
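A common way to test "Marker A is associated with the phenotype" is a Pearson chi-square test on a 2×2 table of allele counts (cases vs. controls, allele 0 vs. allele 1). This is a standard test shown for illustration, not necessarily the one used in the course; the counts are made up and the sketch assumes both alleles are observed.

```python
def allelic_chi_square(case_counts: tuple[int, int],
                       control_counts: tuple[int, int]) -> float:
    """Pearson chi-square (1 d.f., no continuity correction).
    Rows: cases, controls. Columns: allele 0 count, allele 1 count."""
    table = [case_counts, control_counts]
    total = sum(sum(row) for row in table)
    row_totals = [sum(row) for row in table]
    col_totals = [table[0][k] + table[1][k] for k in range(2)]
    chi2 = 0.0
    for r in range(2):
        for k in range(2):
            expected = row_totals[r] * col_totals[k] / total
            chi2 += (table[r][k] - expected) ** 2 / expected
    return chi2


print(allelic_chi_square((30, 10), (10, 30)))  # 20.0
```

With 1 degree of freedom, a statistic above about 3.84 corresponds to p < 0.05.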

45 Association Studies
Evaluate whether nucleotide polymorphisms associate with phenotype.
TA GA A
CG GA A
CG TA A
TA TC G
TG TA G
TG GA G

46
TA GA A
CG GA A
CG TA A
TA TC G
TG TA G
TG GA G

47
11 00 0
00 00 0
00 10 0
11 11 1
10 10 1
10 00 1
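Read as six haplotypes over five SNP sites (one group such as "TA GA A" per haplotype, which is an interpretation of the flattened slides), the 0/1 version on slide 47 corresponds to labeling T, A, T, C, G as allele 1 at the five sites. Since the labeling is arbitrary (slide 37), the sketch takes the "1" alleles as a parameter.

```python
def encode(haplotypes: list[str], one_alleles: str) -> list[str]:
    """Write each haplotype as 0/1: 1 where it carries the allele
    listed in one_alleles for that site, 0 otherwise."""
    return ["".join("1" if base == one else "0" for base, one in zip(h, one_alleles))
            for h in haplotypes]


haplotypes = ["TAGAA", "CGGAA", "CGTAA", "TATCG", "TGTAG", "TGGAG"]
print(encode(haplotypes, "TATCG"))
# ['11000', '00000', '00100', '11111', '10101', '10001']
```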

48 Data Compression
ACGATCGATCATGAT   A------A---TG--
GGTGATTGCATCGAT   G------G---CG--
ACGATCGGGCTTCCG   A------G---TC--
ACGATCGGCATCCCG   A------G---CC--
GGTGATTATCATGAT   G------A---TG--
Haplotype blocks based on LD (method of Gabriel et al. 2002). Selecting tagging SNPs in blocks.
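The compressed patterns on slide 48 are the haplotypes restricted to positions 1, 8, 12 and 13 (1-based, read off the slide). The sketch below reproduces that view and checks that the retained SNPs still distinguish all five haplotypes; it does not implement the Gabriel et al. block detection or the tag selection itself.

```python
def project(haplotypes: list[str], tag_positions: list[int]) -> list[str]:
    """Keep bases at the 1-based tag positions; replace the rest with '-'."""
    keep = set(tag_positions)
    return ["".join(b if i + 1 in keep else "-" for i, b in enumerate(h))
            for h in haplotypes]


haplotypes = [
    "ACGATCGATCATGAT",
    "GGTGATTGCATCGAT",
    "ACGATCGGGCTTCCG",
    "ACGATCGGCATCCCG",
    "GGTGATTATCATGAT",
]
tags = project(haplotypes, [1, 8, 12, 13])
for t in tags:
    print(t)
print(len(set(tags)) == len(haplotypes))  # True: the tags still tell all haplotypes apart
```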

49 Real Haplotype Data: a region of Chr. 22, 45 Caucasian samples. Two different runs of the Gabriel et al. block detection method + Zhang et al. SNP selection algorithm; our block-free algorithm.

