Tools for Comparative Sequence Analysis www.dcode.org Ivan Ovcharenko Lawrence Livermore National Laboratory.


1 Tools for Comparative Sequence Analysis www.dcode.org Ivan Ovcharenko Lawrence Livermore National Laboratory

2 A set of problems (http://www.dcode.org/bioquest.php):
1. Browsing genomes using synteny links
2. Aligning sequences to vertebrate genomes
3. Aligning sequences to identify evolutionary conserved regions
4. Assigning function to regulatory elements
5. Decoding gene regulation using microarray data

3 zPicture: Dynamic Alignment of Megabase-long Sequences and Genomes http://zpicture.dcode.org

4 zPicture http://zpicture.dcode.org/ Automated sequence extraction and gene annotation I. Ovcharenko, G. Loots, R.C. Hardison, W. Miller, and Lisa Stubbs Genome Research, 14(3), 472-477 (2004)

5 >hg16_dna range=chr16:55400000-55800000 Tataatggctacctatttggagtgcctaccatgtattagtcattgtgcta actgatgtataggcatctcatttacagttcaactcatttgaacctaaatg aagaatagttgtttgtcccttattttatttaacaaaatttaaaactattt ctaagtcgctcattaaatgacaaagcttaaaccaaattttgtctgattgt aaaggccatacttttAATCATTTATATAAAACAACGCAGCCATATTTAAC TTCTGCCATATATTTTCTTACCGATGAATGATATATATCAAATGTTGACT TAGTTTTTAAATGGAAGACAGAAGCGGTTTAGAATGGCCTATTTTCAGTC AGCCAAAAATGTCAAAACCTTCTGTGAGTAGTCCAGGTACTGGAAATCAG ACAATTTGAACTTCAGGATACTACAATAATTTTTTCCTTTGTGGGTAGTG GTGGAGCATGAATTCTCTACTTCTTATTGGTCCTTCTGCTATGATGGCCC TTTCAGTCACACCTCTGTTCTCAAAATAAGAATATAATCAATAAAGTAGA GTTTGAGGGAACGGAGGACTAAGTCAAAAGTGGGATACCTAGGACTTCAT TCTAGttactgtggaattatctcctttgcttttcttcctgtttgtgcttt ttctatcctgttaattctcctgccttatggaaagcacagtgattgtttca cagcataaaccagacatcacttttccagtttaattttttttcaaaggccc ccattgcattttggaaaaaattcaaaatattcaacatggcctacaaagcc ctgtcacccttaaatagtgtgttgagtctggctcctacccacagtctaaa tctcaactgtctccaatcttctccctcactaaactcctaccagcaaatct tttcttcaaactggctaatgccctattctagcctcagagttttgtgctgc tgttctcttaggtacagtgtttttccccaagatttttatctggctttctc ttcttcatttagacttttaaacaaacagcttcatgaattacttgagatgt aattaatatacatacaatttacccatttaaggtatacattttaatgtttt tattatattcacagagttgtacaaccatcacactctaatttcagaacgtt ttcatcttgattcagattttaaatcaaatgtcacatcatccagtaggaac tccagtcactaattagaaatacccattatgtttttacacacattctcaat cccactacctgtttgttattgcacttgaacttacatgaaactatttactt gtttatacatttattgtctGTTATTCCTAGCACATAGAAGGTATGTCTGG CACATAGCAAACACTCGATCTTTGATGAATGAATGAATAATGATAACATT AACTTTTTTGCTTATTCTGCCTTGTATTGTGTAAGATTAGAGACaatcct tacaacaaacttgaaaacccagacttaacgatctctaaaactcacatgta agttaaggctcagagaagtttcatcacttgctcagagttacgtaactggt gaataccgaggctagatttcaaacccaaggctgcccggctctaaaTGAGG GGATATTTGATTAGGCCAAAGTAACCTGAACCCTTAAAATAACcaggctt taacttccagaaacatgggaactagataacctaagaacctgctggccacg aaacccctagaatactgaacacaatatcacaaacatattttgaaatgcat agatgagcatgtaaaatactgagggaactcctcaatggccaaaagtggaa agcagatgaaaaccagaactgtgtaaaagcctgaaagttacagtcgtcct gcagacatttgtcaatctcagtaacaaagggacttagtattttttggcta 
tggaagacaaaaacaagctttttgtataaggtgggaatgttgaactgaga cctcatgggagaaaaagcagatgaagggttagaggctcagtaaaagaatg aactggaaaaatccatcttctgacaaagaaagacaatgaggaaacttttc tgtcttgggctgggtgCTTGGTTGGAGCAGGGGGAAAGAATCTCTGATTT

> 69149 115179 SLC6A2
69149 69197 UTR
69198 69471 exon
82066 82197 exon
84439 84676 exon
97643 97781 exon
104518 104652 exon
106610 106713 exon
107878 108002 exon
108825 108937 exon
110497 110625 exon
111069 111168 exon
112154 112254 exon
112739 112906 exon
114463 114534 exon
114923 114946 exon
114947 115179 UTR

> 173279 186382 CESR
173279 173321 UTR
173322 173373 exon
177416 177623 exon
180095 180239 exon
182703 182836 exon
184865 185018 exon
185907 186077 exon
186078 186382 UTR

> 173303 203537 CES1
173303 173321 UTR
173322 173373 exon
177419 177623 exon
180095 180239 exon
182703 182836 exon
184865 185018 exon
185907 186014 exon
186747 186851 exon
189424 189462 exon
193343 193483 exon
195380 195460 exon
195723 195870 exon
199927 200058 exon
202790 202862 exon
203159 203342 exon
203343 203537 UTR

< 212212 242464 CES1
212212 212406 UTR
212407 212590 exon
212887 212959 exon
215691 215822 exon
219879 220026 exon
220289 220369 exon
222266 222406 exon
226287 226325 exon
228898 229002 exon
229735 229842 exon
230731 230884 exon
232913 233046 exon
235514 235658 exon
238133 238337 exon
242394 242445 exon
242446 242464 UTR

< 229367 242488 CESR
229367 229671 UTR
229672 229842 exon
230731 230884 exon
232913 233046 exon
235514 235658 exon
238133 238340 exon
242394 242445 exon
242446 242488 UTR

< 255598 284772 FLJ31547
255598 255832 UTR
255833 256064 exon
256150 256222 exon
262265 262412 exon
265761 265829 exon
268931 269071 exon
270794 270898 exon
272730 272834 exon
275344 275497 exon
279013 279146 exon
281027 281165 exon
283235 283439 exon

Automated sequence and gene annotation extraction http://zpicture.dcode.org/ chr16:55,400,000-…
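The gene annotation on slide 5 follows a simple record layout: a header of the form `> start end NAME` (or `< start end NAME`), followed by `start end type` entries for each UTR and exon. A minimal sketch of a reader for such records is given here. It assumes one record per line, and it assumes that `>` marks a forward-strand gene and `<` a reverse-strand gene; the slide does not state this. The function name `parse_annotation` and the dictionary keys are hypothetical, chosen only for illustration. It is meant for the annotation records alone, not for the FASTA header line, which also starts with `>`.

```python
def parse_annotation(text):
    """Parse gene records of the form '> start end NAME' + 'start end type' lines.

    Assumption (not stated on the slide): '>' is the forward strand, '<' the reverse.
    """
    genes = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line[0] in "<>":
            # Header line: direction marker, gene span, gene name.
            strand = "+" if line[0] == ">" else "-"
            start, end, name = line[1:].split()[:3]
            genes.append({
                "name": name,
                "strand": strand,
                "start": int(start),
                "end": int(end),
                "features": [],
            })
        elif genes:
            # Feature line: start, end, and a type such as 'exon' or 'UTR'.
            start, end, kind = line.split()[:3]
            genes[-1]["features"].append((int(start), int(end), kind))
    return genes


sample = """\
> 173279 186382 CESR
173279 173321 UTR
173322 173373 exon
< 229367 242488 CESR
229367 229671 UTR
"""

genes = parse_annotation(sample)
exon_bp = sum(e - s + 1 for s, e, kind in genes[0]["features"] if kind == "exon")
```

Keeping the features as plain `(start, end, type)` tuples makes it easy to compute summaries such as total exon length per gene.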

7 zPicture: a dynamic, interactive alignment visualization tool (http://zpicture.dcode.org/). Dynamic rotation between pip plots and smooth plots; interactive parameter changes.

9 zPicture: dynamic annotation

10 zPicture: dynamic selection of conservation parameters (minimum length / percent identity): 100 bp / 70% and 500 bp / 85%
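To make the two parameter pairs on slide 10 concrete, here is a minimal sketch of how a length/identity threshold can be applied to a pairwise alignment. It is not zPicture's actual algorithm, which the slides do not describe. It assumes the two sequences are already aligned and of equal length, with `-` for gaps, slides a fixed window of `min_len` columns, and merges windows that pass the identity threshold. The function name `find_conserved_regions` is hypothetical.

```python
def find_conserved_regions(aligned_a, aligned_b, min_len=100, min_identity=0.70):
    """Return half-open (start, end) alignment-column ranges that pass the threshold.

    A window of min_len columns passes when the fraction of identical,
    non-gap columns is at least min_identity. Overlapping or adjacent
    passing windows are merged into one region.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    n = len(aligned_a)
    if n < min_len:
        return []

    # 1 for an identical non-gap column, 0 otherwise.
    match = [
        1 if x == y and x != "-" else 0
        for x, y in zip(aligned_a.upper(), aligned_b.upper())
    ]

    regions = []
    count = sum(match[:min_len])
    for start in range(n - min_len + 1):
        if start > 0:
            # Slide the window by one column.
            count += match[start + min_len - 1] - match[start - 1]
        if count / min_len >= min_identity:
            end = start + min_len
            if regions and start <= regions[-1][1]:
                regions[-1] = (regions[-1][0], end)
            else:
                regions.append((start, end))
    return regions


identical = find_conserved_regions("ACGT" * 5, "ACGT" * 5, min_len=10, min_identity=0.7)
unrelated = find_conserved_regions("ACGT" * 5, "TGCA" * 5, min_len=10, min_identity=0.7)
```

Calling it with `min_len=500, min_identity=0.85` corresponds to the stricter of the two settings shown on the slide.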

11 zPicture: Aligning complete microbial genomes. Mycobacterium leprae vs. Mycobacterium tuberculosis. Conservation of genes: non-hypothetical genes, 97% are conserved; hypothetical genes, 20% are conserved.

12 rVista 2.0: Identification of Evolutionarily Conserved Transcription Factor Binding Sites http://rvista.dcode.org

13 rVista 2.0 http://rvista.dcode.org/ Identification of Evolutionarily Conserved Transcription Factor Binding Sites http://zpicture.dcode.org http://ecrbrowser.dcode.org http://globin.cse.psu.edu/gala

14
Human ACTTTCCTACATCTATCTATA
      |||||::|||||||:||||||
Mouse ACTTTGATACATCTCTCTATA

Human ACTTTGATACATCTATCTATA
      ||||||||||||||:||||||
Mouse ACTTTGATACATCTCTCTATA

Human -----GATACATCTATCTATA
           |||||
Mouse ACTTTGATAC-----------

Human ACTTTGATACATCTATCTATA
      |||||
Mouse ACTTT----------------
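The four human/mouse alignments on slide 14 differ in how well a candidate site is preserved across species: mismatches inside the site, a nearly perfect match, and partial alignments interrupted by gaps. The sketch below reproduces the match line (`|` for identity, `:` for mismatch, blank for a gap) and applies a simple conservation check to a site. The check is an illustration only: the 80% identity cutoff and the no-gaps rule are assumptions, not rVista's documented criteria, and the names `match_line` and `site_is_conserved` are hypothetical.

```python
def match_line(seq_a, seq_b, gap="-"):
    """Build the alignment match line: '|' identity, ':' mismatch, ' ' gap."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    marks = []
    for x, y in zip(seq_a.upper(), seq_b.upper()):
        if x == gap or y == gap:
            marks.append(" ")
        elif x == y:
            marks.append("|")
        else:
            marks.append(":")
    return "".join(marks)


def site_is_conserved(seq_a, seq_b, start, end, min_identity=0.8):
    """Assumed rule: the site columns [start, end) contain no gaps and
    reach min_identity. The cutoff is illustrative, not taken from the slides."""
    window = match_line(seq_a, seq_b)[start:end]
    if not window or " " in window:
        return False
    return window.count("|") / len(window) >= min_identity


mouse = "ACTTTGATACATCTCTCTATA"
first = match_line("ACTTTCCTACATCTATCTATA", mouse)
third = match_line("-----GATACATCTATCTATA", "ACTTTGATAC-----------")
```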

17 zPicture-rVista 2.0 interconnection

18 ECR Browser: Tool for Browsing Genome Conservation Profiles http://ecrbrowser.dcode.org

27 Grab ECR :: direct access to a conserved element

28 Genome Alignment: Align your sequence to a vertebrate genome

29 Genome Alignment AC146831

30 Genome alignment: Output page

31 ECR Browser contains rVista portal

33 eShadow: Phylogenetic Shadowing of Closely Related Species http://eshadow.dcode.org

34 eShadow: Phylogenetic Shadowing http://eshadow.dcode.org

36 Phylogenetic shadowing on multiple (10-14) primate sequences: Apo-B, Plasminogen, LXR-alpha, CETP (Boffelli et al., Science, 2003)

38 CREME: Using Microarray Data to Decode Genome Regulation http://crem.dcode.org

41 TFBS in Promoter ECRs of RefSeq genes: ~13k RefSeq loci; ~8k conserved promoters; 414 TRANSFAC PWMs; ~3M predicted TFBS

42 TFBS in Promoter ECRs of RefSeq genes
Testing motif abundances:
–Identify enriched motifs in a gene set relative to a background set.
–Take into account the length of promoters.
Filtering similar PWMs. TRANSFAC contains many redundancies:
–Different PWMs for the same TF.
–Similar PWMs for TFs from the same family.
Filtering strategy:
–For two PWMs that tend to co-occur in a very small window (4 bp), remove the less enriched one.
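Slide 42 asks for motif enrichment in a gene set relative to a background set, with promoter length taken into account, but does not name the statistic CREME uses. One simple way to respect promoter length, offered here only as a sketch under that caveat, is to estimate a per-base hit rate from the background, scale it by the total promoter length of the gene set, and compute a Poisson upper-tail probability for the observed hit count. The Poisson model and the function names are assumptions made for illustration.

```python
import math


def poisson_tail(k, lam):
    """P(X >= k) for X ~ Poisson(lam)."""
    if k <= 0:
        return 1.0
    cdf = sum(math.exp(-lam) * lam ** i / math.factorial(i) for i in range(k))
    return max(0.0, 1.0 - cdf)


def motif_enrichment(hits_in_set, bp_in_set, hits_in_background, bp_in_background):
    """Length-normalized enrichment of one motif (assumed Poisson model).

    Returns (expected_hits, p_value), where expected_hits is the background
    hit rate per base scaled to the total promoter length of the gene set.
    """
    rate_per_bp = hits_in_background / bp_in_background
    expected = rate_per_bp * bp_in_set
    return expected, poisson_tail(hits_in_set, expected)


# Toy numbers, not taken from the slides: 10 hits in 1 kb of promoters
# versus 100 hits in 100 kb of background promoters.
expected, p_value = motif_enrichment(10, 1_000, 100, 100_000)
```

Because the expectation scales with base pairs rather than with gene counts, a gene set with unusually long promoters is not reported as enriched merely for having more sequence to scan.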

43 Human Cell Cycle: 336 genes (Whitfield et al., 2002). 16 enriched PWMs; 1089 modules; 7 significant modules; 5 coherently expressed. E2F, NFY, CREB…

44 Human Cell Cycle DELTAEF1, EVI1, GR : 11 genes, p=0.01

45 Validation on a known module, NFAT-AP1:
–10 known genes containing multiple regulatory elements. In all of them, NFAT is upstream of AP1.
–CREME reported only the correct module (p=0.01).
–CREME identified the correct orientation of the TFBS.
–The module was identified even after adding 10 random promoters to the gene set.

46 Colleagues and collaborators (www.dcode.org)
Institutions: Lawrence Livermore National Laboratory; UC, Berkeley; Stanford; Lawrence Berkeley National Laboratory; Pennsylvania State University
People: Gaby Loots, Lisa Stubbs, Roded Sharan, Asa Ben-Hur, Ross Hardison, Webb Miller, Marcelo Nobrega, Dario Boffelli, Sha Hammond

