VISTA family of computational tools for comparative genomics: How can we leverage genome sequences from many species to learn about genome function?

1 VISTA family of computational tools for comparative genomics. How can we leverage genome sequences from many species to learn about genome function? Microbial applications. Inna Dubchak, Genomics Division LBNL, JGI. ildubchak@lbl.gov, vista@lbl.gov

2 Human Genome Annotation (Gene A): only 1–2% coding; efficient identification of regulatory sequences?

3 Sequence conservation implies function. [Figure: two sequences descended from a last common ancestor 80 million years ago. Divergence = non-functional region; conservation = potential functional region, e.g. CTATAAATGC retained in both sequences.]

4 Comparative Genomics Introduction Human Drosophila Mouse Urchin Chimp Similar Genes Synteny Sequence Alignment

5 http://genome.lbl.gov/vista VISTA is an integrated system for global sequence alignment and visualization for comparative genomic analysis

6 How does VISTA work: Global Genomic Alignments
1. Anchoring: identify regions of strong similarity between sequence 1 and sequence 2
2. Chaining: join regions of weak or no similarity
Algorithm: Feature
AVID*: can handle draft sequence
LAGAN**: produces true multiple alignments
Shuffle-LAGAN**: handles rearrangements (inversions, translocations)
* Lior Pachter, UC Berkeley
** Michael Brudno, U. Toronto
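The anchoring and chaining idea on slide 6 can be illustrated with a small sketch. This is not the AVID or LAGAN implementation: it assumes, purely for illustration, that anchors are exact k-mer matches unique to both sequences and that chaining keeps the longest set of anchors that appear in the same order in both sequences. The function names `find_anchors` and `chain_anchors` and the default `k` are hypothetical.

```python
def find_anchors(seq1, seq2, k=8):
    """Anchoring (illustrative): exact k-mers that occur once in each sequence.

    Returns sorted (pos_in_seq1, pos_in_seq2) pairs.
    """
    def unique_kmers(seq):
        positions = {}
        for i in range(len(seq) - k + 1):
            positions.setdefault(seq[i:i + k], []).append(i)
        # Keep only k-mers seen exactly once, so anchors are unambiguous.
        return {kmer: pos[0] for kmer, pos in positions.items() if len(pos) == 1}

    unique1 = unique_kmers(seq1)
    unique2 = unique_kmers(seq2)
    return sorted((i, unique2[kmer]) for kmer, i in unique1.items() if kmer in unique2)


def chain_anchors(anchors):
    """Chaining (illustrative): longest run of anchors increasing in both sequences.

    Simple O(n^2) dynamic programming over anchors sorted by seq1 position.
    """
    if not anchors:
        return []
    n = len(anchors)
    best = [1] * n    # best[b]: length of the longest chain ending at anchor b
    prev = [-1] * n   # back-pointers used to rebuild the chain
    for b in range(n):
        for a in range(b):
            in_order = anchors[a][0] < anchors[b][0] and anchors[a][1] < anchors[b][1]
            if in_order and best[a] + 1 > best[b]:
                best[b] = best[a] + 1
                prev[b] = a
    end = max(range(n), key=best.__getitem__)
    chain = []
    while end != -1:
        chain.append(anchors[end])
        end = prev[end]
    return chain[::-1]


if __name__ == "__main__":
    anchors = find_anchors("AAACCCGGGTTT", "AAACCCTGGGTTT", k=4)
    print(chain_anchors(anchors))
```

In a real global aligner the stretches between consecutive chained anchors would then be aligned in detail; the sketch stops at the chain itself.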

7 Global Genomic Aligner Output [Figure: example pairwise global alignment shown in 60-column blocks with match bars and gaps, covering coordinates 104670599–104671304 in one sequence and 052328645–052329299 in the other.]

8 VISTA visualization: a “sliding window” measures sequence conservation (default window size 100 bp), and conservation is presented graphically as a “peaks-and-valleys” curve of % identity against base sequence coordinates, with the >70% identity level marked. [Figure: example alignment block covering coordinates 104637349–104637408 and 052290302–052290360.]
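The sliding-window measure behind the peaks-and-valleys curve can be sketched as follows. The slide gives only the default window size (100 bp) and the >70% identity level. Everything else here is an assumption for illustration: the window slides over alignment columns one column at a time, a column counts as a match only when both bases are identical and not a gap, and the function names are hypothetical rather than part of VISTA.

```python
def window_identity(aln1, aln2, window=100):
    """Percent identity for every window start along a gapped pairwise alignment."""
    if len(aln1) != len(aln2):
        raise ValueError("aligned sequences must have equal length")
    # 1 where the column is an identical, non-gap pair; 0 otherwise.
    matches = [1 if a == b and a != "-" else 0 for a, b in zip(aln1, aln2)]
    if len(matches) < window:
        return []
    count = sum(matches[:window])
    scores = [100.0 * count / window]
    # Slide one column at a time, updating the running match count.
    for start in range(1, len(matches) - window + 1):
        count += matches[start + window - 1] - matches[start - 1]
        scores.append(100.0 * count / window)
    return scores


def conserved_windows(scores, threshold=70.0):
    """Window start indices whose identity exceeds the threshold."""
    return [i for i, score in enumerate(scores) if score > threshold]


if __name__ == "__main__":
    scores = window_identity("ACGTACGTAC", "ACGTTCGTAC", window=5)
    print(scores)
    print(conserved_windows(scores, threshold=90.0))
```

Plotting `scores` against window position gives a curve of the same kind as the one on the slide; the short window in the usage example is only there to keep the toy input small.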

9 VISTA homepage: http://genome.lbl.gov/vista VISTA Servers (submit your own data) VISTA Browsers (precomputed alignments) Other VISTA-related Projects Access servers, browsers, other information

10 VISTA Servers
mVISTA: Align and compare sequences
wgVISTA: Align and compare sequences, including microbial assemblies
rVISTA: Search for TFBS combined with a comparative sequence analysis
GenomeVISTA: Align DNA sequence to a genome

11 Precomputed Alignments
VISTA Browser: Browse through pre-computed whole-genome alignments
Whole Genome rVISTA: Whole genome analysis for conserved TFBS over-represented in upstream regions of genes
VISTA-Point: Browse and obtain sequence and alignment data

12 VISTA Browser: Access

13 VISTA Browser: Input Menu (genome, position, visualization). Install Java 2, if needed. Choose “base” genome. Select location. Determine visualization preference: VISTA Browser, VISTA tracks on UCSC Browser, or VISTA-Point

14 VISTA Browser: Alignment Details direction exon repeats alignment SNPs gene

15 VISTA Browser: Result Position on chromosome Control Panel Graphical display of genome alignments Color Legend Cursor Info Menu & Icons Curve annotation (species) 1 row

16 VISTA Browser: Zooming vs. rhesus vs. dog

17 VISTA browser

18 VISTA Point: Access Overview

19 VISTA Point: Graphics Table

20 VISTA Point: Alignments Table sequence

21

22 Google map-like Dot-Plot

23

24 BlockView – Synteny Plot tool

25

26

27 Principal components
RegTransBase (experimental data): manually curated database of regulatory interactions captured from literature; 6000 papers. NAR database issue, 2007
RegPrecise (computational predictions): manually curated database of regulons inferred by comparative genomics approach. NAR database issue, 2010; Featured Article
RegPredict (web tool for regulon inference): integrated system for fast and accurate inference of regulons by comparative genomics. NAR Web Server issue, 2010; Featured Article

28 mVISTA: Access

29 mVISTA: Interface Our example will show 3 sequences Align up to 100 sequences

30 mVISTA: Input of Sequences. Provide your email address. Upload your sequences or enter a GenBank ID

31 mVISTA: Input Parameters
AVID
–multiple pairwise alignments
–accepts finished or draft sequences
LAGAN
–true multiple alignments
Shuffle-LAGAN
–multiple pairwise alignments
–detects sequence rearrangements and inversions

32 mVISTA: Results PDF VISTA Browser VISTA-Point

33 wgVISTA: Microbial Assemblies Comparison wgVISTA: whole genome VISTA Compares 2 sequences (up to 10 Mb) Draft or finished microbial assembly sequences can be used

34 rVISTA: Access

35 Regulatory VISTA (rVISTA): prediction of transcription factor binding sites. Simultaneous searches of the major transcription factor binding site database (Transfac) and the use of global sequence alignment to sieve through the data. An rVISTA search is automatically run when submitting to mVISTA or genomeVISTA

36 Regulatory VISTA (rVISTA):
1. Identify potential transcription factor binding sites for each sequence using library of matrices (TRANSFAC)
2. Identify aligned sites using VISTA
3. Identify conserved sites using dynamic shifting window (20 bp, >80% ID)
Example alignment (sites labeled on the figure: Ikaros-2, Ikaros-2, NFAT, Ikaros-2):
Human TGATTTCTCGGCAGCAAGGGAGGGCCCCATGACAAAGCCATTTGAAATCCCAGAAGCAATTTTCTACTTACGACCTCACTTTCTGTTGCTGTCTCTCCCTTCCCCTCTG
Mouse TGATTTCTCGGCAGCCAGGGAGGGCCCCATGACGAAGCCACTCGAAATCCCAGAAGCAATTTTCTACTTACGACCTCACTTTCTGTTGCTCTCTCTTCCTCCCCCTCCA
Dog TGATTTCTCGGCAGCAAGGGAGGGCCCCATGACGAAGCCATTTGAAATCCCAGAAGCGATTTTCTACCTACGACCTCACTTTCTGTTGCGCTCACTCCCTTCCCCTGCA
Rat TGATTTCTCGGCAGCCAGGGAGGGCCCCATGACGAAGCCACTCGAAATCCCAGAAGCAATTTTCTACTTACGACCTCACTTTCTGTTGTTCTCTCTTCCTCCCCCTCCA
Cow TGATTTCTCGGCAGCCAGGGAGGGCCCCATGACGAAGCCATTTGAAATCCCAGAAGCAATTTTCTACTTACGACCTCACTTTCTGTTGCGTTCTCTCCCTTCCCCTCCT
Rabbit TGATTTCTCGGCAGCCAGGGAGGGCCCCACGAC-AAGCCATTCAAAATCCCAGAAGTGATTTTCTACTTACGACCTCACTTTCTGTTG----CTCTCTCCTTCCCTCCA
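The three rVISTA steps on slide 36 can be sketched on a toy pairwise alignment. Several simplifications are assumptions and do not describe the actual rVISTA code. Step 1 uses exact matching of a consensus string instead of TRANSFAC matrices. Sites are located directly in the gapped alignment rows, so sites containing gaps are ignored. The "dynamic shifting window" is interpreted as any 20-column window that fully contains the site and exceeds 80% identity. All function names are hypothetical.

```python
def find_sites(seq, motif):
    """Step 1 stand-in: start positions of exact matches to a consensus motif."""
    m = len(motif)
    return [i for i in range(len(seq) - m + 1) if seq[i:i + m] == motif]


def percent_identity(a, b):
    """Percent of columns where both rows carry the same non-gap base."""
    matches = sum(1 for x, y in zip(a, b) if x == y and x != "-")
    return 100.0 * matches / len(a)


def conserved_aligned_sites(aln1, aln2, motif, window=20, min_identity=80.0):
    """Steps 2 and 3: sites aligned in both rows that lie in a conserved window."""
    if len(aln1) != len(aln2):
        raise ValueError("aligned sequences must have equal length")
    # Step 2: a site is "aligned" if it starts at the same column in both rows.
    aligned = sorted(set(find_sites(aln1, motif)) & set(find_sites(aln2, motif)))
    conserved = []
    for col in aligned:
        # Step 3: shift a window over every placement that fully covers the site.
        first = max(0, col + len(motif) - window)
        last = min(col, len(aln1) - window)
        for start in range(first, last + 1):
            a = aln1[start:start + window]
            b = aln2[start:start + window]
            if percent_identity(a, b) > min_identity:
                conserved.append(col)
                break
    return conserved


if __name__ == "__main__":
    row1 = "ACGTACGTACTGGGAAACGTACGT"
    row2 = "CCGTACGTACTGGGAAACGTACGT"
    print(conserved_aligned_sites(row1, row2, "TGGGAA"))
```

A site that is aligned but sits in poorly conserved flanking sequence is dropped at step 3, which is the filtering effect the slide attributes to combining site prediction with alignment.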

37 rVISTA: Interface. rVISTA sequence submission: submit email address, sequences, and set parameters. Key step: click the box for “Find potential transcription factors”

38 rVISTA: Select TRANSFAC Matrices

39 rVISTA: Mailed Results. Emailed results will provide a link. Choose which binding site matrices to display. You can then choose visualization options

40 rVISTA: Results Graphic
Blue: all transcription factor (TF) binding sites
Red: TF sites which are aligned in both sequences
Green: TF sites which are aligned and in conserved regions

41 Whole Genome rVISTA: Access

42 Whole Genome rVISTA: Select Alignment IDs or symbols upstream range

43 Whole Genome rVISTA: Results sites found view genes

44 Examples of VISTA usage
Non-coding regulatory regions, for example enhancers
Genes from the same gene families
Alternative splicing
Transcriptional regulation
Genetic studies
References collected are available through the Publications link at the VISTA home page http://genome.lbl.gov/vista

45 VISTA-related Publications

46 http://www.openhelix.com

47 VISTA thanks
Genomics Division, LBNL, led by Dr. Edward Rubin
Biology: Dario Boffelli, Kelly Frazer, Gaby Loots, Len Pennacchio, Marcelo Nobrega, Axel Visel
Bioinformatics: Michael Brudno, Olivier Couronne, Simon Minovitsky, Igor Ratner, Alexander Poliakov, Lior Pachter (UCB), Shyam Prabhakar, Dmitriy Ryaboy, Nameeta Shah, Inna Dubchak

