VISTA family of computational tools for comparative genomics: How can we leverage genome sequences from many species to learn about genome function?

Presentation transcript:

VISTA family of computational tools for comparative genomics
How can we leverage genome sequences from many species to learn about genome function?
Microbial applications
Inna Dubchak, Genomics Division LBNL, JGI

Human Genome Annotation
Gene A
Only 1–2% coding
Efficient identification of regulatory sequences?

Sequence conservation implies function
[Figure: two sequences descended from a last common ancestor 80 million years ago. Divergence = non-functional; conservation = functional region. The conserved sequence CTATAAATGC, present in both, marks a potential functional region.]

Comparative Genomics Introduction
Species: Human, Drosophila, Mouse, Urchin, Chimp
Topics: Similar Genes, Synteny, Sequence Alignment

VISTA is an integrated system for global sequence alignment and visualization for comparative genomic analysis

How does VISTA work: Global Genomic Alignments (sequence 1 vs. sequence 2)
1. Anchoring: identify regions of strong similarity
2. Chaining: join regions of weak or no similarity
Algorithm: Feature
AVID*: can handle draft sequence
LAGAN**: produces true multiple alignments
Shuffle-LAGAN**: handles rearrangements (inversions, translocations)
* Lior Pachter, UC Berkeley
** Michael Brudno, U. Toronto
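The anchoring and chaining steps named on this slide can be illustrated with a toy sketch. This is not how AVID or LAGAN are implemented. It assumes that anchors are k-mers occurring exactly once in each sequence and that chaining keeps the largest set of anchors appearing in the same order in both sequences. The function names and the value of `k` are hypothetical.

```python
from bisect import bisect_left
from collections import defaultdict


def find_anchors(seq1, seq2, k=8):
    """Return sorted (pos1, pos2) pairs for k-mers unique in both sequences.

    Assumption for illustration: an "anchor" is an exact k-mer match that
    occurs exactly once in each sequence.
    """
    def index(seq):
        table = defaultdict(list)
        for i in range(len(seq) - k + 1):
            table[seq[i:i + k]].append(i)
        return table

    idx1, idx2 = index(seq1), index(seq2)
    anchors = [
        (pos1[0], idx2[kmer][0])
        for kmer, pos1 in idx1.items()
        if len(pos1) == 1 and len(idx2.get(kmer, [])) == 1
    ]
    return sorted(anchors)


def chain_anchors(anchors):
    """Keep the longest subset of anchors that is increasing in both coordinates.

    `anchors` must be sorted by pos1; the chain is then a longest strictly
    increasing subsequence of the pos2 values.
    """
    tails = []       # tails[j]: smallest pos2 ending a chain of length j + 1
    tails_idx = []   # index into `anchors` of that chain end
    prev = [-1] * len(anchors)
    for i, (_, pos2) in enumerate(anchors):
        j = bisect_left(tails, pos2)
        if j == len(tails):
            tails.append(pos2)
            tails_idx.append(i)
        else:
            tails[j] = pos2
            tails_idx[j] = i
        prev[i] = tails_idx[j - 1] if j > 0 else -1

    # Walk back from the end of the longest chain.
    chain = []
    i = tails_idx[-1] if tails_idx else -1
    while i != -1:
        chain.append(anchors[i])
        i = prev[i]
    return chain[::-1]


if __name__ == "__main__":
    s1 = "ACGTTGCA" + "TTTT" + "GGATCCTA"
    s2 = "ACGTTGCA" + "CC" + "GGATCCTA"
    print(chain_anchors(find_anchors(s1, s2)))
```

In a real global aligner, the stretches between consecutive chained anchors would then be aligned separately. That step is omitted here.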

Global Genomic Aligner Output
[Figure: pairwise alignment of two sequences printed in blocks, with a row of match bars (|) between the aligned rows and dashes (-) marking gaps.]

VISTA visualization
A “sliding window” is used to measure sequence conservation (default window size 100 bp).
Sequence conservation is presented graphically as a “peaks-and-valleys” curve.
[Figure: an example pairwise alignment and the resulting curve. Figure labels: base sequence coordinates; % identity; >70% identity.]
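The sliding-window measurement described on this slide can be sketched as follows. This is a minimal illustration, not the VISTA code. It assumes the input is a pair of already-aligned sequences of equal length with `-` for gaps, that a gap column counts as a mismatch, and that a region counts as conserved when the window identity is at or above the threshold. The function names are hypothetical.

```python
def window_identity(aln1, aln2, window=100):
    """Percent identity of every `window`-column window of a pairwise alignment.

    Returns one value per window start position. Gap columns are treated as
    mismatches (an assumption made for this sketch).
    """
    if len(aln1) != len(aln2):
        raise ValueError("aligned sequences must have the same length")
    matches = [1 if a == b and a != "-" else 0 for a, b in zip(aln1, aln2)]
    if len(matches) < window:
        return []

    total = sum(matches[:window])
    identities = [100.0 * total / window]
    # Slide the window one column at a time, updating the running match count.
    for i in range(window, len(matches)):
        total += matches[i] - matches[i - window]
        identities.append(100.0 * total / window)
    return identities


def conserved_regions(identities, threshold=70.0):
    """Return (first, last) window indices of runs at or above `threshold`."""
    regions, start = [], None
    for i, value in enumerate(identities):
        if value >= threshold and start is None:
            start = i
        elif value < threshold and start is not None:
            regions.append((start, i - 1))
            start = None
    if start is not None:
        regions.append((start, len(identities) - 1))
    return regions


if __name__ == "__main__":
    curve = window_identity("ACGTACGT", "ACGTACGA", window=4)
    print(curve)
    print(conserved_regions(curve, threshold=80.0))
```

Plotting the returned values against the window start position gives the kind of peaks-and-valleys curve the slide refers to.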

VISTA homepage: access servers, browsers, and other information
VISTA Servers (submit your own data)
VISTA Browsers (precomputed alignments)
Other VISTA-related Projects

VISTA Servers
mVISTA: Align and compare sequences
rVISTA: Search for TFBS combined with a comparative sequence analysis
GenomeVISTA: Align DNA sequence to a genome
wgVISTA: Align and compare sequences, including microbial assemblies

Precomputed Alignments
VISTA Browser: Browse through pre-computed whole-genome alignments
VISTA-Point: Browse and obtain sequence and alignment data
Whole Genome rVISTA: Whole-genome analysis for conserved TFBS over-represented in upstream regions of genes

VISTA Browser: Access

VISTA Browser: Input Menu
Choose “base” genome (genome)
Select location (position)
Determine visualization preference (visualization): VISTA Browser, VISTA tracks on UCSC Browser, or VISTA-Point
Java 2, if needed

VISTA Browser: Alignment Details
Figure labels: direction, exon, repeats, alignment, SNPs, gene

VISTA Browser: Result
Figure labels: Position on chromosome; Control Panel; Graphical display of genome alignments; Color Legend; Cursor Info; Menu & Icons; Curve annotation (species); 1 row

VISTA Browser: Zooming vs. rhesus vs. dog

VISTA browser

VISTA Point: Access Overview

VISTA Point: Graphics Table

VISTA Point: Alignments Table sequence

Google map-like Dot-Plot

BlockView – Synteny Plot tool

Principal components
RegTransBase – experimental data: manually curated database of regulatory interactions captured from literature; 6000 papers (NAR database issue, 2007)
RegPrecise – computational predictions: manually curated database of regulons inferred by comparative genomics approach (NAR database issue, 2010; Featured Article)
RegPredict – web tool for regulon inference: integrated system for fast and accurate inference of regulons by comparative genomics (NAR Web Server issue, 2010; Featured Article)

mVISTA: Access

mVISTA: Interface
Align up to 100 sequences. Our example will show 3 sequences.

mVISTA: Input of Sequences
Provide your email address
Upload your sequences, or enter a GenBank ID

mVISTA: Input Parameters
AVID: multiple pairwise alignments; accepts finished or draft sequences
LAGAN: true multiple alignments
Shuffle-LAGAN: multiple pairwise alignments; detects sequence rearrangements and inversions

mVISTA: Results
PDF, VISTA Browser, VISTA-Point

wgVISTA: Microbial Assemblies Comparison
wgVISTA: whole genome VISTA
Compares 2 sequences (up to 10 Mb)
Draft or finished microbial assembly sequences can be used

rVISTA: Access

Regulatory VISTA (rVISTA): prediction of transcription factor binding sites
Simultaneous searches of the major transcription factor binding site database (TRANSFAC) and the use of global sequence alignment to sieve through the data
rVISTA search is automatically run when submitting: mVISTA, GenomeVISTA

Regulatory VISTA (rVISTA):
1. Identify potential transcription factor binding sites for each sequence using library of matrices (TRANSFAC)
2. Identify aligned sites using VISTA
3. Identify conserved sites using dynamic shifting window (20 bp dynamic shifting window, >80% ID)
Example alignment (sites labeled in the figure: Ikaros-2, Ikaros-2, NFAT, Ikaros-2):
Human  TGATTTCTCGGCAGCAAGGGAGGGCCCCATGACAAAGCCATTTGAAATCCCAGAAGCAATTTTCTACTTACGACCTCACTTTCTGTTGCTGTCTCTCCCTTCCCCTCTG
Mouse  TGATTTCTCGGCAGCCAGGGAGGGCCCCATGACGAAGCCACTCGAAATCCCAGAAGCAATTTTCTACTTACGACCTCACTTTCTGTTGCTCTCTCTTCCTCCCCCTCCA
Dog    TGATTTCTCGGCAGCAAGGGAGGGCCCCATGACGAAGCCATTTGAAATCCCAGAAGCGATTTTCTACCTACGACCTCACTTTCTGTTGCGCTCACTCCCTTCCCCTGCA
Rat    TGATTTCTCGGCAGCCAGGGAGGGCCCCATGACGAAGCCACTCGAAATCCCAGAAGCAATTTTCTACTTACGACCTCACTTTCTGTTGTTCTCTCTTCCTCCCCCTCCA
Cow    TGATTTCTCGGCAGCCAGGGAGGGCCCCATGACGAAGCCATTTGAAATCCCAGAAGCAATTTTCTACTTACGACCTCACTTTCTGTTGCGTTCTCTCCCTTCCCCTCCT
Rabbit TGATTTCTCGGCAGCCAGGGAGGGCCCCACGAC-AAGCCATTCAAAATCCCAGAAGTGATTTTCTACTTACGACCTCACTTTCTGTTG----CTCTCTCCTTCCCTCCA
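The three rVISTA steps on this slide can be illustrated with a small sketch. It is not the rVISTA implementation and makes several assumptions. Step 1 (matrix search) is taken as already done, so predicted sites arrive as `(tf_name, start, end)` tuples in 0-based, inclusive, ungapped coordinates. Two sites are treated as "aligned" when they share a factor name and overlap in alignment columns. The "dynamic shifting window" is interpreted as taking the best identity over all 20-column windows that contain the site. All function names and labels are hypothetical.

```python
def seq_to_col(aligned):
    """Map each ungapped sequence position to its alignment column."""
    return [col for col, ch in enumerate(aligned) if ch != "-"]


def best_window_identity(aln1, aln2, col_start, col_end, window=20):
    """Best percent identity over all windows containing [col_start, col_end].

    Returns 0.0 if no window of the requested size can contain the site.
    """
    first = max(0, col_end - window + 1)
    last = min(col_start, len(aln1) - window)
    best = 0.0
    for s in range(first, last + 1):
        pairs = zip(aln1[s:s + window], aln2[s:s + window])
        matches = sum(1 for a, b in pairs if a == b and a != "-")
        best = max(best, 100.0 * matches / window)
    return best


def classify_sites(aln1, aln2, sites1, sites2, window=20, min_identity=80.0):
    """Label each predicted site of sequence 1.

    "predicted": found in sequence 1 only
    "aligned":   overlaps a same-factor site of sequence 2 in the alignment
    "conserved": aligned and inside a window with >= min_identity
    """
    cols1, cols2 = seq_to_col(aln1), seq_to_col(aln2)
    spans2 = [(tf, cols2[s], cols2[e]) for tf, s, e in sites2]
    labeled = []
    for tf, s, e in sites1:
        c_start, c_end = cols1[s], cols1[e]
        aligned = any(
            tf == tf2 and c_start <= e2 and s2 <= c_end
            for tf2, s2, e2 in spans2
        )
        if not aligned:
            label = "predicted"
        elif best_window_identity(aln1, aln2, c_start, c_end, window) >= min_identity:
            label = "conserved"
        else:
            label = "aligned"
        labeled.append((tf, s, e, label))
    return labeled


if __name__ == "__main__":
    a = "ACGTACGTACGTACGTACGTACGT"
    print(classify_sites(a, a, [("NFAT", 2, 7)], [("NFAT", 2, 7)]))
```

The three labels correspond to the successive filters in the steps above. Each filter narrows the set of predicted sites that remain worth inspecting.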

rVISTA: Interface
rVISTA sequence submission: set the number of your sequences
Submit email address, sequences, and set parameters
Key step: click the box for “Find potential transcription factors”

rVISTA: Select TRANSFAC Matrices

rVISTA: Emailed Results
Emailed results will provide a link
Choose which binding site matrices to display
You can then choose visualization options

rVISTA: Results Graphic
Blue: all transcription factor (TF) binding sites
Red: TF sites which are aligned in both sequences
Green: TF sites which are aligned and in conserved regions

Whole Genome rVISTA: Access

Whole Genome rVISTA: Select Alignment
Figure labels: IDs or symbols; upstream range

Whole Genome rVISTA: Results
Figure labels: sites found; view genes

Examples of VISTA usage
Non-coding regulatory regions, for example enhancers
Genes from the same gene families
Alternative splicing
Transcriptional regulation
Genetic studies
The collected references are available through the Publications link on the VISTA home page.

VISTA-related Publications

VISTA thanks
Genomics Division, LBNL, led by Dr. Edward Rubin
Biology: Dario Boffelli, Kelly Frazer, Gaby Loots, Len Pennacchio, Marcelo Nobrega, Axel Visel
Bioinformatics: Michael Brudno, Olivier Couronne, Simon Minovitsky, Igor Ratner, Alexander Poliakov, Lior Pachter (UCB), Shyam Prabhakar, Dmitriy Ryaboy, Nameeta Shah, Inna Dubchak