Tools for Comparative Sequence Analysis www.dcode.org Ivan Ovcharenko Lawrence Livermore National Laboratory.

Presentation transcript:

Tools for Comparative Sequence Analysis Ivan Ovcharenko Lawrence Livermore National Laboratory

A set of problems:
1. Browsing genomes using synteny links
2. Aligning sequences to vertebrate genomes
3. Aligning sequences to identify evolutionarily conserved regions
4. Assigning function to regulatory elements
5. Decoding gene regulation using microarray data

zPicture: Dynamic Alignment of Megabase-long Sequences and Genomes

zPicture: Automated sequence extraction and gene annotation. I. Ovcharenko, G. Loots, R.C. Hardison, W. Miller, and Lisa Stubbs, Genome Research, 14(3) (2004)

Automated sequence and gene annotation extraction: chr16:55,400,000-…

Extracted sequence (FASTA, as shown on the slide):
>hg16_dna range=chr16:
Tataatggctacctatttggagtgcctaccatgtattagtcattgtgcta…

Extracted gene annotation (as shown on the slide):
> SLC6A UTR exon exon exon exon exon exon exon exon exon exon exon exon exon exon UTR
> CESR UTR exon exon exon exon exon exon UTR
> CES UTR exon exon exon exon exon exon exon exon exon exon exon exon exon exon UTR
< CES UTR exon exon exon exon exon exon exon exon exon exon exon exon exon exon UTR
< CESR UTR exon exon exon exon exon exon UTR
< FLJ UTR exon exon exon exon exon exon exon exon exon exon exon

zPicture: a dynamic and interactive alignment visualization tool. Dynamic rotation from Pip-plots to Smooth-plots. Interactive parameter changes.

zPicture: dynamic annotation

zPicture: dynamic selection of conservation parameters (100 bp / 70% vs. 500 bp / 85%)
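The two parameter pairs on this slide can be read as a minimum length and a minimum percent identity for a conserved region. The sketch below shows one way such a threshold could be applied to a pairwise alignment. It is an illustration only, not zPicture's actual algorithm: the function name `find_ecrs`, the fixed sliding window, and the merging of overlapping windows are all assumptions.

```python
def find_ecrs(aln_a, aln_b, min_len=100, min_identity=0.70):
    """Return merged (start, end) alignment-column ranges that pass the threshold.

    aln_a and aln_b are two rows of a pairwise alignment (equal length,
    '-' for gaps). A window of min_len columns passes when the fraction
    of identical, non-gap columns is at least min_identity.
    Hypothetical helper, not part of any published tool.
    """
    if len(aln_a) != len(aln_b):
        raise ValueError("aligned sequences must have equal length")
    n = len(aln_a)
    if n < min_len:
        return []
    # 1 where the column is an identical, non-gap pair; 0 otherwise
    match = [int(a == b and a != "-")
             for a, b in zip(aln_a.upper(), aln_b.upper())]
    regions = []
    window_matches = sum(match[:min_len])
    for start in range(n - min_len + 1):
        if start > 0:
            # slide the window one column to the right
            window_matches += match[start + min_len - 1] - match[start - 1]
        if window_matches / min_len >= min_identity:
            end = start + min_len
            if regions and start <= regions[-1][1]:
                # overlaps or touches the previous region: extend it
                regions[-1] = (regions[-1][0], end)
            else:
                regions.append((start, end))
    return regions


if __name__ == "__main__":
    a = "A" * 30
    b = "A" * 10 + "C" * 10 + "A" * 10
    print(find_ecrs(a, b, min_len=10, min_identity=0.9))
```

Calling the function twice with `(100, 0.70)` and `(500, 0.85)` would mimic switching between the two settings shown on the slide: the stricter pair reports fewer, longer regions.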

zPicture: Aligning complete microbial genomes. Mycobacterium leprae vs. Mycobacterium tuberculosis. Conservation of genes: non-hypothetical genes, 97% are conserved; hypothetical genes, 20% are conserved.

rVista 2.0: Identification of Evolutionarily Conserved Transcription Factor Binding Sites

rVista Identification of Evolutionarily Conserved Transcription Factor Binding Sites

Human ACTTTCCTACATCTATCTATA
      |||||::|||||||:||||||
Mouse ACTTTGATACATCTCTCTATA

Human ACTTTGATACATCTATCTATA
      ||||||||||||||:||||||
Mouse ACTTTGATACATCTCTCTATA

Human -----GATACATCTATCTATA
           |||||
Mouse ACTTTGATAC

Human ACTTTGATACATCTATCTATA
      |||||
Mouse ACTTT
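The human/mouse examples above can be checked mechanically. The following sketch builds the match line for two aligned sequences and tests whether a candidate binding site is conserved within the alignment. It is a minimal illustration, not rVista's implementation: the helper names and the `min_identity` cutoff are assumptions, and the site coordinates are chosen by hand.

```python
def match_line(human, mouse):
    """Build the '|' / ':' line shown between two aligned sequences.

    '|' marks identical bases, ':' marks mismatches, ' ' marks gap columns.
    """
    out = []
    for h, m in zip(human, mouse):
        if "-" in (h, m):
            out.append(" ")
        elif h == m:
            out.append("|")
        else:
            out.append(":")
    return "".join(out)


def site_is_conserved(human, mouse, start, end, min_identity=1.0):
    """Check one candidate site given as alignment columns [start, end).

    The site counts as conserved when it contains no gaps in either
    sequence and its identity is at least min_identity (an assumed
    cutoff, not a value taken from the slides).
    """
    h, m = human[start:end], mouse[start:end]
    if not h or len(h) != len(m) or "-" in h or "-" in m:
        return False
    identical = sum(1 for x, y in zip(h, m) if x == y)
    return identical / len(h) >= min_identity


if __name__ == "__main__":
    mouse = "ACTTTGATACATCTCTCTATA"
    human_diverged = "ACTTTCCTACATCTATCTATA"
    human_conserved = "ACTTTGATACATCTATCTATA"
    print(match_line(human_diverged, mouse))
    # Columns 5..10 hold the mouse GATAC site in these examples.
    print(site_is_conserved(human_conserved, mouse, 5, 10))
    print(site_is_conserved(human_diverged, mouse, 5, 10))
```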

zPicture-rVista 2.0 interconnection zPicture rVista 2.0

ECR Browser: Tool for Browsing Genome Conservation Profiles

Grab ECR :: direct access to a conserved element

Genome Alignment: Align your sequence to a vertebrate genome

Genome Alignment AC146831

Genome alignment: Output page

ECR Browser contains rVista portal

eShadow: Phylogenetic Shadowing of Closely Related Species

eShadow: Phylogenetic Shadowing

Phylogenetic shadowing on multiple (10-14) primate sequences: Apo-B, Plasminogen, LXR-alpha, CETP (Boffelli et al., Science, 2003)

CREME: Using Microarray Data to Decode Genome Regulation

TFBS in Promoter ECRs of RefSeq genes: ~13k RefSeq loci; ~8k conserved promoters; 414 TRANSFAC PWMs; ~3M predicted TFBS

TFBS in Promoter ECRs of RefSeq genes

Testing Motif Abundances
Identify enriched motifs in a gene set relative to a background set. Take into account the length of promoters.

Filtering Similar PWMs
TRANSFAC contains many redundancies:
– Different PWMs for the same TF.
– Similar PWMs for TFs from the same family.
Filtering strategy:
– For two PWMs that tend to co-occur in a very small window (4 bp), remove the less enriched one.
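The two steps on this slide can be sketched in a few lines. The slide does not say which statistical test is used, so the code below swaps in a simple length-aware binomial test as an assumption: under the null hypothesis, each motif hit falls into the target set with probability proportional to the target's share of total promoter length. The co-occurrence cutoff `min_fraction`, the function names, and the PWM names in the example are likewise hypothetical.

```python
from math import comb


def enrichment_pvalue(hits_target, len_target, hits_background, len_background):
    """One-sided binomial p-value for motif enrichment in the target set.

    Target and background are assumed to be disjoint promoter sets.
    Lengths are total promoter lengths in bp, so longer promoters are
    expected to collect more hits by chance. Intended for small counts;
    very large counts would need a log-space or library implementation.
    """
    n = hits_target + hits_background
    p = len_target / (len_target + len_background)
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k)
               for k in range(hits_target, n + 1))


def cooccur_fraction(sites_a, sites_b, window):
    """Fraction of sites in sites_a with a sites_b hit within `window` bp.

    Each site is a (promoter_id, position) pair.
    """
    if not sites_a:
        return 0.0
    near = sum(
        any(pa == pb and abs(xa - xb) <= window for pb, xb in sites_b)
        for pa, xa in sites_a
    )
    return near / len(sites_a)


def filter_similar(sites, pvalues, window=4, min_fraction=0.5):
    """Keep PWMs in order of enrichment, dropping redundant ones.

    A PWM is dropped when at least min_fraction of its sites lie within
    `window` bp of a site of an already kept, more enriched PWM.
    """
    kept = []
    for name in sorted(sites, key=lambda pwm: pvalues[pwm]):
        redundant = any(
            cooccur_fraction(sites[name], sites[other], window) >= min_fraction
            for other in kept
        )
        if not redundant:
            kept.append(name)
    return kept


if __name__ == "__main__":
    print(enrichment_pvalue(10, 100, 0, 100))
    sites = {
        "E2F_a": [("g1", 10), ("g2", 50)],
        "E2F_b": [("g1", 12), ("g2", 51)],
        "NFY": [("g1", 200)],
    }
    pvalues = {"E2F_a": 0.001, "E2F_b": 0.01, "NFY": 0.02}
    print(filter_similar(sites, pvalues))
```

Sorting by p-value first means the more enriched PWM of a redundant pair is always the one retained, which matches the filtering strategy stated on the slide.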

Human Cell Cycle: 16 enriched PWMs; 1089 modules; 336 genes (Whitfield et al.); significant modules; 5 coherently expressed; E2F, NFY, CREB…

Human Cell Cycle: DELTAEF1, EVI1, GR: 11 genes, p=0.01

Validation on a known module
NFAT-AP1:
– 10 known genes containing multiple regulatory elements. In all of them, NFAT is upstream of AP1.
– CREME reported the correct module only (p=0.01).
– CREME identified the correct orientation of the TFBS.
– The module was identified even after adding 10 random promoters to the gene set.
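The ordering constraint in this validation (NFAT upstream of AP1) can be expressed as a simple check over predicted sites. The sketch below is a hypothetical illustration, not CREME's method: it assumes site positions are given in promoter coordinates oriented with the gene, so a smaller position means further upstream, and the `max_gap` distance limit and gene names are invented for the example.

```python
def has_ordered_pair(sites, first, second, max_gap=50):
    """True if some `first` site lies upstream of a `second` site.

    sites is a list of (tf_name, position) pairs for one promoter.
    max_gap (bp) is an assumed limit on the distance between the two sites.
    """
    firsts = [pos for tf, pos in sites if tf == first]
    seconds = [pos for tf, pos in sites if tf == second]
    return any(0 < s - f <= max_gap for f in firsts for s in seconds)


def module_support(promoters, first="NFAT", second="AP1", max_gap=50):
    """List the genes whose promoter contains the ordered module."""
    return [gene for gene, sites in promoters.items()
            if has_ordered_pair(sites, first, second, max_gap)]


if __name__ == "__main__":
    promoters = {
        "geneA": [("NFAT", 100), ("AP1", 120)],  # correct order, close together
        "geneB": [("AP1", 100), ("NFAT", 120)],  # reversed order
        "geneC": [("NFAT", 10), ("AP1", 400)],   # correct order, too far apart
    }
    print(module_support(promoters))
```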

Colleagues and collaborators
Lawrence Livermore National Laboratory; UC Berkeley; Stanford; Lawrence Berkeley National Laboratory; Pennsylvania State University
Gaby Loots, Lisa Stubbs, Roded Sharan, Asa Ben-Hur, Ross Hardison, Webb Miller, Marcelo Nobrega, Dario Boffelli, Sha Hammond