
InterPro/prosite UCSC Genome Browser Exercise 3

Turning information into knowledge
• The outcome of a sequencing project is masses of raw data
• The challenge is to turn this raw data into biological knowledge
• A valuable tool for this challenge is an automated diagnostic pipeline through which newly determined sequences can be streamlined

From sequence to function
• Nature tends to innovate rather than invent
• Proteins are composed of functional elements: domains and motifs
  - Domains are structural units that carry out a certain function
  - The same domains are shared between different proteins
  - Motifs are shorter sequences with certain biological activity

InterPro
• An integrated documentation resource for protein families, domains and sites
• Groups signatures describing the same protein family or domain
• Combines a number of databases that use different methodologies to derive protein signatures:
  - UniProt: UniProtKB (Swiss-Prot, TrEMBL), UniRef, UniParc
  - prosite: a documented DB of domains, families and functional sites
  - Pfam: a DB of protein families represented by MSAs

Member databases
• Sequence-motif methods: protein signature DBs with different focus
• Sequence-cluster methods: hierarchically clustered sequence/structure DBs

InterPro search

prosite
• A method for determining the function of uncharacterized translated protein sequences
• Consists of a DB of annotated, biologically important sites/patterns/motifs/signatures/fingerprints

prosite
• Entries are represented with patterns or profiles
  - pattern: [AC]-A-[GC]-T-[TC]-[GC]
  - profile: a table of per-position scores for each residue (rows A, T, C, G in the slide's example)
• Profiles are used in prosite when the motif is relatively divergent and difficult to represent as a pattern
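The bracketed expression on this slide, [AC]-A-[GC]-T-[TC]-[GC], follows the prosite pattern syntax: elements are separated by "-", square brackets list the allowed residues, curly braces list forbidden ones, "x" stands for any residue, and "(n)" or "(n,m)" gives a repeat count. A minimal sketch of how such a pattern could be translated into a regular expression is shown here. The function name prosite_to_regex is hypothetical, and the "<" and ">" anchors of the full syntax are deliberately not handled.

```python
import re


def prosite_to_regex(pattern: str) -> str:
    """Translate a prosite-style pattern into a Python regular expression.

    Simplified sketch: handles residues, [allowed], {forbidden}, x,
    and the repeat suffixes (n) and (n,m). Anchors are not supported.
    """
    parts = []
    for element in pattern.rstrip(".").split("-"):
        # Split an element such as "x(2,4)" into its core and repeat counts.
        m = re.fullmatch(r"(.+?)(?:\((\d+)(?:,(\d+))?\))?", element)
        if m is None:
            raise ValueError(f"empty pattern element in {pattern!r}")
        core, low, high = m.groups()

        if core == "x":                      # any residue
            token = "."
        elif core.startswith("[") and core.endswith("]"):
            token = core                     # allowed residues: same syntax
        elif core.startswith("{") and core.endswith("}"):
            token = "[^" + core[1:-1] + "]"  # forbidden residues
        else:
            token = re.escape(core)          # a literal residue

        if low and high:
            token += "{%s,%s}" % (low, high)
        elif low:
            token += "{%s}" % low
        parts.append(token)
    return "".join(parts)


slide_regex = prosite_to_regex("[AC]-A-[GC]-T-[TC]-[GC]")
print(slide_regex)
print(bool(re.fullmatch(slide_regex, "AAGTTG")))
```

A pattern either matches or it does not, which is why the slide reserves profiles, with their per-position scores, for divergent motifs.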

Scanning prosite
• Query: sequence. Result: all patterns found in the sequence
• Query: pattern. Result: all sequences which adhere to this pattern
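The two query directions can be sketched as two small functions over toy data. Everything named here is an assumption made for illustration: the sequence names and sequences are invented, and the pattern table holds only two short patterns already written as regular expressions (N-{P}-[ST]-{P} and [ST]-x-[RK]). This is not how the prosite server itself is implemented.

```python
import re

# Hypothetical miniature pattern table: name -> regular expression.
PATTERNS = {
    "ASN_GLYCOSYLATION": re.compile(r"N[^P][ST][^P]"),  # N-{P}-[ST]-{P}
    "PKC_PHOSPHO_SITE": re.compile(r"[ST].[RK]"),       # [ST]-x-[RK]
}

# Hypothetical miniature sequence collection (invented sequences).
SEQUENCES = {
    "protA": "MKNGSAL",
    "protB": "MKNPSAL",
    "protC": "MSARNNTSK",
}


def scan_sequence(sequence, patterns):
    """Query: sequence. Result: every pattern found, with 1-based start positions."""
    hits = {}
    for name, regex in patterns.items():
        # The lookahead reports overlapping occurrences as well.
        starts = [m.start() + 1
                  for m in re.finditer("(?=%s)" % regex.pattern, sequence)]
        if starts:
            hits[name] = starts
    return hits


def scan_database(regex, sequences):
    """Query: pattern. Result: names of all sequences that contain it."""
    return sorted(name for name, seq in sequences.items() if regex.search(seq))


print(scan_sequence(SEQUENCES["protC"], PATTERNS))
print(scan_database(PATTERNS["ASN_GLYCOSYLATION"], SEQUENCES))
```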

Patterns with a high probability of occurrence
• Entries describing commonly found post-translational modifications or compositionally biased regions
• Found in the majority of known protein sequences
• High probability of occurrence
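A rough calculation shows why short patterns of this kind turn up in most proteins. The sketch below takes the N-glycosylation pattern N-{P}-[ST]-{P} as an example and assumes, purely for illustration, that all 20 amino acids are equally likely and that positions are independent. Real proteins do not satisfy either assumption, so the numbers are only an order-of-magnitude guide. The function names and the sequence length of 400 are hypothetical choices.

```python
from fractions import Fraction

ALPHABET_SIZE = 20  # assumption: 20 equally likely amino acids


def site_probability(allowed_counts):
    """Probability that a random window matches, given the number of
    allowed residues at each pattern position."""
    p = Fraction(1)
    for n_allowed in allowed_counts:
        p *= Fraction(n_allowed, ALPHABET_SIZE)
    return p


def expected_hits(p, pattern_len, seq_len):
    """Expected number of matching windows in a sequence of seq_len residues."""
    return p * max(seq_len - pattern_len + 1, 0)


def prob_at_least_one(p, pattern_len, seq_len):
    """Chance of one or more hits, treating windows as independent (approximation)."""
    windows = max(seq_len - pattern_len + 1, 0)
    return 1.0 - (1.0 - float(p)) ** windows


# N-{P}-[ST]-{P}: 1 allowed residue, then 19, then 2, then 19.
p_site = site_probability([1, 19, 2, 19])
print(p_site, float(p_site))
print(float(expected_hits(p_site, 4, 400)))
print(prob_at_least_one(p_site, 4, 400))
```

Under these assumptions a single window matches with probability 361/80000, which is small, but a protein of a few hundred residues offers hundreds of windows, so at least one chance match becomes likely.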

prosite sequence query

prosite pattern query

UCSC Genome Browser

UCSC Genome Browser - Gateway
• Reset all settings of previous user

UCSC Genome Browser query results

UCSC Genome Browser - Annotation tracks
• Vertebrate conservation
• mRNA (GenBank)
• RefSeq
• UCSC Genes
• Base position
• Single species compared
• SNPs
• Repeats
• Gene: direction, exon, intron, UTR

UCSC Gene

UCSC Genome Browser - movement
• Zoom x3 + Center

UCSC Genome Browser – Base view

Annotation track options
• Display modes: dense, squish, full, pack

Annotation track options
• Another option to toggle between ‘pack’ and ‘dense’ view is to click on the track title
• Figure labels: sickle-cell anemia distribution; malaria distribution

BLAT
• BLAT = BLAST-Like Alignment Tool
• BLAT is designed to find sequences with >95% similarity for DNA and >80% for protein
• Rapid search by indexing the entire genome
Good for:
1. Finding genomic coordinates of cDNA
2. Determining exons/introns
3. Finding human (or chimp, dog, cow…) homologs of another vertebrate sequence
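The idea of "rapid search by indexing the entire genome" can be illustrated with a toy k-mer index. This is a simplified sketch of the general seed-lookup idea and not BLAT's actual implementation: the genome string, the value k=4 and all function names are assumptions chosen for readability, and the extension and alignment stages are omitted.

```python
from collections import Counter, defaultdict


def build_index(genome, k):
    """Index the non-overlapping k-mers of the genome: k-mer -> start offsets."""
    index = defaultdict(list)
    for pos in range(0, len(genome) - k + 1, k):
        index[genome[pos:pos + k]].append(pos)
    return index


def seed_hits(query, index, k):
    """Look up every overlapping k-mer of the query and count hits per
    diagonal (genome offset minus query offset)."""
    diagonals = Counter()
    for qpos in range(len(query) - k + 1):
        for gpos in index.get(query[qpos:qpos + k], ()):
            diagonals[gpos - qpos] += 1
    return diagonals


def percent_identity(a, b):
    """Percent identical positions between two equal-length, ungapped strings."""
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be non-empty and of equal length")
    return 100.0 * sum(x == y for x, y in zip(a, b)) / len(a)


genome = "ACGTTGCAAGGCTTAACCGGATCGATTACG"   # invented toy "genome"
query = genome[10:22]                       # a fragment we pretend is a cDNA
index = build_index(genome, 4)
best_diagonal, seeds = seed_hits(query, index, 4).most_common(1)[0]
print(best_diagonal, seeds)
print(percent_identity(query, genome[best_diagonal:best_diagonal + len(query)]))
```

The genome is indexed once and only the short query is scanned, which is what makes the lookup fast. The best diagonal suggests where the query sits in the genome, and the identity check mirrors the >95% criterion for DNA.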

BLAT on UCSC Genome Browser

BLAT Results

Legend: match; non-match (mismatch/indel); indel boundaries

BLAT Results

BLAT Results on the browser

Getting DNA sequence of region
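Besides the browser's own menus, the sequence of a region can also be requested programmatically. The sketch below only builds a request URL and does not contact any server. It assumes the public UCSC REST endpoint api.genome.ucsc.edu/getData/sequence with the parameters genome, chrom, start and end, and assumes that this endpoint expects 0-based, half-open coordinates while the browser's position box shows 1-based, inclusive ones. Both points should be checked against the current UCSC documentation. The function names and the example region are hypothetical.

```python
import re
from urllib.parse import urlencode

# Assumed endpoint of the UCSC REST API; verify before relying on it.
API_ENDPOINT = "https://api.genome.ucsc.edu/getData/sequence"


def parse_position(position):
    """Parse a browser-style position such as 'chr11:5,225,464-5,227,071'.

    Returns (chrom, start, end) with 1-based, inclusive coordinates.
    """
    m = re.fullmatch(r"(\w+):([\d,]+)-([\d,]+)", position.strip())
    if m is None:
        raise ValueError(f"not a valid position: {position!r}")
    chrom = m.group(1)
    start = int(m.group(2).replace(",", ""))
    end = int(m.group(3).replace(",", ""))
    if start < 1 or end < start:
        raise ValueError(f"invalid coordinate range: {position!r}")
    return chrom, start, end


def sequence_url(genome, position):
    """Build the request URL for the DNA sequence of a browser position."""
    chrom, start, end = parse_position(position)
    # 1-based inclusive -> 0-based half-open: shift the start, keep the end.
    params = {"genome": genome, "chrom": chrom, "start": start - 1, "end": end}
    return API_ENDPOINT + "?" + urlencode(params)


# An arbitrary example region on the hg38 assembly.
print(sequence_url("hg38", "chr11:5,225,464-5,227,071"))
```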