Multiple Motifs. Charles Yan, Spring 2006.

Presentation transcript:

Multiple Motifs. Charles Yan, Spring 2006

2 Multiple Motifs

3 From Single Motif to Multiple Motifs. A single motif is often not sufficient to discriminate a protein family. Multiple motifs have stronger discriminating power.

4 Multiple Motifs. Protein function prediction using multiple motifs. Each protein family is characterized by a set of motifs (instead of a single one). If a protein contains a set of motifs, it probably belongs to the family that the set of motifs corresponds to.
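The idea on this slide can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: the family names and motif patterns below are made up, and motifs are written as regular expressions, whereas fingerprint methods score sequences against aligned motifs rather than exact patterns.

```python
import re

# Hypothetical families, each described by a set of motif patterns.
# Names and patterns are illustrative only, not taken from any database.
FAMILY_MOTIFS = {
    "family_A": ["C..C", "H...H", "GK[ST]"],
    "family_B": ["W.P", "DE.D"],
}


def count_matched_motifs(sequence, motifs):
    """Return how many of the motif patterns occur somewhere in the sequence."""
    return sum(1 for motif in motifs if re.search(motif, sequence))


def predict_families(sequence, family_motifs=FAMILY_MOTIFS):
    """Return the families whose motifs are ALL present in the sequence."""
    return [
        name
        for name, motifs in family_motifs.items()
        if count_matched_motifs(sequence, motifs) == len(motifs)
    ]
```

A sequence that carries only some of a family's motifs is not assigned to that family here, which is the source of the stronger discriminating power compared with a single motif.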

5 PRINTS. PRINTS is a database of protein fingerprints. A fingerprint is a group of conserved motifs used to characterize a protein family; ftp.bioinf.man.ac.uk/pub/prints. PRINTS is now maintained at the University of Manchester. PRINTS version 38.0 (16 June 2005): 1,900 fingerprints, encoding 11,435 single motifs.

6 PRINTS. Each fingerprint has been defined and iteratively refined using a SWISS-PROT/TrEMBL composite database. Two types of fingerprint are represented in the database, simple or composite, depending on their complexity: simple fingerprints are essentially single motifs, while composite fingerprints encode multiple motifs. The bulk of the database entries are of the latter type because discrimination power is greater for multi-component searches. Usually the motifs do not overlap but are separated along a sequence, though they may be contiguous in 3D space. Fingerprints can encode protein folds and functionalities more flexibly and powerfully than single motifs can, with full diagnostic potency deriving from the mutual context provided by motif neighbors.

7 PRINTS. A motif is a conserved element corresponding to a region whose function or structure is known. It is likely to be predictive of any subsequent occurrence of such a structural/functional region in any other protein sequence. A motif is represented as a conserved alignment of multiple sequences. A fingerprint is a set of motifs used to predict the occurrence of similar motifs in an individual sequence.

8 PRINTS

9 The starting point is a multiple sequence alignment of a small number of sequences. Once a motif, or set of motifs, has been identified, the conserved regions are excised in the form of local alignments. The motifs are used to scan the database. Only those sequences that match all motifs are regarded as true matches. The additional sequence data from the new true set is then used to generate another set of aligned motifs, and the database is searched again. The process repeats until it converges.
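The scan-and-rebuild loop on this slide can be sketched as follows. This is a toy version under stated assumptions, not the PRINTS implementation: a motif is a list of equal-length, ungapped segments, a sequence "matches" a motif when some window reaches a hypothetical fractional identity `threshold` against any segment, and convergence means the set of full matches stops changing. All function names are invented for the example.

```python
def identity(a, b):
    """Fraction of positions at which two equal-length strings agree."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


def best_window(sequence, motif):
    """Best-scoring window of the sequence against any segment of the motif.

    Returns (score, window), or (0.0, "") if nothing scores above zero.
    """
    width = len(motif[0])
    best = (0.0, "")
    for i in range(len(sequence) - width + 1):
        window = sequence[i:i + width]
        score = max(identity(window, segment) for segment in motif)
        if score > best[0]:
            best = (score, window)
    return best


def refine_fingerprint(motifs, database, threshold=0.75, max_iterations=20):
    """Scan, keep sequences matching ALL motifs, rebuild motifs, repeat."""
    true_set = set()
    for _ in range(max_iterations):
        hits = {}
        for seq_id, sequence in database.items():
            windows = [best_window(sequence, motif) for motif in motifs]
            if all(score >= threshold for score, _ in windows):
                hits[seq_id] = [window for _, window in windows]
        if set(hits) == true_set:
            break  # converged: no change in the set of true matches
        true_set = set(hits)
        # Add the matched windows of the new true set to each motif.
        motifs = [
            sorted(set(motif) | {hits[s][k] for s in hits})
            for k, motif in enumerate(motifs)
        ]
    return motifs, true_set
```

In this toy setup a sequence that is too divergent to match the seed motifs can still be picked up in a later iteration, once closer relatives have been added to the motifs.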

10 PRINTS

11 PRINTS

12 PRINTS a) General field

13 PRINTS b) Summary field. A good fingerprint should exhibit a clear discrimination cut-off, i.e. it shows all true positives matching all n motifs, perhaps some noise, and few or no matches at intermediate positions of the summary table.

14 PRINTS. Each motif entry gives the motif name and iteration number, followed by: PCODE, the protein identification codes of the initial sequences; ST, the location of the motifs within those sequences; and INT, the interval between adjacent motifs. For the first motif, INT is simply the distance from the beginning of the sequence to the start of the motif.
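As a small worked example of how ST and INT relate, the sketch below derives intervals from motif start positions. It rests on assumptions that the slide does not spell out: start positions are taken to be 1-based, and the interval is counted as the number of residues between the end of the previous motif and the start of the current one. The function name is hypothetical.

```python
def motif_intervals(starts, lengths):
    """Compute the interval preceding each motif in one sequence.

    starts  -- 1-based start position of each motif, in sequence order (assumed)
    lengths -- width of each motif, in the same order
    """
    intervals = []
    previous_end = 0  # position just before the first residue
    for start, length in zip(starts, lengths):
        # Residues lying strictly between the previous motif and this one.
        intervals.append(start - previous_end - 1)
        previous_end = start + length - 1
    return intervals
```

For three motifs starting at positions 10, 30 and 55 with widths 8, 12 and 6, this gives intervals of 9, 12 and 13 residues.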

15 PRINTS

16 PRINTS FPScan. Submit a protein sequence to find the closest matching PRINTS fingerprint(s).

17 PRINTS

18 PRINTS

19 PRINTS

20 PRINTS

21 PRINTS GRAPHScan. A graphical view of the result of a scan of a fingerprint against a sequence. Matching motifs are highlighted if they score above the threshold % identity.

22 PRINTS

23 PRINTS

24 PRINTS MULScan. This facility allows multiple sequences to be scanned against the database. Results are returned via .

25 Related Projects. InterPro - Integrated Resource of Protein Domains and Functional Sites; BLOCKS - BLOCKS db; Pfam - Protein families db (HMM derived) [Mirror at St. Louis (USA)]; PRINTS - Protein Motif fingerprint db; ProDom - Protein domain db (Automatically generated); PROTOMAP - An automatic hierarchical classification of Swiss-Prot proteins; SBASE - SBASE domain db; SMART - Simple Modular Architecture Research Tool; TIGRFAMs - TIGR protein families db