Corrections. N-linked glycosylation (GlcNAc): look at the Swiss-Prot annotation (in a random ‘glycosylated’ entry)



Query: annotation:(type:carbohyd "N-linked (GlcNAc...)" confidence:experimental) reviewed:yes

Taxonomic distribution

TPNLINDTME

Multiple alignment (ClustalW): [LAPIQ]-N-[HAYRCS]-[ST]-[KLESGM]

N-glycosylation is not expected in Bacteria (it is rare there): a match in a bacterial sequence is most likely a false positive!

301 proteins (within the set of 1000 proteins) are N-glycosylated according to the UniProtKB annotation.

Scan PROSITE with the official pattern. The official pattern also matches bacterial sequences (false positives).
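To make the comparison concrete, here is a minimal sketch of how a PROSITE-style pattern can be translated into a regular expression and used to scan a sequence. It is not the ScanProsite implementation: the helper names `prosite_to_regex` and `find_matches` are hypothetical, only the basic syntax elements (`x`, `[...]`, `{...}`, `(n)`, `(n,m)`) are handled, and the official N-glycosylation pattern is assumed to be N-{P}-[ST]-{P} (PROSITE PS00001).

```python
import re

# One pattern element: a residue, x, [allowed] or {forbidden}, optionally
# followed by a repetition such as (3) or (2,4).
ELEMENT = re.compile(r"(\[[A-Z]+\]|\{[A-Z]+\}|[A-Za-z])(?:\((\d+)(?:,(\d+))?\))?")


def prosite_to_regex(pattern):
    """Translate a simple PROSITE-style pattern into a Python regex string."""
    regex = []
    for element in pattern.strip(" -.").split("-"):
        m = ELEMENT.fullmatch(element.strip())
        if m is None:
            raise ValueError("unsupported pattern element: %r" % element)
        core, low, high = m.groups()
        if core in ("x", "X"):
            core = "."                        # any residue
        elif core.startswith("{"):
            core = "[^" + core[1:-1] + "]"    # any residue except these
        if low is not None:
            core += "{" + low + ("," + high if high else "") + "}"
        regex.append(core)
    return "".join(regex)


def find_matches(pattern, sequence):
    """Return (1-based start, matched text) for every match, overlaps included."""
    lookahead = re.compile("(?=(" + prosite_to_regex(pattern) + "))")
    return [(m.start() + 1, m.group(1)) for m in lookahead.finditer(sequence)]


OFFICIAL = "N-{P}-[ST]-{P}"                        # assumed: PROSITE PS00001
DERIVED = "[LAPIQ]-N-[HAYRCS]-[ST]-[KLESGM]"       # ClustalW-derived pattern
PEPTIDE = "TPNLINDTME"                             # peptide from the slide

print(find_matches(OFFICIAL, PEPTIDE))   # [(6, 'NDTM')]
print(find_matches(DERIVED, PEPTIDE))    # []
```

A short, permissive pattern such as the official one matches almost any proteome by chance, which is why hits in bacterial sequences need to be treated as false positives.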

PRATT pattern derived from 20 sequences: D-K-T-G-T-[IL]-T-x(3)-[ILMV]-x-[FILV]

AT31_HUMAN: SIMILARITY: Belongs to the cation transport ATPase (P-type) family. Type V subfamily. The pattern is a discriminator for the cation-transporting ATPase family.

C-x(2,4)-C-x(3)-[LIVMFYWC]-x(8)-H-x(3,5)-H

Pattern scan

The pattern misses some Zn fingers within the same protein, e.g. Q24174. (Figure: pattern hits vs. profile hits; some fingers are not found with the pattern.)

The pattern C-x(2,4)-C-x(3)-[LIVMFYWC]-x(8)-H-x(3,5)-H should also match: YRCVLCGTVAKSRNSLHSHMSrQHRGIST
Modified pattern: C-x(2,4)-C-x(3)-[LIVMFYWCA]-x(8)-H-x(3,5)-H
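The effect of adding A to the variable position can be checked directly. The sketch below hand-translates both patterns into Python regular expressions (an illustration, not the ScanProsite engine) and tests them on the segment from the slide; the lowercase `r` in the segment is assumed to stand for an ordinary arginine and is uppercased before matching.

```python
import re

# Hand-translated PROSITE patterns: x(n,m) becomes .{n,m}
ORIGINAL = re.compile(r"C.{2,4}C.{3}[LIVMFYWC].{8}H.{3,5}H")
RELAXED = re.compile(r"C.{2,4}C.{3}[LIVMFYWCA].{8}H.{3,5}H")

# Segment from the slide; uppercased so the lowercase 'r' is read as R (assumption)
segment = "YRCVLCGTVAKSRNSLHSHMSrQHRGIST".upper()

original_hit = ORIGINAL.search(segment)
relaxed_hit = RELAXED.search(segment)

print(original_hit)             # None: the A after C-x(2)-C-x(3) is not allowed
print(relaxed_hit.group())      # CVLCGTVAKSRNSLHSHMSRQH
print(relaxed_hit.start() + 1)  # 3 (1-based start of the match)
```

The only difference between the two expressions is the extra A in the character class, so any sequence matched by the original pattern is still matched by the modified one.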

Yes! But the pattern becomes less restrictive: you get more sequences that should not be matched. (As the results are limited to 1000, the number of hits is not the same.)

Discriminators (Signatures, descriptors) for the Zinc finger C2H2 type domain can be found in Prosite (Pattern and Profile) and Pfam (HMM)

Step 1: scan UniProtKB/Swiss-Prot with the pattern Use the ‘scanprosite’ tool at

Step 2: Retrieve the matched human UniProtKB entries (go to the end of the ScanProsite result page and click on ‘Matched UniProtKB entries’)

Step 3: Retrieve the sequences annotated as being ‘phosphorylated on a Thr’

-> 19 candidates to be manually checked.

InterPro scan results

InterPro: another scheme (graphical view from UniProtKB)

InterPro scheme: Pfam graphical view

Prosite Graphical view

NCBI BLAST against Swiss-Prot. NCBI: color key for alignment scores

NCBI: the Swiss-Prot set at NCBI does not contain the alternative sequences (i.e. P ). Note that NCBI gives the ‘version number’ of the Swiss-Prot sequence (e.g. Q8BU25.2).

UniProt: color code for identity scores (not alignment scores!)

ProDom database List of proteins sharing at least a common domain…

1) BLAST at

2) PROSITE tools

You are lucky: it is rare for a domain not to be annotated in any of the domain/family databases!

3) Construct a profile with MyHits at SIB. Use PSI-BLAST.

Do a PSI-BLAST search against UniProtKB

Select sequences with an E value > and do a second cycle

Look at the MSA

Construct a profile with the MSA
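As a rough illustration of what "constructing a profile" from an MSA means, the sketch below turns alignment columns into log-odds scores. It is a simplified position-specific scoring matrix, not the PROSITE or MyHits profile format: the function names, the uniform background frequency, the pseudocount of 1 and the three-sequence toy alignment (chosen to agree with the D-K-T-G-T-[IL]-T motif above) are all assumptions made for the example.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def build_profile(alignment, pseudocount=1.0):
    """Return one {residue: log2-odds score} dict per alignment column."""
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")
    background = 1.0 / len(AMINO_ACIDS)   # assumed uniform background
    profile = []
    for col in range(length):
        # Gap characters and unknown symbols are simply ignored.
        counts = Counter(seq[col] for seq in alignment if seq[col] in AMINO_ACIDS)
        total = sum(counts.values()) + pseudocount * len(AMINO_ACIDS)
        profile.append({
            aa: math.log2(((counts[aa] + pseudocount) / total) / background)
            for aa in AMINO_ACIDS
        })
    return profile


def score(profile, segment):
    """Sum the column scores for a segment as long as the profile."""
    return sum(column[aa] for column, aa in zip(profile, segment))


# Hypothetical toy alignment, not taken from the exercise data
TOY_ALIGNMENT = ["DKTGTLT", "DKTGTIT", "DKTGTLT"]
profile = build_profile(TOY_ALIGNMENT)

print(round(score(profile, "DKTGTLT"), 2))
print(round(score(profile, "DKTGTVT"), 2))  # lower: V was never seen in column 6
```

Unlike a pattern, a profile gives every residue a score at every position, so a segment with one unusual residue is penalised rather than rejected outright.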

The profile

The profile hits

Construct an HMM with the MSA

The HMM

The HMM hits

- Look at the GoLoco data in InterPro. How many proteins (and/or hits) are found by the different methods?

According to InterPro, the GoLoco domain is described by at least one of the different methods (Pfam, PROSITE, SMART):
Pfam: 167 proteins
PROSITE: 192 proteins
SMART: 1 protein
These different numbers are a consequence of the interval between the releases of the different databases (including the sequence database, UniProtKB). They may also be due to the different methods used (HMM, profile…).

Look for the HMM for the GoLoco domain in Pfam

Download the HMM matrix

the HMM matrix