Group discussion: Name this protein

Protein sequence, from Aedes aegypti automated annotation:

>25558.m01330
MIHVQQMQVSSPVSSADGFIGQLFRVILKRQGSPDKGLICKIPPLSAARREQFDASLMFE
REPWFYERILPEFEAYQRAKGVGERDGFVAYAKSYVAYYKSMDQGTVLVMEDLGRRGFRM
YNKLLPLDYNHVKLAMVQLGRFHAVSFAMKRDRQKVYESLKVSNPIVEMFRKNRYCRFLV
SESLKLALEIPGLTEMERTVLNQLKDNVLAEFEACLDIGQAEPYTAIVHNDCWINNCMFS
YEEDGLHPKELILIDWQLGCCAAPAVELIYLFYLCTDTQFRAKHFEEMVQLYHQSFGILL
RKLGGDSDVDYPYEVLKKQLRRLGRYGVMMGSFLVPTMCIPSEDLPNLDESAARQKSTDQ
YELPYKLDEKSLPVYQERMLGVIRDAIKFGCFDL

Name this protein.
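A quick sanity check before running any search is to confirm that the query was copied intact. The sketch below is a minimal illustration using only the Python standard library: it parses the FASTA record above and reports its length and any non-standard residues. `read_fasta` and `STANDARD_AA` are hypothetical names written for this example, not part of any annotation pipeline.

```python
# Minimal FASTA reader (standard library only); read_fasta is a hypothetical helper.
QUERY_FASTA = """\
>25558.m01330
MIHVQQMQVSSPVSSADGFIGQLFRVILKRQGSPDKGLICKIPPLSAARREQFDASLMFE
REPWFYERILPEFEAYQRAKGVGERDGFVAYAKSYVAYYKSMDQGTVLVMEDLGRRGFRM
YNKLLPLDYNHVKLAMVQLGRFHAVSFAMKRDRQKVYESLKVSNPIVEMFRKNRYCRFLV
SESLKLALEIPGLTEMERTVLNQLKDNVLAEFEACLDIGQAEPYTAIVHNDCWINNCMFS
YEEDGLHPKELILIDWQLGCCAAPAVELIYLFYLCTDTQFRAKHFEEMVQLYHQSFGILL
RKLGGDSDVDYPYEVLKKQLRRLGRYGVMMGSFLVPTMCIPSEDLPNLDESAARQKSTDQ
YELPYKLDEKSLPVYQERMLGVIRDAIKFGCFDL
"""

# The 20 standard amino acid one-letter codes.
STANDARD_AA = set("ACDEFGHIKLMNPQRSTVWY")


def read_fasta(text):
    """Return a dict mapping each record ID to its full, unwrapped sequence."""
    chunks = {}
    current_id = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            # The ID is the first whitespace-delimited token after '>'.
            current_id = line[1:].split()[0]
            chunks[current_id] = []
        elif current_id is not None:
            chunks[current_id].append(line.upper())
    return {rid: "".join(parts) for rid, parts in chunks.items()}


query = read_fasta(QUERY_FASTA)["25558.m01330"]
print(len(query))                        # 394
print(sorted(set(query) - STANDARD_AA))  # [] (no ambiguous or non-standard residues)
```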

The output of BLASTP against NR
- Lots of full-length hits
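Whether a hit is "full-length" comes down to how much of the query the alignment spans. The sketch below assumes BLAST+ tabular output (`-outfmt 6`) with its default 12 columns and computes query coverage per HSP. The 80% cutoff in `is_full_length` is an assumption, not a standard, and the sample rows are made up for illustration. A subject with several HSPs appears on several rows; merging them would need overlap handling that this sketch does not attempt.

```python
# Default column order of BLAST+ tabular output (-outfmt 6).
FIELDS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
          "qstart", "qend", "sstart", "send", "evalue", "bitscore"]
INT_FIELDS = {"length", "mismatch", "gapopen", "qstart", "qend", "sstart", "send"}
FLOAT_FIELDS = {"pident", "evalue", "bitscore"}


def parse_tabular(lines):
    """Parse -outfmt 6 rows into dicts; comment and blank lines are skipped."""
    hits = []
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        hit = dict(zip(FIELDS, line.rstrip("\n").split("\t")))
        for key in INT_FIELDS:
            hit[key] = int(hit[key])
        for key in FLOAT_FIELDS:
            hit[key] = float(hit[key])
        hits.append(hit)
    return hits


def query_coverage(hit, query_length):
    """Fraction of the query spanned by one HSP (blastp reports qstart <= qend)."""
    return (hit["qend"] - hit["qstart"] + 1) / query_length


def is_full_length(hit, query_length, min_coverage=0.8):
    """Hypothetical rule: call an HSP 'full-length' at or above an assumed coverage cutoff."""
    return query_coverage(hit, query_length) >= min_coverage


# Made-up rows for illustration only; they are not the actual search output.
SAMPLE = [
    "25558.m01330\tmade_up_hit_1\t100.00\t394\t0\t0\t1\t394\t1\t394\t0.0\t812",
    "25558.m01330\tmade_up_hit_2\t35.20\t101\t60\t2\t50\t150\t40\t140\t1e-05\t48.1",
]
for hit in parse_tabular(SAMPLE):
    print(hit["sseqid"], round(query_coverage(hit, 394), 2), is_full_length(hit, 394))
```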

Top hits
Here is a good illustration of the danger of transitive name proliferation, in which names from earlier annotations are copied into later annotations. In this case, the Aedes aegypti protein sequences recently released by TIGR are already in GenBank, and our query sequence is one of them. Therefore, many of the first hits in our BLAST output belong to this same release, so their names are not independent evidence. The top hit is our sequence aligned to itself.
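One way to avoid this circularity is to drop the self-hit and every same-release hit before reading any names. The sketch below assumes each hit is a dict with an `sseqid` field and that you already have two identifier sets: the IDs under which the query itself may appear, and the IDs of every protein in the release (building that set, for example from the release's FASTA headers, is an assumption here). `drop_circular_hits` and all identifiers in the example are hypothetical.

```python
def drop_circular_hits(hits, query_ids, release_ids):
    """
    Remove the query's own entry and any subject from the same annotation release.

    query_ids:   identifiers under which the query itself may appear (for example
                 the model ID and, hypothetically, its GenBank accession).
    release_ids: identifiers of every protein in the same release (assumed input).
    """
    kept = []
    for hit in hits:
        subject = hit["sseqid"]
        if subject in query_ids:
            continue  # self-hit: the query aligned to itself
        if subject in release_ids:
            continue  # same-release annotation: its name is not independent evidence
        kept.append(hit)
    return kept


# Hypothetical identifiers, for illustration only.
hits = [
    {"sseqid": "query_accession"},
    {"sseqid": "same_release_1"},
    {"sseqid": "other_species_1"},
]
remaining = drop_circular_hits(hits, {"25558.m01330", "query_accession"},
                               {"same_release_1", "same_release_2"})
print([h["sseqid"] for h in remaining])  # ['other_species_1']
```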

Searching for a characterized hit
We go through the hits in ranked order to find the first match to a characterized protein. It is shown on the next slide.
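Walking the ranked list can be partly automated by skipping descriptions that signal "no experimental evidence". The sketch below uses a hypothetical keyword heuristic; the marker list is an assumption, and passing the filter does not prove a protein was studied, so the record and literature behind the chosen hit still need to be checked by hand. Accessions in the example are made up.

```python
# Hypothetical heuristic: description words that usually mean "no experimental evidence".
UNCHARACTERIZED_MARKERS = ("hypothetical", "uncharacterized", "unknown",
                           "predicted", "putative", "similar to")


def looks_characterized(description):
    """True if none of the uncharacterized markers appear in the description."""
    text = description.lower()
    return not any(marker in text for marker in UNCHARACTERIZED_MARKERS)


def first_characterized(hits):
    """Return the first (accession, description) pair, in ranked order, that passes."""
    for accession, description in hits:
        if looks_characterized(description):
            return accession, description
    return None


# Made-up accessions; descriptions are illustrative.
ranked_hits = [
    ("acc_1", "hypothetical protein [Aedes aegypti]"),
    ("acc_2", "conserved hypothetical protein"),
    ("acc_3", "Juvenile hormone-inducible protein 26 [Drosophila melanogaster]"),
]
print(first_characterized(ranked_hits))
```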

First characterized match

Exploring the best characterized match

HMM
- Hit is to PF02958, Domain of unknown function (DUF227)
- Total score:
- Trusted cutoff:
- Noise cutoff:
- Total expect: 7e-74
The alignment looks good…
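The trusted and noise cutoffs give a family-specific way to read the score. In Pfam, the trusted cutoff (TC) is the lowest score among sequences included in the family and the noise cutoff (NC) is the highest score among sequences left out, so scores between them fall in a grey zone. The sketch below is a minimal illustration of that comparison; `classify_pfam_score` is a hypothetical helper, and the numbers are illustrative only. The real values for a family are stored on the `TC` and `NC` lines of its HMM file, which carry separate sequence and domain scores, so compare a total (sequence) score against the sequence value.

```python
def classify_pfam_score(score, trusted_cutoff, noise_cutoff):
    """
    Place an HMM bit score relative to a Pfam family's cutoffs.

    trusted_cutoff (TC): lowest score among sequences included in the family.
    noise_cutoff (NC):   highest score among sequences not included.
    """
    if noise_cutoff > trusted_cutoff:
        raise ValueError("noise cutoff is expected to be <= trusted cutoff")
    if score >= trusted_cutoff:
        return "above trusted cutoff"
    if score > noise_cutoff:
        return "between noise and trusted cutoffs"
    return "at or below noise cutoff"


# Illustrative numbers only; read the real TC/NC values from the family's HMM file.
for score in (250.0, 22.0, 12.0):
    print(score, classify_pfam_score(score, trusted_cutoff=25.0, noise_cutoff=15.0))
```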

HMM details

InterPro

SMART protein signature

SignalP
There is no signal peptide in our sequence.

TargetP  No clear targeting sequence

TMHMM
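TMHMM can report one line per protein in a short format of tab-separated `key=value` fields (`len`, `ExpAA`, `First60`, `PredHel`, `Topology`). The sketch below assumes that TMHMM 2.0 short format and turns a line into a simple call. As we understand the TMHMM documentation, an expected number of residues in transmembrane helices (`ExpAA`) above about 18 suggests a membrane protein or a signal peptide; that threshold is treated here as a rule of thumb. The helper names and the example line are hypothetical, and the values are not this protein's result.

```python
def parse_tmhmm_short(line):
    """Parse one line of TMHMM 2.0 short output into a dict (format assumed)."""
    fields = line.rstrip("\n").split("\t")
    record = {"id": fields[0]}
    for field in fields[1:]:
        key, _, value = field.partition("=")
        record[key] = value
    record["len"] = int(record["len"])
    record["ExpAA"] = float(record["ExpAA"])
    record["First60"] = float(record["First60"])
    record["PredHel"] = int(record["PredHel"])
    return record


def membrane_call(record, expaa_threshold=18.0):
    """Summarize the prediction; the ExpAA threshold is a rule of thumb, not a hard cutoff."""
    if record["PredHel"] > 0:
        return f"{record['PredHel']} predicted TM helices"
    if record["ExpAA"] > expaa_threshold:
        return "no helices called, but ExpAA is high; check for a signal peptide"
    return "no transmembrane helices predicted"


# Made-up example line; the values are illustrative, not this protein's result.
example = "example_protein\tlen=300\tExpAA=0.50\tFirst60=0.00\tPredHel=0\tTopology=o"
rec = parse_tmhmm_short(example)
print(membrane_call(rec))  # no transmembrane helices predicted
```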

Results  Blast:  HMM:  Interpro:  SignalP  TargetP  TmHMM: How would you name this protein? Why?

Discussion  Blast: 42% similarity over 72% of length to a studied protein, Juvenile hormone-inducible protein 26 from Drosophila melanogaster, another Diptera (insect). Its function is not known.  HMM: very strong hit to PF02958, Domain of unknown function (DUF227), a domain found in insects and C. elegans. A member of the Protein kinase superfamily.  Interpro: IPR Protein of unknown function DUF227. This family includes proteins of unknown function. All known members of this group are proteins from drosophila and Caenorhabditis elegans. Caenorhabditis elegans  SMART: SM00587 CHK. ZnF_C4 abd HLH domain containing kinases domain. A subfamily of choline kinases.  SignalP: No signal peptide.  TargetP: No clear targeting sequence.  TmHMM: None. Discussion: The 42% similarity over 72% of the length of this protein is marginal. It is not definitive enough to name the protein “Juvenile hormone-inducible protein” by our standards. However, depending on the standards of your project, you might append the word “putative” to it: “Juvenile hormone-inducible protein, putative”—but a conservative call would be “conserved hypothetical protein.” Another option is to call it protein kinase, putative. Discuss which of these you believe is most supported.