1
Pattern databases in protein analysis
Arthur Gruber
Instituto de Ciências Biomédicas, Universidade de São Paulo
AG-ICB-USP
2
Protein databases
GenPept – a protein sequence database translated from GenBank.
UniProtKB/TrEMBL – a computer-annotated protein sequence database that complements the UniProtKB/Swiss-Prot Protein Knowledgebase.
UniProtKB/Swiss-Prot – a curated protein sequence database that provides a high level of annotation, a minimal level of redundancy and a high level of integration with other databases.
3
How to assign protein functions?
Similar proteins may share common functions, but… proteins that share common domains may have evolved to perform distinct functions.
Proteins that exert similar functions may share common domains, but… domain sequences are not always very similar – more refined methods are required than simple similarity searches.
Proteins may share common domains but have different architectures – no single domain is necessarily responsible for protein function. Many proteins use multiple domains to perform their activities.
4
Some conclusions
Similarity searches may reveal proteins that share very similar sequences and functions – high similarity over the full length of the query sequence.
An output with no significant hits, or with hits only to unannotated proteins, will not reveal the possible function of the query protein.
Similarity searches do not differentiate orthologues from paralogues.
When matching multidomain proteins, it may not be appropriate to transfer the functional annotation – the context is important!
5
So what do proteins with similar function have in common?
6
residues, motifs, domains, architecture…
7
Pattern databases
Databases that contain patterns of residue conservation within groups of related sequences.
There are several methods to determine patterns.
There are many different pattern databases.
9
Common protein pattern databases
PROSITE patterns – regular expressions.
PROSITE profiles – weight matrices (profiles).
Pfam – database of protein domain families. Contains curated multiple sequence alignments for each family and the corresponding HMMs.
PRINTS – database of groups of motifs that, taken together, are more powerful for assigning protein function.
ProDom – automatically generated database built by recursive use of PSI-BLAST similarity searches.
InterPro – an integrated database that combines different protein signature recognition methods in a single resource.
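Since PROSITE patterns are regular expressions written in their own notation, a small translation into Python's `re` syntax may help make the idea concrete. The sketch below is an illustration only, not PROSITE's own scanning software: the helper name `prosite_to_regex` and the test sequence are made up for this example, and only the core notation is handled (`x` for any residue, `[..]` for allowed residues, `{..}` for forbidden residues, `(n)` or `(n,m)` for repeats, `<` and `>` for the sequence ends). The N-glycosylation site pattern `N-{P}-[ST]-{P}` is used as the example.

```python
import re


def prosite_to_regex(pattern: str) -> str:
    """Translate a PROSITE-style pattern into a Python regular expression.

    Hypothetical helper for illustration; covers only the core notation.
    """
    parts = []
    for element in pattern.rstrip(".").split("-"):
        prefix = suffix = ""
        if element.startswith("<"):      # anchored at the N-terminus
            prefix, element = "^", element[1:]
        if element.endswith(">"):        # anchored at the C-terminus
            suffix, element = "$", element[:-1]

        # Split "x(2,4)" into the residue part and the optional repeat counts.
        match = re.fullmatch(r"(.+?)(?:\((\d+)(?:,(\d+))?\))?", element)
        core, low, high = match.groups()

        if core == "x":                  # any residue
            core = "."
        elif core.startswith("{"):       # forbidden residues
            core = "[^" + core[1:-1] + "]"
        # "[ST]" and single letters are already valid regex fragments.

        repeat = ""
        if low is not None:
            repeat = "{" + low + ("," + high if high else "") + "}"
        parts.append(prefix + core + repeat + suffix)
    return "".join(parts)


# N-glycosylation site: Asn, anything but Pro, Ser or Thr, anything but Pro.
glyco = re.compile(prosite_to_regex("N-{P}-[ST]-{P}."))

# Made-up sequence, used only to exercise the pattern.
sequence = "MNGTANPSQNKSA"
hits = [m.start() for m in glyco.finditer(sequence)]
```

Note that `finditer` reports non-overlapping matches only; a scanner that needs overlapping sites would have to wrap the expression in a lookahead.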
12
How to start building a pattern database?
With multiple sequence alignments of functionally related proteins.
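As a rough illustration of why the alignment is the starting point, the sketch below walks down the columns of a toy alignment and reports, for each position, the most common residue and the fraction of sequences that carry it. The four aligned sequences are invented for this example and the function name `column_conservation` is hypothetical; real pattern databases start from much larger, curated alignments.

```python
from collections import Counter

# Invented, gap-free toy alignment (one string per aligned sequence).
alignment = [
    "GDSGGP",
    "GDSGGA",
    "GDSGSP",
    "GDAGGP",
]


def column_conservation(seqs):
    """Return (most common residue, fraction of sequences) for each column."""
    result = []
    for column in zip(*seqs):            # iterate over alignment columns
        residue, count = Counter(column).most_common(1)[0]
        result.append((residue, count / len(seqs)))
    return result


conservation = column_conservation(alignment)
consensus = "".join(residue for residue, _ in conservation)
```

Fully conserved columns (fraction 1.0) are the natural candidates for fixed positions in a pattern, while variable columns become residue sets or wildcards.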
13
Some definitions
Protein motif – a single conserved region.
PROSITE pattern – a consensus expression of a conserved region.
Frequency matrices (PRINTS) – matrices that contain the frequencies with which residues occur at each position of a given motif.
PSSMs – position-specific score (weight) matrices (BLOCKS) – add a scoring scheme to the frequency matrices.
Profile HMMs – probabilistic models derived from alignment profiles.
Protein domain – a part of the protein sequence and structure that can evolve, function and exist independently of the rest of the protein chain.
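The step from a frequency matrix to a PSSM can be sketched in a few lines. The example below is a minimal illustration under stated assumptions, not the scoring scheme used by PRINTS or BLOCKS: the aligned motif is invented, the background is assumed uniform (1/20 per residue), a pseudocount of 1 is added to every residue so that unobserved residues get a finite score, and all function names are hypothetical.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Invented, gap-free aligned motif used only for illustration.
motif = ["GDSGG", "GDSGG", "GDSGA", "GDAGG"]


def frequency_matrix(motif_seqs):
    """One dict per motif position: residue -> observed frequency."""
    n = len(motif_seqs)
    matrix = []
    for column in zip(*motif_seqs):
        counts = Counter(column)
        matrix.append({aa: counts[aa] / n for aa in AMINO_ACIDS})
    return matrix


def pssm_from_motif(motif_seqs, pseudocount=1.0):
    """Log-odds scores (base 2) against an assumed uniform background."""
    n = len(motif_seqs)
    background = 1.0 / len(AMINO_ACIDS)
    total = n + pseudocount * len(AMINO_ACIDS)
    matrix = []
    for column in zip(*motif_seqs):
        counts = Counter(column)
        matrix.append({
            aa: math.log2(((counts[aa] + pseudocount) / total) / background)
            for aa in AMINO_ACIDS
        })
    return matrix


def score_window(pssm, window):
    """Sum the position-specific scores of a window as wide as the motif."""
    return sum(column[aa] for column, aa in zip(pssm, window))


def best_hit(pssm, sequence):
    """Return (score, start) of the best-scoring window in the sequence."""
    width = len(pssm)
    return max(
        (score_window(pssm, sequence[i:i + width]), i)
        for i in range(len(sequence) - width + 1)
    )


freqs = frequency_matrix(motif)
pssm = pssm_from_motif(motif)
score, start = best_hit(pssm, "MKTGDSGGPL")   # made-up query sequence
```

The frequency matrix only records what was observed; the PSSM turns those observations into scores, so that a query window can be ranked even when it contains residues never seen in the motif.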