iProClass Protein Knowledgebase
Integration of protein family, function, and structure
Rich links to >90 databases
Value-added reports for UniProtKB proteins
iProClass Text Search
Select a field, use the add (+) and delete (-) buttons to add or remove input boxes, then click Search.
Search tips:
1- Use "not null" or "null" to search for entries that do or do not contain information in the selected search field, respectively. In the present example, we search for proteins that have enzymatic activity corresponding to EC 18.104.22.168 and that have a 3D structure (PDB ID not null).
2- Use the logical operators AND, OR, and NOT to combine search terms.
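The meaning of "null"/"not null" and of the logical operators can be illustrated with a small in-memory filter. This is a minimal Python sketch, not iProClass code: the record layout, the field names `ac`, `ec`, and `pdb`, the helper names, and the EC values are all hypothetical placeholders chosen for illustration.

```python
from typing import Callable, Optional

# Hypothetical record type: field name -> value (None or "" means "no information").
Record = dict[str, Optional[str]]
Predicate = Callable[[Record], bool]


def not_null(field: str) -> Predicate:
    """Match entries that contain information in `field`."""
    return lambda rec: bool(rec.get(field))


def is_null(field: str) -> Predicate:
    """Match entries that contain no information in `field`."""
    return lambda rec: not rec.get(field)


def equals(field: str, value: str) -> Predicate:
    """Match entries whose `field` equals `value` exactly."""
    return lambda rec: rec.get(field) == value


def all_of(*preds: Predicate) -> Predicate:  # AND
    return lambda rec: all(p(rec) for p in preds)


def any_of(*preds: Predicate) -> Predicate:  # OR
    return lambda rec: any(p(rec) for p in preds)


def negate(pred: Predicate) -> Predicate:  # NOT
    return lambda rec: not pred(rec)


# Hypothetical entries; IDs and values are placeholders, not real iProClass data.
records: list[Record] = [
    {"ac": "ENTRY_A", "ec": "1.1.1.1", "pdb": "1ABC"},
    {"ac": "ENTRY_B", "ec": "1.1.1.1", "pdb": None},
    {"ac": "ENTRY_C", "ec": "2.7.11.1", "pdb": "2XYZ"},
]

# Equivalent of "EC = 1.1.1.1 AND PDB ID not null"
query = all_of(equals("ec", "1.1.1.1"), not_null("pdb"))
hits = [r["ac"] for r in records if query(r)]
print(hits)  # ['ENTRY_A']
```

Composing small predicates keeps each search box independent, mirroring how the web form combines one condition per input box.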
iProClass Text Search Result (I)
Things you can do from the result table:
1. Add search terms or start over
2. Customize the table columns
3. Save your results in table or FASTA format
4. Select entries using check boxes and perform analysis using tool bar options
5. Follow links to protein records, protein names (BioThesaurus), and protein families (PIRSF)
iProClass Text Search Result (II)
2. How to customize the table columns, e.g., to display the PDB ID column:
a- Select PDB ID in the "Fields not in display" box.
b- Use the > button to move it into the "Fields in display" box.
c- PDB ID should now appear in the "Fields in display" box. Press Apply for the changes to take effect.
iProClass Text Search Result (III)
3. Save your results in table or FASTA format
a- Select entries using the check boxes in the Protein AC/ID column. To select all entries, check the box in the column heading.
b- Click "Save Result As: Table" to store the information in the result table. This file can be opened in Excel.
c- Click "FASTA" to save the protein sequences.
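A saved FASTA file can be read back for further analysis with a few lines of code. The Python sketch below is a generic FASTA reader, not a PIR tool; it assumes only the standard FASTA layout (a ">" definition line followed by one or more sequence lines), and the sample content and headers are hypothetical.

```python
import io
from typing import Iterator, Optional, TextIO


def read_fasta(handle: TextIO) -> Iterator[tuple[str, str]]:
    """Yield (header, sequence) pairs; the header is the definition line without '>'."""
    header: Optional[str] = None
    chunks: list[str] = []
    for raw in handle:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)  # emit the previous record
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)  # sequences may be wrapped over several lines
    if header is not None:
        yield header, "".join(chunks)  # emit the last record


# Hypothetical file contents standing in for a saved FASTA download.
sample = ">entry1 example protein\nMNDRADFVVP\nDITTRKNVGL\n>entry2\nAEDHATWATL\n"
records = list(read_fasta(io.StringIO(sample)))
for header, seq in records:
    print(header, len(seq))
```

For a real download, pass an open file instead, e.g. `with open("results.fasta") as fh: records = list(read_fasta(fh))`, where the file name is a placeholder.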
iProClass Text Search Result (IV)
4. Select entries using check boxes and perform analysis using tool bar options
a- Select entries using the check boxes in the Protein AC/ID column (to select all, check the box in the column heading), then select a tool, e.g., Domain Display.
Domain Display shows the Pfam domains present in the selected proteins.
iProClass Text Search Result (V)
5. Links to protein records, protein names (BioThesaurus), and protein families (PIRSF)
The result table provides links to protein reports, protein names, taxonomy, PIRSF reports, and pre-computed BLAST results.
iProClass Protein Report (I)
Rich links and extensive cross-references: the report shows ID correspondence to other databases, links to pre-computed BLAST results, and lets you see protein synonyms.
iProClass Protein Report (II)
Integrated value-added information from other databases
iProClass Protein Report (III)
Links to different protein family classification databases
Interactive domain and sequence display
iProClass Text Search Result (VI)
See protein synonyms and their source attribution.
iProClass Text Search Result (VII)
Related Sequences (pre-computed BLAST) show proteins similar to the query. Because the results are pre-computed, they are retrieved much faster than running BLAST in real time. A low number of related sequences may also indicate a tight protein cluster.
Batch Retrieval in iProClass
Due to the diversity of databases and the lack of consistency in protein/gene names and identifiers in the literature, it can be difficult to retrieve multiple entries when protein and gene identifiers come from different sources. The batch retrieval tool addresses this problem and is highly flexible: it retrieves multiple entries from the iProClass database using a specific identifier type or a combination of types. If possible, specify the type of ID you enter (example input: 3979833 304131 24660393).
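Why specifying the ID type helps: some identifiers, such as UniProtKB accessions, have a recognizable format, but a bare number like 3979833 does not reveal which database it comes from. The Python sketch below is a hypothetical helper, not part of the batch retrieval tool; it uses the UniProtKB accession format published by UniProt and simply flags all-digit strings as ambiguous.

```python
import re

# UniProtKB accession format as documented by UniProt (6 or 10 characters).
UNIPROT_AC = re.compile(
    r"[OPQ][0-9][A-Z0-9]{3}[0-9]|[A-NR-Z][0-9]([A-Z][A-Z0-9]{2}[0-9]){1,2}"
)


def guess_id_type(identifier: str) -> str:
    """Rough, illustrative guess at what kind of identifier a string is."""
    ident = identifier.strip().upper()
    if UNIPROT_AC.fullmatch(ident):
        return "UniProtKB accession"
    if ident.isdigit():
        # A bare integer could be, e.g., an NCBI GI number or an Entrez Gene ID;
        # the string alone cannot tell them apart.
        return "ambiguous numeric ID"
    return "unknown"


for ident in ["P04176", "3979833", "304131", "24660393"]:
    print(ident, "->", guess_id_type(ident))
```

The numeric example IDs all come back as ambiguous, which is exactly the case where telling the tool the ID type avoids a wrong or empty lookup.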
Batch Retrieval Result Page
From the result page you can choose the columns to be displayed, retrieve more sequences, and follow links to iProClass and UniProtKB reports.
Search a Pattern in iProClass
A pattern is a formula (regular expression) that represents the conserved region of a group of related proteins. PROSITE is a database that contains patterns and profiles specific for more than a thousand protein families or domains. Pattern search at PIR allows:
1- The search of a specific PROSITE pattern (enter its PROSITE ID) or a user-defined pattern (e.g., P-D-x(2)-H-[DE]-[LIVF]-[LIVMFY]-G-H-[LIVMC]-[PA]) against one of the following sequence databases:
(i) UniProtKB, the central hub for the collection of functional information on proteins, with accurate, consistent, and rich annotation. It consists of two sections: one containing manually annotated records (UniProtKB/Swiss-Prot) and one containing computationally analyzed records that await full manual annotation (UniProtKB/TrEMBL).
(ii) A subset of UniProtKB entries belonging to a certain organism or taxon group.
(iii) UniRef100, which provides clustered sets of sequences at 100% identity from UniProtKB (including splice variants and isoforms) and selected UniParc records.
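Because a PROSITE pattern is essentially a regular expression, its notation can be translated mechanically. The Python sketch below is an illustration, not PIR's implementation; it follows the PROSITE pattern syntax (x for any residue, [..] for allowed residues, {..} for excluded residues, (n) or (n,m) for repeats, < and > for the N- and C-terminus) and reports 1-based sequence ranges. Reporting only non-overlapping matches is an assumption of this sketch, and the test sequence is made up.

```python
import re

# One PROSITE element: x, a residue, [allowed], or {forbidden}, with optional (n) or (n,m).
_ELEMENT = re.compile(r"(x|[A-Z]|\[[A-Z]+\]|\{[A-Z]+\})(?:\((\d+)(?:,(\d+))?\))?")


def prosite_to_regex(pattern: str) -> str:
    """Translate a PROSITE pattern into a Python regular expression.

    Handles x, [..], {..}, repeat counts, and the < / > terminal anchors.
    Not handled in this sketch: '>' inside brackets, e.g. [G>].
    """
    parts = []
    for element in pattern.strip().rstrip(".").split("-"):
        prefix = suffix = ""
        if element.startswith("<"):
            prefix, element = "^", element[1:]  # N-terminal anchor
        if element.endswith(">"):
            suffix, element = "$", element[:-1]  # C-terminal anchor
        m = _ELEMENT.fullmatch(element)
        if m is None:
            raise ValueError(f"unsupported pattern element: {element!r}")
        core, low, high = m.groups()
        if core == "x":
            token = "."  # any residue
        elif core.startswith("{"):
            token = "[^" + core[1:-1] + "]"  # any residue except those listed
        else:
            token = core  # single residue or [allowed residues]
        if low:
            token += "{" + low + ("," + high if high else "") + "}"
        parts.append(prefix + token + suffix)
    return "".join(parts)


def find_pattern(pattern: str, sequence: str) -> list[tuple[int, int, str]]:
    """Return (start, end, match) with 1-based inclusive positions, non-overlapping."""
    regex = re.compile(prosite_to_regex(pattern))
    return [(m.start() + 1, m.end(), m.group()) for m in regex.finditer(sequence.upper())]


pattern = "P-D-x(2)-H-[DE]-[LIVF]-[LIVMFY]-G-H-[LIVMC]-[PA]"
print(prosite_to_regex(pattern))  # PD.{2}H[DE][LIVF][LIVMFY]GH[LIVMC][PA]
print(find_pattern(pattern, "AAPDKKHELLGHLPAA"))  # [(3, 14, 'PDKKHELLGHLP')]
```

A production scanner would also need the bracketed terminal form and a decision about overlapping hits; both are left out here to keep the translation rules visible.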
Search a Pattern Result in iProClass
The result page displays the query pattern and the sequence range where the pattern is found.
Search a Pattern in iProClass
Pattern search at PIR allows:
2- The search of PROSITE patterns (note that profiles are excluded) in a query sequence, entered either as a sequence in single-letter amino acid code or as its unique ID. Example sequence: MNDRADFVVPDITTRKNVGLSHDANDFTLPQPLDRYSAEDHATWATLYQRQCKLLPGRACDEFMEGLERLEVD. A link to the PROSITE documentation is provided.
Protein ID Mapping Service
The service maps identifiers between UniProtKB and more than 30 other data sources. It supports interoperability among disparate data sources and allows data from heterogeneous molecular biology databases to be integrated and queried. IDs can be entered directly or loaded from a file containing a list of IDs.
Protein ID Mapping Service
Example: we want to obtain a list of Entrez Gene IDs for a group of UniProtKB proteins (P04176 P16331 P00439 P17276).
1- Enter the IDs.
2- Select the ID type for the source database.
3- Select the ID type for the target database, then run the mapping.
The resulting IDs can be cut and pasted if needed, or saved as a text file using the "save as" option provided by your web browser.
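Once the mapped IDs are saved as a text file, they can be post-processed locally. The Python sketch below assumes a hypothetical two-column, tab-separated layout (source ID, target ID, one pair per line); the file actually produced by the service may differ, so check it before relying on this. It keeps every target for a source ID, since one UniProtKB entry can map to several Entrez Gene IDs, and lists query IDs that received no mapping. All IDs in the example are placeholders, not real mappings.

```python
from collections import defaultdict
from typing import Iterable


def parse_mapping(lines: Iterable[str]) -> dict[str, list[str]]:
    """Group target IDs by source ID from hypothetical 'source<TAB>target' lines.

    One source ID may map to several target IDs, so values are lists.
    Blank lines and lines without two non-empty columns are skipped.
    """
    mapping: defaultdict[str, list[str]] = defaultdict(list)
    for line in lines:
        parts = [p.strip() for p in line.split("\t")]
        if len(parts) < 2 or not parts[0] or not parts[1]:
            continue
        mapping[parts[0]].append(parts[1])
    return dict(mapping)


def unmapped(query_ids: Iterable[str], mapping: dict[str, list[str]]) -> list[str]:
    """Source IDs from the query that received no target ID."""
    return [i for i in query_ids if i not in mapping]


# Hypothetical saved output with placeholder IDs.
saved = ["ACC_1\tGENE_1", "ACC_1\tGENE_2", "ACC_2\tGENE_3", ""]
result = parse_mapping(saved)
missing = unmapped(["ACC_1", "ACC_2", "ACC_3"], result)
print(result)
print(missing)
```

Checking for unmapped IDs matters because mapping coverage between databases is rarely complete.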
iProClass Protein Knowledgebase
Cite iProClass: Wu CH, Huang H, Nikolskaya A, Hu Z, Yeh LS, Barker WC. The iProClass integrated database for protein functional analysis. Computational Biology and Chemistry, 28: 87-96, 2004.
iProClass Distribution: iProClass is freely available for academic institutions. Vendors and commercial entities who want to use and/or redistribute iProClass need to contact PIR to request a license.