Presentation is loading. Please wait.

Presentation is loading. Please wait.

Protein databases Henrik Nielsen

Similar presentations


Presentation on theme: "Protein databases Henrik Nielsen"— Presentation transcript:

1 Protein databases Henrik Nielsen

2 Protein databases, historical background
Swiss-Prot, Established in 1986 in Switzerland ExPASy (Expert Protein Analysis System) Swiss Institute of Bioinformatics (SIB) and European Bioinformatics Institute (EBI) PIR, Established in 1984 National Biomedical Research Foundation, Georgetown University, USA In 2002 merged into: UniProt, A collaboration between SIB, EBI and Georgetown University.

3 UniProt UniProt Knowledgebase (UniProtKB)
UniProt Reference Clusters (UniRef) UniProt Archive (UniParc) UniProt Knowledgebase Release 2016_01 (20-Jan-16) consists of: UniProtKB/Swiss-Prot: Annotated manually (curated) 550,299 entries UniProtKB/TrEMBL: Computer annotated 59,718,159 entries

4 Types of databases GenBank / EMBL / DDBJ: Swiss-Prot: TrEMBL:
Entries created & maintained by individual contributors No check for redundancy Swiss-Prot: Entries created & maintained by staff Better standards compliance TrEMBL: Entries created by automatic translation of EMBL sequences & annotations

5 Growth of UniProt TrEMBL Swiss-Prot

6 Content of UniProt Knowledgebase
Amino acid sequences Functional and structural annotations Function / activity Secondary structure Subcellular location Mutations, phenotypes Post-translational modifications Origin organism: Species, subspecies; classification tissue References Cross references

7 Amino acid sequences From where do you get amino acid sequences? Translation of nucleotide sequences (GenBank/EMBL/DDBJ) Direct amino acid sequencing: Edman degradation Mass spectrometry 3D-structures

8 Content of UniProt Knowledgebase
Amino acid sequences Functional and structural annotations Function / activity Secondary structure Subcellular location Mutations, phenotypes Post-translational modifications Origin organism: Species, subspecies; classification tissue References Cross references

9 Protein structure Primary structure: Amino acid sequence
Secondary structure: ”Backbone” hydrogen bonding Alpha helix / Beta sheet / Turn Tertiary structure: Fold, 3D coordinates Quaternary structure: subunits

10 Content of UniProt Knowledgebase
Amino acid sequences Functional and structural annotations Function / activity Secondary structure Subcellular location Mutations, phenotypes Post-translational modifications Origin organism: Species, subspecies; classification tissue References Cross references

11 Subcellular location / protein sorting
Der er sorteringssignaler I animosyresekvensen. Vi skal se nærmere på det bedst kendte og vigtigste. Various proteins belong to different compartments of the cell – some even belong outside the cell.

12 Content of UniProt Knowledgebase
Amino acid sequences Functional and structural annotations Function / activity Secondary structure Subcellular location Mutations, phenotypes Post-translational modifications Origin organism: Species, subspecies; classification tissue References Cross references

13 Post-translational modifications
Many proteins are modified after they have been synthesized in order to become functional. Proteolysis: Cleavage of signal peptides, propeptides or initiator methionine. Glycosylation: Especially common on the cell surface. Plays a role in sorting of proteins to lysosomes. Phosphorylation: Often reversible. Regulates the activity of many enzymes. Der er også f.eks. Acetylering af histoner…

14 More post-translational modifications
Lipid anchors (e.g. GPI anchors) Disulfide bonds Prosthetic groups (e.g. metal ions)

15 UniProt entry, formatted view
Entry name (ID) Accession #

16 Entry names and accession numbers
Entry name (UniProt ID / GenBank LOCUS) Provides a mnemonic identifier for a database entry. One and only one name per entry. Accession # Provides a stable identifier for a database entry (does not change across database versions). One or more accession numbers per entry.

17 UniProt entry, formatted view

18 UniProt entry, text view (flat file)

19 UniProt entry, formatted view

20 Entry information, formatted view

21 UniProt entry, text view (flat file)

22 UniProt entry, formatted view

23 Names & Taxonomy, formatted view

24 Comments (CC lines)

25 Comments (CC lines), continued

26 Feature table (FT lines)

27 Gene Ontology (GO)

28 Secondary structure (Feature Table)

29 Evidence (Comments, Feature Table)
Experimental: Predicted: By similarity:

30 Evidence types in UniProt
Used in Swiss-Prot Used in TrEMBL See also

31 UniProt entry, sequence(s)

32 Cross-references, nucleotide sequences

33 Cross-references, 3D structure

34 Cross-references Other databases linked from UniProt
(there are ~100 in total): Nucleotide sequences 3D structure Protein-protein interactions Enzymatic activities and pathways Gene expression (microarrays and 2D-PAGE) Ontologies Families and domains Organism specific databases


Download ppt "Protein databases Henrik Nielsen"

Similar presentations


Ads by Google