http://www.pdb.org/
Experimental approaches for structural biology
- X-ray crystallography
- NMR
- cryoEM
cryoEM
Where to get structural data?
- biological molecules (free)
  - PDB – Protein Data Bank, http://www.pdb.org
  - NDB – Nucleic Acid Database, http://ndbserver.rutgers.edu/
- organic molecules (paid)
  - CSD – Cambridge Structural Database
PDB History
- 1957 – myoglobin structure determined
- 1970s – discussions on how to establish an archive of protein structures; PDB established at Brookhaven in October 1971 with 7 structures
- 1980s – technology takes off (molecular biology, instrumentation, computer hardware and software); the number of structures increases; structural biology is able to focus on medical problems; IUCr requires data deposition to the PDB
- 1990s – complexity of structures increases; structural genomics begins
Current state of the PDB
23. 11. 2014 – 105 025 structures in the PDB archive; 8 550 new structures deposited in 2014 so far
Depositions by macromolecule type:
- 92.6 % proteins (97 089 structures)
- 2.8 % nucleic acids (2 769 structures)
- 4.5 % protein–nucleic acid complexes (5 143 structures)
Depositions by experimental technique:
- 88.0 % X-ray diffraction (93 200 structures)
- 11.2 % solution NMR (10 705 structures)
- 0.5 % cryo-EM (864 structures)
Data from http://www.pdb.org/pdb/statistics/holdings.do
http://www.pdb.org/pdb/static.do?p=general_information/pdb_statistics/index.html (data as of 26. 11. 2012)
Growth of Released Structures Per Year (from http://www.pdb.org/pdb/static.do?p=general_information/pdb_statistics/index.html)
PDB ID
Each structure in the PDB is represented by a 4-character identifier of the form [1-9][a-z0-9][a-z0-9][a-z0-9] (case-insensitive), e.g. 1B3T
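Whether a string has the shape of a PDB ID can be checked with a regular expression. This is a minimal sketch, assuming the wwPDB convention that the first character is a digit 1-9 and treating IDs case-insensitively (the archive displays them in upper case, e.g. 1B3T). The alphanumeric class is written `[a-z0-9]` without separators, since a comma inside a character class would match a literal comma.

```python
import re

# Assumption: first character 1-9, then three alphanumerics; case-insensitive.
PDB_ID_RE = re.compile(r"[1-9][a-z0-9]{3}", re.IGNORECASE)


def is_pdb_id(s: str) -> bool:
    """Return True if s has the shape of a classic 4-character PDB ID."""
    # fullmatch rejects strings that are too short or too long.
    return PDB_ID_RE.fullmatch(s) is not None


for candidate in ["1B3T", "1b3t", "0abc", "1B3", "1B3T5", "AB3T"]:
    print(candidate, is_pdb_id(candidate))
```

The check is purely syntactic: a string can match the pattern without any entry with that ID existing in the archive.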
Data formats of PDB
- PDB format, mmCIF (and the derived XML format PDBML)
- dictionary resources at: http://mmcif.pdb.org/
- mmCIF is the PDB archival format
- all data are released in all three formats
Friday, April 14, 2017; November 8, 1999; Protein Data Bank, Doolittle Building, Rutgers University; from Genes to Drugs via PDB
PDB format – legacy format
http://www.wwpdb.org/docs.html
- Fortran-like, 80 columns wide
- not structured enough to describe complicated 3D objects
- its limits have been exceeded several times: 99,999 atoms, 34 (or 58) chains
- readable by most programs
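Because the PDB format is column-based, every field of an ATOM record sits at fixed columns and is read by slicing, not by splitting on whitespace. A minimal sketch, assuming the standard ATOM/HETATM column layout from the wwPDB format documentation; the sample record and its values are hypothetical. It also shows where the atom limit comes from: the serial-number field is only five columns wide.

```python
# Hypothetical ATOM record, assembled piece by piece so each fixed column range
# (1-based, as in the format documentation) is visible.
SAMPLE = (
    "ATOM  "      # 1-6   record name
    "    1"       # 7-11  atom serial number
    " "           # 12    blank
    " N  "        # 13-16 atom name
    " "           # 17    alternate location indicator
    "MET"         # 18-20 residue name
    " "           # 21    blank
    "A"           # 22    chain identifier (a single character)
    "   1"        # 23-26 residue sequence number
    " "           # 27    insertion code
    "   "         # 28-30 blank
    "  38.198"    # 31-38 x coordinate
    "  19.582"    # 39-46 y coordinate
    "  28.091"    # 47-54 z coordinate
    "  1.00"      # 55-60 occupancy
    " 41.48"      # 61-66 temperature factor
    "          "  # 67-76 blank
    " N"          # 77-78 element symbol
)

# Five columns for the serial number allow at most 99,999 atoms.
MAX_SERIAL = int("9" * 5)


def parse_atom(line: str) -> dict:
    """Read an ATOM/HETATM record by slicing its fixed columns (0-based slices)."""
    return {
        "record": line[0:6].strip(),
        "serial": int(line[6:11]),
        "name": line[12:16].strip(),
        "resName": line[17:20].strip(),
        "chainID": line[21],
        "resSeq": int(line[22:26]),
        "x": float(line[30:38]),
        "y": float(line[38:46]),
        "z": float(line[46:54]),
        "occupancy": float(line[54:60]),
        "tempFactor": float(line[60:66]),
        "element": line[76:78].strip(),
    }


atom = parse_atom(SAMPLE)
print(atom["resName"], atom["chainID"], atom["resSeq"], atom["name"], atom["x"], atom["y"], atom["z"])
```

The single-column chain identifier is a similar hard limit, which is one reason large assemblies outgrew the format.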
model – chain – residue – atom
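The model – chain – residue – atom hierarchy can be illustrated by nesting flat atom records: atoms belong to residues, residues to chains, chains to models (several models typically appear in NMR ensembles). A minimal sketch; the atom tuples are hypothetical stand-ins for parsed ATOM records, and a residue is keyed by its number and name.

```python
from collections import defaultdict

# Hypothetical flat records: (model, chain, residue number, residue name, atom name)
atoms = [
    (1, "A", 1, "MET", "N"),
    (1, "A", 1, "MET", "CA"),
    (1, "A", 2, "GLY", "N"),
    (1, "B", 1, "ALA", "N"),
    (2, "A", 1, "MET", "N"),
]


def build_hierarchy(records):
    """Nest flat atom records as model -> chain -> residue -> [atom names]."""
    tree = defaultdict(lambda: defaultdict(lambda: defaultdict(list)))
    for model, chain, res_seq, res_name, atom_name in records:
        tree[model][chain][(res_seq, res_name)].append(atom_name)
    return tree


tree = build_hierarchy(atoms)
for model, chains in tree.items():
    for chain, residues in chains.items():
        for (res_seq, res_name), names in residues.items():
            print(model, chain, res_seq, res_name, names)
```

For real files, Biopython's `Bio.PDB` module exposes the same levels (Structure, Model, Chain, Residue, Atom).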
mmCIF
- a language based on community-agreed definitions
- allows adding new features and customization
- mmCIF categories are easily transformed into database tables
- not designed to be read by humans; data should be viewed through programs and databases
http://ich.vscht.cz/~cechp/mmcif/
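To illustrate how an mmCIF category maps onto a database table, the sketch reads one `loop_` of the `_atom_site` category and loads it into an in-memory SQLite table: the category becomes the table and each data item becomes a column. It is a simplification, assuming unquoted, whitespace-free values with one row per line; real mmCIF files also use quoted strings and multi-line values, so a dedicated mmCIF parser is the practical choice. The values are hypothetical.

```python
import sqlite3

# Hypothetical excerpt of the _atom_site category (values are illustrative only).
MMCIF_LOOP = """\
loop_
_atom_site.group_PDB
_atom_site.id
_atom_site.label_atom_id
_atom_site.label_comp_id
_atom_site.Cartn_x
ATOM 1 N  MET 38.198
ATOM 2 CA MET 38.931
"""


def parse_loop(text):
    """Return (category, item names, rows) for one simple loop_ block.

    Simplification: every value is an unquoted token and each row fits on one line.
    """
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    if lines[0] != "loop_":
        raise ValueError("expected a loop_ block")
    items = [ln for ln in lines[1:] if ln.startswith("_")]
    category = items[0].split(".", 1)[0].lstrip("_")
    columns = [item.split(".", 1)[1] for item in items]
    rows = [ln.split() for ln in lines[1 + len(items):]]
    return category, columns, rows


category, columns, rows = parse_loop(MMCIF_LOOP)

# One category -> one table; one data item -> one column. The names come from the
# trusted sample above, so they are interpolated directly into the SQL.
con = sqlite3.connect(":memory:")
con.execute(f"CREATE TABLE {category} ({', '.join(columns)})")
placeholders = ", ".join("?" for _ in columns)
con.executemany(f"INSERT INTO {category} VALUES ({placeholders})", rows)
print(con.execute(f"SELECT label_atom_id, Cartn_x FROM {category} ORDER BY rowid").fetchall())
```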
PubMed, MEDLINE, Entrez, etc.
http://www.pubmed.gov http://www.pubmed.org
NCBI
National Institutes of Health (NIH) – U.S. government
- National Library of Medicine (NLM)
  - National Center for Biotechnology Information (NCBI)
NCBI (founded 1988, http://www.ncbi.nlm.nih.gov/)
- Genomic sequences – GenBank: open-access, annotated collection of all publicly available nucleotide sequences; doubles roughly every 18 months (October 2008: 97 381 682 336 bp); a new release every 2 months; an accession number (e.g. U49845) is required upon publication
- OMIM – Online Mendelian Inheritance in Man: database of diseases together with their genetic components
- PubChem (http://pubchem.ncbi.nlm.nih.gov/) – database of small organic molecules, including information about their bioactivities
- Entrez (http://www.ncbi.nlm.nih.gov/sites/gquery) – federated search engine offering unified access to all NCBI databases ("federated" means that it searches multiple databases at once)
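Entrez can also be queried programmatically through NCBI's E-utilities: ESearch returns the IDs of records matching a query, and EFetch retrieves records. The sketch only builds request URLs, without sending them, so it runs offline; the database list and the search term are illustrative choices, the loop merely imitates a federated search, and real clients should follow NCBI's E-utilities usage guidelines (e.g. request-rate limits).

```python
from urllib.parse import urlencode

# Base URL of NCBI's E-utilities, the programmatic interface to Entrez.
EUTILS = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/"


def esearch_url(db: str, term: str, retmax: int = 20) -> str:
    """URL of an ESearch query returning IDs of records in `db` that match `term`."""
    return EUTILS + "esearch.fcgi?" + urlencode({"db": db, "term": term, "retmax": retmax})


def efetch_url(db: str, accession: str, rettype: str = "gb") -> str:
    """URL of an EFetch request for one record, e.g. a GenBank flat file."""
    return EUTILS + "efetch.fcgi?" + urlencode(
        {"db": db, "id": accession, "rettype": rettype, "retmode": "text"}
    )


# The same query sent to several Entrez databases, imitating a federated search.
for db in ["pubmed", "nucleotide", "omim", "pccompound"]:
    print(esearch_url(db, "myoglobin"))

# The GenBank record with the accession number quoted on the slide.
print(efetch_url("nuccore", "U49845"))
```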
MEDLINE
- journal citations and abstracts for the biomedical literature
- since 1996, free access to MEDLINE via PubMed
PubMed
- Web-based retrieval system developed by the NCBI at the NLM; part of NCBI's Entrez
- PubMed contains abstracts, links to full-text articles, links to other databases …and much more
What’s in PubMed
- most PubMed records are MEDLINE citations: citations and author abstracts from approx. 5 200 biomedical journals
- diverse topics: microbiology, delivery of health care, nutrition, pharmacology and environmental health
- currently over 19 million references dating back to 1948
- new material added Tuesday through Saturday
- about 90% of records are from English-language sources or have English abstracts
- approximately 79% of the citations include the published abstract
What’s in PubMed: PubMed Central (PMC), NCBI Bookshelf
PubMed Central (PMC), http://www.pubmedcentral.nih.gov/
- database of free full texts
- since 2007, papers funded by the NIH must be freely available through PMC no later than 12 months after publication
NCBI Bookshelf, http://www.ncbi.nlm.nih.gov/sites/entrez?db=books
- free biomedical books (biochemistry, molecular biology, …)