The .pdb file format, and other resources for structural information
Topic 5
Chapter 10 & 11, Gu and Bourne “Structural Bioinformatics”
PDB: Protein Data Bank
Experimental Methods: X-Ray and NMR
X-ray crystallography
-- Needs crystals
-- No size limit (in principle)
-- Based on the scattering of X-rays by the electron cloud of the atoms
-- Quality metrics = resolution and R-factor
Nuclear Magnetic Resonance spectroscopy (NMR)
-- Solution-based
-- Typical size limitation (< 50 kDa)
-- Produces multiple models
-- Not just for determining structure (also dynamics)
-- “Resolution”: root-mean-square-deviation (RMSD)
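The RMSD quoted for NMR ensembles can be illustrated with a minimal sketch. It assumes the two models list the same atoms in the same order and are already superimposed; real workflows usually perform an optimal superposition first, which is not shown here. The function name `rmsd` and the coordinates are illustrative only.

```python
import math

def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two equally long lists of (x, y, z)."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    total = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        # Squared distance between matching atoms in the two models
        total += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(total / len(coords_a))

# Made-up coordinates for two models of a two-atom fragment
model_1 = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
model_2 = [(0.0, 0.0, 0.0), (1.0, 0.0, 2.0)]
print(rmsd(model_1, model_2))
```

A low RMSD across the models of an ensemble indicates that the models agree with one another; it is not by itself a measure of accuracy.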
Important experimental quantities from X-ray
Quality: Resolution (in Å) and R-factor (values = 0 to 1).
Atom coordinates: Define the mean coordinates of the (heavy) atoms.
B-factors (aka temperature factors): Describe the apparent disorder about the mean. Disorder is spatial (crystal heterogeneity) and temporal (protein flexibility). In reality, however, B-factors in protein crystallography are NOT pure Debye-Waller factors (mobilities). Instead, B-factors are most often best characterized as “fudge factors” used to fit the electron density maps.
Occupancies: Occasionally, a better fit to the electron density can be obtained by assuming that certain atoms can be in more than one location, due to alternate conformations.
The .pdb file
HEADER OXIDOREDUCTASE 21-JUL-93 1SPD
TITLE AMYOTROPHIC LATERAL SCLEROSIS AND STRUCTURAL DEFECTS IN CU,ZN
TITLE 2 SUPEROXIDE DISMUTASE
COMPND MOL_ID: 1;
COMPND 2 MOLECULE: SUPEROXIDE DISMUTASE;
COMPND 3 CHAIN: A, B;
COMPND 4 EC: ;
COMPND 5 ENGINEERED: YES
SOURCE MOL_ID: 1;
SOURCE 2 ORGANISM_SCIENTIFIC: HOMO SAPIENS;
SOURCE 3 ORGANISM_COMMON: HUMAN;
SOURCE 4 ORGANISM_TAXID: 9606
KEYWDS OXIDOREDUCTASE, SUPEROXIDE ACCEPTOR
EXPDTA X-RAY DIFFRACTION
AUTHOR H.E.PARGE,J.A.TAINER
REVDAT 4 29-FEB-12 1SPD 1 JRNL VERSN
REVDAT 3 24-FEB-09 1SPD 1 VERSN
REVDAT 2 01-APR-03 1SPD 1 JRNL
REVDAT 1 30-APR-94 1SPD 0
JRNL AUTH H.X.DENG,A.HENTATI,J.A.TAINER,Z.IQBAL,A.CAYABYAB,W.Y.HUNG,
JRNL AUTH 2 E.D.GETZOFF,P.HU,B.HERZFELDT,R.P.ROOS,C.WARNER,G.DENG,
JRNL AUTH 3 E.SORIANO,C.SMYTH,H.E.PARGE,A.AHMED,A.D.ROSES,R.A.HALLEWELL,
JRNL AUTH 4 M.A.PERICAK-VANCE,T.SIDDIQUE
JRNL TITL AMYOTROPHIC LATERAL SCLEROSIS AND STRUCTURAL DEFECTS IN
JRNL TITL 2 CU,ZN SUPEROXIDE DISMUTASE.
JRNL REF SCIENCE V
JRNL REFN ISSN
JRNL PMID
REMARK 2
REMARK 2 RESOLUTION ANGSTROMS.
REMARK 3
REMARK 3 REFINEMENT.
REMARK 3 PROGRAM : PROLSQ, X-PLOR
REMARK 3 AUTHORS : KONNERT,HENDRICKSON,BRUNGER
REMARK 3
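Every line of a .pdb file starts with a record name in its first six columns (HEADER, TITLE, REMARK, ATOM, and so on), which makes it easy to sort lines by type. The sketch below shows one way to do this; the helper names `record_name` and `count_records` are illustrative, and the exact column spacing of the sample lines is an assumption, since only the record contents come from the 1SPD header above.

```python
from collections import Counter

def record_name(line):
    """Return the record name stored in columns 1-6 of a PDB line."""
    return line[:6].strip()

def count_records(lines):
    """Count how many lines of each record type are present."""
    return Counter(record_name(line) for line in lines if line.strip())

# Sample lines taken from the 1SPD header (spacing is assumed)
sample_lines = [
    "TITLE     AMYOTROPHIC LATERAL SCLEROSIS AND STRUCTURAL DEFECTS IN CU,ZN",
    "TITLE    2 SUPEROXIDE DISMUTASE",
    "KEYWDS    OXIDOREDUCTASE, SUPEROXIDE ACCEPTOR",
    "EXPDTA    X-RAY DIFFRACTION",
    "AUTHOR    H.E.PARGE,J.A.TAINER",
]
counts = count_records(sample_lines)
print(counts)
```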
Resolution
From Wikipedia: Resolution in terms of electron density is a measure of the resolvability in the electron density map of a molecule. In X-ray crystallography, resolution is the highest resolvable peak in the diffraction pattern.
Resolution in practice…
Resolution Histogram
R-Factor
The R-factor (aka residual factor or agreement factor) is a measure of the difference between the observed and computed intensities. Note that the structure factor F is related to intensities from the diffraction pattern.

$$R = \frac{\sum \bigl|\, |F_{\mathrm{obs}}| - |F_{\mathrm{calc}}| \,\bigr|}{\sum |F_{\mathrm{obs}}|}$$

The sums run over the measured reflections. A similar quality criterion is R-free, which is calculated from a subset (~10%) of reflections that were not included in the structure refinement.

R values:
0.6: Very bad
0.5: Bad
0.4: Recoverable
0.2: Good for protein
0.05: Good for small organic models
0.0: Perfect
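As a minimal sketch of the R-factor definition, the function below sums the absolute differences between observed and calculated structure-factor amplitudes and divides by the sum of the observed amplitudes. The amplitudes are made-up numbers, and the scale factor normally applied between observed and calculated data is assumed to have been applied already.

```python
def r_factor(f_obs, f_calc):
    """R = sum(| |Fobs| - |Fcalc| |) / sum(|Fobs|) over paired reflections."""
    if len(f_obs) != len(f_calc) or not f_obs:
        raise ValueError("need two non-empty lists of equal length")
    numerator = sum(abs(abs(o) - abs(c)) for o, c in zip(f_obs, f_calc))
    denominator = sum(abs(o) for o in f_obs)
    if denominator == 0:
        raise ValueError("observed amplitudes sum to zero")
    return numerator / denominator

# Made-up amplitudes for three reflections
f_obs = [10.0, 20.0, 30.0]
f_calc = [9.0, 22.0, 30.0]
print(r_factor(f_obs, f_calc))
```

R-free uses the same formula, evaluated only on the reflections that were held out of refinement.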
PDB entry page (1mbo)
Common rules of thumb
A good rule of thumb for defining an acceptability threshold is based on resolution and R-factor. A resolution of 2.0 Å or better (that is, a numerically lower value) and an R-factor of 0.20 or lower is a commonly used threshold in structural bioinformatic analyses. It is important to remember, though, that there is no such thing as a single structure. Proteins are best described by ensembles.
In the past, NMR structures were considered to be of lower quality than X-ray structures. However, they are increasingly accepted, especially since the environmental conditions (solution vs. crystal) have been argued to be more biological. Unfortunately, there is no magic number that can be used to assess NMR structure quality, or lack thereof.
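The rule of thumb for X-ray structures can be written as a simple filter. This is only a sketch: the function name, the entry labels, and the example values are hypothetical, and the cutoffs are the 2.0 Å and 0.20 values quoted above, exposed as parameters so they can be tightened or relaxed.

```python
def passes_quality_threshold(resolution, r_factor,
                             max_resolution=2.0, max_r_factor=0.20):
    """Return True if an X-ray entry meets both rule-of-thumb cutoffs."""
    return resolution <= max_resolution and r_factor <= max_r_factor

# Hypothetical entries: (label, resolution in angstroms, R-factor)
entries = [
    ("entry_a", 1.8, 0.18),
    ("entry_b", 2.5, 0.19),
    ("entry_c", 1.9, 0.25),
]
accepted = [label for label, res, r in entries
            if passes_quality_threshold(res, r)]
print(accepted)
```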
EXPDTA
REMARKs
REMARK 2
REMARK 3
REMARK 3 presents information on refinement program(s) used and related statistics. For non-diffraction studies, REMARK 3 is used to describe any refinement done, but its format is mostly free text.
REMARK 290 REMARK 290 CRYSTALLOGRAPHIC SYMMETRY REMARK 290 SYMMETRY OPERATORS FOR SPACE GROUP: P 63 REMARK 290 REMARK 290 SYMOP SYMMETRY REMARK 290 NNNMMM OPERATOR REMARK X,Y,Z REMARK Y,X-Y,Z REMARK X+Y,-X,Z REMARK X,-Y,Z+1/2 REMARK Y,-X+Y,Z+1/2 REMARK X-Y,X,Z+1/2 REMARK 290 REMARK 290 WHERE NNN -> OPERATOR NUMBER REMARK 290 MMM -> TRANSLATION VECTOR REMARK 290 REMARK 290 CRYSTALLOGRAPHIC SYMMETRY TRANSFORMATIONS REMARK 290 THE FOLLOWING TRANSFORMATIONS OPERATE ON THE ATOM/HETATM REMARK 290 RECORDS IN THIS ENTRY TO PRODUCE CRYSTALLOGRAPHICALLY REMARK 290 RELATED MOLECULES. REMARK 290 SMTRY REMARK 290 SMTRY REMARK 290 SMTRY REMARK 290 SMTRY REMARK 290 SMTRY REMARK 290 SMTRY REMARK 290 SMTRY REMARK 290 SMTRY REMARK 290 SMTRY REMARK 290 SMTRY REMARK 290 SMTRY REMARK 290 SMTRY REMARK 290 SMTRY REMARK 290 SMTRY REMARK 290 SMTRY REMARK 290 SMTRY REMARK 290 SMTRY REMARK 290 SMTRY REMARK 290 REMARK 290 REMARK: NULL Symmetry operations
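To make the symmetry operator notation concrete, the sketch below applies an operator string such as `X-Y,X,Z+1/2` (one of the P 63 operators listed above) to a point given in fractional unit-cell coordinates. The function name `apply_symop` and the sample point are illustrative. The use of a restricted `eval` is a shortcut chosen for brevity, not a recommendation for untrusted input.

```python
def apply_symop(operator, frac_xyz):
    """Apply an operator like 'X-Y,X,Z+1/2' to fractional coordinates (x, y, z)."""
    x, y, z = frac_xyz
    names = {"X": x, "Y": y, "Z": z}
    allowed = set("XYZ0123456789+-/*. ")
    parts = operator.upper().split(",")
    if len(parts) != 3:
        raise ValueError("operator must have three comma-separated components")
    result = []
    for expr in parts:
        # Reject anything other than X, Y, Z, digits, and arithmetic symbols
        if not set(expr) <= allowed:
            raise ValueError("unexpected character in operator: " + expr)
        result.append(eval(expr, {"__builtins__": {}}, names))
    return tuple(result)

# Made-up fractional coordinates
point = (0.1, 0.2, 0.3)
image = apply_symop("X-Y,X,Z+1/2", point)
print(image)
```

The SMTRY records carry the same operations as matrices with translation vectors, so they can be applied directly to the coordinates in the ATOM/HETATM records.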
Unit cell and symmetry
Remarks contain all sorts of useful info
REMARK 500 GEOMETRY AND STEREOCHEMISTRY
REMARK 500 SUBTOPIC: COVALENT BOND ANGLES
REMARK 500
REMARK 500 THE STEREOCHEMICAL PARAMETERS OF THE FOLLOWING RESIDUES
REMARK 500 HAVE VALUES WHICH DEVIATE FROM EXPECTED VALUES BY MORE
REMARK 500 THAN 6*RMSD (M=MODEL NUMBER; RES=RESIDUE NAME; C=CHAIN
REMARK 500 IDENTIFIER; SSEQ=SEQUENCE NUMBER; I=INSERTION CODE).
REMARK 500
REMARK 500 STANDARD TABLE:
REMARK 500 FORMAT: (10X,I3,1X,A3,1X,A1,I4,A1,3(1X,A4,2X),12X,F5.1)
REMARK 500
REMARK 500 EXPECTED VALUES PROTEIN: ENGH AND HUBER, 1999
REMARK 500 EXPECTED VALUES NUCLEIC ACID: CLOWNEY ET AL 1996
REMARK 500
REMARK 500 M RES CSSEQI ATM1 ATM2 ATM3
REMARK 500 ALA A 1 CB - CA - C ANGL. DEV. = -9.3 DEGREES
REMARK 500 ALA A 1 CA - C - O ANGL. DEV. = 17.7 DEGREES
REMARK 500 THR A 2 N - CA - CB ANGL. DEV. = DEGREES
REMARK 500 THR A 2 OG1 - CB - CG2 ANGL. DEV. = 15.0 DEGREES
REMARK 500 THR A 2 CA - CB - OG1 ANGL. DEV. = DEGREES
REMARK 500 THR A 2 N - CA - C ANGL. DEV. = 17.4 DEGREES
REMARK 500 ALA A 1 CA - C - N ANGL. DEV. = DEGREES
REMARK 500 LYS A 3 CD - CE - NZ ANGL. DEV. = DEGREES
REMARK 500 ALA A 4 CA - C - N ANGL. DEV. = DEGREES
REMARK 500 ALA A 4 O - C - N ANGL. DEV. = 19.6 DEGREES
REMARK 500 VAL A 5 C - N - CA ANGL. DEV. = DEGREES
REMARK 500 CYS A 6 O - C - N ANGL. DEV. = 10.3 DEGREES
SEQRES 1 A 147 PRO LYS ALA LEU ILE VAL TYR GLY SER THR THR GLY ASN
SEQRES 2 A 147 THR GLU TYR THR ALA GLU THR ILE ALA ARG GLU LEU ALA
SEQRES 3 A 147 ASP ALA GLY TYR GLU VAL ASP SER ARG ASP ALA ALA SER
SEQRES 4 A 147 VAL GLU ALA GLY GLY LEU PHE GLU GLY PHE ASP LEU VAL
SEQRES 5 A 147 LEU LEU GLY CYS SER THR TRP ASN ASP ASP SER ILE GLU
SEQRES 6 A 147 LEU GLN ASP ASP PHE ILE PRO LEU PHE ASP SER LEU GLU
SEQRES 7 A 147 GLU THR GLY ALA GLN GLY ARG LYS VAL ALA CYS PHE GLY
SEQRES 8 A 147 CYS GLY ASP SER SER TYR GLU TYR PHE CYS GLY ALA VAL
SEQRES 9 A 147 ASP ALA ILE GLU GLU LYS LEU LYS ASN LEU GLY ALA GLU
SEQRES 10 A 147 ILE VAL GLN ASP GLY LEU ARG ILE ASP GLY ASP PRO ARG
SEQRES 11 A 147 ALA ALA ARG ASP ASP ILE VAL GLY TRP ALA HIS ASP VAL
SEQRES 12 A 147 ARG GLY ALA ILE
SEQRES
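SEQRES records list the sequence with three-letter residue names, so a common first step is converting them to a one-letter string per chain. The sketch below does this by splitting on whitespace, which assumes the chain ID field is not blank; a stricter parser would slice fixed columns instead. The names `THREE_TO_ONE` and `seqres_to_sequences` are illustrative, and non-standard residues are mapped to `X` as a simplifying assumption.

```python
THREE_TO_ONE = {
    "ALA": "A", "ARG": "R", "ASN": "N", "ASP": "D", "CYS": "C",
    "GLN": "Q", "GLU": "E", "GLY": "G", "HIS": "H", "ILE": "I",
    "LEU": "L", "LYS": "K", "MET": "M", "PHE": "F", "PRO": "P",
    "SER": "S", "THR": "T", "TRP": "W", "TYR": "Y", "VAL": "V",
}

def seqres_to_sequences(lines):
    """Return {chain_id: one-letter sequence} built from SEQRES lines."""
    chains = {}
    for line in lines:
        fields = line.split()
        # Expected fields: SEQRES, serial number, chain ID, residue count, residues...
        if len(fields) < 5 or fields[0] != "SEQRES":
            continue
        chain_id = fields[2]
        letters = [THREE_TO_ONE.get(name, "X") for name in fields[4:]]
        chains.setdefault(chain_id, []).extend(letters)
    return {chain: "".join(seq) for chain, seq in chains.items()}

# First two SEQRES lines from the example above
seqres_lines = [
    "SEQRES 1 A 147 PRO LYS ALA LEU ILE VAL TYR GLY SER THR THR GLY ASN",
    "SEQRES 2 A 147 THR GLU TYR THR ALA GLU THR ILE ALA ARG GLU LEU ALA",
]
sequences = seqres_to_sequences(seqres_lines)
print(sequences["A"])
```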
Other info regarding the protein
HET ACE A 0 3
HET ACE B 0 3
HET CU A
HET ZN A
HET CU B
HET ZN B
HETNAM ACE ACETYL GROUP
HETNAM CU COPPER (II) ION
HETNAM ZN ZINC ION
FORMUL 1 ACE 2(C2 H4 O)
FORMUL 3 CU 2(CU 2+)
FORMUL 4 ZN 2(ZN 2+)
HELIX 1 HA GLU A 133 THR A
HELIX 2 HB GLU B 133 THR B
SHEET 1 SA 9 ALA A 4 LYS A 9 0
SHEET 2 SA 9 GLN A 15 GLU A N PHE A 20 O ALA A 4
SHEET 3 SA 9 VAL A 29 LYS A 30 1 N LYS A 30 O GLU A 21
SHEET 4 SA 9 VAL A 94 ASP A
SHEET 5 SA 9 GLY A 85 ALA A 89 1
SHEET 6 SA 9 GLY A 41 HIS A N HIS A 43 O VAL A 87
SHEET 7 SA 9 ARG A 115 HIS A N VAL A 118 O HIS A 46
SHEET 8 SA 9 CYS A 146 GLY A N GLY A 147 O LEU A 117
SHEET 9 SA 9 ALA A 4 LYS A 9 1 N VAL A 7 O VAL A 148
SHEET 1 SB 9 ALA B 4 LYS B 9 0
SHEET 2 SB 9 GLN B 15 GLU B N PHE B 20 O ALA B 4
SHEET 3 SB 9 VAL B 29 LYS B 30 1 N LYS B 30 O GLU B 21
SHEET 4 SB 9 VAL B 94 ASP B
SHEET 5 SB 9 GLY B 85 ALA B 89 1
SHEET 6 SB 9 GLY B 41 HIS B N HIS B 43 O VAL B 87
SHEET 7 SB 9 ARG B 115 HIS B N VAL B 118 O HIS B 46
SHEET 8 SB 9 CYS B 146 GLY B N GLY B 147 O LEU B 117
SHEET 9 SB 9 ALA B 4 LYS B 9 1 N VAL B 7 O VAL B 148
SSBOND 1 CYS A 57 CYS A
SSBOND 2 CYS B 57 CYS B
Other info regarding the protein
...
LINK CH3 ACE A 0 N ALA A
LINK CH3 ACE B 0 N ALA B
LINK C ACE A 0 N ALA A
LINK CU CU A 154 NE2 HIS A
LINK CU CU A 154 ND1 HIS A
LINK CU CU A 154 NE2 HIS A
LINK CU CU A 154 NE2 HIS A
LINK ZN ZN A 155 ND1 HIS A
LINK ZN ZN A 155 ND1 HIS A
LINK ZN ZN A 155 ND1 HIS A
LINK ZN ZN A 155 OD2 ASP A
LINK C ACE B 0 N ALA B
LINK CU CU B 154 NE2 HIS B
LINK CU CU B 154 NE2 HIS B
LINK CU CU B 154 ND1 HIS B
LINK CU CU B 154 NE2 HIS B
LINK ZN ZN B 155 ND1 HIS B
LINK ZN ZN B 155 OD2 ASP B
LINK ZN ZN B 155 ND1 HIS B
LINK ZN ZN B 155 ND1 HIS B
SITE 1 CUA 4 HIS A 46 HIS A 48 HIS A 63 HIS A 120
SITE 1 ZNA 4 HIS A 63 HIS A 71 HIS A 80 ASP A 83
SITE 1 CUB 4 HIS B 46 HIS B 48 HIS B 63 HIS B 120
SITE 1 ZNB 4 HIS B 63 HIS B 71 HIS B 80 ASP B 83
SITE 1 AC1 4 HIS A 46 HIS A 48 HIS A 63 HIS A 120
SITE 1 AC2 4 HIS A 63 HIS A 71 HIS A 80 ASP A 83
SITE 1 AC3 4 HIS B 46 HIS B 48 HIS B 63 HIS B 120
SITE 1 AC4 4 HIS B 63 HIS B 71 HIS B 80 ASP B
NMR Structures
ATOM Records
Example
ATOM 1 N PRO A N
ATOM 2 CA PRO A C
ATOM 3 C PRO A C
ATOM 4 O PRO A O
ATOM 5 CB PRO A C
ATOM 6 CG PRO A C
ATOM 7 CD PRO A C
ATOM 8 N LYS A N
ATOM 9 CA LYS A C
ATOM 10 C LYS A C
ATOM 11 O LYS A O
ATOM 12 CB LYS A C
ATOM 13 CG LYS A C
ATOM 14 CD LYS A C
ATOM 15 CE LYS A C
ATOM 16 NZ LYS A N
ATOM 17 N ALA A N
ATOM 18 CA ALA A C
ATOM 19 C ALA A C
ATOM 20 O ALA A O
ATOM 21 CB ALA A C
ATOM
Field labels: Atom number, Atom name, Residue name, ChainID, Residue number, xyz atom coordinates, Occupancy, B-factor
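ATOM records are fixed-width, so the labeled fields are best read by column position rather than by splitting on whitespace. The sketch below slices the columns defined in the PDB format specification. The function name `parse_atom_line` is illustrative, and the sample line is built with made-up coordinates, occupancy, and B-factor purely to guarantee correct column alignment.

```python
def parse_atom_line(line):
    """Slice one ATOM/HETATM line into its fixed-column fields."""
    return {
        "record": line[0:6].strip(),      # columns 1-6
        "serial": int(line[6:11]),        # columns 7-11, atom number
        "name": line[12:16].strip(),      # columns 13-16, atom name
        "alt_loc": line[16].strip(),      # column 17, alternate location indicator
        "res_name": line[17:20].strip(),  # columns 18-20, residue name
        "chain_id": line[21].strip(),     # column 22
        "res_seq": int(line[22:26]),      # columns 23-26, residue number
        "i_code": line[26].strip(),       # column 27, insertion code
        "x": float(line[30:38]),          # columns 31-38
        "y": float(line[38:46]),          # columns 39-46
        "z": float(line[46:54]),          # columns 47-54
        "occupancy": float(line[54:60]),  # columns 55-60
        "b_factor": float(line[60:66]),   # columns 61-66
        "element": line[76:78].strip(),   # columns 77-78
    }

# Made-up values, formatted field by field so every column lines up
sample_line = (
    f"{'ATOM':<6}{1:>5} {' N  '}{' '}{'PRO'} {'A'}{1:>4}{' '}   "
    f"{10.0:>8.3f}{20.0:>8.3f}{30.0:>8.3f}{1.0:>6.2f}{15.0:>6.2f}{'':10}{'N':>2}"
)
atom = parse_atom_line(sample_line)
print(atom)
```

Slicing by column keeps the parser correct when adjacent fields run together, for example a large negative coordinate that leaves no space before it.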
Alternate Location Indicator
Insertion Code
Things would be very simple if the amino acids in every chain were numbered in the obvious way, starting with 1. The problem with numbering started when people wanted to compare the 'same' proteins from different species. They found that there were the following possibilities that gave rise to differences:
1. More or fewer residues at either end.
2. Extra residues at various places within the chain.
3. Fewer residues at various places within the chain.
4. Different amino acids at the same place.
For example, relative to an important reference, the Tyr and Phe in the next example must be “inserted” to maintain the sequence numbering elsewhere.
Reference: Xxx-Arg         Asp-Xxx
Current:   Xxx-Arg-Tyr-Phe-Asp-Xxx
Insertion Code (1JGU)
ATOM 2518 CB ARG H C
ATOM 2519 CG ARG H C
ATOM 2520 CD ARG H C
ATOM 2521 NE ARG H N
ATOM 2522 CZ ARG H C
ATOM 2523 NH1 ARG H N
ATOM 2524 NH2 ARG H N
ATOM 2525 N TYR H 100A N
ATOM 2526 CA TYR H 100A C
ATOM 2527 C TYR H 100A C
ATOM 2528 O TYR H 100A O
ATOM 2529 CB TYR H 100A C
ATOM 2530 CG TYR H 100A C
ATOM 2531 CD1 TYR H 100A C
ATOM 2532 CD2 TYR H 100A C
ATOM 2533 CE1 TYR H 100A C
ATOM 2534 CE2 TYR H 100A C
ATOM 2535 CZ TYR H 100A C
ATOM 2536 OH TYR H 100A O
ATOM 2537 N PHE H 100B N
ATOM 2538 CA PHE H 100B C
ATOM 2539 C PHE H 100B C
ATOM 2540 O PHE H 100B O
ATOM 2541 CB PHE H 100B C
ATOM 2542 CG PHE H 100B C
ATOM 2543 CD1 PHE H 100B C
ATOM 2544 CD2 PHE H 100B C
ATOM 2545 CE1 PHE H 100B C
ATOM 2546 CE2 PHE H 100B C
ATOM 2547 CZ PHE H 100B C
ATOM 2548 N ASP H N
ATOM 2549 CA ASP H C
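Because of insertion codes, a residue is identified by the pair (residue number, insertion code), not by the number alone, so residue labels such as 100, 100A, and 100B cannot be treated as plain integers. The sketch below splits a label into that pair and uses it as a sort key. The function name `split_residue_id` is illustrative, and ordering insertion codes alphabetically is an assumption: the order in which residues appear in the file is what actually defines the chain.

```python
import re

def split_residue_id(res_id):
    """Split a label such as '100A' into (residue number, insertion code)."""
    match = re.fullmatch(r"\s*(-?\d+)([A-Za-z]?)\s*", res_id)
    if match is None:
        raise ValueError("not a residue identifier: " + repr(res_id))
    return int(match.group(1)), match.group(2)

# Labels like those in the 1JGU excerpt above, deliberately shuffled
labels = ["101", "100B", "100", "100A"]
ordered = sorted(labels, key=split_residue_id)
print(ordered)
```

Code that counts residues by subtracting the first residue number from the last will miscount any chain that contains inserted residues.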
HETATM
HETATM 1110 N1 FMN N
HETATM 1111 C2 FMN C
HETATM 1112 O2 FMN O
HETATM 1113 N3 FMN N
HETATM 1114 C4 FMN C
HETATM 1115 O4 FMN O
HETATM 1116 C4A FMN C
HETATM 1117 N5 FMN N
HETATM 1118 C5A FMN C
HETATM 1119 C6 FMN C
HETATM 1120 C7 FMN C
HETATM 1121 C7M FMN C
HETATM 1122 C8 FMN C
HETATM 1123 C8M FMN C
HETATM 1124 C9 FMN C
HETATM 1125 C9A FMN C
HETATM 1126 N10 FMN N
HETATM 1127 C10 FMN C
HETATM 1128 C1' FMN C
HETATM 1129 C2' FMN C
HETATM 1130 O2' FMN O
HETATM
Ligand Explorer
PDBsum
PDBsum and Ligplot