
Indiana University School of C571/C696 Chemical Information Tech. 2004, Lecture 7. Page 1 C571/C696 Chemical Information Technology David Wild




1 C571/C696 Chemical Information Technology
David Wild
djwild@indiana.edu
http://www.informatics.indiana.edu/djwild
Representing 3D Structures

2 What we’ll cover today
Sources of 3D information (X-ray, NMR)
Experimental 3D databases
Rotatable bonds & conformational flexibility
Representing 3D structures using distance matrices
Estimation of 3D structure on computer
Conformational search and minimization
3D descriptors and fingerprints
Types & sources of protein information
How proteins are represented on computer
PDB file format

3 Sources of 3D information
X-ray Crystallography
NMR Spectroscopy
Computer-generated 3D structures
X-ray and NMR methods apply to both small molecules and proteins

4 X-ray crystallography
Exploits diffraction of X-rays by electron clouds
Allows 3D location of atoms to be inferred
Requires sample to be in crystalline form
More info:
– http://www-structure.llnl.gov/Xray/101index.html

5 X-ray crystallography
Figure taken from http://www-structure.llnl.gov/Xray/101index.html

6 NMR Spectroscopy
Exploits magnetic fields created by quantum spin in nuclei
Nuclear spin can switch state when radio waves are applied
Different atoms and groups resonate at different frequencies
Information can be pieced together to infer 3D structure
More info:
– http://www.rod.beavon.clara.net/nmr1.htm

7 Experimental 3D Databases – Cambridge Structural Database
261,000 experimental X-ray structures (Jan 2004)
Various tools for searching the database (some available free)
More info at: http://www.ccdc.cam.ac.uk/

8 CSD Growth since 1970

9 Factors involved in 3D representation
Rotatable bonds and conformational flexibility
Sampling conformations or including flexibility in algorithms
Measuring energy of conformations
Representation of electronic and other characteristics

10 Rotatable bonds and conformational flexibility
Most compounds have rotatable bonds, so a molecule can take on many 3D conformations
Molecules prefer low-energy states, so low-energy conformations are more likely
How do we work out which bonds are rotatable?
Do we pick one particular conformation (e.g. lowest energy), pick several, or allow for flexibility?

11 Working definition of a rotatable bond
Any single bond which is:
– Not part of a ring
– Not terminal (e.g. methyl)
– Not in a conjugated system (e.g. amide)
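The working definition above can be turned into a small filter over a connection table. What follows is a minimal sketch rather than a production implementation. The function name rotatable_bonds and the bond-list layout are illustrative assumptions. Conjugation is not detected automatically: the caller passes any bonds to be treated as conjugated. The aspirin bond orders are filled in from its known structure, with heavy atoms numbered as in the MOL file on slide 18.

```python
from collections import defaultdict

def rotatable_bonds(bonds, conjugated=frozenset()):
    """Return single bonds that are acyclic, non-terminal and not flagged as conjugated.

    bonds: list of (atom_i, atom_j, order) over heavy atoms only (assumed layout).
    conjugated: set of frozenset({i, j}) pairs supplied by the caller.
    """
    neighbours = defaultdict(set)
    for i, j, _ in bonds:
        neighbours[i].add(j)
        neighbours[j].add(i)

    def in_ring(i, j):
        # Bond i-j is in a ring if j is still reachable from i without using it.
        seen, stack = {i}, [i]
        while stack:
            a = stack.pop()
            for b in neighbours[a]:
                if {a, b} == {i, j}:
                    continue
                if b == j:
                    return True
                if b not in seen:
                    seen.add(b)
                    stack.append(b)
        return False

    result = []
    for i, j, order in bonds:
        if order != 1:
            continue  # only single bonds qualify
        if len(neighbours[i]) == 1 or len(neighbours[j]) == 1:
            continue  # terminal bond, e.g. to a methyl carbon
        if frozenset((i, j)) in conjugated or in_ring(i, j):
            continue
        result.append((i, j))
    return result

# Aspirin heavy atoms 1-13; bond orders assumed from the known structure.
ASPIRIN = [(1, 2, 2), (2, 3, 1), (3, 4, 2), (4, 5, 1), (5, 6, 2), (1, 6, 1),
           (4, 7, 1), (7, 8, 2), (7, 9, 1), (5, 10, 1), (10, 11, 1),
           (11, 12, 2), (11, 13, 1)]

print(rotatable_bonds(ASPIRIN))
```

Passing conjugated={frozenset((10, 11))} would additionally exclude the ester C–O bond, if that bond is treated as conjugated under the third rule.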

12 Torsion (dihedral) angle
The torsion angle (τ), also known as the dihedral angle, is the angle between the A-B bond and the C-D bond when viewed along the B-C bond (equivalently, the angle between the planes A-B-C and B-C-D), for four atoms connected in the order A-B-C-D
Figure: atoms A, B, C and D with the torsion angle τ marked
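The torsion angle can be computed directly from the 3D coordinates of the four atoms. The sketch below uses the usual cross-product and atan2 formulation; the function name torsion_angle and the test coordinates are illustrative assumptions, not part of the lecture.

```python
import math

def torsion_angle(a, b, c, d):
    """Signed torsion angle A-B-C-D in degrees, in the range -180 to 180."""
    def sub(p, q):
        return (p[0] - q[0], p[1] - q[1], p[2] - q[2])

    def cross(u, v):
        return (u[1] * v[2] - u[2] * v[1],
                u[2] * v[0] - u[0] * v[2],
                u[0] * v[1] - u[1] * v[0])

    def dot(u, v):
        return u[0] * v[0] + u[1] * v[1] + u[2] * v[2]

    b1, b2, b3 = sub(b, a), sub(c, b), sub(d, c)
    n1 = cross(b1, b2)          # normal to the plane A-B-C
    n2 = cross(b2, b3)          # normal to the plane B-C-D
    x = dot(n1, n2)
    y = dot(cross(n1, n2), b2) / math.sqrt(dot(b2, b2))
    return math.degrees(math.atan2(y, x))

# B-C lies along the x axis; A points along +y.
A, B, C = (0.0, 1.0, 0.0), (0.0, 0.0, 0.0), (1.0, 0.0, 0.0)
print(torsion_angle(A, B, C, (1.0, 1.0, 0.0)))   # D eclipses A (cis)
print(torsion_angle(A, B, C, (1.0, 0.0, 1.0)))   # D rotated a quarter turn
```

The atan2 form is preferred over acos because it keeps the sign of the angle and stays well behaved near 0° and 180°.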

13 Ring flexibility
Chair & boat conformations
Occur with non-aromatic rings (e.g. cyclohexane)
Figure: chair and boat conformations

14 3D representation on computer
The Coordinate Table is an extension of the atom table which lists coordinates of atoms in 3D space relative to a defined origin
The Distance Matrix gives distances (in ångströms) between all pairs of atoms. Its main use is in comparison of 3D structures. It can be derived from the coordinate table.
These are usually stored in addition to a connection table.

15 Coordinate Table
Atom  Label        X        Y        Z
   1  C      -1.8920  -0.9920  -1.5760
   2  C      -1.3680  -2.1480  -0.9880
   3  C      -0.0760  -2.1440  -0.4640
   4  C       0.7080  -0.9840  -0.5200
   5  C       0.2000   0.1560  -1.1960
   6  C      -1.1080   0.1600  -1.6520
   7  C       2.0840  -1.0280   0.1040
   8  O       2.5320  -2.0320   0.6360
   9  O       2.8760   0.0240   0.1120
  10  O       0.7520   1.3320  -1.0840
  11  C       0.6680   2.0240   0.0320
  12  O       1.3000   3.0600   0.1520
  13  C      -0.2400   1.5760   1.1440
Figure: aspirin skeleton with heavy atoms numbered 1 to 13

16 Distance Matrix
Figure: aspirin skeleton with atoms numbered 1 to 13 and two example distances marked (4.8 Å and 3.5 Å)
      1    2    3    4    5    6    7    8    9   10   11   12   13
 1       1.4  2.4  2.8  2.4  3.8  4.8  4.2  1.4  2.4  2.7  2.9  4.3
 2            1.4  2.4  2.8  4.3  5.1  5.0  2.4  3.7  3.9  4.2  5.6
 3                 1.4  2.4  3.8  4.2  4.8  2.8  4.2  4.7  4.9  6.4
 4                      1.4  2.5  2.8  3.6  2.4  3.7  4.7  4.6  6.1
 5                           1.5  2.4  2.3  1.4  2.3  3.7  3.5  4.8
 6                                1.3  1.2  2.5  2.8  4.4  3.9  5.0
 7                                     2.2  3.7  4.1  5.7  5.2  6.3
 8                                          2.8  2.5  4.2  3.5  4.3
 9                                               1.4  2.6  2.3  3.7
10                                                    2.2  1.3  2.5
11                                                         1.2  2.4
12                                                              1.5
13
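A distance matrix follows from the coordinate table by taking the Euclidean distance between every pair of atoms. As a minimal sketch, the code below does this for the first four ring carbons of aspirin, using the coordinates from the MOL file on slide 18; the function name distance_matrix is an illustrative choice.

```python
import math

# First four ring carbons of aspirin (coordinates in ångströms, from slide 18).
coords = [
    (-1.8920, -0.9920, -1.5760),
    (-1.3680, -2.1480, -0.9880),
    (-0.0760, -2.1440, -0.4640),
    (0.7080, -0.9840, -0.5200),
]

def distance_matrix(points):
    """Full symmetric matrix of pairwise Euclidean distances."""
    n = len(points)
    return [[math.dist(points[i], points[j]) for j in range(n)] for i in range(n)]

matrix = distance_matrix(coords)
for row in matrix:
    print("  ".join(f"{d:.1f}" for d in row))
```

Because the matrix is symmetric with a zero diagonal, only the upper triangle needs to be stored. The matrix is also unchanged by rotating or translating the molecule, which is what makes it convenient for comparing 3D structures.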

17 3D Molecule file formats
All tend to include coordinate/atom lookup table and connection table information
Examples: MOL file (MDL), Sybyl MOL2 file (Tripos)

18 3D MOL file for Aspirin
Chime 12290214053D
 21 21  0  0  1 V2000
   -1.8920   -0.9920   -1.5760 C 0 0 0 0 0 0 0 0 0 0 0 0
   -1.3680   -2.1480   -0.9880 C 0 0 0 0 0 0 0 0 0 0 0 0
   -0.0760   -2.1440   -0.4640 C 0 0 0 0 0 0 0 0 0 0 0 0
    0.7080   -0.9840   -0.5200 C 0 0 0 0 0 0 0 0 0 0 0 0
    0.2000    0.1560   -1.1960 C 0 0 0 0 0 0 0 0 0 0 0 0
   -1.1080    0.1600   -1.6520 C 0 0 0 0 0 0 0 0 0 0 0 0
    2.0840   -1.0280    0.1040 C 0 0 0 0 0 0 0 0 0 0 0 0
    2.5320   -2.0320    0.6360 O 0 0 0 0 0 0 0 0 0 0 0 0
    2.8760    0.0240    0.1120 O 0 0 0 0 0 0 0 0 0 0 0 0
    0.7520    1.3320   -1.0840 O 0 0 0 0 0 0 0 0 0 0 0 0
    0.6680    2.0240    0.0320 C 0 0 0 0 0 0 0 0 0 0 0 0
    1.3000    3.0600    0.1520 O 0 0 0 0 0 0 0 0 0 0 0 0
   -0.2400    1.5760    1.1440 C 0 0 0 0 0 0 0 0 0 0 0 0
   -2.8760   -0.9600   -1.9840 H 0 0 0 0 0 0 0 0 0 0 0 0
   -1.9880   -3.0360   -0.9520 H 0 0 0 0 0 0 0 0 0 0 0 0
    0.3000   -3.0600   -0.0040 H 0 0 0 0 0 0 0 0 0 0 0 0
   -1.4880    1.0840   -2.0560 H 0 0 0 0 0 0 0 0 0 0 0 0
    2.5640    0.7800   -0.3240 H 0 0 0 0 0 0 0 0 0 0 0 0
   -0.7600    0.6360    0.9320 H 0 0 0 0 0 0 0 0 0 0 0 0
   -1.0080    2.3480    1.2880 H 0 0 0 0 0 0 0 0 0 0 0 0
    0.3440    1.4320    2.0560 H 0 0 0 0 0 0 0 0 0 0 0 0
 13 21  1  0
 13 20  1  0
 13 19  1  0
 11 13  1  0
 11 12  1  0
 10 11  1  0
  9 18  1  0
  7  9  1  0
  7  8  1  0
  6 17  1  0
  5 10  1  0
  5  6  1  0
  4  7  1  0
  4  5  1  0
  3 16  1  0
  3  4  1  0
  2 15  1  0
  2  3  1  0
  1 14  1  0
  1  6  1  0
  1  2  1  0
M END
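A MOL file of this kind can be read back into a coordinate table and a connection table with very little code. The sketch below assumes the usual V2000 layout: three header lines, then a counts line whose first two 3-character fields give the atom and bond counts, then one line per atom and one per bond. The function name parse_molfile and the three-atom SAMPLE record are hypothetical and only for illustration.

```python
SAMPLE = """\
water
  sketch

  3  2  0  0  0  0  0  0  0  0999 V2000
    0.0000    0.0000    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0
    0.9572    0.0000    0.0000 H   0  0  0  0  0  0  0  0  0  0  0  0
   -0.2400    0.9266    0.0000 H   0  0  0  0  0  0  0  0  0  0  0  0
  1  2  1  0
  1  3  1  0
M  END
"""

def parse_molfile(text):
    """Return (atoms, bonds) from a V2000 MOL file given as a string."""
    lines = text.splitlines()
    # Counts line: atom and bond counts are fixed 3-character fields.
    n_atoms, n_bonds = int(lines[3][0:3]), int(lines[3][3:6])
    atoms = []
    for line in lines[4:4 + n_atoms]:
        # Whitespace splitting is a simplification of the fixed-width atom line.
        x, y, z, symbol = line.split()[:4]
        atoms.append((symbol, float(x), float(y), float(z)))
    bonds = []
    for line in lines[4 + n_atoms:4 + n_atoms + n_bonds]:
        # Bond line: first atom, second atom, bond type (3 characters each).
        bonds.append((int(line[0:3]), int(line[3:6]), int(line[6:9])))
    return atoms, bonds

atoms, bonds = parse_molfile(SAMPLE)
print(atoms[0], bonds)
```

The atom list plays the role of the coordinate table and the bond list that of the connection table; atom numbers in the bond block are 1-based positions in the atom block.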

19 Computer estimation of 3D structure
Programs take as input 2D structures (e.g. in SMILES) and output 3D structures
There is no one correct 3D structure, since in three dimensions a molecule is conformationally flexible
Methods may output one single conformation, or an ensemble of possible conformations

20 Fragment / Rule Based 3D Structure Generation
Split 2D structure into small fragments matched to a pre-defined empirical database
Generally use a combination of real fragment coordinates, theory and rules to generate the 3D structure
Generally produce one or more low-energy conformations
Examples: Concord, Corina, Omega

21 Distance Geometry Based Structure Generation
Rapidly samples “conformational space” of molecule, looking for valid conformations based on distance bounds
Outputs an ensemble of possible conformations, which can then be scored, e.g. by energy
For algorithm, see
– http://www.daylight.com/meetings/summerschool01/course/basics/dist.html

22 Concord
Distributed by Tripos, Inc.
One of the earliest structure generators
Fragment / rule-based
Produces low-energy, geometry-optimized conformation
An industry standard
More information:
– http://www.tripos.com/sciTech/inSilicoDisc/chemInfo/concord.html

23 Corina
Created by Gasteiger lab in Germany
Fragment / rule-based
Similar to Concord
More information, plus 1,000 free structure generations on the web, at:
– http://www2.chemie.uni-erlangen.de/software/corina/free_struct.html

24 Omega
Recently introduced by OpenEye
Rule-based
Systematically tests conformations, not stochastic
Extremely fast generation of multiple low-energy conformations
Can handle 100,000 compounds/processor/day
Free academic use license
More information at:
– http://www.eyesopen.com/products/applications/omega.html

25 Rubicon
Marketed by Daylight
Mixture of Distance Geometry and SMARTS-based rules
Rules can be user-defined
For more information, see
– http://www.daylight.com/products/rubicon.html

26 Structure Minimization
Finding the conformer or conformers that have the lowest energy, and are therefore most likely to be found in nature (“conformational search”)
May start with an existing non-optimized structure
Can use standard optimization methods such as exhaustive search, simulated annealing, Monte Carlo, or genetic algorithms
Can attempt to use ab initio derivation
For more info see:
– http://www.chem.swin.edu.au/modules/mod6/
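The simplest of the optimization methods listed above, exhaustive (systematic) search, can be sketched for a single rotatable bond. The energy function here is a hypothetical threefold cosine potential chosen only for illustration; it is not a real force field, and the names torsional_energy and systematic_search are likewise assumptions.

```python
import math

def torsional_energy(tau_deg, barrier=3.0):
    """Hypothetical threefold torsional potential; barrier in arbitrary energy units."""
    return 0.5 * barrier * (1.0 + math.cos(math.radians(3.0 * tau_deg)))

def systematic_search(energy, step=10):
    """Evaluate the energy on a regular grid of torsion angles, lowest energy first."""
    return sorted((energy(tau), tau) for tau in range(0, 360, step))

ranked = systematic_search(torsional_energy)
# Grid points whose energy is numerically at the minimum of the toy potential.
lowest = sorted(tau for e, tau in ranked if e < 1e-9)
print(lowest)
```

With several rotatable bonds the grid grows as the product of the number of steps per bond, which is why stochastic methods such as simulated annealing, Monte Carlo or genetic algorithms are used for more flexible molecules.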

27 3D small molecule databases and searching
Databases store coordinate tables and often distance matrices
Searching is a little different from 2D searching:
– Needs to take into account conformational flexibility
– Requirements are different
Less common and less mature than 2D databases and searching
See http://www.netsci.org/Science/Cheminform/feature06.html for a review

28 3D substructure (“pharmacophore”) search
A pharmacophore is a set of features in 3D required for binding to a particular protein
E.g. “find all of the molecules that have an OH group between 2 and 5 Å away from a carboxyl oxygen, both of which are 7-8 Å from a benzene ring”
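The example query amounts to a set of distance constraints between three features. The sketch below checks one candidate set of feature positions in one conformation. It assumes the features have already been located and that the distance to the benzene ring is measured to the ring centroid; the function name and the test coordinates are hypothetical.

```python
import math

def matches_pharmacophore(hydroxyl_o, carboxyl_o, ring_centroid):
    """True if the three feature points satisfy the example distance constraints (in Å)."""
    return (2.0 <= math.dist(hydroxyl_o, carboxyl_o) <= 5.0
            and 7.0 <= math.dist(hydroxyl_o, ring_centroid) <= 8.0
            and 7.0 <= math.dist(carboxyl_o, ring_centroid) <= 8.0)

ring = (0.0, 0.0, 0.0)
print(matches_pharmacophore((7.5, 0.0, 0.0), (7.0, 3.0, 0.0), ring))  # satisfies all three
print(matches_pharmacophore((7.5, 0.0, 0.0), (3.0, 0.0, 0.0), ring))  # carboxyl O too close to ring
```

A real search would repeat this test over every combination of matching features in a molecule and over its accessible conformations, which is where conformational flexibility enters.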

29 3D Similarity Searching
Can use 3D fingerprints based on pharmacophore “fragments”
– See, e.g., Comparing 3D Pharmacophore Triplets and 2D Fingerprints for Selecting Diverse Compound Subsets. H. Matter and T. Pötter, J. Chem. Inf. Comput. Sci., 1999, 39(6), pp 1211-1225
Can be atom based, involving comparison of distance matrices
– E.g. finding pairs of most-similar atoms between molecules, based on their distances from other atoms in the molecule
Other forms are also used, e.g. using fields
– See, e.g., Calculation of Structural Similarity by the Alignment of Molecular Electrostatic Potentials, D. Thorner, D. Wild, P. Willett, & M. Wright, Perspectives in Drug Discovery and Design, 9/10/11, 301-320, 1998
May be used for searching databases or ranking small datasets

30 A debate! – 2D vs 3D similarity
Which is more effective…
… for retrieving molecules with similar biological activity?
… for retrieving molecules with similar 2D structures?
… for retrieving related molecules of interest to chemists?
… for ranking molecules for a particular target?

31 WDI - Mean Actives Retrieved in Top 300

32 Agrochemicals Dataset - Correlation between similarity and activity with four activities

33 Any consensus?
Which is more effective…
… for retrieving molecules with similar biological activity? Usually 2D
… for retrieving molecules with similar 2D structures? 2D
… for retrieving related molecules of interest to chemists? Sometimes 2D, sometimes 3D (bioisosteres)
… for ranking molecules for a particular target? Sometimes 2D, sometimes 3D

34 Other forms of 3D information
Surface (van der Waals, Connolly, volume)
Properties projected onto surface (electrostatics, hydrophobics)
Fields (energy, force, electrostatic, steric, hydrophobic)
Atom-based properties (charge, hydrophobicity, etc)

35 What is a macromolecule?
Any very large molecule (>1000 atoms)
Usually made up of repeating building block molecules (amino acids, nucleic bases, etc) in a chain
– Polypeptides (amino acid building blocks)
– Proteins (amino acid building blocks)
– Nucleic acids (made up of bases)
– Polysaccharides (made up of sugars)
We shall be focusing on polypeptides and proteins

36 Types of protein information
Atomic (3D atom coordinates and bond information)
Primary (Amino acid sequence)
Secondary (Alpha helices, beta sheets, etc)
Tertiary (3D folding of protein)
Quaternary (dimers, protein families)

37 Atomic information
3D coordinates of all atoms in the protein
Derived from X-ray crystallography or NMR Spectroscopy

38 Primary structure (Sequence)
Lists amino acids in the order they appear in the chain
Uses three-letter or one-letter abbreviations, e.g.:
Ser-Tyr-Ser-Met-Glu-His-Phe-Arg-Trp-Gly-Lys
S Y S M E H F R W G K
Essentially “1-dimensional” representation of the protein
Can be stored on computer as a text string
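Because the primary structure is just a text string, converting between the two abbreviation styles is a table lookup. The sketch below uses the standard codes for the 20 common amino acids; the function name to_one_letter is an illustrative choice.

```python
# Standard three-letter to one-letter codes for the 20 common amino acids.
THREE_TO_ONE = {
    "ALA": "A", "ARG": "R", "ASN": "N", "ASP": "D", "CYS": "C",
    "GLN": "Q", "GLU": "E", "GLY": "G", "HIS": "H", "ILE": "I",
    "LEU": "L", "LYS": "K", "MET": "M", "PHE": "F", "PRO": "P",
    "SER": "S", "THR": "T", "TRP": "W", "TYR": "Y", "VAL": "V",
}

def to_one_letter(sequence, separator="-"):
    """Convert e.g. 'Ser-Tyr-Ser' to 'SYS'; raises KeyError on unknown residues."""
    return "".join(THREE_TO_ONE[res.upper()] for res in sequence.split(separator))

print(to_one_letter("Ser-Tyr-Ser-Met-Glu-His-Phe-Arg-Trp-Gly-Lys"))
```

Applied to the example on this slide, the function gives back the one-letter string shown beneath it.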

39 Secondary structure
Certain groups of amino acids tend to form themselves into regular 3D shapes: α-helix, β-sheet, turn
α-helix – C=O and NH groups hydrogen bond to the group four residues along the chain, forming a coil shape
β-sheet – flat structure due to hydrogen bonding between two or more chains

40 Secondary structure (2)
Secondary structural features can be fairly well predicted from primary structure, or inferred from atom coordinates
Primary sequence can be ‘tagged’ with secondary structure information, e.g.:
G A F T G E I S P G M I K D C G A T W V
β β β β β β β α α α α α α α

41 Tertiary structure
How the protein chain is folded in three dimensions
Information mostly derived from atomic coordinate information
Extremely difficult to predict from scratch using computational methods
May be predicted by finding proteins with similar primary and secondary structures that have known coordinates (homology modeling, threading)

42 Tertiary structure example (HIV)

43 Protein information representation
Atomic – coordinate/connection table
Primary – text string
Secondary – text string
Tertiary – set of points and vectors

44 File formats
Tripos Sybyl MOL2
– For storage of atomic coordinate information
– Same as 3D small molecule file format
PDB format
– Special format for proteins
– Complex and somewhat ill-defined
– Allows representation of multiple types of information (primary, secondary, tertiary, atomic)

45 PDB file format
Official guide: http://www.rcsb.org/pdb/docs/format/pdbguide2.2/guide2.2_frame.html
Different sections to specify different kinds of information
– Title
– Primary structure
– Heterogen
– Secondary Structure
– Connectivity Annotation
– Miscellaneous
– Crystallographic / Co-ordinate
– Connectivity
– Book-keeping
Each section made up of keywords, one per line

46 PDB Title section
HEADER – Type, date, ID code
COMPND – Description of compound
TITLE – Title of experiment used to produce structure
AUTHOR
JRNL – Reference publication
REMARK – Comments

47 Primary structure section
SEQRES – specifies amino acid sequence
MODRES – specifies modifications to amino acids
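Reading the primary structure out of a PDB file means collecting the residue names from the SEQRES records of each chain. The following is a minimal sketch, assuming every SEQRES line carries a non-blank chain identifier so that whitespace splitting is safe; the function name read_seqres is illustrative, and the two sample lines are the first SEQRES records of the HIV protease example on slide 51.

```python
from collections import defaultdict

def read_seqres(pdb_lines):
    """Map chain ID -> list of three-letter residue names, in file order."""
    chains = defaultdict(list)
    for line in pdb_lines:
        if line.startswith("SEQRES"):
            fields = line.split()
            # fields: keyword, row number, chain ID, chain length, residue names...
            chains[fields[2]].extend(fields[4:])
    return dict(chains)

sample = [
    "SEQRES 1 A 99 PRO GLN ILE THR LEU TRP GLN ARG PRO LEU VAL THR ILE",
    "SEQRES 2 A 99 LYS ILE GLY GLY GLN LEU LYS GLU ALA LEU LEU ASP THR",
]
sequence = read_seqres(sample)
print(sequence["A"][:5], len(sequence["A"]))
```

Each SEQRES line holds up to 13 residues, so a 99-residue chain is spread over eight lines that have to be concatenated in order.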

48 Secondary structure section
HELIX – specifies start & end of helical section
SHEET – specifies start & end of each strand in a sheet
TURN – specifies location of turn

49 Coordinates section
ATOM – specifies coordinates for an atom in a residue
HETATM – specifies coordinates for other atoms (e.g. in drug)
TER – specifies end of list of coordinates for a chain
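ATOM and HETATM records are fixed-column rather than whitespace-delimited, so they are best read by slicing. The sketch below follows the column positions of the official format guide (serial in columns 7-11, atom name 13-16, residue name 18-20, chain 22, residue number 23-26, x, y, z in 31-54). The function name is illustrative, and the sample line is the first atom of the HIV protease example re-spaced into those columns, since the original column alignment is an assumption here.

```python
def parse_atom_record(line):
    """Extract the main fields of a fixed-column ATOM or HETATM record."""
    return {
        "record": line[0:6].strip(),
        "serial": int(line[6:11]),
        "name": line[12:16].strip(),
        "res_name": line[17:20].strip(),
        "chain": line[21],
        "res_seq": int(line[22:26]),
        "xyz": (float(line[30:38]), float(line[38:46]), float(line[46:54])),
    }

# Build a sample record with explicit field widths so the columns are exact.
sample = (f"{'ATOM':<6}{1:>5} {'N':<4} {'PRO':>3} {'A'}{1:>4}    "
          f"{8.133:8.3f}{-13.258:8.3f}{12.706:8.3f}{1.0:6.2f}{0.0:6.2f}")

record = parse_atom_record(sample)
print(record)
```

Slicing also copes with records where a field is blank, such as the HETATM lines in the example file that have no residue name, which would shift the fields if the line were simply split on whitespace.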

50 Connectivity section
CONECT – specifies connectivity between atoms (usually used for non amino-acids)
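CONECT records can be gathered into an adjacency list, which is the PDB equivalent of a connection table for the non-amino-acid atoms. The sketch below splits on whitespace, which is a simplification that only holds while atom serial numbers have fewer than five digits (the official layout uses fixed 5-character fields). The function name read_conect is illustrative, and the sample lines are taken from the HIV protease example on slide 53.

```python
def read_conect(pdb_lines):
    """Map atom serial number -> list of bonded atom serial numbers."""
    bonded = {}
    for line in pdb_lines:
        if line.startswith("CONECT"):
            serials = [int(s) for s in line.split()[1:]]
            # An atom may appear on several CONECT lines, so extend rather than overwrite.
            bonded.setdefault(serials[0], []).extend(serials[1:])
    return bonded

sample = [
    "CONECT 1946 1849 1947 1960",
    "CONECT 1947 1946 1948 1949 1950",
    "CONECT 1948 1947",
]
bonded = read_conect(sample)
print(bonded)
```

Each bond normally appears twice, once from each atom, as with atoms 1946 and 1947 in the sample.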

51 PDB file example HIV Protease
HEADER PROTEIN 28-OCT-96
COMPND HIV-1 PROTEASE COMPLEXED WITH THE INHIBITOR A77003 (R,S)
AUTHOR GENERATED BY SYBYL, A PRODUCT OF TRIPOS ASSOCIATES, INC.
SEQRES 1 A 99 PRO GLN ILE THR LEU TRP GLN ARG PRO LEU VAL THR ILE
SEQRES 2 A 99 LYS ILE GLY GLY GLN LEU LYS GLU ALA LEU LEU ASP THR
SEQRES 3 A 99 GLY ALA ASP ASP THR VAL LEU GLU GLU MET SER LEU PRO
SEQRES 4 A 99 GLY ARG TRP LYS PRO LYS MET ILE GLY GLY ILE GLY GLY
SEQRES 5 A 99 PHE ILE LYS VAL ARG GLN TYR ASP GLN ILE LEU ILE GLU
SEQRES 6 A 99 ILE CYS GLY HIS LYS ALA ILE GLY THR VAL LEU VAL GLY
SEQRES 7 A 99 PRO THR PRO VAL ASN ILE ILE GLY ARG ASN LEU LEU THR
SEQRES 8 A 99 GLN ILE GLY CYS THR LEU ASN PHE
SEQRES 1 B 99 PRO GLN ILE THR LEU TRP GLN ARG PRO LEU VAL THR ILE
SEQRES 2 B 99 LYS ILE GLY GLY GLN LEU LYS GLU ALA LEU LEU ASP THR
SEQRES 3 B 99 GLY ALA ASP ASP THR VAL LEU GLU GLU MET SER LEU PRO
SEQRES 4 B 99 GLY ARG TRP LYS PRO LYS MET ILE GLY GLY ILE GLY GLY
SEQRES 5 B 99 PHE ILE LYS VAL ARG GLN TYR ASP GLN ILE LEU ILE GLU
SEQRES 6 B 99 ILE CYS GLY HIS LYS ALA ILE GLY THR VAL LEU VAL GLY
SEQRES 7 B 99 PRO THR PRO VAL ASN ILE ILE GLY ARG ASN LEU LEU THR
SEQRES 8 B 99 GLN ILE GLY CYS THR LEU ASN PHE
ATOM 1 N PRO A 1 8.133 -13.258 12.706 1.00 0.00
ATOM 2 CA PRO A 1 9.325 -12.418 13.001 1.00 0.00
ATOM 3 C PRO A 1 8.939 -10.978 13.283 1.00 0.00
ATOM 4 O PRO A 1 7.813 -10.607 13.030 1.00 0.00
ATOM 5 CB PRO A 1 10.211 -12.484 11.768 1.00 0.00
ATOM 6 CG PRO A 1 9.219 -12.779 10.674 1.00 0.00
ATOM 7 CD PRO A 1 8.271 -13.768 11.335 1.00 0.00
ATOM 8 H1 PRO A 1 7.974 -14.024 13.392 1.00 0.00

52 PDB file example HIV Protease (2)
ATOM 1844 CE2 PHE B 99 5.527 -13.746 8.735 1.00 0.00
ATOM 1845 CZ PHE B 99 6.308 -12.665 8.239 1.00 0.00
ATOM 1846 OXT PHE B 99 5.672 -12.903 13.426 1.00 0.00
ATOM 1847 H PHE B 99 5.668 -10.590 12.626 1.00 0.00
TER 1848 PHE B 99
HETATM 1849 C1 A 1 -3.676 0.038 -4.301 1.00 0.00
HETATM 1850 N21 A 1 -2.730 -0.070 -5.222 1.00 0.00
HETATM 1851 H28 A 1 -2.958 0.299 -6.126 1.00 0.00
HETATM 1852 C22 A 1 -1.389 -0.623 -4.962 1.00 0.00
HETATM 1853 H29 A 1 -1.369 -1.096 -3.981 1.00 0.00
HETATM 1854 C25 A 1 -1.031 -1.707 -6.000 1.00 0.00
HETATM 1855 H30 A 1 -1.021 -1.235 -6.985 1.00 0.00
HETATM 1856 C27 A 1 -2.085 -2.821 -6.044 1.00 0.00
HETATM 1857 H36 A 1 -1.845 -3.547 -6.818 1.00 0.00
HETATM 1858 H35 A 1 -3.079 -2.429 -6.267 1.00 0.00
HETATM 1859 H34 A 1 -2.140 -3.350 -5.091 1.00 0.00
HETATM 1860 C26 A 1 0.365 -2.310 -5.758 1.00 0.00
HETATM 1861 H33 A 1 0.450 -2.709 -4.748 1.00 0.00
HETATM 1862 H32 A 1 1.159 -1.573 -5.891 1.00 0.00
HETATM 1863 H31 A 1 0.564 -3.134 -6.440 1.00 0.00
HETATM 1864 C23 A 1 -0.360 0.506 -4.927 1.00 0.00
HETATM 1865 N37 A 1 -0.195 1.091 -3.733 1.00 0.00
HETATM 1866 H59 A 1 -0.715 0.711 -2.967 1.00 0.00
HETATM 1867 C38 A 1 0.602 2.329 -3.511 1.00 0.00
HETATM 1868 H60 A 1 1.052 2.671 -4.449 1.00 0.00
HETATM 1869 C46 A 1 1.713 2.066 -2.491 1.00 0.00
HETATM 1870 H68 A 1 1.221 1.950 -1.522 1.00 0.00

53 PDB file example HIV Protease (3)
CONECT 1943 1934 1941 1944
CONECT 1944 1943
CONECT 1945 1864
CONECT 1946 1849 1947 1960
CONECT 1947 1946 1948 1949 1950
CONECT 1948 1947
CONECT 1949 1947
CONECT 1950 1947 1951 1958
CONECT 1951 1950 1952
CONECT 1952 1951 1953 1954
CONECT 1953 1952
CONECT 1954 1952 1955 1956
CONECT 1955 1954
CONECT 1956 1954 1957 1958
CONECT 1957 1956
CONECT 1958 1950 1956 1959
CONECT 1959 1958
CONECT 1960 1946 1961 1962 1963
CONECT 1961 1960
CONECT 1962 1960
CONECT 1963 1960
CONECT 1964 1849
MASTER 0 0 0 0 0 0 0 0 1965 2 126 16
END

54 Protein Databases
The PDB (www.pdb.org) is the main worldwide repository for the processing and distribution of 3D structure data for large molecules such as proteins and nucleic acids. It holds around 24,000 structures (as of 2004)
Other databases (e.g. SwissProt, http://au.expasy.org/sprot/) contain just sequence data, but for more proteins
See also EBI: http://www.ebi.ac.uk/Databases/

55 PDB Growth

56 Follow-up
Read chapter 2 of Leach & Gillet
Read chapters 3 & 4 of Getting Started in Chemoinformatics

