Computers in Chemistry Dr John Mitchell & Rosanna Alderson University of St Andrews
1. Why? Working with experiment to test our theories. The computer uses theory to calculate an answer that can be compared with experiment. If prediction and experiment don’t agree, something has to give.
To Test Our Theories The theory that lies beneath chemistry is ultimately quantum physics. Turning this into a prediction of the rate of a chemical reaction or frequency of a transition in an IR spectrum needs lots of computation. Example: Quantum chemistry predicts that atoms in molecules are not spherical.
Atoms in molecules are not spherical
To Make Testable Predictions Computation’s ability to make accurate predictions of experimental measurements is a good test of the validity of a theory. We only understand if we can predict.
Crystal Structure Prediction Given the structural diagram of an organic molecule, predict the 3D crystal structure.
Calculate Energy of Infinite Crystal Calculate molecular energies and interactions Allow unit cell to change Optimise size, shape, packing Find energy of infinite lattice Find lattice with best energy Predicted crystal structure
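The "allow the unit cell to change and find the lattice with the best energy" loop can be illustrated with a deliberately tiny stand-in: an infinite one-dimensional chain of atoms interacting through a Lennard-Jones potential, where the only unit-cell parameter is the spacing. This is a toy sketch, not the method used in the talk; the potential, the reduced units (epsilon = sigma = 1), the neighbour cut-off and the grid scan are all assumptions made for illustration.

```python
def lattice_energy_per_atom(spacing, epsilon=1.0, sigma=1.0, n_neighbours=50):
    """Lennard-Jones energy per atom of an infinite 1D chain (sum truncated)."""
    total = 0.0
    for n in range(1, n_neighbours + 1):
        r = n * spacing
        sr6 = (sigma / r) ** 6
        # Neighbours sit at +r and -r, and each pair energy is shared by two
        # atoms, so the factors 2 and 1/2 cancel.
        total += 4.0 * epsilon * (sr6 * sr6 - sr6)
    return total


def best_spacing(lo=0.9, hi=1.5, n_points=6001):
    """Scan candidate spacings and return the one with the lowest energy."""
    candidates = [lo + (hi - lo) * i / (n_points - 1) for i in range(n_points)]
    return min(candidates, key=lattice_energy_per_atom)


if __name__ == "__main__":
    a = best_spacing()
    print(f"lowest-energy spacing: {a:.4f} sigma")
    print(f"energy per atom there: {lattice_energy_per_atom(a):.4f} epsilon")
```

In this toy the optimum spacing comes out slightly shorter than the minimum of an isolated pair, because attraction from more distant neighbours compresses the lattice. A real crystal structure prediction searches over many cell parameters and molecular orientations rather than one number.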
To Analyse Experimental Results Modern experimental techniques (NMR, mass-spec, X-ray crystallography etc.) are complex and work best if analysis of the results is done by computer. This both speeds up the process and lessens the risk of human bias in analysis of data.
Looking at Molecules – Experimental Analysis Spectroscopic Data Analysis: “the measurement of radiation intensity as a function of wavelength”. Gives an indication of synthesis success and overall structure. X-Ray Crystallography: “the science that examines the arrangement of atoms in solids”. Gives a 3D structure; allows confirmation of molecular arrangement and indicates interactions within a crystal. No indication of the underlying theoretical physics!
To Access Data that Experiment can’t Computational chemistry provides a means to obtain data that are very difficult, expensive or time-consuming to get experimentally. Behaviour at high temperature or pressure. Structure of liquids at atomic scale. Dynamics of proteins.
Phase Changes of Iron in the Earth’s Core et al.,
Structure of Liquid Water and Water Clusters Computer simulations are an important source of evidence, since atomic scale details of an irregular structure are hard to obtain by experiment.
2. The Power to Compute
Development of Computer Power University of Manchester SSEM, 1948
Development of Computer Power IBM Roadrunner, 2008
Computer Power: Moore’s Law Computer power doubles every two years: exponential growth
Computer Power: Moore’s Law Logarithmic scale
Computer Power: Moore’s Law This growth will, eventually, slow down as components reach atomic scale … we think!
The Size of the Problem
Scaling of the Expense of Computation Typical scaling is ~N^4, i.e. as the fourth power of molecular size N. For the foreseeable future, there will be chemical problems at the limit of our computing capacity.
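A short calculation shows why fourth-power scaling keeps problems at the limit even with exponential hardware growth. The sketch below combines the two figures quoted in these slides (cost growing as the fourth power of molecular size, computer power doubling every two years). The function names are made up for illustration, and treating "computer power" as directly proportional to affordable cost is a simplifying assumption.

```python
import math


def cost_ratio(size_ratio, exponent=4):
    """How much more expensive a calculation gets when the molecule grows."""
    return size_ratio ** exponent


def years_to_afford(size_ratio, exponent=4, doubling_years=2.0):
    """Years of Moore's-law growth needed to pay for a larger molecule."""
    return doubling_years * math.log2(cost_ratio(size_ratio, exponent))


if __name__ == "__main__":
    print(cost_ratio(2))        # doubling the size costs 2**4 = 16 times more
    print(years_to_afford(2))   # 4 doublings of computer power = 8 years
    print(years_to_afford(10))  # a tenfold larger molecule
```

Under these assumptions, each two-year doubling of computer power increases the tractable molecular size by a factor of only 2^(1/4), about 1.19.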
3. Philosophies of Computational Chemistry
“The problem is difficult, but by making suitable approximations we can solve it at reasonable cost based on our understanding of physics and chemistry.” A: Philosophy of Theoretical Chemistry
Theoretical Chemistry Calculations and simulations based on real physics. Calculations are either quantum mechanical or use numbers derived from quantum mechanics. Attempt to model or simulate reality. Usually Low Throughput.
What Kinds of Theoretical Chemistry can be Done? Prof. Eitan Geva (1) Quantum Chemistry
What Kinds of Theoretical Chemistry can be Done? (1) Quantum Chemistry Using quantum mechanics to solve the structures and energetics of molecules; everything depends on the distribution of electrons.
What Kinds of Theoretical Chemistry can be Done? (1) Quantum Chemistry 1926: Erwin Schrödinger proposed the Schrödinger equation. The time-independent Schrödinger equation: $\hat{H}\psi = E\psi$, where $\hat{H}$ is the Hamiltonian (an operator), $E$ is the energy and $\psi$ is the wavefunction.
The Hamiltonian: a mathematical operator embodying the underlying physics (kinetic energy of electrons; attraction between electrons and nuclei of atoms; repulsion between electrons). The Wavefunction: describes the distribution of electrons in space that gives the lowest energy. It is a function of all electron positions within the molecule; the square of the wavefunction gives the electron density; any molecular property can be calculated from the wavefunction. The Energy: there is always one energy associated with each wavefunction. Although quantum chemistry involves solving Schrödinger’s equation, it is not fully exact. There are some approximations involved.
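For readers who want to see the three physical contributions written out, one common way to express the electronic Hamiltonian is given below. It assumes atomic units and nuclei held fixed (the Born-Oppenheimer picture), neither of which is stated on the slide; $i, j$ label electrons, $A$ labels nuclei with charge $Z_A$, and $r$ denotes a distance. The constant nucleus-nucleus repulsion is left out, matching the three terms listed above.

```latex
\hat{H}\,\psi = E\,\psi ,
\qquad
\hat{H} =
  \underbrace{-\frac{1}{2}\sum_{i}\nabla_i^{2}}_{\text{kinetic energy of electrons}}
  \;\underbrace{-\sum_{i}\sum_{A}\frac{Z_A}{r_{iA}}}_{\text{electron--nucleus attraction}}
  \;+\;\underbrace{\sum_{i<j}\frac{1}{r_{ij}}}_{\text{electron--electron repulsion}}
```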
What Kinds of Theoretical Chemistry can be Done? (1) Quantum Chemistry There are two main kinds of quantum chemistry: Ab initio Density Functional Theory
What Kinds of Theoretical Chemistry can be Done? (1) Quantum Chemistry Ab initio “from first principles”. Solve Schrödinger equation to get wavefunction. In principle rigorous – we know what we calculate. But the standard “Hartree-Fock” method contains significant approximations. Expensive to adjust for these and get more accuracy.
What Kinds of Theoretical Chemistry can be Done? (1) Quantum Chemistry Density Functional Theory Makes use of the theorem that all properties of interest can be determined directly from the electron density. True in principle, but the correct “functional” is unknown. Less rigorous than ab initio, but usually more accurate for an equivalent cost (or cheaper for similar accuracy).
What Kinds of Theoretical Chemistry can be Done? (2) Molecular Simulation
What Kinds of Theoretical Chemistry can be Done? (2) Molecular Simulation There are various techniques for simulating molecules; the most significant is probably Molecular Dynamics. Molecular Dynamics makes a “balls-and-springs” model of the molecule in the computer, and follows its behaviour over time.
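The simplest possible "balls-and-springs" system is one ball on one spring, and following it over time looks roughly like the sketch below. The velocity Verlet integrator used here is a common choice in molecular dynamics but is an assumption; the slides do not say which integrator is meant. Units are reduced (mass, spring constant and time step are arbitrary illustrative numbers).

```python
def simulate_spring(k=1.0, mass=1.0, x0=1.0, v0=0.0, dt=0.01, n_steps=1000):
    """Follow one ball on one harmonic spring with velocity Verlet steps."""
    x, v = x0, v0
    a = -k * x / mass                      # Hooke's law: F = -k x
    trajectory = [x]
    for _ in range(n_steps):
        x += v * dt + 0.5 * a * dt * dt    # move the ball
        a_new = -k * x / mass              # force at the new position
        v += 0.5 * (a + a_new) * dt        # update velocity with averaged force
        a = a_new
        trajectory.append(x)
    return trajectory, v


def total_energy(x, v, k=1.0, mass=1.0):
    """Kinetic plus spring potential energy."""
    return 0.5 * mass * v * v + 0.5 * k * x * x


if __name__ == "__main__":
    positions, final_v = simulate_spring()
    print("start energy:", total_energy(positions[0], 0.0))
    print("end energy:  ", total_energy(positions[-1], final_v))
```

A real simulation applies the same loop to thousands of atoms with many springs and non-bonded forces. Checking that the total energy stays nearly constant is a common sanity test of the time step.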
What Kinds of Theoretical Chemistry can be Done? (2) Molecular Simulation Light-harvesting protein subunit.
What Kinds of Theoretical Chemistry can be Done? (2) Molecular Simulation Time steps need to be very, very short (~10^-15 seconds, about a femtosecond), so it takes a million steps to simulate one nanosecond of real time and a billion steps to simulate a microsecond. This makes it hard to directly simulate relatively slow or rare events, such as protein folding.
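The step counts quoted above follow from simple division, sketched here with times held as whole femtoseconds so the arithmetic stays exact. The one-femtosecond time step is the value implied by "a million steps per nanosecond"; the constant and function names are illustrative.

```python
# Durations expressed in femtoseconds (1 fs = 1e-15 s) to keep integers exact.
NANOSECOND_FS = 10**6
MICROSECOND_FS = 10**9
MILLISECOND_FS = 10**12


def steps_needed(duration_fs, timestep_fs=1):
    """Number of integration steps required to cover a given simulated time."""
    return duration_fs // timestep_fs


if __name__ == "__main__":
    for label, duration in [("1 ns", NANOSECOND_FS),
                            ("1 us", MICROSECOND_FS),
                            ("1 ms", MILLISECOND_FS)]:
        print(f"{label}: {steps_needed(duration):,} steps")
```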
What Kinds of Theoretical Chemistry can be Done? (2) Molecular Simulation Also, a balls-and-springs model lacks the quantum mechanics needed to simulate a chemical reaction. Nonetheless, molecular dynamics is very important for understanding shape changes, interactions and energetics of large molecules.
B: Philosophy of Informatics “The problem is too difficult to solve at reasonable cost based on real physics and chemistry, so instead we will build a purely empirical model to predict the required molecular properties from chemical structure, using the available data.”
Informatics In general, informatics methods represent phenomena mathematically, but not in a physics-based way. Inputs are related to outputs by an empirically parameterised equation or a more elaborate mathematical model. Do not attempt to simulate reality. Usually High Throughput.
Informatics Bioinformatics = Informatics applied to biology (genes and proteins). Cheminformatics or chemoinformatics = informatics applied to chemistry; cheminformatics techniques are often used in drug discovery and pharmaceutical research. Medical informatics = application of informatics to medicine or medical data.
Modelling in Chemistry LOW THROUGHPUT: Theoretical Chemistry. HIGH THROUGHPUT: Informatics.
4. How Best to Compute Solubility?
Which would you Prefer...? Solubility in water (and other biological fluids) is highly desirable for pharmaceuticals!
Solubility is an important issue in drug discovery and a major cause of failure of drug development projects. This is expensive for the pharma industry, and patients suffer from a lack of available treatments. A good computational model for predicting the solubility of druglike molecules would be very valuable.
We can use theoretical chemistry to calculate solubility via a thermodynamic cycle. Crystalline → Gaseous: ΔG_sub (sublimation). Gaseous → Solution: ΔG_hyd (hydration). Crystalline → Solution: ΔG_solu (solution). Going round the cycle, ΔG_solu = ΔG_sub + ΔG_hyd.
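As a rough sketch of how the cycle turns into a number: the free energy of solution is the sum of the sublimation and hydration free energies, and a solubility follows from it. The conversion below assumes an ideal solution and free energies referenced to a 1 mol/L standard state, so that ΔG_solu = -RT ln S with S in mol/L; the slides do not state a standard-state convention. The input values are placeholders chosen for illustration, not results for any real compound.

```python
import math

R_KJ_PER_MOL_K = 8.314462618e-3  # gas constant in kJ/(mol K)


def log10_solubility(dg_sub, dg_hyd, temperature=298.15):
    """log10 of solubility (mol/L) from the two legs of the cycle, in kJ/mol.

    Assumes a 1 mol/L standard state and ideal behaviour (see lead-in).
    """
    dg_solu = dg_sub + dg_hyd  # close the thermodynamic cycle
    return -dg_solu / (R_KJ_PER_MOL_K * temperature * math.log(10))


if __name__ == "__main__":
    # Placeholder inputs: +50 kJ/mol to sublime, -30 kJ/mol on hydration.
    print(f"log10 S = {log10_solubility(50.0, -30.0):.2f}")
```

At room temperature an error of about 5.7 kJ/mol in ΔG_solu shifts the predicted solubility by a factor of ten, which is why both legs of the cycle need to be computed accurately.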
Calculate Energy of Infinite Crystal Take one molecule Solve its Schrödinger equation Calculate its interactions Allow unit cell to change Find best size, shape, packing Find energy of infinite lattice This is the same methodology as used in crystal structure prediction.
Model of Solvent-Solute Interaction Calculate energy of interaction between solute and solvent Model is called RISM
Our Methods … (B) Random Forest (informatics)
A decision tree is like a flow chart Random Forest
This is a decision tree. We use lots of them to make a forest! A Machine Learning Method
Random Forest: each tree gives its own opinion. “Looks soluble to me!” “Looks sort of soluble…” “As soluble as can be!” “I guess it’s insoluble” “This guy is soluble!” “Soluble? No way!” “I know it’s soluble”
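The "many trees, each with an opinion, then a vote" idea can be sketched in a few lines of plain Python. This is a heavily simplified stand-in for a random forest: each "tree" is a single yes/no question on one made-up descriptor, trained on a random resample of the data, whereas a real random forest grows deeper trees and also picks random subsets of descriptors. The toy data are synthetic (labelled soluble whenever the descriptor is below 3.0) and are not measurements.

```python
import random
from collections import Counter


def train_stump(samples):
    """Find the cut-off that best separates soluble (below) from insoluble."""
    best_errors, best_threshold = None, None
    for threshold, _ in samples:
        errors = sum((x < threshold) != soluble for x, soluble in samples)
        if best_errors is None or errors < best_errors:
            best_errors, best_threshold = errors, threshold
    return best_threshold


def train_forest(data, n_trees=25, seed=0):
    """Train each one-question tree on its own bootstrap resample."""
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        bootstrap = [rng.choice(data) for _ in data]
        forest.append(train_stump(bootstrap))
    return forest


def predict(forest, x):
    """Every tree votes; the majority opinion wins."""
    votes = Counter(x < threshold for threshold in forest)
    return votes.most_common(1)[0][0]


# Synthetic toy data: (descriptor value, is_soluble).
toy_data = [(i / 2, i / 2 < 3.0) for i in range(12)]
forest = train_forest(toy_data)

if __name__ == "__main__":
    for x in (1.0, 5.5):
        print(x, "soluble" if predict(forest, x) else "insoluble")
```

Because each tree sees a different resample, the individual cut-offs differ slightly, and the vote averages out their quirks.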
Fits into the drug discovery pipeline here. The whole pipeline could take 15 years and $1 billion!
Application to Proteins Funnel-shaped energy landscape
Using computers to study the world of proteins Rosanna Alderson
These two proteins only have 21% of the same sequence of amino acids in their polypeptide chain but fold into similar structures!
Are there any proteins with a similar structure to this one? Lots of proteins have a similar structure! We need to look at a deeper level, to see if we can find amino acids we know are important for a particular function.
MGSSHHHHHHENLYFQGMMFKKKMLAAT What if we don’t have the 3D structure of a protein but only know its amino acid sequence? We know what the amino acids are but we don’t know how they fold together…
Look for similar sequences. Input: MGSSHHHHHHENLYFQGMMFKKKMLAAT. Similar to input, from PDB (3D structure available): MGSSHHHHHHDNLPFQGMMFKKNMLAAT.
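How "similar" two sequences are can be put as a number, the percentage of positions carrying the same amino acid. The sketch below does this for the two sequences shown on the slide. It only handles sequences of equal length compared position by position; real sequence searches first align the sequences, allowing gaps, and the function name here is made up for illustration.

```python
def percent_identity(seq_a, seq_b):
    """Percentage of positions where two equal-length sequences agree."""
    if len(seq_a) != len(seq_b):
        raise ValueError("this sketch only compares sequences of equal length")
    matches = sum(a == b for a, b in zip(seq_a, seq_b))
    return 100.0 * matches / len(seq_a)


query = "MGSSHHHHHHENLYFQGMMFKKKMLAAT"   # input sequence from the slide
hit = "MGSSHHHHHHDNLPFQGMMFKKNMLAAT"     # similar sequence from the slide

# 1-based positions where the two sequences differ.
mismatches = [(i + 1, a, b)
              for i, (a, b) in enumerate(zip(query, hit)) if a != b]

if __name__ == "__main__":
    print(f"identity: {percent_identity(query, hit):.1f}%")
    for position, a, b in mismatches:
        print(f"position {position}: {a} -> {b}")
```

These two sequences differ at only three of 28 positions, far more alike than the pair mentioned earlier that share 21% of their sequence yet still fold into similar structures.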