A stochastic approach to Molecular Replacement. Nicholas M. Glykos & Michael Kokkinidis IMBB, FORTH, Heraklion, Crete, GREECE.

“Why ? What’s wrong with AMoRe ?” “Interesting. Can we now go back to the AMoRe.log file ?”

stochastic adj. 1. determined by a random distribution of probabilities. 2. (of a process) characterized by a sequence of random variables. 3. governed by the laws of probability. Etymology : Gk stokhastikos, f. stokhazomai aim at, guess, f. stokhos aim.

crystal2 ~
crystal2 ~ file Stochastic.ppt
Stochastic.ppt : c program text with garbage
crystal2 ~

“The classical approach to the problem of placing n copies of a search model in the asymmetric unit of a target crystal structure, is to divide this 6n-dimensional optimisation problem into a succession of three-dimensional searches.” Acta Cryst. (2000), D56,

The method(s) : I. Treat all translational & orientational parameters of all molecules as variables whose values are to be simultaneously and independently determined.

The method(s) : II. Assume that the correct solution corresponds to the (pronounced) global minimum of a suitable (?) statistic (like the R-factor, or the linear correlation coefficient between Fo’s and Fc’s, or Fo² and Fc², or …).
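The two statistics named on this slide can be sketched as follows. This is a minimal illustration, not the program's own code: the function names are hypothetical, and the single least-squares scale factor used inside `r_factor` is an assumption made only to keep the example self-contained.

```python
import math

def r_factor(f_obs, f_calc):
    """R = sum|Fo - k*Fc| / sum(Fo), with k a least-squares scale (assumed here)."""
    k = sum(o * c for o, c in zip(f_obs, f_calc)) / sum(c * c for c in f_calc)
    return sum(abs(o - k * c) for o, c in zip(f_obs, f_calc)) / sum(f_obs)

def correlation(x, y):
    """Linear (Pearson) correlation coefficient between two equal-length lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    s_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    s_xx = sum((a - mean_x) ** 2 for a in x)
    s_yy = sum((b - mean_y) ** 2 for b in y)
    return s_xy / math.sqrt(s_xx * s_yy)
```

Since the search looks for a minimum, a correlation-based target would be used as, for example, `1.0 - correlation(f_obs, f_calc)`; for the Fo² and Fc² variant the squared amplitudes are passed instead.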

The method(s) : III. Use simulated annealing (in the form of a modified reverse Monte Carlo method) to explore the 6n-dimensional parameter space. Other published optimisation techniques include : a genetic algorithm approach (Chang & Lewis, 1997), an evolutionary search methodology (Kissinger et al., 1999) and a systematic 6D search (Sheriff et al., 1999).

The program : Name : “Queen of Spades” Availability : absolutely free, no warranties whatsoever. The distribution includes source code plus pre-compiled executables for Irix, OSF, Linux, Solaris, VMS & windoze. Download the latest version via α Current stable version : α, Release 0.9.

The reverse Monte Carlo method: 1. Assign random initial positions & orientations to all molecules present in the asymmetric unit of the target crystal structure. Calculate Fc’s from this arrangement. 2. Calculate the R-factor between the Fo’s and the Fc’s. Call this R_old.

The reverse Monte Carlo method: 3. Randomly choose and alter the orientation and position of one of the molecules. Calculate the R-factor resulting from the new arrangement (R_new). 4. If R_new < R_old, the new arrangement is accepted and we start again from (3). 5. If the new R-factor is worse, we still accept the move with probability exp[ –(R_new – R_old) / T ]; otherwise the old arrangement is kept. In either case we return to (3).
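Steps 1 to 5 amount to a Metropolis loop, sketched below under stated assumptions: `score`, `propose` and the state representation are hypothetical placeholders supplied by the caller (in the real problem the state would hold the 6n parameters and `score` would return the R-factor), and the temperature is held constant for simplicity.

```python
import math
import random

def metropolis_accept(r_old, r_new, temperature, rng=random):
    """Accept improvements always; accept worse moves with exp[-(R_new - R_old)/T]."""
    if r_new < r_old:
        return True
    return rng.random() < math.exp(-(r_new - r_old) / temperature)

def reverse_monte_carlo(state, score, propose, temperature, n_steps, rng=random):
    """Run n_steps trial moves at a fixed, positive temperature."""
    r_old = score(state)                      # steps 1-2: score the starting arrangement
    for _ in range(n_steps):
        candidate = propose(state, rng)       # step 3: alter one molecule (caller-defined)
        r_new = score(candidate)
        if metropolis_accept(r_old, r_new, temperature, rng):
            state, r_old = candidate, r_new   # steps 4-5: keep the accepted arrangement
    return state, r_old
```

A toy usage, with a one-dimensional "arrangement" and a quadratic score standing in for the R-factor: `reverse_monte_carlo(0.0, lambda x: (x - 3.0) ** 2, lambda x, g: x + g.uniform(-0.5, 0.5), 0.001, 5000, random.Random(0))`.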

Speeding it up : Avoid FFTs : calculate and store (in core) the molecular transform of the search model. Keep a table containing the contribution of each molecule to each reflection. CPU time per step ~ Number of reflections in P1.
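The bookkeeping behind the per-molecule table can be sketched as below. The class and method names are hypothetical, and the new contributions of a moved molecule are simply passed in; in the program they would presumably be derived from the stored molecular transform, which is not shown here.

```python
class ContributionTable:
    """Per-molecule structure-factor contributions, one complex number per reflection."""

    def __init__(self, contributions):
        # contributions[m][h] is the contribution of molecule m to reflection h.
        self.contrib = [list(row) for row in contributions]
        n_reflections = len(self.contrib[0])
        self.total = [sum(row[h] for row in self.contrib) for h in range(n_reflections)]

    def move(self, molecule, new_row):
        """Replace one molecule's contributions; cost is one pass over the reflections."""
        for h, new_value in enumerate(new_row):
            self.total[h] += new_value - self.contrib[molecule][h]
            self.contrib[molecule][h] = new_value

    def amplitudes(self):
        """Current |Fc| for every reflection."""
        return [abs(f) for f in self.total]
```

Because only the moved molecule's row changes, the work per step scales with the number of reflections rather than with the number of molecules.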

Annealing schedules : Constant temperature run. Linear temperature gradient (slow cooling). “Heating bath” mode.
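The first two schedules can be written as simple generators, as in this sketch. The function names and arguments are hypothetical, and the “heating bath” mode is left out because the slide does not describe how it works.

```python
def constant_temperature(t_start, n_steps):
    """Yield the same temperature for every step."""
    for _ in range(n_steps):
        yield t_start

def linear_cooling(t_start, t_end, n_steps):
    """Yield temperatures falling linearly from t_start to t_end over n_steps."""
    for step in range(n_steps):
        fraction = step / (n_steps - 1) if n_steps > 1 else 0.0
        yield t_start + (t_end - t_start) * fraction
```

Note that a Metropolis acceptance test divides by T, so a cooling run would normally stop at a small positive `t_end` rather than at zero.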

Move size control : Constant move size : max(Δt) = d_min / max(a,b,c), max(Δκ) = d_min (in degrees). Move size linearly dependent on current R-factor and time step : max(Δt) = 0.5 R (1.0 – t/t_total), max(Δκ) = π R (1.0 – t/t_total).
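Both move-size rules are direct to compute, as in the sketch below. The function names are hypothetical. Two readings are assumptions: max(Δt) is taken to be in fractional coordinates (d_min divided by a cell edge is dimensionless), and the π in the second rule is taken to mean that max(Δκ) is in radians there, whereas the slide states degrees for the constant rule.

```python
import math

def constant_move_size(d_min, a, b, c):
    """Constant limits: translation d_min/max(a,b,c), rotation d_min (degrees)."""
    max_dt = d_min / max(a, b, c)
    max_dkappa = d_min
    return max_dt, max_dkappa

def adaptive_move_size(r_factor, step, total_steps):
    """Limits that shrink linearly with the current R-factor and with elapsed steps."""
    remaining = 1.0 - step / total_steps
    max_dt = 0.5 * r_factor * remaining
    max_dkappa = math.pi * r_factor * remaining  # radians assumed, see lead-in
    return max_dt, max_dkappa
```

For example, with 4Å data and a longest cell edge of 80Å the constant rule gives a maximum shift of 0.05 and a maximum rotation of 4 degrees.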

Scaling : To B or not to B ? The default is to scale |Fc|’s to |Fo|’s using both a scale and a temperature factor, but … 0.32±0.02 23±5
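One common way to obtain a scale and a temperature factor together is a straight-line fit to ln(|Fo|/|Fc|) = ln k – B/(4d²). The sketch below uses that approach purely as an illustration; the slide does not say how the program does it, and the function name and arguments are hypothetical.

```python
import math

def fit_scale_and_b(f_obs, f_calc, d_spacings):
    """Least-squares line through ln(Fo/Fc) versus 1/(4 d^2); returns (k, B)."""
    xs = [1.0 / (4.0 * d * d) for d in d_spacings]
    ys = [math.log(o / c) for o, c in zip(f_obs, f_calc)]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    b_factor = -slope                          # slope of the line is -B
    scale = math.exp(mean_y - slope * mean_x)  # intercept of the line is ln k
    return scale, b_factor
```

Scaled amplitudes would then be `k * exp(-B / (4 d^2)) * |Fc|`. Dropping the B term corresponds to fitting the scale alone.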

Bulk solvent correction : The absence of a bulk-solvent correction from this type of calculation is a serious problem : it introduces systematic errors in the low-resolution terms (up to 5Å) and makes a low-resolution cutoff necessary. The exponential scaling model algorithm allows a computationally efficient and model-independent correction to be applied : F_corrected = F_p { 1.0 – k_sol exp[ –B_sol / d² ] }
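The correction formula on this slide translates into a one-line scale factor per reflection. The sketch below follows the slide's expression literally; the function names are hypothetical and no particular values of k_sol or B_sol are implied.

```python
import math

def bulk_solvent_scale(d, k_sol, b_sol):
    """Factor 1 - k_sol * exp(-B_sol / d^2) for a reflection at resolution d (in Å)."""
    return 1.0 - k_sol * math.exp(-b_sol / (d * d))

def correct_amplitude(f_p, d, k_sol, b_sol):
    """F_corrected = F_p * {1 - k_sol * exp(-B_sol / d^2)}."""
    return f_p * bulk_solvent_scale(d, k_sol, b_sol)
```

The factor tends to 1 – k_sol at very low resolution (large d) and to 1 at high resolution, so only the low-resolution terms are damped appreciably.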

Bulk solvent correction ? Acta Cryst. (2000), D56,

Using the program : Input : a .pdb file, and a formatted file containing h,k,l,F,σ(F). Running the program : $ Qs -auto 1, or, $ Qs -auto 2, etc. (no scripts), or, $ Qs. Output : .pdb files containing the final coordinates for each model, plus a packing diagram for each solution.

Examples : A 5D problem. One molecule of lysozyme per a.u. Monoclinic space group (C2), 4Å data. rms deviation of model 1.4Å. Up to ±20% noise added to error-free data. About 90 seconds of CPU time per minimisation.

Examples : A 6D problem (1). Target structure 1bvx, search model 2lz2 (rms deviation 1.3Å). One molecule of lysozyme per a.u. Tetragonal space group (P ). Real 15-4Å data. About 3.8 hours of CPU time per minimisation.

Examples : A 6D problem (2). Target structure 1b6q. 30% solvent. Search model : incomplete poly-Ala. One monomer of Rop per a.u. Orthorhombic space group (C222₁). Real 15-4Å data. About 40 minutes of CPU time per run.

Examples : An 11D problem. Target structure 1lys, model 2ihl (rmsd 1.52 & 1.56Å). Two molecules of lysozyme per asymmetric unit. Monoclinic space group (P2₁), 4Å data. ±20% noise added to error-free data. Solutions appear after ~3.8 hours of CPU time.
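The “5D” and “11D” labels in these examples are consistent with the usual counting rule: 6 parameters per molecule, minus one translational parameter for each axis along which the origin is not fixed by symmetry (one such axis in the polar space groups C2 and P2₁). The sketch below only restates that arithmetic; the function name is hypothetical.

```python
def search_dimensions(n_molecules, free_origin_axes=0):
    """Number of free parameters: 6 per molecule minus the floating-origin axes."""
    return 6 * n_molecules - free_origin_axes

# One molecule in C2 (origin free along the unique axis): 5 parameters.
c2_one_molecule = search_dimensions(1, free_origin_axes=1)
# Two molecules in P2_1 (same situation): 11 parameters.
p21_two_molecules = search_dimensions(2, free_origin_axes=1)
```

In the tetragonal and orthorhombic examples the origin is fixed in all three directions, which gives the full 6 parameters for one molecule.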

Disadvantages : In most cases, treating the problem as 6n-dimensional is a waste of CPU time. You can only have one search model (i.e. you cannot search simultaneously with your DNA & protein models). The structure of the search model is kept fixed throughout the calculation.

Disadvantages : The (putative) evidence from the self-rotation function and/or the native Patterson function is ignored (but, in a way, for n > 1 it is also ignored by the traditional methods). When the starting model deviates significantly from the target structure, (i) there is no guarantee that the global minimum of any chosen statistic will correspond to the correct solution, (ii) traditional methods may be more sensitive in identifying the correct solution.

Advantages : If there are just one or two molecules per asymmetric unit and CPU time is not a problem, the method can be used as a last-ditch effort to conclusively show that there is no such thing as a pronounced global minimum (or otherwise ?). The automatic (black box) mode is really black: no keywords, no scripts, just a .pdb file containing the model and an ASCII file containing h,k,l,F,σ(F).

Advantages : The computational procedures differ so much from those used in conventional methods, that the results obtained can be considered as independent. The method is honest in the sense that it is rather unlikely to find a wrong solution which will give a simultaneous sudden drop of both the R and Rfree leading to a solution with a reasonable packing arrangement.

A word of caution … [Table: Res, R, Corr; values missing from the transcript]

Conclusion : Substituting computing for thinking will probably fail for n ≥ 3.