Parallel Fine Sampling to Solve Large or Difficult Structures

Presentation transcript:

Parallel Fine Sampling to Solve Large or Difficult Structures

Manually exploring a large parameter space to find the right combination of parameters is time-consuming and frustrating. It often results in giving up on an otherwise solvable structure.
Parallel exploration of parameter space is an effective approach to solving challenging structures efficiently and reliably:
–Systematically explore parameter space
–Speed up with parallel execution on a PC cluster
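As a rough illustration of the idea above, the sketch below maps a stand-in job function over every point of a small parameter grid in parallel and ranks the results. The function names (`run_job`, `explore`), the toy score, and the grid values are all assumptions made for illustration. A real job would launch the external program and parse its output instead.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import product


def run_job(params):
    """Stand-in for one substructure-search job (hypothetical toy score)."""
    e_cutoff, n_sites, res_cutoff = params
    # Toy score that peaks at one arbitrary grid point; a real job would
    # run the external program here and parse a figure of merit instead.
    score = -abs(e_cutoff - 1.5) - abs(n_sites - 80) / 100 - abs(res_cutoff - 3.0)
    return params, score


def explore(grid, max_workers=8):
    """Run every parameter combination in parallel and rank the results."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = list(pool.map(run_job, grid))
    # Best score first.
    return sorted(results, key=lambda item: item[1], reverse=True)


# Illustrative values only, not the ranges used in the talk.
grid = list(product([1.3, 1.5, 1.7], [60, 80, 100], [2.8, 3.0, 3.2]))
ranked = explore(grid)
best_params, best_score = ranked[0]
print(best_params, best_score)
```

Threads are used here only because each real job would spend its time waiting on an external process. On a cluster, the same loop would submit jobs to the queueing system instead.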

Structures Solved by Fine Grid Search

Target      Mol/ASU   Sites/Mol   Sites   Space Group   Resolution (Å)
MB3864A     4         6           24      P
PE000293D   6         9           54      H3            2.15
PD06751F    6         14          84      P
TB1547G     8         12          96      P
PC06751C    6         20          120     P
FJ5490C     12        6           72      P1            2.00
FH7599A*                                  C2            2.00

*work in progress

PD06751F

454 aa/15 Met, 1.9 Å, P, hexamer in asu
Space group choices narrowed down by systematic absences
1080 SHELXD jobs (200 trials each), parameters explored:
–E value cutoff (step 0.1)
–Number of sites (40-120, step 10)
–Resolution cutoff (step 0.1)
3.5 hrs to finish all 1080 jobs on SDC cluster (220 CPUs)
Of the 1080 jobs, 39% found correct heavy atom solutions
First correct solution appeared within minutes; 84/84 sites found
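The three-parameter grid described above can be built as a Cartesian product. The sketch below is a minimal example: the site counts (40 to 120 in steps of 10) come from the slide, while the E-value and resolution ranges are placeholders, since only their 0.1 step size is available here. The helper `frange` and the dictionary keys are hypothetical names.

```python
from itertools import product


def frange(start, stop, step, ndigits=1):
    """Inclusive range of evenly spaced values, rounded to avoid float drift."""
    count = int(round((stop - start) / step)) + 1
    return [round(start + i * step, ndigits) for i in range(count)]


# Number of sites: 40 to 120 in steps of 10, as on the slide.
site_values = list(range(40, 121, 10))

# PLACEHOLDER ranges: the actual E-value and resolution ranges are not
# known here; only the 0.1 step size is taken from the slide.
e_values = frange(1.3, 1.7, 0.1)
res_values = frange(2.5, 3.2, 0.1)

# One dictionary per job, covering every combination of the three parameters.
grid = [
    {"e_cutoff": e, "n_sites": n, "res_cutoff": r}
    for e, n, r in product(e_values, site_values, res_values)
]
print(len(site_values), len(grid))
```

With 9 site counts, a total of 1080 jobs corresponds to 120 combinations of the other two parameters (1080 / 9 = 120). The placeholder ranges here give a smaller grid.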

PE00293D

285 aa/11 Met, H3, 2.15 Å, hexamer/asu, 2-wavelength MAD, PDB id: 2p
760 SHELXD jobs (200 trials each), parameters explored:
–E value cutoff (step 0.1)
–Number of sites (20-90, step 10)
–Resolution cutoff (step 0.1)
1 hr to finish all 760 jobs on SDC cluster (220 CPUs)
Solutions are rare: only 12 of the 760 jobs (1.5%) found correct heavy atom solutions; 53/54 sites found
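Even when solutions are this rare, a wide parallel search is likely to hit one early. The worked calculation below is only a sketch under two assumptions not stated on the slide: that jobs are started in random order, and that the first wave consists of 220 jobs (one per CPU). It computes the hypergeometric probability that at least one of the 12 successful jobs is in that first wave. The function name is hypothetical.

```python
from math import comb


def prob_hit_in_first_wave(total_jobs, good_jobs, wave_size):
    """Chance that a random wave of jobs contains at least one good job."""
    wave_size = min(wave_size, total_jobs)
    # Hypergeometric: P(no good job in the wave)
    #   = C(total - good, wave) / C(total, wave)
    p_none = comb(total_jobs - good_jobs, wave_size) / comb(total_jobs, wave_size)
    return 1.0 - p_none


# Numbers from the slide: 760 jobs, 12 of them successful, 220 CPUs.
p = prob_hit_in_first_wave(total_jobs=760, good_jobs=12, wave_size=220)
print(f"{p:.3f}")
```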

TB1547G

409 aa (13 Met)/monomer, P, 2 tetramers per asu
Initially labeled as something else (TB5131A, 179 aa/2 Met)
Treated as an unknown target
POINTLESS and XPREP to narrow down space group choices, XPREP to generate FA values
SHELXD grid search:
–Sites in steps of 10
–Resolution cutoff in steps of 0.1
–E value cutoff in fixed steps
Parallel SHELXD jobs, each attempting 200 trials
Job order is randomized so that the search space is sampled uniformly from the start
Solutions usually appear within minutes, so jobs can be terminated early if necessary
Each SHELXD job needs ~1 hr; all jobs finish in ~2 hrs on SDC cluster (220 CPUs)
Interpretation of the density map gave the correct identification of the target
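The randomized job order and early termination described above might look like the following sketch. Everything here is illustrative: `run_job` uses a made-up success rule so the example runs without external programs, and `search_until_solved`, the parameter names, and the grid values are assumptions.

```python
import random
from concurrent.futures import ThreadPoolExecutor, as_completed


def run_job(params):
    """Stand-in for one search job; returns (params, solved)."""
    # Hypothetical success rule so the sketch runs on its own.
    solved = params["n_sites"] == 100 and params["res_cutoff"] == 3.0
    return params, solved


def search_until_solved(grid, max_workers=4, seed=0):
    """Run jobs in random order and cancel pending jobs once one succeeds."""
    jobs = list(grid)
    # Shuffle so early jobs sample the whole grid, not just one corner.
    random.Random(seed).shuffle(jobs)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(run_job, p) for p in jobs]
        for future in as_completed(futures):
            params, solved = future.result()
            if solved:
                for other in futures:
                    # Only jobs that have not started yet are cancelled.
                    other.cancel()
                return params
    return None


# Illustrative grid only, not the ranges used in the talk.
grid = [
    {"n_sites": n, "res_cutoff": r}
    for n in range(60, 131, 10)
    for r in (2.8, 2.9, 3.0, 3.1)
]
hit = search_until_solved(grid)
print(hit)
```

Jobs that are already running are left to finish in this sketch. On a cluster, early termination would instead remove queued and running jobs through the scheduler.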

FH7599A: MR+MAD

Estimated monomers per asu, heavy atom sites
No highly homologous (>20% seq id) MR models
FFAS or PSI-BLAST identified a remote sequence homolog, TM0064 (14% seq id)
TM0064 trimer poly-alanine used as MR model; using the trimer as MR template significantly improved signal to noise in the MR procedure
Density modification is critical for improving MR phases
Improved DM phases + MAD data used to locate ~200 heavy atom sites and for MAD phasing
rmsd 2.42 Å for 82% of Cα atoms, FH7599A vs TM0064
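For reference, the rmsd figure quoted above is a root-mean-square deviation over matched Cα positions. The sketch below shows that calculation only, assuming the two structures have already been superposed and their atoms paired; neither of those steps is shown. The function name and the toy coordinates are hypothetical.

```python
from math import sqrt


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between paired, already superposed points."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    total = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        # Squared distance between one matched pair of atoms.
        total += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return sqrt(total / len(coords_a))


# Toy coordinates in Å, purely for illustration.
model = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
target = [(0.0, 0.0, 2.0), (1.0, 0.0, 0.0)]
print(rmsd(model, target))
```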