Atomistic Protein Folding Simulations on the Submillisecond Timescale Using Worldwide Distributed Computing
Qing Lu, CMSC 838 Presentation

CMSC 838T – Presentation

Overview
- Overview of talk
  - Motivation
  - Challenge
  - Methods
    - Ensemble Dynamics
    - […]
  - Evaluation
  - Observations

Motivation
- Atomistic simulation of protein folding
  - understand the dynamics of folding
  - real-time folding in full atomic detail
  - large-scale parallelization methods
- Benefits
  - protein folding & disease
    - proteins self-assemble to function
    - proteins misfold, leading to diseases
  - nanotechnology
    - nanomachines
    - self-assembly on the nanoscale

Challenge
- Difficulties
  - limited by current computational techniques
    - fastest proteins fold in microseconds
    - one CPU: 1 ns/day, 30 years
    - 10,000-fold computational gap
      - 1,000 CPUs, 1 microsecond/day
  - traditional parallelization scheme
    - hard to scale to a large number of processors
    - requires extremely fast communication
    - complexity of coordination
    - expensive supercomputers
      - cost
      - time-sharing
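The figures on this slide can be checked with a little arithmetic. The sketch below is only an illustration: the 10 microsecond folding target is an assumption, chosen because it is consistent with "fastest folding in microseconds" and with the 10,000-fold gap, and all function and constant names are made up for this example.

```python
# Back-of-the-envelope check of the slide's numbers.
NS_PER_DAY_ONE_CPU = 1.0   # slide: one CPU simulates about 1 ns per day
YEARS_ONE_CPU = 30         # slide: 30 years on one CPU
DAYS_PER_YEAR = 365
TARGET_US = 10.0           # ASSUMPTION: folding time of a fast protein, in microseconds


def simulated_microseconds(cpu_days, ns_per_day=NS_PER_DAY_ONE_CPU):
    """Total simulated time (in microseconds) accumulated over cpu_days."""
    return cpu_days * ns_per_day / 1000.0


def aggregate_us_per_day(n_cpus, ns_per_day=NS_PER_DAY_ONE_CPU):
    """Aggregate simulated time per day if n_cpus could be used perfectly in parallel."""
    return n_cpus * ns_per_day / 1000.0


# How far does a single CPU get in 30 years?
single_cpu_reach_us = simulated_microseconds(YEARS_ONE_CPU * DAYS_PER_YEAR)

# Ratio between the assumed target and one day of single-CPU simulation.
computational_gap = TARGET_US * 1000.0 / NS_PER_DAY_ONE_CPU

print(f"one CPU, {YEARS_ONE_CPU} years: {single_cpu_reach_us:.2f} microseconds")
print(f"gap for a {TARGET_US:.0f} microsecond target: {computational_gap:.0f}x")
print(f"1,000 CPUs: {aggregate_us_per_day(1000):.1f} microsecond/day")
```

Under these assumptions, 30 years on one CPU covers roughly the assumed 10 microsecond target, and 1,000 perfectly parallel CPUs would give the slide's 1 microsecond per day.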

Method
- ensemble dynamics
  - a new simulation algorithm
  - parallel simulation
- […]
  - heterogeneous network, Internet
  - large-scale distributed platform

Simulation of Dynamics
- free energy barrier
  - progress from one state to another: a transition
  - thermal fluctuations push the system over the free energy barrier
- previous approaches: sampling
  - may get stuck in metastable free energy minima
  - sampling is computationally expensive

Ensemble Dynamics
- application scenario
  - waiting time for transitions dominates total time
  - protein folding
    - transition: free energy barrier crossing
  - coupled simulations: transition coupling
- Algorithm
  - start M independent simulations from an initial condition
  - wait for the first simulation to cross the free energy barrier
    - this takes M times less time than the average crossing time of a single simulation
  - restart M simulations from the new location after the transition
- Near-linear speedup in the number of processors
  - exponential kinetics: f(t) = 1 – exp(-k t)
  - if k t is small, f(t) ≈ k t
  - M simulations give M f(t) ≈ M k t folding events
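The speedup claim can be illustrated numerically. The following is a minimal sketch, not the presenter's code: each simulation's barrier-crossing time is drawn from an exponential distribution with rate k (the slide's exponential-kinetics assumption), and the function names, seed, and trial counts are arbitrary choices for this example.

```python
import math
import random


def first_crossing_time(m, k, rng):
    """Time until the first of m independent simulations crosses the barrier.

    Each simulation's crossing time is exponential with rate k.
    """
    return min(rng.expovariate(k) for _ in range(m))


def mean_first_crossing(m, k, trials, seed=0):
    """Monte Carlo estimate of the mean first-crossing time for m simulations."""
    rng = random.Random(seed)
    return sum(first_crossing_time(m, k, rng) for _ in range(trials)) / trials


def fraction_folded(k, t):
    """Exponential kinetics from the slide: f(t) = 1 - exp(-k t)."""
    return 1.0 - math.exp(-k * t)


if __name__ == "__main__":
    k = 1.0  # crossing rate in arbitrary units
    for m in (1, 10, 100):
        estimate = mean_first_crossing(m, k, trials=5000)
        print(f"M={m:4d}  mean first crossing ~ {estimate:.4f}  (1/(M k) = {1 / (m * k):.4f})")
    # For small k t the folded fraction is close to k t.
    print(fraction_folded(0.001, 1.0), "vs", 0.001 * 1.0)
```

The minimum of M independent exponential waiting times with rate k is itself exponential with rate M k, which is where the factor of M comes from. If the kinetics are not exponential, this argument does not hold as stated.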

Limitations
- barrier crossing probability
  - relies on the assumption of exponential kinetics
- correct transition detection
  - transition: free energy barrier crossing
  - detected as a large variance in energy exceeding a threshold
  - correct detection is not guaranteed
- multiple possible transitions
  - not addressed
  - selection of the first transition
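To make the detection limitation concrete, here is a hypothetical sketch of a variance-threshold detector. The slide only says that a large variance in energy is compared against a threshold; the sliding window, the function name `detect_transition`, and the sample traces are assumptions made for illustration.

```python
from statistics import pvariance


def detect_transition(energies, window, threshold):
    """Return the index of the last frame of the first window whose
    energy variance exceeds threshold, or None if no window does."""
    for end in range(window, len(energies) + 1):
        if pvariance(energies[end - window:end]) > threshold:
            return end - 1
    return None


# A trace that jumps from one energy level to another.
step_trace = [0.0] * 10 + [5.0] * 10
# A trace that only fluctuates and never leaves its basin.
noisy_trace = [0.0, 1.0, 0.0, 1.0, 0.0, 1.0]

print(detect_transition(step_trace, window=4, threshold=1.0))
print(detect_transition(noisy_trace, window=4, threshold=1.0))
# With a threshold that is too low, ordinary fluctuations are flagged as a transition.
print(detect_transition(noisy_trace, window=4, threshold=0.2))
```

The last call shows why correct detection is not guaranteed: the outcome depends on the threshold, and a poorly chosen value reports a transition where there is only noise.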

Distributed Computing
- Distributed simulations
  - M processors for each run
  - simulate folding in atomic detail on each processor
  - restart once a barrier-crossing event occurs
- Implementation:
  - worldwide distributed computing: Internet
  - started in October 2000
    - more than 200,000 participants
    - 10,000 CPU-years in the first 12 months

- client-server architecture
  - server assigns jobs (work units) to clients
  - client sends back results after computation
  - ~100K of data transferred between client and server
- why is ensemble dynamics good for […]
  - CPU-intensive jobs: a few hours, often days
  - connection speed: a modem is good enough
  - suitable for […]
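The assign/compute/return cycle described on this slide can be sketched in a few lines. This is an in-process toy model, not the project's actual protocol: the classes `WorkUnit`, `Result`, and `Server`, the function `run_client`, and the use of a single float as a stand-in for a molecular configuration are all hypothetical.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class WorkUnit:
    unit_id: int
    start_state: float  # stand-in for a molecular configuration


@dataclass
class Result:
    unit_id: int
    end_state: float


class Server:
    """Hands out work units and stores the results clients send back."""

    def __init__(self, start_states):
        self.pending = deque(WorkUnit(i, s) for i, s in enumerate(start_states))
        self.results = {}

    def assign(self):
        """Return the next work unit, or None when nothing is left."""
        return self.pending.popleft() if self.pending else None

    def collect(self, result):
        self.results[result.unit_id] = result.end_state


def run_client(server, compute):
    """Fetch work units until the server has none left; return how many were done."""
    done = 0
    while (unit := server.assign()) is not None:
        # The long, CPU-bound part happens here, with no network traffic.
        server.collect(Result(unit.unit_id, compute(unit.start_state)))
        done += 1
    return done


server = Server([1.0, 2.0, 3.0])
completed = run_client(server, compute=lambda state: state * 2)
print(completed, server.results)
```

The client talks to the server only twice per work unit, once to fetch it and once to return the result, which is why a slow connection is tolerable when each unit takes hours or days to compute.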

Work
- […]
  - search for intelligent life outside Earth
  - data analysis of signals
- […]
  - find drug therapies for HIV
  - how drugs interact with various HIV mutations
- distributed projects
  - divide and conquer
  - CPU-intensive jobs
  - small pieces of data (kilobytes) transferred
  - communication not a major concern

Evaluation
- […]
  - based on the Tinker molecular dynamics code
  - volunteer participants worldwide, over 400,000 CPUs
- simulate folding and unfolding
  - folding rates
  - simulations on small proteins

Folding Rates

Folding & Unfolding

Observations
- Sampling
  - too expensive to run for long timescales
  - wastes too much time lingering in local energy minima
- Ensemble dynamics
  - speeds up simulations of dynamics
  - biological meaning of simulation results?
  - results on folding of large proteins?
  - limitations: correct transition detection, transition probability
- […]
  - cheap way to achieve supercomputing power
  - huge distributed computing platform: over 400,000 CPUs
  - an efficient approach for CPU-intensive jobs
- Complexity of problems and size of data increase rapidly
  - finding better algorithms is preferable to buying supercomputers