

High-Throughput Virtual Molecular Docking: Hadoop Implementation of AutoDock4 on a Private Cloud

Sally R. Ellingson, Graduate Research Assistant
Center for Molecular Biophysics, UT/ORNL
Department of Genome Science and Technology, UT
Scalable Computing and Leading Edge Innovative Technologies (IGERT)

Dr. Jerome Baudry, PhD Advisor
Center for Molecular Biophysics, UT/ORNL
Department of BCMB, UT

The Second International Emerging Computational Methods for the Life Sciences Workshop
ACM International Symposium on High Performance Distributed Computing
June 8, 2011, San Jose, CA

Ultimate Goal: Reduce the time and cost of discovering novel drugs

1. Virtual Molecular Docking
   a) Novel Drug Discovery
   b) Virtual high-throughput screenings (VHTS)
2. Cloud Computing
   a) Advantages for VHTS
   b) Kandinsky
   c) Hadoop (MapReduce)
3. AutoDockCloud
   a) Current Implementation
   b) Future Implementations

Virtual Molecular Docking
Given a receptor (protein) and a ligand (small molecule), predict:
1. Bound conformations: a search algorithm explores conformational space.
2. Binding affinity: a force field evaluates the energetics.

Novel Drug Discovery
[Figure labels: human HDAC4 HA3 crystal structure; ZINC]

Virtual High-Throughput Screening (VHTS)

VHTS with AutoDock4

Potential Advantages of Cloud Computing for VHTS
- Affordable access to compute resources (especially for small labs and classrooms).
- Easy-to-use interface, accessible through the web, for non-computer experts.
- Software maintained by experts.
- Resources that scale with the size of the screening.

Kandinsky: Private Cloud Platform at ORNL
Kandinsky, the Systems Biology Knowledgebase Computer, sponsored by the Office of Biological and Environmental Research in the DOE Office of Science.
- 68 nodes × 16 cores/node = 1088 cores
- 20 Gbps InfiniBand interconnect
- Designed to support Hadoop applications and to gain an understanding of the MapReduce paradigm
- 57 nodes for MapReduce tasks
- 1 tasktracker per node
- 10 map and 6 reduce tasks per node (16 tasks per node)
- 570 map tasks and 342 reduce tasks can run simultaneously on Kandinsky
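
The capacity figures on this slide follow from the node and slot counts. As a quick check, a minimal sketch in Python (the constant names are illustrative and do not come from the talk):

```python
# Figures taken from the slide; the names are illustrative only.
TOTAL_NODES = 68
CORES_PER_NODE = 16
MAPREDUCE_NODES = 57          # nodes reserved for MapReduce tasks
MAP_SLOTS_PER_NODE = 10
REDUCE_SLOTS_PER_NODE = 6

total_cores = TOTAL_NODES * CORES_PER_NODE                  # 68 * 16
map_slots = MAPREDUCE_NODES * MAP_SLOTS_PER_NODE            # 57 * 10
reduce_slots = MAPREDUCE_NODES * REDUCE_SLOTS_PER_NODE      # 57 * 6

print(total_cores, map_slots, reduce_slots)  # prints: 1088 570 342
```

The 16 task slots per node match the 16 cores per node, so each running task can have a core to itself.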

Hadoop
- Scalable
- Economical
- Efficient
- Reliable

MapReduce programming paradigm used by Hadoop
[Figure source: people.apache.org]

Current AutoDockCloud Implementation

input = file names needed for each docking
map(input) {
    copy input to local working directory;
    run AutoDock4 locally;
    copy result file to HDFS;
}

* Pre-docking set-up and post-docking analysis are currently done manually.
* No reduce function is currently used.
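
The map-only task above could be sketched in Python roughly as follows. This is an illustration under stated assumptions, not the AutoDockCloud source: the function name `map_docking`, its parameters, and the example paths are hypothetical, and the talk does not say which language or Hadoop API was used. The sketch relies only on the standard `hadoop fs -get` / `-put` shell commands and the `autodock4 -p <parameter file> -l <log file>` invocation.

```python
import subprocess
import tempfile
from pathlib import Path


def map_docking(input_files, dpf_name, hdfs_output_dir, workdir, run=subprocess.run):
    """One map task: fetch inputs, dock locally, store the result in HDFS.

    All names here are hypothetical; `run` is injectable so the command
    sequence can be inspected without a Hadoop or AutoDock installation.
    """
    work = Path(workdir)
    work.mkdir(parents=True, exist_ok=True)

    # 1. Copy each input file from HDFS to the local working directory.
    for hdfs_path in input_files:
        run(["hadoop", "fs", "-get", hdfs_path, str(work)], check=True)

    # 2. Run AutoDock4 locally: -p is the docking parameter file, -l the log.
    dlg_name = Path(dpf_name).stem + ".dlg"
    run(["autodock4", "-p", dpf_name, "-l", dlg_name], cwd=str(work), check=True)

    # 3. Copy the result file back to HDFS.
    run(["hadoop", "fs", "-put", str(work / dlg_name), hdfs_output_dir], check=True)
    return dlg_name


# Dry run: record the commands instead of executing them.
recorded = []
result_name = map_docking(
    ["/screen/in/lig.pdbqt", "/screen/in/lig.dpf"],   # made-up HDFS paths
    "lig.dpf",
    "/screen/out",
    tempfile.mkdtemp(),
    run=lambda cmd, **kwargs: recorded.append(cmd),
)
for cmd in recorded:
    print(" ".join(cmd))
```

Because there is no reduce step, each docking is independent and the job parallelizes across all available map slots.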

Current AutoDockCloud Implementation
- Benchmark: ER agonist screening from DUD.
- 450-fold speed-up with 570 available map slots on Kandinsky, the private cloud at ORNL.
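
For a rough sense of how well the map slots were used, the reported speed-up can be divided by the slot count. The slide does not state the baseline, so this sketch assumes the speed-up is measured against a single serial run:

```python
speedup = 450      # speed-up reported on the slide
map_slots = 570    # map slots available on Kandinsky

# Parallel efficiency: fraction of ideal (linear) speed-up achieved,
# assuming the baseline is one serial docking stream.
efficiency = speedup / map_slots
print(f"parallel efficiency: {efficiency:.1%}")  # prints: parallel efficiency: 78.9%
```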

Current AutoDockCloud Implementation
[Figure: docking enrichment plot for ER agonist using AutoDockCloud and DUD. Axes: percent of known ligands found versus percent of ranked database.]
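
An enrichment plot of this kind can be computed from a ranked screening result. The sketch below is illustrative only: the function name and the toy scores are made up, and it assumes that lower (more negative) docking scores rank higher, as with AutoDock binding energies.

```python
def enrichment_curve(scores, is_known_ligand):
    """Return (percent of ranked database, percent of known ligands found) points."""
    if len(scores) != len(is_known_ligand):
        raise ValueError("scores and labels must have the same length")
    total_known = sum(is_known_ligand)
    if total_known == 0:
        raise ValueError("at least one known ligand is required")

    # Rank the database from best (lowest) to worst (highest) score.
    ranked = sorted(zip(scores, is_known_ligand), key=lambda pair: pair[0])

    points = [(0.0, 0.0)]
    found = 0
    for rank, (_, known) in enumerate(ranked, start=1):
        found += int(known)
        points.append((100.0 * rank / len(ranked), 100.0 * found / total_known))
    return points


# Toy data: four compounds, two of them known ligands.
toy_scores = [-9.1, -7.4, -8.3, -5.0]
toy_known = [True, False, True, False]
curve = enrichment_curve(toy_scores, toy_known)
print(curve)
```

In this toy case both known ligands sit in the top half of the ranking, so the curve reaches 100% of known ligands at 50% of the database. A random ranking would follow the diagonal on average.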

Future AutoDockCloud Implementation

input = ligand file from chemical compound database
map(input) {
    create pdbqt (AutoDock input file) from input;
    run AutoDock4 locally;
    find best scoring ligand structure;
    save structure to HDFS;
    return ;
}
reduce( ) {
    sort;
    return ranked_database;
}

* Pre-docking and post-docking will be automated and distributed.
* Lower total I/O requirements.
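
The planned map and reduce steps can be illustrated with a small in-memory sketch. The slide does not show what the map step returns, so the `(best_score, ligand_name)` pair used here is an assumption, as are the function names and the stand-in `dock` callable that replaces the real AutoDock4 run.

```python
def map_ligand(ligand_name, dock):
    """Map step: dock one ligand and emit its best (lowest) score.

    `dock` is a hypothetical callable returning the scores of all poses;
    in the real pipeline this is where pdbqt preparation and AutoDock4 run.
    """
    pose_scores = dock(ligand_name)
    return (min(pose_scores), ligand_name)


def reduce_ranked(pairs):
    """Reduce step: sort all (score, ligand) pairs into a ranked database."""
    return sorted(pairs)


# Toy stand-in for docking: made-up pose scores per ligand.
toy_poses = {
    "ligA": [-6.2, -7.9],
    "ligB": [-9.4, -3.1],
    "ligC": [-5.5],
}
mapped = [map_ligand(name, toy_poses.__getitem__) for name in toy_poses]
ranked_database = reduce_ranked(mapped)
print(ranked_database)
```

Emitting only one score per ligand from the map step is what keeps the I/O small: the full docking logs never need to leave the worker node.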

Future Plans
- Incorporate additional docking engines: AutoDock Vina
  - Less I/O
  - More efficient and accurate algorithm
  - No charge information needed
- Deploy on a commercial cloud (EC2)
- Develop a web interface

1. Virtual Molecular Docking
   a) Novel Drug Discovery
   b) Virtual high-throughput screenings (VHTS)
2. Cloud Computing
   a) Advantages for VHTS
   b) Kandinsky
   c) Hadoop (MapReduce)
3. AutoDockCloud
   a) Current Implementation
   b) Future Implementations

Questions/Comments

Acknowledgements
- Dr. Jerome Baudry (advisor)
- Center for Molecular Biophysics, UT/ORNL
- Genome Science and Technology, UT
- Scalable Computing and Leading Edge Innovative Technologies (IGERT)
- Avinash Kewalramani, ORNL
- ECMLS and HPDC organizers and participants