Nimrod-G and Virtual Lab Tools for Data Intensive Computing on Grid: Drug Design Case Study Rajkumar Buyya Melbourne, Australia

1 Nimrod-G and Virtual Lab Tools for Data Intensive Computing on Grid: Drug Design Case Study
Rajkumar Buyya Melbourne, Australia


3 Contents
Introduction
Resource Management challenges
Nimrod-G Toolkit
SPMD/Parameter-Study Creation Tools
Grid enabling Drug Design Application
Nimrod-G Grid Resource Broker
Scheduling Experiments on World Wide Grid
Conclusions
(Diagram labels: Scheduling, Economics, Grid, Grid Economy)

4 A typical Grid environment and Players
(Figure labels: Application, Resource Broker)

5 Grid Characteristics
Heterogeneous:
  Resource Types: PC, WS, Clusters
  Resource Architecture: CPU Arch, OS
  Applications: CPU/IO/message intensive
  Users and Owners Requirements
  Access Price: different for different users, resources, and times.
  Availability: varies from time to time.
Distributed:
  Resources
  Ownership
  Users
Each has its own (private) policies and objectives.
This closely resembles the heterogeneity and decentralization present in “human economies” (democratic and capitalist world). Hence, we propose the use of “economics” as a metaphor for resource management and scheduling. It regulates supply and demand for resources and gives resource owners an incentive to contribute resources to the Grid.

6 Grid Tools for Handling
Uniform Access, System Management, Computational Economy, Security, Resource Discovery, Resource Allocation & Scheduling, Data Locality, Network Management, Application Development

7 Nimrod-G: Grid Resource Broker
A resource broker for managing, steering, and executing task-farming (parametric sweep/SPMD model) applications on the Grid, based on deadline and computational economy. Based on users’ QoS requirements, the broker dynamically leases services at runtime depending on their quality, cost, and availability.
Key Features:
A single window to manage & control experiments
Persistent and Programmable Task Farming Engine
Resource Discovery
Resource Trading
Scheduling & Predictions
Generic Dispatcher & Grid Agents
Transportation of data & results
Steering & data management
Accounting

8 Parametric Processing
(Figure: Parameters feed a “Magic Engine for Manufacturing Humans!”)
Multiple Runs, Same Program, Multiple Data: Killer Application for the Grid!
Courtesy: Anand Natrajan, University of Virginia

9 Sample P-Sweep Applications
Bioinformatics: Drug Design / Protein Modelling
Combinatorial Optimization: Meta-heuristic parameter estimation
Ecological Modelling: Control Strategies for Cattle Tick
Sensitivity experiments on smog formation
Data Mining
High Energy Physics: Searching for Rare Events
Electronic CAD: Field Programmable Gate Arrays
Computer Graphics: Ray Tracing
Finance: Investment Risk Analysis
VLSI Design: SPICE Simulations
Civil Engineering: Building Design
Automobile: Crash Simulation
Network Simulation
Aerospace: Wing Design
Astrophysics

10 Virtual Drug Design: Data Intensive Computing on Grid
A Virtual Laboratory for “Molecular Modelling for Drug Design” on Peer-to-Peer Grid. It provides tools for examining millions of chemical compounds (molecules) in the Protein Data Bank (PDB) to identify those having potential use in drug design. In collaboration with: Kim Branson, Structural Biology, Walter and Eliza Hall Institute (WEHI)

11 Dock input file
score_ligand yes
minimize_ligand yes
multiple_ligands no
random_seed
anchor_search no
torsion_drive yes
clash_overlap
conformation_cutoff_factor 3
torsion_minimize yes
match_receptor_sites no
random_search yes
maximum_cycles
ligand_atom_file S_1.mol2
receptor_site_file ece.sph
score_grid_prefix ece
vdw_definition_file parameter/vdw.defn
chemical_definition_file parameter/chem.defn
chemical_score_file parameter/chem_score.tbl
flex_definition_file parameter/flex.defn
flex_drive_file parameter/flex_drive.tbl
ligand_contact_file dock_cnt.mol2
ligand_chemical_file dock_chm.mol2
ligand_energy_file dock_nrg.mol2
Molecule to be screened: ligand_atom_file S_1.mol2

12 Parameterize Dock input file (use Nimrod Tools: GUI/language)
score_ligand $score_ligand
minimize_ligand $minimize_ligand
multiple_ligands $multiple_ligands
random_seed $random_seed
anchor_search $anchor_search
torsion_drive $torsion_drive
clash_overlap $clash_overlap
conformation_cutoff_factor $conformation_cutoff_factor
torsion_minimize $torsion_minimize
match_receptor_sites $match_receptor_sites
random_search $random_search
maximum_cycles $maximum_cycles
ligand_atom_file ${ligand_number}.mol2
receptor_site_file $HOME/dock_inputs/${receptor_site_file}
score_grid_prefix $HOME/dock_inputs/${score_grid_prefix}
vdw_definition_file vdw.defn
chemical_definition_file chem.defn
chemical_score_file chem_score.tbl
flex_definition_file flex.defn
flex_drive_file flex_drive.tbl
ligand_contact_file dock_cnt.mol2
ligand_chemical_file dock_chm.mol2
ligand_energy_file dock_nrg.mol2
Molecule to be screened: ligand_atom_file ${ligand_number}.mol2
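The substitution step can be illustrated with a minimal Python sketch. It is not Nimrod's own implementation, which the slides do not show; it only assumes that placeholders of the form $name and ${name} are replaced by per-job values. Python's string.Template happens to use the same placeholder syntax. The sample values come from the plan-file defaults, except ligand_number 42, which is an arbitrary illustrative value.

```python
from string import Template

# A few lines of the parameterized DOCK input shown on the slide.
DOCK_BASE = """\
score_ligand $score_ligand
random_seed $random_seed
ligand_atom_file ${ligand_number}.mol2
receptor_site_file $HOME/dock_inputs/${receptor_site_file}
"""


def substitute(template_text, values):
    """Fill $name / ${name} placeholders with per-job values.

    Names without a value (here $HOME, which is assumed to be resolved
    on the execution node) are left untouched.
    """
    return Template(template_text).safe_substitute(values)


# Illustrative values for one job.
job_values = {
    "score_ligand": "yes",
    "random_seed": 7,
    "ligand_number": 42,
    "receptor_site_file": "ece.sph",
}

dock_run = substitute(DOCK_BASE, job_values)
print(dock_run)
```

Using safe_substitute rather than substitute is a deliberate choice in this sketch: it keeps unresolved names such as $HOME in the output instead of raising an error.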

13 Create Dock PlanFile 1. Define variables and their values
parameter database_name label "database_name" text select oneof
    "aldrich" "maybridge" "maybridge_300" "asinex_egc" "asinex_epc" "asinex_pre"
    "available_chemicals_directory" "inter_bioscreen_s" "inter_bioscreen_n"
    "inter_bioscreen_n_300" "inter_bioscreen_n_500" "biomolecular_research_institute"
    "molecular_science" "molecular_diversity_preservation" "national_cancer_institute"
    "IGF_HITS" "aldrich_300" "molecular_science_500" "APP" "ECE"
    default "aldrich_300";
parameter score_ligand text default "yes";
parameter minimize_ligand text default "yes";
parameter multiple_ligands text default "no";
parameter random_seed integer default 7;
parameter anchor_search text default "no";
parameter torsion_drive text default "yes";
parameter clash_overlap float default 0.5;
parameter conformation_cutoff_factor integer default 5;
parameter torsion_minimize text default "yes";
parameter match_receptor_sites text default "no";
parameter random_search text default "yes";
parameter maximum_cycles integer default 1;
parameter receptor_site_file text default "ece.sph";
parameter score_grid_prefix text default "ece";
parameter ligand_number integer range from 1 to 2000 step 1;
Molecules to be screened: ligand_number, 1 to 2000
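To see how these parameter definitions turn into jobs, here is a small Python sketch. It assumes that every parameter with a default takes only that default, and that the experiment is the cross product of all parameter domains. The dictionary layout and the expand_jobs helper are hypothetical and are not part of Nimrod.

```python
from itertools import product

# Hypothetical in-memory model of a few plan-file parameters:
# each name maps to the list of values it may take in the experiment.
parameters = {
    "database_name": ["aldrich_300"],        # default from the plan file
    "random_seed": [7],                      # default
    "clash_overlap": [0.5],                  # default
    "ligand_number": list(range(1, 2001)),   # range from 1 to 2000 step 1
}


def expand_jobs(params):
    """Yield one dict of parameter values per point in the cross product."""
    names = list(params)
    for combo in product(*(params[name] for name in names)):
        yield dict(zip(names, combo))


jobs = list(expand_jobs(parameters))
print(len(jobs))   # number of jobs in the sweep
print(jobs[0])     # parameter values of the first job
```

Under these assumptions only ligand_number varies, so the sweep contains one job per molecule.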

14 Create Dock PlanFile 2. Define the tasks that jobs need to do
task nodestart
    copy ./parameter/vdw.defn node:.
    copy ./parameter/chem.defn node:.
    copy ./parameter/chem_score.tbl node:.
    copy ./parameter/flex.defn node:.
    copy ./parameter/flex_drive.tbl node:.
    copy ./dock_inputs/get_molecule node:.
    copy ./dock_inputs/dock_base node:.
endtask
task main
    node:substitute dock_base dock_run
    node:substitute get_molecule get_molecule_fetch
    node:execute sh ./get_molecule_fetch
    node:execute $HOME/bin/dock.$OS -i dock_run -o dock_out
    copy node:dock_out ./results/dock_out.$jobname
    copy node:dock_cnt.mol2 ./results/dock_cnt.mol2.$jobname
    copy node:dock_chm.mol2 ./results/dock_chm.mol2.$jobname
    copy node:dock_nrg.mol2 ./results/dock_nrg.mol2.$jobname
endtask

15 Use Nimrod-G Submit & Play!

16 A Nimrod/G Monitor
(Screenshot labels: Cost, Deadline, Legion hosts, Globus hosts)
Bezek is in both Globus and Legion domains.

17 Adaptive Scheduling Algorithms
Flowchart steps:
Discover Resources
Establish Rates
Compose & Schedule
Distribute Jobs
Evaluate & Reschedule
Meet requirements? If not, Discover More Resources
Remaining Jobs, Deadline, & Budget? If so, repeat
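The loop in this flowchart can be sketched in Python as a small simulation. This is a simplified illustration under stated assumptions, not the Nimrod-G scheduler: time advances in fixed scheduling rounds, each resource has a known completion rate per round, prices are charged per job rather than per CPU second, and the policy is cost-minded (it adds the next cheapest resource only when the deadline would otherwise be missed). The Resource class, adaptive_schedule function, resource names, and rates are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Resource:
    name: str
    price_per_job: float   # hypothetical unit: G$ per completed job
    jobs_per_round: int    # assumed measured completion rate per round


def adaptive_schedule(total_jobs, rounds_to_deadline, budget, candidates):
    """Return (jobs left, amount spent, dispatch log) after the deadline."""
    pool = sorted(candidates, key=lambda r: r.price_per_job)
    active = [pool.pop(0)]                     # discover resources
    remaining, spent, log = total_jobs, 0.0, []
    for round_no in range(1, rounds_to_deadline + 1):
        rounds_left = rounds_to_deadline - round_no + 1
        # Evaluate: can the active set finish the remaining jobs in time?
        while pool and sum(r.jobs_per_round for r in active) * rounds_left < remaining:
            active.append(pool.pop(0))         # discover more resources
        # Distribute jobs, cheapest resource first, within the budget.
        for r in active:
            affordable = int((budget - spent) // r.price_per_job)
            n = min(r.jobs_per_round, remaining, affordable)
            if n > 0:
                remaining -= n
                spent += n * r.price_per_job
                log.append((round_no, r.name, n))
        if remaining == 0:
            break
    return remaining, spent, log


candidates = [
    Resource("cheap", 2, 10),
    Resource("mid", 3, 5),
    Resource("fast", 8, 20),
]
remaining, spent, log = adaptive_schedule(165, 6, 1000, candidates)
print(remaining, spent, log[-1])
```

With these made-up rates the cheapest resource alone cannot finish 165 jobs in six rounds, so the loop brings in the more expensive ones in the first round and completes the workload in round five.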

18 Scheduling Experiment on World Wide Grid Testbed
WW Grid testbed sites:
Australia: Monash/Melbourne, VPAC/Melbourne
Asia: TI-Tech/Tokyo, ETL/Tsukuba, AIST/Tsukuba, Kasetsart/Bangkok
North America: ANL/Chicago, USC-ISI/LA, UTK/Tennessee, UVa/Virginia, Dartmouth/NH, BU/Boston
Europe: Cardiff/UK, Portsmouth/UK, ZIB/Germany, PC2/Germany, AEI/Germany, Lecce/Italy, CNR/Italy, Calabria/Italy, Poznan/Poland, Lund/Sweden, CERN/Swiss
South America: Santiago/Chile

19 Deadline and Budget Constrained Scheduling Experiment
Workload: 165 jobs, each needing 5 minutes of CPU time
Deadline: 2 hrs. and budget: units
Strategy: minimise time / cost
Execution cost with cost optimisation
Optimise Cost: (G$) (finished in 2 hrs.)
Optimise Time: (G$) (finished in 1.25 hr.)
In this experiment, the time-optimised scheduling run costs double that of the cost-optimised run. Users can now trade off time against cost.
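A quick back-of-the-envelope calculation shows what this workload demands. The Python sketch below assumes identical CPUs, exactly 5 minutes per job, and no queueing or data-transfer overhead; none of these holds on a real Grid, so the result is only a lower bound on the parallelism needed.

```python
import math

jobs = 165
cpu_minutes_per_job = 5
deadline_minutes = 2 * 60

# Total CPU demand of the experiment.
total_cpu_minutes = jobs * cpu_minutes_per_job
total_cpu_seconds = total_cpu_minutes * 60

# Jobs one CPU can finish before the deadline, and the fewest CPUs
# that must run in parallel to complete all jobs in time.
jobs_per_cpu = deadline_minutes // cpu_minutes_per_job
min_parallel_cpus = math.ceil(jobs / jobs_per_cpu)

print(total_cpu_minutes, total_cpu_seconds, jobs_per_cpu, min_parallel_cpus)
```

Under these assumptions the experiment needs 825 CPU minutes, so at least 7 CPUs must work in parallel to meet the 2-hour deadline.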

20 World Wide Grid (WWG)
(Diagram: sites connected through the Internet)
Australia: Monash Uni.: Nimrod/G, Linux cluster, Solaris WS; Globus+Legion, GRACE_TS
North America: ANL: SGI/Sun/SP2; USC-ISI: SGI; UVa: Linux cluster; UD: Linux cluster; UTK: Linux cluster; Globus/Legion, GRACE_TS
Asia/Japan: Tokyo I-Tech.; ETL, Tsukuba; Linux cluster; Globus + GRACE_TS
Europe: ZIB/FUB: T3E/Mosix; Cardiff: Sun E6500; Paderborn: HPCLine; Lecce: Compaq SC; CNR: Cluster; Calabria: Cluster; CERN: Cluster; Poznan: SGI/SP2; Globus + GRACE_TS
South America: Chile: Cluster; Globus + GRACE_TS

21 Resources Selected & Price/CPU-sec.
Resource & Location | Grid services & Fabric | Cost/CPU sec. or unit | Jobs executed (Time_Opt) | Jobs executed (Cost_Opt)
Linux Cluster-Monash, Melbourne, Australia | Globus, GTS, Condor | 2 | 64 | 153
Linux-Prosecco-CNR, Pisa, Italy | Globus, GTS, Fork | 3 | 7 | 1
Linux-Barbera-CNR, Pisa, Italy | | 4 | 6 |
Solaris/Ultra2 TITech, Tokyo, Japan | | | 9 |
SGI-ISI, LA, US | | 8 | 37 | 5
Sun-ANL, Chicago, US | | | 42 |
Total Experiment Cost (G$) | | | 237000 | 115200
Time to Complete Exp. (Min.) | | | 70 | 119
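The totals in this table can be compared directly. The short Python sketch below uses only numbers that appear in the table. Reading 64, 7, 6, 9, 37, and 42 as the Time_Opt job counts is an assumption, but it is supported by the fact that they add up to the 165-job workload.

```python
# Totals reported in the table.
time_opt = {"cost": 237000, "minutes": 70}
cost_opt = {"cost": 115200, "minutes": 119}

# How much more the time-optimised run cost, and how much time it saved.
cost_ratio = time_opt["cost"] / cost_opt["cost"]
extra_cost = time_opt["cost"] - cost_opt["cost"]
minutes_saved = cost_opt["minutes"] - time_opt["minutes"]

# Assumed Time_Opt job counts per resource, in table order.
time_opt_jobs = [64, 7, 6, 9, 37, 42]
total_time_opt_jobs = sum(time_opt_jobs)

print(round(cost_ratio, 2), extra_cost, minutes_saved, total_time_opt_jobs)
```

The ratio of roughly 2.06 matches the statement that the time-optimised run costs about double the cost-optimised one, in exchange for finishing 49 minutes earlier.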

22 DBC Scheduling for Time Optimization

23 DBC Scheduling for Cost Optimization

24 Conclusions
P2P and Grid computing are emerging as a next-generation computing platform for solving large-scale problems through the sharing of geographically distributed resources.
Resource management is a complex undertaking, as systems need to be adaptive, scalable, competitive,…, and driven by QoS.
We proposed a framework based on “computational economies” and discussed several economic models for resource allocation and for regulating supply and demand for resources.
Scheduling experiments on the World Wide Grid demonstrate the Nimrod-G broker's ability to dynamically lease or rent services at runtime based on their quality, cost, and availability, depending on consumers' QoS requirements.
Easy-to-use tools for composing applications to run on the Grid are essential for attracting the application community and getting it on board.
An economics paradigm for QoS-driven resource management is essential to push P2P/Grids into mainstream computing!

25 Download Software & Information
Nimrod & Parametric Computing:
Economy Grid & Nimrod/G:
Virtual Laboratory/Virtual Drug Design:
Grid Simulation (GridSim) Toolkit (Java based):
World Wide Grid (WWG) testbed:
Looking for new volunteers to grow. Please contact me to barter your & our machines!
Want to build on our work/collaborate: Talk to me now or
