Nimrod-G and Virtual Lab Tools for Data Intensive Computing on Grid: Drug Design Case Study Rajkumar Buyya Melbourne, Australia

1 Nimrod-G and Virtual Lab Tools for Data Intensive Computing on Grid: Drug Design Case Study Rajkumar Buyya Melbourne, Australia

3 Contents
Introduction
Resource management challenges
Nimrod-G toolkit
SPMD / parameter-study creation tools
Grid-enabling the drug design application
Nimrod-G Grid resource broker
Scheduling experiments on the World Wide Grid
Conclusions

4 A typical Grid environment and its players: applications, a resource broker, and Grid resources.

5 Grid Characteristics
Heterogeneous resources: resource types (PC, WS, clusters), resource architectures (CPU arch, OS), applications (CPU-, I/O-, or message-intensive), and user and owner requirements.
Access price: differs across users, resources, and time. Availability: varies from time to time.
Distributed resource ownership: each owner and user has their own (private) policies and objectives.
This is very similar to the heterogeneity and decentralisation present in human economies (the democratic and capitalist world). Hence, we propose economics as a metaphor for resource management and scheduling: it regulates supply and demand for resources and offers resource owners an incentive to contribute resources to the Grid.

6 Grid tools are needed for: security, resource allocation & scheduling, data locality, system management, resource discovery, uniform access, computational economy, application development, and network management.

7 Nimrod-G: Grid Resource Broker
A resource broker for managing, steering, and executing task-farming (parameter sweep/SPMD) applications on the Grid, based on deadlines and a computational economy. Driven by users' QoS requirements, the broker dynamically leases services at runtime depending on their quality, cost, and availability.
Key features:
A single window to manage & control an experiment
Persistent and programmable task-farming engine
Resource discovery
Resource trading
Scheduling & predictions
Generic dispatcher & Grid agents
Transportation of data & results
Steering & data management
Accounting

8 Parametric Processing: multiple runs, same program, multiple data. A killer application for the Grid! (Slide cartoon: a "magic engine for manufacturing humans". Courtesy: Anand Natrajan, University of Virginia.)
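The parametric-processing model on this slide (one program, many runs, each run bound to a different point in the parameter space) can be sketched in a few lines of Python. `parameter_sweep` and `simulate` are illustrative names for this sketch, not Nimrod APIs:

```python
from itertools import product

# Sketch of the parametric-processing model: one program, many runs,
# each run bound to a different combination of parameter values.
def parameter_sweep(program, parameter_space):
    """Run `program` once per point in the cross product of all parameters."""
    names = list(parameter_space)
    results = {}
    for values in product(*(parameter_space[n] for n in names)):
        binding = dict(zip(names, values))
        results[values] = program(**binding)  # same program, different data
    return results

# Toy stand-in for a real executable such as Dock
def simulate(seed, cutoff):
    return seed * cutoff

runs = parameter_sweep(simulate, {"seed": [1, 2], "cutoff": [0.5, 1.0]})
```

A real task-farming engine distributes these independent runs across Grid nodes instead of looping locally; the independence of the runs is what makes parameter sweeps so well suited to the Grid.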

9 Sample Parameter-Sweep Applications
Bioinformatics: drug design / protein modelling
Sensitivity experiments on smog formation
Combinatorial optimisation: meta-heuristic parameter estimation
Ecological modelling: control strategies for cattle tick
Electronic CAD: field-programmable gate arrays
Computer graphics: ray tracing
High-energy physics: searching for rare events
Finance: investment risk analysis
VLSI design: SPICE simulations
Aerospace: wing design
Network simulation
Automobile: crash simulation
Data mining
Civil engineering: building design
Astrophysics

10 Virtual Drug Design: Data-Intensive Computing on the Grid
A Virtual Laboratory for molecular modelling for drug design on a peer-to-peer Grid. It provides tools for screening millions of chemical compounds (molecules) against targets from the Protein Data Bank (PDB) to identify those with potential use in drug design.
In collaboration with Kim Branson, Structural Biology, Walter and Eliza Hall Institute (WEHI).

11 Dock input file (S_1.mol2 is the molecule to be screened):

score_ligand yes
minimize_ligand yes
multiple_ligands no
random_seed 7
anchor_search no
torsion_drive yes
clash_overlap 0.5
conformation_cutoff_factor 3
torsion_minimize yes
match_receptor_sites no
random_search yes
maximum_cycles 1
ligand_atom_file S_1.mol2
receptor_site_file ece.sph
score_grid_prefix ece
vdw_definition_file parameter/vdw.defn
chemical_definition_file parameter/chem.defn
chemical_score_file parameter/chem_score.tbl
flex_definition_file parameter/flex.defn
flex_drive_file parameter/flex_drive.tbl
ligand_contact_file dock_cnt.mol2
ligand_chemical_file dock_chm.mol2
ligand_energy_file dock_nrg.mol2

12 Parameterise the Dock input file (using the Nimrod tools: GUI/language); ${ligand_number}.mol2 selects the molecule to be screened:

score_ligand $score_ligand
minimize_ligand $minimize_ligand
multiple_ligands $multiple_ligands
random_seed $random_seed
anchor_search $anchor_search
torsion_drive $torsion_drive
clash_overlap $clash_overlap
conformation_cutoff_factor $conformation_cutoff_factor
torsion_minimize $torsion_minimize
match_receptor_sites $match_receptor_sites
random_search $random_search
maximum_cycles $maximum_cycles
ligand_atom_file ${ligand_number}.mol2
receptor_site_file $HOME/dock_inputs/${receptor_site_file}
score_grid_prefix $HOME/dock_inputs/${score_grid_prefix}
vdw_definition_file vdw.defn
chemical_definition_file chem.defn
chemical_score_file chem_score.tbl
flex_definition_file flex.defn
flex_drive_file flex_drive.tbl
ligand_contact_file dock_cnt.mol2
ligand_chemical_file dock_chm.mol2
ligand_energy_file dock_nrg.mol2

13 Create the Dock plan file, step 1: define the variables and their values (ligand_number ranges over the molecules to be screened):

parameter database_name label "database_name" text select oneof
  "aldrich" "maybridge" "maybridge_300" "asinex_egc" "asinex_epc" "asinex_pre"
  "available_chemicals_directory" "inter_bioscreen_s" "inter_bioscreen_n"
  "inter_bioscreen_n_300" "inter_bioscreen_n_500" "biomolecular_research_institute"
  "molecular_science" "molecular_diversity_preservation" "national_cancer_institute"
  "IGF_HITS" "aldrich_300" "molecular_science_500" "APP" "ECE"
  default "aldrich_300";
parameter score_ligand text default "yes";
parameter minimize_ligand text default "yes";
parameter multiple_ligands text default "no";
parameter random_seed integer default 7;
parameter anchor_search text default "no";
parameter torsion_drive text default "yes";
parameter clash_overlap float default 0.5;
parameter conformation_cutoff_factor integer default 5;
parameter torsion_minimize text default "yes";
parameter match_receptor_sites text default "no";
parameter random_search text default "yes";
parameter maximum_cycles integer default 1;
parameter receptor_site_file text default "ece.sph";
parameter score_grid_prefix text default "ece";
parameter ligand_number integer range from 1 to 2000 step 1;

14 Create the Dock plan file, step 2: define the task each job performs:

task nodestart
  copy ./parameter/vdw.defn node:.
  copy ./parameter/chem.defn node:.
  copy ./parameter/chem_score.tbl node:.
  copy ./parameter/flex.defn node:.
  copy ./parameter/flex_drive.tbl node:.
  copy ./dock_inputs/get_molecule node:.
  copy ./dock_inputs/dock_base node:.
endtask
task main
  node:substitute dock_base dock_run
  node:substitute get_molecule get_molecule_fetch
  node:execute sh ./get_molecule_fetch
  node:execute $HOME/bin/dock.$OS -i dock_run -o dock_out
  copy node:dock_out ./results/dock_out.$jobname
  copy node:dock_cnt.mol2 ./results/dock_cnt.mol2.$jobname
  copy node:dock_chm.mol2 ./results/dock_chm.mol2.$jobname
  copy node:dock_nrg.mol2 ./results/dock_nrg.mol2.$jobname
endtask
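The node:substitute step in the plan above fills the $variables of a template with one job's parameter values. A minimal Python emulation (not Nimrod itself), using a two-line excerpt of the parameterised Dock file and illustrative parameter values:

```python
from string import Template

# Minimal emulation of the `node:substitute` step: replace shell-style
# $variables in a template with one job's parameter values.
# The two template lines are excerpted from the parameterised Dock file;
# the parameter values below are illustrative only.
template = Template(
    "random_seed $random_seed\n"
    "ligand_atom_file ${ligand_number}.mol2\n"
)
dock_run = template.substitute({"random_seed": 7, "ligand_number": "S_1"})
```

Nimrod performs one such substitution per job, so each job receives a dock_run file naming its own molecule.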

15 Use Nimrod-G: Submit & Play!

16 A Nimrod/G monitor view: Legion hosts and Globus hosts (Bezek is in both the Globus and Legion domains), with the cost and deadline settings shown.

17 Adaptive scheduling algorithms:
1. Discover resources.
2. Establish rates (trade with resource owners).
3. Compose & schedule jobs.
4. Distribute jobs.
5. Evaluate & reschedule.
6. Requirements met? Check the remaining jobs against the deadline and budget; if not, discover more resources and repeat.

18 Scheduling Experiment on the World Wide Grid (WWG) Testbed
Europe: ZIB/Germany, PC2/Germany, AEI/Germany, Lecce/Italy, CNR/Italy, Calabria/Italy, Poznan/Poland, Lund/Sweden, CERN/Switzerland, Cardiff/UK, Portsmouth/UK
North America: ANL/Chicago, USC-ISI/LA, UTK/Tennessee, UVa/Virginia, Dartmouth/NH, BU/Boston
South America: Santiago/Chile
Asia: TI-Tech/Tokyo, ETL/Tsukuba, AIST/Tsukuba, Kasetsart/Bangkok
Australia: Monash/Melbourne, VPAC/Melbourne

19 Deadline and Budget Constrained Scheduling Experiment
Workload: 165 jobs, each needing 5 minutes of CPU time.
Deadline: 2 hrs.; budget: units.
Strategy: minimise time / cost.
Execution cost with cost optimisation: (G$), finished in 2 hrs.
With time optimisation: (G$), finished in 1.25 hrs.
In this experiment, the time-optimised scheduling run cost about double the cost-optimised run; users can now trade off time vs. cost.

20 World Wide Grid (WWG) resources, connected over the Internet:
Australia (Globus+Legion, GRACE_TS): Monash Uni. Linux cluster, Solaris WS, Nimrod/G
Europe (Globus + GRACE_TS): ZIB/FUB T3E/Mosix, Cardiff Sun E6500, Paderborn HPCLine, Lecce Compaq SC, CNR cluster, Calabria cluster, CERN cluster, Poznan SGI/SP2
Asia/Japan (Globus + GRACE_TS): Tokyo I-Tech, ETL Tsukuba Linux cluster
North America (Globus/Legion, GRACE_TS): ANL SGI/Sun/SP2, USC-ISI SGI, UVa Linux cluster, UD Linux cluster, UTK Linux cluster
South America (Globus + GRACE_TS): Chile cluster

21 Resources selected and price per CPU-second, with jobs executed under time-optimised (Time_Opt) and cost-optimised (Cost_Opt) scheduling:

Resource & Location | Grid services & fabric | Cost/CPU-sec (G$) | Jobs (Time_Opt) | Jobs (Cost_Opt)
Linux cluster, Monash, Melbourne, Australia | Globus, GTS, Condor | | |
Linux, Prosecco, CNR, Pisa, Italy | Globus, GTS, Fork | 3 | 7 | 1
Linux, Barbera, CNR, Pisa, Italy | Globus, GTS, Fork | 4 | 6 | 1
Solaris/Ultra2, TITech, Tokyo, Japan | Globus, GTS, Fork | 3 | 9 | 1
SGI, ISI, LA, US | Globus, GTS, Fork | 8 | 37 | 5
Sun, ANL, Chicago, US | Globus, GTS, Fork | 7 | 42 | 4
Total experiment cost (G$) | | | |
Time to complete experiment (min.) | | | 70 | 119

22 DBC Scheduling for Time Optimization

23 DBC Scheduling for Cost Optimization

24 Conclusions
P2P and Grid computing are emerging as next-generation computing platforms for solving large-scale problems through the sharing of geographically distributed resources.
Resource management is a complex undertaking, as systems need to be adaptive, scalable, competitive, and driven by QoS.
We proposed a framework based on computational economies and discussed several economic models for resource allocation and for regulating supply and demand for resources.
Scheduling experiments on the World Wide Grid demonstrate our Nimrod-G broker's ability to dynamically lease or rent services at runtime based on their quality, cost, and availability, depending on consumers' QoS requirements.
Easy-to-use tools for composing Grid applications are essential to attract the application community and bring it on board.
An economics paradigm for QoS-driven resource management is essential to push P2P/Grids into mainstream computing!

25 Download Software & Information
Nimrod & Parametric Computing:
Economy Grid & Nimrod/G:
Virtual Laboratory / Virtual Drug Design:
Grid Simulation (GridSim) Toolkit (Java-based):
World Wide Grid (WWG) testbed: looking for new volunteers to grow it; please contact me to barter your & our machines!
Want to build on our work or collaborate? Talk to me now or
