1 Seamless, Scalable Computing from Desktops to Global Computational Power Grids: Hype or Reality?
Rajkumar Buyya, School of Computer Science and Software Engineering, Monash University, Melbourne, Australia. Thanks to: David Abramson, Jonathan Giddy, Jack Dongarra, Wolfgang Gentzsch

2 Agenda
Computing platforms and how the Grid is different
Cluster computing technologies and new challenges
Clusters of clusters (federated/hyper clusters)
Towards global (Grid) computing
Grid resource management and scheduling challenges
Grid technologies (in development): grid substrates and interoperability, grid middleware, resource brokers and high-level tools, applications and portals
Conclusion

3 Computing Power (HPC) Drivers
Solving grand challenge applications using computer modeling, simulation and analysis:
Life sciences, digital biology, aerospace, CAD/CAM, e-commerce (e-anything), military applications

4 How to Run Applications Faster?
There are three ways to improve performance:
1. Work harder
2. Work smarter
3. Get help
Computer analogy:
1. Use faster hardware, e.g. reduce the time per instruction (clock cycle).
2. Use optimized algorithms and techniques.
3. Use multiple computers to solve the problem, i.e. increase the number of instructions executed per clock cycle.

5 Computing Platforms Evolution: Breaking Administrative Barriers
(Figure: performance grows as administrative barriers are crossed, from individual, group, department, campus, state, national, globe, to inter-planet and universe scales: desktop (single processor?), SMPs or supercomputers, local cluster, enterprise cluster/grid, global cluster/grid, inter-planet cluster/grid??)

6 Towards Inexpensive Supercomputing
It is cluster computing: the commodity supercomputing!

7 Clustering of Computers for Collective Computing: Trends
(Timeline figure: 1960, 1990, 1995+)

8 Clustering Today
Clustering gained momentum when four technologies converged:
1. Killer micros/CPUs: very high-performance microprocessors; today's workstation performance equals yesterday's supercomputers.
2. Killer networks: high-speed communication; communication speed between cluster nodes approaches that between processors in an SMP.
3. Killer tools: standard tools for parallel/distributed computing and their growing popularity.
4. Killer applications: scientific, engineering, database, and Internet/Web serving.

9 Cluster Computer Architecture

10 Major Issues in Cluster Design
Scalability (physical and application)
Performance
High availability (fault tolerance and failure management)
Single system image (look and feel of one system)
Fast communication (networks and protocols)
Load balancing (CPU, memory, disk, network)
Security and encryption (clusters of clusters)
Distributed environment (social issues)
Manageability (administration and control)
Programmability (simple API if required)
Applicability (cluster-aware and non-aware applications)

11 Killer Micro: Technology Trend (led to the death of traditional supercomputers)

12 Killer Networks and Protocols
Networks: Ethernet (10 Mbps), Fast Ethernet (100 Mbps), Gigabit Ethernet (1 Gbps), SCI (Dolphin; ~12 microsecond MPI latency), ATM, Myrinet (1.2 Gbps), Digital Memory Channel, the Internet!
Lightweight (user-level) protocols: TCP/IP?, Active Messages (Berkeley), Basic Interface for Parallelism (BIP, Lyon, France), Fast Messages (Illinois), U-Net (Cornell), VIA (industry standard), ...

13 Killer Tools/Middleware (e.g., HKU JESSICA)

14 Killer Applications
Numerous scientific and engineering applications, parametric simulations
Business applications: e-commerce (Amazon.com, eBay.com, ...), database applications (Oracle on clusters), decision support systems
Internet applications: Web serving, infowares (yahoo.com, AOL.com), ASPs (application service providers), eChat, ePhone, eBook, eCommerce, eBank, eSociety, eAnything!, computing portals
Mission-critical applications: command and control systems, banks, nuclear reactor control, star-wars, and handling life-threatening situations

15 Science Portals - e.g., the PAPIA System
RWCP Japan PAPIA PC cluster: Pentiums, Myrinet, NetBSD/Linux, PM, SCore-D, MPC++

16 Cluster Computing Forum
IEEE Task Force on Cluster Computing (TFCC)

17

18 Adoption of the Approach

19 Clusters of Clusters (HyperClusters)
(Figure: clusters 1, 2 and 3 connected over a LAN/WAN, each running a scheduler, master daemon, execution and submit daemons, and graphical control clients)

20

21

22 Towards Grid Computing….
For illustration, resources are placed arbitrarily on the GUSTO test-bed!

23 What is a Grid? An infrastructure that couples:
Computers (PCs, workstations, clusters, traditional supercomputers, and even laptops, notebooks, mobile computers, PDAs, and so on)
Software (e.g., ASPs renting expensive special-purpose applications on demand)
Databases (e.g., transparent access to the human genome database)
Special instruments (e.g., radio telescopes searching for life in the galaxy, for pulsars)
People (maybe even animals, who knows?)
across local/wide-area networks (enterprise, organisations, or the Internet) and presents them as a unified integrated (single) resource.

24 Conceptual view of the Grid
Leading to Portal (Super)Computing

25 Grid Application Drivers
Old and new applications are being enabled by the coupling of computers, databases, instruments, people, etc.:
(Distributed) supercomputing
Collaborative engineering
High-throughput computing
Large-scale simulation and parameter studies
Remote software access / renting software
Data-intensive computing
On-demand computing

26 The Grid Vision: To offer
“Dependable, consistent, pervasive access to [high-end] resources”
Dependable: can provide performance and functionality guarantees
Consistent: uniform interfaces to a wide variety of resources
Pervasive: ability to “plug in” from anywhere
Source:

27 The Grid Impact! “The global computational grid is expected to drive the economy of the 21st century, just as the electric power grid drove the economy of the 20th century.”

28 Sources of Complexity in Grid Resource Management
No single administrative control
No single policy: each resource owner has their own policies or scheduling mechanisms, and users must honour them (particularly external users of the grid)
Heterogeneity of resources (static and dynamic)
Unreliability: resources may appear or disappear (die)
No uniform cost model (there cannot be one): cost varies from one user to another and from time to time
No single access mechanism

29 Resource Management Challenges Introduced by the Grid
Site autonomy
Heterogeneous resources and substrates: each site has systems with different architectures (SMPs, clusters; Linux/NT/Unix; Intel), and each resource owner has their own policies or scheduling mechanisms (Codine/Condor)
Policy extensibility (through resource brokers)
Resource allocation/co-allocation
Online control (can applications, e.g. graphics, tolerate non-availability of a required resource and adapt?)
Computational economy

30 Grid Resource Management: Challenging Issues
Authentication (once)
Specify the simulation (code, resources, etc.)
Discover resources
Negotiate authorization, acceptable use, cost, etc.
Acquire resources
Schedule jobs
Initiate computation
Steer computation
Access remote data sets
Collaborate on results
Account for usage
(Figure spans Domain 1 and Domain 2. Acknowledgement: Globus.)

31

32

33

34

35 Grid Components (layered view)
Grid Apps. - Applications and Portals: scientific and engineering applications, collaboration, problem-solving environments, Web-enabled applications
Grid Tools - Development Environments and Tools: languages, libraries, debuggers, monitoring, resource brokers, Web tools
Grid Middleware - Distributed Resources Coupling Services: communication, sign-on and security, information, process, data access, QoS
Local Resource Managers: operating systems, queuing systems, libraries and application kernels, TCP/IP and UDP
Grid Fabric - Networked Resources across Organisations: computers, clusters, storage systems, data sources, scientific instruments

36 Many Grid Projects and Initiatives
Public forums: Computing Portals, Grid Forum, European Grid Forum, IEEE TFCC!, GRID'2000, and more
Australia: Nimrod/G, EcoGrid and GRACE, DISCWorld
Europe: UNICORE, MOL, METODIS, Globe, Poznan Metacomputing, CERN Data Grid, MetaMPI, DAS, JaWS, and many more
Public grid initiatives: Distributed.net, Compute Power Grid
USA: Globus, Legion, JAVELIN, AppLeS, NASA IPG, Condor, Harness, NetSolve, NCSA Workbench, WebFlow, EveryWhere, and many more
Japan: Ninf, Bricks

37 Many Grid Testbeds...
GUSTO
Distributed ASCI Supercomputer
NASA IPG

38 Scheduler Architecture Alternatives
Centralized, single resource: e.g. Unix, Linux, Windows OS
Centralized, multiple resources (single/multiple domains), with a central job queue: cluster systems such as LSF, Codine, Nimrod, Condor
Hierarchical, multiple domains (a resource broker or super-scheduler above local schedulers): multi-cluster or grid systems such as AppLeS, Nimrod/G, HTB, Legion, Condor
Decentralized, self-coordinated or job-pool based (single/multiple domains): cluster systems like MOSIX follow a simple form of this model; grid systems can follow it too, but it is complex to realize
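As a rough illustration of the hierarchical model (a resource broker or super-scheduler sitting above per-domain local schedulers), here is a minimal Python sketch. It is not the API of any of the systems named above; the class and method names, and the shortest-queue placement policy, are invented.

```python
# Minimal sketch of the hierarchical scheduler model: a resource broker
# (super-scheduler) delegates jobs to local schedulers, one per domain.
# All names here are hypothetical, for illustration only.

class LocalScheduler:
    def __init__(self, domain, resources):
        self.domain = domain
        self.resources = resources      # e.g. ["R1", "R2", ...]
        self.queue = []

    def submit(self, job):
        # A real local scheduler (LSF, Condor, ...) applies its own policy here.
        self.queue.append(job)
        return f"{self.domain}:{len(self.queue)}"


class ResourceBroker:
    """Super-scheduler: picks a domain, then hands the job to its local scheduler."""

    def __init__(self, local_schedulers):
        self.locals = local_schedulers

    def submit(self, job):
        # Naive placement policy: send the job to the domain with the shortest queue.
        target = min(self.locals, key=lambda s: len(s.queue))
        return target.submit(job)


broker = ResourceBroker([
    LocalScheduler("domainA", ["R1", "R2"]),
    LocalScheduler("domainB", ["R3"]),
])
for j in range(5):
    print(broker.submit({"id": j}))
```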

39

40

41

42 Grid Components and our Focus: Resource Brokers and Computational Economy
Applications (Apps.)
High-level services and tools (Grid Tools): GlobusView, Testbed Status, DUROC, MPI, MPI-IO, CC++, Nimrod/G, globusrun
Core services (Grid Middleware): Heartbeat Monitor, Nexus, GRAM, GRACE, Globus Security Interface, Gloperf, MDS, GASS, GARA
Local services (Grid Fabric): Condor, MPI, TCP, UDP, LSF, Easy, NQE, AIX, Irix, Solaris
Source: Globus

43

44

45

46

47

48

49

50

51 Nimrod/G Resource Broker
Nimrod/G Approach to Resource Management and Scheduling

52 What is Nimrod/G? A global scheduler for managing and steering task farming (parametric simulation) applications on computational grids, based on deadlines and a computational economy.
Key features:
A single window to manage and control the whole experiment
Resource discovery
Trading for resources
Scheduling
Steering and data management
Leverages Globus services
Other parallel application models can be supported easily (MPI and DUROC co-allocation)
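To make "task farming (parametric simulation)" concrete, here is a small Python sketch of what such an experiment boils down to: a cross-product of parameter values expanded into independent jobs, plus a deadline and budget for the scheduler to honour. This is not Nimrod/G's own plan-file language; the parameter names, command line and field names are invented.

```python
# Illustrative sketch only: a parametric "task farming" experiment expressed in
# Python rather than Nimrod/G's plan-file language. Field names are hypothetical.
import itertools

def make_experiment(deadline_hours, budget_units):
    """Expand a parameter sweep into independent jobs (task farming)."""
    angles = [0, 15, 30, 45]        # hypothetical parameter 1
    pressures = [1.0, 2.5, 5.0]     # hypothetical parameter 2
    jobs = []
    for i, (a, p) in enumerate(itertools.product(angles, pressures)):
        jobs.append({
            "id": i,
            "command": f"./simulate --angle {a} --pressure {p}",
            "status": "pending",
        })
    # The deadline and budget constrain how the broker may schedule these jobs.
    return {"deadline_hours": deadline_hours, "budget": budget_units, "jobs": jobs}

experiment = make_experiment(deadline_hours=12, budget_units=500)
print(len(experiment["jobs"]), "independent jobs to farm out")
```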

53 A Nimrod/G User Console
(Screenshot: controls for the deadline, the cost, and the list of available machines)

54 Nimrod/G Architecture
(Figure: Nimrod/G clients connect to the Nimrod engine, which comprises a schedule advisor, persistent store, trading manager, dispatcher and grid explorer; through middleware services these talk to grid information services (GIS) and to the resource managers and trade servers (RM & TS) on the GUSTO test bed. RM: local resource manager, TS: trade server.)

55 Nimrod/G Interactions
(Figure: on the root node, the parametric engine, scheduler and dispatcher perform resource location via the MDS server and local resource allocation via a trade server; the dispatcher submits jobs through the GRAM server and queuing system on the gatekeeper node; on the computational node a job wrapper runs the user process and performs file access through the GASS server. Additional services used implicitly: GSI (authentication and authorization), Nexus (communication).)

56 Resource Location/Discovery
Need to locate suitable machines for an experiment, based on:
Speed
Number of processors
Cost
Availability
User account
Available resources will vary across the experiment
Supported through a directory server (Globus MDS)
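A hedged sketch of the selection step, in Python rather than through the real Globus MDS interface: a discovery catalogue is filtered against the experiment's needs. The attribute names (speed_mips, cpus, cost_per_cpu_hour, available, has_account) are hypothetical.

```python
# Hedged sketch: filtering discovered resources against an experiment's needs.
# This does not use the real Globus MDS API; all attribute names are invented.

def suitable_machines(catalogue, min_speed, min_cpus, max_cost):
    """Return resources from a discovery catalogue that meet the requirements."""
    return [
        m for m in catalogue
        if m["speed_mips"] >= min_speed
        and m["cpus"] >= min_cpus
        and m["cost_per_cpu_hour"] <= max_cost
        and m["available"]
        and m["has_account"]
    ]

catalogue = [
    {"name": "clusterA", "speed_mips": 400, "cpus": 64,
     "cost_per_cpu_hour": 2, "available": True, "has_account": True},
    {"name": "smpB", "speed_mips": 300, "cpus": 8,
     "cost_per_cpu_hour": 5, "available": True, "has_account": False},
]
print([m["name"] for m in suitable_machines(catalogue, 200, 16, 3)])
```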

57 Computational Economy
Resource selection is based on real money and market mechanisms
A large number of sellers and buyers (resources may be dedicated or shared)
Negotiation: call for tenders/bids, then select the offers that meet the requirements
Trading and advance resource reservation
Schedule computations on those resources that meet all requirements
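The tender/bid step can be pictured with the sketch below: call for bids from several trade servers and keep only the offers that meet the requirements, cheapest first. The offer fields and the FakeTradeServer class are invented for illustration; this is not the GRACE trading API.

```python
# Minimal sketch of the tender/bid step; offer fields are hypothetical.

def call_for_bids(trade_servers, requirements):
    offers = [ts.bid(requirements) for ts in trade_servers]
    ok = [o for o in offers
          if o["cpus"] >= requirements["cpus"]
          and o["ready_by"] <= requirements["deadline"]]
    return sorted(ok, key=lambda o: o["price"])    # cheapest acceptable offer first

class FakeTradeServer:
    def __init__(self, name, cpus, price, ready_by):
        self._offer = {"seller": name, "cpus": cpus, "price": price, "ready_by": ready_by}
    def bid(self, requirements):
        return dict(self._offer)

servers = [FakeTradeServer("siteA", 32, 120.0, 8), FakeTradeServer("siteB", 8, 40.0, 2)]
print(call_for_bids(servers, {"cpus": 16, "deadline": 10}))
```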

58 Cost Model
(Figure: a user-by-machine cost matrix, e.g. user 1 on machine 5 versus user 5 on machine 1, illustrating non-uniform costing.)
Costs vary from time to time, from one user to another, and with usage duration.
This encourages use of local resources first: a user can still access remote resources, but pays a penalty through a higher cost.
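One possible shape for such a non-uniform cost function, as a Python sketch; the rates, peak-hour surcharge and remote-access penalty are invented numbers, not values from Nimrod/G or GRACE.

```python
# Illustration only: one possible non-uniform cost model. All numbers invented.

def job_cost(user, machine, start_hour, duration_hours, base_rates):
    """Cost depends on who runs, where, when, and for how long."""
    rate = base_rates[(user["domain"], machine["domain"])]   # per CPU-hour
    if machine["domain"] != user["domain"]:
        rate *= 2.0          # penalty for using remote resources
    if 9 <= start_hour <= 17:
        rate *= 1.5          # peak-hour pricing (demand exceeds supply)
    return rate * duration_hours

base_rates = {("monash", "monash"): 1.0, ("monash", "anl"): 3.0}
user = {"name": "rajkumar", "domain": "monash"}
print(job_cost(user, {"domain": "monash"}, start_hour=22, duration_hours=4, base_rates=base_rates))
print(job_cost(user, {"domain": "anl"}, start_hour=10, duration_hours=4, base_rates=base_rates))
```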

59 Nimrod/G Scheduling Algorithm
Find a set of machines (MDS search)
Distribute jobs from the root to the machines
Establish the job consumption rate for each machine
For each machine, ask: can we meet the deadline?
If not, return some jobs to the root
If yes, distribute more jobs to the resource
If the deadline cannot be met with the current resources, find additional resources
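A loose Python sketch of the deadline-driven distribution step described above (not the actual Nimrod/G implementation): machine rates are assumed to have been measured elsewhere, and more machines are discovered only when the current pool cannot finish the jobs in time.

```python
# A loose sketch of the deadline-driven distribution step; not Nimrod/G itself.

def distribute(jobs, machine_rates, hours_to_deadline, discover_more):
    """Assign jobs to machines so the whole set can finish before the deadline.

    jobs          -- list of job ids held at the root
    machine_rates -- {machine_name: estimated jobs completed per hour}
    discover_more -- callable returning extra {machine_name: rate} entries
    """
    assignment = {m: [] for m in machine_rates}
    capacity = {m: int(r * hours_to_deadline) for m, r in machine_rates.items()}

    # If the current pool cannot absorb all jobs by the deadline, grow the pool.
    while sum(capacity.values()) < len(jobs):
        extra = discover_more()
        if not extra:
            break                                   # best effort: deadline may slip
        machine_rates.update(extra)
        for m, r in extra.items():
            assignment[m] = []
            capacity[m] = int(r * hours_to_deadline)

    # Hand out jobs up to each machine's capacity; leftovers stay at the root.
    pending = list(jobs)
    for m in assignment:
        take, pending = pending[:capacity[m]], pending[capacity[m]:]
        assignment[m] = take
    return assignment, pending

assignment, leftover = distribute(
    jobs=list(range(100)),
    machine_rates={"clusterA": 10.0, "smpB": 4.0},
    hours_to_deadline=5,
    discover_more=lambda: {"clusterC": 8.0},
)
print({m: len(js) for m, js in assignment.items()}, "leftover:", len(leftover))
```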

60 Nimrod/G Scheduling Algorithm ...
(Flowchart: locate machines, distribute jobs, establish rates, check whether deadlines are met; if not, locate more machines and re-distribute jobs)

61 Resource Usage (for various deadlines)

62

63

64

65 Scheduling Methods for Global Resource Allocation
1. Equal assignment, but no load balancing
2. Equal assignment and load balanced
3. (2) + deadline (no worry about cost)
4. (2) + deadline (minimize the cost of computation)
5. (4) + budget --> deadline + cost (up-front agreement): "I am willing to pay $$$, can you complete by the deadline?"
6. (5) + advance resource reservation
7. (6) + grid economy with dynamic pricing, using a trading/tender/bid process
8. (6) + grid economy with dynamic pricing, using an auction technique
9. Genetic/fuzzy-logic and other algorithms
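For methods 4 and 5, a hedged sketch of one simple heuristic: greedily add the cheapest machines until the job set can finish by the deadline, while keeping the total cost within the budget. This is not the algorithm used by any particular broker; all numbers are invented.

```python
# Hedged sketch of deadline- and budget-constrained resource selection.

def pick_resources(n_jobs, machines, hours_to_deadline, budget):
    """Greedily add the cheapest machines until the deadline can be met."""
    chosen, capacity, cost = [], 0, 0.0
    for m in sorted(machines, key=lambda m: m["cost_per_job"]):
        if capacity >= n_jobs:
            break
        take = min(n_jobs - capacity, int(m["jobs_per_hour"] * hours_to_deadline))
        if take <= 0 or cost + take * m["cost_per_job"] > budget:
            continue
        chosen.append((m["name"], take))
        capacity += take
        cost += take * m["cost_per_job"]
    feasible = capacity >= n_jobs
    return chosen, cost, feasible

machines = [
    {"name": "cheapFarm", "jobs_per_hour": 6, "cost_per_job": 1.0},
    {"name": "fastGrid",  "jobs_per_hour": 30, "cost_per_job": 4.0},
]
print(pick_resources(n_jobs=120, machines=machines, hours_to_deadline=5, budget=400))
```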

66 GRACE: Grid Architecture for Computational Economy
GRACE aims to help Nimrod/G overcome its current limitations.
The GRACE middleware offers generic interfaces (APIs) that developers of other grid tools can use along with Globus services.

67 Grid Resource Management Architecture (Economic/Market-based)
(Figure: the user's application drives a resource broker made up of a job control agent, schedule advisor, grid explorer, trade manager and deployment agent; the broker queries the grid information server and negotiates with the trade server inside each resource domain, where the local resource manager handles resource reservation and resource allocation across resources R1..Rn.)

68 Why a Computational Economy in Resource Management?
“Observe grid characteristics and current resource management policies”
Grid resources are not owned by the user or by a single organisation; each has its own administrative policy
Mismatch in resource demand and supply: overall resource demand may exceed supply
Traditional system-centric (performance-metric driven) approaches do not suit the grid environment: system-centric --> user-centric
As in real life, an economic-based approach is one of the best ways to regulate selection and scheduling on the grid, as it captures user intent
Markets are an effective institution for coordinating the activities of several entities

69 Advantages of Economic-based RM
System-centric --> user-centric policy in resource management
Helps in regulating demand and supply: resource access cost can fluctuate (based on demand and supply) and the system can adapt
Scalable solution: no need for a central coordinator (during negotiation); resources (sellers) and users (buyers) can make their own decisions and try to maximize utility and profit
Uniform treatment of all resources: everything can be traded, including CPU, memory, network, storage/disk, and other devices/instruments
Efficient allocation of resources

70 Grid Open Trading Protocols
Messages exchanged between the Trade Manager (TM) and the Trade Server (TS), via the trading API; the trade server applies its pricing rules:
Get Connected
Call for Bid (DT)
Reply to Bid (DT)
Negotiate Deal (DT)
Confirm Deal (DT, Y/N)
Cancel Deal (DT)
Change Deal (DT)
Get Disconnected
DT - Deal Template: resource requirements (from the TM side), resource profile (from the TS side), price (either side can set it), status, and a validity period; either side can change these values, negotiation can continue, and either side can accept or decline.
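A hedged Python sketch of the deal-template exchange between a Trade Manager and a Trade Server. The class, field and method names are hypothetical and the price haggling is deliberately naive; this is not the GRACE API.

```python
# Hedged sketch of the deal-template exchange; not the GRACE trading API.
from dataclasses import dataclass, field

@dataclass
class DealTemplate:
    requirements: dict                              # filled in by the Trade Manager
    profile: dict = field(default_factory=dict)     # filled in by the Trade Server
    price: float = 0.0                              # either side may propose/adjust
    status: str = "open"                            # open -> offered -> accepted/declined
    valid_for_rounds: int = 3                       # validity period for the negotiation

class TradeServer:
    def __init__(self, floor_price):
        self.floor_price = floor_price              # from its pricing rules

    def reply_to_bid(self, dt):
        dt.profile = {"cpus": 16, "arch": "linux-cluster"}
        dt.price = max(dt.price, self.floor_price)  # counter-offer
        dt.status = "offered"
        return dt

class TradeManager:
    def __init__(self, ceiling_price):
        self.ceiling_price = ceiling_price

    def negotiate(self, ts, dt):
        while dt.valid_for_rounds > 0:
            dt = ts.reply_to_bid(dt)
            if dt.price <= self.ceiling_price:
                dt.status = "accepted"              # Confirm Deal(DT, Y)
                return dt
            dt.price = self.ceiling_price           # push back with our limit
            dt.valid_for_rounds -= 1
        dt.status = "declined"                      # Confirm Deal(DT, N)
        return dt

tm = TradeManager(ceiling_price=5.0)
ts = TradeServer(floor_price=4.0)
deal = tm.negotiate(ts, DealTemplate(requirements={"cpus": 16, "hours": 3}, price=3.0))
print(deal.status, deal.price)
```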

71 Open Trading Finite State Machine
(State diagram: the TM requests a resource and asks the price; the TS bids; both sides exchange updates to the deal template (DT) and counter-offers (TS offer, TM offer, final offers) until the deal is either accepted or rejected.)
Legend: DT - Deal Template, TM - Trade Manager, TS - Trade Server, DA - Deal Accepted, DN - Deal Not Accepted
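The same negotiation viewed as a finite state machine can be captured with a transition table; the state and event names below paraphrase the figure rather than encode it exactly.

```python
# Rough sketch of the open-trading state machine as a transition table.

TRANSITIONS = {
    ("start",       "tm_request_resource"): "negotiating",
    ("negotiating", "ts_bid"):              "ts_offer",
    ("ts_offer",    "tm_counter_offer"):    "tm_offer",
    ("tm_offer",    "ts_final_offer"):      "ts_offer",
    ("ts_offer",    "tm_accept"):           "deal_accepted",      # DA
    ("ts_offer",    "tm_reject"):           "deal_not_accepted",  # DN
    ("tm_offer",    "ts_reject"):           "deal_not_accepted",  # DN
}

def run(events):
    state = "start"
    for ev in events:
        state = TRANSITIONS.get((state, ev), state)  # ignore events that do not apply
    return state

print(run(["tm_request_resource", "ts_bid", "tm_accept"]))         # deal_accepted
print(run(["tm_request_resource", "ts_bid", "tm_counter_offer",
           "ts_final_offer", "tm_reject"]))                         # deal_not_accepted
```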

72

73 Conclusions
The emergence of the Internet as a powerful connectivity medium is bridging the gap between a number of technologies, leading to what is known as “Everything on IP”.
Cluster-based systems have become a platform of choice for mainstream computing.
An economy-based approach to resource management is the way to go in the grid environment: the user can say “I am willing to pay $…, can you complete my job by this time…?”
Both sequential and parallel applications run seamlessly on desktops, SMPs, clusters, and the Grid without any change.
Grid: a next-generation Internet?

74 Further Information
Cluster Computing Infoware:
Grid Computing Infoware:
Millennium Compute Power Grid/Market Project: you are invited to join the CPG project!
Books:
High Performance Cluster Computing, Vols. 1 and 2, R. Buyya (Ed.), Prentice Hall, 1999.
The GRID, I. Foster and C. Kesselman (Eds.), Morgan Kaufmann, 1999.
IEEE Task Force on Cluster Computing
Grid Forums | GRID'2000 Meeting

75 Thank You… Any Questions?

76 Backup Slides
Thanks to Jack Dongarra; he allowed me to use his survey slides, particularly those that are included here via cut and paste.

77

78

79

80

