1 Grid Computing: like herding cats?
Stephen Jarvis, High Performance Systems Group, University of Warwick, UK

2 Sessions on Grid
What are we going to cover today? A brief history; why we are doing it; applications; users; challenges; middleware.
What are you going to cover next week? A technical talk on the specifics of our work, including application to e-Business and e-Science.

3 An Overused Analogy Electrical Power Grid
Computing power might somehow be like electrical power: plug in, switch on, and have access to unlimited power. We don't know who supplies the power or where it comes from; we just pick up the bill at the end of the month. Is this the future of computing? We tend to think the world we are in today is the way it has always been - not the case. The development of societal structure is bound up with infrastructure. So one question: why is there Chicago? Onions, lakes, mosquitoes - not very promising. In the US at that time, farming was starting: forests were cleared, monoculture farming spread, the bison were killed. The Great Lakes and rivers were the previous means of travel. The emergence of railroads linked the lakes and the new lines. Chicago became a transit centre - a cache for goods. People started to exchange different goods, empowered by the infrastructure. New institutions were formed, such as the CBT financial organisation. "Grid" technologies such as refrigerated vans made export easier, cheaper and further-reaching. New "middleware" created unexpected industries (great retailing etc.). If there was ever a city that resulted from the emergence of infrastructure, it is Chicago.

4 Sounds great - but how long?
Is the computing infrastructure available? Computing power: 1986: Cray X-MP ($8M); 2000: Nintendo-64 ($149); 2003: Earth Simulator (NEC), ASCI Q (LANL); 2005: Blue Gene/L (IBM), 360 Teraflops. Look at the current supercomputer rankings! 1. It took Chicago over 100 years to do this. 2. Computing and comms are driven by exponential growth (it's on steroids). 3. For example, the Cray X-MP (1986) cost $8 million, had its own power substation and special cooling, and no graphics. 4. Its connection was a 56Kb/s NSF link. 5. The N64 in 2000 had the same processing power as the Cray X-MP and cost $149. 6. It uses 5 watts, not 60,000 watts. 7. It has 3-dimensional graphics. 8. Those with broadband or similar have more bandwidth available than the NSF did only a few years ago. 9. Think of today's compute power. 10. Earth Simulator (40 Tflops peak, 35 sustained). 11. ASCI Q at LANL (20 Tflops peak, 13 sustained).

5 Storage & Network
Storage capabilities: 1986: local data stores (MB); 2002: Goddard Earth Observation System - 29TB. Network capabilities: 1986: NSFNET 56Kb/s backbone; 1990s: upgraded to 45Mb/s (gave us the Internet); 2000s: 40 Gb/s. 1. Massive increase in storage capabilities. 2. In 1986, local data stores were of the order of kilobytes/megabytes. 3. The NASA-funded Goddard Earth Observation System stores huge data sets (of the order of terabytes). 4. Networking and communication have grown massively. 5. NSFNET was upgraded to 45Mb/s in the early 90s, leading to the Internet of today. 6. Modern high-speed networks run at gigabit speeds.

6 Many Potential Resources
Terabyte databases, space telescopes, 50M mobile phones, millions of PCs (30% utilisation), supercomputing centres, 10k PlayStation 2s per week - all potential GRID resources. 1. Lots of different resource types (heterogeneous resources). 2. Mobile devices, large data sets, instrumentation, PCs, clusters, supercomputers, even PlayStations. 3. Can these be linked in some sensible way? 4. Well, it isn't as simple as railroads, and not all of this can be done. 5. Sometimes the application can make it easy - for example when it is highly partitionable. 6. Most apps aren't like that - plus they will have large data requirements.

7 Some History: NASA's Information Power Grid
The vision, mid '90s: to promote a revolution in how NASA addresses large-scale science and engineering by providing a persistent HPC infrastructure. Computing and data management services on demand: locate and co-schedule multi-Center resources; address large-scale and/or widely distributed problems. Ancillary services: workflow management and coordination; security, charging, ...

8 Whole system simulations are produced by coupling all of the sub-system simulations
[Diagram: sub-system models coupled into the whole-system simulation - Airframe Models (lift capabilities, drag capabilities, responsiveness), Stabilizer Models, Human Models (crew capabilities: accuracy, perception, stamina, reaction times, SOPs), Engine Models (thrust performance, reverse thrust performance, responsiveness, fuel consumption), Landing Gear Models (braking performance, steering capabilities, traction, dampening capabilities).]

9 [Diagram: grid testbed linking NASA and partner sites - GRC, GSFC, LaRC, JPL, MSFC, JSC, KSC, EDC, NCSA, SDSC, CMU and Boeing - over NREN, NGIX and NTON-II/SuperNet, with MDS information services, MCAT/SRB, DMF, a CA, O2000 clusters and a 300-node Condor pool.]

10 National Air Space Simulation Environment
[Diagram: Virtual National Air Space (VNAS) coupling wing, engine, stabilizer, airframe, human and landing gear models across centres (GRC, ARC, LaRC) - 22,000 commercial US flights a day driving 44,000 wing runs, 50,000 engine runs, 66,000 stabilizer runs, 22,000 airframe impact runs, 48,000 human crew runs and 132,000 landing/take-off gear runs; simulation drivers include FAA ops data, weather data, airline schedule data, digital flight data, radar tracks, terrain data and surface data. Being pulled together under the NASA AvSP Aviation ExtraNet (AEN).]

11 What is a Computational Grid?
A computational grid is a hardware and software infrastructure that provides dependable, consistent, pervasive and inexpensive access to high-end computational capabilities. The capabilities need not be high end. The infrastructure needs to be relatively transparent. 1. Computational grids are for computing (naturally) 2. Capability computing (productivity versus high performance)

12 Selected Grid Projects
US-based: NASA Information Power Grid; DARPA CoABS Grid; DOE Science Grid; NSF National Virtual Observatory; NSF GriPhyN; DOE Particle Physics Data Grid; NSF DTF TeraGrid; DOE ASCI DISCOM Grid; DOE Earth Systems Grid; etc.
EU-based: DataGrid (CERN, ...); EuroGrid (Unicore); Damien (metacomputing); DataTag (transatlantic testbed, ...); Astrophysical Virtual Observatory; GRIP (Globus/Unicore); GRIA (industrial applications); GridLab (Cactus toolkit, ...); CrossGrid (infrastructure components); EGSO (solar physics).
Other national projects: UK - e-Science Grid; Netherlands - VLAM-G, DutchGrid; Germany - UNICORE Grid, D-Grid; France - Etoile Grid; Italy - INFN Grid; Eire - Grid-Ireland; Scandinavia - NorduGrid; Poland - PIONIER Grid; Hungary - DemoGrid; Japan - JpGrid, ITBL; South Korea - N*Grid; Australia - Nimrod-G, ...; Thailand; Singapore; Asia-Pacific Grid.

13 The Big Spend: two examples
US TeraGrid: $100 million US dollars (so far...); 5 supercomputer centres; new ultra-fast optical network (up to 40Gb/s); Grid software and parallel middleware; coordinated virtual organisations; scientific applications and users. UK e-Science Grid: £250 million (so far...); regional e-Science centres; new infrastructure, including SuperJANET4; middleware development; big science projects.

14 e-Science Grid
Edinburgh, Glasgow, DL, Newcastle, Lancaster, White Rose, Belfast, Manchester, Birmingham/Warwick, Cambridge, Oxford, UCL, Bristol, RL, Hinxton, Cardiff, London, Soton.

15 Who wants Grids and why?
NASA: aerospace simulations, air traffic control, NWS, in-aircraft computing, Virtual Airspace, free flight, accident prevention. IBM: on-demand computing infrastructure; protect software; support business computing. Governments: simulation experiments - biodiversity, genomics, military, space science, ...

16 Classes of Grid applications
Distributed supercomputing - examples: DIS (Distributed Interactive Simulation), stellar dynamics, chemistry; characteristics: very large problems, lots of CPU and memory.
High throughput - examples: chip design, cryptography; characteristics: harnessing idle resources (loosely-coupled or independent tasks - get CPUs to work, e.g. SETI).
On demand - examples: medical, weather prediction; characteristics: remote resources, time bounded.
Data intensive - examples: physics, sky surveys; characteristics: synthesis of new information (petabytes per year).
Collaborative - examples: data exploration, virtual environments; characteristics: connection between many parties.

17 Classes of Grid
Data Grid - example: EU DataGrid; characteristics: lots of data sources from one site, processing off site.
Compute Grid - examples: chip design, cryptography; characteristics: harnessing and connecting rare resources.
Scavenging Grid - example: SETI; characteristics: CPU cycle stealing, commodity resources.
Enterprise Grid - example: banking; characteristics: multi-site, but one organisation.

18 Using Distributed Resources
[Diagram: Discovery Net Project - scientific discovery in real time from distributed resources; scientific information sources (literature, databases, operational data, images, instruments) feed real-time integration, dynamic application integration, workflow construction and interactive visual analysis.]

19 Execute distributed annotation workflow
Nucleotide annotation workflows draw on distributed services such as NCBI, EMBL, TIGR, SNP, InterPro, SMART, SWISS-PROT, GO and KEGG: download the sequence from a reference server, execute the distributed annotation workflow, and save the results to a distributed annotation server. Done by hand this takes around 1,800 clicks, 500 web accesses and 200 copy/pastes - 3 weeks of work captured in 1 workflow with a few seconds' execution. A minimal sketch of such a workflow follows.
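
The sketch below illustrates the idea of scripting the manual steps as one workflow. It is an assumption-laden toy: the function names, the `AnnotatedSequence` type and the stubbed service calls are invented for illustration and are not the Discovery Net API or any real annotation service.

```python
# Hypothetical sketch of a distributed annotation workflow; service calls are stubbed.
from dataclasses import dataclass, field

@dataclass
class AnnotatedSequence:
    accession: str
    sequence: str = ""
    annotations: dict = field(default_factory=dict)

def download_sequence(accession: str) -> AnnotatedSequence:
    """Fetch the nucleotide sequence from a reference server (stubbed here)."""
    return AnnotatedSequence(accession=accession, sequence="ATGC...")

def annotate(seq: AnnotatedSequence, service: str) -> AnnotatedSequence:
    """Call one remote annotation service and record its result (stubbed here)."""
    seq.annotations[service] = f"result from {service}"
    return seq

def save_to_das(seq: AnnotatedSequence) -> None:
    """Persist the combined annotations to a distributed annotation server (stub)."""
    print(f"Saved {seq.accession} with {len(seq.annotations)} annotations")

def run_workflow(accession: str) -> None:
    # One scripted workflow replaces thousands of manual clicks and copy/pastes.
    seq = download_sequence(accession)
    for service in ["EMBL", "InterPro", "SMART", "GO", "KEGG"]:
        seq = annotate(seq, service)
    save_to_das(seq)

if __name__ == "__main__":
    run_workflow("AB000001")
```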

20 Grand Challenge: Integrating Different Levels of Simulation
Integrating simulation at the molecular, cellular and organism levels [Sansom et al. (2000) Trends Biochem. Sci. 25:368; Noble (2002) Nature Rev. Mol. Cell. Biol. 3:460]. An e-Science challenge - non-trivial; the NASA IPG is a possible paradigm. The levels need to be integrated rigorously if they are to deliver accurate, and hence biomedically useful, results.

21 Classes of Grid users
End users - purpose: solve problems; make use of: applications; concerns: transparency, performance.
Application developers - purpose: develop applications; make use of: programming models, tools; concerns: ease of use, performance.
Tool developers - purpose: develop tools and programming models; make use of: grid services; concerns: adaptivity, security.
Grid developers - purpose: provide grid services; make use of: existing grid services; concerns: connectivity, security.
System administrators - purpose: management of resources; make use of: management tools; concerns: balancing concerns.

22 Grid architecture
Composed of a hierarchy of sub-systems; scalability is vital. Key elements: end systems (single compute nodes, storage systems, IO devices etc.); clusters (homogeneous networks of workstations; parallel and distributed management); intranets (heterogeneous collections of clusters; geographically distributed); internets (interconnected intranets; no centralised control). A minimal sketch of this hierarchy as a data structure follows. 1. The Grid has a hierarchical structure similar to the Internet principle of autonomous systems. 2. Sub-systems are responsible for themselves, but fit together using standards. 3. Scalability is vital.
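
A minimal sketch of the end system / cluster / intranet / internet hierarchy as a data structure, assuming invented class and field names purely for illustration; no real middleware models the grid exactly this way.

```python
# Illustrative grid hierarchy: end systems -> clusters -> intranets -> grid.
from dataclasses import dataclass, field
from typing import List

@dataclass
class EndSystem:
    name: str
    cpus: int
    storage_tb: float

@dataclass
class Cluster:
    name: str                                   # single administrative domain
    nodes: List[EndSystem] = field(default_factory=list)

@dataclass
class Intranet:
    organisation: str                           # heterogeneous clusters, one admin domain
    clusters: List[Cluster] = field(default_factory=list)

@dataclass
class Grid:
    intranets: List[Intranet] = field(default_factory=list)  # no centralised control

    def total_cpus(self) -> int:
        # Aggregate a property by walking the whole hierarchy.
        return sum(node.cpus
                   for intranet in self.intranets
                   for cluster in intranet.clusters
                   for node in cluster.nodes)

grid = Grid([Intranet("Warwick", [Cluster("hpsg", [EndSystem("n01", 4, 0.5)])])])
print(grid.total_cpus())  # 4
```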

23 End Systems
State of the art: privileged OS with complete control of resources and services; the integrated nature allows high performance; plenty of high-level languages and tools. Future directions: end systems currently lack features for integration into larger systems; OS support for distributed computation; mobile code (sandboxing); reduction in network overheads.

24 Clusters
State of the art: high-speed LAN with 100s or 1000s of nodes; single administrative domain; programming libraries like MPI; inter-process communication, co-scheduling (a minimal MPI sketch follows). Future directions: performance improvements; OS support.
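
The slide names MPI as the cluster-level programming library; here is a minimal inter-process communication sketch using the mpi4py binding (the choice of binding and the tiny ping-pong task are assumptions for illustration).

```python
# Run with e.g.: mpiexec -n 2 python ping.py
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()

if rank == 0:
    # Rank 0 sends a small piece of work to rank 1 and waits for the answer.
    comm.send({"task": "square", "value": 7}, dest=1, tag=11)
    result = comm.recv(source=1, tag=22)
    print(f"rank 0 received result: {result}")
elif rank == 1:
    # Rank 1 receives the task, computes, and replies.
    task = comm.recv(source=0, tag=11)
    comm.send(task["value"] ** 2, dest=0, tag=22)
```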

25 Intranets
State of the art: grids of many resources, but one administrative domain; management of heterogeneous resources; data sharing (e.g. databases, web services); supporting software environments including CORBA; load-sharing systems such as LSF and Condor; resource discovery. Future directions: increasing complexity (physical scale etc.); performance; lack of global knowledge.

26 Internets
State of the art: geographical distribution, no central control; data sharing is very successful; management is difficult. Future directions: sharing other computing services (e.g. computation); identification of resources; transparency; Internet services.

27 Basic Grid services
Authentication: can the users use the system, and what jobs can they run? Acquiring resources: what resources are available? Resource allocation policy; scheduling. Security: is the data safe? Is the user process safe? Accounting: is the service free, or should the user pay? A minimal sketch of these checks follows.
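
A hypothetical sketch of how a grid front end might string these services together: authentication, a simple resource-acquisition policy, and accounting. Every name, policy and number below is an illustrative assumption, not a real grid middleware API.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Resource:
    name: str
    free_cpus: int
    cost_per_cpu_hour: float

USERS = {"alice": {"allowed_jobs": {"simulation", "analysis"}}}
RESOURCES = [Resource("clusterA", 64, 0.05), Resource("clusterB", 0, 0.02)]

def authenticate(user: str, job_type: str) -> bool:
    # Can this user use the system, and may they run this kind of job?
    return user in USERS and job_type in USERS[user]["allowed_jobs"]

def acquire(cpus_needed: int) -> Optional[Resource]:
    # Simple allocation policy: first resource with enough free CPUs.
    return next((r for r in RESOURCES if r.free_cpus >= cpus_needed), None)

def account(resource: Resource, cpus: int, hours: float) -> float:
    # Should the user pay, and how much?
    return cpus * hours * resource.cost_per_cpu_hour

if authenticate("alice", "simulation"):
    res = acquire(cpus_needed=32)
    if res:
        print(f"Running on {res.name}, bill = ${account(res, 32, 2.0):.2f}")
```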

28 Research Challenges (#1)
Grid computing is a relatively new area and there are many challenges. Nature of applications: new methods of scientific and business computing. Programming models and tools: rethinking programming, algorithms, abstraction etc.; use of software components/services. System architecture: minimal demands should be placed on contributing sites; scalability; evolution of future systems and services.

29 Research Challenges (#2)
Problem solving methods: latency- and fault-tolerant strategies; highly concurrent and speculative execution. Resource management: how are the resources shared? How do we achieve end-to-end performance? We need to specify QoS requirements and then translate them down to the resource level. Contention? A minimal sketch of such a translation follows.
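
A hypothetical sketch of translating a user-level QoS requirement (an end-to-end deadline) into a resource-level check and choice. The cost model, resource list and numbers are invented for illustration only.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class Resource:
    name: str
    cpus: int
    flops_per_cpu: float     # sustained floating-point rate per CPU

@dataclass
class QoS:
    work_flops: float        # total work in the job
    deadline_seconds: float  # end-to-end requirement stated by the user

def meets_deadline(job: QoS, r: Resource) -> bool:
    # Resource-level condition derived from the user-level QoS requirement.
    runtime = job.work_flops / (r.cpus * r.flops_per_cpu)
    return runtime <= job.deadline_seconds

def select_resource(job: QoS, resources: List[Resource]) -> Optional[Resource]:
    candidates = [r for r in resources if meets_deadline(job, r)]
    # Prefer the smallest resource that still meets the deadline (less contention).
    return min(candidates, key=lambda r: r.cpus, default=None)

resources = [Resource("pool", 200, 1e9), Resource("hpc", 2048, 5e9)]
job = QoS(work_flops=4e13, deadline_seconds=60.0)
chosen = select_resource(job, resources)
print(chosen.name if chosen else "no resource meets the QoS requirement")
```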

30 Research Challenges (#3)
Security: how do we safely share data, resources and tasks? How is code transferred? How does licensing work? Instrumentation and performance: how do we maintain good performance? How can load-balancing be controlled? How do we measure grid performance? Networking and infrastructure: grids have a significant impact on networking; we need to combine high and low bandwidth links.

31 Development of middleware
Many people see middleware as the vital ingredient. Globus toolkit: component services for security, resource location, resource management and information services. OGSA (Open Grid Services Architecture): drawing on web services technology. GGF (Global Grid Forum): international organisation driving Grid development; contains partners such as Microsoft, IBM, NASA etc.

32 Middleware Conceptual Layers
Applications: workload generation, visualisation, ... Middleware: discovery, mapping, scheduling, security, accounting, ... Resources: computing, storage, instrumentation, ... 1. What is middleware? 2. It is the bit in between the resources and the users. 3. The glue, if you like.

33 Requirements include:
Offers up useful resources; accessible and usable resources; stable and adequately supported; a single-user 'laptop feel'. Middleware has much of this responsibility.

34 Demanding management issues
Users are (currently) likely to be sophisticated, but probably not computer 'techies'. We need to hide detail and 'obscene' complexity, provide the vision of access to full resources, and provide contracts for level(s) of support (SLAs).

35 Key Interface between Applications & Machines
Gate keeper / manager: acts as resource manager; responsible for mapping applications to resources, scheduling tasks and ensuring service level agreements (SLAs); distributed / dynamic. It is the key interface between applications and machines. A minimal scheduling sketch follows.
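
A hypothetical gatekeeper sketch: map incoming tasks to machines and reject placements that would break a simple SLA, here modelled as a response-time bound. The class names, the SLA model and the data are illustrative assumptions, not the HPSG middleware or any real resource manager.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class Machine:
    name: str
    speed: float                                        # work units per second
    queue: List[float] = field(default_factory=list)    # outstanding work units

    def predicted_response(self, work: float) -> float:
        # Time to drain the current queue plus this task, on this machine.
        return (sum(self.queue) + work) / self.speed

@dataclass
class Task:
    name: str
    work: float
    sla_seconds: float                                   # agreed maximum response time

class GateKeeper:
    def __init__(self, machines: List[Machine]):
        self.machines = machines

    def submit(self, task: Task) -> Optional[Machine]:
        # Choose the machine with the best predicted response time ...
        best = min(self.machines, key=lambda m: m.predicted_response(task.work))
        # ... but only accept the task if the SLA can still be met.
        if best.predicted_response(task.work) > task.sla_seconds:
            return None
        best.queue.append(task.work)
        return best

gk = GateKeeper([Machine("fast", 10.0), Machine("slow", 2.0)])
placed = gk.submit(Task("render", work=50.0, sla_seconds=10.0))
print(placed.name if placed else "rejected: SLA cannot be met")
```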

36 Middleware Projects Globus, Argonne National Labs, USA
AppLeS, UC San Diego, USA; Open Grid Services Architecture (OGSA); ICENI, Imperial, UK; Nimrod, Melbourne, Australia; many others... including us!! 1. Middleware is a complex problem attracting a lot of attention.

37 HPSG’s approach: Determine what resources are required
(advertise); determine what resources are available (discovery); map requirements to available resources (scheduling); maintain a contract of performance (service level agreement). Performance drives the middleware decisions - PACE.

38 High Performance Systems Group, Warwick
‘[The Grid] intends to make access to computing power, scientific data repositories and experimental facilities as easy as the Web makes access to information.’ - Tony Blair, 2002

39 And herding cats ...
100,000s of computers; satellite links and miles of networking; space telescopes, atomic colliders, medical scanners; terabytes of data; a software stack a mile high...
