Issues for Grids and WorldWide Computing. Harvey B. Newman, California Institute of Technology. ACAT2000, Fermilab, October 19, 2000.

1 Issues for Grids and WorldWide Computing. Harvey B. Newman, California Institute of Technology. ACAT2000, Fermilab, October 19, 2000

2 LHC Vision: Data Grid Hierarchy
[Diagram of the tiered model:]
- Experiment -> Online System at ~PBytes/sec: 1 bunch crossing gives ~17 interactions per 25 nsecs; 100 triggers per second; each event is ~1 MByte in size
- Online System -> Offline Farm, CERN Computer Ctr (> 20 TIPS), the Tier 0+1 centre, at ~100 MBytes/sec
- Tier 0+1 -> Tier 1 national centres (FNAL, France, Italy, UK) at ~0.6-2.5 Gbits/sec
- Tier 1 -> Tier 2 regional centres at ~622 Mbits/sec to ~2.5 Gbits/sec
- Tier 2 -> Tier 3 institutes (~0.25 TIPS) at 100-1000 Mbits/sec, with physics data caches; Tier 4: physicists' workstations
- Physicists work on analysis "channels"; each institute has ~10 physicists working on one or more channels
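As a quick check on the rates in this hierarchy, the sketch below multiplies the quoted event size by the trigger rate, which reproduces the ~100 MBytes/sec link into the offline farm, and projects a yearly raw-data volume; the 1e7 seconds of effective running per year is an assumed figure, not taken from the slide.

```java
// Back-of-the-envelope check of the Tier 0 rates quoted on the slide.
// The 1e7 s/year of effective running time is an assumption, not from the slide.
public class TierRates {
    public static void main(String[] args) {
        double eventSizeMB    = 1.0;    // ~1 MByte per event (from the slide)
        double triggerRateHz  = 100.0;  // ~100 triggers per second (from the slide)
        double liveSecPerYear = 1e7;    // assumed effective running time per year

        double rateMBps     = eventSizeMB * triggerRateHz;         // MB/s into the offline farm
        double rawPBPerYear = rateMBps * liveSecPerYear / 1e9;     // 1 PB = 1e9 MB

        System.out.printf("Online -> offline farm: ~%.0f MBytes/sec (~%.1f Gbits/sec)%n",
                          rateMBps, rateMBps * 8.0 / 1000.0);
        System.out.printf("Raw data volume: ~%.1f PBytes/year (before reconstruction and copies)%n",
                          rawPBPerYear);
    }
}
```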

3 US-CERN Link BW Requirements Projection (PRELIMINARY). [#] Includes ~1.5 Gbps each for ATLAS and CMS, plus BaBar, Run2 and other. [*] D0 and CDF at Run2: needs presumed to be comparable to BaBar

4 Grids: The Broader Issues and Requirements
- A new level of intersite cooperation and resource sharing
  - Security and authentication across world-region boundaries
  - Start with cooperation among Grid projects (PPDG, GriPhyN, EU DataGrid, etc.)
- Develop methods for effective HEP/CS collaboration in Grid and VDT design
  - Joint design and prototyping effort, with (iterative) design specifications
  - Find an appropriate level of abstraction: adapted to more than one experiment and more than one working environment
- Be ready to adapt to the coming revolutions in network, collaborative, and Internet information technologies

5 PPDG
[Diagram: the Particle Physics Data Grid links the experiment data management efforts (BaBar, D0, CDF, CMS, ATLAS, Nuclear Physics) with the middleware user communities and teams (Globus, SRB, Condor, HENP GC).]

6 GriPhyN: PetaScale Virtual Data Grids
Goal: build the foundation for petascale virtual data grids.
[Diagram of the layered architecture:]
- Users: production teams, individual investigators and workgroups, via interactive user tools
- Virtual data tools; request planning and scheduling tools; request execution and management tools
- Underpinned by resource management services, security and policy services, and other Grid services
- Operating over transforms, raw data sources, and distributed resources (code, storage, computers, and networks)
(A sketch of the virtual-data request idea follows this slide.)
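To make the "virtual data" idea concrete: a request names a data product, which is served from an existing replica if one is materialized, or (re)derived by executing its transform otherwise. The sketch below illustrates only this planning/execution split; VirtualDataCatalog, Transform and the method names are illustrative assumptions, not the GriPhyN toolkit API.

```java
import java.util.HashMap;
import java.util.Map;

// Minimal sketch of "virtual data": a request is satisfied either from an existing
// replica or by executing the transform that derives the product.
// These names are illustrative assumptions, not the GriPhyN API.
interface Transform {
    byte[] derive();                       // (re)derive the data product from its inputs
}

class VirtualDataCatalog {
    private final Map<String, byte[]>    materialized = new HashMap<>();
    private final Map<String, Transform> recipes      = new HashMap<>();

    void register(String product, Transform recipe) { recipes.put(product, recipe); }

    // Request planning in miniature: prefer an existing copy, otherwise run the transform.
    byte[] request(String product) {
        byte[] existing = materialized.get(product);
        if (existing != null) return existing;            // already materialized somewhere
        Transform t = recipes.get(product);
        if (t == null) throw new IllegalArgumentException("Unknown data product: " + product);
        byte[] derived = t.derive();                      // request execution step
        materialized.put(product, derived);               // cache for later requests
        return derived;
    }
}
```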

7 EU-Grid Project Work Packages. [Diagram of the project work packages.]

8 Grid Issues: A Short List of Coming Revolutions
- Network technologies
  - Wireless broadband (from ca. 2003)
  - 10 Gigabit Ethernet (from 2002; see www.10gea.org); 10GbE/DWDM-wavelength (OC-192) integration: OXC
- Internet information software technologies
  - Global information "broadcast" architecture, e.g. the Multipoint Information Distribution Protocol (MIDP; Tie.Liao@inria.fr)
  - Programmable coordinated agent architectures, e.g. Mobile Agent Reactive Spaces (MARS) by Cabri et al., Univ. of Modena
- The "Data Grid" - human interface
  - Interactive monitoring and control of Grid resources, by authorized groups and individuals and by autonomous agents

9 CA*net 3: National Optical Internet in Canada
[Map: GigaPOPs at Vancouver, Calgary, Regina, Winnipeg, Ottawa, Montreal, Toronto, Halifax, St. John's, Fredericton and Charlottetown; regional networks ORAN, BCnet, Netera, SRnet, MRnet, ONet, RISQ and ACORN; CA*net 3 primary route via Chicago (STAR TAP), with a diverse route via Seattle, New York and Los Angeles.]
- 16-channel DWDM: 8 wavelengths at OC-192 reserved for CANARIE, 8 wavelengths for carrier and other customers
- Deploying a 4-channel CWDM Gigabit Ethernet network (400 km) and a 4-channel Gigabit Ethernet transparent optical DWDM network (1500 km)
- Multiple customer-owned (condo) dark fiber networks connecting universities, schools and hospitals
- Consortium partners: Bell Nexxia, Nortel, Cisco, JDS Uniphase, Newbridge

10 CA*net 4: Possible Architecture
[Map: the same GigaPOP cities plus Miami, with a link to Europe.]
- Dedicated wavelengths or SONET channels
- OBGP switches
- Optional Layer 3 aggregation service
- Large-channel WDM system

11 OBGP Traffic Engineering - Physical
[Diagram: AS 1 through AS 5 with Tier 1, Tier 2 and intermediate ISPs; for simplicity only data forwarding paths in one direction are shown; red marks the default wavelength.]
- The bulk of AS 1's traffic is to the Tier 1 ISP; AS 1 is dual-connected via a router to AS 5
- The optical switch looks like a BGP router, so AS 1 is directly connected to the Tier 1 ISP but still transits AS 5
- The router redirects networks with heavy traffic load to the optical switch, but routing policy is still maintained by the ISP (see the sketch after this slide)
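The redirection rule above can be caricatured as a per-prefix threshold test: routing policy stays with the ISP, but prefixes carrying heavy measured load are forwarded over the optical bypass. Everything in the sketch (class names, the 500 Mbps threshold) is an assumption for illustration, not part of any OBGP specification.

```java
import java.util.HashMap;
import java.util.Map;

// Toy illustration of the OBGP idea: routing policy stays with the ISP, but prefixes
// whose measured load exceeds a threshold are forwarded over the optical bypass.
public class ObgpRedirect {
    enum Path { DEFAULT_WAVELENGTH, OPTICAL_BYPASS }

    private final Map<String, Double> loadMbps = new HashMap<>();  // measured per-prefix load
    private final double bypassThresholdMbps;

    ObgpRedirect(double bypassThresholdMbps) { this.bypassThresholdMbps = bypassThresholdMbps; }

    void observe(String prefix, double mbps) { loadMbps.put(prefix, mbps); }

    // Forwarding decision only; which routes are accepted/announced (the BGP policy)
    // is unchanged and remains under the ISP's control.
    Path pathFor(String prefix) {
        double mbps = loadMbps.getOrDefault(prefix, 0.0);
        return mbps > bypassThresholdMbps ? Path.OPTICAL_BYPASS : Path.DEFAULT_WAVELENGTH;
    }

    public static void main(String[] args) {
        ObgpRedirect te = new ObgpRedirect(500.0);          // assumed redirect threshold
        te.observe("192.0.2.0/24", 40.0);
        te.observe("198.51.100.0/24", 900.0);               // heavy flow toward the Tier 1 ISP
        System.out.println(te.pathFor("192.0.2.0/24"));     // DEFAULT_WAVELENGTH
        System.out.println(te.pathFor("198.51.100.0/24"));  // OPTICAL_BYPASS
    }
}
```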

12 VRVS Remote Collaboration System: Statistics. 30 reflectors, 52 countries. Mbone, H.323, MPEG2 streaming, VNC.

13 VRVS: Mbone/H.323/QT Snapshot

14 VRVS R&D: Sharing Desktop. VNC technology is integrated in the upcoming VRVS release.

15 Worldwide Computing Issues
- Beyond Grid prototype components: integration of Grid prototypes for end-to-end data transport
  - Particle Physics Data Grid (PPDG) ReqM; SAM in D0
  - PPDG/EU DataGrid GDMP for CMS HLT productions
- Start building the Grid system(s): integration with experiment-specific software frameworks
- Derivation of strategies (MONARC simulation system)
  - Data caching, query estimation, co-scheduling
  - Load balancing and workload management amongst Tier0/Tier1/Tier2 sites (SONN by Legrand)
  - Transaction robustness: simulate and verify
- Transparent interfaces for replica management
  - Deep versus shallow copies: thresholds; tracking, monitoring and control (see the sketch after this slide)
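One plausible reading of "deep versus shallow copies with thresholds" is sketched below: collections below a size threshold are copied in full, larger ones are registered as references and only tracked. The threshold value and the Copy types are assumptions for illustration, not an interface from PPDG or the EU DataGrid.

```java
// Sketch of a threshold rule for replica management: below a size threshold a request
// gets a deep copy (data shipped), above it a shallow copy (a tracked reference only).
public class ReplicaCopyPolicy {
    interface Copy {}
    record DeepCopy(String localPath) implements Copy {}        // data physically transferred
    record ShallowCopy(String remoteUrl) implements Copy {}     // only a monitored reference

    private final long deepCopyThresholdBytes;                  // assumed tunable threshold

    ReplicaCopyPolicy(long deepCopyThresholdBytes) { this.deepCopyThresholdBytes = deepCopyThresholdBytes; }

    Copy copyFor(String remoteUrl, long sizeBytes) {
        if (sizeBytes <= deepCopyThresholdBytes) {
            return new DeepCopy(transfer(remoteUrl));           // worth shipping the data
        }
        return new ShallowCopy(remoteUrl);                      // track and monitor, do not move
    }

    private String transfer(String remoteUrl) {
        // placeholder for the actual file transfer step in a real Grid
        return "/local/cache/" + remoteUrl.hashCode();
    }
}
```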

16 Grid Data Management Prototype (GDMP)
Distributed job execution and data handling goals: transparency, performance, security, fault tolerance, automation.
[Diagram: a job is submitted at Site A; data is replicated between Sites A, B and C.]
- Jobs are executed locally or remotely
- Data is always written locally by the job
- Data is replicated to remote sites (see the sketch after this slide)
GDMP V1.1: Caltech + EU DataGrid WP2. Tests by Caltech, CERN, FNAL and Pisa for the CMS "HLT" production 10/2000; integration with ENSTORE, HPSS and Castor.
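The job/data flow on this slide (execute anywhere, always write output locally, then replicate) can be sketched as follows; Site, runJob and replicate are illustrative names, not the actual GDMP interfaces.

```java
import java.util.List;

// Sketch of the GDMP-style flow on the slide: a job writes its output to local storage
// first, and the new file is then replicated to the other sites.
public class GdmpFlowSketch {
    record Site(String name) {}

    static String runJob(Site where, String jobId) {
        // 1. The job executes locally or remotely, but its output is written locally.
        String localFile = "/data/" + where.name() + "/" + jobId + ".root";
        System.out.println("Job " + jobId + " wrote " + localFile + " at " + where.name());
        return localFile;
    }

    static void replicate(String localFile, Site origin, List<Site> allSites) {
        // 2. The file is published and copied to the remote sites.
        for (Site s : allSites) {
            if (!s.equals(origin)) {
                System.out.println("Replicating " + localFile + " from " + origin.name() + " to " + s.name());
            }
        }
    }

    public static void main(String[] args) {
        Site a = new Site("SiteA"), b = new Site("SiteB"), c = new Site("SiteC");
        String out = runJob(a, "hlt-00042");
        replicate(out, a, List.of(a, b, c));
    }
}
```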

17 MONARC Simulation: Physics Analysis at Regional Centres
- Similar data processing jobs are performed in each of several Regional Centres (RCs)
- There is a profile of jobs, each submitted to a job scheduler
- Each Centre has the "TAG" and "AOD" databases replicated
- The Main Centre provides the "ESD" and "RAW" data
- Each job processes AOD data, and also a fraction of the ESD and RAW data (see the estimate after this slide)
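The per-job data volume implied by this model (all of the AOD for the selected events, plus a fraction of the ESD and RAW) can be estimated as in the sketch below; every numerical value in it is a placeholder assumption, not a MONARC input.

```java
// Rough estimate of the data one analysis job touches under the model on the slide:
// AOD for all selected events, plus a fraction of ESD and RAW.
// All numbers below are placeholder assumptions, except the ~1 MB RAW event size,
// which is taken from the hierarchy slide.
public class AnalysisJobVolume {
    public static void main(String[] args) {
        long   events      = 10_000_000L;  // events selected by the job (assumed)
        double aodKB       = 10.0;         // AOD size per event (assumed)
        double esdKB       = 100.0;        // ESD size per event (assumed)
        double rawKB       = 1000.0;       // RAW event size, ~1 MByte
        double esdFraction = 0.05;         // fraction of events needing ESD (assumed)
        double rawFraction = 0.005;        // fraction of events needing RAW (assumed)

        double totalGB = events * (aodKB + esdFraction * esdKB + rawFraction * rawKB) / 1e6;
        System.out.printf("Data read by one job: ~%.0f GB%n", totalGB);
    }
}
```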

18 ORCA Production on the CERN/IT-Loaned Event Filter Farm Test Facility
[Diagram: pileup and signal Objectivity database servers, HPSS mass storage, output servers and lock servers (SUN), feeding a farm of 140 processing nodes; 17 servers, 9 servers, a total of 24 pileup servers, 6 servers for signal, 2 Objectivity federations.]
The strategy is to use many commodity PCs as database servers.

19 Network Traffic and Job Efficiency
[Plots: measurement versus simulation of network traffic for jet and muon jobs.] Mean measured value ~48 MB/s.

20 From User Federation to Private Copy (ORCA 4 tutorial, part II, 14 October 2000)
[Diagram: the databases of the user federation (UF.boot), labelled CD, CH, MD, MH, TH, MC and TD, accessed via AMS, with a user collection copied into a private federation (MyFED.boot).]

21 Beyond Traditional Architectures: Mobile Agents
Mobile agents are (semi-)autonomous, goal-driven and adaptive: "Agents are objects with rules and legs" -- D. Taylor
- Execute asynchronously
- Reduce network load: local conversations
- Overcome network latency and some outages
- Adaptive -> robust, fault tolerant
- Naturally heterogeneous
- Extensible concept: coordinated agent architectures (a minimal agent sketch follows this slide)
[Diagram: Application - Service - Agent]
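To make "objects with rules and legs" concrete, a mobile agent can be pictured as a serializable object that runs asynchronously at a node and can ask its host to ship it elsewhere. The interface below is a minimal sketch under that picture; AgentHost and the method names are assumptions, not the Aglets or MARS API.

```java
import java.io.Serializable;

// Minimal sketch of a mobile agent: an object that carries its own state and behavior,
// runs asynchronously at a node, and can ask to be moved to another node.
interface AgentHost {
    void dispatch(MobileAgent agent, String destination);   // serialize and ship the agent
}

abstract class MobileAgent implements Serializable, Runnable {
    protected transient AgentHost host;   // set by the hosting runtime on arrival

    void onArrival(AgentHost h) {         // called when the agent lands on a node
        this.host = h;
        new Thread(this).start();         // execute asynchronously, locally to the data
    }

    protected void migrate(String destination) {
        host.dispatch(this, destination); // "legs": move the computation, not the data
    }
}
```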

22 Coordination Architectures for Mobile Java Agents
- A lot of progress since 1998
- Fourth-generation architecture: "associative blackboards"
  - After 1) client/server, 2) meeting-oriented, 3) blackboards
  - Analogous to CMS ORCA software: observer-based "action on demand"
- MARS: Mobile Agent Reactive Spaces (Cabri et al.); see http://sirio.dsi.unimo.it/MOON
  - Resilient and scalable; simple implementation
  - Works with standard agent implementations (e.g. Aglets: http://www.trl.ibm.co.jp)
  - Data-oriented, to provide temporal and spatial asynchronicity (see JavaSpaces, PageSpaces)
  - Programmable, authorized reactions, based on "virtual tuple spaces" (see the sketch after this slide)
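The data-oriented coordination style named above (temporal and spatial decoupling via an associative blackboard) can be shown with a tiny tuple space: producers write tuples, consumers take whatever matches a template, and the two never need to meet. This is a sketch of the generic Linda/JavaSpaces idea only, not the MARS or JavaSpaces API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

// Tiny associative-blackboard sketch: agents coordinate by writing and taking tuples,
// so producer and consumer are decoupled in both time and space.
class TupleSpace<T> {
    private final List<T> tuples = new ArrayList<>();

    synchronized void write(T tuple) {
        tuples.add(tuple);
        notifyAll();                                   // wake agents blocked in take()
    }

    // Blocking, associative retrieval: take the first tuple matching the template.
    synchronized T take(Predicate<T> template) throws InterruptedException {
        while (true) {
            for (T t : tuples) {
                if (template.test(t)) { tuples.remove(t); return t; }
            }
            wait();                                    // temporal decoupling: wait for a producer
        }
    }
}
```

An agent arriving at a node could, for example, call take(s -> s.startsWith("task:")) on a TupleSpace<String> without knowing which agent wrote the task, or when.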

23 Mobile Agent Reactive Spaces (MARS) Architecture
- MARS programmed reactions are based on metalevel 4-tuples: (Reaction, Tuple, Operation-Type, Agent-ID)
  - Allows security and policies
  - Allows production of tuples on demand (see the sketch after this slide)
[Diagram: network nodes on the Internet, each with an agent server, a tuple space and a metalevel tuple space. A: agents arrive; B: they get a reference to the local tuple space; C: they access the tuple space; D: the tuple space reacts, with programmed behavior.]
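The metalevel 4-tuples can be pictured as a table of programmed reactions consulted on every access to the base-level tuple space, which is where policy enforcement and production of tuples on demand would hook in. The sketch below illustrates that lookup; the types and names are assumptions, not the MARS implementation.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

// Sketch of the MARS metalevel idea: each tuple-space access is matched against
// programmed (Reaction, Tuple-template, Operation-Type, Agent-ID) entries, which can
// enforce policy or produce tuples on demand. Names and types are illustrative only.
public class MetaLevelSpace {
    enum Op { READ, TAKE, WRITE }

    interface Reaction { Object onAccess(Object tuple, Op op, String agentId); }

    record MetaEntry(Reaction reaction, Predicate<Object> template, Op op, String agentId) {}

    private final List<MetaEntry> metaLevel = new ArrayList<>();

    void program(Reaction r, Predicate<Object> template, Op op, String agentId) {
        metaLevel.add(new MetaEntry(r, template, op, agentId));   // install a programmed reaction
    }

    // Called by the base-level tuple space on each access by an agent.
    Object react(Object tuple, Op op, String agentId) {
        for (MetaEntry e : metaLevel) {
            boolean agentMatches = e.agentId() == null || e.agentId().equals(agentId);
            if (e.op() == op && agentMatches && e.template().test(tuple)) {
                return e.reaction().onAccess(tuple, op, agentId); // e.g. deny, transform, or create a tuple
            }
        }
        return tuple;                                             // no reaction: default behavior
    }
}
```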

24 GRIDs In 2000: Summary. Grids are (in) our Future… Let's Get to Work

25 Grid Data Management Issues
- Data movement and responsibility for updating the Replica Catalog (see the interface sketch after this slide)
- Metadata update and replica consistency
  - Concurrency and locking
- Performance characteristics of replicas
- Advance reservation: policy, time limit
  - How to advertise policy and resource availability
- Pull versus push (strategy; security)
- Fault tolerance; recovery procedures
- Queue management
- Access control, both global and local
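Several of these issues (catalog updates, locking, advance reservation, access control) meet in the shape of a replica catalog interface. The sketch below is an assumed, simplified interface to show where each concern would live; it is not the PPDG, Globus or EU DataGrid replica catalog API.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.List;

// Simplified sketch of a replica catalog interface touching several issues on the slide:
// registering/removing replicas, locking for consistent metadata updates, advance
// reservation with a time limit, and access control checks.
interface ReplicaCatalog {
    List<String> lookup(String logicalFileName);                  // logical -> physical names

    void register(String logicalFileName, String physicalUrl);    // after a successful transfer
    void unregister(String logicalFileName, String physicalUrl);  // responsibility of the mover

    // Concurrency and locking for metadata updates / replica consistency.
    AutoCloseable lock(String logicalFileName) throws InterruptedException;

    // Advance reservation of space or bandwidth, bounded by a time limit and local policy.
    String reserve(String site, long bytes, Instant start, Duration timeLimit);

    // Access control, both global (collaboration-level) and local (site-level).
    boolean mayAccess(String agentId, String logicalFileName);
}
```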

