University of Mississippi (Nov. 11, 2003) Paul Avery, University of Florida: Data Grids Enabling Data Intensive Global Science (Physics Colloquium)


University of Mississippi (Nov. 11, 2003)Paul Avery1 University of Florida Data Grids Enabling Data Intensive Global Science Physics Colloquium University of Mississippi November 11, 2003

University of Mississippi (Nov. 11, 2003)Paul Avery2 Overview  Grids, Data Grids and examples  LHC computing as principal Grid driver  Grid & network projects  Sample Grid R&D efforts  Promising new directions  Success story for HEP  Leadership, partnership, collaboration

University of Mississippi (Nov. 11, 2003)Paul Avery3 The Grid Concept  Grid: Geographically distributed computing resources configured for coordinated use  Fabric: Physical resources & networks provide raw capability  Middleware: Software ties it all together: tools, services, etc.  Ownership: Resources controlled by owners and shared w/ others  Goal: Transparent resource sharing

University of Mississippi (Nov. 11, 2003)Paul Avery4 Grids and Resource Sharing  Resources for complex problems are distributed  Advanced scientific instruments (accelerators, telescopes, …)  Storage, computing, people, institutions  Organizations require access to common services  Research collaborations (physics, astronomy, engineering, …)  Government agencies, health care organizations, corporations, …  Grids make possible “Virtual Organizations”  Create a “VO” from geographically separated components  Make all community resources available to any VO member  Leverage strengths at different institutions  Grids require a foundation of strong networking  Communication tools, visualization  High-speed data transmission, instrument operation

University of Mississippi (Nov. 11, 2003)Paul Avery5 Grid Challenges  Operate a fundamentally complex entity  Geographically distributed resources  Each resource under different administrative control  Many failure modes  Manage workflow of 1000s of jobs across Grid  Balance policy vs. instantaneous capability to complete tasks  Balance effective resource use vs. fast turnaround for priority jobs  Match resource usage to policy over the long term  Maintain a global view of resources and system state  Coherent end-to-end system monitoring  Adaptive learning for execution optimization  Build managed system & integrated user environment

University of Mississippi (Nov. 11, 2003)Paul Avery6 Data Grids & Data Intensive Sciences  Scientific discovery increasingly driven by data collection  Computationally intensive analyses  Massive data collections  Data distributed across networks of varying capability  Internationally distributed collaborations  Dominant factor: data growth (1 Petabyte = 1000 TB)  2000: ~0.5 Petabyte  2005: ~10 Petabytes  2010: ~100 Petabytes  2015: ~1000 Petabytes? How to collect, manage, access and interpret this quantity of data? Drives demand for “Data Grids” to handle the additional dimension of data access & movement
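The growth figures above imply a roughly exponential trend. A back-of-the-envelope calculation, using only the slide's own numbers, gives the implied doubling time:

```python
import math

# Projected HEP data volumes from the slide, in petabytes
volumes = {2000: 0.5, 2005: 10, 2010: 100, 2015: 1000}

# Compound growth implied between 2000 and 2015
factor = volumes[2015] / volumes[2000]        # 2000x over 15 years
annual_growth = factor ** (1 / 15)            # ~1.66x per year
doubling_years = math.log(2) / math.log(annual_growth)

print(f"growth: {annual_growth:.2f}x/year, doubling every {doubling_years:.2f} years")
```

That is, data volume doubling roughly every 16 months: comparable to, and eventually outpacing, the growth in processing and storage capacity per dollar.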

University of Mississippi (Nov. 11, 2003)Paul Avery7 Data Intensive Physical Sciences  High energy & nuclear physics  Belle/BaBar, Tevatron, RHIC, JLAB, LHC  Astronomy  Digital sky surveys: SDSS, VISTA, other Gigapixel arrays  VLBI arrays: multiple-Gbps data streams  “Virtual” Observatories (multi-wavelength astronomy)  Gravity wave searches  LIGO, GEO, VIRGO, TAMA  Time-dependent 3-D systems (simulation & data)  Earth Observation  Climate modeling, oceanography, coastal dynamics  Geophysics, earthquake modeling  Fluids, aerodynamic design  Pollutant dispersal

University of Mississippi (Nov. 11, 2003)Paul Avery8 Data Intensive Biology and Medicine  Medical data and imaging  X-Ray, mammography data, etc. (many petabytes)  Radiation Oncology (real-time display of 3-D images)  X-ray crystallography  Bright X-Ray sources, e.g. Argonne Advanced Photon Source  Molecular genomics and related disciplines  Human Genome, other genome databases  Proteomics (protein structure, activities, …)  Protein interactions, drug delivery  High-res brain scans (1-10 μm, time dependent)

University of Mississippi (Nov. 11, 2003)Paul Avery9 LHC and Data Grids

University of Mississippi (Nov. 11, 2003)Paul Avery10 Large Hadron Collider (CERN): Search for Origin of Mass & Supersymmetry (2007 – ?) [diagram: 27 km tunnel in Switzerland & France; experiments ATLAS, CMS, ALICE, LHCb, TOTEM]

University of Mississippi (Nov. 11, 2003)Paul Avery11 CMS Experiment at LHC [figure: “Compact” Muon Solenoid at the LHC (CERN), with Smithsonian standard man for scale]

University of Mississippi (Nov. 11, 2003)Paul Avery LHC: Key Driver for Data Grids (… Physicists, 159 Institutes, 36 Countries)  Complexity: Millions of individual detector channels  Scale: PetaOps (CPU), Petabytes (Data)  Distribution: Global distribution of people & resources

University of Mississippi (Nov. 11, 2003)Paul Avery13 LHC Data Rates: Detector to Storage  Level 1 Trigger (special hardware): 40 MHz in → 75 KHz out (75 GB/sec)  Level 2 Trigger (commodity CPUs): 75 KHz in → 5 KHz out (5 GB/sec)  Level 3 Trigger (commodity CPUs, physics filtering): 5 KHz in → 100 Hz out, raw data to storage (+ simulated data) at 0.1 – 1.5 GB/sec
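Two useful numbers fall out of the trigger figures on this slide; a small sanity-check script, using only the quoted rates:

```python
# Trigger chain from the slide: (events/sec out, bytes/sec out) per stage
level1 = (75e3, 75e9)    # 75 KHz, 75 GB/sec after the Level 1 hardware trigger
level2 = (5e3, 5e9)      # 5 KHz, 5 GB/sec after Level 2
level3 = (100, 1.5e9)    # 100 Hz, up to 1.5 GB/sec written to storage

# Implied raw event size after Level 1: 75 GB/s / 75 kHz = 1 MB per event
event_size_mb = level1[1] / level1[0] / 1e6

# Overall selectivity: 40 MHz of bunch crossings down to 100 Hz stored
rejection_factor = 40e6 / level3[0]

print(f"~{event_size_mb:.0f} MB/event, 1 event kept in {rejection_factor:,.0f}")
```

The ~1 MB raw event size is what makes the petabyte-per-year accumulation on the following slides unavoidable even after a rejection factor of several hundred thousand in the trigger.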

University of Mississippi (Nov. 11, 2003)Paul Avery LHC: Higgs Decay into 4 muons [event display; … collisions/sec, selectivity: 1 in …]

University of Mississippi (Nov. 11, 2003)Paul Avery15 LHC Data Requirements (CMS, ATLAS, LHCb)  Storage:  Raw recording rate 0.1 – 1 GB/s  Accumulating at 5-8 PB/year  10 PB of disk  ~100 PB total within a few years  Processing:  200,000 of today’s fastest PCs

University of Mississippi (Nov. 11, 2003)Paul Avery16 Hierarchy of LHC Data Grid Resources (CMS Experiment) [diagram: Online System → Tier 0 (CERN Computer Center, > 20 TIPS) → Tier 1 centers (USA, Korea, Russia, UK) → Tier 2 (Institute Tier2 Centers) → Tier 3 (Institutes) → Tier 4 (physics caches, PCs), with links ranging from … MBytes/s up to 1-10 Gbps; ~10s of Petabytes/yr by …; ~1000 Petabytes in < 10 yrs?]

University of Mississippi (Nov. 11, 2003)Paul Avery17 Most IT Resources Outside CERN [chart: 2008 resources]

University of Mississippi (Nov. 11, 2003)Paul Avery18 LHC and Global Knowledge Communities Non-hierarchical use of Data Grid

University of Mississippi (Nov. 11, 2003)Paul Avery19 Data Grid Projects

University of Mississippi (Nov. 11, 2003)Paul Avery20 Global Context: Data Grid Projects  U.S. Infrastructure Projects  GriPhyN (NSF)  iVDGL (NSF)  Particle Physics Data Grid (DOE)  PACIs and TeraGrid (NSF)  DOE Science Grid (DOE)  NSF Middleware Infrastructure (NSF)  EU, Asia major projects  European Data Grid (EU)  EDG-related national Projects  LHC Computing Grid (CERN)  EGEE (EU)  CrossGrid (EU)  DataTAG (EU)  GridLab (EU)  Japanese Grid Projects  Korea Grid project  Not exclusively HEP (LIGO, SDSS, ESA, Biology, …)  But most driven/led by HEP  Many $M brought into the field

University of Mississippi (Nov. 11, 2003)Paul Avery21 EU DataGrid Project [diagram]

University of Mississippi (Nov. 11, 2003)Paul Avery22 U.S. Particle Physics Data Grid (DOE funded)  Funded 1999 – US$9.5M (DOE)  Driven by HENP experiments: D0, BaBar, STAR, CMS, ATLAS  Maintains practical orientation: Grid tools for experiments

University of Mississippi (Nov. 11, 2003)Paul Avery23 U.S. GriPhyN and iVDGL Projects  Both funded by NSF (ITR/CISE + Physics)  GriPhyN: $11.9M (NSF) (2000 – 2005)  iVDGL: $14.0M (NSF) (2001 – 2006)  Basic composition (~120 people)  GriPhyN: 12 universities, SDSC, 3 labs  iVDGL: 20 universities, SDSC, 3 labs, foreign partners  Expts: CMS, ATLAS, LIGO, SDSS/NVO  Grid research/infrastructure vs Grid deployment  GriPhyN: CS research, Virtual Data Toolkit (VDT) development  iVDGL: Grid laboratory deployment using VDT  4 physics experiments provide frontier challenges  Extensive student involvement  Undergrads, grads, postdocs participate at all levels  Strong outreach component

University of Mississippi (Nov. 11, 2003)Paul Avery24 GriPhyN/iVDGL Science Drivers  LHC experiments  High energy physics  100s of Petabytes  LIGO  Gravity wave experiment  100s of Terabytes  Sloan Digital Sky Survey  Digital astronomy (1/4 sky)  10s of Terabytes Data growth Community growth  Massive CPU (PetaOps)  Large distributed datasets (>100PB)  International (global) communities (1000s)

University of Mississippi (Nov. 11, 2003)Paul Avery25 GriPhyN: PetaScale Virtual-Data Grids [architecture diagram: users (single researcher, workgroups, production team) use interactive user tools, which drive virtual data tools, request planning & scheduling tools, and request execution & management tools; these rest on resource management services, security and policy services, and other Grid services, over distributed resources (code, storage, CPUs, networks) and raw data sources; targets:  PetaOps  Petabytes  Performance]

University of Mississippi (Nov. 11, 2003)Paul Avery26 International Virtual Data Grid Laboratory (Fall 2003) [map of Tier1/Tier2/Tier3 sites: UF, Wisconsin, BNL, Indiana, Boston U, SKC, Brownsville, Hampton, PSU, J. Hopkins, Caltech, FIU, FSU, Arlington, Michigan, LBL, Oklahoma, Argonne, Vanderbilt, UCSD/SDSC, Fermilab; partners:  EU  Brazil  Korea  Japan?]

University of Mississippi (Nov. 11, 2003)Paul Avery27  Extended runs for Monte Carlo data production  200K event test run identified many bugs in core Grid middleware  2 months continuous running across 5 testbed sites (1.5M events)  Demonstrated at Supercomputing 2002 Testbed Successes: US-CMS Example

University of Mississippi (Nov. 11, 2003)Paul Avery28 LCG: LHC Computing Grid Project  Prepare & deploy computing environment for LHC expts  Common applications, tools, frameworks and environments  Emphasis on collaboration, coherence of LHC computing centers  Deployment only: no middleware development  Move from testbed systems to real production services  Operated and supported 24x7 globally  Computing fabrics run as production physics services  A robust, stable, predictable, supportable infrastructure  Need to resolve some issues  Federation vs integration  Grid tools and technologies

University of Mississippi (Nov. 11, 2003)Paul Avery29 Sep. 29, 2003 announcement

University of Mississippi (Nov. 11, 2003)Paul Avery30 Current LCG Sites

University of Mississippi (Nov. 11, 2003)Paul Avery31 Sample Grid R&D

University of Mississippi (Nov. 11, 2003)Paul Avery32 Sphinx Grid-Scheduling Service [architecture diagram: Sphinx Client → Sphinx Server (request processing, information gathering, information warehouse, data management), interacting with the Chimera Virtual Data System, VDT Client/Server, Condor-G/DAGMan, Globus resources, the Replica Location Service, and the MonALISA Monitoring Service]

University of Mississippi (Nov. 11, 2003)Paul Avery33 GAE: Grid Analysis Environment  GAE is crucial for LHC experiments  Large, diverse, distributed community of users  Support for 100s-1000s of analysis tasks, over dozens of sites  Dependent on high-speed networks  GAE is where the physics gets done, by analysis teams  Team structure: local, national, global  Teams share computing, storage & network resources  But the global system has finite resources  Widely varying task requirements and priorities  Need for robust authentication and security  Need to define and implement collaboration policies & strategies

University of Mississippi (Nov. 11, 2003)Paul Avery34 CAIGEE: A Prototype GAE  CMS Analysis – an Integrated Grid Enabled Environment  Exposes “Global System” to physicists  Supports data requests, preparation, production, movement, analysis  Targets US-CMS physicists

University of Mississippi (Nov. 11, 2003)Paul Avery35 Grid-Enabled Analysis Prototypes: ROOT (via Clarens), JASOnPDA (via Clarens), Collaboration Analysis Desktop, COJAC (via Web Services)

University of Mississippi (Nov. 11, 2003)Paul Avery36 Virtual Data: Derivation and Provenance  Most scientific data are not simple “measurements”  They are computationally corrected/reconstructed  They can be produced by numerical simulation  Science & eng. projects are more CPU and data intensive  Programs are significant community resources (transformations)  So are the executions of those programs (derivations)  Management of dataset transformations important!  Derivation: Instantiation of a potential data product  Provenance: Exact history of any existing data product We already do this, but manually!
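The derivation/provenance distinction above can be made concrete with a small sketch (all dataset and transformation names here are hypothetical, not actual experiment data products): each derived product records the transformation and inputs that produced it, which is exactly what lets a system answer "which products must be recomputed?" automatically.

```python
# Hypothetical provenance records: product -> (transformation, inputs)
derivations = {
    "hits_v1":   ("reconstruct", ["raw_run42", "muon_calib_v1"]),
    "tracks_v1": ("fit_tracks",  ["hits_v1"]),
    "plots_v1":  ("histogram",   ["tracks_v1"]),
}

def affected_by(bad_input):
    """All products whose provenance transitively includes bad_input."""
    stale = set()
    changed = True
    while changed:
        changed = False
        for product, (_, inputs) in derivations.items():
            if product not in stale and any(
                    i == bad_input or i in stale for i in inputs):
                stale.add(product)
                changed = True
    return stale

# A bad muon calibration invalidates everything derived from it
print(sorted(affected_by("muon_calib_v1")))
```

This is the automation of what, as the slide notes, is already done today by hand.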

University of Mississippi (Nov. 11, 2003)Paul Avery37 Virtual Data Motivations (1) [diagram: Transformation, Derivation, Data, linked by product-of, execution-of, consumed-by/generated-by] “I’ve detected a muon calibration error and want to know which derived data products need to be recomputed.” “I’ve found some interesting data, but I need to know exactly what corrections were applied before I can trust it.” “I want to search a database for 3 muon SUSY events. If a program that does this analysis exists, I won’t have to write one from scratch.” “I want to apply a forward jet analysis to 100M events. If the results already exist, I’ll save weeks of computation.”

University of Mississippi (Nov. 11, 2003)Paul Avery38 Virtual Data Motivations (2)  Data track-ability and result audit-ability  Universally sought by scientific applications  Facilitate resource sharing and collaboration  Data is sent along with its recipe  A new approach to saving old data: economic consequences?  Manage workflow  Organize, locate, specify, request data products  Repair and correct data automatically  Identify dependencies, apply x-tions  Optimize performance  Re-create data or copy it (caches) [from manual/error prone to automated/robust]

University of Mississippi (Nov. 11, 2003)Paul Avery39 LHC Analysis with Virtual Data [derivation tree: “mass = 160” branches into “decay = WW”, “decay = ZZ”, “decay = bb”; the WW branch refines through “WW → leptons” and “WW → e…” with “Pt > 20” and other cuts; a scientist adds a new derived data branch]

University of Mississippi (Nov. 11, 2003)Paul Avery40 Chimera Virtual Data System  Virtual Data Language (VDL): describes virtual data products  Virtual Data Catalog (VDC): used to store VDL  Abstract Job Flow Planner: creates a logical DAG (dependency graph)  Concrete Job Flow Planner: interfaces with a Replica Catalog; provides a physical DAG submission file to Condor-G  Generic and flexible  As a toolkit and/or a framework  In a Grid environment or locally [diagram: VDL (XML) → VDC → Abstract Planner → DAX → Concrete Planner (using Replica Catalog) → DAG → DAGMan; also drives virtual data & CMS production via MCRunJob]
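The abstract-to-concrete planning step the slide describes can be sketched in a few lines (a simplified illustration, not Chimera's actual VDL syntax or API; all file, job, and site names are invented): the abstract DAG refers only to logical files, and the concrete planner consults a replica catalog so that jobs whose outputs already exist somewhere are not re-run.

```python
# Abstract DAG: (job, logical inputs, logical outputs); names are illustrative
abstract_dag = [
    ("gen_events",  [],             ["events.dat"]),
    ("simulate",    ["events.dat"], ["hits.dat"]),
    ("reconstruct", ["hits.dat"],   ["reco.dat"]),
]

# Replica catalog: logical file name -> known physical copies (hypothetical URL)
replica_catalog = {
    "events.dat": ["gsiftp://site-a.example.org/store/events.dat"],
}

def concrete_plan(dag, replicas):
    """Drop jobs whose outputs are already materialized; keep the rest."""
    return [job for (job, _inputs, outputs) in dag
            if not all(o in replicas for o in outputs)]

print(concrete_plan(abstract_dag, replica_catalog))
```

In the real system the surviving jobs would be rendered as a physical DAG submission file for Condor-G/DAGMan, as the slide indicates; pruning already-materialized products is what makes "virtual" data cheap to request repeatedly.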

University of Mississippi (Nov. 11, 2003)Paul Avery41 Test: Sloan Galaxy Cluster Analysis [plots: Sloan data; galaxy cluster size distribution]

University of Mississippi (Nov. 11, 2003)Paul Avery42 Promising New Directions

University of Mississippi (Nov. 11, 2003)Paul Avery43 HEP: Driver for International Networks [table: BW in Mbps (2001 estimates)]  Now seen as too conservative!

University of Mississippi (Nov. 11, 2003)Paul Avery44 HEP & Network Land Speed Records  9/01: 102 Mbps CIT-CERN  5/02: … Mbps SLAC-Manchester  9/02: … Mbps Chicago-CERN  11/02: [LSR] 930 Mbps California-CERN  11/02: [LSR] 9.4 Gbps in 10 Flows California-Chicago  2/03: [LSR] 2.38 Gbps in 1 Stream California-Geneva  10/03: [LSR] 5 Gbps in 1 Stream. HEP/LHC driving network developments  New network protocols  Land speed records  ICFA networking chair  HENP working group in Internet2

University of Mississippi (Nov. 11, 2003)Paul Avery45 U.S. Grid Coordination: Trillium  Trillium = GriPhyN + iVDGL + PPDG  Large overlap in leadership, people, experiments  HEP primary driver, but other disciplines too  Benefit of coordination  Common software base + packaging: VDT + Pacman  Wide deployment of new technologies, e.g. Virtual Data  Stronger, broader outreach effort  Unified U.S. entity to interact with international Grid projects  Goal: establish permanent production Grid  Short term: Grid2003  Medium term: Grid2004, etc. (increasing scale)  Long term: Open Science Grid

University of Mississippi (Nov. 11, 2003)Paul Avery46 Grid2003  27 sites (U.S., Korea)  ~2000 CPUs  SC2003 demo

University of Mississippi (Nov. 11, 2003)Paul Avery47 Open Science Grid   Specific goal: Support US-LHC research program  General goal: U.S. Grid supporting other disciplines  Funding mechanism: DOE/NSF  Laboratories (DOE) and universities (NSF)  Sep. 17 NSF meeting: physicists, educators, NSF/EHR, QuarkNet  Getting there: “Functional Demonstration Grids”  Grid2003, Grid2004, Grid2005, …  New release every 6-12 months, increasing functionality & scale  Constant participation in LHC computing exercises

CHEPREO: Center for High Energy Physics Research and Educational Outreach Florida International University  E/O Center in Miami area  iVDGL Grid Activities  CMS Research  AMPATH network (S. America) Funded September 2003

University of Mississippi (Nov. 11, 2003)Paul Avery49 UltraLight: 10 Gb/s Network  10 Gb/s+ network  Partners: Caltech, UF, FIU, UM, MIT; SLAC, FNAL, BNL; int’l partners; Cisco, Level(3), Internet2  Submitted Nov. 10, 2003

University of Mississippi (Nov. 11, 2003)Paul Avery50 A Global Grid Enabled Collaboratory for Scientific Research (GECSR)  Main participants  Michigan  Caltech  Maryland  FIU  First Grid-enabled Collaboratory  Tight integration between  Science of Collaboratories  Globally scalable work environment  Sophisticated collaborative tools (VRVS, VNC; Next-Gen)  Monitoring system (MonALISA)  Initial targets: Global HEP collaborations  Applicable to other large-scale scientific endeavors

University of Mississippi (Nov. 11, 2003)Paul Avery51 Dynamic Workspaces Enabling Global Analysis Communities

University of Mississippi (Nov. 11, 2003)Paul Avery52 GLORIAD: US-Russia-China Network  New 10 Gb/s network linking US-Russia-China  Plus Grid component linking science projects  Meeting at NSF April 14 with US-Russia-China reps.  HEP people (Hesheng, et al.)  Broad agreement that HEP can drive Grid portion  Other applications will be solicited  More meetings planned

University of Mississippi (Nov. 11, 2003)Paul Avery53 Grids: Enhancing Research & Learning  Fundamentally alters conduct of scientific research  “Lab-centric”:Activities center around large facility  “Team-centric”:Resources shared by distributed teams  “Knowledge-centric”:Knowledge generated/used by a community  Strengthens role of universities in research  Couples universities to data intensive science  Couples universities to national & international labs  Brings front-line research and resources to students  Exploits intellectual resources of formerly isolated schools  Opens new opportunities for minority and women researchers  Builds partnerships to drive advances in IT/science/eng  HEP  Physics, astronomy, biology, CS, etc.  “Application” sciences  Computer Science  Universities  Laboratories  Scientists  Students  Research Community  IT industry

University of Mississippi (Nov. 11, 2003)Paul Avery54 HEP’s Broad Impact and Relevance  HEP is recognized as the strongest science driver for Grids  (In collaboration with computer scientists)  LHC a particularly strong driving function  Grid projects are driving important network developments  “Land speed records” attract much attention  ICFA-SCIC, I-HEPCCC, US-CERN link, ESNET, Internet2  We are increasing our impact on education and outreach  Providing technologies, resources for training, education, outreach  HEP involvement in Grid projects has helped us!  Many $M brought into the field  Many visible national and international initiatives  Partnerships with other disciplines  increasing our visibility  Recognition at high levels (NSF, DOE, EU, Asia)

University of Mississippi (Nov. 11, 2003)Paul Avery55 Summary  Progress occurring on many fronts  CS research, 11 VDT releases, simplified installation  Testbeds, productions based on Grid tools using iVDGL resources  Functional Grid testbeds providing excellent experience  Real applications  Scaling up sites, CPUs, people  Collaboration occurring with more partners  National, international (Asia, South America)  Testbeds, monitoring, deploying VDT more widely  New directions being followed  Networks: Increased capabilities (bandwidth, efficiency, services)  Packaging: Auto-install+run+exit at remote computing sites  Virtual data: Powerful paradigm for scientific computing  Research: Collaborative and Grid tools for distributed teams

University of Mississippi (Nov. 11, 2003)Paul Avery56 Grid References  Grid Book  Globus  Global Grid Forum  PPDG  GriPhyN  iVDGL  TeraGrid  EU DataGrid

University of Mississippi (Nov. 11, 2003)Paul Avery57 Extra Slides

University of Mississippi (Nov. 11, 2003)Paul Avery58 Some (Realistic) Grid Examples  High energy physics  3,000 physicists worldwide pool Petaflops of CPU resources to analyze Petabytes of data  Fusion power (ITER, etc.)  Physicists quickly generate 100 CPU-years of simulations of a new magnet configuration to compare with data  Astronomy  An international team remotely operates a telescope in real time  Climate modeling  Climate scientists visualize, annotate, & analyze Terabytes of simulation data  Biology  A biochemist exploits 10,000 computers to screen 100,000 compounds in an hour

University of Mississippi (Nov. 11, 2003)Paul Avery59 GriPhyN Goals  Conduct CS research to achieve vision  “Virtual Data” as unifying principle  Disseminate through Virtual Data Toolkit (VDT)  Primary deliverable of GriPhyN  Integrate into GriPhyN science experiments  Common Grid tools, services  Impact other disciplines  HEP, biology, medicine, virtual astronomy, eng.  Educate, involve, train students in IT research  Undergrads, grads, postdocs, underrepresented groups

University of Mississippi (Nov. 11, 2003)Paul Avery60 iVDGL Goals and Context  International Virtual-Data Grid Laboratory  A global Grid laboratory (US, EU, E. Europe, Asia, S. America, …)  A place to conduct Data Grid tests “at scale”  A mechanism to create common Grid infrastructure  A laboratory for other disciplines to perform Data Grid tests  A focus of outreach efforts to small institutions  Context of iVDGL in LHC computing program  Develop and operate proto-Tier2 centers  Learn how to do Grid operations (GOC)  International participation  DataTag partner project in EU  New international partners: Korea and Brazil  UK e-Science programme: support 6 CS Fellows per year in U.S.

University of Mississippi (Nov. 11, 2003)Paul Avery61 ATLAS Simulations on iVDGL Resources [map: Tier1 prototype at BNL; prototype Tier2 and testbed sites at Boston U, Argonne/Chicago, Michigan, Indiana, UTA, OU, LBL; plus Fermilab and Florida (SDSS, US CMS) and UW Milwaukee (LIGO); joint project with iVDGL]

University of Mississippi (Nov. 11, 2003)Paul Avery62 US-CMS Testbed [map: Brazil, UCSD, Florida, Wisconsin, Caltech, Fermilab, FIU, FSU, Korea, Rice, Taiwan, Russia, MIT]

University of Mississippi (Nov. 11, 2003)Paul Avery63 WorldGrid Demonstration (Nov. 2002)  Joint iVDGL + EU effort  Resources from both sides (15 sites)  Monitoring tools (Ganglia, MDS, NetSaint, …)  Visualization tools (Nagios, MapCenter, Ganglia)  Applications  CMS: CMKIN, CMSIM  ATLAS: ATLSIM  Submit jobs from US or EU  Jobs can run on any cluster  Major demonstrations  IST2002 (Copenhagen)  SC2002 (Baltimore)

University of Mississippi (Nov. 11, 2003)Paul Avery64 WorldGrid Sites (Nov. 2002)

University of Mississippi (Nov. 11, 2003)Paul Avery65 International Grid Coordination  Global Grid Forum (GGF)  International forum for general Grid efforts  Many working groups, standards definitions  Close collaboration with EU DataGrid (EDG)  Many connections with EDG activities  HICB: HEP Inter-Grid Coordination Board  Non-competitive forum, strategic issues, consensus  Cross-project policies, procedures and technology, joint projects  HICB-JTB Joint Technical Board  Definition, oversight and tracking of joint projects  GLUE interoperability group  Participation in LHC Computing Grid (LCG)  Software Computing Committee (SC2)  Project Execution Board (PEB)  Grid Deployment Board (GDB)