European DataGrid Project status and plans Peter Kunszt, CERN DataGrid, WP2 Manager


1 European DataGrid Project status and plans Peter Kunszt, CERN DataGrid, WP2 Manager Peter.Kunszt@cern.ch ACAT, Moscow – 26 June 2002

2 Outline  EU DataGrid Project:  EDG overview  Project Organisation  Objectives  Current Status overall and by WP  Plans for next releases and testbed 2  Conclusions

3 The Grid vision  Flexible, secure, coordinated resource sharing among dynamic collections of individuals, institutions, and resources  From “The Anatomy of the Grid: Enabling Scalable Virtual Organizations”  Enable communities (“virtual organizations”) to share geographically distributed resources as they pursue common goals -- assuming the absence of…  central location,  central control,  omniscience,  existing trust relationships.

4 Grids: Elements of the Problem  Resource sharing  Computers, storage, sensors, networks, …  Sharing always conditional: issues of trust, policy, negotiation, payment, …  Coordinated problem solving  Beyond client-server: distributed data analysis, computation, collaboration, …  Dynamic, multi-institutional virtual orgs  Community overlays on classic org structures  Large or small, static or dynamic

5 EU DataGrid Project Objectives  DataGrid is a project funded by the European Union whose objective is to build and exploit the next-generation computing infrastructure, providing intensive computation and analysis of shared large-scale databases.  Enable data-intensive sciences by providing worldwide Grid test beds to large distributed scientific organisations (“Virtual Organisations”, VOs)  Start (kick-off): Jan 1, 2001; End: Dec 31, 2003  Applications/end-user communities: HEP, Earth Observation, Biology  Specific project objectives:  Middleware for fabric & grid management  Large-scale testbed  Production-quality demonstrations  Collaborate with and complement other European and US projects  Contribute to open standards and international bodies (GGF, Industry & Research Forum)

6 DataGrid Main Partners  CERN – International (Switzerland/France)  CNRS - France  ESA/ESRIN – International (Italy)  INFN - Italy  NIKHEF – The Netherlands  PPARC - UK

7 Assistant Partners  Research and Academic Institutes: CESNET (Czech Republic), Commissariat à l'énergie atomique (CEA) – France, Computer and Automation Research Institute, Hungarian Academy of Sciences (MTA SZTAKI), Consiglio Nazionale delle Ricerche (Italy), Helsinki Institute of Physics – Finland, Institut de Fisica d'Altes Energies (IFAE) – Spain, Istituto Trentino di Cultura (IRST) – Italy, Konrad-Zuse-Zentrum für Informationstechnik Berlin – Germany, Royal Netherlands Meteorological Institute (KNMI), Ruprecht-Karls-Universität Heidelberg – Germany, Stichting Academisch Rekencentrum Amsterdam (SARA) – Netherlands, Swedish Research Council – Sweden  Industrial Partners: Datamat (Italy), IBM-UK (UK), CS-SI (France)

8 Project Schedule  Project started on 1/Jan/2001  TestBed 0 (early 2001): international test bed 0 infrastructure deployed, Globus 1 only, no EDG middleware  TestBed 1 (now): first release of EU DataGrid software to defined users within the project: HEP experiments (WP8), Earth Observation (WP9), Biomedical applications (WP10)  Successful project review by the EU: March 1st, 2002  TestBed 2 (October 2002): builds on TestBed 1 to extend the facilities of DataGrid  TestBed 3 (March 2003) & 4 (September 2003)  Project stops on 31/Dec/2003

9 EDG Highlights  The project is up and running!  All 21 partners are now contributing at contractual level  Total of ~60 man-years for the first year  All EU deliverables (40, >2000 pages) submitted  in time for the review, according to the contract technical annex  First test bed delivered with real production demos  All deliverables (code & documents) available via www.edg.org  http://eu-datagrid.web.cern.ch/eu-datagrid/Deliverables/default.htm  requirements, surveys, architecture, design, procedures, testbed analysis, etc.

10 Working Areas  The DataGrid project is divided into 12 Work Packages distributed over four working areas: Applications, Middleware, Infrastructure (Testbed) and Management

11 Work Packages WP1: Work Load Management System WP2: Data Management WP3: Grid Monitoring / Grid Information Systems WP4: Fabric Management WP5: Storage Element WP6: Testbed and demonstrators WP7: Network Monitoring WP8: High Energy Physics Applications WP9: Earth Observation WP10: Biology WP11: Dissemination WP12: Management

12 Objectives for the first year of the project  Collect requirements for middleware, taking into account requirements from the application groups  Survey current technology for all middleware  Core services testbed (Testbed 0): Globus, no EDG middleware  First Grid testbed release (Testbed 1): first release of EDG middleware  WP1 workload: job resource specification & scheduling  WP2 data management: data access, migration & replication  WP3 grid monitoring services: monitoring infrastructure, directories & presentation tools  WP4 fabric management: framework for fabric configuration management & automatic software installation  WP5 mass storage management: common interface for mass storage systems  WP7 network services: network services and monitoring

13 DataGrid Architecture (layered diagram)  Local Computing: Local Application, Local Database  Grid Application Layer: Job Management, Data Management, Metadata Management, Object to File Mapping  Collective Services: Grid Scheduler, Replica Manager, Information & Monitoring  Underlying Grid Services: Computing Element Services, Storage Element Services, Replica Catalog, Authorization, Authentication and Accounting, SQL Database Services, Service Index  Grid Fabric services: Resource Management, Configuration Management, Monitoring and Fault Tolerance, Node Installation & Management, Fabric Storage Management

14 EDG Interfaces (the same layered architecture diagram, annotated with its external interfaces)  External interfaces shown: scientists, application developers, system managers, operating systems, file systems, user accounts, Certificate Authorities, Computing Elements with batch systems (PBS, LSF), Storage Elements with mass storage systems (HPSS, Castor)

15 WP1: Work Load Management  Goals  Maximise use of resources by efficient scheduling of user jobs  Achievements  Analysis of workload management system requirements & survey of existing mature implementations, Globus & Condor (D1.1)  Definition of architecture for scheduling & resource management (D1.2)  Development of a "super-scheduling" component using application data and computing element requirements  Issues  Integration with software from other WPs  Advanced job submission facilities  Components: Job Description Language, Resource Broker, Job Submission Service, Information Index, User Interface, Logging & Bookkeeping Service
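
To make the WP1 components concrete, a minimal submission through the User Interface might look as sketched below. The dg-job-* commands also appear in the testbed transcript later in this talk; the JDL file name and attribute values are hypothetical, and the attribute set is a minimal assumption about the EDG Job Description Language.

    $ cat hello.jdl
    Executable    = "/bin/echo";
    Arguments     = "Hello DataGrid";
    StdOutput     = "hello.out";
    StdError      = "hello.err";
    OutputSandbox = {"hello.out", "hello.err"};

    $ dg-job-submit hello.jdl        # the Resource Broker matches the job to a Computing Element
    $ dg-job-status <dg_jobId>       # job state comes from the Logging & Bookkeeping service
    $ dg-job-get-output <dg_jobId>   # retrieve the output sandbox to the User Interface machine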

16 WP2: Data Management  Goals  Coherently manage and share petabyte-scale information volumes in high-throughput, production-quality grid environments  Achievements  Survey of existing tools and technologies for data access and mass storage systems (D2.1)  Definition of architecture for data management (D2.2)  Deployment of the Grid Data Mirroring Package (GDMP) in testbed 1  Close collaboration with Globus, PPDG/GriPhyN & Condor  Working with GGF on standards  Issues  Security: clear methods for handling authentication and authorization  Data replication: how to maintain consistent, up-to-date catalogues of application data and its replicas  Components: GDMP, Replica Catalog, Spitfire
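
Data replication in testbed 1 is done with GDMP, which is driven from the command line roughly as sketched below; the command names are recalled from the GDMP suite and should be treated as assumptions, exact options are omitted, and the file path is hypothetical.

    # on the source Storage Element: register a new file with the local GDMP catalogue and publish it
    $ gdmp_register_local_file /flatfiles/alice/run1234.root
    $ gdmp_publish_catalogue
    # on a remote Storage Element that has subscribed to the source: pull the newly published replicas
    $ gdmp_replicate_get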

17 WP3: Grid Monitoring Services  Goals  Provide an information system for discovering resources and monitoring their status  Achievements  Survey of current technologies (D3.1)  Coordination of schemas in testbed 1  Development of the Ftree caching backend, based on OpenLDAP (Lightweight Directory Access Protocol), to address shortcomings in MDS v1  Design of the Relational Grid Monitoring Architecture (R-GMA) (D3.2), to be further developed with GGF  GRM and PROVE adapted to grid environments to support end-user application monitoring  Components: MDS/Ftree, R-GMA, GRM/PROVE
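
Because testbed 1 publishes resource information through MDS/Ftree over LDAP, resource discovery amounts to plain LDAP queries; a minimal sketch, assuming the conventional Globus MDS 2.x port and base DN and a hypothetical host name.

    # dump everything a site GRIS/GIIS publishes (host name hypothetical; port and base DN assumed)
    $ ldapsearch -x -LLL -H ldap://giis.example.org:2135 -b "mds-vo-name=local,o=grid" "(objectclass=*)"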

18 WP4: Fabric Management  Goals  Manage clusters of (~thousands of) nodes  Achievements  Survey of existing tools, techniques and protocols (D4.1)  Agreed architecture defined for fabric management (D4.2)  Initial implementations deployed at several sites in testbed 1  Issues  How to install the reference platform and EDG software on large numbers of hosts with minimal human intervention per node  How to ensure node configurations are consistent and handle updates to the software suites  Components: LCFG, PBS & LSF info providers, image installation, Configuration Cache Manager

19 WP5: Mass Storage Management  Goals  Provide common user and data export/import interfaces to existing local mass storage systems  Achievements  Review of Grid data systems, tape and disk storage systems and local file systems (D5.1)  Definition of architecture and design for the DataGrid Storage Element (D5.2)  Collaboration with Globus on GridFTP/RFIO  Collaboration with PPDG on the control API  First attempt at exchanging Hierarchical Storage Manager (HSM) tapes  Issues  Scope and requirements for the Storage Element  Inter-working with other Grids  Components: Storage Element information providers, RFIO, MSS staging
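
To make the common interfaces concrete, the sketch below writes a file to a Storage Element over GridFTP and reads one back through RFIO; globus-url-copy and rfcp are the standard clients, while the host and path names are hypothetical.

    # write a local file to a Storage Element via GridFTP (GSI credentials from grid-proxy-init)
    $ globus-url-copy file:///home/user/run1234.root gsiftp://se.example.org/flatfiles/alice/run1234.root

    # read a file back through the GSI-enabled RFIO client
    $ rfcp se.example.org:/flatfiles/alice/run1234.root /tmp/run1234.root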

20 WP7: Network Services  Goals  Review the network service requirements of DataGrid  Establish and manage the DataGrid network facilities  Monitor the traffic and performance of the network  Deal with the distributed security aspects  Achievements  Analysis of network requirements for testbed 1 & study of the available network physical infrastructure (D7.1)  Use of the European backbone GEANT since Dec. 2001  Initial network monitoring architecture defined (D7.2) and first tools deployed in testbed 1  Collaboration with Dante & DataTAG  Working with GGF (Grid High Performance Networks) & Globus (monitoring/MDS)  Issues  Resources for the study of security issues  End-to-end performance for applications depends on a complex combination of components  Components: network monitoring tools: PingER, UDPmon, Iperf
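
The monitoring components listed above wrap standard network measurement tools, so the raw measurements they collect look roughly like the sketch below (host names hypothetical).

    # round-trip time and packet loss, the kind of probe PingER schedules between sites
    $ ping -c 10 se.example.org

    # TCP throughput between two testbed sites, as measured with iperf
    $ iperf -s                        # run at the receiving site
    $ iperf -c se.example.org -t 30   # run at the sending site: a 30-second transfer test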

21 WP6: TestBed Integration  Goals  Deploy testbeds for the end-to-end application experiments & demos  Integrate successive releases of the software components  Achievements  Integration and deployment of EDG software release 1.0  Working implementation of multiple Virtual Organisations (VOs) & basic security infrastructure  Definition of acceptable usage contracts and creation of the Certification Authorities group  Issues  Procedures for software integration  Test plan for software releases  Support for production-style usage of the testbed  Components: Globus packaging & EDG configuration, build tools, end-user documents (the EDG release = Globus plus the WP6 additions to Globus)
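
Since access to the testbed is certificate-based, a user's first contact with it is typically the standard Globus proxy commands sketched below; VO registration itself is handled by the testbed administrators and is not shown.

    # create a short-lived GSI proxy from a certificate issued by an approved Certification Authority
    $ grid-proxy-init
    # inspect the proxy (subject, strength, remaining lifetime) before submitting jobs
    $ grid-proxy-info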

22 Software Release Procedure  Coordination meeting: gather feedback on the previous release, review the plan for the next release  WP meetings: take the basic plan and clarify effort/people/dependencies  Software development: performed by the WPs in dispersed institutes; unit tests run  Software integration: performed by WP6 on frozen software; integration tests run (http://edms.cern.ch/document/341943)  Acceptance tests: performed by the Loose Cannons et al.  Roll-out: present the software to the application groups and deploy on the testbed (roll-out meeting for testbed 1: Dec 11, 2001, ~100 participants)  Software release plan: http://edms.cern.ch/document/333297

23 Grid aspects covered by EDG testbed 1  VO servers: LDAP directory for mapping users (with certificates) to the correct VO  Storage Element: Grid-aware storage area, situated close to a CE  User Interface: submit & monitor jobs, retrieve output  Replica Manager: replicates data to one or more SEs  Job Submission Service: manages submission of jobs to the Resource Broker  Replica Catalog: keeps track of multiple data files "replicated" on different SEs  Information Index: provides info about grid resources via the GIIS/GRIS hierarchy  Information & Monitoring: provides info on resource utilization & performance  Resource Broker: uses the Information Index to discover & select resources based on job requirements  Grid Fabric Management: configures, installs & maintains grid software packages and environment  Logging and Bookkeeping: collects resource usage & job status  Network performance, security and monitoring: provides efficient network transport, security & bandwidth monitoring  Computing Element: gatekeeper to a grid computing resource  Testbed administration: certificate authorities, user registration, usage policy, etc.

24 TestBed 1 Sites Status Web interface showing status of servers at testbed 1 sites

25 DataGrid Testbed  Testbed sites (>40), HEP and ESA sites (map): Dubna, Moscow, RAL, Lund, Lisboa, Santander, Madrid, Valencia, Barcelona, Paris, Berlin, Lyon, Grenoble, Marseille, Brno, Prague, Torino, Milano, BO-CNAF, PD-LNL, Pisa, Roma, Catania, ESRIN, CERN, IPSL, Estec, KNMI  Contacts: Francois.Etienne@in2p3.fr - Antonia.Ghiselli@cnaf.infn.it

26 Initial testbed usage  Physicists from LHC experiments submit jobs with their application software, which uses:  User Interface (job submission language etc.)  Resource Broker & Job Submission Service  Information Service & Monitoring  Data Replication

Generic HEP application flowchart (figure): job arguments are the data type (raw/dst), run number, number of events, number of words per event, a Replica Catalog flag and a Mass Storage flag. For raw production the job generates raw events on local disk; for dst production it gets the pfn from the Replica Catalog, copies the raw data from the SE to local disk if the pfn is not local, reads the raw events and writes dst events. In both cases it writes a logbook (raw_xxxxxx_dat.log / dst_xxxxxx_dat.log) on the client node, optionally moves the output to the SE and/or Mass Storage, and adds the lfn/pfn to the Replica Catalog.

Example session:
[reale@testbed006 JDL]$ dg-job-submit gridpawCNAF.jdl
Connecting to host testbed011.cern.ch, port 7771
Transferring InputSandbox files...done
Logging to host testbed011.cern.ch, port 15830
=========dg-job-submit Success ============
The job has been successfully submitted to the Resource Broker.
Use dg-job-status command to check job current status.
Your job identifier (dg_jobId) is:
https://testbed011.cern.ch:7846/137.138.181.253/185337169921026?testbed011.cern.ch:7771
========================================
[reale@testbed006 JDL]$ dg-job-get-output https://testbed011.cern.ch:7846/137.138.181.253/185337169921026?testbed011.cern.ch:7771
Retrieving OutputSandbox files...done
============ dg-get-job-output Success ============
Output sandbox files for the job:
- https://testbed011.cern.ch:7846/137.138.181.253/185337169921026?testbed011.cern.ch:7771
have been successfully retrieved and stored in the directory:
/sandbox/185337169921026

First simulated ALICE event generated by using the DataGrid Job Submission Service (figure)

27 Biomedical applications  Data mining on genomic databases (exponential growth)  Indexing of medical databases (TB/hospital/year)  Collaborative framework for large-scale experiments (e.g. epidemiological studies)  Parallel processing for:  database analysis  complex 3D modelling

28 Earth Observations  ESA missions: about 100 Gbytes of data per day (ERS 1/2); 500 Gbytes for the next ENVISAT mission (launched March 1st)  EO requirements for the Grid:  enhance the ability to access high-level products  allow reprocessing of large historical archives  improve Earth science complex applications (data fusion, data mining, modelling …)

29 Development & Production testbeds  Development  An initial set of 5 sites will keep a small cluster of PCs for development purposes, to test new versions of the software, configurations, etc.  Production  More stable environment for use by the application groups:  more sites  more nodes per site (grow to a meaningful size at major centres)  more users per VO  Usage already foreseen in the Data Challenge schedules of the LHC experiments; harmonize release schedules

30 Plans for 2002  Planned intermediate release schedule:  TestBed 1: November 2001  Release 1.1: January 2002  Release 1.2: July 2002  Release 1.3: internal release only  Release 1.4: August 2002  TestBed 2: October 2002  A similar schedule will be made for 2003  Each release includes:  feedback from use of the previous release by the application groups  planned improvements/extensions by the middleware WPs  more use of WP6 software infrastructure  feeds into the architecture group  Extension of the testbed:  more users, sites & nodes per site  split the testbed into development and production sites  investigate inter-operability with US grids  Iterative releases up to testbed 2:  incrementally extend the functionality provided via each Work Package  better integrate the components  improve stability  Testbed 2 (autumn 2002) extra requirements:  interactive jobs  job partitioning for parallel execution  advance reservation  accounting & query optimization  security design (D7.6) ...

31 Release Plan details  Current release: EDG 1.1.4, deployed on the testbed under RedHat 6.2  Finalising build of EDG 1.2:  GDMP 3.0  GSI-enabled RFIO client and server  EDG 1.3 (internal):  build using autobuild tools, to ease future porting  support for MPI on a single site  EDG 1.4 (August):  support RH 6.2 & 7.2  basic support for interactive jobs  integration of Condor DAGMan  use MDS 2.2 with the first GLUE schema  EDG 2.0 (October):  still based on Globus 2.x (pre-OGSA)  use updated GLUE schema  job partitioning & check-pointing  advanced reservation/co-allocation  See http://edms.cern.ch/document/333297 for further details
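
To give a feel for what "support for MPI on a single site" means to a user, here is a hedged sketch of a parallel job description; the JobType and NodeNumber attributes are assumptions based on the later EDG/LCG JDL rather than anything shown in this talk, and the file and executable names are hypothetical.

    $ cat cpi.jdl
    Executable    = "cpi";
    JobType       = "MPICH";
    NodeNumber    = 4;
    InputSandbox  = {"cpi"};
    StdOutput     = "cpi.out";
    StdError      = "cpi.err";
    OutputSandbox = {"cpi.out", "cpi.err"};

    $ dg-job-submit cpi.jdl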

32 Issues  Support for production testbed  Effort for testing  Software Release Procedure: Integrated testing  CA explosion, CAS introduction and policy support  Packaging & distribution  S/W licensing  Convergence on Architecture  Impact of OGSA

33 Issues - Actions  Support for production testbed – support team and dedicated site  Effort for testing – test team  Software Release Procedure: Integrated testing – expand procedure  CA explosion, CAS introduction and policy support – security group’s security design  Packaging & distribution – ongoing  S/W licensing – has been addressed, see http://www.edg.org/license  Convergence on Architecture – architecture group  Impact of OGSA – design of OGSA services in WP2, WP3

34 Future Plans  Expand and consolidate testbed operations  Improve the distribution, maintenance and support process  Understand and refine Grid operations  Evolve the architecture and software on the basis of TestBed usage and feedback from users:  GLUE  converging on common documents with PPDG/GriPhyN  OGSA interfaces and components  Prepare for the second test bed in autumn 2002, in close collaboration with LCG  Enhance synergy with the US via DataTAG-iVDGL and InterGrid  Promote early standards adoption with participation in GGF and other international bodies  Explore a possible Integrated Project within FP6

35 Learn more on EU-DataGrid  For more information, see the EDG website: http://www.edg.org/  EDG Tutorials at ACAT:  Tuesday 15.00-17.00  Wednesday 17.30-19.30  EDG Tutorials at GGF5 in Edinburgh, 25.7.2002, see http://www.gridforum.org/  CERN School of Computing, Vico Equense, Italy, 15-28 September 2002  Programme includes Grid lectures by Ian Foster and Carl Kesselman and a hands-on tutorial on DataGrid: http://cern.ch/CSC/

