
Slide 1: The EGEE Project Status
Ian Bird, EGEE Operations Manager, CERN, Geneva, Switzerland
ISGC, Taipei, 27 April 2005
INFSO-RI-508833 – Enabling Grids for E-sciencE – www.eu-egee.org

Slide 2: Contents
The EGEE project:
– overview and structure
– grid operations
– middleware
– networking activities
– applications: HEP, …, biomedical
Summary

Slide 3: EGEE goals
Goal of EGEE: develop a service grid infrastructure that is available to scientists 24 hours a day.
The project concentrates on:
– building a consistent, robust and secure grid network that will attract additional computing resources
– continuously improving and maintaining the middleware in order to deliver a reliable service to users
– attracting new users from industry as well as science, and ensuring they receive the high standard of training and support they need

Slide 4: EGEE
EGEE is the largest grid infrastructure project in Europe:
– 70 leading institutions in 27 countries, federated in regional grids
– leverages national and regional grid activities
– ~32 M euros of EU funding, initially for two years starting 1 April 2004
– EU review in February 2005 passed successfully
– the 2nd phase of the project is being prepared: a proposal for the EU grid call of September 2005
– promotes scientific partnership outside the EU

Slide 5: EGEE Activities
– 48% service activities (grid operations, support and management; network resource provision)
– 24% middleware re-engineering (quality assurance, security, network services development)
– 28% networking (management; dissemination and outreach; user training and education; application identification and support; policy and international cooperation)
The emphasis in EGEE is on operating a production grid and supporting the end users.

Slide 6: EGEE Activities (section divider)
The emphasis in EGEE is on operating a production grid and supporting the end users.

Slide 7: Computing Resources – April 2005
In EGEE-0 (LCG-2):
– >130 sites
– >14,000 CPUs
– >5 PB storage
(Map: countries providing resources; countries anticipating joining EGEE/LCG.)
This greatly exceeds the project's expectations for the number of sites, and shows that the main issue of complexity is the number of sites.

Slide 8: SA1 – Operations Structure
Operations Management Centre (OMC):
– at CERN; overall coordination
Core Infrastructure Centres (CIC):
– manage daily grid operations: oversight, troubleshooting
– run essential infrastructure services
– provide 2nd-level support to the ROCs
– UK/Ireland, France, Italy, CERN, plus Russia (month 12); Taipei will also run a CIC
Regional Operations Centres (ROC):
– act as front-line support for user and operations issues
– provide local knowledge and adaptations
– one in each region, many of them distributed
User Support Centre (GGUS):
– at FZK; manages the problem-tracking system (PTS) and provides a single point of contact (service desk)
– not foreseen as such in the technical annex, but the need is clear

Slide 9: Grid Operations
The grid is flat, but there is a hierarchy of responsibility, essential to scale the operation:
– The CICs act as a single operations centre; the operational-oversight (grid operator) responsibility rotates weekly between the CICs, which report problems to the relevant ROC or resource centre.
– Each ROC is responsible for ensuring that problems are resolved, and oversees the resource centres (RCs) in its region.
– The ROCs are responsible for organising operations in their region, e.g. coordinating the deployment of middleware.
– CERN coordinates sites not associated with a ROC.
(Diagram: OMC and CICs at the centre, ROCs below them, each overseeing its RCs. RC = Resource Centre, ROC = Regional Operations Centre, CIC = Core Infrastructure Centre.)

Slide 10: Grid monitoring
Operation of the production service: real-time display of grid operations and accounting information.
A selection of the monitoring tools:
– GIIS Monitor and monitor graphs
– Site Functional Tests
– GOC database
– scheduled downtimes
– live job monitor
– GridIce (VO and fabric views)
– certificate lifetime monitor
A minimal sketch of a site-functional-test style check is shown below.
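This sketch illustrates the kind of automated check behind tools like the Site Functional Tests: poll each site's latest result and flag failures for the operator on duty. It is a minimal sketch under stated assumptions; the results URL and JSON format are invented placeholders, not the real EGEE interfaces.

```python
# Minimal sketch of a site-functional-test style check. The feed URL and the
# result format are hypothetical placeholders, not actual EGEE/LCG interfaces.
import json
import urllib.request

SFT_RESULTS_URL = "https://example.org/sft/latest.json"  # hypothetical feed

def failing_sites(url: str = SFT_RESULTS_URL) -> list:
    """Return the names of sites whose latest functional test failed."""
    with urllib.request.urlopen(url) as resp:
        results = json.load(resp)  # assumed: [{"site": "...", "ok": true}, ...]
    return [r["site"] for r in results if not r["ok"]]

if __name__ == "__main__":
    for site in failing_sites():
        print("ALARM: site %s failed its functional tests" % site)
```

In practice such alarms feed the weekly-rotating grid operator described on the previous slide, who reports the problem to the responsible ROC.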

Slide 11: Operations focus
The main focus of activities now:
– Improving operational reliability and application efficiency:
 automating monitoring and alarms
 ensuring a 24x7 service
 removing sites that fail the functional tests
 operations interoperability with OSG and others
– Improving user support:
 demonstrating to users a reliable and trusted support infrastructure
– Deployment of gLite components:
 testing and certification, then a pre-production service
 migration planning and deployment, while maintaining and growing interoperability
 further developments now have to be driven by experience in real use
(Timeline diagram: LCG-2 (= EGEE-0) moving from prototyping to product over 2004–2005; LCG-3 (= EGEE-x?) as product.)

Slide 12: EGEE Activities (section divider)
The emphasis in EGEE is on operating a production grid and supporting the end users.

Slide 13: gLite middleware
The first release of gLite (v1.0) was made at the end of March 2005:
– http://glite.web.cern.ch/glite/packages/R1.0/R20050331
– http://glite.web.cern.ch/glite/documentation
Design goals:
– lightweight services
– interoperability and co-existence with the deployed infrastructure
– performance and fault tolerance
– portability
– service-oriented approach
– site autonomy
– open-source licence

Slide 14: gLite Release 1.0
Job management services:
– workload management
– computing element
– logging and bookkeeping
Data management services:
– file and replica catalogue
– file transfer and placement services
– gLite I/O
Information services:
– R-GMA
– service discovery
Security
Deployment modules:
– distribution available as RPMs, binary tarballs, source tarballs and an APT cache
(Architecture diagram: access services with a grid access service and API; job management services including job provenance, package manager, computing element and workload management; data services including metadata catalogue, storage element, data management, and file & replica catalogue; security services covering authorization, authentication and auditing; information & monitoring services including application monitoring; site proxy; accounting. Diagram legend: JRA3, UK, CERN, IT/CZ development clusters.)
Serious testing and certification is just starting.

Slide 15: gLite Services for Release 1.0 – components and origins
Computing element:
– gatekeeper, WSS (Globus)
– Condor-C (Condor)
– CE monitor (EGEE)
– local batch system (PBS, LSF, Condor)
Workload management:
– WMS (EDG)
– logging and bookkeeping (EDG)
– Condor-C (Condor)
Storage element:
– file transfer/placement (EGEE)
– gLite I/O (AliEn)
– GridFTP (Globus)
– SRM: Castor (CERN), dCache (FNAL, DESY), other SRMs
Catalogue:
– file and replica catalogue (EGEE)
– metadata catalogue (EGEE)
Information and monitoring:
– R-GMA (EDG)
– service discovery (EGEE)
Security:
– VOMS (DataTAG, EDG)
– GSI (Globus)
– authentication for C- and Java-based (web) services (EDG)

Slide 16: Main Differences to LCG-2
– The workload management system works in both push and pull mode (a toy sketch contrasting the two modes follows below).
– The computing element is moving towards a VO-based scheduler guarding the jobs of the VO (reduces the load on GRAM).
– Re-factored file and replica catalogues.
– Secure catalogues (based on the user's DN; VOMS certificates being integrated).
– Scheduled data transfers.
– SRM-based storage.
– Information services: R-GMA with an improved API, service discovery, and registry replication.
– A move towards web services.
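To make the first bullet concrete: in push mode a broker chooses a site and sends the job there; in pull mode a site (or a pilot running on it) fetches work when it has free slots. All names below are invented for illustration; this is not the gLite WMS API.

```python
# Toy illustration of push- vs pull-mode scheduling; not the gLite WMS API.
import queue

def push_mode(jobs: queue.Queue, sites: list) -> None:
    """Push mode: the broker decides where each job runs and sends it there."""
    i = 0
    while not jobs.empty():
        print("push: broker sends %s to %s" % (jobs.get(), sites[i % len(sites)]))
        i += 1

def pull_mode(jobs: queue.Queue, site: str) -> None:
    """Pull mode: a site asks the broker for work when it has a free slot."""
    while not jobs.empty():
        print("pull: %s fetches %s from the broker" % (site, jobs.get()))

if __name__ == "__main__":
    q = queue.Queue()
    for j in ["job-1", "job-2", "job-3", "job-4"]:
        q.put(j)
    push_mode(q, ["site-A", "site-B"])  # try pull_mode(q, "site-A") instead
```

Pull mode trades broker complexity for resilience: a site that is down simply stops asking for work, instead of accumulating pushed jobs it cannot run.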

Slide 17: EGEE Activities (section divider)
The emphasis in EGEE is on operating a production grid and supporting the end users.

Slide 18: Outreach & Training
– Public and technical websites are constantly evolving to expand the information available and keep it up to date.
– Two conferences organised so far: ~300 attendees at Cork, ~400 at Den Haag.
– Athens, 3rd project conference, 18–22 April 2005: http://public.eu-egee.org/conferences/3rd/
– Pisa, 4th project conference, 24–28 October 2005.
– More than 70 training events (including the GGF grid school) across many countries, with ~1,000 people trained (induction; application developer; advanced; retreats) and a material archive of more than 100 presentations.
– Strong links with the GILDA testbed and the GENIUS portal developed in EU DataGrid.

Slide 19: Deployment of applications
Pilot applications:
– high-energy physics
– biomedical applications: http://egee-na4.ct.infn.it/biomed/applications.html
Generic applications, with deployment under way:
– computational chemistry
– earth-science research
– EGEODE: the first industrial application
– astrophysics
With interest from:
– hydrology, seismology, grid search engines, stock-market simulators, digital video, etc.
– industry (as provider, user, supplier)
Many users with a broad range of needs: different communities with different backgrounds and internal organisation.

Slide 20: High Energy Physics
A very experienced and large international user community:
– involved in many projects worldwide, and users of several grids (e.g. all LHC experiments use multiple grids at the same time for their data challenges)
– non-LHC experiments: ZEUS, D0, CDF, H1, BaBar
Production infrastructure (LCG/EGEE):
– intensive usage during the 2004 data challenges; LHCb ran 3,500 concurrent jobs for long periods
– many issues of functionality and performance were exposed; the data challenges were also the first real use of LCG-2, and only limited testing had been done in advance
– the major issue was reliability: badly configured and unstable sites
– nevertheless, significant work was done:
 >1 M SI2K-years of CPU time (~1,000 CPU-years)
 400 TB of data generated, moved and stored
 4,000–5,000 simultaneous jobs (~4 times CERN's grid capacity)
ARDA's role in application development and middleware testing:
– helping the experiments' specific middleware evolve towards analysis usage: a large effort on the four LHC experiments' prototypes; the CMS prototype was migrated to gLite version 1 and exposed to several users
– early feedback on the use of the gLite prototype right from the start of EGEE
– contribution to the common testing effort together with JRA1, SA1 and NA4-testing
Improved reliability has been achieved by selecting well-maintained sites: efficiencies better than 90% have been possible (D0, CMS, ATLAS, in well-controlled conditions). This remains the main area of focus for improvement, due in large part to the number of sites in the infrastructure.

Slide 21: Recent ATLAS work
– ATLAS jobs in EGEE/LCG-2 in 2005: up to 8,000 jobs/day in the latest period.
– Used a combination of RB and Condor-G submissions.
– ~10,000 concurrent jobs in the system.
(Plot: number of jobs per day.)

Slide 22: ZEUS on LCG-2
(Plot: ZEUS production jobs on LCG-2.)

Slide 23: LCG Deployment Schedule
– The LHC starts in 2007.
– Ramp-up with a series of service challenges to ensure the key services and infrastructure are in place.
– An extremely aggressive timescale.

Slide 24: Introduction: The MAGIC Telescope
– Ground-based air Cherenkov telescope, sensitive to gamma rays of 30 GeV to TeV energies.
– La Palma, Canary Islands (28° North, 18° West); 17 m diameter; in operation since autumn 2003 (still in commissioning).
– Collaborators: IFAE Barcelona, UAB Barcelona, Humboldt U. Berlin, UC Davis, U. Lodz, UC Madrid, MPI München, INFN / U. Padova, U. Potchefstroom, INFN / U. Siena, Tuorla Observatory, INFN / U. Udine, U. Würzburg, Yerevan Physics Inst., ETH Zürich.
– Physics goals: the origin of VHE gamma rays; active galactic nuclei; supernova remnants; unidentified EGRET sources; gamma-ray bursts.

Slide 25: Introduction – ground γ-ray astronomy
(Diagram: a gamma ray entering the atmosphere produces a particle shower at ~10 km altitude; the shower emits Cherenkov light over a pool of ~120 m diameter with an opening angle of ~1°; for comparison, a satellite detector such as GLAST has a collection area of only ~1 m².)
The image of the particle shower in the telescope camera is used to reconstruct the arrival direction and the energy, and to reject the hadron background.

Slide 26: MAGIC – Hadron rejection
Based on extensive Monte Carlo simulation:
– air-shower simulation with the CORSIKA program
– simulating the hadronic background is very CPU-intensive:
 to simulate the background of one night, 70 CPUs (P4, 2 GHz) need to run for 19,200 days
 to simulate the gamma events of one night for a Crab-like source takes 288 days
– at higher energies (>70 GeV) observations are already possible with the On-Off method (this reduces the on-source time by a factor of two)
– lowering the threshold of the MAGIC telescope requires new methods based on Monte Carlo simulations
A back-of-the-envelope sketch of this CPU scale follows below.
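To see why these numbers motivate a grid, the helper below converts a CPU-days budget into ideal wall-clock time for a given number of workers. The slide's phrasing is ambiguous about whether the 19,200 days are per CPU or a total budget; the example values assume a total budget, which is purely an assumption.

```python
# Back-of-the-envelope conversion of a CPU-days budget into wall-clock time.
# Assumption: "19,200 days" is read as a total CPU-days budget (the slide is
# ambiguous); perfect parallelism is assumed, so these are lower bounds.
def wall_clock_days(cpu_days: float, n_cpus: int) -> float:
    """Ideal wall-clock days needed to burn cpu_days over n_cpus workers."""
    return cpu_days / n_cpus

if __name__ == "__main__":
    budget = 19_200  # assumed total CPU-days for one night of background
    print("on 70 CPUs:    %.0f days" % wall_clock_days(budget, 70))     # ~274
    print("on 1,000 CPUs: %.1f days" % wall_clock_days(budget, 1_000))  # ~19.2
```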

Slide 27: Experiences
Data challenge on Grid-1: 12 M hadron events, requiring 12,000 jobs; started March 2005; ~4,000 jobs run up to now.
– First tests used manual submission via a GUI.
– Reasons for failure: network problems, RB problems, queue problems.
– A job counts as successful when its output file is registered at PIC.
– Diagnostics: no tools found; diagnosing failures is complex and time-consuming → use a metadata base, log the failure, resubmit, and don't chase each case (a sketch of this strategy follows below).
– 170 of 3,780 jobs failed: a 4.5% failure rate.
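The "log the failure, resubmit and don't care" strategy from this slide can be summarised in a few lines. This is a minimal sketch: submit_job() and output_registered() are hypothetical stand-ins for the real submission tooling and the check that the output file is registered at PIC.

```python
# Sketch of the slide's fire-and-resubmit strategy. submit_job() and
# output_registered() are hypothetical placeholders, not real grid tooling.
import logging
import random

logging.basicConfig(level=logging.INFO)

def submit_job(job_id: str) -> None:
    """Placeholder for the real grid submission call."""

def output_registered(job_id: str) -> bool:
    """Placeholder for the success check (output file registered at PIC).
    Here it fails ~4.5% of the time, roughly the rate quoted on the slide."""
    return random.random() > 0.045

def run_with_resubmission(job_id: str, max_attempts: int = 3) -> bool:
    """Submit a job; on failure, log it and resubmit, up to max_attempts."""
    for attempt in range(1, max_attempts + 1):
        submit_job(job_id)
        if output_registered(job_id):
            return True
        logging.info("job %s failed (attempt %d), resubmitting", job_id, attempt)
    return False

if __name__ == "__main__":
    done = sum(run_with_resubmission("job-%d" % i) for i in range(100))
    print("%d/100 jobs eventually succeeded" % done)
```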

Slide 28: Biomed applications
A loosely coupled community that had to go the long way of getting up to speed:
– VO creation and installation of core services
– setting up a task force of experts
– recently joined user support at the application level
Applications:
– see the list and descriptions on the web site: http://egee-na4.ct.infn.it/biomed/applications.html
– 12 applications running today
– new applications emerging: medical imaging, bioinformatics, phylogenetics, molecular structures and drug discovery, …
Grown to significant infrastructure usage: 29k CPU-hours and 24k jobs reported in January.

Slide 29: Bioinformatics
GPS@: Grid Protein Sequence Analysis
– NPSA is a web portal offering protein databases and sequence-analysis algorithms to bioinformaticians (3,000 hits per day)
– GPS@ is a gridified version with increased computing power
– needs large databases and a large number of short jobs (a job-bundling sketch follows below)
xmipp_MLrefine:
– 3D structure analysis of macromolecules from (very noisy) electron-microscopy images
– a maximum-likelihood approach for finding the optimal model
– very compute-intensive
Drug discovery:
– a health-related area with high-performance computation needs
– an application currently being ported in Germany (Fraunhofer Institute)
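One common way to cope with a big number of short jobs on a grid, where per-job scheduling overhead can dwarf the task itself, is to bundle many tasks into fewer, larger jobs. The sketch below shows the generic idea; it is not a description of how GPS@ actually packages its workload.

```python
# Generic job-bundling sketch: pack many short tasks into fewer grid jobs so
# that per-job overhead (matchmaking, queueing) is amortised. Not GPS@ code.
def bundle(tasks: list, tasks_per_job: int) -> list:
    """Group short tasks into batches, each to be submitted as one grid job."""
    return [tasks[i:i + tasks_per_job]
            for i in range(0, len(tasks), tasks_per_job)]

if __name__ == "__main__":
    sequences = ["seq-%d" % i for i in range(1000)]  # e.g. 1,000 short analyses
    jobs = bundle(sequences, tasks_per_job=50)
    print("%d tasks packed into %d grid jobs" % (len(sequences), len(jobs)))
```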

Slide 30: Medical imaging
GATE:
– radiotherapy planning
– improved precision through Monte Carlo simulation (an illustrative Monte Carlo sketch follows below)
– processing of DICOM medical images
– objective: computation times short enough to be compatible with clinical practice
– status: development and performance testing
CDSS (Clinical Decision Support System):
– assembling knowledge databases
– disseminating image-classification engines
– objective: access to knowledge databases from hospitals
– status: moving from development to deployment, with some medical end users
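GATE's precision comes from Monte Carlo particle transport, which is also what makes it so compute-hungry. The toy below illustrates only the Monte Carlo idea, estimating the fraction of photons that traverse a slab of material by sampling exponential free paths; the attenuation coefficient and thickness are made-up numbers, and real GATE physics is far richer.

```python
# Toy Monte Carlo: estimate photon transmission through a slab by sampling
# exponential free paths. Parameters are invented; real GATE is far richer.
import math
import random

def transmitted_fraction(mu: float, thickness: float, n: int = 100_000) -> float:
    """Estimate exp(-mu * thickness) by sampling n photon free paths."""
    passed = sum(1 for _ in range(n) if random.expovariate(mu) > thickness)
    return passed / n

if __name__ == "__main__":
    mu, t = 0.2, 5.0  # hypothetical attenuation (1/cm) and slab depth (cm)
    print("Monte Carlo: %.4f" % transmitted_fraction(mu, t))
    print("Analytic:    %.4f" % math.exp(-mu * t))
```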

Slide 31: Medical imaging (continued)
SiMRI3D:
– a 3D magnetic resonance image simulator
– MRI physics simulation with a parallel implementation; very compute-intensive
– objective: offering an image-simulator service to the research community
– status: parallelized and now running on LCG-2 resources
gPTM3D:
– an interactive tool for medical image segmentation and analysis
– a non-gridified version is distributed in several hospitals
– needs very fast scheduling of interactive tasks
– objective: shorten computation time using the grid
– status: development of the gridified version is being finalized

Slide 32: Evolution of biomedical applications
Growing interest from the biomedical community:
– the partners involved are proposing new applications
– new application proposals in various health-related areas
– enlargement of the biomedical community (drug discovery)
Growing scale of the applications:
– progressive migration from prototypes to pre-production services for some applications
– increase in scale (volume of data and number of CPU-hours)

Slide 33: EGEE Geographical Extensions
EGEE is a truly international undertaking:
– collaborations with other existing European projects, in particular GÉANT, DEISA and SEE-GRID
– relations to other projects and proposals:
 OSG: Open Science Grid (USA)
 Asia: Korea, Taiwan, EU-ChinaGrid
 BalticGrid: Lithuania, Latvia, Estonia
 EELA: Latin America
 EUMedGrid: Mediterranean area
 …
Expansion of the EGEE infrastructure in these regions is a key element for the future of the project and of international science.

Slide 34: Summary
– EGEE is a first attempt to build a worldwide grid infrastructure for data-intensive applications from many scientific domains.
– A large-scale production grid service is already deployed and being used for HEP and biomedical applications, with new applications being ported.
– Resources and user groups are expanding.
– A process is in place for migrating new applications onto the EGEE infrastructure.
– A training programme has started, with many events already held.
– The "next generation" middleware, gLite, is being tested.
– The first project review by the EU was passed successfully in February 2005.
– Plans for a follow-on project are being prepared.

