EGEE Asia Pacific Regional Operation Center


EGEE Asia Pacific Regional Operation Center Min-Hong Tsai ASGC ISGC 2008 April 10, Taipei http://www.eu-egee.org/ http://aproc.twgrid.org/

Agenda
* Asia Pacific Operation Center
  * Introduction
  * CA Service
  * Tutorials
  * Site Deployment
  * Regional Availability
* ASGC Service Availability

APROC Introduction
APROC Mission
* Provide deployment support facilitating Grid expansion
* Maximize the availability of Grid services
Services
* ASGCCA Certificate Authority services
* Initial site deployment
* Continuous operations support
* EGEE global operations support
Site Deployment Support
* Registration
* Installation
* Certification
Operations Support
* Monitoring, troubleshooting
* Problem tracking
* Software updates and security coordination
* Regional VO services: VOMS and LFC
ASGCCA CA Service
* Provides certificates for AP EGEE/LCG sites without a domestic CA
EGEE Operations
* CIC-on-duty: EGEE global operations
* Monitoring tool development: GStat and GGUS Search
* TPM: front-line user support (Q4 2006)
* OSCT: Incident Response duty (Dec 2006)

ASGCCA Service
* Providing CA services since 2003
* Serving Taiwan and Asia Pacific LCG/EGEE users
* 290 tickets closed in Feb 2008
* 1658 tickets in the past year
Scalability concerns
* New APGridPMA CAs will reduce loading
* Investigate Member Integrated X509 Credential Services (MICS)
  * MICS to rely on existing organizations' identity databases
RA:
* Taiwan: 7
* Korea: 1
* India: 11
* New Zealand: 1
* Philippines: 1
* Malaysia: 3
* Vietnam: 1

Tutorials
Events since last year:
* Grid Asia 07: 1-day Induction
* Grid Camp 07: 3-day Admin, Operations, Applications (with CERN)
* MIMOS Tutorial 07: 5-day Application and Installation (with EGEE NA3)
* ISGC 08: 1-day Induction and Application
MIMOS Installation Tutorial - Malaysia
* 25 virtual machines prepared for participants
* Firewall, OS and middleware configuration errors
* Instructions were not explicit enough, which led to errors
* Investigate INFN GILDA admin training resources
* Participants obtained valid certificates and joined APeSci VO

APROC Sites
* Supports EGEE sites in Asia Pacific since April 2005
* 21 production sites, 8 countries
* 4 sites in certification process
  * China: Peking University (PKU)
  * Japan: Hiroshima University
  * Malaysia: MIMOS
  * Vietnam: IOIT-HCM
* Additional support planned for other EUAsiaGrid partners
  * Philippines
  * Indonesia
  * Brunei
  * Thailand
* Ticket process manager: user support

Site Deployment Case Study I
Preparation:
* Supplementary documentation
  * Registration procedures
  * Site preparation recommendations
  * Non-middleware issues
  * Summarize installation procedures
* Training
Communication and interaction
* Email
* Remote login for troubleshooting

Site Deployment Case Study II
Step: Days, Emails
* Site Design Recommendations: 3 days, 7 emails
* Registration: 1 day, 6 emails
* Hardware / OS Setup
* M/W Installation and Configuration: 45
* Certification / SAM Testing: 8 days, 4 emails

Site Deployment Case Study III
Issues:
* Major new release of the configuration tool
  * Configuration parameters
  * Command line options
  * Documentation
* Incorrect firewall configuration for services
* Difficult to interpret error messages (install, configuration, testing)
* Email latency and lack of clarity
Recommendations:
* ROC
  * Test and update supplementary documentation after major changes
* Site
  * Studying the EGEE user's guide is important
  * Update ROC staff on status or new errors as often as possible
* Both
  * Improve communication
    * Video conference or in-person visits to or from ROC
  * Test and resolve network issues before deployment

Regional Availability Issues
March 2008 results
* 74% availability
Issues
* Configuration changes
* Heavy loading
* Service instabilities
* Network performance
Possible solutions
* Expand coverage of monitoring tools
* Improve detail and coverage of current troubleshooting guides
* Diagnostic scripts to isolate problems
* Use High Availability solutions

Agenda
* Asia Pacific Operation Center
* ASGC Service Availability
  * High Availability Services
  * Monitoring and Notification
  * 24x7 coverage

High Availability Services
* Virtual Router Redundancy Protocol (VRRP)
  * Host failover
* Linux Virtual Server (LVS)
  * Service failover
  * Load balancing
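The service failover and load balancing idea can be illustrated with a toy sketch. This is not how LVS itself works (LVS balances connections inside the Linux kernel); it only shows a director that hands out backends in round-robin order and skips any that fail a health check. The class name `Director`, the `pick` method, and the host names in the usage note are hypothetical and do not come from the slides.

```python
from itertools import cycle
from typing import Callable, Iterable, Optional


class Director:
    """Toy round-robin director that skips backends failing a health check."""

    def __init__(self, backends: Iterable[str], is_healthy: Callable[[str], bool]):
        self.backends = list(backends)
        self.is_healthy = is_healthy      # caller-supplied service check
        self._ring = cycle(self.backends)

    def pick(self) -> Optional[str]:
        """Return the next healthy backend, or None if all are down."""
        # Try each backend at most once per call, in round-robin order.
        for _ in range(len(self.backends)):
            candidate = next(self._ring)
            if self.is_healthy(candidate):
                return candidate
        return None
```

For example, with backends `["bdii1", "bdii2", "bdii3"]` (made-up names) and a check that reports `bdii2` as down, successive calls to `pick()` alternate between `bdii1` and `bdii3`.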

High Availability Services
Advantages
* Easy to install
* Fast failover
* Customizable service checks
Issues
* Network restriction for VRRP
* Scalability of LVS director
* Increased complexity
Plans
* Extend HA to other services
* Investigate Dynamic DNS solutions
See “WLCG Service Reliability - Best Practices”, Tuesday presentation by James Casey

Monitoring and Notification
* Ganglia, Smokeping, Weathermap, SAM, GStat
* Nagios service fault monitoring
  * Facility, Network, Grid, ROC
  * 148 hosts and 570 services
  * SMS notification
* Ticketing system integration
  * Faults automatically generate a new ticket
  * Associated issues are combined into the same ticket
* Recovery scripts for a couple of services
Future Plans
* Better integration of automatic recovery with Nagios
* Incorporate work from WLCG Monitoring Working Group
* CERN’s Service Level Status integration
Recovery script
* reads SMS notifications
* has delay for false positives and to avoid flapping
* uses expect to reboot through blade management system
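The delay logic described for the recovery script could look roughly like the sketch below. It is an assumption about the general technique, not the ASGC script: a fault must persist for a confirmation window before any action is taken (to filter false positives), and a per-host cooldown stops repeated reboots (to avoid flapping). The name `RecoveryGate` and both default durations are hypothetical, and the actual reboot through the blade management system is left out.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class RecoveryGate:
    confirm_after: float = 300.0   # seconds a fault must persist (assumed value)
    cooldown: float = 1800.0       # minimum seconds between recoveries per host (assumed value)
    _first_seen: Dict[str, float] = field(default_factory=dict)
    _last_action: Dict[str, float] = field(default_factory=dict)

    def on_fault(self, host: str, now: float) -> bool:
        """Return True when a recovery action should be triggered for host."""
        first = self._first_seen.setdefault(host, now)
        if now - first < self.confirm_after:
            return False  # possibly a false positive: keep waiting
        last = self._last_action.get(host)
        if last is not None and now - last < self.cooldown:
            return False  # recovered recently: do not flap
        self._last_action[host] = now
        self._first_seen.pop(host, None)
        return True

    def on_recovery(self, host: str) -> None:
        """Forget a pending fault when the service comes back on its own."""
        self._first_seen.pop(host, None)
```

A caller would feed each incoming fault notification into `on_fault(host, time.time())` and run the reboot step only when it returns `True`.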

24x7 Coverage
Service Class
* Foundation: 1 hour response time
  * Facility, Network, DNS, DB, Monitoring
* Critical: 2 hour response time
  * Grid and Experiment Services
* Best Effort: next day
  * User Interface
Escalation
* On-site engineer
* On-call engineer (weekly rotation)
* Service manager
Open Issues
* Hire additional on-site engineer for 16x7
* Add and improve set of recovery procedures and training
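The service classes and escalation chain can be written down as a small data structure. The sketch below is only illustrative: the response times and contact order are taken from the slide, while the names `ServiceClass`, `response_deadline`, and `escalation_contact`, and the reading of "next day" as 24 hours after the fault is opened, are assumptions.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass(frozen=True)
class ServiceClass:
    name: str
    response: Optional[timedelta]  # None stands for "next day"


FOUNDATION = ServiceClass("Foundation", timedelta(hours=1))
CRITICAL = ServiceClass("Critical", timedelta(hours=2))
BEST_EFFORT = ServiceClass("Best Effort", None)

# Contact order from the slide.
ESCALATION_CHAIN = ["On-site engineer", "On-call engineer", "Service manager"]


def response_deadline(cls: ServiceClass, opened: datetime) -> datetime:
    """Latest time by which someone should respond to a fault."""
    if cls.response is not None:
        return opened + cls.response
    # Assumption: "next day" is read as 24 hours after the fault was opened.
    return opened + timedelta(days=1)


def escalation_contact(level: int) -> str:
    """Who to contact at a given escalation level (0 = first contact)."""
    return ESCALATION_CHAIN[min(level, len(ESCALATION_CHAIN) - 1)]
```

For a Foundation fault opened at 09:00, `response_deadline` gives 10:00 the same day; any escalation level past the end of the chain stays with the service manager.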

Summary
* Asia Pacific ROC provides regional EGEE operation
* Challenges are still present to:
  * Streamline site deployment
  * Increase the availability of sites and resources
* ASGC service availability depends on
  * High availability solutions
  * Monitoring and notification
  * 24x7 processes
  * Key personnel expertise and responsiveness

Thank You for Your Attention!
Questions? roc@lists.grid.sinica.edu.tw http://aproc.twgrid.org/aproc/
Thanks to efforts from the ASGC Operations Team: Jinny Chien, Aries Hong, Jhen-Wei Huang, Joanna Huang, Hung-Che Jen, Felix Lee, Shu-Ting Liao, Yuan-Pin Liao, Jason Shih, Dave Wei, Yi-Han Wu