HEPiX/WLCG System Management WG: an update
Alessandra Forti
Hep Sysman, RAL, 23 May 2007


Layout
Background
Mandate
WEB site
Wiki
Repositories
Group
Conclusions

Background
Ian Bird at the Fall 2006 HEPiX and at the WLCG Management Board
  – https://indico.fnal.gov/materialDisplay.py?contribId=34&sessionId=8&materialId=slides&confId=384
  – http://indico.cern.ch/materialDisplay.py?contribId=s0t14&sessionId=s0&materialId=slides&confId=a063271
Three groups have been created to set up a comprehensive monitoring framework to improve the robustness of grid sites.
  – System Management WG: system management and fabric monitoring tools and cookbook
  – Grid Services Monitoring WG: middleware monitoring and monitoring framework
  – System Analysis WG: monitoring from the application side

Mandate: Intro
One of the problems observed (by EGEE and LCG) in providing a reliable grid service is the reliability of the local fabric services of participating sites.
The SMWG should bring together the existing expertise in different areas of fabric management to build a common repository of tools and knowledge for the benefit of the HEP system managers' community.
The idea is neither to present all possible tools nor to create new ones, but to recommend specific tools for specific problems according to the best practices already in use at sites.
Although this group is proposed in order to help improve grid site reliability, the results should be useful to any site running similar local services.
Two areas should be improved by the group: tools and documentation.

Mandate: Goals
Improve the overall level of grid site reliability, focusing on improving system management practices and sharing expertise, experience and tools
Provide a repository
  – Management tools
  – Fabric monitoring sensors
  – HOWTOs
Provide site manager input to requirements on grid monitoring and management tools
Propose existing tools to the grid monitoring working group as solutions to general problems
Produce a Grid Site Fabric Management cook-book
  – Recommend basic tools to cover essential practices, including security management
  – Discover what the common problems for sites are and document how experienced sites solve them
  – Document collation of best practices for grid sites
Point out holes in existing documentation sets
Identify training needs
  – To be addressed in a workshop or by EGEE, for example?

Preliminary list of areas and tools
System Management Areas
  – Filesystems: ext(2,3), XFS, NFS, AFS, dCache, DPM
  – Networking: interfaces, IPs, routers, gateways, NAT
  – Databases: MySQL, Oracle, LDAP, gdbm
  – Processes: system, users monitoring
  – Servers: http, dhcp, dns, ldap, sendmail or other, sshd, (grid)ftp, rfio
  – Batch systems: LSF, Torque, Maui, BQS, Sun Grid Engine, Condor
  – Security: login access, pool accounts, certificate management and monitoring, non-required services, ports list, backups, monitoring (file systems, processes, networking), log files (grid services included)
  – ………
Common Fabric Monitoring and Management Tools
  – Monitoring: Ganglia, Nagios, Ntop, home grown, SAM, GridICE, Lemon
  – Management: Cfengine, NPACI Rocks, Kickstart, Quattor
  – Security: iptables, rootkit, tripwire, nmap, ndiff, tcpdump, syslog, yummit
  – Grid Configuration: Yaim, Quattor

Mandate: Interaction with GSWG
Some of the areas covered by this group overlap with those of the Grid Services Monitoring Working Group, particularly the local fabric monitoring area.
The two groups are required to work in close contact, and the boundaries and division of responsibility should be discussed between the groups.
The SMWG should act as a bridge between the system managers and the developers in the GSMWG, giving feedback on the monitoring tools and sensors used.
It is important that work is not duplicated.

WEB site
A WEB site has been set up in Manchester
  – It's based on GridSite
  – It allows ACL control based on X.509 certificates
The WEB site hosts
  – Wiki (Cookbook requested in the mandate)
  – Subversion repositories (sharing scripts)

Subversion Repositories
Integrated with GridSite
  – Read access is allowed to anyone
  – Write access is based on certificates: there is no need to create accounts, but users need to be added to the ACLs
  – Different repositories have different ACLs
Fabric-management (SMWG)
Fabric-monitoring (SMWG)
Grid-monitoring (GSWG)
  – not created yet, but in the pipeline
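
The access policy on this slide can be modelled in a few lines. The sketch below is only an illustration of the rules as stated (open read access, per-repository write lists keyed on certificate DNs). It is not GridSite code, and the DNs, the `WRITE_ACLS` table and the `can_access` function are all made up for the example.

```python
# Hypothetical model of the policy on this slide; not GridSite code.
# Each repository has its own set of certificate DNs allowed to write.
# The DNs below are invented placeholders.
WRITE_ACLS = {
    "fabric-management": {"/C=UK/O=eScience/OU=Example/CN=alice example"},
    "fabric-monitoring": {"/C=UK/O=eScience/OU=Example/CN=bob example"},
}


def can_access(repository, action, dn=None):
    """Return True if the request is allowed under the slide's policy."""
    if repository not in WRITE_ACLS:
        return False  # unknown (or not yet created) repository
    if action == "read":
        return True  # read access is open to anyone, with or without a certificate
    if action == "write":
        # No account is needed, but the DN must be on this repository's ACL.
        return dn is not None and dn in WRITE_ACLS[repository]
    return False
```

Because the ACLs are kept per repository, a DN that may write to one repository gets no write access to another unless it is added there as well.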

Subversion repositories (2)
The tools should be management scripts or monitoring sensors written by sys admins to solve a local problem
  – However, they should be generic enough to work at other sites
Each script should have a banner containing the following information
  – Description
  – Author
  – Institute
  – Creation date
  – License
  – Repository version number
Scripts are not necessarily committed by the author
  – Always with their permission and under the license they want to use
There are currently 9 scripts in the repositories
  – We need more!
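
As an illustration of the banner requirement, the sketch below checks a script for the six fields listed on the slide. The `# Field: value` comment layout, the example script and the function name `missing_banner_fields` are assumptions for this example; the slide lists the fields but does not prescribe a format.

```python
import re

# Banner fields listed on the slide. The "# Field: value" comment layout
# is an assumption for this example, not a format mandated by the group.
REQUIRED_FIELDS = [
    "Description",
    "Author",
    "Institute",
    "Creation date",
    "License",
    "Repository version number",
]


def missing_banner_fields(script_text):
    """Return the required fields that have no non-empty value in the banner."""
    found = set()
    for line in script_text.splitlines():
        match = re.match(r"^#\s*([^:]+?)\s*:\s*(\S.*)$", line)
        if match:
            found.add(match.group(1).lower())
    return [field for field in REQUIRED_FIELDS if field.lower() not in found]


# Invented example script with a complete banner.
EXAMPLE_SCRIPT = """\
#!/bin/sh
# Description: report file systems that are more than 90% full
# Author: A. N. Other
# Institute: Example University
# Creation date: 2007-05-23
# License: GPL
# Repository version number: 1
df -P | awk 'NR > 1 && $5 + 0 > 90 {print $6}'
"""

if __name__ == "__main__":
    print(missing_banner_fields(EXAMPLE_SCRIPT))
```

A check like this could be run by whoever commits a script on the author's behalf, so that an incomplete banner is noticed before the script reaches the repository.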

Wiki
It is also integrated with GridSite
  – Accounts are based on DN rather than user name and password.
Simple rules to edit the wiki:
  – Each article should belong to at least one category to facilitate navigation and identification of the problem.
  – If the article contains a link to a script in the repositories, it should belong to the category scripts.
  – Each article or portion of an article should bear the name and institute of the source if it is not the same as the page author. For example if the text is extracted from a received .
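
The three editing rules can be expressed as a small check. This is a sketch under stated assumptions: the article is represented as a plain dictionary, and the field names (`categories`, `links_to_script`, `text_from_other_source`, `source_credit`) are invented for the example and do not correspond to any wiki API.

```python
# Hypothetical article records: the field names below are invented for this
# sketch and do not correspond to any wiki API.
SCRIPTS_CATEGORY = "Scripts"


def rule_violations(article):
    """Return a list of the editing rules that the article breaks."""
    problems = []
    categories = article.get("categories", [])
    # Rule 1: every article belongs to at least one category.
    if not categories:
        problems.append("article belongs to no category")
    # Rule 2: an article linking to a repository script is in Scripts.
    if article.get("links_to_script") and SCRIPTS_CATEGORY not in categories:
        problems.append("article links to a script but is not in Scripts")
    # Rule 3: text taken from someone other than the page author carries
    # the name and institute of its source.
    credit = article.get("source_credit")  # (name, institute) or None
    if article.get("text_from_other_source") and not (credit and all(credit)):
        problems.append("source name and institute are not credited")
    return problems
```

For instance, an article in "Fabric management" and "Scripts" that links to a script and credits its source passes all three rules, while an uncategorised article that links to a script fails the first two.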

Wiki (2)
The structure of categories is hierarchical, with four top categories
  – Fabric management
  – Fabric monitoring
  – Best Practices (mostly basic and grid security)
  – Scripts to help navigate the repositories
Subcategories are normally associated with a tool or one of the areas listed in a previous slide, and then there are the articles.
  – Fabric Management (category) -> Cfengine (subcategory) -> Getting_started (article)

Wiki (3)
If good documentation is available somewhere else, just put a pointer to the existing documentation.
  – Apply the minimum effort philosophy. For example, the Quattor page just points to the Quattor working group site.
  – But anyone who wants to add an article about their own experience can do so.
Editing is currently done by me in a non-systematic way.
  – Mostly assigning articles to categories
However, we used a wiki rather than writing a static document to avoid editing issues
  – Everyone should feel free to help write an article or edit a stub.

SMWG Group
Chairs:
  – Alessandra Forti (University of Manchester)
  – Michel Jouvin (LAL)
Mailing list:
  – 26 subscribers
Meetings every fortnight; the details are here:
  –
  – Mainly to give updates about what people have done in the two weeks.

SMWG Group (2)
All the work is based on people volunteering to share
  – There are no dedicated people
So there is no definition of the group
  – Some people have only subscribed to the mailing list
  – Some have subscribed to the mailing list, participated in the meetings and done some work
  – Some people have acted as consultants and agreed to have their scripts distributed, but are not on the mailing list and do not come to the meetings
  – Some people have actually started editing the wiki with some stubs without even being in contact with any member of the group (i.e. mailing list subscribers)
It's a start, but it is not easy to make people volunteer
  – Need to publicize more so people know the site is there and that they can contribute in their own time
  – Distribution of articles to other lists up to now

Conclusions
There is a mandate
There is a wiki
There are repositories
There is a group
We need only people to contribute
Questions?