D0 Run IIb Review 15-Jul-2004 Run IIb DAQ / Online status Stu Fuess Fermilab

D0 Run IIb Review 15-Jul-2004 Introduction
- In order to meet the DAQ and Online computing requirements for Run IIb we plan:
  - Level 3 farm node increase (Brown, Univ. of Washington, Fermilab)
  - Host system replacements / upgrades (hardware: Fermilab; software: various)
  - Control system node upgrade (Fermilab)
- The requirements, plans, status, and future activities will be discussed.

D0 Run IIb Review 15-Jul-2004 Level 3 1.3.1

D0 Run IIb Review 15-Jul-2004 Level 3 farm nodes
- Need: greater L3 processing capability for higher luminosities
- [Table: counts and "GHz" ratings of the dual-CPU nodes, indicating which existing nodes will be removed, which will remain, and which will be added]
- 332 GHz-equivalent of CPUs now; 659 GHz-equivalent of CPUs for the start of Run IIb
- For example: the per-event processing budget in ms-GHz, at the expected input rate, requires 500 GHz of CPUs
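The sizing arithmetic multiplies the Level 3 input rate by the per-event processing budget, quoted in ms-GHz (milliseconds of CPU time on a 1 GHz processor). A worked illustration, assuming purely for the arithmetic a 1 kHz input rate and a 500 ms-GHz per-event budget:

```latex
\[
  C_{\mathrm{L3}} \;=\; R_{\mathrm{in}} \times t_{\mathrm{evt}}
  \;=\; 1000\,\mathrm{Hz} \times 500\,\mathrm{ms{\cdot}GHz}
  \;=\; 5 \times 10^{5}\,\mathrm{ms{\cdot}GHz/s}
  \;=\; 500\,\mathrm{GHz}.
\]
```

On this scale, growing from 332 to 659 GHz-equivalent of CPUs roughly doubles the farm and covers a 500 GHz requirement with headroom.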

D0 Run IIb Review 15-Jul-2004 Level 3 farm nodes, cont'd.
- Plan: single purchase, summer 2005, of $210K* of nodes
  - 3 racks of 32 = 96 nodes, plus infrastructure
- Strategy:
  - This is an "off the shelf" purchase, but a major one
  - Similar to Computing Division farms purchases
  - Used a Run IIa purchase (a 32-node addition) to refine the procedure:
    - Req preparation begun 1/04/04
    - Req submitted 1/29/04
    - PO created 3/23/04 ($51.5K)
    - Prototype system delivery 4/21/04
    - Full order delivery 6/21/04
    - Operational in Level 3 on 6/23/04
    - A 5-month process! Thanks to the Computing Division for help!
* Unburdened FY02 $

D0 Run IIb Review 15-Jul-2004 Level 3 farm nodes, cont'd.
- Other preparations
  - Will replace 3 racks / 48 nodes of older processors with 3 racks / 96 nodes
  - Existing electrical circuits and cooling are sufficient for the new racks
  - Will need an additional 48 network ports on the Level 3 and Online switches
- Impact
  - Installation somewhat disruptive, as 3 racks (48 nodes) of older nodes must be removed to make room for the new ones
    - Remaining 66 nodes operational during installation
- Schedule
  - Plan for arrival of nodes at the start of the 2005 shutdown
    - Start purchase process ~3/05
  - Continued replacement with upgraded nodes will be necessary over the duration of Run IIb (operating funds)

D0 Run IIb Review 15-Jul-2004 Host systems 1.3.2

D0 Run IIb Review 15-Jul-2004 Host systems
- Need
  - Replace the 3-node Alpha cluster, which has the functions:
    - Event data logger, buffer disk, transfer to FCC
    - Oracle database
    - NFS file server
    - User database
- Plan
  - Replace with Linux servers
    - Install a number (~4) of clusters which supply "services"
    - Shared Fibre Channel (FC) storage and failover software to provide flexibility and high availability
  - $247K* for processor and storage upgrades
* Unburdened FY02 $

D0 Run IIb Review 15-Jul-2004 DØ Online Linux clusters
[Diagram: DAQ Services, File Server, Online Services, and Database clusters as SAN clients, connected through a network switch and a Fibre Channel switch to RAID, legacy RAID, JBOD, and legacy JBOD arrays]

D0 Run IIb Review 15-Jul-2004 Cluster Configuration
- Cluster Service: name, domain, check interval, script
- Cluster Member: name, power controller, IP address
- Device: device special file, mount point, file system, mount options
- NFS Export: export directory
- NFS Clients: client names / addresses, export options
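As an illustration of how these parameters fit together, a hypothetical service definition is sketched below as a Python structure; the host names, addresses, paths, and values are invented for the sketch, and this is neither the actual DØ configuration nor the exact Red Hat Cluster Suite file format.

```python
# Hypothetical cluster-service definition mirroring the parameter groups above.
cluster_service = {
    "name": "nfs-fileserver",            # cluster service
    "domain": "online-hosts",            # failover domain
    "check_interval": 30,                # seconds between status checks
    "script": "/etc/init.d/nfs-export",  # start/stop/status script for the service
    "members": [                         # cluster members able to run the service
        {"name": "d0ol-a", "power_controller": "10.0.0.1", "ip_address": "10.0.1.11"},
        {"name": "d0ol-b", "power_controller": "10.0.0.2", "ip_address": "10.0.1.12"},
    ],
    "device": {
        "special_file": "/dev/vg_online/lv_data",  # LVM logical volume on the SAN
        "mount_point": "/export/data",
        "file_system": "ext3",
        "mount_options": "rw,noatime",
    },
    "nfs_export": {
        "directory": "/export/data",
        "clients": {"names": "d0ol*.fnal.gov", "options": "rw,sync"},
    },
}
```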

D0 Run IIb Review 15-Jul-2004 Cluster Services
- Details of the configuration of cluster services, using Run IIa experience of how things actually work!

D0 Run IIb Review 15-Jul-2004 Host systems, cont'd.
- System tests
  - Performed tests of Fibre Channel, network, and storage rates
    - Network: capable of wire rate (1 Gb/s)
    - Storage write/read rates (MB/s): local disk JBOD 18 / 53; FC disk JBOD 52 / 41; further measurements for local SW RAID and for FC HW RAID and SW RAID configurations
  - Target is 25 MB/s for the event path
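For reference, a minimal sketch of the kind of sequential-throughput measurement behind numbers like these (assumptions: a Linux host, Python, and hypothetical mount points; the quoted results were not obtained with this script):

```python
import os
import time

def write_rate(path, total_mb=512, chunk_mb=8):
    """Time large sequential writes, including the final flush to the device."""
    chunk = b"\0" * (chunk_mb << 20)
    t0 = time.time()
    with open(path, "wb", buffering=0) as f:      # unbuffered file object
        for _ in range(total_mb // chunk_mb):
            f.write(chunk)
        os.fsync(f.fileno())                      # ensure data reaches the disk, not just the page cache
    return total_mb / (time.time() - t0)          # MB/s

# Hypothetical mount points for a local JBOD disk and a Fibre Channel RAID volume
for label, path in [("local JBOD", "/local/jbod/testfile"),
                    ("FC RAID   ", "/fcraid/testfile")]:
    print(f"{label}: {write_rate(path):.0f} MB/s write")
```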

D0 Run IIb Review 15-Jul-2004 Host systems, cont'd.
- System tests, cont'd.
  - Checked relative performance of dual- vs quad-processor systems
    - Conclusion: dual-processor nodes, at 20% of the cost, are sufficient for all but possibly the highest-I/O DAQ data-logging nodes
  - Potential issues / concerns
    - The Linux 2.4 kernel has problems with multiple high-rate buffered I/O streams; much better in the 2.6 kernel; alleviated somewhat with use of direct I/O
      - Expect to see 2.6 next spring/summer in Fermi Linux
      - The design avoids this situation
    - Fibre Channel redundant paths are somewhat complicated
      - Expect to use a "manual" solution, but it is solvable ($$) with commercial Secure Path software
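For illustration, a minimal sketch of writing with O_DIRECT to bypass the page cache, the kind of direct I/O referred to above (assumptions: a Linux host and a hypothetical scratch file; this is not the DAQ data-logger code):

```python
import mmap
import os

BLOCK = 1 << 20                              # 1 MiB: a multiple of the device block size, as O_DIRECT requires
buf = mmap.mmap(-1, BLOCK)                   # anonymous mmap gives a page-aligned buffer, also required by O_DIRECT
buf.write(b"\xab" * BLOCK)

# Hypothetical scratch file; O_DIRECT sends writes straight to the device,
# avoiding the 2.4-kernel buffered-I/O contention described above.
fd = os.open("/scratch/direct_io_test.dat",
             os.O_WRONLY | os.O_CREAT | os.O_DIRECT, 0o644)
try:
    for _ in range(64):                      # 64 MiB total
        os.write(fd, buf)                    # the kernel transfers directly from the aligned buffer
finally:
    os.close(fd)
```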

D0 Run IIb Review 15-Jul-2004 Host systems, cont'd.
- Cluster implementation
  - Red Hat Cluster Suite
    - Available open source, distributed in Fermi Linux (also a supported ($) Red Hat Application Suite product)
    - No kernel modifications required; can use non-homogeneous distributions
    - Can be made to work with non-homogeneous hardware; use LVM as the virtual storage layer
- Cluster tests
  - Storage device access
  - NFS failover
    - File reads/writes transparently complete when the active node is turned off and the service transitions to the backup node
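A minimal sketch of the sort of failover probe implied by that test: a client keeps appending records to a file on the exported filesystem while the active node is powered off, then checks that nothing was lost. The mount point and file name are hypothetical, and this is not the actual test procedure.

```python
import time

LOG = "/online/nfs/failover_probe.log"       # hypothetical client-side path on the NFS mount

def probe(n_records=600, interval=1.0):
    """Append one timestamped record per second; writes stall briefly during failover, then complete."""
    with open(LOG, "a") as f:
        for i in range(n_records):
            f.write(f"{i} {time.time():.3f}\n")
            f.flush()
            time.sleep(interval)

def verify():
    """Confirm the sequence numbers are complete, i.e. no record was lost across the failover."""
    with open(LOG) as f:
        seq = [int(line.split()[0]) for line in f]
    assert seq == list(range(len(seq))), "records lost across failover"

probe()
verify()
```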

D0 Run IIb Review 15-Jul-2004 Host systems, cont'd.
- Status
  - A 2-node cluster has been created
    - Single-path FC SAN
    - Service failover demonstrated
  - 6 new servers delivered 6/21/04
    - Will construct 4 clusters during the summer/fall 2004 shutdown
- Schedule
  - Fall 04: attempt to move everything!
    - DAQ, Oracle, NFS, etc.
    - Need involvement of software system experts
    - Dual-path SAN still a challenge
    - DAB2 rack space juggling a challenge
    - Disruptive (possibly a day or two)! Essential functions will have to be relocated and debugged
  - Summer 05: enhance with the best processors

D0 Run IIb Review 15-Jul-2004 Control system 1.3.3

D0 Run IIb Review 15-Jul-2004 Control System
- Need:
  - The current control system processors (~100 of them)
    - are becoming obsolete and not maintainable (lost 2 nodes, repaired 5 during Run IIa)
    - are limiting functionality in some areas (tracker readout crates are CPU limited)
- Plan:
  - Upgrade ~1/3 of the control system processors
    - either with the latest generation of processors (PowerPC) which run the current software (VxWorks), or transition to a different architecture (e.g. Intel) with a new OS (e.g. Linux)
  - Inclination is to purchase an appropriate number of the current processor family and minimize software changes
- Strategy:
  - $140K* to upgrade processors
  - Scheme for replacement on the next slide
* Unburdened FY02 $

D0 Run IIb Review 15-Jul-2004 Control System Processors

| Detector subsystem | # of processors | Processor types | Replacement plan |
|---|---|---|---|
| Control and Monitoring | 18 | (11) 16MB PowerPC, (6) 64MB PowerPC, (1) 128MB PowerPC | Replace; use old processors for HV or spares; need 12 additional for CAL |
| High Voltage | 30 | (30) 16MB PowerPC | Retain, with new and spare needs met from other replacements |
| Muon | ~40 | (23) 4MB 68K, (16) 128MB PowerPC | OK as is; 16 in readout crates are recent replacements |
| Tracker readout | 26 | (10) 32MB PowerPC, (11) 64MB PowerPC, (5) 128MB PowerPC | Replace; use old processors for HV or spares |
| Test stands | ~13 | mixed low end | Use available |

D0 Run IIb Review 15-Jul-2004 Control System, cont'd.
- Impact:
  - Potential short disruptions in control system functions as processors are replaced
- Schedule:
  - Recently purchased the latest PowerPC processor for testing
    - Testing EPICS and D0 controls software
  - Follow evolutionary developments of the OS (VxWorks) and the control system framework (EPICS)
  - Purchases in advance of summer 05, then incremental installation of nodes
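As an illustration of what testing EPICS on the new processor can look like from the client side, a hedged sketch that reads and writes a process variable on a test IOC over Channel Access; the PV name is hypothetical and the pyepics bindings are an assumption for the sketch, not necessarily the tools used by the D0 controls group.

```python
import epics                                  # pyepics Channel Access bindings (assumed for this sketch)

PV = "D0TEST:IOC:HEARTBEAT"                   # hypothetical PV served by the IOC under test

value = epics.caget(PV, timeout=5.0)          # read the PV; a None result suggests the IOC is unreachable
print(f"{PV} = {value}")

epics.caput(PV, 0, wait=True, timeout=5.0)    # write it back, waiting for the put to complete
```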

D0 Run IIb Review 15-Jul-2004 Conclusion
- Three activities:
  - Level 3
  - Host systems
  - Control system
- Level 3 is an "addition of nodes"
- Host system changes are the most revolutionary
  - Attempting to perform the upgrade this summer/fall
  - Improvements in functionality
- Control system is a "replacement of nodes"
  - With evolutionary progress of VxWorks and EPICS software
- Expect a nearly seamless transition, ready for Run IIb