Distributed Logging Facility Castor External Operation Workshop, CERN, November 14th 2006 Dennis Waldron CERN / IT.

Dennis Waldron (IT/FIO/FD) 2 Outline
 What is DLF?
 Why was it changed?
 New features
 Web Interface
 Deployment Strategy
 Questions

What is DLF?
 Stands for Distributed Logging Facility.
 A framework designed to centrally log messages and accounting information from Castor2-related services/facilities.
 Three major components:
DLF server – a collector daemon that receives messages.
Client API – a C/C++ API that allows clients to write to the DLF server.
Web Interface – for graphical interrogation and visualisation of the stored data.
 Essentially it is like a syslog server with a relational database backend (e.g. syslog-ng), but specifically designed for much higher data collection rates and for handling Castor-specific data points such as request ids, file ids, etc.

Why was it changed?
 Data Management
 The removal of old/obsolete/expired data was extremely inefficient ( DELETE FROM X WHERE time > X days ).
 No archiving strategy was available: once the data was dropped it was lost!
 Scalability
 Performance tests showed that DLF would not be able to scale to the required message load (increased load meant decreased Castor2 performance).
 Robustness
 Server: to improve the way in which the server responds to database-related problems.
 Client: synchronous API calls meant that any problem with DLF's performance (e.g. slow insertion times) impacted the whole system, as clients waited for each DLF message to be acknowledged.
 Recoverability
 To improve the DLF framework's ability to deal with scheduled or unscheduled service disruptions, e.g. network, database, etc.
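The data-management inefficiency above is the classic contrast between row-wise deletion and partition maintenance. A minimal sketch, using illustrative table and partition names rather than the actual DLF schema:

```sql
-- Old approach: row-by-row deletion generates undo/redo for every
-- row and can run for hours on a table this size.
DELETE FROM dlf_messages WHERE msg_timestamp < SYSDATE - 30;

-- Partitioned approach (next slides): ageing out a day of data is a
-- metadata-only operation that completes in seconds.
ALTER TABLE dlf_messages DROP PARTITION p_20061014;
```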

What's New? (I)
 Asynchronous API
 Messages are now written to DLF with asynchronous calls (clients no longer wait for an acknowledgement).
 Multi-threaded.
 A client-side cache (default: 5,000 messages) allows small disruptions on the DLF server to occur without message loss*.
 Server Memory Cache
 Similar to the client's cache but much larger (default: 250,000 messages).
 Core principle of the server is “to acknowledge messages with the smallest overhead possible”.
 Stores the messages in memory pending database insertion.
 Internal Monitoring Statistics
 The server produces internal statistics about its performance, collected every 5 minutes.
 Can be useful for tracing problems: SELECT * FROM DLF_MONITORING ORDER BY TIMESTAMP DESC;
* Assumes the process hasn't died.

What's New? (II) (ORACLE ONLY)
 Partitioning
 Messages are now partitioned by time and sub-partitioned by facility.
 One partition exists for each day.
 Partitions can be archived and dropped in minutes!
 Partition creation is automated by a PL/SQL procedure (01:00 AM).
 Archiving
 Uses Oracle Data Pump (a 10g feature).
 Partitions older than 30 days (adjustable) are 'dumped' to the file system, removed from the database and stored in Castor.
 Re-importation is manual, no GUI!
 Archiving is done automatically by a PL/SQL procedure (02:00 AM).
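The nightly partition creation could look something like the sketch below; the table name, partition naming scheme and range-partitioning key are assumptions for illustration, not the actual CASTOR PL/SQL:

```sql
-- Illustrative sketch: pre-create tomorrow's daily partition.
-- Names and partitioning scheme are assumed, not the real DLF code.
CREATE OR REPLACE PROCEDURE dlf_create_partition AS
  v_name VARCHAR2(30) := 'P_' || TO_CHAR(SYSDATE + 1, 'YYYYMMDD');
BEGIN
  EXECUTE IMMEDIATE
    'ALTER TABLE dlf_messages ADD PARTITION ' || v_name ||
    ' VALUES LESS THAN (TO_DATE(''' ||
    TO_CHAR(SYSDATE + 2, 'YYYYMMDD') || ''', ''YYYYMMDD''))';
END;
/
```

A procedure of this shape can then be scheduled nightly (the slides mention a 01:00 AM job, e.g. via DBMS_JOB, for which the GRANTs appear later in the deployment slide).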

What's New? (III) (ORACLE ONLY)
 Request and Job Statistics
 Information collected by DLF is processed every 5 minutes.
 Again by PL/SQL procedures.
 Reports the min time, max time, avg time and latency of all exiting requests and jobs: SELECT * FROM [DLF_REQSTATS | DLF_JOBSTATS] ORDER BY TIMESTAMP DESC;
 Increased Performance
 Oracle bulk insertions are used to record many thousands of messages in one go.
 DLF can easily record 2,500 messages per second (3.75 million rows of data every 5 minutes) for brief periods!
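The bulk-insertion idea can be sketched in PL/SQL as a FORALL batch; the table name and how the batch is filled from the server's memory cache are assumptions, not the server's actual implementation:

```sql
-- Illustrative only: insert a whole batch of cached messages in one
-- bulk operation instead of one INSERT round trip per message.
DECLARE
  TYPE t_msgs IS TABLE OF dlf_messages%ROWTYPE;
  v_msgs t_msgs;  -- hypothetically populated from the in-memory cache
BEGIN
  -- ... fill v_msgs from the cache ...
  FORALL i IN 1 .. v_msgs.COUNT
    INSERT INTO dlf_messages VALUES v_msgs(i);
  COMMIT;
END;
/
```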

Statistics
 C2ALICE (Dual Intel Xeon 2.66 GHz, 2048 MB)
 13 days' worth of information (20 partitions).
 ~42 million messages collected in total (~190 million rows).
 On average 3.2 million messages per day, 14.8 million inserts.
 The database server can be unavailable for approx. 1 hour 40 minutes without data loss.
 A day's worth of data (~2.7 GB) is archived in under 2 minutes and dropped in ~6 seconds.

Web Interface
 Supports multiple Castor2 instances.
 Gives users the ability to view information not only from the DLF database but also from some parts of the stager database.
 More user-friendly interface.
 Improved SQL query performance: a page with 200 rows takes only 5 SQL statements to generate (not thousands).
 Not backwards compatible with older versions of DLF!

Deployment Strategy (I) (ORACLE)
 Upgrade the DLF server first!
 New facilities cannot write to the old server!
 Database checks:
 Two tablespaces: DLF_INDX, DLF_DATA
 Permissions:
GRANT EXECUTE ON DBMS_JOB TO castor_dlf;
GRANT CREATE JOB TO castor_dlf;
CREATE DIRECTORY DLF_DATAPUMP_DIR AS ' ';
GRANT READ, WRITE ON DIRECTORY DLF_DATAPUMP_DIR TO castor_dlf;
 Prepare the database schema:
 Stop the dlfserver: 'service dlfserver stop'
 Drop the old schema using the OLD dlf_drop script.
 Create the new schema using the NEW dlf_oracle_create.sql. This may take some minutes as it pre-creates 7 days' worth of partitions.

Deployment Strategy (II) (ORACLE)
 Upgrade the castor-dlf-server software.
 cp /etc/sysconfig/dlfserver.example /etc/sysconfig/dlfserver
 service dlfserver start
 Verify the server is running OK:
 Look at the log (/var/spool/dlf/log).
 "resuming normal operations" = OK
 Upgrade the rest of Castor2!

Questions?