October 2006 Iosif Legrand 1 Iosif Legrand California Institute of Technology An Agent Based, Dynamic Service System to Monitor, Control and Optimize Distributed.

Slides:



Advertisements
Similar presentations
Caltech Proprietary Videoconferencing Security in VRVS 3.0 and Future Videoconferencing Security in VRVS 3.0 and Future Kun Wei California Institute of.
Advertisements

May 2005 Iosif Legrand 1 Iosif Legrand California Institute of Technology May 2005 An Agent Based, Dynamic Service System to Monitor, Control and Optimize.
May 2005 Iosif Legrand 1 Iosif Legrand California Institute of Technology ICFA WORKSHOP Daegu, May 2005 Daegu, May 2005 An Agent Based, Dynamic Service.
1 CHEP 2000, Roberto Barbera Roberto Barbera (*) Grid monitoring with NAGIOS WP3-INFN Meeting, Naples, (*) Work in collaboration with.
MONITORING WITH MONALISA Costin Grigoras. M ONITORING WITH M ON ALISA What is MonALISA ? MonALISA communication architecture Monitoring modules ApMon.
June 2003 Iosif Legrand MONitoring Agents using a Large Integrated Services Architecture Iosif Legrand California Institute of Technology.
Monitoring and controlling VRVS Reflectors Catalin Cirstoiu 3/7/2003.
POLITEHNICA University of Bucharest California Institute of Technology National Center for Information Technology Ciprian Mihai Dobre Corina Stratan MONARC.
October 2003 Iosif Legrand Iosif Legrand California Institute of Technology.
The new The new MONARC Simulation Framework Iosif Legrand  California Institute of Technology.
L. Granado Cardoso, F. Varela, N. Neufeld, C. Gaspar, C. Haen, CERN, Geneva, Switzerland D. Galli, INFN, Bologna, Italy ICALEPCS, October 2011.
Institute of Computer Science AGH Performance Monitoring of Java Web Service-based Applications Włodzimierz Funika, Piotr Handzlik Lechosław Trębacz Institute.
Slide 1 of 9 Presenting 24x7 Scheduler The art of computer automation Press PageDown key or click to advance.
Understanding and Managing WebSphere V5
Grid Monitoring By Zoran Obradovic CSE-510 October 2007.
September 2005 Iosif Legrand 1 End User Agents: extending the "intelligence" to the edge in Distributed Service Systems Iosif Legrand California Institute.
Users’ Authentication in the VRVS System David Collados California Institute of Technology November 20th, 2003TERENA - Authentication & Authorization.
Online Monitoring with MonALISA Dan Protopopescu Glasgow, UK Dan Protopopescu Glasgow, UK.
Active Monitoring in GRID environments using Mobile Agent technology Orazio Tomarchio Andrea Calvagna Dipartimento di Ingegneria Informatica e delle Telecomunicazioni.
GT Components. Globus Toolkit A “toolkit” of services and packages for creating the basic grid computing infrastructure Higher level tools added to this.
Module 10: Monitoring ISA Server Overview Monitoring Overview Configuring Alerts Configuring Session Monitoring Configuring Logging Configuring.
ACAT 2003 Iosif Legrand Iosif Legrand California Institute of Technology.
Ramiro Voicu December Design Considerations  Act as a true dynamic service and provide the necessary functionally to be used by any other services.
1 Ramiro Voicu, Iosif Legrand, Harvey Newman, Artur Barczyk, Costin Grigoras, Ciprian Dobre, Alexandru Costan, Azher Mughal, Sandor Rozsa Monitoring and.
Grid Technologies  Slide text. What is Grid?  The World Wide Web provides seamless access to information that is stored in many millions of different.
Monitoring, Accounting and Automated Decision Support for the ALICE Experiment Based on the MonALISA Framework.
February 2006 Iosif Legrand 1 Iosif Legrand California Institute of Technology February 2006 February 2006 An Agent Based, Dynamic Service System to Monitor,
1 VINCI : Virtual Intelligent Networks for Computing Infrastructures An Integrated Network Services System to Control and Optimize Workflows in Distributed.
1 Iosif Legrand, Harvey Newman, Ramiro Voicu, Costin Grigoras, Catalin Cirstoiu, Ciprian Dobre An Agent Based, Dynamic Service System to Monitor, Control.
The huge amount of resources available in the Grids, and the necessity to have the most up-to-date experimental software deployed in all the sites within.
Introduction to dCache Zhenping (Jane) Liu ATLAS Computing Facility, Physics Department Brookhaven National Lab 09/12 – 09/13, 2005 USATLAS Tier-1 & Tier-2.
November SC06 Tampa F.Fanzago CRAB a user-friendly tool for CMS distributed analysis Federica Fanzago INFN-PADOVA for CRAB team.
Cracow Grid Workshop October 2009 Dipl.-Ing. (M.Sc.) Marcus Hilbrich Center for Information Services and High Performance.
ALICE, ATLAS, CMS & LHCb joint workshop on
Site operations Outline Central services VoBox services Monitoring Storage and networking 4/8/20142ALICE-USA Review - Site Operations.
DYNES Storage Infrastructure Artur Barczyk California Institute of Technology LHCOPN Meeting Geneva, October 07, 2010.
LISHEP 2004 Iosif Legrand Iosif Legrand California Institute of Technology DISTRIBUTED SERVICES.
ABone Architecture and Operation ABCd — ABone Control Daemon Server for remote EE management On-demand EE initiation and termination Automatic EE restart.
Overview of ALICE monitoring Catalin Cirstoiu, Pablo Saiz, Latchezar Betev 23/03/2007 System Analysis Working Group.
1 Putchong Uthayopas, Thara Angsakul, Jullawadee Maneesilp Parallel Research Group, Computer and Network System Research Laboratory Department of Computer.
Monitoring with MonALISA Costin Grigoras. What is MonALISA ?  Caltech project started in 2002
6/23/2005 R. GARDNER OSG Baseline Services 1 OSG Baseline Services In my talk I’d like to discuss two questions:  What capabilities are we aiming for.
1 MonALISA Team Iosif Legrand, Harvey Newman, Ramiro Voicu, Costin Grigoras, Ciprian Dobre, Alexandru Costan MonALISA capabilities for the LHCOPN LHCOPN.
Xrootd Monitoring and Control Harsh Arora CERN. Setting Up Service  Monalisa Service  Monalisa Repository  Test Xrootd Server  ApMon Module.
ClearQuest XML Server with ClearCase Integration Northwest Rational User’s Group February 22, 2007 Frank Scholz Casey Stewart
INFSO-RI Enabling Grids for E-sciencE ARDA Experiment Dashboard Ricardo Rocha (ARDA – CERN) on behalf of the Dashboard Team.
April 2003 Iosif Legrand MONitoring Agents using a Large Integrated Services Architecture Iosif Legrand California Institute of Technology.
PPDG February 2002 Iosif Legrand Monitoring systems requirements, Prototype tools and integration with other services Iosif Legrand California Institute.
Testing and integrating the WLCG/EGEE middleware in the LHC computing Simone Campana, Alessandro Di Girolamo, Elisa Lanciotti, Nicolò Magini, Patricia.
+ AliEn site services and monitoring Miguel Martinez Pedreira.
ECHO A System Monitoring and Management Tool Yitao Duan and Dawey Huang.
CERN IT Department CH-1211 Genève 23 Switzerland t CERN IT Monitoring and Data Analytics Pedro Andrade (IT-GT) Openlab Workshop on Data Analytics.
Global ADC Job Monitoring Laura Sargsyan (YerPhI).
03/09/2007http://pcalimonitor.cern.ch/1 Monitoring in ALICE Costin Grigoras 03/09/2007 WLCG Meeting, CHEP.
D.Spiga, L.Servoli, L.Faina INFN & University of Perugia CRAB WorkFlow : CRAB: CMS Remote Analysis Builder A CMS specific tool written in python and developed.
MONITORING WITH MONALISA Costin Grigoras. M ON ALISA COMMUNICATION ARCHITECTURE MonALISA software components and the connections between them Data consumers.
1 R. Voicu 1, I. Legrand 1, H. Newman 1 2 C.Grigoras 1 California Institute of Technology 2 CERN CHEP 2010 Taipei, October 21 st, 2010 End to End Storage.
SAM architecture EGEE 07 Service Availability Monitor for the LHC experiments Simone Campana, Alessandro Di Girolamo, Nicolò Magini, Patricia Mendez Lorenzo,
A System for Monitoring and Management of Computational Grids Warren Smith Computer Sciences Corporation NASA Ames Research Center.
Experiment Support CERN IT Department CH-1211 Geneva 23 Switzerland t DBES Author etc Alarm framework requirements Andrea Sciabà Tony Wildish.
MONALISA MONITORING AND CONTROL Costin Grigoras. O UTLINE MonALISA services and clients Usage in ALICE Online SE discovery mechanism Data management 3.
January 2006 Iosif Legrand 1 Iosif Legrand California Institute of Technology January 2006 An Agent Based, Dynamic Service System to Monitor, Control and.
Storage discovery in AliEn
ECSAC- August 2009, Veli Losink California Institute of Technology
California Institute of Technology
ALICE Monitoring
Open Source distributed document DB for an enterprise
A Messaging Infrastructure for WLCG
Storage elements discovery
STATEL an easy way to transfer data
Presentation transcript:

October 2006 Iosif Legrand 1 Iosif Legrand California Institute of Technology An Agent Based, Dynamic Service System to Monitor, Control and Optimize Distributed Systems Control and Optimize Distributed Systems ICFA Workshop Cracow, October 2006

October 2006 Iosif Legrand 2 The MonALISA Framework   MonALISA is a Dynamic, Distributed Service System capable to collect any type of information from different systems, to analyze it in near real time and to provide support for automated control decisions and global optimization of workflows in complex grid systems.   The MonALISA system is designed as an ensemble of autonomous multi-threaded, self-describing agent-based subsystems which are registered as dynamic services, and are able to collaborate and cooperate in performing a wide range of monitoring tasks. These agents can analyze and process the information, in a distributed way, and to provide optimization decisions in large scale distributed applications.

October 2006 Iosif Legrand 3 Regional or Global High Level Services, Repositories & Clients Secure and reliable communication Dynamic load balancing Scalability & Replication AAA for Clients Distributed Dynamic Registration and Discovery- based on a lease mechanism and remote events JINI-Lookup Services Secure & Public MonALISA services Proxies HL services Agents The MonALISA Architecture Network of Distributed System for gathering and analyzing information based on mobile agents: Customized aggregation, Triggers, Actions Fully Distributed System with no Single Point of Failure

October 2006 Iosif Legrand 4 MonALISA service & Data Handling Data Store Data Cache Service & DB Configuration Control (SSL) Predicates & Agents Data (via ML Proxy) Applications Clients or Higher Level Services WS Clients and service Web Service WSDL SOAP Lookup Service Lookup Service Registration Discovery Postgres AGENTS FILTERS / TRIGGERS Monitoring Modules Collects any type of information Dynamic Loading Push and Pull

October 2006 Iosif Legrand 5 Monitoring Internet2 backbone Network u Test for a Land Speed Record u ~ 7 Gb/s in a single TCP stream from Geneva to Caltech

October 2006 Iosif Legrand 6 The UltraLight Network BNL ESnet IN /OUT

October 2006 Iosif Legrand 7 Monitoring Network Topology Latency, Routers NETWORKS AS ROUTERS

October 2006 Iosif Legrand 8 Monitoring Grid sites, Running Jobs, Network Traffic, and Connectivity TOPOLOGY JOBS ACCOUNTING

October 2006 Iosif Legrand 9 Monitoring OSG: Resources, Jobs & Accounting 58 SITES ~ 8000 Nodes ( CPUs) Thousands of Jobs Thousands of Jobs parameters parameters Running Jobs Accounting

October 2006 Iosif Legrand 10 CMS Aggregate Job Monitoring In-depth or abstracted high-level information, as needed

October 2006 Iosif Legrand 11 ALICE : Jobs & resource usage monitoring Cumulative parameters è CPU Time è Wall time è Input & output traffic (xrootd) è Read & written files Running parameters è Resident memory è Virtual memory è Open files è Workdir size è Disk usage è CPU usage Aggregated per site

October 2006 Iosif Legrand 12 Available Bandwidth Measurements Embedded Pathload module.

October 2006 Iosif Legrand 13 FTP Data Transfer between GRID sites Total FTP Traffic per VO

October 2006 Iosif Legrand 14 ApMon – Application Monitoring MonALISA Service MonALISA Service ApMon APPLICATION MonitoringD ata UDP/XDR Mbps_out: 0.52 Status: reading App. Monitoring MB_inout: ApMon Config parameter1: value parameter2: value App. Monitoring... Time;IP;procID MonitoringD ata UDP/XDR MonitoringD ata UDP/XDR load1: 0.24 processes: 97 System Monitoring pages_in: 83 MonALISA hosts Config Servlet dynamic reloading ApMon configuration generated automatically by a servlet / CGI script No Lost Packages Lightweight library of APIs (C, C++, Java, Perl, Python) that can be used to send any information to MonALISA Services High comm. performance FlexibleAccounting Sys Mon

October 2006 Iosif Legrand 15 Monitoring the Execution of Jobs and the Time Evolution SPLIT JOBS LIFELINES for JOBS Job Job1 Job2 Job3 Job 31 Job 32 Summit a Job DAG

October 2006 Iosif Legrand 16 Long History DB The Alien Monitoring Architecture LCG Tools ApMon AliEn Job Agent ApMon AliEn Job Agent ApMon AliEn Job Agent MonALISA LCG Site ApMon AliEn CE ApMon AliEn SE ApMon Cluster Monitor ApMon AliEn TQ ApMon AliEn Job Agent ApMon AliEn Job Agent ApMon AliEn Job Agent ApMon AliEn CE ApMon AliEn SE ApMon Cluster Monitor ApMon AliEn IS ApMon AliEn Optimizers ApMon AliEn Brokers ApMon MySQL Servers ApMon CastorGrid Scripts ApMon API Services MonaLisaRepository Aggregated Data rss vsz cpu time run time job slots free space nr. of files open files Queued JobAgents cpu ksi2k job status disk used processes load net In/out jobs status sockets migrated mbytes active sessions MyProxy status

October 2006 Iosif Legrand 17 Operational Decisions and Actions  Based on monitoring information, actions can be taken in  ML Service  ML Repository  Actions can be triggered by  Values above/below given thresholds  Absence/presence of values  Correlation between multiple values  Operational actions  Alerts   Instant messaging  Supervision for Services  External commands  Event logging GlobalService ML Service Actions based on global information Actions based on local information Traffic Jobs Hosts Apps Networking Temperature Humidity A/C Power … Sensors Local decisions Global decisions ML Service

October 2006 Iosif Legrand 18 ALICE Examples MySQL daemon is automatically restarted when it runs out of memory Trigger: threshold on VSZ memory usage ALICE Production jobs queue is automatically kept full by the automatic resubmission Trigger: threshold on the number of aliprod waiting jobs Administrators are kept up-to-date on the services’ status Trigger: presence/absence of monitored information

October 2006 Iosif Legrand 19 Monitoring Video Conference System: Reflectors and Communication Topology

October 2006 Iosif Legrand 20 Monitoring and Controlling Optical Planes Port power monitoring Controlling

October 2006 Iosif Legrand 21 Monitoring Optical Switches Agents to Create on Demand an Optical Path

October 2006 Iosif Legrand 22 “On-Demand”, Dynamic Path Allocation Internet A >bbcopy A/fileX B/path/ OS path available Configuring interfaces Starting Data Transfer Monitor Control TL1 Optical Switch MonALISA Service MonALISA Distributed Service System B OS Agent Active light path Regular IP path Real time monitoring APPLICATION LISA AGENTLISA sets up - Network Interfaces - TCP stack - Kernel parameters - RoutesLISA  APPLICATION “use eth1.2, …” LISA Agent Agent DATA CREATES AN END TO END PATH < 1s Detects errors and automatically recreate the path in less than the TCP timeout path in less than the TCP timeout

October 2006 Iosif Legrand 23 Major Communities   OSG   CMS   ALICE   D0   STAR   VRVS   LGC RUSSIA   SE Europe GRID   APAC Grid   UNAM Grid (Mx)   ITU   ABILENE   ULTRALIGHT   GLORIAD   LHC Net   RoEduNET   Enlightened Communities using MonALISA - - VRVS ALICE ABILENE VRVS OSG Demonstrated at:   Telecom World   WSIS 2003   SC 2004   Internet   TERENA 2005   IGrid 2005   SC 2005   CHEP 2006   CENIC 2006 Innovation Award for High- Performance Applications MonALISA Today Running 24 X 7 at ~300 Sites   Collecting > 600,000 parameters in near real-time   Update rate of 20,000 parameter updates per second   Monitoring   12,000 computers   > 100 WAN Links   Thousands of Grid jobs running concurrently

October 2006 Iosif Legrand 24 The MonALISA Architecture Provides:  Distributed Registration and Discovery for Services and Applications.  Monitoring all aspects of complex systems :  System information for computer nodes and clusters  Network information : WAN and LAN  Monitoring the performance of Applications, Jobs or services  The End User Systems, its performance  Environment; Video streaming  Can interact with any other services to provide in near real-time customized information based on monitoring data  Secure, remote administration for services and applications  Agents to supervise applications, trigger alarms, restart or reconfigure them, and to notify other services when certain conditions are detected.  The MonALISA framework is used to develop higher level decision services, implemented as a distributed network of communicating agents, to perform global optimization tasks.  Graphical User Interfaces to visualize complex information