An Agent Based, Dynamic Service System to Monitor, Control and Optimize Distributed Systems. Iosif Legrand, California Institute of Technology, January 2006


1 An Agent Based, Dynamic Service System to Monitor, Control and Optimize Distributed Systems. Iosif Legrand, California Institute of Technology, January 2006

2 The MonALISA Framework
• MonALISA is a dynamic, distributed service system able to collect any type of information from different systems, to analyze it in near real time, and to support automated control decisions and global optimization of workflows in complex grid systems.
• The MonALISA system is designed as an ensemble of autonomous, multi-threaded, self-describing, agent-based subsystems that are registered as dynamic services and can collaborate and cooperate in performing a wide range of monitoring tasks. These agents can analyze and process the information in a distributed way and provide optimization decisions in large-scale distributed applications.

3 MonALISA is a Dynamic, Distributed Service Architecture
• The framework is based on a hierarchical structure of loosely coupled agents acting as distributed services. These are independent, autonomous entities able to discover each other and to cooperate using a dynamic set of proxies or self-describing protocols.
• An agent-based architecture makes it possible to invest the system with increasing degrees of intelligence, to reduce complexity and to make global systems manageable in real time. For effective use of distributed resources, these services provide adaptability and self-organization.

4 The MonALISA Discovery System & Services (layered diagram)
• Network of JINI-LUSs (secure and public): distributed dynamic discovery, based on a lease mechanism and REN
• MonALISA services (AGENTS): distributed system for gathering and analyzing information
• Proxies: dynamic load balancing, scalability and replication, security, AAA for clients
• Clients, HL services, repositories: global services or clients
Fully distributed system with no single point of failure.
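To make the lease idea on this slide concrete, here is a minimal sketch of a lease-based registry in Python. It is not the JINI API and not MonALISA code: the class `LeaseRegistry`, its methods and the attribute names are hypothetical, and the injectable clock exists only to make the example testable. The point it illustrates is that a service which stops renewing its lease simply disappears from discovery, so no central component has to detect its failure.

```python
import time


class LeaseRegistry:
    """Toy lookup service (hypothetical): entries vanish unless their lease is renewed."""

    def __init__(self, lease_seconds, clock=time.monotonic):
        self.lease_seconds = lease_seconds
        self.clock = clock          # injectable so tests can control time
        self._expiry = {}           # service name -> lease expiry time
        self._attrs = {}            # service name -> attribute dict

    def register(self, name, attrs):
        """Register a service and grant it a fresh lease."""
        self._attrs[name] = dict(attrs)
        self._expiry[name] = self.clock() + self.lease_seconds

    def renew(self, name):
        """Extend a live lease; return False if it is unknown or already expired."""
        if self._expiry.get(name, 0.0) <= self.clock():
            self._drop(name)
            return False
        self._expiry[name] = self.clock() + self.lease_seconds
        return True

    def discover(self, **wanted):
        """Return the names of live services whose attributes match all filters."""
        now = self.clock()
        for name in [n for n, t in self._expiry.items() if t <= now]:
            self._drop(name)
        return sorted(
            name for name, attrs in self._attrs.items()
            if all(attrs.get(k) == v for k, v in wanted.items())
        )

    def _drop(self, name):
        self._expiry.pop(name, None)
        self._attrs.pop(name, None)
```

A service would call `register` once and then `renew` periodically; clients call `discover(group="cms")` or similar and only ever see services whose lease is still valid.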

5 The MonALISA Architecture Provides:
• Distributed registration and discovery for services and applications.
• Monitoring of all aspects of complex systems:
  - System information for computer nodes and clusters
  - Network information: WAN and LAN
  - The performance of applications, jobs or services
  - The end-user systems and their performance
• Interaction with any other services to provide customized information, in near real time, based on monitoring data.
• Secure, remote administration for services and applications.
• Agents to supervise applications, to restart or reconfigure them, and to notify other services when certain conditions are detected.
• Support for developing higher-level decision services, implemented as a distributed network of communicating agents, to perform global optimization tasks.
• Graphical user interfaces to visualize complex information.

6 MonALISA Service & Data Handling (diagram). Labels: Lookup Service (×2); Discovery; Registration; MonALISA Service; Data Cache Service & DB; Postgres; MySQL; Configuration Control (SSL); Data Stores; WEB Service (WSDL, SOAP); Client (other service), Java; Client (other service); Web client; data; Applications; User-defined loadable modules to write / send data; Predicates & Agents; Communications via the ML Proxy.

7 Registration / Discovery, Admin Access and AAA for Clients (diagram). Labels: Lookup Service (×2); Discovery; Registration (signed certificate); MonALISA Service (×3); Services Proxy Multiplexer (×2); Client (other service) (×2); Admin SSL connection; Trust keystore (×2); AAA services; Client authentication; Data (×2); Filters & Agents (×2); Application; Applications.

8 Security in the MonALISA System
1) Community-based trust relationships. Multiple MonALISA services may be operated by a community. Community membership is maintained in specialized Authorization Services.
2) Flexible communication protection
3) Secure registration in LUSs based on an X.509 host or site certificate
4) Auditing
Diagram labels: Secure LUSs; Secure Registration; SSL/TLS, PKIX, GSS-API; PROXY SERVICE NETWORK; Authorization Enforcement Point.

9 Communities using MonALISA
Major communities: OSG, CMS, ALICE, D0, STAR, VRVS, LGC RUSSIA, SE Europe GRID, APAC Grid, UNAM Grid, ABILENE, ULTRALIGHT, GLORIAD, LHC Net, RoEduNET.
Figure panels: ABILENE, VRVS, GRID3, ALICE, CMS-DC04.
It has been used for demonstrations at: SC2003, Telecom 2003, WSIS 2003, SC 2004, I2 2005, TERENA 2005, IGrid 2005, SC 2005.

10 Monitoring I2 Network Traffic, Grid03 Farms and Jobs

11 Monitoring OSG: Resources, Jobs & Accounting. 42 sites, ~ 4 000 nodes (10 000 CPUs), thousands of jobs, 60 000 parameters. Figure panels: Running Jobs; Accounting.

12 FTP Data Transfer between GRID Sites. Total FTP traffic per VO.

13 ALICE Monitoring: Job Monitoring; Traffic Monitoring; AliEn Services; Xrootd Servers

14 Monitoring the ABILENE Backbone Network
• Test for a Land Speed Record
• ~ 7 Gb/s in a single TCP stream from Geneva to Caltech

15 The UltraLight Network. Diagram labels: BNL; ESnet; IN / OUT.

16 Monitoring the GLORIAD Ring

17 Monitoring VRVS Reflectors and Communication Topology

18 Monitoring Network Topology: Latency, Routers. Figure labels: NETWORKS AS ROUTERS

19 Monitoring the Execution of Jobs and the Time Evolution. Diagram labels: Submit a Job DAG (Job, Job1, Job2, Job3, Job31, Job32); SPLIT JOBS; LIFELINES for JOBS.

20 Bandwidth Challenge at SC2005: 151 Gb/s, ~ 500 TB total in 4 h

21 ApMon – Application Monitoring
A library of APIs (C, C++, Java, Perl, Python) that can be used to send any information to MonALISA services.
• Flexibility, dynamic configuration, high communication performance
• Automated system monitoring
• Accounting information
Diagram: an APPLICATION using ApMon sends monitoring data over UDP/XDR to MonALISA services. Example application monitoring values: Mbps_out: 0.52, Status: reading, MB_inout: 562.4, parameter1: value, parameter2: value. Example system monitoring values: load1: 0.24, processes: 97, pages_in: 83. Label on the data: Time;IP;procID. The ApMon configuration (MonALISA hosts) is generated automatically by a config servlet / CGI script and is reloaded dynamically.
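As an illustration of the "UDP/XDR" transport named on this slide, the sketch below packs a set of named numeric parameters using XDR encoding rules (big-endian, padded to 4-byte boundaries) and sends them in one UDP datagram. This is not the ApMon library and the record layout (cluster name, node name, parameter count, then name/value pairs) is an assumption made for the example; the real ApMon wire format may differ. All function names are hypothetical.

```python
import socket
import struct


def xdr_string(text):
    """XDR string: 4-byte big-endian length, the bytes, zero padding to a multiple of 4."""
    data = text.encode("utf-8")
    padding = (4 - len(data) % 4) % 4
    return struct.pack(">I", len(data)) + data + b"\x00" * padding


def xdr_double(value):
    """XDR double: 8-byte big-endian IEEE 754."""
    return struct.pack(">d", value)


def encode_sample(cluster, node, params):
    """Assumed layout (illustrative only): cluster, node, count, then (name, double) pairs."""
    payload = xdr_string(cluster) + xdr_string(node) + struct.pack(">I", len(params))
    for name, value in params.items():
        payload += xdr_string(name) + xdr_double(float(value))
    return payload


def send_sample(host, port, payload):
    """Fire-and-forget: a single UDP datagram, no connection and no acknowledgement."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(payload, (host, port))
```

For example, `send_sample(host, port, encode_sample("MyCluster", "node01", {"load1": 0.24, "processes": 97}))` would emit one datagram, with `host` and `port` being whatever the receiving service listens on. UDP keeps the cost to the monitored application low, at the price of possible datagram loss.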

22 ApMon data flow (diagram, repeated from the previous slide): an APPLICATION using ApMon sends application monitoring data (Mbps_out: 0.52, Status: reading, MB_inout: 562.4, parameter1: value, parameter2: value) and system monitoring data (load1: 0.24, processes: 97, pages_in: 83) over UDP/XDR to MonALISA services. Label on the data: Time;IP;procID. The ApMon configuration (MonALISA hosts) is generated automatically by a config servlet / CGI script and is reloaded dynamically.

23 Communication in the Distributed Collaborative System
Reflectors are hosts that interconnect users by permanent IP tunnels. The active IP tunnels must be selected so that no cycle is formed, that is, so that they form a tree. The selection is made according to real-time measurements of network performance, by building a minimum spanning tree (MST).
Diagram nodes: vrvs us, vrvs eu, pub, funet, starlight, sinica, kek, cornell, triumf, caltech, usf, usp, inet 2, vrvs 5.

24 LISA: Localhost Information Service Agent, an End-to-End Monitoring Tool
A lightweight Java Web Start application that provides complete monitoring of end-user systems and their network connectivity, and can use the MonALISA framework to optimize client applications.
• It is very easy to deploy and install from any browser.
• It detects the system architecture and the operating system, and dynamically selects the binary parts needed on each system.
• It can be easily deployed on any system. It is now used on all versions of Windows, Linux and Mac.
• It provides complete system monitoring of the host computer:
  - CPU, memory, IO, disk, …
  - Hardware detection
  - Main components, audio and video equipment
  - Drivers installed in the system
• It provides embedded clients for IPERF (or other network monitoring tools, like Web100).
• A user-friendly GUI presents all the monitoring information.

25 LISA Provides an Efficient Integration for Distributed Systems and Applications
Diagram labels: LISA; Lookup Service (×2); Discovery; Registration; Best Service; MonALISA and Application Service boxes.
• It uses external services to identify the real IP of the end system, its network ID and its AS.
• It discovers MonALISA services and can select, based on service attributes, different applications and their parameters (location, AS, functionality, load, …):
  - Based on information such as AS number or location, it determines a list of the best possible services.
  - It registers as a listener for other service attributes (e.g. number of connected clients).
  - It continuously monitors the network connection to several selected services and provides the best one to use from the client's perspective.
  - It measures network quality, detects faults and informs upper-layer services so that they can take appropriate decisions.
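The selection steps on this slide can be sketched as a simple ranking. The slide does not say how the attributes are weighted, so the ordering used here (same AS first, then lowest measured round-trip time, then fewest connected clients) is purely an assumption, and `Candidate` and `rank_services` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    """One discovered service with the attributes used for ranking (illustrative)."""
    name: str
    as_number: int      # autonomous system the service lives in
    rtt_ms: float       # latest measured round-trip time from this client
    clients: int        # number of currently connected clients


def rank_services(candidates, client_as):
    """Order candidates best-first under the assumed policy described above."""
    return sorted(
        candidates,
        key=lambda c: (c.as_number != client_as, c.rtt_ms, c.clients),
    )


def best_service(candidates, client_as):
    """Return the top-ranked candidate, or None when nothing was discovered."""
    ranked = rank_services(candidates, client_as)
    return ranked[0] if ranked else None
```

Re-running `best_service` whenever a new measurement or attribute update arrives gives the "continuously monitors and provides the best one" behaviour in its simplest form.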

26 Creating a Dynamic, Global Minimum Spanning Tree to Optimize the Connectivity. The reflectors form a weighted connected graph G = (V, E) with n vertices and m edges. The quality of connectivity between any two reflectors is measured every 2 s, and a minimum spanning tree T is built in near real time.
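A minimum spanning tree over such a graph can be computed with Kruskal's algorithm, sketched below. The slide does not state which MST algorithm the MonALISA agents use (a distributed variant is likely, given the architecture), so this centralized version is only an illustration; the edge weights stand for any "lower is better" link-quality measure and the function name is hypothetical.

```python
def minimum_spanning_tree(nodes, edges):
    """Kruskal's algorithm. `edges` is an iterable of (u, v, weight) tuples.

    Returns the list of (u, v, weight) edges kept in the tree.
    """
    parent = {node: node for node in nodes}

    def find(node):
        # Follow parent links to the set representative, halving the path as we go.
        while parent[node] != node:
            parent[node] = parent[parent[node]]
            node = parent[node]
        return node

    tree = []
    for u, v, weight in sorted(edges, key=lambda e: e[2]):
        root_u, root_v = find(u), find(v)
        if root_u != root_v:          # adding this edge does not close a cycle
            parent[root_u] = root_v
            tree.append((u, v, weight))
    return tree
```

With measurements arriving every 2 s, the simplest scheme is to recompute the tree from the latest weights and activate only the tunnels in the result; for a connected graph the tree always has n - 1 edges.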

27 EVO: LISA Detects the Best Reflector for Each Client, and MonALISA Agents Keep the Reflectors Connected in an MST
• Dynamic discovery of reflectors
• Creates and maintains, in real time, the optimal connectivity between reflectors (MST), based on periodic network measurements
• Detects and monitors the user configuration, its hardware, the connectivity and its performance
• Dynamically connects the client to the best reflector
• Provides secure administration
• Uses alarm triggers to notify of unexpected events

28 Diagram labels: Optical Switch (×2); Runs an ML daemon: > ml_path IP1 IP4 "copy file IP4"; ML proxy services used in agent communication; ML daemon; Control and monitor the switch; MonALISA ML Agent (×3); steps 1 to 4; Discovery & Secure Connection; MonALISA agents create an optical path or tree on demand. Time to create a path on demand: < 1 s, independent of the location and the number of connections.

29 Monitoring and Controlling Optical Planes. Figure panels: Port power monitoring; Controlling.

30 Monitoring Optical Switches. Agents to Create an Optical Path on Demand

31 HELP to create Vertical Integration. Diagram labels: User; Farms & Data Serv.; Networking; Applications; Higher Level Services; jobs (Job, Job1, Job2, Job3, Job31, Job32). Near real-time feedback between major layers is crucial for dynamic load balancing, adaptability and self-organization. MonALISA is a framework able to correlate information from different layers in complex systems and to build decision and global optimization services.

32 SUMMARY
MonALISA has proved to be a stable and reliable distributed service system.
• More than 250 sites run MonALISA.
• It collects ~ 250 000 parameters in near real time, at a rate of ~ 25 000 updated parameters per second:
  - ~ 12 000 computers
  - ~ 100 WAN links
  - thousands of jobs running concurrently in GRID systems
• The repository system:
  - keeps a long-term history and records information at a rate of ~ 10 million records / hour
  - served 1.4 million web requests in the last 3 months, with a peak of 10 000 requests per hour.
http://monalisa.caltech.edu

33 VINCI: Virtual Intelligent Networks for Computing Infrastructures. An Integrated Network Services System to Control and Optimize Workflows in Distributed Systems

34 The Need for Network Services
• The main objective of the VINCI project is to enable users' applications, at the LHC and in other fields of data-intensive science, to effectively use and coordinate network resources.
• VINCI dynamically estimates and monitors the achievable performance along a set of candidate (shared or dedicated) network paths, and correlates these results with the CPU power and storage available at various sites, to generate optimized workflows for grid tasks.
• This will significantly improve the overall performance and reduce the effective costs of global-scale grids.
• The VINCI system is implemented as a dynamic set of collaborating agents in the MonALISA framework, exploiting MonALISA's ability to access and analyze in-depth monitoring information from a large number of network links and grid sites in real time.

35 A Multi-Agent System
• VINCI and the underlying MonALISA framework use a system of autonomous agents to support a wide range of dynamic services.
• Agents in the MonALISA servers self-organize and collaborate with each other to manage access to distributed resources, to make effective decisions in planning workflow, to respond to problems that affect multiple sites, or to carry out other globally distributed tasks.
• Agents running on end-users' desktops or clusters detect and adapt to their local environment so they can function properly. They locate and receive real-time information from a variety of MonALISA services, aggregate and present results to users, or feed information to higher-level services.
• Agents with built-in "intelligence" are required to engage in negotiations (for network resources, for example) and to make proactive run-time decisions while responding to changes in the environment.

36 The Functionality of the VINCI System (diagram). Labels: Level 3; Level 2; Level 1; Site A; Site B; Site C; MonALISA ML Agent (×3); ML proxy services; Agent; ROUTER; ETHERNET; SONET; WAN-PHY; DWDM; FIBER; ~ 1-2 K$ / port.

37 The Main VINCI Services (diagram). Labels: Application (×2); End User Agent (×2); Service Discovery (LUS); Topology Discovery; GMPLS; MPLS; OS agents; SNMP agents; Scheduling, Dynamic Path Allocation; Network Monitoring (×2); Control Agents, Path Provisioning; Learning and Prediction; Monitoring and Failure Detection; Authentication, Authorization, Accounting.

38 Secure Service Discovery and AAA
Service Registration and Discovery: a reliable mechanism to dynamically register services and their dynamic sets of attributes is essential. We use JINI Lookup Services to provide it.
Authentication, Authorization and Accounting for Users: we use external AAA services supported by different Virtual Organizations. Loadable plug-in interface modules that support different protocols and services will provide the flexibility needed to work with different grids and networks.

39 Topology Discovery
Specialized agents will be used to discover the connection topology for each service, to keep a dynamic map of how connections are allocated and used, and to get information on the traffic on each segment. This is done on multiple MonALISA services in parallel, and the agents will provide the basic information to the scheduling system. It will use the information from MPLS / GMPLS / DRAGON / Optical Path agents where the infrastructure provides this functionality. Examples of targeted capabilities for Topology Discovery are:
• Determine which options exist between two locations in the network.
• List the components in the path that are "manageable".
• Given two replicas of a data source, "discover" (in conjunction with monitoring) the estimated bandwidth and reliability of each to a given destination.
• Locate network resources and services which have agreements with a given VO.
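Two of the capabilities listed above, enumerating the options between two locations and comparing replicas by estimated bandwidth, can be sketched on a small directed graph. The graph representation (a dict of dicts with per-link bandwidth estimates), the use of the bottleneck link as the path estimate, and all function names are assumptions for illustration; the slide does not describe how VINCI represents topology or estimates bandwidth.

```python
def simple_paths(graph, src, dst, _visited=()):
    """Yield every loop-free path from src to dst as a list of node names."""
    visited = _visited + (src,)
    if src == dst:
        yield list(visited)
        return
    for neighbour in graph.get(src, {}):
        if neighbour not in visited:
            yield from simple_paths(graph, neighbour, dst, visited)


def bottleneck(graph, path):
    """Estimated path bandwidth: the smallest link estimate along the path."""
    if len(path) < 2:
        return float("inf")
    return min(graph[a][b] for a, b in zip(path, path[1:]))


def best_replica(graph, replicas, dst):
    """Return (bandwidth, replica, path) for the best option, or None if unreachable."""
    best = None
    for replica in replicas:
        for path in simple_paths(graph, replica, dst):
            bandwidth = bottleneck(graph, path)
            if best is None or bandwidth > best[0]:
                best = (bandwidth, replica, path)
    return best
```

Exhaustive path enumeration is only practical on small graphs; it is used here because it makes the "which options exist" question explicit. Reliability, which the slide also mentions, is not modelled.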

40 The Workflow Scheduler
The Scheduler is implemented as a set of collaborating agents:
• There is no single point of failure.
• It provides complete autonomy to each provider of resources, who can implement their own policy.
"Market Model" Scheduling Scheme:
• Each agent uses policy-based priority queues; it negotiates for an end-to-end connection using a set of cost functions.
• A lease mechanism is implemented for each offer an agent accepts from its peers.
• Two-phase commit and periodic lease renewal are used for all agents; this allows the agents to respond flexibly to task completion, as well as to application failure or network errors.
• If network errors are detected, supervising agents cause all segments along a path to be released.
• An alternative path may then be set up rapidly enough to avoid a TCP timeout, so that the transfer can continue uninterrupted.
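The combination of two-phase commit and leases described above can be sketched as follows. This is a single-process toy, not VINCI code: `SegmentAgent`, `reserve_path` and the explicit `now` argument are hypothetical, negotiation and cost functions are left out, and real agents would exchange these calls as messages. It shows the two properties the slide relies on: a path is reserved only if every segment agrees, and a segment whose holder stops renewing becomes free again without any explicit release.

```python
class SegmentAgent:
    """Hypothetical agent guarding one network segment with a leased reservation."""

    def __init__(self, name, lease_seconds=5.0):
        self.name = name
        self.lease_seconds = lease_seconds
        self.holder = None        # request currently holding the segment
        self.expires = 0.0        # time at which the current lease lapses
        self.committed = False

    def prepare(self, request_id, now):
        """Phase 1: tentatively hold the segment if it is free or its lease lapsed."""
        if self.holder is not None and self.expires > now:
            return False
        self.holder = request_id
        self.expires = now + self.lease_seconds
        self.committed = False
        return True

    def commit(self, request_id, now):
        """Phase 2: confirm a hold that is still valid."""
        if self.holder != request_id or self.expires <= now:
            return False
        self.committed = True
        return True

    def renew(self, request_id, now):
        """Periodic renewal; fails once the lease has lapsed."""
        if self.holder != request_id or self.expires <= now:
            return False
        self.expires = now + self.lease_seconds
        return True

    def release(self, request_id):
        if self.holder == request_id:
            self.holder, self.committed = None, False


def reserve_path(request_id, segments, now):
    """Reserve every segment or none: prepare all, then commit all, else roll back."""
    prepared = []
    for segment in segments:
        if not segment.prepare(request_id, now):
            for held in prepared:
                held.release(request_id)
            return False
        prepared.append(segment)
    if all(segment.commit(request_id, now) for segment in segments):
        return True
    for segment in segments:
        segment.release(request_id)
    return False
```

A supervising agent that detects a network error would call `release` on every segment of the path and then try `reserve_path` on an alternative list of segments.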

41 Learning and Prediction
• Learning algorithms (like Self-Organizing Neural Networks) will be used to evaluate the traffic created by other applications, to identify major patterns, and to dynamically set up effective connectivity maps.
• It is very difficult, if not impossible, to predict in advance all possible events in a complex environment like a grid.
• Heuristic learning, in which agents acquire the information needed to describe their environments, is thus the only practical approach.
• The multi-agent learning task includes two levels:
  - the local level of individual learning agents
  - the global level of inter-agent communication
• We need to ensure that each agent can learn to optimize its actions locally, while the global monitoring mechanism acts as a "driving force" that causes the agents' behavior to evolve collectively, based on the accumulated experience.
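The slide names self-organizing neural networks only as an example and gives no details, so the following is a generic, minimal one-dimensional self-organizing map, not the project's learning code. It assumes the input is a list of scalar traffic rates and uses hypothetical names and parameters throughout. Each node's weight drifts toward the samples it wins, together with its neighbours, so that after training the weights summarize the main traffic levels seen.

```python
import random


def train_som(samples, n_nodes=4, epochs=50, seed=0):
    """Train a 1-D self-organizing map on scalar samples; return the node weights."""
    rng = random.Random(seed)                 # fixed seed keeps the sketch reproducible
    low, high = min(samples), max(samples)
    weights = [rng.uniform(low, high) for _ in range(n_nodes)]
    for epoch in range(epochs):
        remaining = 1.0 - epoch / epochs      # decays from 1 toward 0
        rate = 0.5 * remaining                # learning rate shrinks over time
        radius = int((n_nodes / 2) * remaining)  # neighbourhood shrinks to the winner
        for x in rng.sample(samples, len(samples)):
            winner = min(range(n_nodes), key=lambda i: abs(weights[i] - x))
            first = max(0, winner - radius)
            last = min(n_nodes, winner + radius + 1)
            for i in range(first, last):
                weights[i] += rate * (x - weights[i])
    return weights


def nearest_node(weights, x):
    """Index of the map node that best matches sample x."""
    return min(range(len(weights)), key=lambda i: abs(weights[i] - x))
```

An agent could, for instance, feed recent per-link throughput samples to `train_som` and use `nearest_node` to label new measurements by traffic pattern; how such labels would drive connectivity decisions is outside what the slide specifies.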

