1 MonALISA capabilities for the LHCOPN
LHCOPN meeting, London, March 2010
MonALISA Team: Iosif Legrand, Harvey Newman, Ramiro Voicu, Costin Grigoras, Ciprian Dobre, Alexandru Costan
USLHCNet Team: Harvey Newman, Artur Barczyk, Ramiro Voicu, Azher Mughal, Sandor Rozsa

2 Outline
- MonALISA Framework
  - Architecture
  - Data handling
  - Automatic actions
- USLHCNet
  - Network topology
  - Monitoring modules
  - Reliable monitoring & accounting
  - Alarms & triggers
- Conclusions
Ramiro Voicu, LHCOPN, London, March 2010

3 The MonALISA Architecture
Fully distributed system with no single point of failure. Layers shown in the diagram:
- HL services: regional or global high-level services, repositories & clients
- Proxies: secure and reliable communication, dynamic load balancing, scalability & replication, AAA for clients
- JINI lookup services (secure & public): distributed dynamic registration and discovery, based on a lease mechanism and remote events
- MonALISA services and agents: a network of distributed services for gathering and analyzing information based on mobile agents, with customized aggregation, triggers and actions
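The lease mechanism mentioned on this slide can be illustrated with a minimal sketch. This is not the JINI API and not MonALISA code: the class `LeaseRegistry`, its method names and the service names are hypothetical, and time is passed in explicitly so the behavior is easy to follow. The idea shown is only that a registration expires unless the service renews it, so a crashed service disappears from discovery without any explicit deregistration.

```python
from typing import Dict, List


class LeaseRegistry:
    """Toy lookup service: registrations expire unless renewed (hypothetical, not JINI)."""

    def __init__(self) -> None:
        # service name -> absolute time at which its lease expires
        self._expiry: Dict[str, float] = {}

    def register(self, service: str, lease_s: float, now: float) -> None:
        """Grant a lease of lease_s seconds starting at time now."""
        self._expiry[service] = now + lease_s

    def renew(self, service: str, lease_s: float, now: float) -> bool:
        """Extend a lease that is still valid; return False if it is unknown or expired."""
        if self._expiry.get(service, float("-inf")) <= now:
            self._expiry.pop(service, None)
            return False
        self._expiry[service] = now + lease_s
        return True

    def discover(self, now: float) -> List[str]:
        """Drop expired leases and return the names of the services still registered."""
        self._expiry = {s: t for s, t in self._expiry.items() if t > now}
        return sorted(self._expiry)
```

A service that keeps calling `renew` stays visible to `discover`; one that stops is dropped once its lease runs out, which is one way a registry can avoid holding stale entries.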

4 MonALISA Service & Data Handling
[Diagram] Components of a MonALISA service:
- Monitoring modules: collect any type of information; dynamic (re)loading; push and pull
- Data store, data cache service & DB (Postgres)
- Agents, filters / triggers
- Registration and discovery through the lookup services
- Configuration control (SSL)
- Predicates & agents, and data (via ML proxy), exchanged with applications, clients or higher-level services
- Web service (WSDL, SOAP) for WS clients and services

5 Local and Global Decision Framework
Two levels of decisions: local (autonomous) and global (correlations).
Actions triggered by:
- values above/below given thresholds
- absence/presence of values
- correlations between any values
Action types:
- alerts (emails, instant messages, Atom feeds)
- running an external command
- automatic chart annotations in the repository
- running custom code, such as securely ordering an ML service to (re)start a site service
[Diagram] Sensors (traffic, jobs, hosts, apps, temperature, humidity, A/C, power, …) feed the ML services, which take local decisions (actions based on local information); global ML services take global decisions (actions based on global information).
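The three trigger kinds listed on this slide (thresholds, absence of values, correlations) can be sketched as small composable predicates. This is an illustrative sketch only, not the MonALISA actions framework: `Trigger`, `above`, `below`, `absent`, `both` and `evaluate`, as well as the parameter names and thresholds, are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Values = Dict[str, float]            # latest value per parameter name
Condition = Callable[[Values], bool]


@dataclass
class Trigger:
    name: str
    condition: Condition
    action: Callable[[str], None]    # e.g. send an alert, run a command


def above(param: str, threshold: float) -> Condition:
    """True when the parameter is present and above the threshold."""
    return lambda v: param in v and v[param] > threshold


def below(param: str, threshold: float) -> Condition:
    """True when the parameter is present and below the threshold."""
    return lambda v: param in v and v[param] < threshold


def absent(param: str) -> Condition:
    """True when no value was received for the parameter."""
    return lambda v: param not in v


def both(first: Condition, second: Condition) -> Condition:
    """Correlation of two conditions: true only when both hold."""
    return lambda v: first(v) and second(v)


def evaluate(triggers: List[Trigger], values: Values) -> List[str]:
    """Run the action of every trigger whose condition holds; return their names."""
    fired = []
    for trigger in triggers:
        if trigger.condition(values):
            trigger.action(trigger.name)
            fired.append(trigger.name)
    return fired


# Hypothetical configuration: here the "action" just records an alert in a list.
alerts: List[str] = []
triggers = [
    Trigger("room too hot", above("temperature_c", 30.0), alerts.append),
    Trigger("no traffic data", absent("traffic_mbps"), alerts.append),
    Trigger("hot and A/C off",
            both(above("temperature_c", 30.0), below("ac_power_kw", 0.1)),
            alerts.append),
]
```

Calling `evaluate(triggers, latest_values)` on a local service would correspond to a local decision; running the same kind of evaluation over values merged from many services would correspond to a global one.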

6 USLHCNet
- USLHCNet provides transatlantic connections between the Tier1 computing facilities at Fermilab and Brookhaven and the Tier0 and Tier1 facilities at CERN, as well as Tier1s elsewhere in Europe and Asia.
- Together with ESnet, Internet2 and GEANT, USLHCNet supports connections between the Tier2 centers.
- The USLHCNet core infrastructure uses Ciena Core Director devices, which provide time-division multiplexing and packet-forwarding protocols that support virtual circuits with bandwidth guarantees. The virtual circuits offer the functionality needed to develop efficient data transfer services with support for QoS and priorities.
- Hybrid network: uses both Ciena CD devices and Force10 routers
- 6 transatlantic 10G links at the moment

7 USLHCNet ML weather map

8 Monitoring modules
We developed a set of monitoring modules for USLHCNet network devices:
- Force10 (SNMP & sFlow)
  - Traffic per interface
  - sFlow traffic
  - Link status monitoring
- Ciena Core Director (TL1, Transaction Language 1)
  - ETTP (Ethernet Termination Point) traffic
  - EFLOW (Ethernet Flow) traffic
  - OSRP (routing protocol) topology
  - VCG provisioned / available bandwidth
  - Dynamic circuits inside the optical core of the network
- Ping module / MLPing trigger, which sends alarms in case of packet loss

9 USLHCNet monitoring
[Diagram] MonALISA services at GVA, CHI, NYC and AMS monitor the network devices over SNMP and TL1.

10 USLHCNet redundant monitoring
[Diagram] MonALISA services at GVA, CHI, NYC and AMS.
Each circuit is monitored at both ends by at least two MonALISA services; the monitored data is aggregated by global filters in the repository.

11 Local and global filters
- Based on the MonALISA actions framework, a set of triggers has been deployed inside the service to notify the USLHCNet network engineers by email, SMS and IM in case of problems.
- The filters developed for the USLHCNet repository aggregate the redundant monitoring data (traffic and link status) collected from all the MonALISA services.
- The link status is computed as a logical “AND” between both end points of a link. This also cross-checks the status reported by the hardware equipment.
- We collect data in two repository instances, each with replicated database back-ends. These instances are dynamically balanced in DNS.
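The link status rule above (logical AND between both end points, computed from redundant reports) can be sketched as follows. This is a hedged illustration, not the USLHCNet filter code. The report layout, the function `link_status` and the link and service names are hypothetical. The slide also does not say how conflicting reports for the same end point are merged; the sketch assumes an end point counts as up if at least one service reports it up, so that a single failing monitor does not mark the link down.

```python
from typing import Dict, Iterable, List, Tuple

# (monitoring service, link name, end point, is_up) -- hypothetical report layout
Report = Tuple[str, str, str, bool]


def link_status(reports: Iterable[Report], endpoints_per_link: int = 2) -> Dict[str, bool]:
    """Combine redundant per-end-point reports into one status per link."""
    # Step 1 (assumption): an end point is up if any service saw it up.
    endpoint_up: Dict[Tuple[str, str], bool] = {}
    for _service, link, endpoint, is_up in reports:
        key = (link, endpoint)
        endpoint_up[key] = endpoint_up.get(key, False) or is_up

    # Step 2: group the end-point states by link.
    per_link: Dict[str, List[bool]] = {}
    for (link, _endpoint), up in endpoint_up.items():
        per_link.setdefault(link, []).append(up)

    # Step 3 (from the slide): the link is up only if both end points are up.
    # A link with a missing end-point report is treated as down (assumption).
    return {
        link: len(states) == endpoints_per_link and all(states)
        for link, states in per_link.items()
    }
```

For example, if two services report the GVA end of a link as up and one of them reports the CHI end as down while the other reports it as up, the link is considered up under the assumption above; if every report for one end says down, the AND makes the whole link down.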

12 USLHCNet: Precise measurements of the operational status of the WAN links
- Operations & management assisted by agent-based software
- Used on the new Ciena equipment for network management

13 USLHCNet: all EFLOW traffic, last 2 months

14 USLHCNet: Accounting for integrated traffic

15 USLHCNet: Ciena alarms monitoring

16 Topology monitoring and discovery
Real-time topology discovery & display
[Figure labels] NETWORKS, AS, ROUTERS

17 Storage discovery in ALICE
distance(IP, IP) criteria, in the order listed on the slide:
- same IP-class network
- common domain name
- same AS
- same country (+ a function of the RTT between the respective ASes, if known)
- if the distance between the ASes is known, use it
- same continent
- far away
distance(IP, Set): the client's public IP to all known IPs for the storage
[Map labels] France, Italy, USA, Russia, Nordic countries
Source: C. Grigoras (ALICE), ACAT 2010
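The tiered distance on this slide can be sketched as a function that returns a smaller number for closer hosts. This is an illustrative sketch, not the ALICE implementation. The `Host` record, the numeric tier values, the /24 prefix used for "same IP-class network" and the use of the minimum over a storage's addresses are all assumptions. The RTT and AS-to-AS distance refinements from the slide are left out, because the slide gives no formula for them.

```python
from dataclasses import dataclass
from ipaddress import ip_address, ip_network
from typing import Dict, Iterable, List


@dataclass(frozen=True)
class Host:
    """Hypothetical record: what is assumed to be known about an IP address."""
    ip: str
    domain: str
    asn: int
    country: str
    continent: str


def same_network(a: str, b: str, prefix: int = 24) -> bool:
    """True if b falls in the /prefix network containing a (the /24 is an assumption)."""
    return ip_address(b) in ip_network(f"{a}/{prefix}", strict=False)


def distance(a: Host, b: Host) -> int:
    """Tiered distance: 0 is closest, 5 is 'far away' (tier values are illustrative)."""
    if same_network(a.ip, b.ip):
        return 0
    if a.domain == b.domain:
        return 1
    if a.asn == b.asn:
        return 2
    if a.country == b.country:
        return 3
    if a.continent == b.continent:
        return 4
    return 5


def distance_to_storage(client: Host, storage_ips: Iterable[Host]) -> int:
    """distance(IP, Set): here taken as the closest known address of the storage (assumption)."""
    return min(distance(client, host) for host in storage_ips)


def closest_storage(client: Host, storages: Dict[str, List[Host]]) -> str:
    """Name of the storage element with the smallest distance to the client."""
    return min(storages, key=lambda name: distance_to_storage(client, storages[name]))
```

Ranking storage elements with `closest_storage` would let a client prefer a replica in its own network, then its own domain, and so on outward, which is the ordering the slide describes.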

18 FDT bandwidth tests in ALICE (E2E av bw)
[Plot labels]
- Newer kernel, tuned TCP buffers
- Default kernels, default TCP buffers
- Different trends = different kernels
- 100 Mbps network card; 1 Gbps network card
http://monalisa.cern.ch/FDT/

19 Conclusions
- The MonALISA framework provides a flexible and reliable monitoring infrastructure
  - 350+ installed services, 1.5M+ unique parameters, 25 kHz value updates
  - Truly distributed architecture with no single point of failure
  - Highly modular platform
  - Automatic decision-making capability at both local and global levels
- USLHCNet provides a hybrid network with support for circuit-oriented network services
- Monitoring this infrastructure proved to be a challenging task, but we are running with over 99.5% monitoring uptime (100% in the last 6 months)
- We are investigating dynamic provisioning of circuits from collaborating agents
http://monalisa.caltech.edu
http://repository.uslhcnet.org

20 Monitoring Optical Switches
Dynamic restoration of a lightpath if a segment has problems

21 Controlling Optical Planes: Automatic Path Recovery
“Fiber cut” simulations:
- The traffic moves from one transatlantic line to the other
- The FDT transfer (CERN to CALTECH) continues uninterrupted
- TCP fully recovers in ~20 s
[Diagram] CERN (Geneva), Starlight, Manlan and CALTECH (Pasadena), connected over USLHCNet and Internet2
[Plot] FDT transfer at 200+ MBytes/sec from a 1U node, with 4 fiber-cut emulations (marked 1 to 4)

