Integrating Network and Transfer Metrics to Optimize Transfer Efficiency and Experiment Workflows
Shawn McKee, Marian Babik, for the WLCG Network and Transfer Metrics Working Group
CHEP 2015, Okinawa, Japan, 13th April 2015

Network Integration Motivations
A crucial contributor to the success of the massively scaled global computing system that delivers the analysis needs of the LHC experiments is the networking infrastructure upon which the system is built.
– The LHC experiments have been able to exploit excellent high-bandwidth networking in adapting their computing models for the most efficient utilization of resources.
– However, there are still challenges and opportunities to make this even more effective for our needs.

Evolving Network Capabilities
We have already deployed end-to-end monitoring of our networks using perfSONAR.
– perfSONAR, combined with data-flow performance metrics, further allows our applications to adapt based on real-time conditions.
New, advanced networking technologies are slowly becoming available as production equipment is upgraded and replaced.
– Software Defined Networking (SDN) holds the potential to further leverage the network to optimize workflows and data flows.
We foresee eventual proactive control of the network fabric on the part of high-level applications such as experiment workload management and data management systems. We must prepare for this.

Network Monitoring in WLCG/OSG
Goals:
– Find and isolate "network" problems; alert in a timely manner
– Characterize network use (base-lining)
– Provide a source of network metrics for higher-level services
Choice of a standard open-source tool: perfSONAR
– Benefiting from the R&E community consensus
Tasks achieved:
– Monitoring in place to create a baseline of the current situation between sites
– Continuous measurements to track the network, alerting on problems as they develop (the baselining idea is sketched after this slide)
– Developed test coverage and made it possible to run "on-demand" tests to quickly isolate problems and identify problematic links
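As a rough illustration of the baselining-and-alerting idea, the sketch below compares the latest throughput measurement for a link against a robust baseline of recent samples. The data, thresholds and helper names are hypothetical; this is not the working group's production alerting code.

```python
# A minimal sketch, assuming recent throughput samples (in Mbps) for a given
# source/destination pair are already at hand; real alerting runs on perfSONAR data.
from statistics import median

def check_link(history_mbps, latest_mbps, drop_factor=0.5):
    """Flag the link if the latest result drops well below the recent baseline."""
    baseline = median(history_mbps)            # robust against a single bad test
    degraded = latest_mbps < drop_factor * baseline
    return baseline, degraded

history = [920.0, 880.0, 905.0, 870.0, 910.0]  # last few scheduled tests
baseline, degraded = check_link(history, latest_mbps=310.0)
if degraded:
    print(f"ALERT: latest throughput 310 Mbps is far below baseline {baseline:.0f} Mbps")
```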

perfSONAR Deployment
– 259 perfSONAR instances registered in GOCDB/OIM
– 233 active perfSONAR instances
– 172 running the latest version (3.4.2)
Initial deployment coordinated by the WLCG perfSONAR TF
Commissioning of the network followed by the WLCG Network and Transfer Metrics WG

WG Use Cases / Responsibilities
The experiments and various infrastructure projects were asked to provide use cases for network monitoring and for the working group's responsibilities. Summary of responses:
– Defining and understanding slow transfers; perfSONAR can help debug once you know that a link is "slow"
– Identifying and localizing real network problems
– Alerting the right people when there are network problems
– Defining procedures to contact the sites and to "assign" them the issues
– Producing a "best source" table, along the lines of what we have done with the Cost-Matrix, which can also improve our brokering (see the sketch after this slide)
– Commissioning new sites and storage
– Enabling network-aware tools
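The "best source" idea can be made concrete with a small sketch: given a cost-matrix of recent throughput measurements, rank candidate source sites for a destination. The site names and data layout below are hypothetical, not the WG's actual Cost-Matrix format.

```python
# A minimal sketch: cost_matrix[source][destination] -> recent throughput in Mbps.
cost_matrix = {
    "SITE_A": {"SITE_C": 850.0, "SITE_D": 120.0},
    "SITE_B": {"SITE_C": 430.0, "SITE_D": 910.0},
}

def best_sources(destination, matrix):
    """Rank candidate source sites for a destination by measured throughput."""
    candidates = [
        (src, paths[destination])
        for src, paths in matrix.items()
        if destination in paths
    ]
    # Highest measured throughput first; a broker could simply take the top entry.
    return sorted(candidates, key=lambda pair: pair[1], reverse=True)

print(best_sources("SITE_C", cost_matrix))  # [('SITE_A', 850.0), ('SITE_B', 430.0)]
```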

perfSONAR Configuration Interface
– A perfSONAR instance can participate in more than one mesh
– The configuration interface and auto-URL enable dynamic re-configuration of the entire perfSONAR network
– A perfSONAR instance only needs to be configured once, with an auto-URL (the sketch after this slide shows the idea)
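What "configured once, with an auto-URL" means in practice is that each perfSONAR host periodically downloads a centrally maintained mesh description and regenerates its local test configuration from it. The sketch below fetches such a description and prints what it defines; the URL and JSON key names are assumptions, not the exact WLCG mesh-config schema.

```python
# A minimal sketch of an agent pulling a central mesh description from its auto-URL.
import json
import urllib.request

AUTO_URL = "https://example.org/meshconfig/wlcg_mesh.json"  # hypothetical URL

with urllib.request.urlopen(AUTO_URL) as response:
    mesh = json.load(response)

# Inspect what the central configuration defines; the actual key names depend on
# the mesh-config schema in use.
print("top-level sections:", sorted(mesh.keys()))
for test in mesh.get("tests", []):
    print(test.get("description"), "->", test.get("parameters", {}).get("type"))
```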

pS Infrastructure Monitoring
– Auto-summaries are available per mesh
– Service summaries per metric type
GOAL: ensure we continue to reliably obtain ALL network metrics (a toy per-mesh summary is sketched after this slide)
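A toy version of such a summary is sketched below: for each metric type it reports the fraction of host pairs in a mesh with a sufficiently fresh result. The data layout and freshness window are hypothetical; the real summaries come from the infrastructure monitoring tools, not from code like this.

```python
# A minimal sketch, assuming the latest result timestamp per (src, dst, metric)
# is already collected in memory.
import time

latest = {
    ("psA.example.org", "psB.example.org", "throughput"): time.time() - 3600,
    ("psA.example.org", "psB.example.org", "packet-loss"): time.time() - 120,
    ("psB.example.org", "psA.example.org", "throughput"): time.time() - 300000,
}

def mesh_summary(latest_results, max_age_seconds):
    """Per metric type, report the fraction of pairs with a fresh result."""
    per_metric = {}
    now = time.time()
    for (_, _, metric), ts in latest_results.items():
        total, fresh = per_metric.get(metric, (0, 0))
        per_metric[metric] = (total + 1, fresh + (now - ts <= max_age_seconds))
    return {metric: fresh / total for metric, (total, fresh) in per_metric.items()}

# Throughput tests are scheduled daily, so allow a generous freshness window.
print(mesh_summary(latest, max_age_seconds=2 * 86400))
```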

OSG perfSONAR Datastore
A critical component is the datastore to organize and store the network metrics and associated metadata.
– OSG is gathering relevant metrics from the complete set of OSG and WLCG perfSONAR instances
– This data will be available via an API, must be visualized, and must be organized to provide the "OSG Networking Service" (a query sketch follows this slide)
– Operating now
– Targeting a production service by mid-summer
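As a hedged illustration of what "available via an API" could look like for a client, the sketch below issues an esmond-style query (the REST archive used by perfSONAR toolkits). The host name, and the assumption that the OSG datastore exposes exactly this interface, are mine and not stated on the slide.

```python
# A minimal sketch, assuming an esmond-style perfSONAR archive REST API;
# the host and parameter values are illustrative only.
import json
import urllib.parse
import urllib.request

BASE = "https://datastore.example.org/esmond/perfsonar/archive/"  # hypothetical host

params = {
    "source": "psA.example.org",
    "destination": "psB.example.org",
    "event-type": "throughput",
    "time-range": 7 * 86400,   # last week
}

with urllib.request.urlopen(BASE + "?" + urllib.parse.urlencode(params)) as response:
    metadata = json.load(response)

# Each metadata entry lists the event types it holds and where their time series
# live; print what was found rather than assuming exact field contents.
for entry in metadata:
    for event in entry.get("event-types", []):
        print(event.get("event-type"), "->", event.get("base-uri"))
```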

Experiments' Interface to Datastore

Integration Projects
Goal
– Provide a platform to integrate network and transfer metrics
ANSE
– Enable network-aware tools (see the ANSE project)
– Network resource allocation alongside CPU and storage
– Bandwidth reservation
– Create custom topology
Plan
– Provide latency and trace routes and test how they can be integrated with throughput from transfer systems
– Provide mapping between sites/storages and sonars (a simple sketch follows this slide)
– Uniform access to the network monitoring
Pilot projects
– FTS performance: adding latency and routing to the optimizer
– Experiments' interface to the datastore
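The sites/storages-to-sonars mapping can be illustrated with a very simple sketch that pairs each storage endpoint with the perfSONAR hosts sharing its DNS domain. The host names and the domain-matching rule are hypothetical; the actual proximity mapping uses richer topology information than this.

```python
# A minimal sketch, assuming a shared DNS domain is a good-enough proxy for
# "sonar at the same site as this storage endpoint".
def domain(hostname):
    return ".".join(hostname.split(".")[-2:])

sonars = ["ps-lat.siteA.example", "ps-bw.siteA.example", "ps01.siteB.example"]
storages = ["se01.siteA.example", "dcache.siteB.example", "eos.siteC.example"]

def map_storage_to_sonars(storage_hosts, sonar_hosts):
    """Map each storage endpoint to the sonars that share its DNS domain."""
    return {
        se: [ps for ps in sonar_hosts if domain(ps) == domain(se)]
        for se in storage_hosts
    }

for se, matched in map_storage_to_sonars(storages, sonars).items():
    print(se, "->", matched or "no sonar found (needs a manual mapping)")
```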

Closing remarks
perfSONAR is widely deployed and already showing benefits in troubleshooting network issues
– Additional deployments within R&E/overlay networks are still needed
Significant progress in configuration and infrastructure monitoring
– Helping to reach the full potential of the perfSONAR deployment
OSG datastore: a community network datastore for all perfSONAR metrics
– Planned to enter production in Q3
Integration projects underway to aggregate network and transfer metrics
– FTS performance
– Experiments' interface to perfSONAR
The ANSE project is providing "hooks" in PanDA and PhEDEx to utilize current and future SDN capabilities as they become available at our sites and in our networks.

Questions?

References
– Network documentation: deployment documentation for OSG and WLCG, hosted in OSG
– New 3.4 MA guide
– Modular Dashboard and OMD prototypes: OSG production instances for OMD, MaDDash and the Datastore
– Mesh-config in OSG
– Use-cases document for experiments and middleware (…dkPQTQic0VbH1mc/edit)