
1 Troubleshooting GridFTP flows with XSP and Periscope
Dan Gunter (presenter), Ahmed El-Hassany, Ezra Kissel, Guilherme Fernandes, Martin Swany

2 Outline
- Motivation
- Review of perfSONAR
- PerfSONAR issues
- New components to address them
  - UNIS
  - Periscope
  - XSP
  - NLMI
- E2E example with GridFTP
- Visualizations from SC10 demo
- Questions & rotten fruit
2/1/2011. Internet2 Joint Techs 2011. Clemson, SC

3 Motivating Use-Cases
- Analyzing PBs of experimental data on an HPC cluster
- Offloading or disseminating PBs of simulation output
- Large data transfers
[Image source: http://xkcd.com/401/]

4 PerfSONAR Overview
- Infrastructure & software for network performance analysis

Abbr  Name                    Purpose
LS    Lookup service          Find sources of measurements
TopS  Topology Service        Describe network topology
MP    Measurement point       Retrieve/publish measurements
MA    Measurement archive     Store/publish measurements
TS    Transformation service  Aggregate, sample, smooth measurements

[Diagram labels: User or Application; Discovery; Data]

5 Motivating questions
- How can we accurately forecast application performance?
- How can we detect performance anomalies in real-time?
- How can we troubleshoot poor application performance?
  - And improve it!
- ‘Shooting the gap between expectation and reality

6 PerfSONAR issues
① Data is hard to find
  - Cannot simply ask “which MPs have data for path”
② Slow
  - Lookups across multiple domains
  - Polling for data = RTT_net + Delay_DB + Delay_WS
  - XML serialization/deserialization
③ E2E analysis is difficult
  - No integrated host, application monitoring
  - Analysis/visualization done client-side and not exported
④ Measurement frequency is static
  - Always-on and lack of aggregation encourages large intervals

7 ① Data is hard to find

8 Unified Network Information Service (UNIS)
- Merges the Topology Service (TopS) & LS
- Topology model
  - Tree of nodes at different layers (Network/Node/Port)
  - Relations between arbitrary nodes
  - Node properties
- ‘GIS for networks’
  - Relates MPs, MAs to topology
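
The topology model above can be sketched in a few lines of Python. This is only an illustration of the idea (a tree of network/node/port elements with MPs and MAs attached, so that "which MPs have data for path" becomes a lookup); the class, field, and host names are hypothetical and are not the UNIS schema or API.

```python
from dataclasses import dataclass, field


@dataclass
class TopoNode:
    """One element of the topology tree (a network, node, or port)."""
    urn: str
    layer: str                                     # "network", "node", or "port"
    properties: dict = field(default_factory=dict)
    children: list = field(default_factory=list)
    services: list = field(default_factory=list)   # MPs/MAs attached to this element


def build_index(root):
    """Flatten the tree into a urn -> TopoNode lookup table."""
    index = {root.urn: root}
    for child in root.children:
        index.update(build_index(child))
    return index


def services_for_path(index, path_urns):
    """Return the MPs/MAs attached to any topology element on a path, in path order."""
    found = []
    for urn in path_urns:
        for svc in index[urn].services:
            if svc not in found:
                found.append(svc)
    return found


# Hypothetical two-node domain (example.net is a placeholder, not a real deployment).
port_a = TopoNode("urn:ogf:network:domain=example.net:node=a:port=ge-0/0/0", "port",
                  services=["mp-a.example.net"])
node_a = TopoNode("urn:ogf:network:domain=example.net:node=a", "node", children=[port_a])
node_b = TopoNode("urn:ogf:network:domain=example.net:node=b", "node",
                  services=["ma-b.example.net"])
domain = TopoNode("urn:ogf:network:domain=example.net", "network",
                  children=[node_a, node_b])

index = build_index(domain)
path = [port_a.urn, node_a.urn, node_b.urn]
print(services_for_path(index, path))
```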

9 ② Slow

10 Periscope: Topologically aware cache
- PerfSONAR requests have topological locality
- Pre-fetch and cache relevant perfSONAR information
  - New protocols to indicate interesting sub-topologies
- Analysis functions
  - domain-specific transformations, e.g. forecasting
  - visualization (whee!)
- Preserve uniform perfSONAR interface
[Diagram labels: User or Application; perfSONAR interface; Periscope; MP/MA; LS; ...]
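
A minimal sketch of what "topologically aware" caching could look like, assuming a simple adjacency map and a slow `fetch` callable standing in for a perfSONAR query. The class and its methods are hypothetical and are not Periscope's implementation; a real cache would likely pre-fetch asynchronously and expire entries.

```python
class TopoCache:
    """Cache that pre-fetches data for topological neighbors of a requested element."""

    def __init__(self, fetch, neighbors):
        self.fetch = fetch          # callable: element id -> measurement data (slow path)
        self.neighbors = neighbors  # dict: element id -> list of adjacent element ids
        self.store = {}
        self.misses = 0             # requests that had to wait on the slow path

    def _load(self, elem):
        if elem not in self.store:
            self.store[elem] = self.fetch(elem)

    def get(self, elem):
        if elem not in self.store:
            self.misses += 1
            self._load(elem)
        # Requests tend to be topologically local, so warm the neighbors as well.
        for adjacent in self.neighbors.get(elem, []):
            self._load(adjacent)
        return self.store[elem]


# Hypothetical three-hop path a - b - c.
fetched = []

def slow_fetch(elem):
    fetched.append(elem)            # record each trip to the backing service
    return {"element": elem}

cache = TopoCache(slow_fetch, {"a": ["b"], "b": ["a", "c"], "c": ["b"]})
for hop in ["a", "b", "c"]:
    cache.get(hop)
print(cache.misses, fetched)
```

Walking the path hop by hop, only the first request waits on the slow path; the later hops were already pre-fetched as neighbors.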

11 Periscope data representation
- Follow PerfSONAR data model
- But use a simpler, more efficient format
- Many good options:
  - JSON ✔
  - BSON ✔
  - Thrift
  - Avro
  - Protobuf
  - NetLogger
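
As a rough illustration of the "simpler format" point, the sketch below serializes one measurement record as compact JSON and as a naive XML element tree using only the Python standard library. The field names and values are made up for the example and do not reflect the actual perfSONAR or Periscope schemas; the XML here is also far simpler than real perfSONAR messages.

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical measurement record; keys and values are illustrative only.
record = {
    "subject": "urn:ogf:network:domain=example.net:node=a",
    "event_type": "utilization",
    "ts": 1296518400,
    "value": 0.42,
}

# Compact JSON encoding of the record.
as_json = json.dumps(record, separators=(",", ":"))

# Naive XML encoding: one child element per field.
root = ET.Element("datum")
for key, value in record.items():
    ET.SubElement(root, key).text = str(value)
as_xml = ET.tostring(root, encoding="unicode")

print(len(as_json), len(as_xml))
print(json.loads(as_json) == record)
```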

12 ③ E2E Analysis is Difficult

13 Missing metrics

Network layers
OSI Layer     perfSONAR metrics
Application   X
Presentation  X
Session       X
Transport     bandwidth, delay
Network       capacity, bandwidth, delay
Data link     availability, loss, errors
Physical      availability, errors

End-to-end components
E2E Component   perfSONAR metrics
Disk            X
Host / Cluster  X
Network         “yes”

14 NetLogger Machine Information (NLMI)
- Basic set of host probes, using /proc
  - Host interface statistics
  - TCP settings
  - CPU, memory
  - Disk I/O
- Export data in Periscope data model
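
To make the "/proc probe" idea concrete, here is a small sketch that parses interface counters in the format of Linux's /proc/net/dev. It is not NLMI code; the function name, the output keys, and the sample text are assumptions for illustration. The sample is inlined so the example runs on any platform; on a Linux host the same function could be given the contents of /proc/net/dev.

```python
def parse_net_dev(text):
    """Parse /proc/net/dev-style text into {interface: counters}."""
    stats = {}
    for line in text.splitlines()[2:]:          # the first two lines are column headers
        if ":" not in line:
            continue
        name, rest = line.split(":", 1)
        fields = rest.split()
        # Each row holds 8 receive counters followed by 8 transmit counters.
        stats[name.strip()] = {
            "rx_bytes": int(fields[0]),
            "rx_packets": int(fields[1]),
            "rx_errs": int(fields[2]),
            "tx_bytes": int(fields[8]),
            "tx_packets": int(fields[9]),
            "tx_errs": int(fields[10]),
        }
    return stats


# Made-up sample in the /proc/net/dev layout.
SAMPLE = """Inter-|   Receive                                                |  Transmit
 face |bytes    packets errs drop fifo frame compressed multicast|bytes    packets errs drop fifo colls carrier compressed
    lo:    1000      10    0    0    0     0          0         0     1000      10    0    0    0     0       0          0
  eth0:  123456     789    1    0    0     0          0         0   654321     987    0    0    0     0       0          0
"""

stats = parse_net_dev(SAMPLE)
print(stats["eth0"])
```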

15 ④ Measurement frequency is static

16 eXtensible Session Protocol (XSP)
- Establishment, termination, and negotiation of a session between end-user application processes
- Session = stateful layer over multiple other NE’s
- In-band or OOB signaling of control information
- Other metadata can also be forwarded
[Diagram labels: A, B, C; TCP; xspd; App; Session; NE; Metadata]

17 Monitoring GridFTP (XIO/XSP)
- GridFTP’s XIO allows interception of I/O
- New XIO layer can talk to a local xspd
  - Signaling: open/close
  - Performance: aggregated read/write
- NetLogger’s nlcalipers library aggregates reads/writes into periodic summaries
[Diagram labels: GridFTP server; XIO layers; Disk and Network operation; xspd; signaling; performance]
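
The aggregation step can be pictured as bucketing individual read/write events into fixed time intervals and reporting one summary per bucket. The sketch below shows that idea in Python; it is not the nlcalipers API, and the function name, event tuple layout, and summary fields are assumptions for illustration.

```python
from collections import defaultdict


def summarize(events, interval):
    """Aggregate (timestamp_seconds, op, nbytes) events into per-interval summaries."""
    buckets = defaultdict(lambda: {"count": 0, "bytes": 0})
    for ts, op, nbytes in events:
        start = int(ts // interval) * interval   # start time of the enclosing interval
        bucket = buckets[(start, op)]
        bucket["count"] += 1
        bucket["bytes"] += nbytes
    # One summary per (interval, operation), ordered by time and then operation.
    return [{"start": start, "op": op, **totals}
            for (start, op), totals in sorted(buckets.items())]


# Hypothetical I/O events intercepted during a transfer.
events = [
    (0.1, "read", 4096),
    (0.7, "read", 4096),
    (0.9, "write", 8192),
    (1.2, "read", 1024),
]
summaries = summarize(events, interval=1)
for row in summaries:
    print(row)
```

Four events collapse into three summary records here; with thousands of reads and writes per second, only the per-interval summaries would need to be forwarded.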

18 Combining XSP, Periscope, NLMI
[Diagram: GridFTP servers, each with XIO layers, an XSP layer, and NLMI host stats, send signaling and XIO performance data to xspd, which feeds Periscope; Periscope also draws on perfSONAR services and serves clients over perfSONAR protocols]

19 Visualization

20 Visualization cont.

21 Conclusions
- Periscope provides a platform for perfSONAR analysis
  - Caching to reduce latency, centralized correlation
- Integration with XSP provides transparent monitoring and awareness of application state
- Still polling perfSONAR, though. Publish/Subscribe?

Guilty parties
- D. Martin Swany, Faculty, UD
- Ezra Kissel, Grad student, UD
- Ahmed El-Hassany, Grad student, UD
- Guilherme Fernandes, Grad student, UD

22 Questions
Contact: dkgunter@lbl.gov

23 Extra slides

24 UNIS example
topology
  id : esnet
  domain
    id : urn:ogf:network:domain=ps.es.net
    node
      id : urn:ogf:network:domain=ps.es.net:node=albu-cr1
      name : albu-cr1
      description : Juniper
      address
        type : hostname
        value : albu-cr1
      location
        latitude : +35.08
        longitude : -106.64

25 UNIS Example, cont.
134.55.40.186
albucr1-sdn-a-albusdn1.es.net
urn:ogf:network:domain=ps.es.net:node=albu-cr1:port=ge-5/0/0
255.255.255.252

