Wendy Huntoon - PSC Jim Ferguson - NCSA I2 Members Meeting May 2002


Web100
Wendy Huntoon - PSC, Jim Ferguson - NCSA
I2 Members Meeting, May 2002

Outline
- Project Overview
  - Motivation: What is the problem?
  - Web100 Collaboration
- Progress to Date
  - Standardization Process
  - Code Release
- Code Capabilities
- Overview of Users
- Web100 Resources

Motivations: What's the Problem?
- High performance flows slower than line rate
- Delays continue/increase even with higher bandwidth
- TCP tuning issues are non-trivial
- Poorly conceived stacks
- Router/switch buffer queues inadequate
- Slow start and AIMD algorithm
- Eliminate/dramatically reduce the "wizard gap"
- Need for kernel instrumentation set for TCP variables
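The "slow start and AIMD algorithm" item can be illustrated with a toy model. The sketch below is not Web100 code and not a real TCP stack: it assumes one window update per round trip, a window counted in whole segments, and loss events supplied by the caller. The function name `simulate_aimd` and its parameters are hypothetical.

```python
def simulate_aimd(rounds, loss_rounds, ssthresh=64.0, cwnd=1.0):
    """Toy per-RTT model of TCP slow start plus AIMD (window in segments)."""
    history = []
    for rtt in range(rounds):
        history.append(cwnd)
        if rtt in loss_rounds:
            # Multiplicative decrease: halve the window on a loss event.
            ssthresh = max(cwnd / 2.0, 2.0)
            cwnd = ssthresh
        elif cwnd < ssthresh:
            # Slow start: double each round trip, capped at ssthresh.
            cwnd = min(cwnd * 2.0, ssthresh)
        else:
            # Additive increase: one segment per round trip.
            cwnd += 1.0
    return history


# One loss in round 5, with a deliberately small ssthresh for readability.
print(simulate_aimd(8, {5}, ssthresh=8.0))
# -> [1.0, 2.0, 4.0, 8.0, 9.0, 10.0, 5.0, 6.0]
```

In this model a single loss costs as many round trips as segments were lost from the window, which is one way to see why long, fast paths recover slowly.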

The Wizard Gap
TCP over a long haul path

Year   Wizards    Non-wizards   Ratio
       1 Mb/s     300 kb/s      3:1
       10 Mb/s
1995   100 Mb/s
       1 Gb/s     3 Mb/s        300:1

Scientists/researchers not happy with this

TCP tuning is painful debugging
- All problems limit performance
  - IP routing, long round trip times
  - Improper MSS negotiations or path MTU discovery
  - IP packet reordering
  - Packet losses, congestion, lame hardware
  - TCP sender or receiver buffer space
  - Inefficient applications
- Any one problem can mask all the others and confound all but the best (and few) tuning gurus
- Need for better diagnostics and visibility into problems
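Two of the items above, long round trip times and buffer space, interact through the bandwidth-delay product. The sketch below shows that arithmetic only. The path figures (1 Gb/s, 70 ms RTT, 64 KiB window) are illustrative assumptions, not measurements from the talk, and the function names are hypothetical.

```python
def bdp_bytes(bandwidth_bits_per_s, rtt_s):
    """Bandwidth-delay product: bytes that must be in flight to fill the path."""
    return bandwidth_bits_per_s * rtt_s / 8.0


def max_throughput_bits_per_s(window_bytes, rtt_s):
    """Upper bound on throughput when a fixed window is the only limit."""
    return window_bytes * 8.0 / rtt_s


# Assumed example path: 1 Gb/s with a 70 ms round trip time.
needed = bdp_bytes(1e9, 0.07)                    # about 8.75 MB of buffer
capped = max_throughput_bits_per_s(65536, 0.07)  # about 7.5 Mb/s with 64 KiB
print(f"buffer needed: {needed / 1e6:.2f} MB")
print(f"64 KiB window ceiling: {capped / 1e6:.1f} Mb/s")
```

Under these assumptions an untuned 64 KiB buffer caps the flow at under 1% of line rate, regardless of how healthy the network is.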

Goal and Method
- Make it "easy" (transparent) for non-experts to achieve higher throughput performance
- Enhance TCP capabilities with better (finer grain) kernel instrumentation and automatic controls
- Real time triage capability determines sender, receiver, and/or network bottlenecks

Why Focus on TCP
- TCP has an ideal vantage point into throughput problem space
- TCP can identify bottleneck subsystem(s)
- TCP already measures the network (some)
- TCP can measure the application
- TCP can adjust itself (auto-tuning feedback)

Web100 Collaboration
- Funded by the NSF
  - Currently Year 2 of a 3 Year grant.
  - Cisco URP for initial seed funding.
- Collaborators
  - PSC (Matt Mathis, R. Reddy, Janet Brown, John Heffner)
  - NCAR (Peter O'Neil, Marla Meehl)
  - NCSA (John Estabrook, Tanya Brethour, Stephen Engelhardt, Jim Ferguson)

What is in the code
Web100 software consists of:
- TCP Kernel Instrument Set (TCP-KIS): instruments coded directly into the operating system kernel.
- Derived Instrument Set (DIS): information that is collected based on KIS parameters.
- Application Code: tools, applications, etc. that use the information provided by the KIS and DIS.

Kernel Instrument Set
- Definition
  - Set of instruments designed to collect as much of the information as possible to enable a user to isolate the performance problems of a TCP connection.
- How it is implemented
  - Each instrument is a variable in a "stats" structure that is linked through the kernel socket structure.
  - The Linux /proc interface is used to expose these instruments outside the kernel.
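The implementation idea above, a per-connection "stats" structure reachable from the socket and exported for reading, can be modeled in a few lines. This is a user-space Python sketch, not the kernel code: the instrument names, the `Socket` stand-in, and the text export format are all hypothetical, since the slides do not specify the actual structure fields or the /proc layout.

```python
from dataclasses import dataclass, field, asdict


@dataclass
class TcpStats:
    """Hypothetical per-connection instruments (names are illustrative)."""
    pkts_out: int = 0
    data_bytes_out: int = 0
    pkts_retrans: int = 0


@dataclass
class Socket:
    """Stand-in for a kernel socket structure that links to its stats."""
    local: tuple
    remote: tuple
    stats: TcpStats = field(default_factory=TcpStats)

    def send(self, nbytes, retransmit=False):
        # Each instrument is updated inline, where the event happens.
        self.stats.pkts_out += 1
        if retransmit:
            self.stats.pkts_retrans += 1
        else:
            self.stats.data_bytes_out += nbytes


def export_stats(sock):
    """Render the instruments as 'name value' lines (assumed text format)."""
    return "\n".join(f"{name} {value}" for name, value in asdict(sock.stats).items())


sock = Socket(("10.0.0.1", 5001), ("10.0.0.2", 80))
sock.send(1460)
sock.send(1460, retransmit=True)
print(export_stats(sock))
```

Keeping the counters in one structure per connection is what lets a reader outside the kernel fetch them as a unit.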

What is the TCP-KIS?
- TCP-KIS instruments group naturally into categories.
  - Currently roughly 19 categories.
  - Already more than 125 instruments have been developed.
- For each instrument:
  - Precise (standards ready) definition.
  - Instrument code in the kernel.
  - Implementation verification tests: does the kernel implementation meet the definition?
  - Prototype diagnostic tool(s) to demonstrate functionality and effectiveness.

TCP-KIS Basic instrumentation examples
- Connection ID: 5-tuple that uniquely identifies a connection.
- State: determines what protocol features or algorithms are enabled.
- Traffic out: statistics aggregate packets and traffic sent out on a connection.

Local Sender Triage
- Group of instruments associated with the local sender.
- Determine what subsystems are throttling TCP data transmission.
- Three parallel sets of instruments that measure:
  - Receiver Window
  - Network Congestion
  - Sender's Availability
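A minimal sketch of how such triage could be used, assuming each of the three instrument sets yields the cumulative time the sender spent limited by that subsystem. The slides do not say exactly what the instruments report, so that assumption, the function name `triage`, and its parameters are all hypothetical.

```python
def triage(rwin_limited_s, cwnd_limited_s, sender_limited_s):
    """Return (bottleneck, fractions) from time spent in each limited state."""
    times = {
        "receiver window": rwin_limited_s,
        "network congestion": cwnd_limited_s,
        "sender": sender_limited_s,
    }
    total = sum(times.values())
    if total <= 0:
        raise ValueError("no time accounted for")
    # Fraction of the observed interval attributed to each subsystem.
    fractions = {name: t / total for name, t in times.items()}
    # The subsystem that throttled the flow for the longest share of time.
    bottleneck = max(fractions, key=fractions.get)
    return bottleneck, fractions


bottleneck, fractions = triage(1.0, 7.0, 2.0)
print(bottleneck)  # network congestion
```

Reporting all three fractions, not just the largest, matters because one problem can mask another: fixing the top limiter may simply promote the next one.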

Local Sender Groups
Other groups of instruments associated with the Local Sender:
- Local Sender Congestion Model
- Local Sender Loss Model
- Local Sender Re-order Model
- Local Sender RTT
- Local Sender Segment Size
- Local Sender Bottlenecks
- Local Sender Tuning

Other Instruments
- Similar instruments for the Local Receiver.
- Observed Receiver instruments
  - Often inferred from the data stream.
  - E.g., Observed Receiver: the receiver's state is inferred from the ACK stream.
- Application Interface
  - Future instruments to collect statistics on how the application is using the network.

Userland Distribution
- Released asynchronously with kernel distribution
- Currently at Alpha 1.1
- Version 1.2 release imminent
- Consists of
  - The web100 library
  - Command line utilities
  - GUI utilities

Web100 Library
- Web100 kernel exposes critical TCP variables/instruments through /proc
- Web100 library provides the necessary access functions to access these variables/instruments
- Functions
  - Read the value of a variable/instrument
  - Snapshot of a group (facilitates atomic reading of a group of variables)
  - Modify tunable variables (e.g. send buffer size)
  - Etc.
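The value of a group snapshot can be shown with a small model. This is not the web100 library API (the real library is C and its function names are not given in the slides). `InstrumentGroup`, `snap`, and `delta` are hypothetical names for an in-memory stand-in that only illustrates why reading a group at once is useful.

```python
class InstrumentGroup:
    """Hypothetical in-memory stand-in for a group of live instruments."""

    def __init__(self, **values):
        self._values = dict(values)

    def update(self, **deltas):
        # Simulates the kernel advancing counters while the flow runs.
        for name, amount in deltas.items():
            self._values[name] += amount

    def read(self, name):
        """Read a single instrument; later reads may see newer values."""
        return self._values[name]

    def snap(self):
        """Copy every instrument at once so the values are mutually consistent."""
        return dict(self._values)


def delta(older, newer):
    """Per-instrument change between two snapshots of the same group."""
    return {name: newer[name] - older[name] for name in older}


group = InstrumentGroup(pkts_out=10, pkts_retrans=1)
before = group.snap()
group.update(pkts_out=90, pkts_retrans=4)
after = group.snap()
change = delta(before, after)
print(change["pkts_retrans"] / change["pkts_out"])  # retransmit share in the interval
```

Ratios such as retransmissions per packet sent are only meaningful when numerator and denominator come from the same instant, which is the point of reading the group as a unit.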

Utilities
- Command line utilities
  - Useful in batch scripts
  - Serve as demo codes for the usage of web100 library
- GUI utilities
  - Based on GTK+
  - Useful for troubleshooting network applications
  - Serve as examples for application developers

GUI Sample Screens – DTB

Connection Selector

Looking at a Variable

Timeline - Year 1
- Alpha code development
- Establish User Support
  - www.web100.org
- Initial User Community
  - Very limited to begin with.
  - Knowledgeable users, expected to provide technical input on the code.
  - Understand and develop applications.

Timeline - Year 2
- Began standardization process.
  - Develop MIB
  - Submit to IETF
- Develop public code
  - Fix bugs in alpha versions
  - Add instrumentation
  - Code release
- Continue code development
  - Identify and add new instruments

Code Releases - To date
- Initial Release: Alpha 0.2, released May 23, 2001
- Alpha 0.3, released Sept. 19, 2001
- Alpha 1.0: separation of kernel and userland code
  - Kernel Patch: Alpha 1.1 for Linux 2.4.16, released March 18, 2002
  - Alpha 1.0, released March 1, 2002
  - Alpha 1.0, released February 26, 2002
  - Userland: Alpha 1.1, released February 28, 2002

Timeline - Year 3
- New pathprobe diagnostic tool (work in progress, unreleased).
- Add another 10-12 instruments.
- Review instruments and code with other wizards.
- Gain vendor support for ideas and code.
- Finalize IETF draft by December IETF meeting.

Milestones
- Over a year of ~30 alpha testers
  - Including: SLAC, ORNL, LBNL, and universities
  - www.net100.org
- Modified Linux kernel supports 2.4.16
- Separation between KIS and library functions
- draft-ietf-tsvwg-tcp-mib-extension-00.txt
- draft-ietf-ipngwg-rfc2012-update-01.txt

Web100 Collaborator Activity
- Rich Carlson, ANL
- Tom Dunigan, ORNL
- Tom Hacker, U. of Michigan
- Doug Chang, SLAC
- Andreas Burkhardt & Matt Grob, Qualcomm
- Larry Dunn & Scott Dier, Cisco/U. of Minnesota
- Jason Lee, LBL

Collaborator Assistance
- Bugs!
  - Kernel
  - Utilities
  - Release
- Request new features
- Review and criticize documentation
  - Way too easy on us

Collaborator Activity
Carlson/ANL working on a troubleshooting guide for LANs.
- Set up network of 13 identically equipped PIII connected via Cisco 5500 network switch, running Web100-enabled Linux.
- Introduces typical network faults (duplex mismatches, other config errors) and analyzes data for "signatures" of these faults.
- Modified Iperf 1.2 to collect variables and reverse flow.

Collaborator Activity
Dunigan/ORNL has found web100 helpful in seeing losses/retransmissions and congestion avoidance parameters of individual TCP flows, and for tuning flows.
- Has developed a Web100-enabled ttcp
- Has developed a daemon that logs web100 variables for designated paths when a flow closes
- Has developed an autotuning daemon that uses web100 to tune flows, including modifications to web100 to support "event notification", so the daemon knows when a new flow/socket is opened

Collaborator Activity
- Hacker/U. Michigan has been using the web100 software to help tune and diagnose end-to-end network performance problems across the U-M campus network as well as across Abilene for the Visible Human and Atlas projects at U-M.
- Chang/SLAC is looking to fix a performance problem between Linux and Solaris machines.

Collaborator Activity
- Qualcomm is using Web100 to measure TCP performance over certain types of high speed wireless links under development. Web100 is partially integrated into some other tools, in the sense that output reports are published automatically in a format similar to other tools Qualcomm uses.
- Dunn/Cisco currently using Web100 for a class at U. Minnesota. Includes accounts on test machine at NCSA.

Collaborator Activity
- Lee/LBL has obtained accounts at SLAC and ANL for WAN testing, and has co-located one of their machines in Washington D.C. to do testing over SuperNet. Still in the process of testing all this out.
- Keith Jackson at LBL has written Python wrappers to the Web100 calls using SWIG.

Web100 Summary
- Main WWW site: www.web100.org
- Freely available software distribution
  - www.web100.org/download
  - Hundreds of downloads
  - Please be cognizant of impacts on others
- Please use, test, provide feedback, contribute code
- IETF standards process to benefit all
- Attention turning to working with OS vendors to incorporate standards enhancements into their stacks