Wendy Huntoon - PSC Jim Ferguson - NCSA I2 Members Meeting May 2002


1 Web100
Wendy Huntoon - PSC, Jim Ferguson - NCSA
I2 Members Meeting, May 2002

2 Outline
Project Overview: Motivation (what is the problem?); Web100 Collaboration
Progress to Date: Standardization Process; Code Release
Code Capabilities
Overview of Users
Web100 Resources

3 Motivations: What’s the Problem?
High performance flows slower than line rate
Delays continue/increase even with higher bandwidth
TCP tuning issues are non-trivial
Poorly conceived stacks
Router/switch buffer queues inadequate
Slow start and AIMD algorithm
Eliminate/dramatically reduce the “wizard gap”
Need for kernel instrumentation set for TCP variables

4 The Wizard Gap
TCP over a long haul path
Year Wizards Non-wizards Ratio
1Mb/s kb/s 3:1
10Mb/s Mb/s
1Gb/s Mb/s :1
Scientists/researchers not happy with this

5

6 TCP tuning is painful debugging
All problems limit performance:
IP routing, long round trip times
Improper MSS negotiations or path MTU discovery
IP packet reordering
Packet losses, congestion, lame hardware
TCP sender or receiver buffer space
Inefficient applications
Any one problem can mask all the others and confound all but the best (and few) tuning gurus
Need for better diagnostics and visibility into problems
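One of the limits listed above, buffer space on a path with a long round trip time, can be made concrete with a little arithmetic. This is a minimal sketch, assuming a single TCP flow that can keep at most one window of data in flight per round trip; the function names and the 70 ms and 64 KiB figures are illustrative assumptions, not numbers from the talk.

```python
def window_limited_throughput_bps(window_bytes: int, rtt_seconds: float) -> float:
    """Upper bound on throughput when at most one window is in flight per RTT."""
    return window_bytes * 8 / rtt_seconds


def bandwidth_delay_product_bytes(link_bps: float, rtt_seconds: float) -> float:
    """Buffer space needed to keep a path of the given rate and RTT full."""
    return link_bps * rtt_seconds / 8


if __name__ == "__main__":
    rtt = 0.070  # assumed 70 ms round trip time
    # Throughput ceiling, in Mb/s, with a 64 KiB window
    print(window_limited_throughput_bps(65535, rtt) / 1e6)
    # Buffer, in MB, needed to fill a 1 Gb/s path
    print(bandwidth_delay_product_bytes(1e9, rtt) / 1e6)
```

Under these assumed numbers, a 64 KiB window caps the flow at roughly 7.5 Mb/s whatever the link speed, while filling a 1 Gb/s path would take about 8.75 MB of buffer.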

7 Goal and Method
Make it “easy” (transparent) for non-experts to achieve higher throughput performance
Enhance TCP capabilities with better (finer grain) kernel instrumentation and automatic controls
Real time triage capability determines sender, receiver, and/or network bottlenecks

8 Why Focus on TCP
TCP has an ideal vantage point into throughput problem space
TCP can identify bottleneck subsystem(s)
TCP already measures the network (some)
TCP can measure the application
TCP can adjust itself (auto-tuning feedback)

9 Web100 Collaboration
Funded by the NSF
Currently Year 2 of a 3 Year grant.
Cisco URP for initial seed funding.
Collaborators:
PSC (Matt Mathis, R. Reddy, Janet Brown, John Heffner)
NCAR (Peter O’Neil, Marla Meehl)
NCSA (John Estabrook, Tanya Brethour, Stephen Engelhardt, Jim Ferguson)

10 What is in the code
Web100 software consists of:
TCP Kernel Instrument Set (TCP-KIS): instruments coded directly into the operating system kernel.
Derived Instrument Set (DIS): information that is collected based on KIS parameters.
Application Code: tools, applications, etc. that use the information provided by the KIS and DIS.

11 Kernel Instrument Set
Definition: a set of instruments designed to collect as much information as possible to enable a user to isolate the performance problems of a TCP connection.
How it is implemented: each instrument is a variable in a "stats" structure that is linked through the kernel socket structure. The Linux /proc interface is used to expose these instruments outside the kernel.
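The arrangement described on this slide, a per-connection stats structure reached through the socket structure and exposed as a file, can be pictured with a small userland model. This is only a sketch under stated assumptions: the field names, the `on_segment_sent` hook and the "name value" text layout are all hypothetical, and the real kernel code and /proc format may look quite different.

```python
from dataclasses import asdict, dataclass, field


@dataclass
class TcpStats:
    """Hypothetical per-connection instrument block (field names are made up)."""
    pkts_out: int = 0
    bytes_out: int = 0
    retransmits: int = 0


@dataclass
class Socket:
    """Stand-in for the kernel socket structure; it links to its stats block."""
    local_port: int
    stats: TcpStats = field(default_factory=TcpStats)


def on_segment_sent(sock: Socket, length: int, is_retransmit: bool = False) -> None:
    """Update the instruments at the point where the stack sends a segment."""
    sock.stats.pkts_out += 1
    sock.stats.bytes_out += length
    if is_retransmit:
        sock.stats.retransmits += 1


def render_proc_entry(sock: Socket) -> str:
    """Format the instruments as 'name value' lines (an assumed layout)."""
    return "\n".join(f"{name} {value}" for name, value in asdict(sock.stats).items())


if __name__ == "__main__":
    s = Socket(local_port=5001)
    on_segment_sent(s, 1460)
    on_segment_sent(s, 1460, is_retransmit=True)
    print(render_proc_entry(s))
```

The point of the design is that the counters live next to the connection state they describe, so updating them is a cheap increment on the send path and reading them needs no packet capture.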

12 What is the TCP-KIS?
TCP-KIS instruments group naturally into categories. Currently roughly 19 categories. Already more than 125 instruments have been developed.
For each instrument:
Precise (standards ready) definition.
Instrument code in the kernel.
Implementation verification tests: does the kernel implementation meet the definition?
Prototype diagnostic tool(s) to demonstrate functionality and effectiveness.

13 TCP-KIS
Basic instrumentation examples:
Connection ID: 5-tuple that uniquely identifies a connection.
State: determines what protocol features or algorithms are enabled.
Traffic out: statistics that aggregate packets and traffic sent out on a connection.
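To illustrate how a connection ID can serve as the key for per-connection statistics such as traffic out, here is a minimal sketch. The slide does not list the members of the 5-tuple, so the field layout below (protocol plus both address/port pairs) is an assumption based on common usage, and the `TrafficOut` class is hypothetical.

```python
from collections import defaultdict
from typing import NamedTuple


class ConnectionId(NamedTuple):
    """Assumed 5-tuple layout: protocol plus both address/port pairs."""
    protocol: str
    local_addr: str
    local_port: int
    remote_addr: str
    remote_port: int


class TrafficOut:
    """Hypothetical 'traffic out' aggregates, keyed by the hashable 5-tuple."""

    def __init__(self) -> None:
        self.packets = defaultdict(int)
        self.octets = defaultdict(int)

    def record(self, conn: ConnectionId, length: int) -> None:
        """Count one outgoing packet of the given length on a connection."""
        self.packets[conn] += 1
        self.octets[conn] += length


if __name__ == "__main__":
    a = ConnectionId("tcp", "10.0.0.1", 5001, "10.0.0.2", 80)
    out = TrafficOut()
    out.record(a, 1460)
    print(out.packets[a], out.octets[a])
```

Two flows between the same pair of hosts differ in at least one port, so they get separate counters.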

14 Local Sender Triage
Group of instruments associated with the local sender.
Determine what subsystems are throttling TCP data transmission.
Three parallel sets of instruments that measure:
Receiver Window
Network Congestion
Sender's Availability
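The triage idea can be sketched as picking the dominant limit among the three measured causes. This is a hedged illustration: it assumes each set of instruments yields the time the sender spent limited by that cause, and the `SenderLimitTimes` fields and `triage` function are hypothetical names, not Web100 variables.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class SenderLimitTimes:
    """Seconds the sender spent limited by each cause (hypothetical fields)."""
    receiver_window: float
    network_congestion: float
    sender_availability: float


def triage(times: SenderLimitTimes) -> Tuple[str, float]:
    """Return the dominant limit and its share of the total limited time."""
    by_cause = {
        "receiver window": times.receiver_window,
        "network congestion": times.network_congestion,
        "sender availability": times.sender_availability,
    }
    total = sum(by_cause.values())
    if total == 0:
        return ("none", 0.0)
    cause = max(by_cause, key=by_cause.get)
    return (cause, by_cause[cause] / total)


if __name__ == "__main__":
    print(triage(SenderLimitTimes(1.0, 8.0, 1.0)))
```

A result dominated by the receiver window points at the far end's buffer, one dominated by congestion points at the path, and one dominated by sender availability points at the sending application or host.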

15 Local Sender Groups
Other groups of instruments associated with the Local Sender:
Local Sender Congestion Model
Local Sender Loss Model
Local Sender Re-order Model
Local Sender RTT
Local Sender Segment Size
Local Sender Bottlenecks
Local Sender Tuning

16 Other Instruments
Similar instruments for the Local Receiver.
Observed Receiver instruments: often inferred from the data stream. E.g., the observed receiver's state is inferred from the ACK stream.
Application Interface: future instruments to collect statistics on how the application is using the network.

17 Userland Distribution
Released asynchronously with kernel distribution
Currently at Alpha 1.1
Version 1.2 release imminent
Consists of:
The web100 library
Command line utilities
GUI utilities

18 Web100 Library
Web100 kernel exposes critical TCP variables/instruments through /proc
Web100 library provides the functions needed to access these variables/instruments
Functions:
Read the value of a variable/instrument
Snapshot of a group (facilitates atomic reading of a group of variables)
Modify tunable variables (e.g., send buffer size)
Etc.
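The three kinds of access function listed on this slide (read one variable, snapshot a group, modify a tunable) can be illustrated with an in-memory stand-in. This is explicitly not the Web100 library API: the `InstrumentStore` class, its method names and the group and variable names are hypothetical, chosen only to show why a group snapshot differs from a series of single reads.

```python
import copy


class InstrumentStore:
    """In-memory stand-in for the kernel side; NOT the real Web100 library API."""

    def __init__(self, groups, tunables):
        self._groups = groups            # {group name: {variable name: value}}
        self._tunables = set(tunables)   # (group, variable) pairs that may be written

    def read(self, group, name):
        """Read the current value of a single variable/instrument."""
        return self._groups[group][name]

    def snapshot(self, group):
        """Copy a whole group at once so its values are mutually consistent."""
        return copy.deepcopy(self._groups[group])

    def write(self, group, name, value):
        """Modify a variable, but only if it is marked tunable."""
        if (group, name) not in self._tunables:
            raise PermissionError(f"{group}.{name} is read-only")
        self._groups[group][name] = value


if __name__ == "__main__":
    store = InstrumentStore(
        {"tune": {"send_buffer": 65536}, "traffic": {"pkts_out": 10}},
        [("tune", "send_buffer")],
    )
    store.write("tune", "send_buffer", 1 << 20)
    print(store.read("tune", "send_buffer"), store.snapshot("traffic"))
```

A snapshot matters because a live connection keeps updating its counters: values read one at a time may come from different instants, while a snapshot freezes the group together.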

19 Utilities
Command line utilities: useful in batch scripts; serve as demo code for the usage of the web100 library
GUI utilities: based on GTK+; useful for troubleshooting network applications; serve as examples for application developers

20 GUI Sample Screens – DTB

21 Connection Selector

22 Looking at a Variable

23 Timeline - Year 1
Alpha code development
Establish User Support
Initial User Community: very limited to begin with. Knowledgeable users, expected to provide technical input on the code. Understand and develop applications.

24 Timeline - Year 2
Began standardization process: develop MIB; submit to IETF
Develop public code
Fix bugs in alpha versions
Add instrumentation
Code release
Continue code development
Identify and add new instruments

25 Code Releases - To date
Initial Release
Alpha0.2, released May 23, 2001
Alpha0.3, released Sept. 19, 2001
Alpha 1.0 - Separation of Kernel and Userland code
Kernel Patch: Alpha 1.1 for Linux , released March 18, 2002
Alpha 1.0, released March 1, 2002
Alpha 1.0, released February 26, 2002
Userland: Alpha 1.1, released February 28, 2002

26 Timeline - Year 3
New pathprobe diagnostic tool (work in progress, unreleased).
Add more instruments.
Review instruments and code with other wizards.
Gain vendor support for ideas and code.
Finalize IETF draft by December IETF meeting.

27 Milestones
Over a year of ~30 alpha testers, including SLAC, ORNL, LBNL, and universities
Modified Linux kernel supports
Separation between KIS and library functions
draft-ietf-tsvwg-tcp-mib-extension-00.txt
draft-ietf-ipngwg-rfc2012-update-01.txt

28 Web100 Collaborator Activity
Rich Carlson, ANL
Tom Dunnigan, ORNL
Tom Hacker, U. of Michigan
Doug Chang, SLAC
Andreas Burkhardt & Matt Grob, Qualcomm
Larry Dunn & Scott Dier, Cisco/U. of Minnesota
Jason Lee, LBL

29 Collaborator Assistance
Bugs!
Kernel
Utilities
Release
Request new features
Review and criticize documentation
Way too easy on us

30 Collaborator Activity
Carlson/ANL is working on a troubleshooting guide for LANs.
Set up a network of 13 identically equipped PIII machines connected via a Cisco 5500 network switch, running Web100-enabled Linux.
Introduces typical network faults (duplex mismatches, other config errors) and analyzes data for “signatures” of these faults.
Modified Iperf 1.2 to collect variables and reverse flow.

31 Collaborator Activity
Dunnigan/ORNL has found web100 helpful in seeing losses/retransmission and congestion avoidance parameters of individual TCP flows, and for tuning flows.
Has developed a Web100-enabled ttcp.
Has developed a daemon that logs web100 variables for designated paths when a flow closes.
Has developed an autotuning daemon that uses web100 to tune flows, including modifications to web100 to support "event notification", so the daemon knows when a new flow/socket is opened.
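The event-notification pattern mentioned here, where a daemon is told when a new flow opens and then tunes it, might look roughly like the sketch below. Everything in it is an assumption made for illustration: the `FlowEvents` class, the dictionary-based flow record, the per-path table and the choice to size the send buffer from the bandwidth-delay product are not taken from the ORNL daemon.

```python
class FlowEvents:
    """Hypothetical event source that calls subscribers when a flow opens."""

    def __init__(self):
        self._subscribers = []

    def subscribe(self, callback):
        self._subscribers.append(callback)

    def flow_opened(self, flow):
        for callback in self._subscribers:
            callback(flow)


def make_autotuner(path_table, max_buffer_bytes):
    """Build a callback that sizes a new flow's send buffer from a per-path table."""

    def on_open(flow):
        path = path_table.get(flow["remote_addr"])
        if path is None:
            return  # not a designated path: leave the defaults alone
        # Assumed policy: bandwidth-delay product, capped at a maximum.
        bdp = int(round(path["rate_bps"] * path["rtt_s"] / 8))
        flow["send_buffer"] = min(bdp, max_buffer_bytes)

    return on_open


if __name__ == "__main__":
    events = FlowEvents()
    table = {"192.0.2.10": {"rate_bps": 100e6, "rtt_s": 0.05}}
    events.subscribe(make_autotuner(table, max_buffer_bytes=4 << 20))
    flow = {"remote_addr": "192.0.2.10", "send_buffer": 65536}
    events.flow_opened(flow)
    print(flow["send_buffer"])
```

Being notified avoids having the daemon poll the connection list, so a short flow can be tuned before it has already finished.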

32 Collaborator Activity
Hacker/U.Michigan has been using the web100 software to help tune and diagnose end-to-end network performance problems across the U-M campus network as well as across Abilene for the Visible Human and Atlas projects at U-M.
Chang/SLAC is looking to fix a performance problem between Linux and Solaris machines.

33 Collaborator Activity
Qualcomm is using Web100 to measure TCP performance over certain types of high speed wireless links under development. Web100 is partially integrated into some other tools, in the sense that output reports are published automatically in a format similar to other tools Qualcomm uses.
Dunn/Cisco is currently using Web100 for a class at U.Minnesota. Includes accounts on a test machine at NCSA.

34 Collaborator Activity
Lee/LBL has obtained accounts at SLAC and ANL for WAN testing, and has co-located a machine in Washington D.C. to do testing over SuperNet. Still in the process of testing all this out.
Keith Jackson at LBL has written Python wrappers to the Web100 calls using SWIG.

35 Web100 Summary
Main WWW site: www.web100.org
Freely available software distribution: hundreds of downloads
Please be cognizant of impacts on others
Please use, test, provide feedback, contribute code
IETF standards process to benefit all
Attention turning to working with OS vendors to incorporate standards enhancements into their stacks

