Presentation on theme: "Wendy Huntoon - PSC Jim Ferguson - NCSA I2 Members Meeting May 2002"— Presentation transcript:
1 Web100
Wendy Huntoon - PSC
Jim Ferguson - NCSA
I2 Members Meeting
May 2002
2 Outline
- Project Overview
  - Motivation: What is the problem
  - Web100 Collaboration
- Progress to Date
  - Standardization Process
  - Code Release
- Code Capabilities
- Overview of Users
- Web100 Resources
3 Motivations: What’s the Problem?
- High performance flows slower than line rate
- Delays continue/increase even with higher bandwidth
- TCP tuning issues are non-trivial
- Poorly conceived stacks
- Router/switch buffer queues inadequate
- Slow start and AIMD algorithm
- Eliminate/dramatically reduce the “wizard gap”
- Need for kernel instrumentation set for TCP variables
4 The Wizard Gap
TCP over a long haul path
[Table: Year | Wizards | Non-wizards | Ratio. Surviving fragments: 1Mb/s, kb/s, 3:1, 10Mb/s, Mb/s, 1Gb/s, Mb/s, :1]
Scientists/researchers not happy with this
6 TCP tuning is painful debugging
- All problems limit performance
  - IP routing, long round trip times
  - Improper MSS negotiations or path MTU discovery
  - IP packet reordering
  - Packet losses, congestion, lame hardware
  - TCP sender or receive buffer space
  - Inefficient applications
- Any one problem can mask all the others and confound all but the best (and few) tuning gurus
- Need for better diagnostics and visibility into problems
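The buffer-space and round-trip-time items above are linked by a standard bound: a TCP sender can keep at most one window of data in flight per round trip, so throughput cannot exceed window / RTT. The sketch below works through that arithmetic. The 64 KiB buffer, 70 ms round trip time, and 1 Gb/s target are illustrative assumptions, not figures from the slides.

```python
def max_throughput_bps(window_bytes: int, rtt_seconds: float) -> float:
    """Upper bound on TCP throughput: one window per round trip."""
    if rtt_seconds <= 0:
        raise ValueError("rtt_seconds must be positive")
    return window_bytes * 8 / rtt_seconds


def required_window_bytes(rate_bps: float, rtt_seconds: float) -> int:
    """Bandwidth-delay product: buffer needed to keep the path full."""
    return round(rate_bps * rtt_seconds / 8)


if __name__ == "__main__":
    # Assumed values, for illustration only.
    window = 64 * 1024   # 64 KiB socket buffer
    rtt = 0.070          # 70 ms round trip time
    print(f"window-limited rate: {max_throughput_bps(window, rtt) / 1e6:.1f} Mb/s")
    print(f"window needed for 1 Gb/s: {required_window_bytes(1e9, rtt)} bytes")
```

Under these assumptions the untuned buffer caps the flow at a few Mb/s regardless of link speed, which is one way a single misconfiguration can mask every other problem on the path.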
7 Goal and Method
- Make it “easy” (transparent) for non-experts to achieve higher throughput performance
- Enhance TCP capabilities with better (finer grain) kernel instrumentation and automatic controls
- Real time triage capability determines sender, receiver, and/or network bottlenecks
8 Why Focus on TCP
- TCP has an ideal vantage point into throughput problem space
- TCP can identify bottleneck subsystem(s)
- TCP already measures the network (some)
- TCP can measure the application
- TCP can adjust itself (auto-tuning feedback)
9 Web100 Collaboration
- Funded by the NSF
  - Currently Year 2 of a 3 Year grant.
  - Cisco URP for initial seed funding.
- Collaborators
  - PSC (Matt Mathis, R. Reddy, Janet Brown, John Heffner)
  - NCAR (Peter O’Neil, Marla Meehl)
  - NCSA (John Estabrook, Tanya Brethour, Stephen Engelhardt, Jim Ferguson)
10 What is in the code
Web100 software consists of:
- TCP Kernel Instrument Set (TCP-KIS)
  - Instruments coded directly into the operating system kernel.
- Derived Instrument Set (DIS)
  - Information that is collected based on KIS parameters.
- Application Code
  - Tools, applications, etc. that use the information provided by the KIS and DIS.
11 Kernel Instrument Set
- Definition
  - Set of instruments designed to collect as much of the information as possible to enable a user to isolate the performance problems of a TCP connection.
- How it is implemented
  - Each instrument is a variable in a "stats" structure that is linked through the kernel socket structure.
  - The Linux /proc interface is used to expose these instruments outside the kernel.
12 What is the TCP-KIS?
- TCP-KIS instruments group naturally into categories.
  - Currently roughly 19 categories.
  - Already more than 125 instruments have been developed.
- For each instrument:
  - Precise (standards ready) definition.
  - Instrument code in the kernel.
  - Implementation verification tests: does the kernel implementation meet the definition?
  - Prototype diagnostic tool(s) to demonstrate functionality and effectiveness.
13 TCP-KIS Basic instrumentation examples
- Connection ID: 5-tuple that uniquely identifies a connection.
- State: determines what protocol features or algorithms are enabled.
- Traffic out: statistics aggregate packets and traffic sent out on a connection.
14 Local Sender Triage
- Group of instruments associated with the local sender.
- Determine what subsystems are throttling TCP data transmission.
- Three parallel sets of instruments that measure:
  - Receiver Window
  - Network Congestion
  - Sender's Availability
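One way to read the three parallel instrument sets is as a time budget: over a given interval, how long was the sender held back by the receiver window, by network congestion, or by the sending side itself? The sketch below classifies a connection from three such durations. The function name, the argument names, and the use of milliseconds are assumptions made for illustration; the slide does not give the instruments' names or units.

```python
from typing import Dict, Tuple


def sender_triage(rwin_ms: float, cwnd_ms: float, sender_ms: float) -> Tuple[str, Dict[str, float]]:
    """Return the dominant limit and each limit's share of the total time.

    The arguments are hypothetical inputs: time spent limited by the
    receiver window, by network congestion, and by the sender itself.
    """
    times = {
        "receiver window": rwin_ms,
        "network congestion": cwnd_ms,
        "sender": sender_ms,
    }
    if any(t < 0 for t in times.values()):
        raise ValueError("durations must be non-negative")
    total = sum(times.values())
    if total == 0:
        # No limited time recorded, so no bottleneck can be named.
        return "unknown", {name: 0.0 for name in times}
    shares = {name: t / total for name, t in times.items()}
    dominant = max(shares, key=shares.get)
    return dominant, shares


if __name__ == "__main__":
    # Made-up durations for one connection.
    limit, shares = sender_triage(rwin_ms=8200, cwnd_ms=1300, sender_ms=500)
    print("dominant limit:", limit)
    for name, share in shares.items():
        print(f"  {name}: {share:.0%}")
```

A connection that spends most of its time limited by the receiver window points at the far end's buffer settings rather than at the network, which is the kind of triage the slide describes.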
15 Local Sender Groups
Other groups of instruments associated with the Local Sender:
- Local Sender Congestion Model
- Local Sender Loss Model
- Local Sender Re-order Model
- Local Sender RTT
- Local Sender Segment Size
- Local Sender Bottlenecks
- Local Sender Tuning
16 Other Instruments
- Similar instruments for the Local Receiver.
- Observed Receiver instruments
  - Often inferred from the data stream.
  - E.g., Observed Receiver: the receiver's state is inferred from the ACK stream.
- Application Interface
  - Future instruments to collect statistics on how the application is using the network.
17 Userland Distribution
- Released asynchronously with kernel distribution
- Currently at Alpha 1.1
- Version 1.2 release imminent
- Consists of
  - The web100 library
  - Command line utilities
  - GUI utilities
18 Web100 Library
- Web100 kernel exposes critical TCP variables/instruments through /proc
- Web100 library provides the functions needed to access these variables/instruments
- Functions
  - Read the value of a variable/instrument
  - Snapshot of a group (facilitates atomic reading of a group of variables)
  - Modify tunable variables (e.g., send buffer size)
  - Etc.
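The snapshot function matters because counters read one at a time may come from different instants, which makes any value derived from them inconsistent. The sketch below shows the idea with plain dictionaries standing in for two snapshots of one variable group. It does not call the real Web100 library, and the variable names (`DataBytesOut`, `PktsOut`, `PktsRetrans`) and counter values are placeholders chosen for illustration.

```python
from typing import Dict, Mapping


def snapshot_delta(before: Mapping[str, int], after: Mapping[str, int]) -> Dict[str, int]:
    """Per-variable change between two snapshots of the same group."""
    if before.keys() != after.keys():
        raise ValueError("snapshots must cover the same variables")
    return {name: after[name] - before[name] for name in before}


def summarize(delta: Mapping[str, int], interval_s: float) -> Dict[str, float]:
    """Derive a send rate and a retransmit ratio from one consistent delta."""
    if interval_s <= 0:
        raise ValueError("interval_s must be positive")
    pkts_out = delta["PktsOut"]
    return {
        "send_rate_bps": delta["DataBytesOut"] * 8 / interval_s,
        "retrans_ratio": delta["PktsRetrans"] / pkts_out if pkts_out else 0.0,
    }


if __name__ == "__main__":
    # Hypothetical counter values taken one second apart.
    t0 = {"DataBytesOut": 1_000_000, "PktsOut": 700, "PktsRetrans": 3}
    t1 = {"DataBytesOut": 2_250_000, "PktsOut": 1_560, "PktsRetrans": 46}
    print(summarize(snapshot_delta(t0, t1), interval_s=1.0))
```

Because both counters in the retransmit ratio come from the same pair of snapshots, the ratio describes a single interval rather than mixing packets sent in one period with retransmissions counted in another.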
19 Utilities
- Command line utilities
  - Useful in batch scripts
  - Serve as demo code for using the web100 library
- GUI utilities
  - Based on GTK+
  - Useful for troubleshooting network applications
  - Serve as examples for application developers
23 Timeline - Year 1
- Alpha code development
- Establish User Support
- Initial User Community
  - Very limited to begin with.
  - Knowledgeable users, expected to provide technical input on the code.
  - Understand and develop applications.
24 Timeline - Year 2
- Began standardization process.
  - Develop MIB
  - Submit to IETF
- Develop public code
  - Fix bugs in alpha versions
  - Add instrumentation
  - Code release
- Continue code development
  - Identify and add new instruments
25 Code Releases - To date
- Initial Release
  - Alpha 0.2, released May 23, 2001
  - Alpha 0.3, released Sept. 19, 2001
- Alpha 1.0 - Separation of Kernel and Userland code
  - Kernel Patch:
    - Alpha 1.1 for Linux , released March 18, 2002
    - Alpha 1.0, released March 1, 2002
    - Alpha 1.0, released February 26, 2002
  - Userland:
    - Alpha 1.1, released February 28, 2002
26 Timeline - Year 3
- New pathprobe diagnostic tool (work in progress, unreleased).
- Add more instruments.
- Review instruments and code with other wizards.
- Gain vendor support for ideas and code.
- Finalize IETF draft by December IETF meeting.
27 Milestones
- Over a year of ~30 alpha testers
  - Including: SLAC, ORNL, LBNL, and universities
- Modified Linux kernel supports
- Separation between KIS and library functions
- draft-ietf-tsvwg-tcp-mib-extension-00.txt
- draft-ietf-ipngwg-rfc2012-update-01.txt
28 Web100 Collaborator Activity
- Rich Carlson, ANL
- Tom Dunnigan, ORNL
- Tom Hacker, U. of Michigan
- Doug Chang, SLAC
- Andreas Burkhardt & Matt Grob, Qualcomm
- Larry Dunn & Scott Dier, Cisco/U. of Minnesota
- Jason Lee, LBL
29 Collaborator Assistance
- Bugs!
  - Kernel
  - Utilities
  - Release
- Request new features
- Review and criticize documentation
  - Way too easy on us
30 Collaborator Activity
- Carlson/ANL working on a troubleshooting guide for LANs.
  - Set up network of 13 identically equipped PIII connected via Cisco 5500 network switch, running Web100-enabled Linux.
  - Introduces typical network faults (duplex mismatches, other config errors) and analyzes data for “signatures” of these faults.
  - Modified Iperf 1.2 to collect variables and reverse flow.
31 Collaborator Activity
- Dunnigan/ORNL has found web100 helpful in seeing losses/retransmission and congestion avoidance parameters of individual TCP flows, and for tuning flows
  - Has developed a Web100-enabled ttcp
  - Has developed a daemon that logs web100 variables for designated paths when a flow closes
  - Has developed an autotuning daemon that uses web100 to tune flows, including modifications to web100 to support "event notification", so the daemon knows when a new flow/socket is opened
32 Collaborator Activity
- Hacker/U. Michigan has been using the web100 software to help tune and diagnose end-to-end network performance problems across the U-M campus network as well as across Abilene for the Visible Human and Atlas projects at U-M.
- Chang/SLAC is looking to fix a performance problem between Linux and Solaris machines.
33 Collaborator Activity
- Qualcomm is using Web100 to measure TCP performance over certain types of high speed wireless links under development. Web100 is partially integrated into some other tools, in the sense that output reports are published automatically in a format similar to other tools Qualcomm uses.
- Dunn/Cisco is currently using Web100 for a class at U. Minnesota. Includes accounts on a test machine at NCSA.
34 Collaborator Activity
- Lee/LBL has obtained accounts at SLAC and ANL for WAN testing, and have co-located one of our machines in Washington D.C. to do testing over SuperNet. Still in the process of testing all this out.
- Keith Jackson at LBL has written Python wrappers to the Web100 calls using SWIG.
35 Web100 Summary
- Main WWW site: www.web100.org
- Freely available software distribution
  - Hundreds of downloads
  - Please be cognizant of impacts on others
  - Please use, test, provide feedback, contribute code
- IETF standards process to benefit all
- Attention turning to working with OS vendors to incorporate standards enhancements into their stacks