1 Rethinking the Internet Architecture Process, Architecture, and Troubleshooting Scott Shenker (joint work with many people, including Katerina Argyraki, Hari Balakrishnan, David Cheriton, Petros Maniatis, Ion Stoica, Mike Walfish)
2 Process Why are we doing this, anyway?
3 Why the Clean Slate Mania? Internet in crisis? -lack of functionality not a crucial problem -lack of reliability is most important problem Research community in crisis? -little practical impact on architecture -narrowed focus, stopped asking the big questions NSF’s response: FIND and GENI -but not enough by itself....
4 You Can Lead an Academic to Architecture, but.... Normal academic behavior won’t produce architecture -Publication requires differentiation and/or indifference -Architecture comes from critique and synthesis work on ideas other than your own..... Can’t just design, simulate and abandon -must also experiment and deploy then discuss and synthesize Process change harder than technical issues -adoption is much harder than both!
5 Some Thoughts on Architecture material covered in several papers (apologies to those who have heard all this before) not comprehensive architecture, many issues ignored
6 What’s Wrong with the Internet? Internet is everywhere, used for (almost) everything Main limiting factor seems to be lack of reliability -can’t do telesurgery, air traffic control, etc. Hard to improve reliability of packet delivery within current architecture Vulnerable to attacks, misconfigurations and failures
7 Packet Delivery Problems Access link failures -multihome Routing failures -security, policy, configuration, convergence, multipath,... Congestion control failures -FQ, XCP, RCP,.... DoS -default-off, capabilities, filters,...
8 Packet Delivery Problems Technical solutions are largely at hand -not perfect, but huge improvement over status quo No overarching synthetic architecture has emerged -symptom of process failure, or just too early? But packet delivery won’t be the focus of this talk.... -because only experts see it as the major problem
9 Normal User’s Perspective Other forms of failure dominate: -out-of-date addresses -broken links -misleading URLs and/or inauthentic data -applications blocked by NATs, etc. -unusable or unreliable due to spam......
10 Why? Three Important Changes 1. Host-to-host → accessing data and services 2. End-to-end → middleboxes 3. Appropriate communication → spam
11 Three Important Changes 1. Host-to-host → accessing data and services 2. End-to-end → middleboxes 3. Appropriate communication → spam
12 Not just host-oriented apps.... Of course, packets always flow from host to host -modulo middleboxes.... But which host are the packets sent to? This is controlled by what hostname is used So adjusting to data-oriented apps involves re-evaluating the Internet naming system -data, service specified by host/path pair
13 Problems with host/path names Data movement causes broken links -names should be persistent Replication unnecessarily difficult -Akamai expensive, and can’t replicate at object granularity -Google, P2P, etc. do this now.... DNS names lead to legal/political battles -increasingly important, witness ICANN debacle Names don’t facilitate authentication -can’t easily verify that data originated with intended source
15 Fix #2: Use Names in Appropriate Layer [Diagram: layer stack annotated with the name used at each layer] User-level descriptors (e.g., search); App session; App-specific search/lookup returns SID; Transport; Resolves SID to EID; Opens transport conns; IP; Resolves EID to IP; Bind to EID (HIP); SIDs. Packet fields: IP hdr | EID | TCP | SID | … Layers: IP, Transport, App session, Application
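A minimal sketch of the two-step resolution on this slide (SID to EID, then EID to IP). The in-memory tables, identifier strings, addresses, and function names are all hypothetical; the slide does not specify a resolver API, and which layer performs each step is left open here.

```python
from dataclasses import dataclass

# Hypothetical tables standing in for the resolution infrastructure.
SID_TO_EID = {"sid:photo-service": "eid:host-a"}
EID_TO_IP = {"eid:host-a": "192.0.2.10"}


@dataclass(frozen=True)
class Packet:
    """Fields in the order listed on the slide: IP hdr, EID, TCP, SID."""
    dest_ip: str
    dest_eid: str
    tcp_port: int
    sid: str


def resolve_sid(sid: str) -> str:
    """Map a service identifier to the endpoint currently hosting it."""
    return SID_TO_EID[sid]


def resolve_eid(eid: str) -> str:
    """Map an endpoint identifier to its current IP address."""
    return EID_TO_IP[eid]


def build_packet(sid: str, tcp_port: int) -> Packet:
    """Resolve step by step so each field names the right abstraction."""
    eid = resolve_sid(sid)
    ip = resolve_eid(eid)
    return Packet(dest_ip=ip, dest_eid=eid, tcp_port=tcp_port, sid=sid)


def move_host(eid: str, new_ip: str) -> None:
    """Host mobility only touches the EID-to-IP table."""
    EID_TO_IP[eid] = new_ip
```

After `move_host("eid:host-a", "198.51.100.7")`, a fresh `build_packet` call carries the new address while the SID and EID stay the same, which is the point of binding each protocol to its own level of name.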
16 Fix #3: Names Should be Flat! 0xf436f0ab527bac9e8b100afeff A name can be persistent if and only if it doesn’t embed any mutable information about its referent Flat names embed no information, so they can be used to persistently name anything -Enables inter-domain migration, etc. Once you have a large flat namespace, you never need other global handles -no distinction between EIDs, SIDs, etc.
17 Disadvantages of Flat Names Hard to resolve; No local control; No locality; Not human friendly. All can be handled, but flat names do require new resolution infrastructure
18 Fix #4: Make Names Self-certifying Name = Hash(pubkey, salt) Value = <pubkey, salt, data, signature> -can verify name related to pubkey and pubkey signed data Can receive data from caches or other 3rd parties without worry -much more opportunistic data transfer
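A sketch of the check a receiver could run on data obtained from a cache or other third party. Assumptions, none of which come from the slide: SHA-256 as the hash, a length-prefixed encoding of the key, and a value that carries the public key, salt, data, and a signature. Python's standard library has no public-key signature scheme, so signature checking is passed in as a caller-supplied function.

```python
import hashlib


def make_name(pubkey: bytes, salt: bytes) -> str:
    """Name = Hash(pubkey, salt); hash choice and encoding are assumptions."""
    h = hashlib.sha256()
    h.update(len(pubkey).to_bytes(4, "big"))  # length prefix avoids ambiguity
    h.update(pubkey)
    h.update(salt)
    return h.hexdigest()


def name_matches_key(name: str, pubkey: bytes, salt: bytes) -> bool:
    """Step 1: the name itself commits to the public key."""
    return name == make_name(pubkey, salt)


def verify_value(name, pubkey, salt, data, signature, verify_signature) -> bool:
    """Step 2: the key bound to the name must have signed the data.

    verify_signature(pubkey, data, signature) -> bool is a hypothetical
    callable supplied by the caller (for example, from a crypto library).
    """
    return name_matches_key(name, pubkey, salt) and verify_signature(
        pubkey, data, signature
    )
```

Because both steps depend only on the name and the received bytes, it does not matter which host delivered them.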
19 Proposed Naming System Flat, self-certifying identifiers for all entities Used in “layered” fashion so that each protocol binds to the correct level of abstraction Names are persistent, verifiable, and support easy replication and migration Requirement: industrial-strength flat name resolver -names, key revocation (later, another use)
20 Three Important Changes 1. Host-to-host → accessing data and services 2. End-to-end → middleboxes 3. Appropriate communication → spam
21 Not just end-to-end.... Middleboxes provide important functionality -NATs, firewalls, proxies, caches, app accelerators, etc. But processing between endpoints violates pure end-to-end religion, and causes many practical problems -e.g., NATs interfere with many applications How can architecture support middleboxes better? -eliminate problems and make them architecturally sound
22 Delegation via Resolution Names usually resolve to “location” of entity Delegation principle: A network entity should be able to direct resolutions of its name not only to its own location, but also to chosen delegates Semantics: -“where am I” → “where should packets be sent to reach me” This allows packets to be directed towards middleboxes in a clean and coherent manner
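The delegation principle can be sketched as a resolver in which the owner of a name decides what the name resolves to. The class and method names are hypothetical, and authentication of the owner (for instance via self-certifying names) is omitted.

```python
class Resolver:
    """Toy flat-name resolver where the named entity picks the target."""

    def __init__(self):
        self._location = {}  # EID -> the entity's own IP address
        self._delegate = {}  # EID -> IP address of a delegate it chose

    def register(self, eid, ip):
        self._location[eid] = ip

    def delegate_to(self, eid, delegate_ip):
        """Direct future resolutions of eid to a chosen delegate."""
        self._delegate[eid] = delegate_ip

    def revoke_delegation(self, eid):
        self._delegate.pop(eid, None)

    def resolve(self, eid):
        """Answer 'where should packets be sent to reach eid?'."""
        if eid in self._delegate:
            return self._delegate[eid]
        return self._location[eid]
```

Senders call only `resolve`, so they need no knowledge of whether the answer is the entity itself or a middlebox acting on its behalf.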
23 Architecturally-Sound Middleboxes Example: destination EID d at IP ipd, source EID s, firewall at IP ipf [Diagram] Current (Bad) Middleboxes: mapping for dest EID d is ipd; packet structure: ipd | TCP hdr Architecturally-sound: mapping for dest EID d is ipf; packet structure: ipf | EID d | TCP hdr Delegate can be anywhere, not necessarily on path Can apply to app-layer middleboxes Including SID, EID in packet is crucial
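A sketch of why carrying the EID in the packet matters: a packet addressed to the firewall (ipf) still says which endpoint (EID d) it is really for, so the firewall can apply policy and forward it to ipd. The class names, the port-based policy, and the addresses are illustrative assumptions.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class Packet:
    dest_ip: str   # where the packet is routed (ipf while delegated)
    dest_eid: str  # who the packet is ultimately for (EID d)
    tcp_port: int


class Firewall:
    """Hypothetical delegate that filters, then forwards by EID."""

    def __init__(self, own_ip, protected, allowed_ports):
        self.own_ip = own_ip
        self.protected = protected          # EID -> real IP (d -> ipd)
        self.allowed_ports = allowed_ports  # toy policy: allowed TCP ports

    def handle(self, pkt: Packet) -> Optional[Packet]:
        """Return the packet to forward, or None if it is dropped."""
        if pkt.dest_ip != self.own_ip:
            return None  # not addressed to this delegate
        if pkt.tcp_port not in self.allowed_ports:
            return None  # blocked by policy
        real_ip = self.protected.get(pkt.dest_eid)
        if real_ip is None:
            return None  # this firewall does not act for that endpoint
        return replace(pkt, dest_ip=real_ip)
```

Since forwarding is keyed on the EID rather than on topology, nothing in this sketch requires the firewall to sit on the path between source and destination.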
24 Possible Impacts More general services: more complex services (like Riverbed, transcoding, etc.) can fit within framework Remote services, not boxes: since middleboxes need not be on-path, services like firewalls, virus-scanners, etc. can be provided as remote services Rethinking transport: with intermediaries between endpoints, basic notion of the transport layer should be rethought, combining ideas from DTN, DOT, etc.
25 Three Important Changes 1. Host-to-host → accessing data and services 2. End-to-end → middleboxes 3. Appropriate communication → spam
26 Restraining Usage Can’t be at packet level, must be app-dependent But don’t want separate mechanism for each app -email, IM, wiki, etc. Proposal: quota system -quotas allocated in application-dependent manner -quotas enforced through single mechanism stamp for each usage, canceled through mechanism see NSDI 06 paper for details.... Uses flat name resolution
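A toy version of the split between allocation and enforcement: an allocator hands out a limited number of stamps per sender, and a single enforcer cancels each stamp on first use. This is not the design in the NSDI 06 paper. The HMAC-based stamp format, the per-sender counter, and the in-memory set of canceled stamps are stand-ins chosen to keep the sketch within the standard library.

```python
import hashlib
import hmac
import secrets


class QuotaAllocator:
    """Issues at most quota_per_sender stamps to each sender."""

    def __init__(self, key: bytes, quota_per_sender: int):
        self._key = key
        self._quota = quota_per_sender
        self._issued = {}  # sender -> number of stamps handed out

    def _tag(self, body: str) -> str:
        return hmac.new(self._key, body.encode(), hashlib.sha256).hexdigest()

    def issue_stamp(self, sender: str):
        used = self._issued.get(sender, 0)
        if used >= self._quota:
            return None  # quota exhausted
        self._issued[sender] = used + 1
        body = f"{sender}:{secrets.token_hex(8)}"
        return f"{body}:{self._tag(body)}"

    def is_valid(self, stamp: str) -> bool:
        body, _, tag = stamp.rpartition(":")
        return hmac.compare_digest(tag.encode(), self._tag(body).encode())


class Enforcer:
    """Single enforcement point: every stamp may be used exactly once."""

    def __init__(self, allocator: QuotaAllocator):
        self._allocator = allocator
        self._canceled = set()

    def accept(self, stamp: str) -> bool:
        if not self._allocator.is_valid(stamp):
            return False  # forged or corrupted stamp
        if stamp in self._canceled:
            return False  # reuse of an already canceled stamp
        self._canceled.add(stamp)
        return True
```

The enforcer knows nothing about the application; only the allocator's policy (here, a fixed count per sender) would differ between email, IM, or a wiki.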
27 Summary: Other Forms of Failure..... broken links and pointers: persistent names; inauthentic data: self-certifying names; applications blocked by NATs, etc.: delegation; spam and other clutter: quota enforcement. No change to IP or routers!
28 Troubleshooting and Debugging because things inevitably fail.....
29 User’s Perspective Want to know who to yell at -identify responsible entity (at appropriate granularity) Want their complaints to be taken seriously -provide credible and actionable report Want the problem fixed, now -detailed diagnostic tools -this is traditional focus of troubleshooting
30 User’s Perspective Want to know who to yell at -identify responsible entity (at appropriate granularity) Want their complaints to be taken seriously -provide credible and actionable reports Want the problem fixed -detailed debugging tools -this is traditional focus of work in this area
31 Vision Incorporate coherent set of monitoring tools into architecture that: -record necessary information -process information to answer relevant questions Key points: -not just statistics (e.g., Netflow), but answers -focus broader than just detailed diagnostics Three examples
32 Ex. #1: Monitoring ISPs Monitor boxes on peering links record packet digests -no internal information revealed Boxes exchange information to determine where packets are dropped and/or delayed Information ends up at source ISP or end user Overhead: ~2-4% of packet bandwidth Can be applied within enterprises, etc.
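One way to picture the digest exchange: each monitor stores a short fingerprint and timestamp per packet, and comparing the records of consecutive monitors shows in which segment a packet disappeared or how long it took to cross. The truncated SHA-256 digest, the class, and the function names are assumptions for illustration; the slide does not describe the actual digest or exchange protocol.

```python
import hashlib


def digest(packet: bytes) -> str:
    """Short packet fingerprint (truncated SHA-256 is an assumption)."""
    return hashlib.sha256(packet).hexdigest()[:16]


class Monitor:
    """Hypothetical monitor box on one peering link."""

    def __init__(self, name):
        self.name = name
        self.seen = {}  # digest -> time the packet was observed

    def observe(self, packet, timestamp):
        self.seen[digest(packet)] = timestamp


def locate_drops(path, packets):
    """Map each lost packet's digest to (last monitor that saw it,
    first monitor that did not); path lists monitors in forwarding order."""
    drops = {}
    for packet in packets:
        d = digest(packet)
        last_seen = "source"
        for monitor in path:
            if d not in monitor.seen:
                drops[d] = (last_seen, monitor.name)
                break
            last_seen = monitor.name
    return drops


def segment_delay(upstream, downstream, packet):
    """Time between two monitors for one packet, or None if either missed it."""
    d = digest(packet)
    if d in upstream.seen and d in downstream.seen:
        return downstream.seen[d] - upstream.seen[d]
    return None
```

Only digests and timestamps cross ISP boundaries in this sketch, which matches the slide's point that no internal information is revealed.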
33 Ex. #2: Multilayer Tracing Traceroute is useful, but limited to IP XTrace (just started) is a generalized version: -operates at multiple layers -follows recursive packet generation (DNS queries, etc.) -can implement policies about when to respond Requirements: -layer must be able to handle and propagate metadata -module on box to intercept and report on packets
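The two requirements on this slide (layers propagate metadata, a module reports on what it sees) can be illustrated with a toy tracer. This is not the XTrace API: the `Tracer` class, its methods, and the dictionary-based metadata are hypothetical, chosen only to show how a task identifier and parent pointer let reports from different layers be stitched into one tree.

```python
import itertools


class Tracer:
    """Collects one report per traced operation, across layers."""

    def __init__(self):
        self._ids = itertools.count(1)
        self.reports = []

    def start(self, task_id, layer, label):
        """Begin a new task at the top layer."""
        return self._report(task_id, None, layer, label)

    def propagate(self, parent, layer, label):
        """Carry the parent's metadata into an operation it caused,
        such as a DNS query or a lower-layer connection."""
        return self._report(parent["task"], parent["op"], layer, label)

    def _report(self, task_id, parent_op, layer, label):
        meta = {"task": task_id, "op": next(self._ids),
                "parent": parent_op, "layer": layer, "label": label}
        self.reports.append(meta)
        return meta

    def children(self, op_id):
        """Labels of the operations directly caused by op_id."""
        return [r["label"] for r in self.reports if r["parent"] == op_id]
```

For example, starting a task for an HTTP request and propagating it into a DNS lookup and a TCP connect yields reports that share one task ID and point back to the HTTP operation as their parent.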
34 Ex. #3: Distributed Debugging When bugs occur in operation, it can be extremely difficult to locate and reproduce We are developing liblog, a log-and-replay debugging tool (early) that is always turned on Lots of log-and-replay debuggers, ours meets a special set of requirements....(not described here)
35 Logging and Replay 1. Each process logs its execution to a local file 2. Logs are collected at central location and replayed [Diagram: Nodes 1, 2 and 3 each run app over liblog and write Log 1, Log 2 and Log 3; the logs are gathered at a replay node, where a GDB console drives app/liblog]
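The general log-and-replay idea, as a small analogy rather than a description of liblog itself: every nondeterministic call goes through a wrapper that records its result, and during replay the wrapper returns the recorded result instead of calling the real function. The `Recorder` and `Replayer` classes and the example `app` are hypothetical.

```python
import random


class Recorder:
    """Live run: perform each call and log its result."""

    def __init__(self):
        self.log = []

    def call(self, name, fn, *args):
        result = fn(*args)
        self.log.append({"call": name, "result": result})
        return result


class Replayer:
    """Replay run: return logged results instead of calling fn."""

    def __init__(self, log):
        self._entries = iter(log)

    def call(self, name, fn, *args):
        entry = next(self._entries)
        if entry["call"] != name:
            raise RuntimeError(f"replay diverged: expected {entry['call']}, got {name}")
        return entry["result"]


def app(env):
    """Example application whose behavior depends on random inputs."""
    total = 0
    for _ in range(3):
        total += env.call("randint", random.randint, 1, 100)
    return total
```

Running `app(Recorder())` on each node and later `app(Replayer(log))` at a central location reproduces the same execution, which is what allows a debugger to step through a failure after the fact.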
36 Extensions liblog generates too much data -hard to sift through for large systems Next step: setting global watchpoints and breakpoints Can specify in terms of general expressions (python) -routing loops, state inconsistencies, etc. No operational experience yet
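A sketch of what a global watchpoint written as a Python expression might look like, using the routing-loop case named on the slide. The state format (per-node next-hop tables gathered from all logs) and both function names are assumptions; the slide gives no interface.

```python
def find_routing_loop(next_hop, dest):
    """Return the nodes forming a forwarding loop towards dest, or None.

    next_hop maps node -> {destination: neighbor}, i.e. the combined
    forwarding state of all nodes at one point in the replay.
    """
    for start in next_hop:
        path = []
        node = start
        while node != dest and node in next_hop and dest in next_hop[node]:
            if node in path:
                return path[path.index(node):]
            path.append(node)
            node = next_hop[node][dest]
    return None


def triggered(watchpoints, state):
    """Names of the watchpoints whose predicate holds in this global state."""
    return [name for name, predicate in watchpoints.items() if predicate(state)]
```

A watchpoint is then just a named predicate, for example `{"loop to D": lambda s: find_routing_loop(s, "D") is not None}`, evaluated against the global state at each replay step.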
37 Troubleshooting and Debugging Automated end-user reporting tools would be useful to both users and ISPs -lots of low-hanging fruit Not clear ISPs will take the lead on troubleshooting -ISPs may not be eager to admit fault -but they should be eager to reduce phonebank expenses Experience needed with distributed debugger in networking context
38 Summary Biggest challenge is to get community talking to each other rather than past each other Reliability more pressing than functionality -have tools to provide better packet delivery -then considered wider set of failure modes -can handle without IP/router involvement Troubleshooting should be part of “architecture” -nowhere near coherent yet -looking for basic building blocks