1 Rethinking the Internet Architecture
Process, Architecture, and Troubleshooting
Scott Shenker (joint work with many people, including Katerina Argyraki, Hari Balakrishnan, David Cheriton, Petros Maniatis, Ion Stoica, Mike Walfish)

2 Process
Why are we doing this, anyway?

3 Why the Clean Slate Mania?
Internet in crisis?
  - lack of functionality not a crucial problem
  - lack of reliability is most important problem
Research community in crisis?
  - little practical impact on architecture
  - narrowed focus, stopped asking the big questions
NSF’s response: FIND and GENI
  - but not enough by itself....

4 You Can Lead an Academic to Architecture, but....
Normal academic behavior won’t produce architecture
  - Publication requires differentiation and/or indifference
  - Architecture comes from critique and synthesis
    work on ideas other than your own.....
Can’t just design, simulate and abandon
  - must also experiment and deploy.....
  - .....then discuss and synthesize
Process change harder than technical issues
  - adoption is much harder than both!

5 Some Thoughts on Architecture
material covered in several papers (apologies to those who have heard all this before)
not comprehensive architecture, many issues ignored

6 What’s Wrong with the Internet?
Internet is everywhere, used for (almost) everything
Main limiting factor seems to be lack of reliability
  - can’t do telesurgery, air traffic control, etc.
Hard to improve reliability of packet delivery within current architecture
Vulnerable to attacks, misconfigurations and failures

7 Packet Delivery Problems
Access link failures
  - multihome
Routing failures
  - security, policy, configuration, convergence, multipath,...
Congestion control failures
  - FQ, XCP, RCP,....
DoS
  - default-off, capabilities, filters,...

8 Packet Delivery Problems
Technical solutions are largely at hand
  - not perfect, but huge improvement over status quo
No overarching synthetic architecture has emerged
  - symptom of process failure, or just too early?
But packet delivery won’t be the focus of this talk....
  - because only experts see it as the major problem

9 Normal User’s Perspective
Other forms of failure dominate:
  - out-of-date email addresses
  - broken links
  - misleading URLs and/or inauthentic data
  - applications blocked by NATs, etc.
  - email unusable or unreliable due to spam
  - ......

10 Why? Three Important Changes...
1. Host-to-host → accessing data and services
2. End-to-end → middleboxes
3. Appropriate communication → spam

11 Three Important Changes
1. Host-to-host → accessing data and services
2. End-to-end → middleboxes
3. Appropriate communication → spam

12 Not just host-oriented apps....
Of course, packets always flow from host to host
  - modulo middleboxes....
But which host are the packets sent to?
This is controlled by what hostname is used
So adjusting to data-oriented apps involves re-evaluating the Internet naming system
  - data, service specified by host/path pair

13 Problems with host/path names
Data movement causes broken links
  - names should be persistent
Replication unnecessarily difficult
  - Akamai expensive, and can’t replicate at object granularity
  - Google, P2P, etc. do this now....
DNS names lead to legal/political battles
  - increasingly important, witness ICANN debacle
Names don’t facilitate authentication
  - can’t easily verify that data originated with intended source

14 Fix #1: Name Data/Services Directly
Network locations: IP addresses
Hosts: endpoint identifiers (EIDs)
Data/Services: service identifiers (SIDs)
  - direct naming supports fine-grained migration/replication
User-level descriptors:
  - search terms
  - canonical names (AOL keywords)
  - .......

15 Fix #2: Use Names in Appropriate Layer
[Figure: layered stack (Application, App session, Transport, IP) with the name each layer handles]
User-level descriptors (e.g., search)
App session: app-specific search/lookup returns SID
Transport: resolves SID to EID; opens transport conns
IP: resolves EID to IP
Figure annotations: bind to EID (HIP); SIDs
Packet layout: IP hdr | EID | TCP | SID | …
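
The layering on this slide can be sketched as two lookups, one per layer. This is only an illustration: the `Resolver` class, its in-memory tables, and `open_session` are hypothetical names standing in for whatever resolution infrastructure is actually used.

```python
from dataclasses import dataclass, field


@dataclass
class Resolver:
    """Toy stand-in for the resolution infrastructure (hypothetical)."""
    sid_to_eid: dict = field(default_factory=dict)  # service -> host identifier
    eid_to_ip: dict = field(default_factory=dict)   # host identifier -> location

    def resolve_sid(self, sid: str) -> str:
        return self.sid_to_eid[sid]

    def resolve_eid(self, eid: str) -> str:
        return self.eid_to_ip[eid]


def open_session(resolver: Resolver, sid: str) -> dict:
    """Resolve each name in the layer that uses it and return the header fields."""
    eid = resolver.resolve_sid(sid)  # transport layer: SID -> EID
    ip = resolver.resolve_eid(eid)   # IP layer: EID -> IP
    return {"ip": ip, "eid": eid, "sid": sid}


resolver = Resolver(
    sid_to_eid={"sid:photos": "eid:host-a"},
    eid_to_ip={"eid:host-a": "192.0.2.10"},
)
before = open_session(resolver, "sid:photos")

# The host moves: only the EID -> IP binding changes, the SID stays valid.
resolver.eid_to_ip["eid:host-a"] = "198.51.100.7"
after = open_session(resolver, "sid:photos")
```

Because the application only ever holds the SID, neither host migration (new IP) nor service migration (new EID) invalidates the name it uses.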

16 Fix #3: Names Should be Flat!
0xf436f0ab527bac9e8b100afeff394300
A name can be persistent if and only if it doesn’t embed any mutable information about its referent
Flat names embed no information, so they can be used to persistently name anything
  - Enables inter-domain migration, etc.
Once you have a large flat namespace, you never need other global handles
  - no distinction between EIDs, SIDs, etc.

17 Disadvantages of Flat Names
Hard to resolve
No local control
No locality
Not human friendly
All can be handled, but flat names do require new resolution infrastructure

18 Fix #4: Make Names Self-certifying
Name = Hash(pubkey, salt)
Value =
  - can verify name related to pubkey and pubkey signed data
Can receive data from caches or other 3rd parties without worry
  - much more opportunistic data transfer
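
A minimal sketch of the name-to-key binding, assuming SHA-256 as the hash and treating the public key as opaque bytes (the slide fixes neither). `make_name` and `name_matches_key` are illustrative names. Checking that the key actually signed the data needs a public-key library and is left out here.

```python
import hashlib


def make_name(pubkey: bytes, salt: bytes) -> str:
    """Name = Hash(pubkey, salt); SHA-256 is an assumption of this sketch."""
    h = hashlib.sha256()
    # Length-prefix the key so (pubkey, salt) pairs cannot collide by shifting bytes.
    h.update(len(pubkey).to_bytes(4, "big"))
    h.update(pubkey)
    h.update(salt)
    return h.hexdigest()


def name_matches_key(name: str, pubkey: bytes, salt: bytes) -> bool:
    """Anyone holding the name can check that a presented key belongs to it."""
    return name == make_name(pubkey, salt)


pubkey = b"demo-public-key-bytes"  # placeholder, not a real key
name = make_name(pubkey, b"salt-1")
ok = name_matches_key(name, pubkey, b"salt-1")
forged = name_matches_key(name, b"attacker-key", b"salt-1")
```

A receiver that trusts the name needs no trust in the cache or third party that delivered the key and data: a mismatched key simply fails the check.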

19 Proposed Naming System
Flat, self-certifying identifiers for all entities
Used in “layered” fashion so that each protocol binds to the correct level of abstraction
Names are persistent, verifiable, and support easy replication and migration
Requirement: industrial-strength flat name resolver
  - names, key revocation (later, another use)

21 Not just end-to-end....
Middleboxes provide important functionality
  - NATs, firewalls, proxies, caches, app accelerators, etc.
But processing between endpoints violates pure end-to-end religion, and causes many practical problems
  - e.g., NATs interfere with many applications
How can architecture support middleboxes better?
  - eliminate problems and make them architecturally sound

22 Delegation via Resolution
Names usually resolve to “location” of entity
Delegation principle: A network entity should be able to direct resolutions of its name not only to its own location, but also to chosen delegates
Semantics:
  - where am I → where should packets be sent to reach me
This allows packets to be directed towards middleboxes in a clean and coherent manner

23 Architecturally-Sound Middleboxes
[Figure, example: source EID s sends to destination EID d (IP ipd) via a firewall (IP ipf)]
With delegation: mapping for dest EID d is ipf; packet structure: ipf | EID d | TCP hdr
Current (bad) middleboxes: mapping for dest EID d is ipd; packet structure: ipd | TCP hdr
Delegate can be anywhere, not necessarily on path
Can apply to app-layer middleboxes
Including SID, EID in packet is crucial
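
The firewall example can be sketched as follows. All names here (`Resolver`, `build_packet`, `firewall_forward`, the dictionary packet format) are hypothetical; the point is only that the destination's name resolves to its delegate, and the EID carried in the packet lets the delegate forward to the real host.

```python
class Resolver:
    """Toy resolver: an entity may point its name at a delegate (hypothetical API)."""

    def __init__(self):
        self.location = {}  # EID -> the entity's own IP
        self.delegate = {}  # EID -> IP of a delegate the entity chose

    def resolve(self, eid: str) -> str:
        # "Where should packets be sent to reach me", not "where am I".
        return self.delegate.get(eid, self.location[eid])


def build_packet(resolver: Resolver, src_eid: str, dst_eid: str, payload: str) -> dict:
    """The sender addresses the packet to whatever the destination name resolves to."""
    return {"dst_ip": resolver.resolve(dst_eid), "src_eid": src_eid,
            "dst_eid": dst_eid, "payload": payload}


def firewall_forward(resolver: Resolver, packet: dict, blocked: set):
    """Delegate drops unwanted senders, otherwise forwards using the EID in the packet."""
    if packet["src_eid"] in blocked:
        return None
    return {**packet, "dst_ip": resolver.location[packet["dst_eid"]]}


r = Resolver()
r.location["eid:d"] = "ipd"
r.delegate["eid:d"] = "ipf"  # d directs resolutions of its name to the firewall

pkt = build_packet(r, "eid:s", "eid:d", "hello")
delivered = firewall_forward(r, pkt, blocked={"eid:spammer"})
dropped = firewall_forward(r, build_packet(r, "eid:spammer", "eid:d", "x"),
                           blocked={"eid:spammer"})
```

Nothing in the sketch requires the firewall to sit on the path between s and d, which is what lets such a middlebox be offered as a remote service.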

24 Possible Impacts
More general services: more complex services (like Riverbed, transcoding, etc.) can fit within framework
Remote services, not boxes: since middleboxes need not be on-path, services like firewalls, virus-scanners, etc. can be provided as remote services
Rethinking transport: with intermediaries between endpoints, basic notion of the transport layer should be rethought, combining ideas from DTN, DOT, etc.

26 Restraining Usage
Can’t be at packet level, must be app-dependent
But don’t want separate mechanism for each app
  - Email, IM, wiki, etc.
Proposal: quota system
  - quotas allocated in application-dependent manner
  - quotas enforced through single mechanism
    stamp for each usage, canceled through mechanism
    see NSDI 06 paper for details....
Uses flat name resolution
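
A rough sketch of the stamp idea, not the design from the NSDI 06 paper: an allocator hands out a bounded number of stamps per sender, and a single enforcer cancels each stamp on first use so it cannot be reused. The class names are hypothetical, and a shared-key HMAC is used here as a simplification in place of real signatures.

```python
import hashlib
import hmac
import secrets


class QuotaExceeded(Exception):
    pass


def _tag(key: bytes, sender: str, nonce: str) -> str:
    """Authenticator binding a stamp to its sender (shared-key simplification)."""
    return hmac.new(key, f"{sender}:{nonce}".encode(), hashlib.sha256).hexdigest()


class Allocator:
    """Issues at most `quota` stamps per sender; the policy is app-dependent."""

    def __init__(self, key: bytes, quota: int):
        self._key, self._quota, self._issued = key, quota, {}

    def issue(self, sender: str) -> tuple:
        used = self._issued.get(sender, 0)
        if used >= self._quota:
            raise QuotaExceeded(sender)
        self._issued[sender] = used + 1
        nonce = secrets.token_hex(8)
        return (sender, nonce, _tag(self._key, sender, nonce))


class Enforcer:
    """Single app-independent mechanism: accept each valid stamp exactly once."""

    def __init__(self, key: bytes):
        self._key, self._canceled = key, set()

    def cancel(self, stamp: tuple) -> bool:
        sender, nonce, tag = stamp
        if not hmac.compare_digest(tag, _tag(self._key, sender, nonce)):
            return False  # forged stamp
        if (sender, nonce) in self._canceled:
            return False  # stamp already used
        self._canceled.add((sender, nonce))
        return True


key = secrets.token_bytes(32)
allocator, enforcer = Allocator(key, quota=2), Enforcer(key)
stamp = allocator.issue("alice")
first_use = enforcer.cancel(stamp)
second_use = enforcer.cancel(stamp)
```

The in-memory set of canceled stamps stands in for the shared lookup that, per the slide, would sit on top of flat name resolution.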

27 Summary: Other Forms of Failure.....
broken links and pointers: persistent names
inauthentic data: self-certifying names
applications blocked by NATs, etc.: delegation
spam and other clutter: quota enforcement
No change to IP or routers!

28 Troubleshooting and Debugging
because things inevitably fail.....

29 User’s Perspective
Want to know who to yell at
  - identify responsible entity (at appropriate granularity)
Want their complaints to be taken seriously
  - provide credible and actionable report
Want the problem fixed, now
  - detailed diagnostic tools
  - this is traditional focus of troubleshooting

30 User’s Perspective
Want to know who to yell at
  - identify responsible entity (at appropriate granularity)
Want their complaints to be taken seriously
  - provide credible and actionable reports
Want the problem fixed
  - detailed debugging tools
  - this is traditional focus of work in this area

31 Vision
Incorporate coherent set of monitoring tools into architecture that:
  - record necessary information
  - process information to answer relevant questions
Key points:
  - not just statistics (e.g., Netflow), but answers
  - focus broader than just detailed diagnostics
Three examples

32 Ex. #1: Monitoring ISPs
Monitor boxes on peering links record packet digests
  - no internal information revealed
Boxes exchange information to determine where packets are dropped and/or delayed
Information ends up at source ISP or end user
Overhead: ~2-4% of packet bandwidth
Can be applied within enterprises, etc.
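
One way to picture the digest comparison, as a sketch only: each monitor box keeps the set of digests it saw, and walking the boxes in path order shows between which two boxes a packet disappeared. The function names, the truncated SHA-256 digest, and the data layout are assumptions of this example, not the deployed scheme.

```python
import hashlib


def digest(packet: bytes) -> str:
    """Short fingerprint of a packet; boxes share these, not packet contents."""
    return hashlib.sha256(packet).hexdigest()[:16]


def locate_drops(path: list, observed: dict) -> dict:
    """Map each lost digest to (last box that saw it, first box that did not).

    path:     monitor boxes in the order packets traverse them
    observed: box name -> set of digests recorded at that box
    """
    drops = {}
    for d in observed[path[0]]:
        for prev_box, next_box in zip(path, path[1:]):
            if d not in observed[next_box]:
                drops[d] = (prev_box, next_box)
                break
    return drops


p1, p2 = digest(b"packet-1"), digest(b"packet-2")
path = ["isp-a-egress", "isp-b-ingress", "isp-b-egress"]
observed = {
    "isp-a-egress": {p1, p2},
    "isp-b-ingress": {p1, p2},
    "isp-b-egress": {p1},  # p2 never left ISP B
}
report = locate_drops(path, observed)
```

Since the boxes sit only on peering links, the report names the ISP in which the loss happened without exposing that ISP's internal topology.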

33 Ex. #2: Multilayer Tracing
Traceroute is useful, but limited to IP
XTrace (just started) is a generalized version:
  - operates at multiple layers
  - follows recursive packet generation (DNS queries, etc.)
  - can implement policies about when to respond
Requirements:
  - layer must be able to handle and propagate metadata
  - module on box to intercept and report on packets

34 Ex. #3: Distributed Debugging
When bugs occur in operation, they can be extremely difficult to locate and reproduce
We are developing liblog, a log-and-replay debugging tool (early) that is always turned on
Lots of log-and-replay debuggers, ours meets a special set of requirements....(not described here)

35 Logging and Replay
1. Each process logs its execution to a local file
2. Logs are collected at central location and replayed
[Figure: Nodes 1-3 each run app on liblog and write Logs 1-3; a replay node gathers the logs and reruns app/liblog under GDB, driven from a console]
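
The log-and-replay idea in miniature, as a toy and not liblog's actual mechanism: during normal operation every nondeterministic call is recorded, and during replay the recorded results are fed back so the run repeats exactly. `Recorder`, `Replayer`, and `app` are hypothetical names for this sketch.

```python
import json
import random


class Recorder:
    """Runs nondeterministic calls for real and logs each result."""

    def __init__(self):
        self.log = []

    def call(self, name, fn, *args):
        result = fn(*args)
        self.log.append({"call": name, "result": result})
        return result


class Replayer:
    """Returns logged results in order instead of executing the calls."""

    def __init__(self, log):
        self._entries = iter(log)

    def call(self, name, fn, *args):
        entry = next(self._entries)
        if entry["call"] != name:
            raise RuntimeError(f"replay diverged: expected {entry['call']}, got {name}")
        return entry["result"]


def app(env) -> int:
    """Application logic; all nondeterminism goes through env.call."""
    a = env.call("randint", random.randint, 1, 6)
    b = env.call("randint", random.randint, 1, 6)
    return a + b


recorder = Recorder()
original = app(recorder)

# Ship the log elsewhere (here: a JSON round trip) and replay it.
shipped = json.loads(json.dumps(recorder.log))
replayed = app(Replayer(shipped))
```

The replayed run can then be stepped through in a debugger, since it no longer depends on the original node, timing, or random state.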

36 Extensions
liblog generates too much data
  - hard to sift through for large systems
Next step: setting global watchpoints and breakpoints
Can specify in terms of general expressions (python)
  - routing loops, state inconsistencies, etc.
No operational experience yet
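
A global watchpoint expressed as an ordinary Python predicate might look like the sketch below, which flags a routing loop in next-hop state gathered from all nodes. The state layout and the names `has_routing_loop` and `check_watchpoints` are assumptions for illustration; the slide does not describe the real interface.

```python
def has_routing_loop(next_hop: dict, dest: str) -> bool:
    """next_hop: node -> {destination -> next node}. True if any path to dest cycles."""
    for start in next_hop:
        seen, node = set(), start
        while node is not None and node != dest:
            if node in seen:
                return True
            seen.add(node)
            node = next_hop.get(node, {}).get(dest)
    return False


def check_watchpoints(state: dict, watchpoints: dict) -> list:
    """Evaluate every named predicate over the combined state; return those that fire."""
    return [name for name, predicate in watchpoints.items() if predicate(state)]


watchpoints = {
    "loop-to-D": lambda state: has_routing_loop(state["next_hop"], "D"),
}

healthy = {"next_hop": {"A": {"D": "B"}, "B": {"D": "C"}, "C": {"D": "D"}}}
looping = {"next_hop": {"A": {"D": "B"}, "B": {"D": "C"}, "C": {"D": "A"}}}

fired_healthy = check_watchpoints(healthy, watchpoints)
fired_looping = check_watchpoints(looping, watchpoints)
```

During replay, such predicates could be evaluated after each step so the debugger stops at the first point where the global condition holds.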

37 Troubleshooting and Debugging
Automated end-user reporting tools would be useful to both users and ISPs
  - lots of low-hanging fruit
Not clear ISPs will take the lead on troubleshooting
  - ISPs may not be eager to admit fault
  - but they should be eager to reduce phonebank expenses
Experience needed with distributed debugger in networking context

38 Summary
Biggest challenge is to get community talking to each other rather than past each other
Reliability more pressing than functionality
  - have tools to provide better packet delivery
  - then considered wider set of failure modes
  - can handle without IP/router involvement
Troubleshooting should be part of “architecture”
  - nowhere near coherent yet
  - looking for basic building blocks
