Internet Quality-of-Service (QoS) Henning Schulzrinne Columbia University Fall 2003.

1 Internet Quality-of-Service (QoS) Henning Schulzrinne Columbia University Fall 2003

2 Quality of Service
– Motivation
– Service availability
– Elementary queueing theory
– Traffic characterization & control
– Integrated services (RSVP, NSIS)
– Differentiated services (DiffServ)

3 What is quality of service?
Many applications are sensitive to the effects of delay (+ jitter) and packet loss
– may have a floor below which utility drops to zero
The existing Internet architecture provides a best-effort service
– all traffic is treated equally (generally, FIFO queuing)
– no mechanism for distinguishing between delay-sensitive and best-effort traffic
Original IP architecture (IPv4) has a TOS (type-of-service) byte in the packet header
– RFC 795: defined multiple axes (delay, throughput, reliability)
– rarely used, except (reportedly) in some military networks
[Figure: utility ($) vs. bandwidth]
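For reference, RFC 791 lays out the TOS byte as a 3-bit precedence field followed by delay (D), throughput (T) and reliability (R) flags, with the last two bits reserved. The sketch below decodes those axes; the names `decode_tos` and `TosFields` are hypothetical, and later RFCs (DiffServ, ECN) reassigned these bits.

```python
from typing import NamedTuple


class TosFields(NamedTuple):
    precedence: int          # bits 0-2 (most significant): 0 = routine ... 7 = network control
    low_delay: bool          # bit 3 (D)
    high_throughput: bool    # bit 4 (T)
    high_reliability: bool   # bit 5 (R)


def decode_tos(tos: int) -> TosFields:
    """Split an IPv4 TOS byte using the original RFC 791 layout.

    RFC 791 numbers bits from the most significant end, so bit 0 is 0x80.
    Bits 6-7 were reserved in RFC 791 and are ignored here.
    """
    if not 0 <= tos <= 0xFF:
        raise ValueError("TOS must fit in one byte")
    return TosFields(
        precedence=tos >> 5,
        low_delay=bool(tos & 0x10),
        high_throughput=bool(tos & 0x08),
        high_reliability=bool(tos & 0x04),
    )


# Precedence 5 (CRITIC/ECP) with the low-delay flag set.
print(decode_tos(0b101_10000))
```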

4 Motivation
QoS → service availability
– not good enough if all but 2 minutes of my phone call sound perfect
Support mission-critical applications that can't tolerate disruption
– VoIP
– VPNs (LAN emulation)
– high-availability computing
Charge more for business applications vs. consumer applications

5 Service availability
Users do not care about QoS, at least not about packet loss, jitter, delay
– rather, it's service availability
– how likely is it that I can place a call and not get interrupted?
availability = MTBF / (MTBF + MTTR)
– MTBF = mean time between failures
– MTTR = mean time to repair
availability = successful calls / first call attempts
– equipment availability: 99.999% (5 nines) ≈ 5 minutes/year of downtime
– AT&T (2003):
Long-distance voice | 99.978%
ATM data | 99.999%
Frame relay data | 99.998%
IP | 99.991%
– Sprint IP frame relay SLA: 99.5%
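The availability formula and the "5 nines ≈ 5 minutes/year" rule of thumb can be checked with a few lines of Python. This is a minimal sketch assuming a 365-day year; the helper names are hypothetical.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600; ignores leap years


def availability(mtbf_hours: float, mttr_hours: float) -> float:
    """Steady-state availability = MTBF / (MTBF + MTTR)."""
    return mtbf_hours / (mtbf_hours + mttr_hours)


def downtime_minutes_per_year(avail: float) -> float:
    """Expected downtime per year implied by an availability figure."""
    return (1.0 - avail) * MINUTES_PER_YEAR


# Five nines leaves roughly 5.3 minutes of downtime per year.
print(f"{downtime_minutes_per_year(0.99999):.1f} min/year")
# A 99.5% SLA allows far more downtime, expressed here in hours.
print(f"{downtime_minutes_per_year(0.995) / 60:.1f} h/year")
```

Working in downtime per year makes the gap between 99.5% (about 44 hours) and 99.999% much easier to see than the percentages alone.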

6 Availability – PSTN metrics
PSTN metrics (World Bank study):
– fault rate should be less than 0.2 per main line
– fault clearance (~ MTTR): next business day
– call completion rate during the network busy hour varies from about 60% to 75%
– dial tone delay

7 Example PSTN statistics (source: World Bank)

8 Measurement setup
Node name | Location | Connectivity | Network
columbia | Columbia University, NY | >= OC3 | I2
wustl | Washington U., St. Louis | | I2
unm | Univ. of New Mexico | | I2
epfl | EPFL, Lausanne, CH | | I2+
hut | Helsinki University of Technology | | I2+
rr | NYC | cable modem | ISP
rrqueens | Queens, NY | cable modem | ISP
njcable | New Jersey | cable modem | ISP
newport | New Jersey | ADSL | ISP
sanjose | San Jose, California | cable modem | ISP
suna | Kitakyushu, Japan | 3 Mb/s | ISP
sh | Shanghai, China | cable modem | ISP
Shanghaihome | Shanghai, China | cable modem | ISP
Shanghaioffice | Shanghai, China | ADSL | ISP

9 Measurement setup
Active measurements
– call duration 3 or 7 minutes
– UDP packets: 36 bytes alternating with 72 bytes (FEC), 40 ms spacing
September 10 to December 6; …,500 call hours
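The probe parameters translate directly into per-call packet counts and a payload rate. A minimal sketch of that arithmetic; the function names are hypothetical, and the rate counts UDP payload only (IP/UDP header overhead is excluded).

```python
PACKET_SPACING_S = 0.040       # 40 ms between probe packets
PAYLOAD_SIZES = (36, 72)       # UDP payloads alternate between these sizes (bytes)


def packets_per_call(duration_min: float) -> int:
    """Number of probe packets sent during one call of the given length."""
    return round(duration_min * 60 / PACKET_SPACING_S)


def mean_payload_bitrate() -> float:
    """Average UDP payload rate in bit/s (headers not included)."""
    mean_bytes = sum(PAYLOAD_SIZES) / len(PAYLOAD_SIZES)
    return mean_bytes * 8 / PACKET_SPACING_S


for minutes in (3, 7):
    print(minutes, "min call:", packets_per_call(minutes), "packets")
print(f"payload rate: {mean_payload_bitrate() / 1000:.1f} kb/s")
```

At 25 packets/s, each lost packet corresponds to 40 ms of the call, which is the scale used when outages are later counted in packets.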

10 Call success probability
62,027 calls succeeded, 292 failed: 99.53% availability
Roughly constant across I2, I2+, and commercial ISPs
Category | Availability
All | 99.53%
Internet2 | 99.52%
Internet… | …%
Commercial | 99.51%
Domestic (US) | 99.45%
International | 99.58%
Domestic commercial | 99.39%
International commercial | 99.59%
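The headline figure is successful calls divided by all call attempts. A short sketch reproducing it from the two counts on the slide; the function name is hypothetical.

```python
def call_success_rate(succeeded: int, failed: int) -> float:
    """Fraction of call attempts that succeeded."""
    total = succeeded + failed
    if total == 0:
        raise ValueError("no call attempts")
    return succeeded / total


rate = call_success_rate(62_027, 292)
print(f"{rate:.2%}")  # 99.53%
```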

11 Overall network loss
PSTN: once connected, a call is usually of good quality
– exception: mobile phones
Compute periods of time below a loss threshold
– 5% causes degradation for many codecs
– others are acceptable up to 20% loss
[Table: fraction of time below loss thresholds of 0%, 5%, 10%, 20%, for All, ISP, I2, I2+, US, Int., US ISP, Int. ISP]
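"Periods of time below a loss threshold" can be computed by splitting a trace into fixed intervals and counting the intervals whose loss rate stays at or below the threshold. A minimal sketch, assuming per-interval loss rates are already available; the interval length, the function name and the sample data are all hypothetical.

```python
from typing import Sequence


def fraction_below(loss_rates: Sequence[float], threshold: float) -> float:
    """Fraction of measurement intervals whose loss rate does not exceed threshold.

    loss_rates holds one loss fraction (0.0-1.0) per fixed-length interval;
    the interval length (e.g. one second) is an assumption, not from the slides.
    """
    if not loss_rates:
        raise ValueError("no intervals")
    ok = sum(1 for r in loss_rates if r <= threshold)
    return ok / len(loss_rates)


# Hypothetical per-interval loss rates for one call.
intervals = [0.0, 0.0, 0.02, 0.07, 0.0, 0.25, 0.01, 0.0]
for t in (0.0, 0.05, 0.10, 0.20):
    print(f"<= {t:.0%}: {fraction_below(intervals, t):.3f}")
```

Using "at or below" makes the 0% column mean "intervals with no loss at all".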

12 Network outages
Sustained packet losses
– arbitrarily defined as 8 or more consecutive lost packets
– far beyond any recoverable loss (FEC, interpolation)
Outages make up a significant part (23%) of the 0.25% unavailability
Symmetric: A → B and B → A
Spatially correlated: A → B and A → X
Not correlated across networks (e.g., I2 and commercial)
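With 40 ms probe spacing, an 8-packet outage is 320 ms of silence. The sketch below finds such runs in a received/lost trace; the function name and the sample trace are hypothetical, and deriving the boolean trace from sequence numbers is assumed to happen elsewhere.

```python
from typing import List, Optional, Sequence, Tuple

OUTAGE_MIN_PACKETS = 8     # outage threshold from the slide
PACKET_SPACING_MS = 40     # probe spacing from the measurement setup


def find_outages(received: Sequence[bool],
                 min_len: int = OUTAGE_MIN_PACKETS) -> List[Tuple[int, int]]:
    """Return (first_lost_index, run_length) for every run of >= min_len lost packets.

    received[i] is True if probe packet i arrived.
    """
    outages = []
    run_start: Optional[int] = None
    for i, ok in enumerate(list(received) + [True]):  # sentinel closes a trailing run
        if not ok and run_start is None:
            run_start = i
        elif ok and run_start is not None:
            length = i - run_start
            if length >= min_len:
                outages.append((run_start, length))
            run_start = None
    return outages


# Hypothetical trace: a 3-packet burst (recoverable) and a 10-packet outage.
trace = [True] * 5 + [False] * 3 + [True] * 4 + [False] * 10 + [True] * 2
for start, length in find_outages(trace):
    print(f"outage at packet {start}: {length} packets = {length * PACKET_SPACING_MS} ms")
```

Running the same detector on both directions of a call (A → B and B → A) is one way to test whether outages are symmetric.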

13 Network outages

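14
Network | No. of outages | % symmetric | Duration (mean) | Duration (median) | Total (all, h:m) | Total, outages > 1000 packets (h:m)
all | 10,753 | 30% | … | … | …:20 | 10:58
I2 | … | …% | 360 | 25 | 3:17 | 2:33
I2+ | 2,708 | 10% | 259 | 26 | 7:47 | 5:37
ISP | 8,045 | 37% | 107 | 24 | 9:33 | 4:58
US | 1,777 | 18% | 269 | 20 | 5:18 | 3:53
Int. | 8,976 | 33% | … | … | …:02 | 6:42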
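If the mean durations are read as counts of 40 ms probe packets (an interpretation, not stated on the slide), then count × mean × 40 ms should reproduce the "total (all, h:m)" column. A minimal consistency check under that assumption; because the published means are rounded, the result is approximate, and the function name is hypothetical.

```python
PACKET_SPACING_MS = 40


def total_outage_time(count: int, mean_packets: int) -> str:
    """Approximate total outage time (h:mm) from outage count and mean length in packets."""
    total_ms = count * mean_packets * PACKET_SPACING_MS
    hours, minutes = divmod(total_ms // 60_000, 60)
    return f"{hours}:{minutes:02d}"


print(total_outage_time(2_708, 259))  # 7:47, I2+ row
print(total_outage_time(8_045, 107))  # 9:33, ISP row
print(total_outage_time(1_777, 269))  # 5:18, US row
```

Integer milliseconds avoid floating-point rounding at the minute boundary.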

15 Outage-induced call abortion probability
Long interruption → user likely to abandon call
From E.855 survey: $P[\text{holding}] = e^{-t/17.26}$ (t in seconds)
– half the users will abandon the call after 12 s
2,566 calls have at least one outage
– 946 of 2,566 expected to be dropped
– 1.53% of all calls
Network | Expected dropped calls
all | 1.53%
I2 | 1.16%
I2+ | 1.15%
ISP | 1.82%
US | 0.99%
Int. | 1.78%
US ISP | 0.86%
Int. ISP | 2.30%
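The exponential holding model gives both the 12-second median (17.26 × ln 2) and a way to turn measured outage durations into expected abandoned calls. A sketch, assuming each affected call contributes one outage duration; the slides do not say how multiple outages per call were combined, and the helper names and sample durations are hypothetical.

```python
import math
from typing import Iterable

HOLDING_TIME_CONSTANT_S = 17.26  # from the E.855 survey fit on the slide


def p_holding(t_s: float) -> float:
    """Probability a user is still holding after an interruption of t_s seconds."""
    return math.exp(-t_s / HOLDING_TIME_CONSTANT_S)


def p_abandon(t_s: float) -> float:
    """Probability the user hangs up during an interruption of t_s seconds."""
    return 1.0 - p_holding(t_s)


def expected_drops(outage_durations_s: Iterable[float]) -> float:
    """Expected number of abandoned calls, one outage duration per affected call."""
    return sum(p_abandon(t) for t in outage_durations_s)


median_s = HOLDING_TIME_CONSTANT_S * math.log(2)
print(f"half of users gone after {median_s:.1f} s")  # 12.0 s
# Hypothetical calls with outages of 2 s, 12 s and 60 s:
print(f"{expected_drops([2.0, 12.0, 60.0]):.2f} expected drops")
```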

16 Conclusions from measurement
Availability in space is (mostly) solved
Availability in time restricts usability for new applications
Initial investigation into service availability for VoIP
– need to define metrics for, say, web access
– unify packet loss and no Internet dial tone
– far less than 5 nines
Working on identifying fault sources and locations
Looking for additional measurement sites

17 What's next?
Existing SLAs are mostly useless
– too many exceptions
– wrong time scales: month vs. minutes
– no guarantees for interconnects
Existing measurements similarly dubious
Limited ability to learn from mistakes
– what are the primary causes of service unavailability?
– what can I do to protect myself?
– multi-homing via the same fiber? diverse access mechanisms?
Consumers of services have no good way to compare service availability
– only some very large customers may get access to carrier-internal data
Thus, market failure
Need published metrics
– similar to switch availability reporting

18 What's hard to scale (and not)
Signaling does not have to be hard:
– one message, on a reliable peering channel or with the IP router alert option
– NSIS effort in the IETF?
YESSIR: RTCP-based signaling
– 700 MHz Celeron processor
– 10,000 flow setups/second
– 300,000 soft-state flows
If scaling matters, sink-tree-based reservation (BGRP)
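Soft state means a router keeps per-flow state only while it keeps being refreshed. A minimal sketch of such a table plus the refresh-load arithmetic, assuming a 30 s refresh period (RSVP's default per RFC 2205) and a hypothetical expiry after 3 missed refreshes; under that assumption, 300,000 soft-state flows generate 10,000 refresh messages per second. Class and function names are hypothetical.

```python
from typing import Dict

REFRESH_INTERVAL_S = 30.0   # assumption: RSVP's default refresh period R (RFC 2205)
LIFETIME_MULTIPLIER = 3     # hypothetical: state expires after 3 missed refreshes


class SoftStateTable:
    """Minimal soft-state store: entries vanish unless refreshed in time."""

    def __init__(self) -> None:
        self._expires: Dict[str, float] = {}

    def refresh(self, flow_id: str, now: float) -> None:
        # A setup and a refresh are the same operation: (re)arm the expiry timer.
        self._expires[flow_id] = now + LIFETIME_MULTIPLIER * REFRESH_INTERVAL_S

    def expire(self, now: float) -> int:
        """Drop stale entries; return how many were removed."""
        stale = [f for f, t in self._expires.items() if t <= now]
        for f in stale:
            del self._expires[f]
        return len(stale)

    def __len__(self) -> int:
        return len(self._expires)


def refresh_rate(num_flows: int, interval_s: float = REFRESH_INTERVAL_S) -> float:
    """Refresh messages per second needed to keep num_flows alive."""
    return num_flows / interval_s


print(refresh_rate(300_000))  # 10000.0 refreshes/s
```

No explicit teardown is needed for correctness: a flow whose endpoint disappears simply times out.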

19 Diversity is good
Unlike routing, there is no need for a single signaling protocol:
– multicast is much harder
– dumb end devices
– edge "pop-up": only shows up in edge nodes

20 AAA
Signaling can easily be done in an ASIC (no harder than IP), but
– need cryptographic verification of the request
– need an interface to Authentication, Authorization, Accounting (AAA)
– cross-domain authentication is hard, but 3G networks will do it anyway
– easier if both sides ask their own access router
– see also: iPass for dial-up, OSP (Open Settlement Protocol)

21 AAA example
[Figure: source – AR1 – Internet – AR2 – destination; labels: "signs request", "reserves for both directions"]
Cell phone model: both sides pay
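The slides do not name a signature scheme for the signed request. As one illustration only, the sketch below uses HMAC-SHA256 with a key shared between the source host and its access router (AR1); cross-domain setups would more likely need public-key signatures or an AAA back end. The request fields and the key are hypothetical.

```python
import hashlib
import hmac
import json


def sign_request(request: dict, key: bytes) -> str:
    """Return a hex HMAC-SHA256 tag over a canonical JSON encoding of the request."""
    payload = json.dumps(request, sort_keys=True, separators=(",", ":")).encode()
    return hmac.new(key, payload, hashlib.sha256).hexdigest()


def verify_request(request: dict, tag: str, key: bytes) -> bool:
    """Constant-time check that tag matches the request under key."""
    return hmac.compare_digest(sign_request(request, key), tag)


# Hypothetical key shared between the source host and its access router.
key = b"example-shared-secret"
req = {"flow": "10.0.0.1:5004->192.0.2.7:5004", "rate_kbps": 64, "bidirectional": True}
tag = sign_request(req, key)
print(verify_request(req, tag, key))                        # True
print(verify_request({**req, "rate_kbps": 128}, tag, key))  # False: tampered request
```

Sorting keys before hashing keeps the tag stable regardless of how the request dictionary was built.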

22 Reservation scaling
Example: every long-distance call in the US uses VoIP with per-flow resource reservation
– 2000: … billion (10 minutes each)
– 1,800 calls/second
– a single MySQL server can sustain 500 to 2,000 queries+updates/second
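Combining the call rate with the per-server throughput gives a rough server count. A sketch, assuming each call costs two database operations (one write at setup, one at teardown); that figure and the function name are hypothetical.

```python
import math


def servers_needed(calls_per_s: float, db_ops_per_call: int, ops_per_server: float) -> int:
    """Database servers needed to keep up with reservation setups and teardowns."""
    return math.ceil(calls_per_s * db_ops_per_call / ops_per_server)


CALLS_PER_S = 1_800
OPS_PER_CALL = 2  # assumption: one write at setup, one at teardown

for per_server in (500, 2_000):
    n = servers_needed(CALLS_PER_S, OPS_PER_CALL, per_server)
    print(per_server, "ops/s per server ->", n, "servers")
```

Under these assumptions, nationwide per-flow reservation state fits on a handful of commodity database servers.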

23 Business models don't work
Most of the time, "tin" service is no worse than "platinum" service
– can't impress others with a platinum AmEx card
– no frequent flyer bonuses
Everybody switches only when the network is in bad shape

24 Resource control & reservation
[Figure: application, reservation protocol, admission control (Tspec in, Y/N out), routing protocols & DBs, traffic control DB, classifier & route selection, packet scheduler, QoS queuing, best-effort queuing, data path; source: USC EE-S 555]
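The admission-control box takes a flow's Tspec and answers Y/N. A minimal parameter-based sketch, assuming only the Tspec token rate is checked against link capacity; real IntServ admission control also considers bucket depth, peak rate and the requested service. The class and method names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class Link:
    capacity_bps: float
    reserved: Dict[str, float] = field(default_factory=dict)

    def admit(self, flow_id: str, rate_bps: float) -> bool:
        """Say Y/N to a new flow: admit only if total reserved rate stays within capacity."""
        if rate_bps <= 0:
            raise ValueError("rate must be positive")
        if sum(self.reserved.values()) + rate_bps > self.capacity_bps:
            return False
        self.reserved[flow_id] = rate_bps
        return True

    def release(self, flow_id: str) -> None:
        """Free the capacity held by a flow (no-op if unknown)."""
        self.reserved.pop(flow_id, None)


link = Link(capacity_bps=1_000_000)
print(link.admit("call-1", 600_000))  # True
print(link.admit("call-2", 600_000))  # False: would exceed 1 Mb/s
link.release("call-1")
print(link.admit("call-2", 600_000))  # True
```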

25 RED (Random Early Detection)
TCP synchronization effect: during overload, many connections lose packets at the same time and go into slow-start together
RED: start dropping based on average queue occupancy (vs. instantaneous queue occupancy)
Parameter setting is critical and non-trivial
See also RFC 2309
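The core of RED is an exponentially weighted moving average of the queue length and a drop probability that rises linearly between two thresholds. A simplified sketch: it omits the count-based spacing of drops in the original algorithm, the parameter values are illustrative, and 0.002 is only a commonly cited averaging weight.

```python
import random
from dataclasses import dataclass


@dataclass
class Red:
    min_th: float           # average queue length (packets) where dropping starts
    max_th: float           # average queue length (packets) where everything is dropped
    max_p: float            # drop probability reached just below max_th
    weight: float = 0.002   # EWMA weight for the average queue length
    avg: float = 0.0

    def update_avg(self, queue_len: int) -> float:
        """Exponentially weighted moving average of the instantaneous queue length."""
        self.avg = (1 - self.weight) * self.avg + self.weight * queue_len
        return self.avg

    def drop_probability(self) -> float:
        """Linear ramp from 0 at min_th to max_p at max_th; 1 beyond max_th."""
        if self.avg < self.min_th:
            return 0.0
        if self.avg >= self.max_th:
            return 1.0
        return self.max_p * (self.avg - self.min_th) / (self.max_th - self.min_th)

    def should_drop(self, queue_len: int, rng: random.Random) -> bool:
        """Update the average on packet arrival, then decide randomly."""
        self.update_avg(queue_len)
        return rng.random() < self.drop_probability()


red = Red(min_th=5, max_th=15, max_p=0.1)
red.update_avg(100)  # a sudden burst barely moves the average (about 0.2)
print(red.avg, red.drop_probability())
```

Because the average reacts slowly, short bursts pass untouched while persistent queues trigger random, desynchronized drops.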

26 ECN (Explicit Congestion Notification)
Extension of RED: mark instead of drop
RFC 2481 (A Proposal to add Explicit Congestion Notification (ECN) to IP)
– IP TOS bit 6 (ECT) indicates support for the mechanism
– IP TOS bit 7 (CE) indicates congestion
Needs cooperation of TCP (or similar protocols)
TCP should act almost as if the packet was dropped
– halve the congestion window
– but don't do slow-start
[Figure: packets marked ECT=1, ECN=0 and ECT=1, ECN=1; TCP ACK carries ECN echo]
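RFC 2481 assigns bit 6 of the TOS octet to ECT (ECN-Capable Transport) and bit 7 to CE (Congestion Experienced), counting bits from the most significant end; RFC 3168 later replaced this with a two-bit codepoint. A sketch of the router's mark-or-drop choice and the sender's reaction, with hypothetical function names and the congestion window in segments.

```python
from typing import Tuple

ECT_MASK = 0x02  # RFC 2481: TOS bit 6, ECN-Capable Transport (set by the sender)
CE_MASK = 0x01   # RFC 2481: TOS bit 7, Congestion Experienced (set by a router)


def router_congested(tos: int) -> Tuple[int, bool]:
    """What a RED/ECN router does to a packet it has selected for early drop.

    Returns (new_tos, dropped): ECN-capable packets are marked, others dropped.
    """
    if tos & ECT_MASK:
        return tos | CE_MASK, False
    return tos, True


def sender_on_ecn_echo(cwnd_segments: float) -> float:
    """Sender reaction to an ECN echo: halve cwnd as for a loss, but skip slow start."""
    return max(cwnd_segments / 2, 1.0)


tos, dropped = router_congested(ECT_MASK)
print(hex(tos), dropped)          # 0x3 False: marked instead of dropped
print(router_congested(0x00))     # (0, True): not ECN-capable, so dropped
print(sender_on_ecn_echo(20.0))   # 10.0
```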

27 Next steps in signaling (NSIS)
RSVP not widely used for resource reservation
– but is used for MPLS path setup
– design heavily biased by multicast needs
– marginal and after-the-fact security
– limited support for IP mobility
Thus, the IETF NSIS working group is developing a new framework for a general state management protocol:
– resource reservation
– NAT and firewall control
– traffic and QoS measurement
– MPLS and lambda path setup
Split into two components:
– NSLP: services
– NTLP: transport

28 NSIS
On-path vs. off-path
– off-path: bandwidth brokers
Discovery of next NTLP or NSLP hop
– use router alert option
[Figure: NSLPs (QoS, NAT/FW, measurement) over NTLP, over UDP, TCP or SCTP]

