Dr. Jonathan Rosenberg Chief Technology Strategist Skype

1 Understanding VoIP
Dr. Jonathan Rosenberg, Chief Technology Strategist, Skype

2 What is this course about?
Getting “under the hood” and understanding how VoIP works An exploration of the protocols and technologies behind VoIP Conveying an understanding of the various problems that need to be solved for VoIP to work

3 What this course is not about
A general introduction to telephony A detailed cookbook or deployment guide to VoIP A product survey of VoIP and IP telephony products In particular, Cisco or Skype products are not discussed except in passing

4 Ground Rules Ask Questions ANY TIME!
I will be bored if this is a one-way conversation. No question is too stupid. Laughing at or mocking anyone's questions is unacceptable. Please ask off-the-wall or exploratory questions – there is a lot that is not in here!

5 Agenda Breaking up the problem Voice and Video coding
Voice and Video Transport Quality of Service Signaling Security NAT Traversal

6 Non-Agenda Programming APIs Emergency Services, Lawful Intercept
Numbering, Routing, Naming (ENUM, TRIP) PSTN Interworking Billing, Provisioning, OAM Conferencing, IVR, Applications

7 Breaking Up the Problem
Application Server IP Directories Databases LDAP, ENUM SIP Accounting Billing Signaling Servers Presence Servers Media Servers RADIUS DIAMETER OAM IP Network SIP, H.323, MGCP,H.248 SIMPLE, XMPP Endpoint Endpoint RTP

8 Voice Coding

9 Voice Endpoint Model No Speech + Nonlinear Processing Speech Encoding
Packetizer Speech - DTMF/ Tone Detection Silence Detection Hybrid Echo Canceller Loss Admin Speech Decoding Unpacker DTMF/ Tone Generation Comfort Noise Generation 2-wire interface

10 Codecs
Waveform codecs: Directly encode speech in an efficient way by exploiting temporal and/or spectral characteristics. Attempt to reproduce the input signal’s waveform by minimizing the error between input and coded signals.
Source codecs / vocoders: Estimate and efficiently encode a parametric representation of speech.

11 CELP
Minimizes perceptually weighted error, similar to waveform coders. Short-term predictor is the LP (vocal tract) filter. Excitation is obtained from a codebook and a long-term pitch predictor. Closed-loop search is MIPS intensive.

12 Codec Comparison
Codec | Sampling | Bitrate | Latency | Comments
G.711 | 8 kHz | 64 kbps | 125 µs | PSTN codec
G.729 | 8 kHz | 8 kbps | 10 ms | CS-ACELP
G.723.1 | 8 kHz | 5.3/6.3 kbps | 37.5 ms |
AMR | 8 kHz | 4.75 – 12 kbps | 25 ms | GSM codec
G.722.1 | 16 kHz | 24/32 kbps | 40 ms | Polycom SIREN
AMR-WB | 16 kHz | kbps | | GSM Wideband – encumbered
SILK | 8, 12, 16, 24 kHz (SWB) | 6-40 kbps | | Skype codec
Listen at:

13 Echo Cancellation ERL: Echo Return Loss (dB)
ERLE: Echo Return Loss Enhancement Double-talk Convergence time Analog ERLE + Non-Linear Processor Reflection - Echo Path Estimation ERL Packet Network 2-4-wire Hybrid Echo Canceller Digital This echo canceller cancels ‘local’ echoes from the hybrid reflection

14 Echo Canceller Specifics
The voice echo path is like an electrical circuit If a ‘break’ (cancellation) is made anywhere in the ‘circuit’, you will eliminate the echo The easiest place to make the break is with a canceller ‘looking into’ the local analog/digital telephony network, NOT the packet network (which has much longer and variable delays) The echo canceller at the other end of the call eliminates the echoes that YOU hear, and vice versa Echo canceller coverage (e.g. 32 ms) is the maximum length of echo impulse response that can be cancelled from the local analog/digital network (the packet network delay does not matter) The non-linear processor is used to ‘clean-up’ any residual echo left over from the canceller

15 Voice Activity Detection
Speech Magnitude (dB) Speech Detected Hang-Over Speech Detected Hang-Over Typically fixed at 200 ms Sentence 1 Sentence 2 Signal-to- Noise Threshold Noise Floor time Front-end Speech Clipping Front-end Speech Clipping

16 Comfort Noise Generation
Silence isn’t golden…it’s annoying. When speech stops…what do you play to the listener? Simple techniques: play white/pink noise, or replay the last received packet over and over. Fancier technique: the transmitter measures the local “noise environment” and sends a special “comfort noise” (CN) packet as the last packet before silence; the receiver generates noise based on the CN packet.

17 Voice Quality: Mean Opinion Scores
Test setup (figure): Source, Codec ‘X’, Channel Simulation (Impairment); listeners rate the result from 1 to 5. Sample sentence: “Nowadays, a chicken leg is a rare dish”
Rating | Speech Quality | Distortion
5 | Excellent | Imperceptible
4 | Good | Just perceptible but not annoying
3 | Fair | Perceptible and slightly annoying
2 | Poor | Annoying but not objectionable
1 | Unsatisfactory | Very annoying and objectionable
MOS of 4.0 = Toll Quality

18 Clear Channel MOS’s
Bar chart of Mean Opinion Score (scale 1 to 5) by codec: G.711 (64 kbit/s PCM) 4.1; G.726 (32 kbit/s ADPCM) 3.8; G.723.1 (6.3 kbit/s MP-MLQ) 3.9; G.729 (8 kbit/s CS-ACELP) 3.9; IS-54 (8 kbit/s NA Dig Cellular) 3.4

19 MOS Under Varying Conditions

20 Video Coding

21 Key Terms
Frame: An individual picture in a sequence that makes up the video.
Frame Rate: The number of frames per second in video. 30 is excellent (TV quality).
Resolution: The number of horizontal and vertical pixels. VGA = 640x480.
Interlacing: A mechanism for transmitting video by splitting a frame into two fields, one field representing the odd lines and one the even lines. This is the “i” in 1080i.
Progressive: As opposed to interlaced, a method for transmitting video by sending each frame as a whole.
HD: High-definition resolutions. 720p is 1280x720 at 60 fps; 1080i is 1920x1080 at 30 fps.

22 Key Concept: Macroblocks
Rectangular block in an image which is a basic unit of compression. Typically 16x16 pixels.

23 Key Concept: Inter-Frame Prediction
Encode Predict information in the current frame by looking at previous frames, possibly taking into account motion.

24 Key Concept: Discrete Cosine Transform (DCT)
A technique for representing a macroblock by its component frequencies. Discarding the higher frequencies throws away the finer details without losing the core image. Figure axes: increasing horizontal frequencies; increasing vertical frequencies.

25 Video Encoder Block Diagram

26 Key Codec Comparisons
Codec | Timeline | Applications
H.261 | 1990 | ISDN at multiples of 64 kbps
H.263 | 1996 | Early Flash using Sorenson Spark implementation. Original RealVideo codec. Required in IMS.
H.264-AVC | 2003 | YouTube, iTunes, Blu-ray; most modern video conferencing. The current primary video codec for real-time. Typical VGA 15 fps bitrate = 500 kbps
H.264-SVC | 2007 | “Layered” video that provides improved quality and resilience; ideal for multiparty video conferencing.
VP7 | 2005 | On2 Technologies codec; Skype; successor to H.263 in Flash

27 Voice and Video Transport: RTP

28 RTP: What is it?
Real Time Transport Protocol. RFC 3550, product of the AVT working group. 1996: proposed standard (RFC 1889). 2004: full standard.
What does it do? End-to-end transport of real-time media; optimized for multicast; provides sequencing, timing, framing, loss detection; provides feedback on reception quality; provides information on group members; provides data to correlate audio and video and other media.
Works with any codec; needs a payload format for each codec. Flexible.

29 RTP: What isn’t it?
Doesn’t guarantee quality of service: doesn’t reserve network resources; doesn’t guarantee no loss or bounded delay; can work with QoS protocols (RSVP). Doesn’t provide signaling: other protocols must be used to set up RTP (like SIP or H.323). Not a specific protocol type: does not run directly on top of IP; runs on top of UDP; no fixed port number.


31 Big Picture: RTP, SDP and SIP
C=IN IP m=audio RTP/AVP m=video RTP/AVP a=rtpmap:98 h263 Proxy Proxy SIP w/ SDP IP Network End User End User RTP

32 RTP Components: Data + Control
Data aka RTP very confusing Usually on an even UDP port (NATs change this – later) Provides sequencing timing framing content labeling User identification Control = Real Time Control Protocol (RTCP) Same address as data, but one higher port usually Provides reception quality sender statistics participant information (multicast) synchronization information

33 Real Time Data Transport
Originator breaks stream into packets (segmentation): application layer framing (ALF)! Packets sent; network may lose, delay, reorder packets. The receiver must reorder, recover, resegment and resynchronize (clock synchronization!). Diagram: RTP Source, RTP Packets, RTP Sink

34 Transport System
Source: digitize audio from mike; silence suppression; echo cancellation; compress audio (G.711: 64 kbps, G.729: 8 kbps, G.723.1: 5.3/6.3 kbps); packetize audio in RTP; send.
Sink: receive packets; un-packetize; decompress; comfort noise generation; reorder; recover loss; jitter buffer; D/A conversion to speakers.

35 Jitter Buffer Packets delayed differently
Must play them out periodically Packets may arrive after designated playout time -> loss Insert extra delay to compensate May need to adapt this amount pkts time
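The playout rule above can be sketched as follows. This is a minimal illustration that assumes a fixed (non-adaptive) buffer; the function name `playout_schedule`, the 20 ms packet period and the arrival times are hypothetical and not taken from the slides.

```python
def playout_schedule(arrivals_ms, period_ms=20, buffer_ms=40):
    """Return (playout_time_ms, late) for each packet of a fixed jitter buffer.

    arrivals_ms[i] is the arrival time of packet i. Packet 0 is held for
    buffer_ms, and every later packet is played one period after the
    previous one. A packet that arrives after its playout time is "late"
    and is treated as lost.
    """
    base = arrivals_ms[0] + buffer_ms
    schedule = []
    for i, arrival in enumerate(arrivals_ms):
        playout = base + i * period_ms
        schedule.append((playout, arrival > playout))
    return schedule


# Hypothetical arrival times (ms); packet 2 is delayed badly by the network.
arrivals = [50, 72, 135, 112, 131]
for seq, (playout, late) in enumerate(playout_schedule(arrivals)):
    print(seq, playout, "late, treated as lost" if late else "on time")
```

Raising `buffer_ms` to 60 in this example would rescue the delayed packet at the cost of 20 ms more end-to-end delay, which is the trade-off an adaptive jitter buffer tunes.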

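36 RTP Packet Header
 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|V=2|P|X|  CC   |M|     PT      |       sequence number         |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                           timestamp                           |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|           synchronization source (SSRC) identifier            |
+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+
|            contributing source (CSRC) identifiers             |
|                             ....                              |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+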

37 RTP Header Fields
Version: 2
P: indicates padding (for encryption)
X: extension bit
CSRC count: for mixers (later)
M: Marker bit: indicates framing. Audio codecs: first packet in talkspurt. Video: last packet in frame.
Payload Type: indicates encoding in RTP packet; allows changes per-packet. Useful for: adaptation, DTMF codec, silence codecs.
SN: defines ordering of packets
Timestamp: when packet was generated
SSRC: identifier
CSRC: list of mixed users
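To make the field layout concrete, here is a minimal sketch of decoding these fields from raw bytes with Python's standard `struct` module. The function name `parse_rtp_header` and the sample packet are illustrative assumptions; the sketch follows the RFC 3550 fixed-header layout and does not decode a header extension.

```python
import struct


def parse_rtp_header(packet: bytes) -> dict:
    """Decode the fixed 12-byte RTP header plus any CSRC identifiers."""
    if len(packet) < 12:
        raise ValueError("an RTP packet carries at least a 12-byte header")
    # Network byte order: two single bytes, 16-bit SN, 32-bit TS, 32-bit SSRC.
    first, second, seq, timestamp, ssrc = struct.unpack("!BBHII", packet[:12])
    csrc_count = first & 0x0F
    csrc_end = 12 + 4 * csrc_count
    if len(packet) < csrc_end:
        raise ValueError("packet too short for its CSRC count")
    csrcs = struct.unpack("!%dI" % csrc_count, packet[12:csrc_end])
    return {
        "version": first >> 6,
        "padding": bool(first & 0x20),
        "extension": bool(first & 0x10),
        "csrc_count": csrc_count,
        "marker": bool(second & 0x80),
        "payload_type": second & 0x7F,
        "sequence_number": seq,
        "timestamp": timestamp,
        "ssrc": ssrc,
        "csrcs": list(csrcs),
        "payload_offset": csrc_end,  # ignores any header extension
    }


# Hypothetical packet: V=2, marker set, payload type 0, 160 payload bytes.
example = struct.pack("!BBHII", 0x80, 0x80, 1, 160, 0x12345678) + bytes(160)
header = parse_rtp_header(example)
print(header)
```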

38 RTP Timestamp
Tick units are dependent on codec. For speech: 125 microseconds (standard 8 kHz sampling rate). For video: 90 kHz. For audio: 44.1 kHz (CD rate). Gaps in TS, but not in SN, mean silence. Initial value random for security.
Video: Timestamp represents time at beginning of frame. Many packets may have the same timestamp.
Speech: Time per packet may vary. Depends on packetization: ms typical

39 Payload Formats
Each codec needs a way to be encapsulated in RTP. RFC 3551 defines mechanisms for many common codecs: G.711, G.729, G.723.1, G.722, etc., and some simple video. More complex codecs have their own payload format documents: MPEG, H.263 and H.261. A payload format defines how to break a frame into packets and the extra fields needed below the main RTP header.

40 Advanced Topics
DTMF and Tones (RFC 2833): Special codecs for encoding touch tones (DTMF) and other signals. Can send either the waveform (frequency, amplitude) or the actual signal (#, 8, 0).
Compressed RTP (RFC 2508): For dialup links. Don’t send the header, just send an index. The far side uses the index to retrieve the header, and then increments certain fields.

41 Quality of Service

42 Quality of Service
“The problem we are trying to solve is to give ‘better’ service to some at the expense of giving worse service to others – QoS fantasies to the contrary, it’s a zero sum game” - Van Jacobson
In other words, QoS is Managed Unfairness

43 Quality of Service: So, what’s the problem?
Figure labels: Toll Quality; Satellite Zone; CB; Fax Relay, Broadcast; Private Network; VoFR & VoIP Technology; Early I-Phone Technology.
Improving I-Phone means: • Lower PC Delay • Lower Network Latency • Tighten Network Jitter

44 Delay Budget
Device sample capture; encode delay (algorithmic delay + processing delay); packetization/framing; move to output queue/queueing delay; access (up) link transmission; backbone network transmission; access (down) link transmission; input queue to application; jitter buffer; decode processing delay; device playout delay. Diagram label: “The Network”

45 Some Techniques to Improve “Network QoS”
RED: Random Early Drop (or “Detect”); WFQ: Weighted Fair Queuing; Intserv/RSVP: ReSerVation Protocol; IP Precedence / DiffServ; CRTP: Compressed Realtime Transport Protocol; MCML: Multi-Class Multi-Link PPP

46 Random Early Detect (RED) this is Basic Hygiene!
Objectives: keep average queue size low (good for voice); fairness (bigger streams punished more); avoid synchronization. Only works with loss-responsive transport protocols. Algorithm: probabilistic dropping of packets. Figure: drop probability (up to 1) plotted against queue size, with Min and Max thresholds.
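A minimal sketch of the probabilistic drop decision, assuming the classic RED shape: no drops below the Min threshold, a linear ramp between Min and Max, and certain drop above Max. The function names and the `max_p` and `weight` values are illustrative assumptions, not figures from the slides.

```python
import random


def red_drop_probability(avg_queue, min_th, max_th, max_p=0.1):
    """Drop probability as a function of the average queue size."""
    if avg_queue < min_th:
        return 0.0
    if avg_queue >= max_th:
        return 1.0
    # Linear ramp from 0 at min_th up to max_p just below max_th.
    return max_p * (avg_queue - min_th) / (max_th - min_th)


def update_average(avg_queue, current_queue, weight=0.002):
    """Exponentially weighted moving average of the instantaneous queue size."""
    return (1 - weight) * avg_queue + weight * current_queue


def should_drop(avg_queue, min_th, max_th, rng=random.random):
    """Randomly decide whether to drop the arriving packet."""
    return rng() < red_drop_probability(avg_queue, min_th, max_th)


for size in (5, 20, 30):
    print(size, red_drop_probability(size, min_th=10, max_th=30))
```

Because the decision is random per packet, flows that send more packets are hit more often, which is the fairness property listed in the objectives.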

47 Poll: Will RED Help Voice?
Yes No Voice not loss responsive Mixing voice and data in same queue bad Voice queues usually not congested

48 Weighted Fair Queueing
Each flow j “sees” a dedicated amount of bandwidth Bj. A packet arriving at time t is transmitted at time t + size/Bj. The link bandwidth B is divided among the flows: B = B1 + B2 + B3

49 What’s the Problem??
WFQ is unrealizable because of: variable packet sizes; causality.
Example: Link speed 100 Kbps; Flow 1: 10 Kbps; Flow 2: 90 Kbps. Theory: 8.8 ms. Actual: 128 ms. Figure: packet sizes 1500, 1500, 100, 100.

50 Approximations of WFQ
Many PhDs written with approximate and implementable algorithms. Algorithms differ in their delay bound: how much worse than perfect WFQ is this? Delay bounds are a function of bandwidth, number of queues, and other parameters.
Algorithms: SCFQ: Self-Clocked Fair Queueing; WF2Q: Worst-Case Fair Weighted Fair Queueing; FBFQ: Frame-Based Fair Queueing; PGPS: Packet-by-Packet Generalized Processor Sharing; DRR: Deficit Round Robin

51 WFQ Voice Configuration
How to pick allocated bandwidth? Consider G.711, 30ms framing (74.6Kbps) If Bi = 74.6kbps, delay is at least 30ms If Bi = 149.2Kbps, delay at least 15ms Must set voice queue bandwidth at least 2x actual voice usage to keep delays down! Unused bandwidth will go to data Need an accurate WFQ Implementation
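The arithmetic behind these figures can be checked with a short sketch. It assumes 40 bytes of IPv4 + UDP + RTP headers per packet and no link-layer overhead; the function names are hypothetical. The computed rate is about 74.67 kbps, which the slide quotes as 74.6 Kbps.

```python
HEADER_BYTES = 20 + 8 + 12  # IPv4 + UDP + RTP, no link-layer overhead (assumption)


def voice_packet_bits(codec_bps, frame_ms, header_bytes=HEADER_BYTES):
    """Size of one voice packet in bits: codec payload plus headers."""
    payload_bits = codec_bps * frame_ms / 1000
    return payload_bits + header_bytes * 8


def voice_bandwidth_bps(codec_bps, frame_ms, header_bytes=HEADER_BYTES):
    """Bandwidth of the stream once headers are included."""
    return voice_packet_bits(codec_bps, frame_ms, header_bytes) * 1000 / frame_ms


def wfq_delay_ms(packet_bits, allocated_bps):
    """Time to serve one packet at the allocated rate (size / Bi)."""
    return packet_bits / allocated_bps * 1000


bits = voice_packet_bits(64000, 30)   # G.711 with 30 ms framing
rate = voice_bandwidth_bps(64000, 30)
print(round(rate))                         # stream rate in bit/s
print(round(wfq_delay_ms(bits, rate), 1))  # delay when Bi equals the stream rate
print(round(wfq_delay_ms(bits, 2 * rate), 1))  # delay when Bi is doubled
```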

52 Priority Queueing
Emulates the familiar “elite airport line” experience. Voice and data packets are kept in separate queues. If there are any packets in the voice queue, they are serviced first. Diagram: Voice and Data queues feeding a Server.
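A minimal sketch of strict priority scheduling with two queues. The class name `PriorityScheduler` and the packet labels are hypothetical; policing of the voice queue is deliberately left out.

```python
from collections import deque


class PriorityScheduler:
    """Strict priority: the data queue is served only when voice is empty."""

    def __init__(self):
        self.voice = deque()
        self.data = deque()

    def enqueue(self, packet, is_voice):
        (self.voice if is_voice else self.data).append(packet)

    def dequeue(self):
        if self.voice:
            return self.voice.popleft()
        if self.data:
            return self.data.popleft()
        return None  # both queues are empty


sched = PriorityScheduler()
sched.enqueue("data-1", is_voice=False)
sched.enqueue("voice-1", is_voice=True)
sched.enqueue("data-2", is_voice=False)
sched.enqueue("voice-2", is_voice=True)
order = [sched.dequeue() for _ in range(4)]
print(order)
```

With nothing limiting the voice queue, a steady voice load would keep `dequeue` from ever reaching the data queue, so a real deployment would police the voice traffic.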

53 Priority Queueing Considerations
Easy to configure – no bandwidth values required Main problem – data starvation Need to police voice queue Doesn’t work as well when there is other non-voice high priority traffic (video) Head-of-Line Blocking from data queue

54 Intserv: Integrated Services
Guaranteed Service (RFC 2212): mathematically provable bounds on end-to-end datagram queuing delay/bandwidth. Controlled Load Service (RFC 2211): approximate QoS from an unloaded network for delay/bandwidth.
Describe traffic with a “TSPEC”: r = token bucket rate; b = token bucket depth; p = peak transmission rate; m = minimum (policed) packet size; M = maximum packet size.
Describe endpoints with a “FlowSpec”: source/destination IP addresses, ports, protocol.
RSPEC/FSPEC provides the policy to the queuing/scheduling algorithms.
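The token bucket behind the TSPEC parameters r (rate) and b (depth) can be sketched as below. This is a simplified illustration with hypothetical names; the peak rate p and the packet size limits m and M are not modelled.

```python
class TokenBucket:
    """Token bucket with fill rate r (bytes/s) and depth b (bytes)."""

    def __init__(self, rate_bytes_per_s, depth_bytes):
        self.rate = rate_bytes_per_s
        self.depth = depth_bytes
        self.tokens = depth_bytes  # the bucket starts full
        self.last = 0.0

    def conforms(self, now_s, packet_bytes):
        """Refill for the elapsed time, then try to spend tokens on the packet."""
        elapsed = now_s - self.last
        self.tokens = min(self.depth, self.tokens + elapsed * self.rate)
        self.last = now_s
        if packet_bytes <= self.tokens:
            self.tokens -= packet_bytes
            return True
        return False  # out of profile: candidate for dropping or remarking


bucket = TokenBucket(rate_bytes_per_s=1000, depth_bytes=500)
results = [
    bucket.conforms(0.0, 400),  # fits in the initial burst allowance
    bucket.conforms(0.1, 300),  # only about 200 tokens available
    bucket.conforms(0.3, 300),  # refilled enough by now
]
print(results)
```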

55 RSVP Design Signaling distinct from routing (modularity, deployability, evolvability) Soft state (robustness, simplicity) Transparent operation across non-RSVP routers (deployability) Support shared and distinct reservations Applies to unicast & multicast applications Simplex & receiver-oriented.

56 RSVP protocol
PATH: Source → Destination. Carries the traffic parameters of the source; collects info on network capabilities; detects the current route.
RESV: Destination → Source. Carries the receiver-selected Int-Serv service and the traffic parameters of the receiver-selected reservation; follows the route detected by PATH; reservation actually nailed in network.
RSVP messages are carried over IP. They can also be carried over UDP, but few people do that.

57 RSVP: Admission Control
Flow Request Routing Routing Protocol Admission Control Reservation Protocol Resource Utilization Database Queuing Policy Database Routing Database Switching Interface 1 Packets Out Packet Scheduler Packets In Route Selection Interface N Packets Out Packet Scheduler

58 Intserv/RSVP Acceptance
Figure: enthusiasm versus real value over time. Labels: “Intserv/RSVP will solve the world’s QoS”; cool thing to say: “RSVP does not scale”; vBNS; RSVP over ATM; transparently transport RSVP; RSVP for VoIP in Enterprise. Time axis markers: ISP Today, Enterprise Today.

59 IP Precedence & Diffserv
“Poor man’s” approach to QoS Set IP Precedence/DSCP higher on voice packets This puts them in a different queue, resulting in isolation from best effort traffic Can be done by endpoint, proxy, or in routers through heuristics Scales better than RSVP – Keeps QoS control “local” Pushes work to the edges and boundaries Can provide bulk QoS by customer or network No admission control Too much high-precedence traffic can still swamp the network

60 Diffserv Architectural Model
Clouds — regions of relative homogeneity: Administrative control Technology Bandwidth Within a cloud, QoS managed by local rules Hard work confined to boundaries of clouds: Classification Conditioning/Policing QoS information exchange limited to boundaries Bi-lateral, not multi-lateral Not necessarily symmetric Me Not Me Also Not Me Far Away

61 Diffserv Scalability Fundamental assumptions:
Relatively small number of feasible queuing/scheduling algorithms for high link speeds Number of individual flows is large Many different rules, often policy driven Group packets explicitly by the “Per-hop behavior (PHB)” they are to get Queue service Shaping/policing Nodes in the middle of a cloud only have to deal with traffic aggregates

62 Diffserv Forwarding via PHBs
PHBs map to DSCPs (Diffserv Code Points) Values chosen for backward-compatibility with IPv4 TOS byte including IP Precedence (RFC 2474) Packets with different DSCPs may be re-ordered Forwarding resources partitioned by PHB/DSCP
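The backward compatibility with the TOS byte comes down to bit positions: the DSCP occupies the upper six bits of the old TOS byte, and its top three bits line up with IP Precedence. A minimal sketch, with hypothetical function names:

```python
EF_DSCP = 46  # binary 101110, the codepoint recommended for EF in RFC 3246


def dscp_to_tos_byte(dscp: int) -> int:
    """Place the 6-bit DSCP in the upper six bits of the former TOS byte."""
    if not 0 <= dscp <= 63:
        raise ValueError("DSCP is a 6-bit value")
    return dscp << 2


def ip_precedence(dscp: int) -> int:
    """The top three DSCP bits read as the legacy IP Precedence value."""
    return dscp >> 3


def class_selector(precedence: int) -> int:
    """DSCP that a legacy IP Precedence value maps onto (RFC 2474)."""
    if not 0 <= precedence <= 7:
        raise ValueError("IP Precedence is a 3-bit value")
    return precedence << 3


print(hex(dscp_to_tos_byte(EF_DSCP)), ip_precedence(EF_DSCP), class_selector(5))
```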

63 Assured Forwarding PHB (AF*)
Four independent classes Within each class, three levels of drop precedence A congested AF node discards packets with higher drop preference first Packets with lowest drop preference must be within the subscribed profile *RFC2597

64 Expedited Forwarding PHB (EF*)
Targeted at VoIP and “virtual leased lines”. Roughly equivalent to priority queuing, with a safety measure to prevent starvation. Implications: no more than 50% of a link can be EF (see RFC 3247 and RFC 3248 for interesting mathematical analyses). Worst-case jitter at each hop is the max of: the number of EF microflows in the aggregate, or a single MTU packet of some other aggregate. *RFC 3246

65 Diffserv Traffic Conditioner
Diagram blocks: Classifier, Meter, Marker, Shaper / Dropper; outputs: shaped packets and dropped packets.
Classifier: selects a packet in a traffic stream based on the content of some portion of the packet header.
Meter: checks compliance to traffic parameters (e.g. Token Bucket) and passes the result to the marker and shaper/dropper to trigger a particular action for in/out-of-profile packets.
Marker: writes/rewrites DSCP.
Shaper: delays some packets so that they are compliant with the profile.

66 Diffserv Acceptance
Figure: enthusiasm versus real value over time. Labels: “Diffserv will solve the world’s QoS”; Inter-SP Diffserv and end-to-end Internet QoS need further standardisation and commercial arrangements; Diffserv Engineering? Diffserv SLA? Internet e2e SLA?; real value today: Diffserv design and deployment intra-domain.

67 Mixing Intserv & Diffserv: Aggregation
Host signals with RSVP Edge or transit domains Aggregate reservations mark packets using DSCP In transit domains Blindly transfer end to end reservations using another IP Protocol Number - change at edge Routers detect egress of reservation (deaggregation) on transfer from an interior or aggregator interface to an exterior (deaggregating) interface Aggregate reservation size varies with load Edge Backbone

68 RTP Compression
20 ms @ 8 kbit/s yields a 20-byte payload. IP header 20; UDP header 8; RTP header 12: twice the size of the payload! Header compression: 40 bytes to 2-4 most of the time. Hop-by-hop: use only on the slow links.
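The effect on link bandwidth can be worked through in a few lines. This sketch uses the header and payload sizes from the slide; the function name is hypothetical and link-layer framing is ignored.

```python
def on_wire_bps(payload_bytes, header_bytes, frame_ms):
    """Bit rate of a stream that sends one packet every frame_ms."""
    return (payload_bytes + header_bytes) * 8 * 1000 / frame_ms


payload = 8000 * 20 // 1000 // 8  # 8 kbit/s codec, 20 ms frames: 20 bytes
uncompressed = on_wire_bps(payload, 20 + 8 + 12, 20)
compressed_best = on_wire_bps(payload, 2, 20)
compressed_worst = on_wire_bps(payload, 4, 20)
print(payload, uncompressed, compressed_best, compressed_worst)
```

With full headers the 8 kbit/s codec occupies 24 kbit/s on the wire; with 2 to 4 byte compressed headers it occupies 8.8 to 9.6 kbit/s.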

69 Sample Delay Budget (G.711 - 64kbps)

70 Sample Delay Budget (G.729 - 8kbps)

71 Signaling: SIP

72 SIP is one of Many
ITU H.323: Originally for video conferencing. The first standard protocol for VoIP. Still in wide usage, but negative growth.
MGCP: Dumb phones controlled by smart server. “Softswitch”: PSTN emulation view.
Megaco/H.248: Standard version of MGCP.

73 Core SIP Functions Establishment of peer to peer sessions
Management of peer to peer sessions Keepalives Graceful and Non-graceful termination Rendezvous Forking Search Policy Based Routing Loose Routing Mobility Limited terminal mobility Device Mobility

74 Core SIP Functions Secure User Identification
Exchange and Management of Media Session data User registration Capability declaration Capability query Reliability

75 SIP Technology Community
RTP SDP ROHC Events 3265 O/A 3264 SIMPLE STUN SIP RFC3261 DNS 3263 Rel 3262 MIDCOM SigComp SIP Extensions ENUM

76 SIP Design Philosophy Patterned after other Successful Internet Standards HTTP Don’t Reinvent the PSTN General Purpose Functionality Do Not Dictate Architectures or Services It needs to work on any IP Network Leverage the Best of Existing Standards URLs MIME RFC822 Scalability Push state to the edge

77 Basic Design Request/Response Protocol
SIP is a Peer Protocol – all entities send requests and receive requests Modelled after HTTP Each request invokes method Main purpose of request Messages contain bodies request Agent Agent response

78 Transactions Fundamental unit of messaging exchange
Request Zero or more provisional responses Usually one final response Maybe ACK All signaling composed of independent transactions Identified by Cseq Sequence number Method tag INVITE 100 200 Cseq: 1 ACK First Transaction BYE Cseq: 2 200 Second Transaction

79 Session Independence Body of SIP message used to establish call describes the session Session could be Audio Video Game SIP operation is independent of type of session SIP Bodies are MIME objects MIME = Multipurpose Internet Mail Extensions Mechanisms for describing and carrying opaque content Used with HTTP and

80 Protocol Components User Agent End systems Hard and soft phones
PSTN Gateways Phone Adaptors Media Servers Anything that originates or terminates SIP calls Proxy SIP server responsible for relaying and processing requests between user agents Main job: where to send request next? Back-to-Back User Agent (B2BUA) SIP server that terminates and re-originates SIP SBCs, Call Agents, etc.

81 SIP Addressing SIP addresses are URL’s URL contains several components
Scheme (sip) Username Hostname Optional port Parameters Headers and Body SIP allows any URI type tel URIs http URLs for redirects mailto URLs leverage vast URI infrastructure user=host?Subject=foo

82 The SIP Trapezoid SIP RTP

OPTIONS: Queries a participant about their media capabilities, and finds them, but doesn’t invite.
ACK: For reliability and call acceptance.
REGISTER: Informs a SIP server about the location of a user.
INVITE: Invites a participant to a session; idempotent; reINVITEs for session modification.
BYE: Ends a client’s participation in a session.
CANCEL: Terminates a search.

84 SIP Architecture Request Response Media Corp DB
2 Corp DB 3 5 4 6 1 7 11 12 10 13 8 14 9

85 SIP Message Syntax
Many header fields from HTTP. Payload contains a media description: SDP (Session Description Protocol).
INVITE SIP/2.0
From: J. Rosenberg ;tag=76ah
Subject: Conference Call
To: John Smith
Via: SIP/2.0/UDP ;branch=z9hG4bK74bf9
Call-ID:
Content-type: application/sdp
CSeq: 4711 INVITE
Content-Length: 187

v=0
o=user IN IP
s=Sales
c=IN IP
t=0 0
m=audio 3456 RTP/AVP 0

86 SIP Address Fields
Request-URI: Contains the address of the next hop server. Rewritten by proxies based on the result of the Location Service.
To: Address of the original called party. Contains an optional display name.
From: Address of the calling party. Optional display name.

87 SIP Responses
Look much like requests: headers, bodies. Differ in top line.
Status Code: Numeric, meant for computer processing. Protocol behavior is based on the 100s digit; other digits give extra info.
Reason Phrase: Text phrase for humans. Can be anything.
Status Code Classes: (1XX) Informational; (2XX) Success; (3XX) Redirection; (4XX) Client Error; (5XX) Server Error; (6XX) Global Failure.
Two groups: Provisional (not reliable) and Final (definitive).
Example: 200 OK, 180 Ringing
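Since protocol behavior keys off the 100s digit, classifying a response takes only an integer division. A minimal sketch; the function name `classify_status` is hypothetical, and the provisional/final split follows the standard SIP convention that 1XX responses are provisional and all others are final.

```python
STATUS_CLASSES = {
    1: "Informational",
    2: "Success",
    3: "Redirection",
    4: "Client Error",
    5: "Server Error",
    6: "Global Failure",
}


def classify_status(status_code: int):
    """Return (class name, 'provisional' or 'final') for a SIP status code."""
    if not 100 <= status_code <= 699:
        raise ValueError("SIP status codes range from 100 to 699")
    name = STATUS_CLASSES[status_code // 100]
    kind = "provisional" if status_code < 200 else "final"
    return name, kind


for code in (180, 200, 486):
    print(code, classify_status(code))
```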

88 Example SIP Response
Note how the only difference is the top line. Rules for generating responses: Call-ID, To, From, CSeq are mirrored in the response. Branch parameter is used as the transaction ID. Tag is added to the To field to identify the dialog.
SIP/ OK
From: J. Rosenberg ;tag=76ah
To: John Smith ;tag=112
Via: SIP/2.0/UDP ;branch=z9hG4bK74bf9
Call-ID:
Content-type: application/sdp
CSeq: 4711 INVITE

89 SIP Transport
Reliability mechanisms depend on the SIP request method: INVITE versus anything except INVITE. Reason: optimized for phone calls. SIP messages run over UDP, TCP/TLS or SCTP. Reliability mechanisms are defined for UDP.
UDP: more widely used; faster; no connection state.
TCP: preferred these days (NAT; larger SIP messages).

90 Registrations REGISTER creates mapping in server from one URI to another REGISTER properties UA location in Contact Registrar identified in Request URI Identifies registered user in To and From field Expires header indicates desired lifetime Can be different for each Contact Registrations are soft-state REGISTER SIP/2.0 To: From: Call-ID: CSeq: 123 REGISTER Contact: Expires: 3600 to

91 Registration Handling
Registrar is logical function handling REGISTER Registrar steps: Authenticate Authorize Add Binding Lower expiration Return all currently registered UA (can be more than one) SIP/ OK To: From: Call-ID: CSeq: 123 REGISTER Contact: Contact:

92 Forking A proxy may have more than one address for a user
Happens when more than one SIP URL is registered for a user Can happen based on static routing configuration In this case, proxy may fork Forking is when proxy sends request to more than one proxy at once First 200 OK that is received is forwarded upstream All other unanswered requests cancelled INVITE INVITE INVITE

93 Routing of Subsequent Requests
Initial SIP request sent through many proxies No need per se for subsequent requests to go through proxies Each proxy can decide whether it wants to receive subsequent requests Inserts Record-Route header containing its address For subsequent requests, users insert Route header Contains sequence of proxies (and final user) that should receive request Proxy INVITE Proxy BYE Proxy UA1 UA2

94 Setting up the Session
INVITE contains the Session Description Protocol (SDP) in the body. SDP conveys the desired session from the caller’s perspective. A session consists of a number of media streams; each stream can be audio, video, text, application, etc. It also contains information needed about the session: codecs, addresses and ports. SDP also conveys other information about the session: the time it will take place, who originated the session, the subject of the session, a URL for more information. SDP’s origins are multicast sessions on the Mbone, where the originator of the INVITE is not the originator of the session.

95 Anatomy of SDP
SDP contains informational headers: version (v); origin (o), a unique ID; information (i); time of the session. These are followed by a sequence of media streams. Each media stream contains an m line defining port, transport and codecs. A media stream also contains a c line with address information.
v=0
o=user IN IP
s=Mbone Audio
i=Discussion of Mbone Engineering Issues
t=0 0
m=audio 3456 RTP/AVP 0 78
c=IN IP
a=rtpmap:78 G723
m=video 4444 RTP/AVP 86
a=rtpmap:86 H263

96 Negotiating the Session
Called party receives the SDP offered by the caller. Each stream can be accepted or rejected. Accepting involves generating an SDP listing the same stream, the port number and address of the called party, and a subset of the codecs from the SDP in the request. Rejecting is indicated by setting the port to zero. The resulting SDP is returned in the 200 OK. Media can now be exchanged.
v=0
o=user IN IP
t=0 0
m=audio 3456 RTP/AVP 0
c=IN IP
m=video 0 RTP/AVP 86
Audio stream accepted, PCMU only. Video stream rejected.
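The accept/reject rule can be sketched as a function that builds the answer's m lines from the offer. This is a simplified illustration: streams are plain dictionaries rather than parsed SDP, formats are matched by payload type number only (a real implementation would compare the `a=rtpmap` names for dynamic types), and the names `answer_media` and `m_line` are hypothetical.

```python
def answer_media(offered, supported, local_ports):
    """Build answer streams: keep common formats, or reject with port 0."""
    answer = []
    for stream in offered:
        common = [f for f in stream["formats"]
                  if f in supported.get(stream["media"], ())]
        if common:
            answer.append({"media": stream["media"],
                           "port": local_ports[stream["media"]],
                           "formats": common})
        else:
            # Rejected stream: port zero, offered formats echoed back.
            answer.append({"media": stream["media"], "port": 0,
                           "formats": list(stream["formats"])})
    return answer


def m_line(stream):
    return "m=%s %d RTP/AVP %s" % (
        stream["media"], stream["port"], " ".join(stream["formats"]))


# Offer in the style of the slides: audio with formats 0 and 78, video with 86.
offer = [
    {"media": "audio", "port": 3456, "formats": ["0", "78"]},
    {"media": "video", "port": 4444, "formats": ["86"]},
]
# Assumed callee: supports payload type 0 only and listens on port 3456.
answer = answer_media(offer, {"audio": {"0"}}, {"audio": 3456})
lines = [m_line(s) for s in answer]
print("\n".join(lines))
```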

97 Changing Session Parameters
INVITE Once call is started, session can be modified Possible changes Add a stream Remove a stream Change codecs Change address information Call hold is basically a session change Accomplished through a re-INVITE Same session negotiation as INVITE, except in middle of call Rejected re-INVITE - call still active! 200 ACK INVITE 200 reINVITE ACK

98 Hanging Up
How to hang up depends on when and who. After the call is set up: either party sends a BYE request. From the caller, before the call is accepted: send CANCEL. BYE is bad since it may not reach the same set of users that got the INVITE. If the call is accepted after CANCEL, then send BYE. From the callee, before accepted: reject with 486 Busy Here.
Figure labels (between C and S): INVITE, 100, Hangup, CANCEL, 200 OK, Accept, 200 OK, ACK, BYE, 200 OK.

99 Call Flow for basic call: UA to proxy to UA
Call setup 100 trying hop by hop 180 ringing 200 OK acceptance Call parameter modification re-INVITE Same as initial INVITE, updated session description Termination BYE method INVITE INVITE 100 Trying 100 Trying 180 Ringing 180 Ringing 200 OK 200 OK ACK RTP BYE 200 OK

100 Privacy and Identity RFC 3325: A Private Extension for Asserted Identity in Trusted Networks RFC 3323: A Privacy Mechanism for SIP RFC 4474: SIP Identity

101 RFC3325 Asserted Identity Trust Domain INVITE P-Asserted-Identity:
Authenticates Caller and verifies identity. Adds PAID.

102 RFC3323 – SIP Privacy Anonymous Caller Trust Domain INVITE
P-Asserted-Identity: From: anonymous INVITE From: anonymous INVITE Privacy: id From: anonymous Anonymous Caller

103 RFC 4474: SIP Identity
Only useful for user@domain addresses! The originating side authenticates the caller, verifies identity and signs the request (INVITE From: Identity: asd87f7as66sda8z). The receiving side verifies the signature.

104 Transfers and Dialog Movement: REFER (RFC 3515)
Alice 3 INVITE Bob Referred-By: Joe INVITE 1 REFER Refer-To: Bob 4 2 INVITE Joe Bob

105 Third Party Call Control (3pcc): RFC 3725

106 SIP and Quality of Service
RFC 3312: Integration of Resource Management with SIP Problem How to make sure phone doesn’t ring unless resources are reserved Solution SIP does not do resource reservation! SIP INVITE tells far side not to ring Both sides do regular QoS reservations RSVP PDP context activation UPDATE to change state INVITE w. Preconditions 183 Progress QoS Reservations UPDATE w. Preconditions 180 Ringing 200 OK ACK

107 Security

108 The only totally secure system I know of is a rock
VoIP Security The only totally secure system I know of is a rock - Tony Lauck, circa 1985

109 But Even Rocks can be Insecure..

110 It Had a Great User Interface

111 But it had a serious security vulnerability…
The Dark Lord could eavesdrop and barge in

112 VoIP Attacks
Attack: Solution
Free Calls aka Toll Fraud: User Authentication
Impersonation: User Authentication, Secure Caller ID
Learning Private Information (calling patterns, PIN codes): SIP Encryption, Media Encryption
Steal Calls
DoS: ICE, Others

113 SIP User Authentication
RTP We want this SIP server to authenticate this user and this SIP server to authenticate this user

114 SIP Digest Authentication
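Client: Hi, I’d like to SIP REGISTER. Server: 401 – OK, try again. Nonce=a7szh1. The client computes Digest = Hash(joe, a7szh1, myPassword) = z0v88a6 and sends REGISTER with Nonce=a7szh1, Username=joe, Digest=z0v88a6. The server computes Digest = Hash(joe, a7szh1, myPassword) itself, compares, and replies: OK, done!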

115 Offline Dictionary Attack
The password is “alligator”, so Digest = Hash(joe, a7szh1, alligator) = z0v88a6. An eavesdropper sees REGISTER Nonce=a7szh1 Username=joe Digest=z0v88a6 and the reply OK, done!, then hashes dictionary words offline until one matches:
Word | Hash(joe, a7szh1, word)
Aardvark | 9z8v77a
Abacus | lkf88z7
Abate | z77x
……. |
Alligator | z0v88a6
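The offline attack can be sketched in a few lines. This is an illustration of the slide's simplified Hash(username, nonce, password) model, not the real SIP digest computation, which also hashes the realm, method and URI; SHA-256 and the names `digest` and `dictionary_attack` are assumptions made for the example.

```python
import hashlib


def digest(username, nonce, password):
    """Stand-in for the slide's Hash(username, nonce, password)."""
    material = "%s:%s:%s" % (username, nonce, password)
    return hashlib.sha256(material.encode("utf-8")).hexdigest()


def dictionary_attack(username, nonce, observed_digest, words):
    """Try each candidate word offline; no further traffic to the server."""
    for word in words:
        if digest(username, nonce, word) == observed_digest:
            return word
    return None


# What an eavesdropper captures from one REGISTER exchange.
observed = digest("joe", "a7szh1", "alligator")
wordlist = ["aardvark", "abacus", "abate", "alligator"]
print(dictionary_attack("joe", "a7szh1", observed, wordlist))
```

The nonce stops replay of a captured digest, but it does nothing against guessing: every value the attacker needs other than the password travels in the clear.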

116 Solution: Digest over TLS
Digest= Hash(joe, a7szh1, alligator) = TLS Armor This is how Web Security works! Digest= Hash(joe, a7szh1, alligator)

117 Even Stronger: Mutual TLS for Devices TLS Armor MAC 8x7a6 Phone has a Certificate which identifies it

118 SIP Encryption
We want each SIP hop to be encrypted so only the SIP servers and endpoints see the signaling. Figure label: RTP

119 SIP Encryption: TLS
Mutual TLS Authentication on each SIP hop; RTP flows between the endpoints

120 Media Encryption
Countermeasure against: Eavesdropping, Barge-in, Modification
Two useful techniques: IPSEC, SRTP
Complications: Key management; Legal intercept (who has the keys); Firewall and NAT issues (covered later)

121 Alternative: Secure RTP
Authentication and encryption of RTP and RTCP packets
Packet layout, in order: V, P, X, CC, M, PT, sequence number; timestamp; synchronization source (SSRC) identifier; contributing source (CSRC) identifiers …; RTP extension (optional); RTP payload; SRTP MKI (0 bytes for voice); Authentication tag (4 bytes for voice)
Encrypted portion: the RTP payload
Authenticated portion: the RTP header plus the encrypted payload

122 SRTP
Advantages: Provides both privacy via encryption and authentication via message integrity check; very little bandwidth overhead; does not break header compression schemes like cRTP; for very low-rate channels (e.g. cellular) can sacrifice authentication and have no packet expansion; uses modern strong crypto suites (AES counter mode for encryption and HMAC for message integrity)
Disadvantages: Needs key management; end-to-end versus hop-by-hop trust tradeoffs in protecting keys; yet another security mechanism to ensure is implemented and deployed correctly

123 NAT Traversal

124 What is NAT?
Network Address Translation (NAT)
Creates an address binding between an internal private address and an external public address
Modifies IP Addresses/Ports in Packets
Benefits: Avoids network renumbering on change of provider; allows multiplexing of multiple private addresses into a single public address ($$ savings); maintains privacy of internal addresses
Diagram: the Client sends an IP Pkt (S: :6554 D: :80) to the NAT, which forwards it as an IP Pkt (S: :8877 D: :80)
NAT Binding Table (Internal -> External): :6554 -> :8877
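The binding table can be modelled with two dictionaries. This is a minimal sketch, not how any particular NAT is implemented: the class and method names are hypothetical, the addresses are documentation placeholders (the slide's own addresses are not legible), and only the port pair 6554 -> 8877 follows the slide.

```python
import itertools


class Nat:
    """Toy port-translating NAT: one public IP, one binding per internal socket."""

    def __init__(self, public_ip, first_port=8877):
        self.public_ip = public_ip
        self._next_port = itertools.count(first_port)
        self.outbound = {}  # (internal_ip, internal_port) -> external_port
        self.inbound = {}   # external_port -> (internal_ip, internal_port)

    def translate_out(self, src_ip, src_port):
        """Rewrite the source of an outgoing packet, creating a binding if needed."""
        key = (src_ip, src_port)
        if key not in self.outbound:
            ext_port = next(self._next_port)
            self.outbound[key] = ext_port
            self.inbound[ext_port] = key
        return self.public_ip, self.outbound[key]

    def translate_in(self, dst_port):
        """Map an incoming packet back to the internal host, or None if unbound."""
        return self.inbound.get(dst_port)


nat = Nat("192.0.2.1")                        # placeholder public address
first = nat.translate_out("10.0.1.1", 6554)   # placeholder private address
again = nat.translate_out("10.0.1.1", 6554)   # same socket reuses the binding
back = nat.translate_in(8877)
unsolicited = nat.translate_in(9999)          # no binding, so the packet is dropped
```

The last line is the root of the VoIP problem: traffic toward a port with no binding has nowhere to go.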

125 Problem: Getting SIP Through NATs
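[Diagram: a client behind a NAT sends an INVITE whose SDP reads "m=audio 3456 RTP/AVP 0" and "c=IN IP"; the far side then sends "RTP to" that address.]
The SDP advertises the client’s private address and port, which cannot be reached from outside the NAT, so the returning RTP never arrives.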

126 Solution Space
Application Layer Gateways (ALGs)
Session Border Controllers (SBC)
Simple Traversal of UDP Through NAT (STUN)
Traversal Using Relay NAT (TURN)
Interactive Connectivity Establishment (ICE)

127 Application Layer Gateway
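[Diagram: the client sends INVITE with "m=audio 3456 RTP/AVP 0" and "c=IN IP"; the NAT with ALG forwards it as INVITE with "m=audio 1234 RTP/AVP 0" and "c=IN IP"; the far side sends "RTP to" the rewritten address.]
ALG: the NAT also modifies SIP messages to fix them up!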
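The fix-up step can be sketched as a rewrite of two SDP lines. This is a minimal sketch under stated assumptions: the function name and both addresses are hypothetical placeholders, only the port change 3456 -> 1234 follows the slide, and a real ALG would also have to adjust the SIP Content-Length and other headers.

```python
def rewrite_sdp(sdp: str, public_ip: str, public_port: int) -> str:
    """Replace the connection address and the audio port in an SDP body."""
    rewritten = []
    for line in sdp.splitlines():
        if line.startswith("c=IN IP4 "):
            # Advertise the NAT's public address instead of the private one.
            line = f"c=IN IP4 {public_ip}"
        elif line.startswith("m=audio "):
            # m=audio <port> <proto> <formats...>: swap in the bound port.
            fields = line.split(" ")
            fields[1] = str(public_port)
            line = " ".join(fields)
        rewritten.append(line)
    return "\r\n".join(rewritten) + "\r\n"


# Placeholder addresses; the private one is what the client put in its offer.
original = "m=audio 3456 RTP/AVP 0\r\nc=IN IP4 10.0.1.1\r\n"
fixed = rewrite_sdp(original, "192.0.2.1", 1234)
```

Because the rewrite needs to read and change the message body, it cannot work once the signaling is encrypted or integrity protected.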

128 ALG Benefits and Drawbacks
Drawbacks: Doesn’t work when security is turned on; hard to diagnose problems; requires a network upgrade to support each new app; frequent implementation problems (lack of expertise); incentives mismatched
Benefits: No change to clients or servers

129 Session Border Controller
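[Diagram: the client behind the NAT sends INVITE with "m=audio 3456 RTP/AVP 0" and "c=IN IP"; the SBC forwards it as INVITE with "m=audio 3225 RTP/AVP 0" and "c=IN IP"; the far side sends "RTP to" the SBC.]
The SBC relays RTP back to the source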

130 SBC Benefits and Drawbacks
Drawbacks: Expensive media relaying; interferes with some SIP extensions; breaks more advanced SIP security
Benefits: No change to clients or NATs; works with basic SIP security mechanisms; easier to diagnose

131 Simple Traversal of UDP Through NAT (STUN)
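Client, through the NAT, to the STUN Server: "What is my IP address and port please?"
STUN Server: "It’s : 3472"
Client then sends: INVITE m=audio 3472 RTP/AVP 0 c=IN IP
The far side sends "RTP to" the address and port learned from the STUN Server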
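On the wire, the question and answer are a STUN Binding request and response. The sketch below assumes the message format of RFC 5389 (magic cookie and XOR-MAPPED-ADDRESS), which is newer than the "Simple Traversal" name on the slide; the function names and the address are hypothetical, and the response is built locally as a stand-in for a real server so the example runs without a network.

```python
import os
import socket
import struct

MAGIC_COOKIE = 0x2112A442
BINDING_REQUEST = 0x0001
BINDING_SUCCESS = 0x0101
XOR_MAPPED_ADDRESS = 0x0020


def build_binding_request(transaction_id: bytes) -> bytes:
    """20-byte STUN header with no attributes."""
    return struct.pack("!HHI", BINDING_REQUEST, 0, MAGIC_COOKIE) + transaction_id


def build_binding_success(transaction_id: bytes, ip: str, port: int) -> bytes:
    """Stand-in for the server: report the source address it saw (IPv4 only)."""
    xport = port ^ (MAGIC_COOKIE >> 16)
    xaddr = struct.unpack("!I", socket.inet_aton(ip))[0] ^ MAGIC_COOKIE
    value = struct.pack("!BBHI", 0, 0x01, xport, xaddr)
    attr = struct.pack("!HH", XOR_MAPPED_ADDRESS, len(value)) + value
    header = struct.pack("!HHI", BINDING_SUCCESS, len(attr), MAGIC_COOKIE)
    return header + transaction_id + attr


def parse_xor_mapped_address(message: bytes):
    """Return (ip, port) from a Binding success response, or None if absent."""
    msg_type, length, cookie = struct.unpack("!HHI", message[:8])
    if msg_type != BINDING_SUCCESS or cookie != MAGIC_COOKIE:
        raise ValueError("not a STUN Binding success response")
    pos, end = 20, 20 + length
    while pos + 4 <= end:
        attr_type, attr_len = struct.unpack("!HH", message[pos:pos + 4])
        value = message[pos + 4:pos + 4 + attr_len]
        if attr_type == XOR_MAPPED_ADDRESS:
            _, family, xport, xaddr = struct.unpack("!BBHI", value[:8])
            if family != 0x01:
                raise ValueError("only IPv4 is handled in this sketch")
            ip = socket.inet_ntoa(struct.pack("!I", xaddr ^ MAGIC_COOKIE))
            return ip, xport ^ (MAGIC_COOKIE >> 16)
        pos += 4 + attr_len + (-attr_len % 4)  # attributes are 32-bit aligned
    return None


tid = os.urandom(12)
request = build_binding_request(tid)
response = build_binding_success(tid, "192.0.2.1", 3472)  # placeholder address
mapped = parse_xor_mapped_address(response)
```

The client would then place the mapped address and port into the c= and m= lines of its INVITE.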

132 STUN Benefits and Drawbacks
Drawbacks: Doesn’t always work
Benefits: No change to servers or NATs; works with all SIP security mechanisms; can support non-VoIP apps (e.g., games)

133 Traversal Using Relay NAT (TURN)
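Client, through the NAT, to the TURN Server: "Give me an IP address and port please?"
TURN Server allocates : 2376
Client then sends: INVITE m=audio 2376 RTP/AVP 0 c=IN IP
The far side sends "RTP to" the allocated address on the TURN Server, which relays it to the client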

134 TURN Benefits and Drawbacks
Drawbacks: Expensive media relaying
Benefits: No change to servers or NATs; works with all SIP security mechanisms; can support non-VoIP apps (e.g., games)

135 Interactive Connectivity Establishment (ICE)
Hybrid of STUN and TURN
P2P NAT Traversal
Widely Deployed on Internet
Popular with Application Providers

136 ICE Step 1: Allocation
Before making a call, the client gathers candidates
Each candidate is a potential address for receiving media
Three different types of candidates: Host Candidates, Server Reflexive Candidates (STUN), Relayed Candidates (TURN)
Host candidates reside on the agent itself
STUN candidates are addresses residing on a NAT
TURN candidates reside on a TURN server

137 ICE Step 2: Create Offer
Each candidate is placed into an a=candidate attribute of the offer
Each candidate line has IP address and port plus other info needed for ICE
c=IN IP
t=0 0
m=audio RTP/AVP 0
a=rtpmap:0 PCMU/8000
a=candidate:1 1 UDP typ host
a=candidate:2 1 UDP typ srflx raddr rport 8998
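Part of the "other info" on each candidate line is a priority. The sketch below assumes the priority formula and recommended type preferences of RFC 5245, and the attribute layout of that RFC rather than whatever draft the slide was based on; the function names and all addresses are hypothetical, and only the port 8998 appears on the slide.

```python
# Recommended type preferences from RFC 5245: direct paths beat relayed ones.
TYPE_PREFERENCE = {"host": 126, "prflx": 110, "srflx": 100, "relay": 0}


def candidate_priority(typ: str, component_id: int = 1, local_pref: int = 65535) -> int:
    """priority = 2^24 * type pref + 2^8 * local pref + (256 - component ID)."""
    return (TYPE_PREFERENCE[typ] << 24) + (local_pref << 8) + (256 - component_id)


def candidate_line(foundation, component_id, ip, port, typ, raddr=None, rport=None):
    """Format one a=candidate attribute for a UDP candidate."""
    priority = candidate_priority(typ, component_id)
    line = f"a=candidate:{foundation} {component_id} UDP {priority} {ip} {port} typ {typ}"
    if raddr is not None and rport is not None:
        # Reflexive and relayed candidates also name the address they derive from.
        line += f" raddr {raddr} rport {rport}"
    return line


# Placeholder addresses: a private host address and its NAT mapping.
offer_lines = [
    candidate_line(1, 1, "10.0.1.1", 8998, "host"),
    candidate_line(2, 1, "192.0.2.3", 45664, "srflx", raddr="10.0.1.1", rport=8998),
]
```

With these preferences a host candidate always outranks a server reflexive one, which in turn outranks a relayed one.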

138 ICE Step 3: Send INVITE
Caller sends a SIP INVITE as normal
No ICE processing by SIP servers

139 ICE Step 4: Allocation
Called party does exactly the same processing as the caller and obtains its candidates
Recommended to not yet ring the phone!

140 ICE Step 5: Provisional Response
Callee sends a provisional response (1xx) containing its SDP with candidates
As with the INVITE, no processing by proxies
Phone has still not rung

141 ICE Step 6: Verification
Each agent pairs up its candidates (local) with its peer’s (remote) to form candidate pairs
Each agent sends a STUN-based ping on each pair, starting at the highest priority
If a response is received, the check has succeeded and we know media can flow on that pair!
[Diagram: two agents, each behind a NAT and each with a TURN Server; checks numbered 1 to 5 run between their candidates.]
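The pairing and ordering can be sketched as follows. This is a minimal sketch assuming the pair priority formula of RFC 5245 with the local agent in the controlling role; the candidate priorities reuse that RFC's recommended values, the `reachable` set is a hypothetical stand-in for real STUN checks, and all names are illustrative.

```python
from itertools import product


def pair_priority(controlling: int, controlled: int) -> int:
    """RFC 5245: 2^32 * min(G, D) + 2 * max(G, D) + (1 if G > D else 0)."""
    g, d = controlling, controlled
    return (1 << 32) * min(g, d) + 2 * max(g, d) + (1 if g > d else 0)


def build_check_list(local, remote):
    """Pair every local candidate with every remote one, best pair first."""
    pairs = [
        (pair_priority(l_prio, r_prio), l_name, r_name)
        for (l_name, l_prio), (r_name, r_prio) in product(local, remote)
    ]
    return sorted(pairs, reverse=True)


def first_working_pair(check_list, reachable):
    """Walk the list in priority order; `reachable` simulates STUN ping results."""
    for _, l_name, r_name in check_list:
        if (l_name, r_name) in reachable:
            return l_name, r_name
    return None


# (name, candidate priority) for each agent; values follow RFC 5245 defaults.
local = [("L-host", 2130706431), ("L-srflx", 1694498815), ("L-relay", 16777215)]
remote = [("R-host", 2130706431), ("R-srflx", 1694498815), ("R-relay", 16777215)]

check_list = build_check_list(local, remote)

# Assume both sides sit behind NATs, so host-to-host checks get no response.
reachable = {("L-srflx", "R-srflx"), ("L-relay", "R-relay")}
chosen = first_working_pair(check_list, reachable)
```

Because relayed pairs sort last, media only falls back to a TURN server when no more direct pair answers.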

142 ICE Benefits and Drawbacks
Drawbacks: Requires client changes; requires the other side to support it
Benefits: Always works; no change to servers or NATs; works with all SIP security mechanisms; minimum media relaying; can support non-VoIP apps (e.g., games); built-in anti-DoS; eliminates ghost rings

143 That’s it! Questions?

144 Glossary

145 Glossary (2)

146 Thanks
Enjoy Interop! To contact me:
