Understanding VoIP Dr. Jonathan Rosenberg Chief Technology Strategist Skype.



2 Understanding VoIP Dr. Jonathan Rosenberg Chief Technology Strategist Skype

3 What is this course about? Getting “under the hood” and understanding how VoIP works An exploration of the protocols and technologies behind VoIP Conveying an understanding of the various problems that need to be solved for VoIP to work

4 What this course is not about A general introduction to telephony A detailed cookbook or deployment guide to VoIP A product survey of VoIP and IP telephony products  In particular, Cisco or Skype products are not discussed except in passing

5 Ground Rules Ask Questions ANY TIME! I will be bored if this is a one-way conversation No question is too stupid Laughing at or mocking anyone’s questions is unacceptable Please ask off-the-wall or exploratory questions – there is a lot that is not in here!

6 Agenda Breaking up the problem Voice and Video coding Voice and Video Transport Quality of Service Signaling Security NAT Traversal

7 Non-Agenda Programming APIs Emergency Services, Lawful Intercept Numbering, Routing, Naming (ENUM, TRIP) PSTN Interworking Billing, Provisioning, OAM Conferencing, IVR, Applications

8 Breaking Up the Problem Endpoint IP Network Signaling Servers Directories Databases Accounting Billing Presence Servers Media Servers OAM Application Server RTP IP SIP, H.323, MGCP,H.248 SIMPLE, XMPP SIP LDAP, ENUM RADIUS DIAMETER

9 Voice Coding

10 DTMF/ Tone Generation DTMF/ Tone Detection Hybrid Echo Canceller Loss Admin Nonlinear Processing + - Silence Detection Speech Encoding Packetizer No Speech Speech Unpacker Comfort Noise Generation Speech Decoding 2-wire interface Voice Endpoint Model

11 Codecs Waveform codecs:  Directly encode speech in an efficient way by exploiting temporal and/or spectral characteristics  Attempt to reproduce input signal’s waveform by minimizing error between input and coded signals Source codecs / vocoders:  Estimate and efficiently encode a parametric representation of speech

12 CELP Minimizes perceptually weighted error  similar to waveform coders Short-term predictor is LP (vocal tract) filter Excitation is obtained from codebook and long- term pitch predictor Closed-loop search is MIPS intensive

13 Codec Comparison
Codec | Sampling | Bitrate | Latency | Comments
G.711 | 8 kHz | 64 kbps | 125 µs | PSTN codec
G.729 | 8 kHz | 8 kbps | 10 ms | CS-ACELP
G | kHz | 5.3/6.3 kbps | 37.5 ms |
AMR | 8 kHz | 4.75 – 12 kbps | 25 ms | GSM codec
G | kHz | 24/32 kbps | 40 ms | Polycom SIREN
AMR-WB | 16 kHz | kbps | 25 ms | GSM Wideband – encumbered
SILK | 8, 12, 16, 24 kHz (SWB) | 6-40 kbps | 25 ms | Skype codec
Listen at:

14 Echo Cancellation Packet Network Echo Path Estimation 2-4-wire Hybrid Non-Linear Processor + - Reflection Analog Digital Echo Canceller ERLE ERL This echo canceller cancels ‘local’ echoes from the hybrid reflection ERL: Echo Return Loss (dB) ERLE: Echo Return Loss Enhancement Double-talk Convergence time

15 Echo Canceller Specifics The voice echo path is like an electrical circuit  If a ‘break’ (cancellation) is made anywhere in the ‘circuit’, you will eliminate the echo  The easiest place to make the break is with a canceller ‘looking into’ the local analog/digital telephony network, NOT the packet network (which has much longer and variable delays) The echo canceller at the other end of the call eliminates the echoes that YOU hear, and vice versa Echo canceller coverage (e.g. 32 ms) is the maximum length of echo impulse response that can be cancelled from the local analog/digital network (the packet network delay does not matter) The non-linear processor is used to ‘clean-up’ any residual echo left over from the canceller

16 Voice Activity Detection Speech Magnitude (dB) Speech Detected Hang-Over Speech Detected Hang-Over time Sentence 1Sentence 2 Typically fixed at 200 ms Noise Floor Signal-to- Noise Threshold Front-end Speech Clipping Front-end Speech Clipping

17 Comfort Noise Generation Silence isn’t golden…it’s annoying  When speech stops…what do you play to the listener? Simple techniques:  Play white/pink noise  Replay last received packet over and over Fancier technique:  Transmitter measures local “noise environment”  Transmitter sends special “comfort noise” (CN) packet as last packet before silence  Receiver generates noise based on the CN packet.

18 Voice Quality: Mean Opinion Scores MOS of 4.0 = Toll Quality Source Impairment Codec ‘X’ Channel Simulation “Nowadays, a chicken leg is a rare dish”
Rating | Speech Quality | Distortion
5 | Excellent | Imperceptible
4 | Good | Just perceptible but not annoying
3 | Fair | Perceptible and slightly annoying
2 | Poor | Annoying but not objectionable
1 | Unsatisfactory | Very annoying and objectionable

19 Clear Channel MOS’s Mean Opinion Score 5 G.711 (64 kbit/s PCM) 4.1 G.726 (32 kbit/s ADPCM) G (6.4 kbit/s MP-MLQ) G.729 (8 kbit/s CS-ACELP) IS-54 (8 kbit/s NA Dig Cellular)

20 MOS Under Varying Conditions

21 Video Coding

22 Key Terms
Term | Description
Frame | An individual picture in a sequence that makes up the video
Frame Rate | The number of frames per second in video. 30 is excellent (TV quality)
Resolution | The number of horizontal and vertical pixels. VGA=640x480.
Interlacing | A mechanism for transmitting video by splitting a frame into two fields, one field representing the odd lines and one the even lines. This is the “i” in 1080i
Progressive | As opposed to interlaced, a method for transmitting video by sending each frame as a whole.
HD | High Def resolutions – 720p is 1280x720 with 60fps. 1080i is 1920x1080 at 30fps

23 Key Concept: Macroblocks Rectangular block in an image which is a basic unit of compression. Typically 16x16 pixels.

24 Key Concept: Inter-Frame Prediction Encode Predict information in the current frame by looking at previous frames, possibly taking into account motion.

25 Key Concept: Discrete Cosine Transform (DCT) A technique for representing a macroblock by its component frequencies. Discarding the higher frequencies throws away the finer details without losing the core image. Increasing horizontal frequencies Increasing vertical frequencies

26 Video Encoder Block Diagram

27 Key Codec Comparisons
Codec | Timeline | Applications
H | | ISDN at multiples of 64 kbps
H | | Early Flash using Sorenson Spark implementation. Original RealVideo codec. Required in IMS.
H.264-AVC | 2003 | YouTube, iTunes, Blu-ray; most modern video conferencing. The current primary video codec for real-time. Typical VGA 15fps bitrate = 500 kbps
H.264-SVC | 2007 | “Layered” video that provides improved quality and resilience; ideal for multiparty video conferencing.
VP7 | 2005 | On2 Technologies codec; Skype, successor to H263 in Flash

28 Voice and Video Transport: RTP

29 RTP: What is it? Real Time Transport Protocol RFC 3550  product of avt working group  1996 proposed standard – RFC1889  2004 full standard What does it do  e2e transport of real time media  optimized for multicast  provides sequencing, timing, framing, loss detection  provides feedback on reception quality What does it do (cont)  provides information on group members  provides data to correlate audio and video and other media  Works with any codec; needs a payload format for each codec  Flexible

30 RTP: What isn’t it? Doesn’t guarantee quality of service  doesn’t reserve network resources  doesn’t guarantee no loss or bounded delay  can work with QoS protocols (RSVP) Doesn’t provide signaling  other protocols must be used to set up RTP (like SIP or H.323) Not a specific protocol type  Does not run directly on top of IP  Runs on top of UDP  No fixed port number


32 Big Picture: RTP, SDP and SIP End User End User Proxy IP Network SIP w/ SDP C=IN IP m=audio RTP/AVP m=video RTP/AVP a=rtpmap:98 h263 RTP

33 RTP Components: Data + Control Data aka RTP  very confusing Usually on an even UDP port (NATs change this – later) Provides  sequencing  timing  framing  content labeling  User identification Control = RTP Control Protocol (RTCP) Same address as data, but one higher port usually Provides  reception quality  sender statistics  participant information (multicast)  synchronization information

34 Real Time Data Transport Originator breaks stream into packets (segmentation)  application layer framing (ALF)!!! Packets sent; network may lose, delay, reorder packets Must, at receiver:  reorder  recover  resegment  resynchronize  clock synchronization! RTP Source RTP Sink RTP Packets

35 Transport System Source  Digitize Audio from mike  Silence Suppression  Echo cancellation  Compress Audio G.711: 64 kbps G.729: 8 kbps G.723.1: 5.3/6.3 kbps  Packetize Audio in RTP  Send Sink  Receive packets  Un-packetize  decompress  comfort noise generation  reorder  recover loss  jitter buffer  D/A conversion to speakers

36 Jitter Buffer Packets delayed differently Must play them out periodically Packets may arrive after designated playout time -> loss Insert extra delay to compensate May need to adapt this amount time pkts
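The playout rule on the Jitter Buffer slide can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: a fixed playout delay, sender and receiver clocks treated as synchronized, and hypothetical packet timings. The function name `playout_schedule` is made up for this example; a real receiver would derive playout times from RTP timestamps and adapt the delay over time.

```python
def playout_schedule(packets, playout_delay_ms):
    """Return (playout_time_ms, on_time) for each packet.

    packets: list of (send_time_ms, arrival_time_ms) pairs.
    A packet is played at send_time + playout_delay. If it arrives
    after that instant it is useless to the decoder and counts as lost.
    """
    schedule = []
    for send_ms, arrival_ms in packets:
        playout_ms = send_ms + playout_delay_ms
        schedule.append((playout_ms, arrival_ms <= playout_ms))
    return schedule


# Hypothetical 20 ms packets with network delays of 30, 55, 25 and 70 ms.
packets = [(0, 30), (20, 75), (40, 65), (60, 130)]

for delay in (50, 80):
    late = sum(1 for _, ok in playout_schedule(packets, delay) if not ok)
    print(f"playout delay {delay} ms: {late} late packet(s)")
```

With the shorter delay two packets miss their playout time; inserting extra delay removes the loss at the cost of added end-to-end latency, which is the trade-off the slide describes.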

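37 RTP Packet Header
    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |V=2|P|X|  CC   |M|     PT      |       sequence number         |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                           timestamp                           |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |           synchronization source (SSRC) identifier            |
   +=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+
   |            contributing source (CSRC) identifiers             |
   |                             ....                              |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+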

38 RTP Header Fields Version: 2 P: indicates padding (for encryption) X: extension bit CSRC count: for mixers (later) M: Marker Bit: indicates framing  audio codecs: first packet in talkspurt  video: last packet in frame Payload Type: indicates encoding  in RTP packet allows changes per-packet  Useful for: adaptation DTMF codec silence codecs SN: defines ordering of packets Timestamp: when packet was generated SSRC: identifier CSRC: list of mixed users
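To make the header fields concrete, here is a minimal Python sketch that unpacks the fixed 12-byte RTP header plus any CSRC entries. It follows the RFC 3550 layout; the function name `parse_rtp_header` and the sample values are illustrative, and header extensions and padding are not processed.

```python
import struct


def parse_rtp_header(packet: bytes) -> dict:
    """Decode the fixed RTP header and the CSRC list (RFC 3550 layout)."""
    if len(packet) < 12:
        raise ValueError("an RTP header needs at least 12 bytes")
    # Network byte order: two flag bytes, 16-bit SN, 32-bit TS, 32-bit SSRC.
    b0, b1, seq, timestamp, ssrc = struct.unpack("!BBHII", packet[:12])
    csrc_count = b0 & 0x0F
    csrc_end = 12 + 4 * csrc_count
    if len(packet) < csrc_end:
        raise ValueError("packet too short for the advertised CSRC count")
    csrc = list(struct.unpack("!%dI" % csrc_count, packet[12:csrc_end]))
    return {
        "version": b0 >> 6,
        "padding": bool(b0 & 0x20),
        "extension": bool(b0 & 0x10),
        "csrc_count": csrc_count,
        "marker": bool(b1 & 0x80),
        "payload_type": b1 & 0x7F,
        "sequence_number": seq,
        "timestamp": timestamp,
        "ssrc": ssrc,
        "csrc": csrc,
    }


# Hypothetical packet: version 2, marker set (start of talkspurt), PT 0 (PCMU).
sample = struct.pack("!BBHII", 0x80, 0x80, 4711, 160, 0x1234ABCD) + b"\x00" * 160
header = parse_rtp_header(sample)
print(header["version"], header["marker"], header["payload_type"])
```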

39 RTP Timestamp Tick units are dependent on codec  For speech: 125 microseconds (standard 8 kHz sampling rate)  For video: 90 kHz  For audio: 44.1 kHz (CD rate) Gaps in TS, but not in SN mean silence Initial value random for security Video  Timestamp represents time at beginning of frame  Many packets may have same timestamp Speech  Time per packet may vary  Depends on packetization: ms typical
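As a worked example of the tick arithmetic, the Python sketch below computes the timestamp increment per packet and detects the "gap in TS but not in SN" pattern that signals silence. The function names and the sample packet values are hypothetical, and the 20 ms and 30 ms packet durations are only example inputs.

```python
def timestamp_increment(clock_rate_hz, packet_ms):
    """Number of RTP timestamp ticks covered by one packet."""
    return clock_rate_hz * packet_ms // 1000


def silence_gap_ticks(prev_seq, prev_ts, seq, ts, ticks_per_packet):
    """Ticks of silence between two packets, or None if packets were lost.

    A consecutive sequence number with a timestamp jump larger than one
    packet means the sender suppressed silence rather than the network
    dropping packets. Both counters wrap, so differences are masked.
    """
    if seq != (prev_seq + 1) & 0xFFFF:
        return None  # sequence gap: loss or reordering, not silence
    return ((ts - prev_ts) & 0xFFFFFFFF) - ticks_per_packet


speech_ticks = timestamp_increment(8000, 20)   # 8 kHz speech, 20 ms packets
video_ticks = 90000 // 30                      # 90 kHz clock, 30 frames/s
gap = silence_gap_ticks(10, 1600, 11, 9760, speech_ticks)
print(speech_ticks, video_ticks, gap / 8000, "s of silence")
```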

40 Payload Formats Each codec needs a way to be encapsulated in RTP RFC 3551 (the RTP audio/video profile) defines payload formats for many common codecs  G.711, G.729, G.723.1, G.722, etc.  Some simple video More complex codecs have their own payload format documents  MPEG  H.263 and H.261 Payload format defines  How to break frame into packets  extra fields needed below main RTP header

41 Advanced Topics DTMF and Tones  RFC 2833  Special codecs for encoding touch tones (DTMF) and other signals  Can send either the waveform (frequency, amplitude)  Or the actual signal (#, 8, 0) Compressed RTP  RFC 2508  For dialup links  Don’t send header, just send index  Far side uses index to retrieve header, and then increments certain fields

42 Quality of Service

43 The problem we are trying to solve is to give “better” service to some at the expense of giving worse service to others — QoS fantasies to the contrary, it’s a zero sum game - Van Jacobson

44 Quality of Service So, what’s the problem? Toll Quality Early I-Phone Technology Improving I-Phone means: Lower PC Delay Lower Network Latency Tighten Network Jitter Satellite Zone CB Zone Fax Relay, Broadcast Private Network VoFR & VoIP Technology

45 Delay Budget Device sample capture Encode delay (algorithmic delay + processing delay) Packetization/framing Move to output queue/queueing delay Access (up) link transmission Backbone network transmission Access (down) link transmission Input queue to application Jitter buffer Decode processing delay Device playout delay “The Network”

46 Some Techniques to Improve “Network QoS” RED — Random Early Drop (or “Detect”) WFQ — Weighted Fair Queuing Intserv/RSVP — ReSerVation Protocol IP Precedence  DiffServ CRTP — Compressed Realtime Transport Protocol MCML — Multi-Class Multi-Link PPP

47 Random Early Detect (RED) this is Basic Hygiene! Objectives  Keep average queue size low – good for voice  Fairness – bigger streams punished more  Avoid synchronization Only works with loss responsive transport protocols Algorithm – probabilistic dropping of packets Queue Size Drop Probability 1 MinMax
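The probabilistic dropping curve can be sketched as follows. This is a simplified Python illustration of the classic RED shape (no drops below the minimum threshold, a linear ramp between the thresholds, certain drop above the maximum). The parameter names, the `max_p` ceiling of the ramp, and the averaging weight are assumptions for this example, not values from the slides.

```python
import random


def update_average(avg_queue, current_queue, weight=0.002):
    """Exponentially weighted moving average of the queue length."""
    return (1 - weight) * avg_queue + weight * current_queue


def red_drop_probability(avg_queue, min_th, max_th, max_p=0.1):
    """Drop probability as a function of the *average* queue size."""
    if avg_queue < min_th:
        return 0.0
    if avg_queue >= max_th:
        return 1.0
    return max_p * (avg_queue - min_th) / (max_th - min_th)


def should_drop(avg_queue, min_th, max_th, max_p=0.1, rng=random.random):
    """Randomly decide whether to drop an arriving packet."""
    return rng() < red_drop_probability(avg_queue, min_th, max_th, max_p)


for avg in (5, 20, 30):
    print(avg, red_drop_probability(avg, min_th=10, max_th=30))
```

Because drops are spread randomly across arrivals, flows that send more packets see more drops, and loss-responsive senders back off at different times instead of all at once.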

48 Poll: Will RED Help Voice? YesNo Voice not loss responsive Mixing voice and data in same queue bad Voice queues usually not congested

49 Weighted Fair Queueing Each flow “sees” a dedicated amount of bandwidth Bj A packet arriving at time t is transmitted at time t + size/Bj B = B1 + B2 + B3

50 What’s the Problem? WFQ is unrealizable because  Variable packet sizes  Causality Example:  Link speed 100Kbps  Flow 1: 10Kbps  Flow 2: 90Kbps ms Theory 128ms Actual

51 Approximations of WFQ Many PhDs written with approximate and implementable algorithms Algorithms differ in their delay bound  How much worse than perfect WFQ is this? Delay bounds a function of bandwidth, number of queues, other params Algorithms SCFQ: Self-Clocked Fair Queueing WF2Q: Worst-Case Fair Weighted Fair Queueing FBFQ: Frame-Based Fair Queueing PGPS: DRR:

52 WFQ Voice Configuration How to pick allocated bandwidth?  Consider G.711, 30ms framing (74.6Kbps) If Bi = 74.6kbps, delay is at least 30ms If Bi = 149.2Kbps, delay at least 15ms  Must set voice queue bandwidth at least 2x actual voice usage to keep delays down!  Unused bandwidth will go to data Need an accurate WFQ Implementation
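The numbers on the WFQ Voice Configuration slide can be reproduced with a short Python calculation. It assumes 40 bytes of IP/UDP/RTP headers per packet and no link-layer overhead, and it uses the idealized WFQ rule that a packet of a given size is served in size/Bi. The function names are invented for this sketch.

```python
def packet_bits(codec_bps, frame_ms, header_bytes=40):
    """Bits per packet: codec payload plus IP (20) + UDP (8) + RTP (12) headers."""
    payload_bytes = codec_bps * frame_ms / 1000 / 8
    return (payload_bytes + header_bytes) * 8


def stream_bitrate_bps(codec_bps, frame_ms, header_bytes=40):
    """On-the-wire bitrate of the voice stream, headers included."""
    return packet_bits(codec_bps, frame_ms, header_bytes) * 1000 / frame_ms


def wfq_service_delay_ms(bits, allocated_bps):
    """Time an ideal WFQ scheduler needs to serve one packet at rate Bi."""
    return bits / allocated_bps * 1000


bits = packet_bits(64000, 30)             # G.711, 30 ms framing
rate = stream_bitrate_bps(64000, 30)      # about 74.7 kbps
print(round(rate / 1000, 1), "kbps")
print(wfq_service_delay_ms(bits, rate), "ms at Bi = stream rate")
print(wfq_service_delay_ms(bits, 2 * rate), "ms at Bi = 2x stream rate")
```

Allocating exactly the stream rate gives a per-packet delay equal to the framing interval; doubling the allocation halves it, which is why the slide recommends at least 2x.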

53 Priority Queueing Emulates the familiar “elite airport line” experience Voice and data packets in separate queues If there are any packets in the voice queue, they are serviced first Voice Data Server

54 Priority Queueing Considerations Easy to configure – no bandwidth values required Main problem – data starvation Need to police voice queue Doesn’t work as well when there is other non-voice high priority traffic (video) Head-of-Line Blocking from data queue

55 Intserv: Integrated Services Guaranteed Service (RFC 2212)  Mathematically provable bounds on end-to-end datagram queuing delay/bandwidth Controlled Load Service (RFC 2211)  Approximate QoS from an unloaded network for delay/bandwidth Describe traffic with a “TSPEC” r= token bucket rate b= token bucket depth p= peak transmission rate m= minimum (policed) packet size M= maximum packet size Describe endpoints with a « FlowSpec »  Source/Destination IP addresses, ports, protocol RSPEC/FSPEC provides the policy to the queuing/scheduling algorithms

56 RSVP Design Signaling distinct from routing (modularity, deployability, evolvability) Soft state (robustness, simplicity) Transparent operation across non-RSVP routers (deployability) Support shared and distinct reservations Applies to unicast & multicast applications Simplex & receiver-oriented.

57 RSVP protocol PATH: Source → Destination  Traffic parameters of source  Collects info on network capabilities  Detects current route RESV: Destination → Source  Receiver selected Int-Serv service  Traffic parameters of receiver selected reservation  Follows route detected by PATH, in reverse  Reservation actually nailed in network RSVP messages carried over IP Can also be carried over UDP but few people do that path Src Dest. resv

58 RSVP: Admission Control Route Selection Interface 1 Interface N Routing Protocol Routing Database Packets In Packets Out Admission Control Resource Utilization Database Switching Routing Queuing Policy Database Flow Request Reservation Protocol Packet Scheduler

59 Intserv/RSVP Acceptance Time Enthusiasm Today ISP Intserv/RSVP will solve the world’s QoS Cool thing to say: “RSVP does not scale” vBNS RSVP over ATM transparently transport RSVP Real value Today Enterprise RSVP for VoIP in Enterprise

60 IP Precedence & Diffserv “Poor man’s” approach to QoS Set IP Precedence/DSCP higher on voice packets  This puts them in a different queue, resulting in isolation from best effort traffic  Can be done by endpoint, proxy, or in routers through heuristics Scales better than RSVP –  Keeps QoS control “local”  Pushes work to the edges and boundaries  Can provide bulk QoS by customer or network No admission control  Too much high-precedence traffic can still swamp the network

61 Diffserv Architectural Model Clouds — regions of relative homogeneity:  Administrative control  Technology  Bandwidth Within a cloud, QoS managed by local rules Hard work confined to boundaries of clouds:  Classification  Conditioning/Policing QoS information exchange limited to boundaries  Bi-lateral, not multi-lateral  Not necessarily symmetric Me Not Me Also Not Me Far Away

62 Diffserv Scalability Fundamental assumptions:  Relatively small number of feasible queuing/scheduling algorithms for high link speeds  Number of individual flows is large  Many different rules, often policy driven Group packets explicitly by the “Per-hop behavior (PHB)” they are to get  Queue service  Shaping/policing Nodes in the middle of a cloud only have to deal with traffic aggregates

63 Diffserv Forwarding via PHBs PHBs map to DSCPs (Diffserv Code Points)  Values chosen for backward-compatibility with IPv4 TOS byte including IP Precedence (RFC 2474) Packets with different DSCPs may be reordered Forwarding resources partitioned by PHB/DSCP

64 Assured Forwarding PHB (AF*) Four independent classes Within each class, three levels of drop precedence  A congested AF node discards packets with higher drop precedence first  Packets with lowest drop precedence must be within the subscribed profile *RFC2597

65 Expedited Forwarding PHB (EF*) Targeted at VoIP and “virtual leased lines” Roughly equivalent to priority queuing, with a safety measure to prevent starvation Implications:  No more than 50% of a link can be EF see RFC3247,3248 for interesting mathematical analyses  Worst case jitter at each hop is max of: number of EF microflows in the aggregate, or a single MTU packet of some other aggregate *RFC3246

66 Diffserv Traffic Conditioner Classifier: selects a packet in a traffic stream based on the content of some portion of the packet header Meter: checks compliance to traffic parameters (e.g. Token Bucket) and passes result to marker and shaper/dropper to trigger particular action for in/out-of-profile packets Marker: writes/rewrites DSCP Shaper: delay some packets for them to be compliant with the profile Packets Shaped Dropped Meter ClassifierMarker Shaper / Dropper

67 Diffserv Acceptance Time Enthusiasm today Diffserv will solve the world’s QoS Diffserv Engineering? Diffserv SLA ? Internet e2e SLA? Diffserv Design & Deployment intra Domain Real value Inter-SP Diffserv and end-to-end Internet QoS need further standardisation and commercial arrangements

68 Mixing Intserv & Diffserv: Aggregation Host signals with RSVP Edge or transit domains  Aggregate reservations mark packets using DSCP In transit domains  Blindly transfer end to end reservations using another IP Protocol Number - change at edge  Routers detect egress of reservation (deaggregation) on transfer from an interior or aggregator interface to an exterior (deaggregating) interface Aggregate reservation size varies with load Edge Backbone

69 RTP Compression 8kbit/s yields 20 byte payload IP header 20; UDP header 8; RTP header 12  Twice size of payload! Header compression: 40 bytes to 2-4 most of the time Hop-by-hop: use only on the slow links
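The overhead arithmetic can be checked with a few lines of Python. The sketch assumes an 8 kbps codec with 20 ms packets (which is what yields the 20-byte payload) and ignores link-layer framing. The function name `ip_bandwidth_bps` is illustrative.

```python
def ip_bandwidth_bps(codec_bps, frame_ms, header_bytes):
    """Bandwidth of a voice stream including per-packet header bytes."""
    payload_bytes = codec_bps * frame_ms // 8000   # bits/s * ms -> bytes
    return (payload_bytes + header_bytes) * 8 * 1000 / frame_ms


full = ip_bandwidth_bps(8000, 20, header_bytes=20 + 8 + 12)  # IP + UDP + RTP
crtp_best = ip_bandwidth_bps(8000, 20, header_bytes=2)
crtp_typical = ip_bandwidth_bps(8000, 20, header_bytes=4)

print(full, crtp_best, crtp_typical)
```

Uncompressed, the 8 kbps codec costs 24 kbps on the wire; with the header squeezed to 2-4 bytes it drops to roughly 9-10 kbps, which is why compression pays off on slow links.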

70 Sample Delay Budget (G kbps)

71 Sample Delay Budget (G kbps)

72 Signaling: SIP

73 SIP is one of Many ITU H.323  Originally for video conferencing  The first standard protocol for VoIP  Still in wide usage, but negative growth MGCP  Dumb phones controlled by smart server  “Softswitch” – PSTN emulation view Megaco/H.248  Standard version of MGCP

74 Core SIP Functions Establishment of peer to peer sessions Management of peer to peer sessions  Keepalives  Graceful and Non-graceful termination Rendezvous  Forking  Search Policy Based Routing Loose Routing Mobility  Limited terminal mobility  Device Mobility

75 Core SIP Functions Secure User Identification Exchange and Management of Media Session data User registration Capability declaration Capability query Reliability

76 SIP Technology Community SIP RFC3261 DNS 3263 Events 3265 Rel 3262 O/A 3264 RTP SDP SIMPLE SigComp SIP Extensions ENUM MIDCOM STUN ROHC

77 SIP Design Philosophy Patterned after other Successful Internet Standards  HTTP Don’t Reinvent the PSTN General Purpose Functionality Do Not Dictate Architectures or Services It needs to work on any IP Network Leverage the Best of Existing Standards URLs MIME RFC822 Scalability Push state to the edge

78 Basic Design Request/Response Protocol SIP is a Peer Protocol – all entities send requests and receive requests Modelled after HTTP Each request invokes method  Main purpose of request Messages contain bodies Agent request response

79 Transactions Fundamental unit of messaging exchange  Request  Zero or more provisional responses  Usually one final response  Maybe ACK All signaling composed of independent transactions Identified by Cseq  Sequence number  Method tag INVITE ACK BYE 200 First Transaction Second Transaction Cseq: 1 Cseq: 2

80 Session Independence Body of SIP message used to establish call describes the session Session could be  Audio  Video  Game SIP operation is independent of type of session SIP Bodies are MIME objects  MIME = Multipurpose Internet Mail Extensions  Mechanisms for describing and carrying opaque content  Used with HTTP and

81 Protocol Components User Agent  End systems  Hard and soft phones  PSTN Gateways  Phone Adaptors  Media Servers  Anything that originates or terminates SIP calls Proxy  SIP server responsible for relaying and processing requests between user agents  Main job: where to send request next? Back-to-Back User Agent (B2BUA)  SIP server that terminates and re-originates SIP  SBCs, Call Agents, etc.

82 SIP Addressing SIP addresses are URL’s URL contains several components  Scheme (sip)  Username  Hostname  Optional port  Parameters  Headers and Body SIP allows any URI type  tel URIs  http URLs for redirects  mailto URLs  leverage vast URI infrastructure user=host?Subject=foo

83 The SIP Trapezoid SIP RTP

84 SIP Methods INVITE  Invites a participant to a session  idempotent - reINVITEs for session modification BYE  Ends a client’s participation in a session CANCEL  Terminates a search OPTIONS  Queries a participant about their media capabilities, and finds them, but doesn’t invite ACK  For reliability and call acceptance REGISTER  Informs a SIP server about the location of a user

85 SIP Architecture Request Response Media Corp DB

86 SIP Message Syntax Many header fields from http Payload contains a media description  SDP - Session Description Protocol INVITE SIP/2.0 From: J. Rosenberg ;tag=76ah Subject: Conference Call To: John Smith Via: SIP/2.0/UDP ;branch= z9hG4bK74bf9 Call-ID: Content-type: application/sdp CSeq: 4711 INVITE Content-Length: 187 v=0 o=user IN IP s=Sales c=IN IP t=0 0 m=audio 3456 RTP/AVP 0

87 SIP Address Fields Request-URI  Contains address of next hop server  Rewritten by proxies based on result of Location Service To  Address of original called party  Contains optional display name From  Address of calling party  Optional display name INVITE SIP/2.0 From: J. Rosenberg ;tag=76ah Subject: Conference Call To: John Smith Via: SIP/2.0/UDP ;branch= z9hG4bK74bf9 Call-ID: Content-type: application/sdp CSeq: 4711 INVITE Content-Length: 187 v=0 o=user IN IP s=Sales c=IN IP t=0 0 m=audio 3456 RTP/AVP 0

88 SIP Responses Look much like requests  Headers, bodies Differ in top line  Status Code Numeric, Meant for computer processing Protocol behavior based on 100s digit Other digits give extra info  Reason Phrase Text phrase for humans Can be anything Status Code Classes  (1XX): Informational  (2XX): Success  (3XX): Redirection  (4XX): Client Error  (5XX): Server Error  (6XX): Global Failure Two groups  1XX: Provisional Not reliable  2XX-6XX: Final, Definitive Example  200 OK  180 Ringing
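Since protocol behavior depends only on the hundreds digit, a client can classify any response with integer division. The Python sketch below illustrates this; the function name `classify_status` is made up for the example.

```python
STATUS_CLASSES = {
    1: "Informational",
    2: "Success",
    3: "Redirection",
    4: "Client Error",
    5: "Server Error",
    6: "Global Failure",
}


def classify_status(status_code):
    """Return (class_name, is_final) for a SIP status code."""
    if not 100 <= status_code <= 699:
        raise ValueError("SIP status codes range from 100 to 699")
    hundreds = status_code // 100
    # 1XX responses are provisional; everything else ends the transaction.
    return STATUS_CLASSES[hundreds], hundreds >= 2


for code in (180, 200, 486):
    print(code, classify_status(code))
```

An unknown code such as 499 is still handled correctly as a final client error, which is the point of keying behavior on the first digit.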

89 Example SIP Response Note how only difference is top line Rules for generating responses  Call-ID, To, From, Cseq are mirrored in response  Branch parameter used as transaction ID  Tag added to To field to identify dialog SIP/ OK From: J. Rosenberg ;tag=76ah To: John Smith ;tag=112 Via: SIP/2.0/UDP ;branch= z9hG4bK74bf9 Call-ID: Content-type: application/sdp CSeq: 4711 INVITE

90 SIP Transport SIP Messages over UDP or TCP/TLS or SCTP Reliability mechanisms defined for UDP UDP More Widely Used  Faster  No connection state TCP preferred these days  NAT  Larger SIP messages Reliability mechanisms depend on SIP request method  INVITE  anything except INVITE Reason: optimized for phone calls

91 Registrations REGISTER creates mapping in server from one URI to another REGISTER properties  UA location in Contact  Registrar identified in Request URI  Identifies registered user in To and From field  Expires header indicates desired lifetime Can be different for each Contact Registrations are soft-state REGISTER SIP/2.0 To: From: Call-ID: CSeq: 123 REGISTER Contact: Expires: 3600 to

92 Registration Handling Registrar is logical function handling REGISTER Registrar steps:  Authenticate  Authorize  Add Binding  Lower expiration  Return all currently registered UA (can be more than one) SIP/ OK To: From: Call-ID: CSeq: 123 REGISTER Contact: Contact:

93 Forking A proxy may have more than one address for a user  Happens when more than one SIP URL is registered for a user  Can happen based on static routing configuration In this case, proxy may fork Forking is when proxy sends request to more than one proxy at once First 200 OK that is received is forwarded upstream All other unanswered requests cancelled INVITE INVITE INVITE

94 Routing of Subsequent Requests Initial SIP request sent through many proxies No need per se for subsequent requests to go through proxies Each proxy can decide whether it wants to receive subsequent requests  Inserts Record-Route header containing its address For subsequent requests, users insert Route header  Contains sequence of proxies (and final user) that should receive request Proxy UA1 UA2 INVITE BYE

95 Setting up the Session INVITE contains the Session Description Protocol (SDP) in the body SDP conveys the desired session from the caller’s perspective  Session consists of a number of media streams  Each stream can be audio, video, text, application, etc. Also contains information needed about the session  codecs  addresses and ports SDP also conveys other information about session  Time it will take place  Who originated the session  subject of the session  URL for more information SDP origins are multicast sessions on the mbone  Originator of INVITE is not originator of session

96 Anatomy of SDP SDP contains informational headers  version (v)  origin (o) - unique ID  information (i) Time of the session Followed by a sequence of media streams Each media stream contains an m line defining  port  transport  codecs Media Stream also contains c line  Address information v=0 o=user IN IP s=Mbone Audio i=Discussion of Mbone Engineering Issues t=0 0 m=audio 3456 RTP/AVP 0 78 c=IN IP a=rtpmap:78 G723 m=video 4444 RTP/AVP 86 c=IN IP a=rtpmap:86 H263

97 Negotiating the Session Called party receives SDP offered by caller Each stream can be  accepted  rejected Accepting involves generating an SDP listing same stream  port number and address of called party  subset of codecs from SDP in request Rejecting indicated by setting port to zero Resulting SDP returned in 200 OK Media can now be exchanged v=0 o=user IN IP t=0 0 m=audio 3456 RTP/AVP 0 c=IN IP m=video 0 RTP/AVP 86 c=IN IP Audio stream accepted, PCMU only. Video stream rejected
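The accept/reject rules can be sketched as a small Python function that turns an offer into an answer. Streams are modeled as plain dictionaries rather than parsed SDP, the names `build_answer` and `to_m_line` are hypothetical, and codecs are matched by payload type number only. A real implementation would compare the `a=rtpmap` encoding names for dynamic payload types.

```python
def build_answer(offer, supported, local_ports):
    """Answer each offered stream: common codecs, or port 0 to reject.

    offer: list of {"media", "port", "formats"} dicts, in m-line order.
    supported: media type -> set of payload types the callee can handle.
    local_ports: media type -> port the callee wants to receive on.
    """
    answer = []
    for stream in offer:
        media = stream["media"]
        common = [pt for pt in stream["formats"] if pt in supported.get(media, ())]
        if common:
            answer.append({"media": media, "port": local_ports[media], "formats": common})
        else:
            # Rejected streams stay in the answer, with the port set to zero.
            answer.append({"media": media, "port": 0, "formats": list(stream["formats"])})
    return answer


def to_m_line(stream):
    formats = " ".join(str(pt) for pt in stream["formats"])
    return "m=%s %d RTP/AVP %s" % (stream["media"], stream["port"], formats)


offer = [
    {"media": "audio", "port": 3456, "formats": [0, 78]},
    {"media": "video", "port": 4444, "formats": [86]},
]
answer = build_answer(offer, supported={"audio": {0}}, local_ports={"audio": 3456})
print([to_m_line(s) for s in answer])
```

The answer keeps one m-line per offered stream, in the same order, so both sides can pair streams by position.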

98 Changing Session Parameters Once call is started, session can be modified Possible changes  Add a stream  Remove a stream  Change codecs  Change address information Call hold is basically a session change Accomplished through a re-INVITE Same session negotiation as INVITE, except in middle of call Rejected re-INVITE - call still active! INVITE 200 ACK INVITE 200 ACK reINVITE

99 Hanging Up How to hang up depends on when and who After call is set up  either party sends BYE request From caller, before call is accepted  send CANCEL  BYE is bad since it may not reach the same set of users that got INVITE  If call is accepted after CANCEL, then send BYE From callee, before accepted  Reject with 486 Busy Here C S INVITE 100 HangupAccept CANCEL 200 OK ACK BYE 200 OK

100 Call Flow for basic call: UA to proxy to UA Call setup  100 trying hop by hop  180 ringing  200 OK acceptance Call parameter modification  re-INVITE  Same as initial INVITE, updated session description Termination  BYE method INVITE 100 Trying INVITE 100 Trying 180 Ringing 200 OK ACK BYE 200 OK RTP

101 Privacy and Identity RFC 3325: A Private Extension for Asserted Identity in Trusted Networks RFC 3323: A Privacy Mechanism for SIP RFC 4474: SIP Identity

102 RFC3325 Asserted Identity Trust Domain Authenticates Caller and verifies identity. Adds PAID. INVITE P-Asserted-Identity:

103 RFC3323 – SIP Privacy Trust Domain INVITE P-Asserted-Identity: From: anonymous INVITE Privacy: id From: anonymous Anonymous Caller INVITE From: anonymous

104 4474: SIP Identity Authenticates Caller and verifies identity. Signs Request. INVITE From: Identity: asd87f7as66sda8z INVITE From: Verifies Signature Only useful for addresses!

105 Transfers and Dialog Movement: REFER (RFC 3515) Joe Alice Bob REFER Refer-To: Bob INVITE INVITE Bob Referred-By: Joe

106 Third Party Call Control (3pcc): RFC 3725 RTP INVITE no SDP 200 SDP A INVITE SDP A 200 SDP B ACK SDP B

107 SIP and Quality of Service RFC 3312: Integration of Resource Management with SIP Problem  How to make sure phone doesn’t ring unless resources are reserved Solution  SIP does not do resource reservation!  SIP INVITE tells far side not to ring  Both sides do regular QoS reservations RSVP PDP context activation  UPDATE to change state INVITE w. Preconditions 183 Progress QoS Reservations UPDATE w. Preconditions 180 Ringing 200 OK ACK

108 Security

109 VoIP Security The only totally secure system I know of is a rock - Tony Lauck, circa 1985

110 But Even Rocks can be Insecure..

111 It Had a Great User Interface

112 But it had a serious security vulnerability…

113 VoIP Attacks
Attack | Solution
Free Calls aka Toll Fraud | User Authentication
Impersonation | User Authentication, Secure Caller ID
Learning Private Information (calling patterns, PIN codes) | SIP Encryption, Media Encryption
Steal Calls | SIP Encryption, Media Encryption
DoS | ICE, Others

114 SIP User Authentication RTP We want this SIP server to authenticate this user and this SIP server to authenticate this user

115 SIP Digest Authentication Hi, I’d like to SIP REGISTER 401 – OK, try again. Nonce=a7szh1 REGISTER Nonce=a7szh1 Username=joe Digest=z0v88a6 Digest= Hash(joe, a7szh1, myPassword) OK, done! Digest= Hash(joe, a7szh1, myPassword) = z0v88a6
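The slide abbreviates the computation as Hash(user, nonce, password). As a hedged illustration, the Python sketch below follows the basic HTTP Digest construction from RFC 2617 without the optional `qop` extensions, which is the scheme SIP borrows. The realm and request URI are hypothetical values, and the function names are invented for this example.

```python
import hashlib
import hmac


def md5_hex(text):
    return hashlib.md5(text.encode("utf-8")).hexdigest()


def digest_response(username, realm, password, nonce, method, uri):
    """Digest response without qop: MD5(HA1 : nonce : HA2)."""
    ha1 = md5_hex(f"{username}:{realm}:{password}")  # who is asking
    ha2 = md5_hex(f"{method}:{uri}")                 # what is being asked
    return md5_hex(f"{ha1}:{nonce}:{ha2}")


def server_accepts(stored_password, username, realm, nonce, method, uri, received):
    """Server recomputes the digest from its own copy of the password."""
    expected = digest_response(username, realm, stored_password, nonce, method, uri)
    return hmac.compare_digest(expected, received)


# Hypothetical realm and URI; nonce and credentials as on the slide.
args = ("joe", "example.com", "a7szh1", "REGISTER", "sip:example.com")
client_digest = digest_response(args[0], args[1], "myPassword", *args[2:])
print(server_accepts("myPassword", *args, client_digest))
```

The password itself never crosses the network; only the nonce and the resulting digest do.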

116 Offline Dictionary Attack REGISTER Nonce=a7szh1 Username=joe Digest=z0v88a6 Digest= Hash(joe, a7szh1, alligator) OK, done! Digest= Hash(joe, a7szh1, alligator) = Aardvark 9z8v77a Abacus lkf88z7 Abate 8z77x ……. Alligator z0v88a6 Word Hash(joe, a7szh1,word)
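The attack needs nothing more than one captured exchange and a word list. The Python sketch below uses the slide's simplified Hash(user, nonce, word) form, with SHA-256 standing in for the hash. The word list and function names are hypothetical; a real attacker would hash candidates with the actual digest algorithm and a far larger dictionary.

```python
import hashlib


def slide_hash(username, nonce, password):
    """Stand-in for the slide's Hash(user, nonce, password)."""
    return hashlib.sha256(f"{username}:{nonce}:{password}".encode("utf-8")).hexdigest()


def dictionary_attack(username, nonce, observed_digest, words):
    """Try every candidate word offline; no further server contact needed."""
    for word in words:
        if slide_hash(username, nonce, word) == observed_digest:
            return word
    return None


# What an eavesdropper sees on the wire: username, nonce and digest.
captured = slide_hash("joe", "a7szh1", "alligator")
wordlist = ["aardvark", "abacus", "abate", "alligator"]
print(dictionary_attack("joe", "a7szh1", captured, wordlist))
```

The nonce stops replay but does not stop guessing, so weak passwords fall quickly unless the exchange itself is hidden from eavesdroppers.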

117 Solution: Digest over TLS Digest= Hash(joe, a7szh1, alligator) Digest= Hash(joe, a7szh1, alligator) = TLS Armor This is how Web Security works!

118 Even Stronger: Mutual TLS for Devices TLS Armor MAC 8x7a6 Phone has a Certificate which identifies it

119 SIP Encryption RTP We want each SIP hop to be encrypted so only the SIP servers and endpoints see the signaling.

120 SIP Encryption: TLS RTP Mutual TLS Authentication

121 Media Encryption Countermeasure against:  Eavesdropping  Barge-in  Modification Two useful techniques  IPSEC  SRTP Complications  Key management  Legal intercept (who has the keys)  Firewall and NAT issues (covered later)

122 Alternative: Secure RTP Authentication and encryption of RTP and RTCP packets
Packet layout, in order: V, P, X, CC, M, PT, sequence number; timestamp; synchronization source (SSRC) identifier; contributing source (CSRC) identifiers …; RTP extension (optional); RTP payload; SRTP MKI (0 bytes for voice); authentication tag (4 bytes for voice)
Authenticated portion: RTP header through payload. Encrypted portion: RTP payload.

123 SRTP Advantages  Provides both Privacy via encryption and authentication via message integrity check  Very little bandwidth overhead Does not break header compression schemes like cRTP For very low-rate channels (e.g. cellular) can sacrifice authentication and have no packet expansion.  Uses modern strong crypto suites: AES counter mode for encryption and HMAC for message integrity Disadvantages  Needs key management  End-to-end versus hop-by-hop trust tradeoffs in protecting keys  Yet another security mechanism to ensure is implemented and deployed correctly
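The "very little bandwidth overhead" point comes from the short authentication tag. The sketch below shows only that part, assuming HMAC-SHA1 truncated to the 4 bytes quoted on the previous slide. It is deliberately simplified: it leaves out payload encryption, key derivation and the rollover counter that real SRTP also feeds into the tag, and the function names are hypothetical.

```python
import hashlib
import hmac

TAG_LEN = 4  # bytes; the figure the slides give for voice


def add_auth_tag(auth_key, rtp_packet):
    # Tag covers the RTP header and payload, then is truncated and appended.
    tag = hmac.new(auth_key, rtp_packet, hashlib.sha1).digest()[:TAG_LEN]
    return rtp_packet + tag


def check_auth_tag(auth_key, protected):
    packet, tag = protected[:-TAG_LEN], protected[-TAG_LEN:]
    expected = hmac.new(auth_key, packet, hashlib.sha1).digest()[:TAG_LEN]
    return hmac.compare_digest(tag, expected)


key = b"k" * 20                      # placeholder key, for illustration only
packet = bytes(12) + b"voice-frame"  # 12-byte RTP header plus payload
protected = add_auth_tag(key, packet)
print(len(protected) - len(packet), check_auth_tag(key, protected))
```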

124 NAT Traversal

125 What is NAT? Network Address Translation (NAT)
Creates address binding between internal private and external public address
Modifies IP addresses/ports in packets
Benefits: avoids network renumbering on change of provider; allows multiplexing of multiple private addresses into a single public address ($$ savings); maintains privacy of internal addresses
Diagram: Client, NAT, NAT. IP packet before translation: S: :6554 D: :80. Binding table (internal -> external): :6554 -> :8877. IP packet after translation: S: :8877 D: :80
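The binding table can be sketched as a pair of dictionaries. The port numbers 6554 and 8877 are the ones on the slide; the IP addresses are placeholders chosen for illustration (the transcript does not preserve the originals), and the `Nat` class is hypothetical.

```python
class Nat:
    """Toy NAT: maps (internal ip, port) to a port on one public address."""

    def __init__(self, public_ip, first_port=8877):
        self.public_ip = public_ip
        self.next_port = first_port
        self.out = {}    # (internal_ip, internal_port) -> external_port
        self.back = {}   # external_port -> (internal_ip, internal_port)

    def outbound(self, src_ip, src_port):
        # The first outbound packet creates the binding; later ones reuse it.
        key = (src_ip, src_port)
        if key not in self.out:
            self.out[key] = self.next_port
            self.back[self.next_port] = key
            self.next_port += 1
        return self.public_ip, self.out[key]

    def inbound(self, dst_port):
        # Packets to a port with no binding have nowhere to go (None).
        return self.back.get(dst_port)


nat = Nat("192.0.2.1")
print(nat.outbound("10.0.0.1", 6554))  # source address/port as seen outside
print(nat.inbound(8877))               # where a reply is delivered inside
```

Bindings are created only by outbound traffic, which is why an address written inside a SIP message body is not reachable from outside.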

126 Problem: Getting SIP Through NATs NAT INVITE m=audio 3456 RTP/AVP 0 c=IN IP RTP to

127 Solution Space Application Layer Gateways (ALGs) Session Border Controllers (SBC) Simple Traversal of UDP Through NAT (STUN) Traversal Using Relay NAT (TURN) Interactive Connectivity Establishment (ICE)

128 Application Layer Gateway NAT INVITE m=audio 3456 RTP/AVP 0 c=IN IP RTP to INVITE m=audio 1234 RTP/AVP 0 c=IN IP ALG NAT also modifies SIP messages to fix them up!
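As a rough sketch of the "fix up" an ALG performs, the function below rewrites the connection address and audio port in an SDP body. The ports 3456 and 1234 are the slide's; the IP addresses are placeholders, and `alg_rewrite_sdp` is a hypothetical name. A real ALG also has to rewrite SIP headers and handle many more SDP forms.

```python
import re


def alg_rewrite_sdp(sdp, public_ip, public_port):
    # Replace the private connection address with the NAT's public address.
    sdp = re.sub(r"(?m)^(c=IN IP4 )\S+", lambda m: m.group(1) + public_ip, sdp)
    # Replace the private media port with the port bound on the NAT.
    sdp = re.sub(r"(?m)^(m=audio )\d+", lambda m: m.group(1) + str(public_port), sdp)
    return sdp


inside = "v=0\r\nc=IN IP4 10.0.0.1\r\nm=audio 3456 RTP/AVP 0\r\n"
print(alg_rewrite_sdp(inside, "192.0.2.1", 1234))
```

The rewrite needs the message body in the clear and unsigned, so it stops working once the signaling is encrypted or integrity-protected end to end.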

129 ALG Benefits and Drawbacks Drawbacks  Doesn’t work when security turned on  Hard to diagnose problems  Requires network upgrade to support new app  Frequent implementation problems (lack of expertise)  Incentives mismatched Benefits  No change to clients or servers

130 Session Border Controller NAT INVITE m=audio 3456 RTP/AVP 0 c=IN IP SBC INVITE m=audio 3225 RTP/AVP 0 c=IN IP RTP to SBC relays RTP back to source

131 SBC Benefits and Drawbacks Drawbacks  Expensive media relaying  Interferes with some SIP extensions  Breaks more advanced SIP security Benefits  No change to clients or NATs  Works with basic SIP security mechanisms  Easier to diagnose

132 Simple Traversal of UDP Through NAT (STUN) NAT What is my IP address and port please? STUN Server INVITE m=audio 3472 RTP/AVP 0 c=IN IP RTP to It's : 3472
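The "what is my IP address and port?" exchange can be sketched at the byte level. The code below assumes the later STUN message format from RFC 5389 (20-byte header with magic cookie, XOR-MAPPED-ADDRESS attribute), not necessarily the version in use when the slides were written. It only builds a request and parses a response; sending it over UDP to a server is left out.

```python
import os
import socket
import struct

MAGIC_COOKIE = 0x2112A442
BINDING_REQUEST = 0x0001
BINDING_SUCCESS = 0x0101
XOR_MAPPED_ADDRESS = 0x0020


def build_binding_request():
    # Header: type, attribute length (0), magic cookie, 12-byte transaction ID.
    txn_id = os.urandom(12)
    return struct.pack("!HHI", BINDING_REQUEST, 0, MAGIC_COOKIE) + txn_id, txn_id


def parse_xor_mapped_address(message):
    """Return (ip, port) from a Binding success response, or None."""
    msg_type, length, cookie = struct.unpack("!HHI", message[:8])
    if msg_type != BINDING_SUCCESS or cookie != MAGIC_COOKIE:
        return None
    pos, end = 20, 20 + length
    while pos + 4 <= end:
        attr_type, attr_len = struct.unpack("!HH", message[pos:pos + 4])
        value = message[pos + 4:pos + 4 + attr_len]
        if attr_type == XOR_MAPPED_ADDRESS and len(value) >= 8 and value[1] == 0x01:
            # IPv4: port and address are XORed with the magic cookie.
            port = struct.unpack("!H", value[2:4])[0] ^ (MAGIC_COOKIE >> 16)
            addr = struct.unpack("!I", value[4:8])[0] ^ MAGIC_COOKIE
            return socket.inet_ntoa(struct.pack("!I", addr)), port
        pos += 4 + attr_len + (-attr_len % 4)  # attributes are padded to 4 bytes
    return None


request, txn_id = build_binding_request()
print(len(request))  # a bare Binding request is just the header
```

The client then places the returned address and port into the c= and m= lines of its INVITE, as on the slide.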

133 STUN Benefits and Drawbacks Drawbacks  Doesn’t always work Benefits  No change to servers or NATs  Works with all SIP security mechanisms  Can support non-VoIP apps (e.g., games)

134 Traversal Using Relay NAT (TURN) NAT Give me an IP address and port please? TURN Server INVITE m=audio 2376 RTP/AVP 0 c=IN IP RTP to : 2376

135 TURN Benefits and Drawbacks Drawbacks  Expensive Media Relaying Benefits  No change to servers or NATs  Works with all SIP security mechanisms  Can support non-VoIP apps (e.g., games)

136 Interactive Connectivity Establishment (ICE) Hybrid of STUN and TURN P2P NAT Traversal Widely Deployed on Internet Popular with Application Providers

137 ICE Step 1: Allocation Before making a call, the client gathers candidates. Each candidate is a potential address for receiving media. Three different types of candidates:
Host candidates: reside on the agent itself
Server reflexive candidates (STUN): addresses residing on a NAT
Relayed candidates (TURN): reside on a TURN server

138 ICE Step 2: Create Offer Each candidate is placed into an a=candidate attribute of the offer Each candidate line has IP address and port plus other info needed for ICE c=IN IP t=0 0 m=audio RTP/AVP 0 a=rtpmap:0 PCMU/8000 a=candidate:1 1 UDP typ host a=candidate:2 1 UDP typ srflx raddr rport 8998
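The "other info needed for ICE" on each candidate line includes a priority. As a sketch, the formula and recommended type preferences from the ICE specification (RFC 5245) can be computed as below. The addresses in the usage lines are placeholders, port 8998 is borrowed from the slide, and the helper names are hypothetical.

```python
# Recommended type preferences: higher means "try this kind of address first".
TYPE_PREFERENCE = {"host": 126, "prflx": 110, "srflx": 100, "relay": 0}


def candidate_priority(cand_type, component_id=1, local_preference=65535):
    # priority = 2^24 * type pref + 2^8 * local pref + (256 - component ID)
    return (TYPE_PREFERENCE[cand_type] << 24) + (local_preference << 8) + (256 - component_id)


def candidate_line(foundation, component_id, ip, port, cand_type, related=None):
    prio = candidate_priority(cand_type, component_id)
    line = f"a=candidate:{foundation} {component_id} UDP {prio} {ip} {port} typ {cand_type}"
    if related is not None:
        # Server reflexive and relayed candidates also carry their base address.
        line += f" raddr {related[0]} rport {related[1]}"
    return line


print(candidate_line(1, 1, "10.0.0.1", 8998, "host"))
print(candidate_line(2, 1, "192.0.2.1", 3472, "srflx", related=("10.0.0.1", 8998)))
```

Host candidates get the highest priority and relayed candidates the lowest, which is what keeps media relaying to a minimum.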

139 ICE Step 3: Send INVITE Caller sends a SIP INVITE as normal No ICE processing by SIP servers SIP Server INVITE

140 ICE Step 4: Allocation Called party does exactly same processing as caller and obtains its candidates Recommended to not yet ring the phone! TURN NAT STUN

141 ICE Step 5: Provisional Response Callee sends a provisional response containing its SDP with candidates As with INVITE, no processing by proxies Phone has still not rung yet SIP Proxy 1xx

142 ICE Step 6: Verification Each agent pairs up its candidates (local) with its peer's (remote) to form candidate pairs Each agent sends a STUN-based ping on each pair, starting at highest priority If a response is received the check has succeeded and we know media can flow on that pair! TURN Server NAT TURN Server NAT
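The pairing and ordering step can be sketched as follows, using the pair-priority formula from the ICE specification (RFC 5245). This is a simplification under stated assumptions: the local agent is taken to be the controlling one, the `check` callback stands in for the STUN-based ping, and real ICE additionally prunes pairs, tracks check states and nominates a pair.

```python
from itertools import product


def pair_priority(controlling_prio, controlled_prio):
    # 2^32 * min(G, D) + 2 * max(G, D) + (1 if G > D else 0)
    g, d = controlling_prio, controlled_prio
    return (min(g, d) << 32) + 2 * max(g, d) + (1 if g > d else 0)


def ordered_pairs(local, remote):
    """local/remote: lists of (name, priority). Returns pairs, highest priority first."""
    pairs = [(pair_priority(lp, rp), ln, rn)
             for (ln, lp), (rn, rp) in product(local, remote)]
    return [(ln, rn) for _, ln, rn in sorted(pairs, reverse=True)]


def first_working_pair(pairs, check):
    # Try connectivity checks in priority order; stop at the first success.
    for pair in pairs:
        if check(pair):
            return pair
    return None


cands = [("host", 2130706431), ("srflx", 1694498815), ("relay", 16777215)]
pairs = ordered_pairs(cands, cands)
# Pretend both agents sit behind NATs, so checks involving host addresses fail.
print(first_working_pair(pairs, lambda pair: "host" not in pair))
```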

143 ICE Benefits and Drawbacks Drawbacks  Requires client changes  Requires other side to support it Benefits  Always Works  No change to servers or NATs  Works with all SIP security mechanisms  Minimum Media Relaying  Can support non-VoIP apps (e.g., games)  Built-In Anti-DOS  Eliminates Ghost Rings

144 That’s it! Questions?

145 Glossary

146 Glossary (2)

147 Thanks Enjoy Interop! to contact me:
