Presentation on theme: "1 Voice over IP (VoIP) and the Session Initiation Protocol (SIP) Huei-Wen Ferng ( 馮輝文 ) Assistant Professor, CSIE, NTUST"— Presentation transcript:
1 Voice over IP (VoIP) and the Session Initiation Protocol (SIP) Huei-Wen Ferng ( 馮輝文 ) Assistant Professor, CSIE, NTUST
2 Outline r Introduction r Streaming stored audio and video r Real-time, interactive multimedia: Internet phone case study r Protocols for real-time interactive applications: RTP, RTCP, and SIP r Challenges r Our results r Q & A
3 VoIP & SIP Introduction
4 MM Networking Applications Fundamental characteristics: r Typically delay sensitive m end-to-end delay m delay jitter r But loss tolerant: infrequent losses cause minor glitches r Antithesis of data, which are loss intolerant but delay tolerant. Classes of MM applications: 1) Streaming stored audio and video 2) Streaming live audio and video 3) Real-time interactive audio and video Jitter is the variability of packet delays within the same packet stream
5 Streaming Stored Multimedia: What is it? 1. video recorded 2. video sent 3. video received, played out at client Cumulative data streaming: at this time, client playing out early part of video, while server still sending later part of video network delay time
6 Streaming Live Multimedia Examples: r Internet radio talk show r Live sporting event Streaming r playback buffer r playback can lag tens of seconds after transmission r still have timing constraint Interactivity r fast forward impossible r rewind, pause possible!
7 Interactive, Real-Time Multimedia r end-end delay requirements: m audio: < 150 msec good, < 400 msec OK includes application-level (packetization) and network delays higher delays noticeable, impair interactivity r session initialization m how does callee advertise its IP address, port number, encoding algorithms? r applications: IP telephony, video conference, distributed interactive worlds
8 A few words about audio compression r Analog signal sampled at constant rate m telephone: 8,000 samples/sec m CD music: 44,100 samples/sec r Each sample quantized, ie, rounded m eg, 2 8 =256 possible quantized values r Each quantized value represented by bits m 8 bits for 256 values r Example: 8,000 samples/sec, 256 quantized values --> 64,000 bps r Receiver converts it back to analog signal: m some quality reduction Example rates r CD: Mbps r MP3: 96, 128, 160 kbps r Internet telephony: kbps
9 VoIP & SIP Streaming Stored Audio and Video
10 Streaming Stored Multimedia Application-level streaming techniques for making the best out of best effort service: m client side buffering m use of UDP versus TCP m multiple encodings of multimedia r jitter removal r decompression r error concealment r graphical user interface w/ controls for interactivity Media Player
11 Internet multimedia: simplest approach audio, video not streamed: r no, “pipelining,” long delays until playout! r audio or video stored in file r files transferred as HTTP object m received in entirety at client m then passed to player
12 Internet multimedia: streaming approach r browser GETs metafile r browser launches player, passing metafile r player contacts server r server streams audio/video to player
13 Streaming from a streaming server r This architecture allows for non-HTTP protocol between server and media player r Can also use UDP instead of TCP.
14 constant bit rate video transmission Cumulative data time variable network delay client video reception constant bit rate video playout at client client playout delay buffered video Streaming Multimedia: Client Buffering r Client-side buffering, playout delay compensate for network-added delay, delay jitter
15 User Control of Streaming Media: RTSP HTTP r Does not target multimedia content r No commands for fast forward, etc. RTSP: RFC 2326 r Client-server application layer protocol. r For user to control display: rewind, fast forward, pause, resume, repositioning, etc… What it doesn’t do: r does not define how audio/video is encapsulated for streaming over network r does not restrict how streamed media is transported; it can be transported over UDP or TCP r does not specify how the media player buffers audio/video
16 RTSP: out of band control FTP uses an “out-of-band” control channel: r A file is transferred over one TCP connection. r Control information (directory changes, file deletion, file renaming, etc.) is sent over a separate TCP connection. r The “out-of-band” and “in- band” channels use different port numbers. RTSP messages are also sent out-of-band: r RTSP control messages use different port numbers than the media stream: out-of-band. m Port 554 r The media stream is considered “in-band”.
17 RTSP Operation
18 VoIP & SIP Real-time, Interactive Multimedia: Internet Phone Case Study
19 Real-time interactive applications r PC-2-PC phone m instant messaging services are providing this r PC-2-phone m Dialpad m Net2phone r videoconference with Webcams Going to now look at a PC-2-PC Internet phone example in detail
20 Interactive Multimedia: Internet Phone Introduce Internet Phone by way of an example r speaker’s audio: alternating talk spurts, silent periods. m 64 kbps during talk spurt r pkts generated only during talk spurts m 20 msec chunks at 8 Kbytes/sec: 160 bytes data r application-layer header added to each chunk. r Chunk+header encapsulated into UDP segment. r application sends UDP segment into socket every 20 msec during talkspurt.
21 Internet Phone: Packet Loss and Delay r network loss: IP datagram lost due to network congestion (router buffer overflow) r delay loss: IP datagram arrives too late for playout at receiver m delays: processing, queueing in network; end-system (sender, receiver) delays m typical maximum tolerable delay: 400 ms r loss tolerance: depending on voice encoding, losses concealed, packet loss rates between 1% and 10% can be tolerated.
22 constant bit rate transmission Cumulative data time variable network delay (jitter) client reception constant bit rate playout at client client playout delay buffered data Delay Jitter r Consider the end-to-end delays of two consecutive packets: difference can be more or less than 20 msec
23 Internet Phone: Fixed Playout Delay r Receiver attempts to playout each chunk exactly q msecs after chunk was generated. m chunk has time stamp t: play out chunk at t+q. m chunk arrives after t+q: data arrives too late for playout, data “lost” r Tradeoff for q: m large q: less packet loss m small q: better interactive experience
24 Fixed Playout Delay Sender generates packets every 20 msec during talk spurt. First packet received at time r First playout schedule: begins at p Second playout schedule: begins at p’
25 Adaptive Playout Delay, I Dynamic estimate of average delay at receiver: where u is a fixed constant (e.g., u =.01). r Goal: minimize playout delay, keeping late loss rate low r Approach: adaptive playout delay adjustment: m Estimate network delay, adjust playout delay at beginning of each talk spurt. m Silent periods compressed and elongated. m Chunks still played out every 20 msec during talk spurt.
26 Adaptive playout delay II Also useful to estimate the average deviation of the delay, v i : The estimates d i and v i are calculated for every received packet, although they are only used at the beginning of a talk spurt. For first packet in talk spurt, playout time is: where K is a positive constant. Remaining packets in talk spurt are played out periodically
27 Adaptive Playout, III Q: How does receiver determine whether packet is first in a talk spurt? r If no loss, receiver looks at successive timestamps. m difference of successive stamps > 20 msec -->talk spurt begins. r With loss possible, receiver must look at both time stamps and sequence numbers. m difference of successive stamps > 20 msec and sequence numbers without gaps --> talk spurt begins.
28 Summary: Internet Multimedia: bag of tricks r use UDP to avoid TCP congestion control (delays) for time-sensitive traffic r client-side adaptive playout delay: to compensate for delay r server side matches stream bandwidth to available client-to-server path bandwidth m chose among pre-encoded stream rates m dynamic server encoding rate r error recovery (on top of UDP) m FEC, interleaving m retransmissions, time permitting m conceal errors: repeat nearby data
29 VoIP & SIP Protocols for Real-Time Interactive Applications : RTP, RTCP, and SIP
30 Real-Time Protocol (RTP) r RTP specifies a packet structure for packets carrying audio and video data r RFC r RTP packet provides m payload type identification m packet sequence numbering m timestamping r RTP runs in the end systems. r RTP packets are encapsulated in UDP segments r Interoperability: If two Internet phone applications run RTP, then they may be able to work together
31 RTP runs on top of UDP RTP libraries provide a transport-layer interface that extend UDP: port numbers, IP addresses payload type identification packet sequence numbering time-stamping
32 RTP Example r Consider sending 64 kbps PCM-encoded voice over RTP. r Application collects the encoded data in chunks, e.g., every 20 msec = 160 bytes in a chunk. r The audio chunk along with the RTP header form the RTP packet, which is encapsulated into a UDP segment. r RTP header indicates type of audio encoding in each packet m sender can change encoding during a conference. r RTP header also contains sequence numbers and timestamps.
33 RTP and QoS r RTP does not provide any mechanism to ensure timely delivery of data or provide other quality of service guarantees. r RTP encapsulation is only seen at the end systems: it is not seen by intermediate routers. m Routers providing best-effort service do not make any special effort to ensure that RTP packets arrive at the destination in a timely matter.
34 RTP Header Payload Type (7 bits): Indicates type of encoding currently being used. If sender changes encoding in middle of conference, sender informs the receiver through this payload type field. Payload type 0: PCM mu-law, 64 kbps Payload type 3, GSM, 13 kbps Payload type 7, LPC, 2.4 kbps Payload type 26, Motion JPEG Payload type 31. H.261 Payload type 33, MPEG2 video Sequence Number (16 bits): Increments by one for each RTP packet sent, and may be used to detect packet loss and to restore packet sequence.
35 RTP Header (2) r Timestamp field (32 bytes long). Reflects the sampling instant of the first byte in the RTP data packet. m For audio, timestamp clock typically increments by one for each sampling period (for example, each 125 usecs for a 8 KHz sampling clock) m if application generates chunks of 160 encoded samples, then timestamp increases by 160 for each RTP packet when source is active. Timestamp clock continues to increase at constant rate when source is inactive. r SSRC field (32 bits long). Identifies the source of the RTP stream. Each stream in a RTP session should have a distinct SSRC.
36 Real-Time Control Protocol (RTCP) r Works in conjunction with RTP. r Each participant in RTP session periodically transmits RTCP control packets to all other participants. r Each RTCP packet contains sender and/or receiver reports m report statistics useful to application r Statistics include number of packets sent, number of packets lost, interarrival jitter, etc. r Feedback can be used to control performance m Sender may modify its transmissions based on feedback
37 SIP r Session Initiation Protocol r Comes from IETF SIP long-term vision r All telephone calls and video conference calls take place over the Internet r People are identified by names or addresses, rather than by phone numbers. r You can reach the callee, no matter where the callee roams, no matter what IP device the callee is currently using.
38 RFC and Related Protocols r Originally specified in RFC 2543 (March 1999) r RFC 3261, new standards track released in June 2002 r An application-layer control signaling protocol for creating, modifying and terminating sessions with one or more participants r A component that can be used with other IETF protocols to build a complete multimedia architecture (e.g. RTP, RTSP, MEGACO, SDP)
39 SIP Functionality Supports five facets of establishing and terminating multimedia communications r User Location r User Availability r User Capabilities r Session Setup Session Management
40 SIP Architecture r Client-server in nature r Main entities: m User Agent m Proxy Server m Redirect Server m Registration Server m Location Server
41 Registrar and UA Behavior SIP User Agent SIP Registrar SIP Location Service SIP Request SIP Reply Non-SIP Protocol
42 SIP Proxy/Redirect Servers and UA Behaviors SIP User Agent (Caller) SIP Proxy Redirect Server SIP User Agent (Caller) SIP Proxy Location Service 1 2, Non-SIP Protocol 5,6
43 Model of VoIP Communication Between Two Soft Phones r Protocol dependency UDP SIP SDP RTP Audio Codec (e.g. voice) SoftPhone Example does not represent actual scale
44 More Accurate Layout of Protocols
45 SIP Request Messages
46 SIP Response Messages 100 Trying 180 Ringing 181 Call is being Forwarded 182 Queued 200 OK 301 Moved Permanently 302 Moved Temporarily
47 Messages Flow r Primary protocol for establishing sessions between VoIP applications (softphones) Cooperating protocols – RTP (Realtime Transmission Protocol), SDP (Session Description protocol)
48 Example of SIP message INVITE SIP/2.0 Via: SIP/2.0/UDP From: To: Call-ID: Content-Type: application/sdp Content-Length: 885 c=IN IP m=audio RTP/AVP 0 Notes: r HTTP message syntax r sdp = session description protocol r Call-ID is unique for every call. Here we don’t know Bob’s IP address. Intermediate SIP servers will be necessary. Alice sends and receives SIP messages using the SIP default port number Alice specifies in Via: header that SIP client sends and receives SIP messages over UDP
49 Name translation and user locataion r Caller wants to call callee, but only has callee’s name or address. r Need to get IP address of callee’s current host: m user moves around m DHCP protocol m user has different IP devices (PC, PDA, car device) r Result can be based on: m time of day (work, home) m caller (don’t want boss to call you at home) m status of callee (calls sent to voic when callee is already talking to someone) Service provided by SIP servers: r SIP registrar server r SIP proxy server
50 SIP Registrar REGISTER sip:domain.com SIP/2.0 Via: SIP/2.0/UDP From: To: Expires: 3600 r When Bob starts SIP client, client sends SIP REGISTER message to Bob’s registrar server (similar function needed by Instant Messaging) Register Message:
51 SIP Proxy r Alice send’s invite message to her proxy server m contains address r Proxy responsible for routing SIP messages to callee m possibly through multiple proxies. r Callee sends response back through the same set of proxies. r Proxy returns SIP response message to Alice m contains Bob’s IP address r Note: proxy is analogous to local DNS server
52 Two major signaling standards r ITU-T H.323 m More mature and applicable m Less flexible and expansible r IETF Session Initiation Protocol (SIP) – RFC 2543 m greater scalability easing Internet application integration m Less definition
53 Comparison with H.323 r H.323 is another signaling protocol for real-time, interactive r H.323 is a complete, vertically integrated suite of protocols for multimedia conferencing: signaling, registration, admission control, transport and codecs. r SIP is a single component. Works with RTP, but does not mandate it. Can be combined with other protocols and services. r H.323 comes from the ITU (telephony). r SIP comes from IETF: Borrows much of its concepts from HTTP. SIP has a Web flavor, whereas H.323 has a telephony flavor. r SIP uses the KISS principle: Keep it simple stupid.
54 VoIP & SIP Challenges
55 Challenges: NATs and firewalls r NATs and firewalls reduce Internet to web and service m firewall, NAT: no inbound connections m NAT: no externally usable address m NAT: many different versions binding duration m lack of permanent address (e.g., DHCP) not a problem SIP address binding m misperception: NAT = security
56 Challenges: QoS Not lack of protocols – RSVP, diff-serv r Lack of policy mechanisms and complexity m which traffic is more important? m how to authenticate users? m cross-domain authentication may need for access only – bidirectional traffic m DiffServ: need agreed-upon code points NSIS WG in IETF – currently, requirements only
57 Challenges: Security r PSTN model of restricted access systems cryptographic security r Dumb end systems PCs with a handset r Objectives: m identification for access control & billing m phone/IM spam control (black/white lists) m call routing m privacy
58 Challenges: service creation Can ’ t win by (just) recreating PSTN services r Programmable services: m equipment vendors, operators: JAIN m local sysadmin, vertical markets: sip-cgi m proxy-based call routing: CPL m voice-based control: VoiceXML
59 Our Results r Members of our team: Prof. Chiu, Prof. Gu, and Prof. Ferng r Four Industrial Projects and one NSC project m Non-SIP based PC-to-PC UA m SIP-based UA m VOCAL SIP Servers m Secured UA (Under development)
60 VoIP & SIP Q & A
61 Related work r Vovida Open Communication Application Library (VOCAL) / m open source project targeted at facilitating the adoption of VoIP in the marketplace m includes a SIP based Redirect Server, Feature Server, Provisioning Server and Marshal Proxy
62 The Architecture of VOCAL
64 References r D. Collins, Carrier Grade Voice over IP, 2nd Edition, McGraw-Hill, r Vovida Open Communication Application Library (VOCAL) r L. Dang, C. Jennings, and D. Kelly, Practical VoIP Using VOCAL, OReilly & Associates Inc., r J. F. Kurose and K. W. Ross, Computer Networking: A Top-Down Approach Featuring the Internet, 2nd Edition, Addison Wesley, 2003.