SpeechTEK 2006 – Voice Over IP Tutorial. Andrew Hunt, Ph.D., VP Engineering, Holly Connects.



2 SpeechTEK 2006 – Voice Over IP Tutorial Andrew Hunt, Ph.D. VP Engineering, Holly Connects

3 Welcome! Who are you?

4 Andrew Hunt, Ph.D. VP of Engineering Level 11, 301 George Street Sydney, NSW 2000, Australia Tel +61 2 8207 8207 Email: Web:

5 DRAFT V03 Timing • 8:30am Start • 10:00-10:30am Coffee break • 12:00-1:00pm Lunch • 2:30-3:00pm Coffee break • 4:30pm Close

6 DRAFT V03 DRAFT V03 5 Agenda 1. Welcome & Introductions 2. Why Voice Over IP? 3. Brief History of Telephony 4. Digital Voice 5. Voice Over IP – Basics 6. VoIP Protocols 7. SIP: Session Initiation Protocol 8. RTP: Real-time Transport Protocol 9. Network Issues and Design 10. VoIP and Speech Recognition 11. VoIP and Mobile Telephony 12. Closing

7 DRAFT V03 DRAFT V03 6 Objectives Informative Relevant Interesting Interactive

8 Why Voice Over IP?

9 DRAFT V03 DRAFT V03 8 Module Overview Why Voice Over IP?  What is it?  Cost  Functionality & flexibility  Mobility

10 DRAFT V03 DRAFT V03 9 VoIP Definition Definitions  “Voice Over IP” is the use of the internet, intranets and other IP networks for the delivery of voice conversations  “Internet Protocol” (IP) is a protocol used for communicating data across a packet-switched network (specifically IPv4 or IPv6)  Many VoIP protocols exist: focus today on SIP and related protocols for use in speech recognition and IVR contexts

11 DRAFT V03 DRAFT V03 10 Why VoIP? Cost  Global data traffic exceeded voice traffic in late 1990’s  Telco charges & revenue largely from voice traffic  Migration to shared networks  Single network for voice and data  Utilize spare capacity in many data networks  Inwards-out approach to migration  Traditional voice carriage costs to business being driven down by VoIP  Residential advantages  Free services: Skype etc.  Lower cost services: Vonage, Skype etc.  Regulatory and service capability issues are evolving

12 Why VoIP? Functionality and Flexibility • Virtualization & mobility: move calls and agents locally, nationally, globally • Match traditional telephony functions • Transfers, voice mail, conferencing, redial, speed-dial, forwarding etc. • Integration with software and internet services • Availability notification (IM) • Instant messaging • Multi-media services: video, data files etc. • Options for traditional and VoIP telephony to co-exist and migrate steadily

13 DRAFT V03 DRAFT V03 12 Why VoIP? Mobility  Make VoIP calls virtually anywhere in the world  Number portability: landline, mobile, internet  Synchronization of address books and contacts

14 Brief History of Telephony

15 DRAFT V03 DRAFT V03 14 Module Overview Brief History of Telephony  Telephony switching  Basic concepts of telephony  Call establishment  Circuits  Switching  Migrating from analogue to digital

16 DRAFT V03 DRAFT V03 15 Telephony: Switching and Circuits Alexander Graham Bell Thomas Watson

17 DRAFT V03 DRAFT V03 16 Telephony: Switching and Circuits 1. Caller picks up 2. Caller “dials” 3. Callee phone rings 4. Callee answers 5. Circuit established 6. Conversation 7. Hang-up & tear-down

18 DRAFT V03 DRAFT V03 17 Telephony: Switching and Circuits Exchange

19 DRAFT V03 DRAFT V03 18 Telephony: Switching and Circuits PSTN: Public Switched Telephony Network 1. A-party picks up 2. A-party dials 3. B-party rings 4. B-party answers 5. Circuit established 6. Conversation 7. Hang-up & tear-down

20 DRAFT V03 DRAFT V03 19 Telephony: Switching and Circuits Pre-Digital “Analogue” Era  1878: New Haven, Connecticut  World’s first commercial telephone exchange  Built from “carriage bolts, handles from teapot lids and bustle wire”  Cost: $40 including the furniture  1891: Topeka, Kansas  Almon Strowger, an undertaker, patented the Strowger switch  Automation of the telephone circuit switching by decadic pulses  1950 onwards: Crossbar switches  1964: “Dual Tone Multi-Frequency” (DTMF) introduced

21 DRAFT V03 DRAFT V03 20 Telephony: Switching and Circuits Pre-Digital “Analogue” Era  Infrastructure  Local wiring to each phone  System of local, regional, national and international exchanges  Shared connections between exchanges  Call establishment  Numbering scheme (many iterations)  Voice  decadic pulsing  DTMF  Mapping of numbering scheme to the exchanges (e.g. CAstle=22)  Call communication  Dedicated circuit per call

22 DRAFT V03 DRAFT V03 21 Telephony: Switching and Circuits Digital Era  Digital started with the core telco networks  Efficiency on long-distance carriage  Efficiency of solid state switching technology  Migrated to local exchanges  What is digital voice?

23 Digital Voice

24 DRAFT V03 DRAFT V03 23 Module Overview Digital Speech  Sampling: creating digital audio  CODECs: Compression and companding  Sharing channels: Time Division Multiplexing (TDM)  Standard digital data links: E1 & T1

25 Telephony: Packet Switch Network [Figure: digitized voice bit streams (…0101100… …0011100… …1100110… …1101100… …0001100…) carried across the PSTN: Public Switched Telephony Network]

26 DRAFT V03 DRAFT V03 25 Sampling Theory

27 DRAFT V03 DRAFT V03 26 Sampling Theory

28 Sampling Theory: sample frequency determines bandwidth

29 Sampling Theory: sample resolution (bits) determines noise/error

30 Sampling Theory
Time (msec): 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 …
Value: 0 29 57 82 102 117 125 126 120 108 89 66 39 9 -20 -49 -75 -97 -114 -124 -127 …

31 Sampling Theory: Companding • Non-linear sampling gives better noise/error performance • G.711 μ-law for North America and Japan • G.711 A-law for Europe and rest of the world
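To make the companding idea concrete, here is a minimal sketch of the continuous μ-law curve with μ = 255. It is not the bit-exact segmented encoder specified by G.711; the function names are illustrative only. It shows how small amplitudes get a larger share of the output range than a linear mapping would give them.

```python
import math

MU = 255  # value used for North American / Japanese telephony


def mu_law_compress(x, mu=MU):
    """Map a linear sample x in [-1, 1] onto the companded range [-1, 1]."""
    return math.copysign(math.log1p(mu * abs(x)) / math.log1p(mu), x)


def mu_law_expand(y, mu=MU):
    """Inverse of mu_law_compress: recover the linear sample."""
    return math.copysign(((1 + mu) ** abs(y) - 1) / mu, y)


if __name__ == "__main__":
    for sample in (0.01, 0.1, 0.5, 1.0):
        print(sample, round(mu_law_compress(sample), 4))
```

A quiet sample at 1% of full scale lands at roughly 23% of the companded range, which is why 8 companded bits hold up better on quiet speech than 8 linear bits.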

32 Sampling Theory
Measure | G.711 Telephony | Compact Disc
Sampling rate | 8,000 Hz | 44,100 Hz
Frequency range | Low – 3.5 kHz (4 kHz max) | Lower – 22 kHz
Sample type | 8-bit A-law / μ-law, mono | 16-bit linear PCM, stereo
Signal-to-noise ratio | < 70 dB | 96 dB
Data bandwidth | 8 kByte/sec | 176 kByte/sec
Perceived quality | Telephony | Great
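The data bandwidth figures in the comparison follow directly from sample rate × sample size × channels, and the upper frequency limit is half the sampling rate. A small sketch of that arithmetic (helper names are illustrative):

```python
def pcm_data_rate(sample_rate_hz, bits_per_sample, channels=1):
    """Return (bits per second, bytes per second) for a sampled audio stream."""
    bits_per_second = sample_rate_hz * bits_per_sample * channels
    return bits_per_second, bits_per_second // 8


def nyquist_limit_hz(sample_rate_hz):
    """Highest frequency representable at the given sampling rate."""
    return sample_rate_hz / 2


telephony = pcm_data_rate(8000, 8)           # G.711: 8-bit mono at 8 kHz
compact_disc = pcm_data_rate(44100, 16, 2)   # CD: 16-bit stereo at 44.1 kHz

if __name__ == "__main__":
    print("G.711:", telephony, "max frequency", nyquist_limit_hz(8000))
    print("CD:   ", compact_disc, "max frequency", nyquist_limit_hz(44100))
```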

33 Digital Transport [Figure: sample values (10 52 87 92 102 85 49 10 -18 -48 -60 -12) grouped into packets and interleaved on a shared link. Labels: TDM = Time Division Multiplex, Latency, Packet]

34 Digital Transport • E1 • World (ex. NA and Japan) • 2.048 Mbit/s full duplex • 32 time slots = 32 channels of 8-bit x 8 kHz • 1 time slot reserved for framing • 1 time slot is typically reserved for signalling • 30 time slots for voice communications • E3 = 16 x E1 = 480 channels • T1 • North America and Japan • 1.544 Mbit/s line rate full duplex (1.536 Mbit/s payload plus 8 kbit/s framing) • 24 time slots = 24 channels of 8-bit x 8 kHz
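The E1 and T1 rates can be checked from the slot arithmetic: every time slot carries one 8-bit sample 8,000 times per second. The sketch below assumes E1 framing lives inside time slot 0 (no extra bits) and T1 adds one framing bit per frame; the function name is illustrative.

```python
SLOT_BITS = 8             # one 8-bit sample per time slot
FRAMES_PER_SECOND = 8000  # one frame per sampling interval


def link_rates(time_slots, framing_bits_per_frame=0):
    """Return (payload bit/s, line bit/s) for a TDM link."""
    payload = time_slots * SLOT_BITS * FRAMES_PER_SECOND
    line = payload + framing_bits_per_frame * FRAMES_PER_SECOND
    return payload, line


e1 = link_rates(32)                            # framing carried in slot 0
t1 = link_rates(24, framing_bits_per_frame=1)  # one extra framing bit per frame

if __name__ == "__main__":
    print("E1:", e1)
    print("T1:", t1)
```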

35 Compression: Compression of voice and audio • CODEC = COder-DECoder (often glossed as COmpression-DECompression) • Reduce bandwidth for the audio signal = more channels on the same transport • Lossless vs. lossy algorithms • Latency impact • CPU impact • Quality impact • Speech recognition impact

36 Sampling Theory
CODEC | Characteristics | Description
ITU-T G.711 | Sample rate: 8 kHz; sample size: 8-bit A-law/μ-law; bandwidth: 64 kbit/s | Standard telephony quality with “companding”
ITU-T G.726 | Sample rate: 8 kHz; sample size: 2, 3, 4, 5-bit; bandwidth: 16, 24, 32, 40 kbit/s | Adaptive Differential PCM (ADPCM). Supersedes G.721 & G.723
ITU-T G.728 | Bandwidth: 16 kbit/s; delay: 5 samples, 0.625 ms | LD-CELP = Low Delay Code Excited Linear Prediction
ITU-T G.729 | Bandwidth: 8 kbit/s (6.4 and 11.8 kbit/s in annexes); delay: 10 ms chunks | CS-ACELP = Conjugate-Structure Algebraic-Code-Excited Linear Prediction
ITU-T G.722 | Sample rate: 16 kHz; sample size: 14-bit; bandwidth: 48-64 kbit/s | Standard wideband speech ADPCM codec. Non-telephony CODEC

37 Digital Transport • CODECs can increase channel capacity • 1 E1/T1 channel = 64 kbit/s = 1 channel of G.711 = 2 channels of G.726 @ 32 kbit/s = 8 channels of G.729 @ 8 kbit/s • 1 T1 = 24 channels = 24 channels of G.711 = 48 channels of G.726 @ 32 kbit/s = 192 channels of G.729 @ 8 kbit/s • BUT, CODECs can reduce voice quality
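The channel counts are plain division of the 64 kbit/s slot by the codec bit rate. A short sketch of that calculation, using a hypothetical helper that ignores any packing or signalling overhead:

```python
def channels_per_link(time_slots, codec_kbps, slot_kbps=64):
    """Voice channels that fit when each 64 kbit/s slot is subdivided by codec rate."""
    channels_per_slot = slot_kbps // codec_kbps
    return time_slots * channels_per_slot


T1_SLOTS = 24

if __name__ == "__main__":
    for name, kbps in (("G.711", 64), ("G.726", 32), ("G.729", 8)):
        print(name, channels_per_link(T1_SLOTS, kbps))
```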

38 DRAFT V03 DRAFT V03 37 Digital Transport Advanced topics  Unreliable tone transmission on some CODECs  DTMF, Fax, Modem etc.  Use “out-of-band” communication (more later)  Silence suppression  Voice Activity Detection  Replace by simulated background noise  e.g. G.729 Annex B  Comfort noise generator (CNG)  Played when a communication channel fails temporarily  Usability: Reduces hang-up on temporary outages


40 Voice Over IP – Basics

41 DRAFT V03 DRAFT V03 40 Module Overview Voice Over IP Basics  Internet and intranet for voice communication  Challenges of the internet protocols  Quick guide to the internet protocols: TCP/IP, UDP  Applying internet protocols to session management & voice carriage

42 Telephony: Packet Switch Network [Figure: digitized voice bit streams (…0101100… …0011100… …1100110… …1101100… …0001100…) carried across the PSTN: Public Switched Telephony Network]

43 Voice Over Internet Protocol [Figure: the same digitized voice bit streams carried across an Intranet OR the Internet]

44 DRAFT V03 DRAFT V03 43 Challenges  Call establishment protocols  Connect A-party and B-party  Perform advanced telephony functions  Audio transport protocols  Get the packets from A-to-B / B-to-A on time  Latency  Packet loss  Jitter  Quality of service

45 DRAFT V03 DRAFT V03 44 The Problem The internet was not designed to carry real-time voice traffic!!! (well almost)

46 Internet Protocol Suite Stack
Layer 4 | Application | DNS, FTP, HTTP, SMTP, SNMP, TELNET, SIP, RTP, H.323…
Layer 3 | Transport | TCP, UDP, SCTP, DCCP, IL, RUDP, …
Layer 2 | Network | IP (IPv4, IPv6)
Layer 1 | Link | Ethernet, Wi-Fi, ATM, Frame Relay…

47 DRAFT V03 DRAFT V03 46 Voice Over Internet Protocol Internet Application

48 Internet Protocol Suite Stack [Figure: two hosts, each with Link, Network, Transport and Application layers, joined by a peer-to-peer connection through intermediate nodes that have only Link and Network layers]

49 Internet Protocol Suite Stack [Figure: same diagram, with a packet (0101100) travelling between the hosts]

50 Internet Protocol Suite Stack [Figure: same diagram, with the packet (0101100) lost in transit. Label: Packet Loss]

51 Internet Protocol Suite Stack: TCP = Transmission Control Protocol (the “TCP” in TCP/IP) • Transport protocol • One of the core protocols of the Internet protocol suite: 75% of all traffic • Applications on networked hosts can create connections to one another using TCP to exchange data • Guarantees reliable and in-order delivery of data from sender to receiver • Distinguishes data for multiple, concurrent applications on the same host • TCP supports many of the most popular application protocols including HTTP (Web), Email and SIP • Send a stream of bytes through a virtual “pipe” • Utilizes sequence numbers, acknowledgement, timeout, retransmission…

52 DRAFT V03 DRAFT V03 51 Internet Protocol Suite Stack TCP/IP for Voice Over IP  Good for session management  H.323 / ASN.1 built on TCP/IP  Sub-optimal for near-realtime audio transport  Latency  Jitter  Aside: Utilized for Skype

53 DRAFT V03 DRAFT V03 52 Internet Protocol Suite Stack UDP = User Datagram Protocol  Transport protocol:  One of the core protocols of the Internet protocol suite: 20% of all traffic  Does not provide the reliability and ordering guarantees of TCP  Datagrams may arrive out of order or be dropped by the network  Datagram transmission is stateless in the network  Lower overhead = faster, more efficient, suited to time-sensitive comms  UDP supports many application protocols including DNS and RTP

54 DRAFT V03 DRAFT V03 53 Internet Protocol Suite Stack UDP for Voice Over IP  OK for session management  End-parties need reliable communication about session status  SIP utilises UDP with retry behaviour to withstand packet loss  UDP offers faster setup time than TCP/IP  Suited to near-realtime audio transport  Utilized by RTP  Better latency than TCP/IP (though not ideal)  Better jitter than TCP/IP (though not ideal)  Packet loss causes poorer audio transmission than TCP/IP

55 DRAFT V03 DRAFT V03 54 Internet Protocols for Voice Over IP Internet vs. PSTN  Internet has smart terminals and “dumb” network  PSTN has “dumb” terminals and smart network  PSTN dedicates virtual connections for audio and session  Internet normally creates connections on an as-needed basis  Internet protocols emerging for traffic shaping suited to telephony  Network tools also emerging for banishing and punishing telephony

56 VoIP Protocols

57 DRAFT V03 DRAFT V03 56 Module Overview VoIP Protocols  Overview of the landscape of VoIP protocols  Major IETF standards: SIP, RTP, RTCP  Major ITU-T standards: H.323 family

58 VoIP Protocols: IETF
Protocol | Description
SIP | Session Initiation Protocol: session management on UDP
RTP, SRTP | Real-time Transport Protocol, Secure RTP: audio/video media delivery on UDP
RTCP, SRTCP | Real-time Transport Control Protocol, Secure RTCP: out-of-band control protocol for RTP

59 VoIP Protocols: About SIP • IETF Protocol • Standard for initiating, modifying, and terminating a user session that may involve media elements such as voice, video, instant messaging etc. • Used widely in telephony environments • Supported by numerous IVR platforms • Accepted in 2000 as the signalling protocol of the IMS architecture • Other uses • MRCP v2: Media Resource Control Protocol for Speech Recognition and Text-to-Speech • Microsoft Messenger

60 VoIP Protocols: ITU-T
Protocol | Description
H.323 | Umbrella recommendation for audio-visual comms on any packet network. References the following specifications:
H.225.0 | Protocol to describe call signaling, the media (audio and video), the stream packetization, media stream synchronization and control message formats
H.245 | Control protocol for multimedia communication with messages and procedures used for opening and closing logical channels for audio, video and data, capability exchange, control and indications
H.235 | Describes security in H.323
H.239 | Describes dual stream use in videoconferencing, usually one for live video, the other for presentation

61 DRAFT V03 DRAFT V03 60 VoIP Protocols About H.323  Based on ISDN Q.931  Suited to internetworking between IP and ISDN / QSIG  Similar call model to ISDN  Used widely in telephony environments  Telecommunications backbones  Other uses  Microsoft NetMeeting

62 DRAFT V03 DRAFT V03 61 Today’s Focus We’ll focus on SIP today

63 SIP: Session Initiation Protocol

64 DRAFT V03 DRAFT V03 63 Module Overview Session Initiation Protocol  Components of the SIP architecture  SIP Messaging  Standard SIP exchanges  Ring, hold, answer, transfer, consultative transfer, conferencing  SIP addresses  SIP working with RTP for Voice

65 DRAFT V03 DRAFT V03 64 SIP Overview Session Initiation Protocol (SIP) There are many applications of the Internet that require the creation and management of a session, where a session is considered an exchange of data between an association of participants. The implementation of these applications is complicated by the practices of participants: users may move between endpoints, they may be addressable by multiple names, and they may communicate in several different media - sometimes simultaneously. Numerous protocols have been authored that carry various forms of real-time multimedia session data such as voice, video, or text messages. The Session Initiation Protocol (SIP) works in concert with these protocols by enabling Internet endpoints (called user agents) to discover one another and to agree on a characterization of a session they would like to share. For locating prospective session participants, and for other functions, SIP enables the creation of an infrastructure of network hosts (called proxy servers) to which user agents can send registrations, invitations to sessions, and other requests. SIP is an agile, general-purpose tool for creating, modifying, and terminating sessions that works independently of underlying transport protocols and without dependency on the type of session that is being established.

66 SIP Overview: Session Initiation Protocol (SIP) • Protocol developed by the IETF MMUSIC Working Group (now SIP) • Scope: initiate, modify and terminate an interactive user session that involves voice and multimedia elements such as video, instant messaging and games. • SIP 2.0 published as RFC 3261 in 2002 • Earlier specification published as RFC 2543 in 1999, following initial design work in 1996 (now obsolete) • SIP enables device-to-device communication with media communication via other protocols • SDP: Session Description Protocol - RFC 2327 (describe media capabilities) • RTP: Real-time Transport Protocol - RFC 3550 (transport audio, video, media) • RTCP: Real-time Transport Control Protocol - RFC 3550 (control transport of media) • Standard protocols with high level of product interoperability

67 SIP Overview: Session Initiation Protocol (SIP) • SIP is an “application-layer” control protocol in the Internet stack • SIP is a device-to-device, client-server session signalling protocol • SIP establishes sessions for voice and other media • Allows integration with other services: web, email, IM… • Allows presence and mobility services

68 DRAFT V03 DRAFT V03 67 SIP Overview Applications of SIP  SIP can convey arbitrary payload  Session description  Instant messages  Pictures (e.g. picture of the caller)  Speech recognition control  Web pages

69 DRAFT V03 DRAFT V03 68 SIP Overview: Devices Cisco xTen Siemens Avaya BlackBerry Express Talk

70 DRAFT V03 DRAFT V03 69 SIP Overview Network Devices  SIP Proxy Server  Intermediary to relay call signalling  SIP Redirect Server  Redirects callers to other servers  SIP Registrar  Accept registration requests from users  Maintains user’s whereabouts  SIP IVR  SIP PBX

71 DRAFT V03 DRAFT V03 70 SIP Communications SIP Jargon  User Agent Client = Initiates a communication  User Agent Server = Respondent to a communication  Note: device can be both client and server in a single session  Examples:  Desktop phone is both a client (makes calls) and server (receives calls)

72 SIP Communications: SIP Addresses • SIP address can make you globally reachable • Callees bind to this address using SIP REGISTER method • Callers use this address to establish real-time communication with callees • SIP address is a URI address format: • Can embed in web pages or place on your business card • Highlighted text is the public identifier • SIP URI address contents • Must include host • May include user name • May include the port number • May include other parameters (e.g., transport)
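The URI contents listed above (mandatory host, optional user, port and parameters) can be illustrated with a deliberately simplified parser. This is a sketch, not a complete RFC 3261 URI parser: it ignores passwords, IPv6 literals and `?header` sections, and the address used in the demo is a made-up example.

```python
def parse_sip_uri(uri):
    """Split a simple sip:/sips: URI into scheme, user, host, port and parameters."""
    scheme, sep, rest = uri.partition(":")
    if not sep or scheme.lower() not in ("sip", "sips"):
        raise ValueError("not a SIP URI: %r" % uri)
    address, *param_parts = rest.split(";")
    params = {}
    for part in param_parts:
        key, _, value = part.partition("=")
        params[key] = value
    user, at, hostport = address.rpartition("@")
    host, colon, port = hostport.partition(":")
    if not host:
        raise ValueError("a SIP URI must include a host")
    return {
        "scheme": scheme.lower(),
        "user": user if at else None,      # user name is optional
        "host": host,                      # host is mandatory
        "port": int(port) if colon else None,
        "params": params,                  # e.g. transport=udp
    }


if __name__ == "__main__":
    # Hypothetical address, for illustration only.
    print(parse_sip_uri("sip:alice@example.com:5060;transport=udp"))
```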

73 DRAFT V03 DRAFT V03 72 SIP Communications Protocol design  Similar protocol to HTTP and SMTP  Transmission via UDP messages  Human-readable messages  Simple interaction mechanism  User Agent “A” sends a Control Message to User Agent “B”  User Agent “B” sends a Response Code to User Agent “A”  Retry in the event of communication failure

74 SIP Communications: SIP Methods (Control Messages) • INVITE = invite a user agent to a session • ACK = acknowledge the final response to an INVITE • OPTIONS = query servers about their capabilities • REGISTER = register with a SIP Registrar • BYE = terminate a session • CANCEL = cancel a pending request, e.g. an INVITE that has not yet been answered

75 DRAFT V03 DRAFT V03 74 SIP Communications SIP Response Codes  1xx: Provisional -- request received, continuing to process the request  2xx: Success -- the action was successfully received, understood, and accepted  200 OK  3xx: Redirection -- further action required to complete the request  4xx: Client Error -- the request contains bad syntax or cannot be fulfilled  5xx: Server Error -- the server failed to fulfil an apparently valid request  6xx: Global Failure -- the request cannot be fulfilled at any server
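Because the class of a SIP response is given by its first digit, a user agent can classify any status code without knowing every individual value. A minimal sketch (function names are illustrative):

```python
RESPONSE_CLASSES = {
    1: "Provisional",
    2: "Success",
    3: "Redirection",
    4: "Client Error",
    5: "Server Error",
    6: "Global Failure",
}


def classify_response(code):
    """Return the response class name for a SIP status code."""
    if not 100 <= code <= 699:
        raise ValueError("SIP status codes run from 100 to 699")
    return RESPONSE_CLASSES[code // 100]


def is_final(code):
    """1xx responses are provisional; everything from 200 up ends the transaction."""
    return code >= 200


if __name__ == "__main__":
    for code in (100, 180, 200, 302, 404, 503, 603):
        print(code, classify_response(code), "final" if is_final(code) else "provisional")
```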

76 DRAFT V03 DRAFT V03 75 SIP Communications SIP Extensions (selection amongst many)  INFO = carry session-related control information  RFC 2976  e.g. ISUP and ISDN signalling messages  REFER = refer the recipient to a new resource  RFC 3515  e.g. Call transfer

77 DRAFT V03 DRAFT V03 76 SIP: Simple Peer-to-Peer Session “A” “B” Simple peer-to-peer SIP session  Assumptions  “A” knows address of “B”  “A” and “B” can see each other on the network  Audio communication  Humans are using “A” and “B”

78 SIP: Simple Peer-to-Peer Session (message sequence between “A” and “B”)
User at “A” makes a call to “B”
A → B: INVITE From:A To:B (A-SDP)
B → A: 100 TRYING
B → A: 180 RINGING (play ring-tone to user B; play ring-tone to user A)
- - - Waiting for answer - - -
User at “B” answers call
B → A: 200 OK (B-SDP)
A → B: ACK
- - - Call Established - - -
A ↔ B: RTP Audio Stream (A hears B, B hears A)
B hangs up
B → A: BYE (B terminates RTP audio)
A → B: 200 OK (A terminates audio)
- - - Session Over - - -

79 SIP: Simple Peer-to-Peer Session (same exchange, annotated) • SIP Messages: INVITE, ACK, BYE • Status codes: 100 TRYING, 180 RINGING, 200 OK • RTP: the audio stream between “A” and “B” while the call is established

80 SIP Communications: SIP Message Example
INVITE sip:0282078101@ SIP/2.0
Via: SIP/2.0/UDP
From: ;tag=40A0C340-2BC
To:
Date: Fri, 07 Jul 2006 01:58:55 GMT
Call-ID: FB5F1FE3-C9211DB-B22481EA-DE5FE23A@
Supported: timer,100rel
Min-SE: 1800
Cisco-Guid: 4217155242-210899419-2988540394-3730825786
User-Agent: Cisco-SIPGateway/IOS-12.x
Allow: INVITE, OPTIONS, BYE, CANCEL, ACK, PRACK, COMET, REFER, SUBSCRIBE, NOTIFY, INFO
CSeq: 101 INVITE
Max-Forwards: 6
Remote-Party-ID: ;party=calling;screen=yes;privacy=off
Timestamp: 1152237535
Contact:
Expires: 180
Allow-Events: telephone-event
Content-Type: application/sdp
Content-Length: 235
>
Callouts: Header • To & From • Unique call ID • User agent details • Transmission info • Multi-part media content • SDP removed
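Since SIP messages are human-readable text in the style of HTTP, a request can be split into a start line, headers and body with ordinary string handling. The sketch below is a simplified illustration: it does not handle folded header lines or compact header names, and every address in the sample message is a made-up placeholder rather than a value from the slide.

```python
def parse_sip_request(text):
    """Split a SIP request into method, request URI, version, headers and body."""
    head, _, body = text.partition("\r\n\r\n")
    lines = head.split("\r\n")
    method, request_uri, version = lines[0].split(" ", 2)
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        # A header may repeat (e.g. Via), so keep a list per name.
        headers.setdefault(name.strip().lower(), []).append(value.strip())
    return {
        "method": method,
        "request_uri": request_uri,
        "version": version,
        "headers": headers,
        "body": body,
    }


# Hypothetical message: hosts and addresses are illustrative placeholders.
EXAMPLE_INVITE = (
    "INVITE sip:0282078101@gateway.example.com SIP/2.0\r\n"
    "Via: SIP/2.0/UDP 192.0.2.1:5060\r\n"
    "From: <sip:caller@example.com>;tag=40A0C340-2BC\r\n"
    "To: <sip:0282078101@gateway.example.com>\r\n"
    "Call-ID: FB5F1FE3-C9211DB-B22481EA-DE5FE23A@192.0.2.1\r\n"
    "CSeq: 101 INVITE\r\n"
    "Max-Forwards: 6\r\n"
    "Content-Length: 0\r\n"
    "\r\n"
)

if __name__ == "__main__":
    request = parse_sip_request(EXAMPLE_INVITE)
    print(request["method"], request["request_uri"])
    print(request["headers"]["call-id"])
```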

81 SIP Communications: SDP – Session Description Protocol • RFC 2327 • Describe media capability of a SIP user agent
v=0
o=Holly-HVG-4-2 2890844526 2890842809 IN IP4
s=sip call from the hvg
c=IN IP4
t=0 0
m=audio 11946 RTP/AVP 8 101
c=IN IP4
a=rtpmap:0 PCMU/8000
a=rtpmap:8 PCMA/8000
a=rtpmap:101 telephone-event/8000
a=fmtp:101 0-16
Callouts: Agent Description • Agent Address • Audio Capability #1 • RTP Protocol • G.711 u-law & A-law 8000Hz • DTMF events 0123456789ABCD*#

82 SIP Communications: SDP Protocol Structure (* = optional)
Session description
v= (protocol version)
o= (owner/creator and session identifier)
s= (session name)
i=* (session information)
u=* (URI of description)
e=* (email address)
p=* (phone number)
c=* (connection information)
b=* (bandwidth information)
One or more time descriptions
z=* (time zone adjustments)
k=* (encryption key)
a=* (zero or more session attribute lines)
Zero or more media descriptions
Time description
t= (time the session is active)
r=* (zero or more repeat times)
Media description
m= (media name and transport address)
i=* (media title)
c=* (connection information - optional if included at session-level)
b=* (bandwidth information)
k=* (encryption key)
a=* (zero or more media attribute lines)
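Because every SDP line has the form `type=value`, extracting the audio port and codec list takes only a few lines. The following is a rough sketch with illustrative names, not a full SDP parser. The sample description is modelled on an offer of G.711 A-law plus DTMF telephone events, and its IP address is a documentation placeholder.

```python
def parse_sdp_audio(sdp):
    """Return (port, payload types, rtpmap) for the first audio media line."""
    port = None
    payload_types = []
    rtpmap = {}
    for line in sdp.splitlines():
        kind, _, value = line.strip().partition("=")
        if kind == "m" and value.startswith("audio") and port is None:
            fields = value.split()          # audio <port> <proto> <pt> <pt> ...
            port = int(fields[1])
            payload_types = [int(pt) for pt in fields[3:]]
        elif kind == "a" and value.startswith("rtpmap:"):
            pt, _, encoding = value[len("rtpmap:"):].partition(" ")
            rtpmap[int(pt)] = encoding      # e.g. 8 -> "PCMA/8000"
    return port, payload_types, rtpmap


# Hypothetical SDP body: the address 192.0.2.10 is a placeholder.
EXAMPLE_SDP = """v=0
o=example-agent 2890844526 2890842809 IN IP4 192.0.2.10
s=example call
c=IN IP4 192.0.2.10
t=0 0
m=audio 11946 RTP/AVP 8 101
a=rtpmap:8 PCMA/8000
a=rtpmap:101 telephone-event/8000
a=fmtp:101 0-16
"""

if __name__ == "__main__":
    port, payload_types, rtpmap = parse_sdp_audio(EXAMPLE_SDP)
    print("RTP port:", port)
    for pt in payload_types:
        print("payload type", pt, "=", rtpmap.get(pt, "static or unknown"))
```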

83 SIP Registration: Registration Process
“A” → SIP Registrar: REGISTER
SIP Registrar → “A”: 200 OK
- - - Repeat regularly (5, 60+ min) - - -
Example request:
REGISTER
Via: IP:Host
From: “Andrew Hunt”
To:
CallID: >
Expires: 3600

84 SIP: SIP Session Via a Proxy (message sequence between “A”, SIP Proxy and “B”)
User at “A” makes a call to “B”
A → Proxy: INVITE From:A To:B (A-SDP); Proxy → B: INVITE From:A To:B (A-SDP)
100 TRYING returned on each leg
B → Proxy → A: 180 RINGING (play ring-tone to user B; play ring-tone to user A)
- - - Waiting for answer - - -
User at “B” answers call
B → Proxy → A: 200 OK (B-SDP)
A → Proxy → B: ACK
- - - Call Established - - -
RTP Audio Stream between A and B (A hears B, B hears A)
A hangs up
A → Proxy → B: BYE (A terminates RTP audio)
B → Proxy → A: 200 OK (B terminates RTP audio)
- - - Session Over - - -

85 DRAFT V03 DRAFT V03 84 SIP: SIP Proxy Chaining “A” “B” SIP Proxy

86 DRAFT V03 DRAFT V03 85 SIP Proxy SIP Proxy Function  Serve as rendezvous point at which callees are reachable  Perform routing function  Select the next hop or hops when chaining  Forking: try multiple destinations in parallel or sequence  Avoid loops when chaining  Available capabilities  Programmable routing decisions & tables  Least-cost routing  Firewall traversal  Direct certain calls to PSTN via gateway (e.g. 911, local calls)

87 DRAFT V03 DRAFT V03 86 SIP: Session Via a PBX INVITE From:A To:B A-SDP 100 TRYING 180 RINGING  Play ring-tone to user B Play ring-tone to user A  User makes a call to “B”   B User answers call 200 OK B-SDP RTP Audio Stream ACK A hears B  BYE 200 OK  B terminates RTP audio - - - Call Established - - - - - - Session Over - - - - - - Waiting for answer - - - “A” SIP PBX (e.g. Asterisk) “B” INVITE From:P To:B P-SDP 100 TRYING 180 RINGING 200 OK B-SDP ACK BYE 200 OK A terminates RTP audio  REGISTER OK REGISTER OK RTP Audio Stream  B hangs up

88 DRAFT V03 DRAFT V03 87 SIP: Session Via a PBX with Redirect (Direct Audio Link) BYE 200 OK  B terminates RTP audio - - - Call Established - - - - - - Session Over - - - “A” SIP PBX (e.g. Asterisk) “B” BYE 200 OK A terminates RTP audio   B hangs up (Re)INVITE From:P To:B A-SDP 200 OK RTP Audio Stream (Re)INVITE From:P To:A B-SDP RTP Audio Stream - - - Call Continues - - -

89 DRAFT V03 DRAFT V03 88 SIP-TDM Gateway  Translate signalling messages  To/From traditional telephony and VoIP  e.g. ISUP  SIP/RTP  Support heterogeneous environments  Staged migration to VoIP  Numerous gateway products available PSTN IP TDM/ISDN SIP/RTP VoIP Gateway

90 DRAFT V03 DRAFT V03 89 SIP-TDM Gateway Signalling gateway Media Gateway Controller Media Gateway PSTN IP RTP SIP TDM ISDN

91 DRAFT V03 DRAFT V03 90 SIP-TDM Gateway  Terminates RTP audio - - - Session Over - - - TDM-VoIP Bridge SIP UA Q.931 RELEASE COMPLETE Terminates RTP audio   Pick-up INVITE 555 1234 100 TRYING RTP Audio Stream 180 RINGING Q.931 SETUP Q.931 CALL PROCEEDING Play ring-tone  User makes a call  Q.931 PROGRESS Q.931 CONNECT  Phone rings 200 OK ACK BYE Q.931 DISCONNECT 200 OK Hang up  Q.931 RELEASE - - - Call in Progress - - -

92 DRAFT V03 DRAFT V03 91 SIP: Sending Auxiliary Information User makes a call  “A” INVITE sip:0282078101@ SIP/2.0 > Content-type: multipart/mixed; boundary="gc0p4Jq0M2Yt08jU534c0p" MIME-version: 1.0 This is a multi-part message in MIME format. --gc0p4Jq0M2Yt08jU534c0p Content-Type: application/sdp Content-Length: 235 > --gc0p4Jq0M2Yt08jU534c0p Content-type: image/jpeg PGh0bWw+CiAgPGhlYWQ+CiAgPC9oZWFkPgogIDxib2R5PgogI CAgPHA+VGhpcyBpcyB0aGUgYm9keSBvZiB0aGUgbWVzc2Fn ZS48L3A+CiAgPC9ib2R5Pgo8L2h0bWw+Cg== --gc0p4Jq0M2Yt08jU534c0p “Brad calling”

93 SIP: Failed Establishment (message sequence between “A” and “B”)
User at “A” makes a call to “B”
A → B: INVITE From:A To:B (A-SDP), no response, Timeout
- - - Retries - - -
A → B: INVITE From:A To:B (A-SDP), no response, Timeout
A → B: INVITE From:A To:B (A-SDP), no response, Timeout
“A” gives up
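The retry behaviour over UDP can be sketched from the INVITE client transaction timers in RFC 3261: the retransmission interval starts at T1 (500 ms by default) and doubles each time, and the caller gives up after 64 × T1. The function name below is illustrative, and the schedule assumes no response of any kind arrives.

```python
def invite_retransmit_schedule(t1=0.5):
    """Return (retransmission times in seconds, give-up time) for an unanswered INVITE."""
    give_up_at = 64 * t1      # transaction timeout
    times = []
    interval = t1             # first retransmission fires after T1
    elapsed = t1
    while elapsed < give_up_at:
        times.append(elapsed)
        interval *= 2         # back off: T1, 2*T1, 4*T1, ...
        elapsed += interval
    return times, give_up_at


if __name__ == "__main__":
    times, give_up_at = invite_retransmit_schedule()
    print("retransmit at:", times)
    print("give up at:", give_up_at)
```

With the default T1 this gives six retransmissions after the original INVITE before the caller gives up at 32 seconds.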

94 RTP: Real-time Transport Protocol

95 DRAFT V03 DRAFT V03 94 Module Overview RTP: Real-time Transport Protocol  Really real-time?  RTP session overview  RTP packets  RTP and network transmission

96 DRAFT V03 DRAFT V03 95 Real-time Transport Protocol  Standard for real-time transport over IP networks  Streaming audio and video  Utilised in SIP/RTP and H.323  Adopted by 3GPP for next generation cellular telephony  Widespread use in streaming: QuickTime, Real, Microsoft  RTP assumes  Network is dumb and imperfect, end-points are smart  Network may exhibit delays, jitter, packet loss etc.  Real-time Transport Protocol is NOT REAL-TIME  No end-to-end protocol, including RTP, can ensure in-time delivery. This would require the support of lower layers (switches, routers etc.)  RTP provides functionality suited for carrying real-time content, e.g., a timestamp and control mechanisms for synchronizing different streams with timing properties

97 DRAFT V03 DRAFT V03 96 Real-time Transport Protocol  One RTP session transmits one media type  Audio / voice  Video  Multi-media requires multiple RTP sessions  RTP session:  Implements a particular RTP profile  Includes an RTP data flow  Transports a single media type according to one or more payload formats  e.g. audio in G.711 format  Includes an RTP control protocol flow  Providing reception quality feedback, user information, etc.  Associates:  Source and destination IP addresses  A pair of UDP ports: one for RTP, one for RTCP

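98 Real-time Transport Protocol [Figure: RTP packet layout with callouts] • Payload: media content • Contributing sources: supports mixing, e.g. for audio conferencing • Sequence number: detect packet loss • Timestamp: sampling time of the first payload sample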
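The fixed part of the RTP header defined in RFC 3550 is 12 bytes: version and flags, marker and payload type, a 16-bit sequence number, a 32-bit timestamp and a 32-bit source identifier (SSRC). A minimal sketch of packing and unpacking it, assuming no padding, no header extension and no contributing-source list (function names are illustrative):

```python
import struct

RTP_VERSION = 2
HEADER_FORMAT = "!BBHII"  # network byte order, 12 bytes in total


def pack_rtp_header(payload_type, sequence, timestamp, ssrc, marker=False):
    """Build the 12-byte fixed RTP header (no padding, extension or CSRC entries)."""
    byte0 = RTP_VERSION << 6
    byte1 = (int(marker) << 7) | (payload_type & 0x7F)
    return struct.pack(HEADER_FORMAT, byte0, byte1,
                       sequence & 0xFFFF, timestamp & 0xFFFFFFFF, ssrc & 0xFFFFFFFF)


def unpack_rtp_header(packet):
    """Decode the fixed RTP header from the first 12 bytes of a packet."""
    byte0, byte1, sequence, timestamp, ssrc = struct.unpack(HEADER_FORMAT, packet[:12])
    return {
        "version": byte0 >> 6,
        "csrc_count": byte0 & 0x0F,
        "marker": bool(byte1 >> 7),
        "payload_type": byte1 & 0x7F,   # 8 = G.711 A-law (PCMA)
        "sequence": sequence,           # gaps reveal packet loss
        "timestamp": timestamp,         # in sample-clock units
        "ssrc": ssrc,
    }


if __name__ == "__main__":
    # 20 ms of G.711 is 160 samples, so the timestamp advances by 160 per packet.
    header = pack_rtp_header(payload_type=8, sequence=1000, timestamp=160, ssrc=0x1234)
    print(header.hex(), unpack_rtp_header(header))
```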

99 DRAFT V03 DRAFT V03 98 Real-time Transport Protocol Source Destination Playback Buffer Network Packet loss Out-of-order recovery Late packet loss

100 Network Issues and Design

101 DRAFT V03 DRAFT V03 100 Module Overview Network Issues and Design  Quality of Service  Packet loss, Latency, Jitter  Managing Quality of Service  Network quality  MPLS  Silence suppression  Local vs. long distance  Network design  Firewalls: NAT & STUN

102 DRAFT V03 DRAFT V03 101 Quality of Service QoS Definition  Probability of the network meeting a given traffic contract  Informally refers to the probability of a packet succeeding in passing between two points in the network within its desired latency period

103 DRAFT V03 DRAFT V03 102 Quality of Service What can go wrong as packets go from “A” to “B”?  Dropped packets: routers might fail to deliver (drop) some packets if they arrive when their buffers are full. Some, none, or all of the packets might be dropped, depending on the state of the network, and it is impossible to determine what happened in advance.  Delay: it may take a long time for a packet to get from “A” to “B” because it gets held up in long queues or takes a less direct route to avoid congestion  Jitter: packets from source will reach the destination with different delays, sometimes by taking different routes  Out-of-order delivery: different packets may take different routes with enough difference in delay to change the order of arrival  Error: sometimes packets are misdirected, or combined together, or corrupted, while en route
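Jitter can be quantified with the running estimator that RFC 3550 specifies for RTCP reports: compare the transit time of each packet with the previous one and smooth the absolute difference with a gain of 1/16. The sketch below assumes send and receive times are expressed in the same units; the function name is illustrative.

```python
def interarrival_jitter(packets):
    """Estimate jitter from an iterable of (sent_time, received_time) pairs."""
    jitter = 0.0
    previous_transit = None
    for sent, received in packets:
        transit = received - sent
        if previous_transit is not None:
            difference = abs(transit - previous_transit)
            jitter += (difference - jitter) / 16   # smoothing as in RFC 3550
        previous_transit = transit
    return jitter


if __name__ == "__main__":
    steady = [(0, 50), (20, 70), (40, 90)]      # constant 50 ms delay
    bumpy = [(0, 50), (20, 70), (40, 106)]      # third packet arrives 16 ms late
    print(interarrival_jitter(steady), interarrival_jitter(bumpy))
```

A constant delay, however long, yields zero jitter. Only variation in delay contributes.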

104 DRAFT V03 DRAFT V03 103 Quality of Service

105 DRAFT V03 DRAFT V03 104 Quality of Service  QoS issues can have a major impact on real-time media streaming  Packet loss  Missing packet replaced by silence  Jitter & out-of-order  Abrupt and unwanted variation in packet arrival timing  Arrival of packets out-of-order Original Packet loss Jitter

106 Quality of Service  Measure QoS on routers and end-points  QoS tends to degrade with network size and congestion  Managing Quality of Service  Generously over-provision a network – expensive and does not scale  Reserve network resources: e.g. RSVP = Resource Reservation Protocol  DiffServ: Differentiated services for bulk flows (e.g. packets from a university)  Multi-Protocol Label Switching (MPLS): emulates some properties of a circuit-switched network over a packet-switched network  Traffic shaping: control network traffic to optimize or guarantee performance, low latency, and/or bandwidth. Traffic shaping deals with classification, queue disciplines, policy enforcement, congestion management, quality of service (QoS), and fairness.  Silence suppression
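As one concrete example of DiffServ, an end-point can mark its outgoing voice packets so that routers configured for it can prioritize them. The sketch below assumes the Expedited Forwarding code point (DSCP 46), which is commonly used for voice, and the standard `IP_TOS` socket option; the helper names are hypothetical, and whether a network honors the marking depends entirely on its configuration.

```python
import socket

DSCP_EF = 46  # Expedited Forwarding code point, commonly used for voice


def dscp_to_tos(dscp):
    """Place a 6-bit DSCP value in the upper bits of the IP TOS byte."""
    if not 0 <= dscp <= 63:
        raise ValueError("DSCP must fit in 6 bits")
    return dscp << 2


def open_marked_udp_socket(dscp=DSCP_EF):
    """Open a UDP socket whose outgoing packets carry the given DSCP.

    Availability and effect of IP_TOS vary by operating system.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_TOS, dscp_to_tos(dscp))
    return sock


tos_byte = dscp_to_tos(DSCP_EF)
```

Marking is only a request: the classification, queueing and policy enforcement listed under traffic shaping still have to be configured on the routers along the path.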

107 Quality of Service QoS on Local Network  QoS generally not a major issue in single-site deployments  Dedicate separate switches for voice and data traffic  Provide redundant networks for voice and data traffic  Gigabit Ethernet  Monitor QoS  Alarm network outages

108 Firewalls  Network Address Translation (NAT): rewriting the source and/or destination addresses of IP packets as they pass through a router or firewall  Also known as network masquerading or IP masquerading  Firewalls use NAT to enable multiple hosts on a private network to access the Internet using a single public IP address  Common in home and SOHO routers  Addresses carried inside SDP bodies are not translated by NAT  STUN (Simple Traversal of UDP Through NATs, RFC 3489):  Network protocol allowing a client behind NAT to find (a) its public address, (b) the type of NAT and (c) the Internet-side port associated by the NAT with a particular local port  This information is used to set up UDP communication between two machines that are both behind NAT routers
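The SDP problem is easier to see in a toy model. The sketch below simulates a NAT's port mapping and shows that an SDP body built from the private address is unreachable from outside, while one built from the publicly visible address is usable. Everything here is hypothetical: `Nat`, `stun_like_lookup` and `sdp_media_lines` are illustrative names, the lookup only stands in for a real STUN exchange, and the addresses are documentation examples.

```python
class Nat:
    """Toy NAT: maps (private ip, private port) to a port on one public IP."""

    def __init__(self, public_ip, first_port=40000):
        self.public_ip = public_ip
        self._next_port = first_port
        self._bindings = {}

    def translate(self, private_ip, private_port):
        key = (private_ip, private_port)
        if key not in self._bindings:
            self._bindings[key] = self._next_port
            self._next_port += 1
        return (self.public_ip, self._bindings[key])


def stun_like_lookup(nat, private_ip, private_port):
    """Stand-in for a STUN query: report the address seen outside the NAT."""
    return nat.translate(private_ip, private_port)


def sdp_media_lines(ip, port):
    """Connection and media lines of an SDP body offering PCMU audio."""
    return f"c=IN IP4 {ip}\r\nm=audio {port} RTP/AVP 0\r\n"


nat = Nat("203.0.113.5")
# Naive offer: the NAT rewrites IP headers, not this text, so the far end
# is told to send media to a private address it cannot reach.
naive_sdp = sdp_media_lines("192.168.1.10", 5004)
# Offer built from the publicly visible address and port.
public_ip, public_port = stun_like_lookup(nat, "192.168.1.10", 5004)
fixed_sdp = sdp_media_lines(public_ip, public_port)
```

Whether the discovered mapping can actually be used by the far end depends on the type of NAT, which is why STUN also reports it.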

109 VoIP and Speech Recognition

110 Module Overview VoIP and Speech Recognition  MRCP v1 & v2  Impact of CODECs  Network issues: packet loss, latency

111 MRCP: Media Resource Control Protocol MRCP v1  IETF protocol  Client control of speech resources  Speech recognition  Text-to-speech  MRCP structure is similar to HTTP and SIP  Request by client in header+body format  Response by server  Media delivery typically via RTP  Widely supported by VoiceXML platforms  Leverages existing W3C standards for speech recognition and TTS markup
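The header+body format shared by HTTP, SIP and MRCP can be split with a few lines of code. The parser below is a generic sketch, and the sample request is only an illustration modelled loosely on an MRCP v1 text-to-speech request; the MRCP specification should be consulted for the exact start-line and header grammar.

```python
def parse_message(raw):
    """Split a header+body message into start line, headers and body.

    Assumes CRLF line endings and a blank line between headers and body.
    Header names are lower-cased; repeated headers keep the last value.
    """
    head, _, body = raw.partition("\r\n\r\n")
    lines = head.split("\r\n")
    start_line = lines[0]
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return start_line, headers, body


# Illustrative request only; the values are made up for this example.
sample = (
    "SPEAK 543257 MRCP/1.0\r\n"
    "Content-Type: application/synthesis+ssml\r\n"
    "Content-Length: 20\r\n"
    "\r\n"
    "<speak>Hello</speak>"
)
start_line, headers, body = parse_message(sample)
```

The same function would split an HTTP or SIP message, which is the point of the slide: only the start line and the set of headers differ between the protocols.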

112 MRCP: Media Resource Control Protocol MRCP v2  IETF protocol to supersede MRCP v1  Broader client control of speech resources  Adds speaker verification and speaker identification  Adds recording  Utilizes SIP+SDP to establish the media pipe

113 VoIP and Speech Recognition Speech Recognition and CODECs  Lossless CODECs do not affect speech recognition accuracy  No loss of information  Lossy CODECs can affect speech recognition accuracy  Greater compression tends to cause greater degradation  Speech recognizers are generally very reliable with widely deployed CODECs  Mobile telephony has extensive compression  Speech recognizers are trained explicitly for mobile performance  DSR Aurora: Distributed Speech Recognition  CODEC specialized for the requirements of speech recognition  Promoted for mobile carrier usage

114 VoIP and Speech Recognition Speech Recognition and Latency  Speech recognition is not as sensitive to latency as humans are  A late packet is better than no packet  Speech recognizers have extensive buffers for non-real-time processing  Note: excessive latency (>1 sec) can cause caller-perceived service issues [Diagram labels: RTP; Buffer; Speech Recognizer; Result]

115 VoIP and Speech Recognition Speech Recognition and Packet Loss  Speech recognition is sensitive to packet loss  ASR can use packet loss information to minimize the resulting loss of accuracy [Diagram labels: RTP; Buffer; Speech Recognizer; Result]
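Packet loss information can be derived from RTP sequence numbers, which are 16 bits wide and wrap around at 65536. The sketch below builds a per-packet arrival mask that a recognizer front end could consult; the function name is hypothetical, and how a given recognizer would use such a mask is engine-specific.

```python
def loss_mask(received_seqs, first_seq, count):
    """Per-packet arrival flags for `count` packets starting at first_seq.

    True means the packet arrived, False means it is missing.
    Sequence numbers are treated as 16-bit values that wrap at 65536.
    """
    got = set(received_seqs)
    return [((first_seq + i) % 65536) in got for i in range(count)]


# Packets 65534, 65535, 0, 1 were sent; packet 0 was lost in the network.
mask = loss_mask([65534, 65535, 1], first_seq=65534, count=4)
```

With a mask like this, frames from lost packets can be treated as missing data instead of being mistaken for genuine silence in the caller's speech.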

116 VoIP and Mobile Telephony

117 Module Overview VoIP and Mobile Telephony  Analog mobile  Digital mobile  3G and SIP  IMS – IP Multi-Media Subsystem Architecture

118 Mobile Telephony Analog Mobile  Experimental systems from 1920s  “1G” – 1st Generation  AMPS: Advanced Mobile Phone Services  Analog transmission  1978: Trial in Chicago  1979: Commercial launch in Japan  1981: Commercial launch in Sweden, Norway, Denmark, Finland  1983: Commercial launch in Chicago  Issues: limited capacity, fraud, subscriber volume…

119 Mobile Telephony 2G – 2nd Generation Mobile  Objectives achieved  Digital technology  Increased capacity  Greater security against fraud  Global roaming  Advanced services  Lower power = smaller handsets = longer battery life  Many standards evolved (examples)  GSM: Pan-European standard that spread globally  CDMA: Americas and parts of Asia (aka PCS)  PDC: Japan  Limitations:  Optimized for voice – not suited to data

120 Mobile Telephony 2.5G – Stepping Stone from 2G to 3G  2G system with both packet switching (for data) and circuit switching (for voice)  2.5G is a marketing term – not a standard  Objectives achieved  Re-use of much 2G infrastructure (GSM & CDMA)  Data rate of 144 kbit/sec or better  Used for sending photos and much more

121 Mobile Telephony 3G – 3rd Generation Mobile  Combines high-speed mobile access with Internet Protocol (IP) based services  Covers range of network technologies  WCDMA, CDMA2000, UMTS, EDGE  Data rate: 384 kbit/sec for mobile systems and 2 Mbit/sec for stationary systems  Enables video, TV, images, music, games, location services…

122 Mobile Telephony 3GPP – 3rd Generation Partnership Project  Collaboration agreement (Dec-98) between ETSI (Europe), ARIB/TTC (Japan), CCSA (China), ATIS (North America) and TTA (South Korea)  Goal: global 3G specification within the scope of the ITU's IMT-2000 project  3GPP specifications are based on evolved GSM specifications  Now generally known as the UMTS system  Introduced IMS…

123 Mobile Telephony IMS – IP Multi-Media Sub-System  Emerged in 3GPP Release 5 with the following enhancements  Principles  Access independence: works with fixed, mobile or wireless networks  Different network architectures: can be implemented on operator-selected architectures  Terminal and user mobility: provides terminal mobility (roaming)  Extensive IP-based services: offers just about any IP-based service, e.g. VoIP, push-to-talk over cellular (PoC), multiparty gaming, video conferencing, messaging, community services, presence information, content sharing…

124 Mobile Telephony IMS is Built on SIP  3GPP Variant of SIP  Application servers for SIP session management  Caller ID, call waiting, call forwarding, transfer, call blocking, interception, announcements, conferencing, voice-mail, SMS…  CSCF: Call Session Control Function and other functions by SIP  Media Resource Function (MRF): SIP end-point with IVR-like functionality  TDM-VoIP gateways to bridge to fixed and mobile telephony  Who’s in control?  TDM = dumb terminals, smart network  Internet VoIP = smart terminals, dumb network  IMS = dumb/smart terminals, smart network

125 Closing

126 Module Objectives  Go home!!

127 Further Information Web sites  IETF:  SIP Tutorial:  SIP Home Page:  SIP Forum:  Asterisk PBX:  VoIP Wiki Reference:  SIP Knowledge:  SIP FAQ:  SIP Tech Portal:

128 Thank you!!!
