2 Good day. I am the latest (2001) British voice from Bell Laboratories, part of Lucent Technologies. In a moment, David Thomson will talk, but if you will first permit me, I would like to reminisce about my great-great-great-grandfather…

3 Pedro the Voder (1939). Homer Dudley, Bell Labs scientist.

4 Controls for Pedro: Specialized keyboard; Switches pressed by fingers; One for each hand; One year of practice.

5 Pedro the Voder - 1939 World's Fair (no computers): Trained operator; Impressed audience members; Difficult, complex job: real-time parameter control.
"Replacing the announced program, we present... Radio Magic! Now we transfer you to the American Telephone and Telegraph building at the New York World's Fair. Will you please, Mr. Voder, say for our eastern listeners, 'Good evening, radio audience.'"
"Good evening, radio audience."
"And now for our western listeners, say 'Good afternoon, radio audience.'"
"Good afternoon, radio audience."

6 Crossing Speech Technology Thresholds. David Thomson, SVP, Speech Technology, SpinVox. AVIOS 2009, Tel Aviv, Israel. Copyright © 2009 SpinVox Ltd

7 Dimensions of difficulty
[Chart axes: Environment (Friendly to Harsh); Task Complexity (Easy to Hard)]
Applications plotted: Medical/Legal Transcription; IVR; Desktop Dictation; Automotive (Telematics); Voice Dialing; Video Games; Directory Assistance; Web Search; Voicemail Conversion.
Callouts on the chart:
  Unknown topic; Irregular grammars; Large vocabulary; Difficult (short) words.
  Constrained topic; User confirmation; Small vocabulary.
  Speaker-trained; Careful users; Quiet background; Clean phone lines.
  Speaker-independent; Naïve users; Noise / dropouts; Speech compression.
  Known topic; Small-medium vocabulary; Cooperative users; Confirmation prompts; Simple grammars; Long, distinct words.
  Moderate vocabulary size; Speaker-trained; Trained, motivated users; Quiet background; Clean phone lines.
  Voicemail Conversion: Unknown topic; Very large vocabulary; Informal grammar; Dropouts, distortion, noise; Users are unaware; Speaker-independent; Wide range of accents.

8 Chronology 1
YEAR  TECHNOLOGY            DEPLOYED APPLICATION
1936  Speech synthesis      Homer Dudley's Vocoder
1952  Speech recognition    First isolated digit recognizer
1982  DSP devices           Liberty Phone – voice dialing
1986  Speaker-independent   First automated operator trial
1989  HMM                   Spanish call routing – 3 word vocabulary
1990  Connected digit       Wireless digit dialing trial
1991  Wordspotting          Second automated operator trial
1992  Barge-in              First automated operator service (VRCP)
1993  Speaker verification  Voiceprint trial for ATMs
1993  Isolated digit        Inbound 800 call routing in the U.S.
1994  Finite state grammar  WILDFIRE personal assistant
1994  New languages         Multi-lingual voice dialing service
1995  Waveform matching     Auto-dialing for directory assistance
1995  Noise resistance      Name dialing for wireless phones

9 Chronology 2
YEAR  TECHNOLOGY               DEPLOYED APPLICATION
1995  Isolated word            Automated collect call acceptance in Japan
1995  Finite state grammar     Movie locator trial
1996  Beam search              1000 company name voice dialing trial
1996  Discriminative training  Calling card connected digit dialing service
1996  Large vocabulary         Charles Schwab stock quotes
1996  Complex grammars         Dialogue-based personal assistant
1997  Multilingual             ARISE – European train schedules
1997  Large vocabulary         Connected word desktop dictation
1998  n-gram SLMs              Natural language call routing for USAA
1999  Subword                  MovieFone (777-FILM) showtimes
2001  Natural language         AT&T's How May I Help You?
2002  Short phrases            Amtrak train schedules & fares (Julie)
2004  Message conversion       Assisted voicemail to text conversion

10 DSPs enable real-time ASR: 1982-1995. Products/services: Liberty Phone (1982); Call routing (1989); ATM voiceprints (1993). Technology: Speaker-independent; Wordspotting; OOV rejection; Hidden Markov Models.

11 General-purpose processors: 1996-present. Better for complex grammars; Easier to program; Inexpensive stock hardware. Technology: RISC (PowerPC) 1995-2000; PC (Pentium/AMD) 2000-present.

12 Network speech processing hardware [Chart: Megaflops (0 to 1600) and Megabytes (0 to 200) by year, 1988 to 1999, for DSP and RISC platforms]

13 High capacity cooling

14 Voice Activated Call Routing - 1989 [Diagram labels: Caller; Network; Adjunct; Accounts; Loans; Other] "For accounts, press or say 1; for loans, press or say 2; for other information, press or say 3."

15 Voice Prompter - Overview. For "Advanced 800" calls; Deployed in Spain in 1989; Three-word vocabulary: uno, dos, and tres; Deployed in the U.S. in 1993; Took 900 million calls/year. Technology: Diverse accents; First wordspotting in the network; Robust to external noise; Simple OOV rejection. "Please say one for Dept. X, two for Dept. Y,..." "One" 1-800-854-1928

16 Automated operator 1992 [Diagram labels: Automated Attendant; Network; 5ESS; Attendant Position; Switch] "Collect"

17 AT&T Automated Operator - 1992. Five word vocabulary (collect, calling card, operator, …); Followed trials in 1986 and 1990; 3,000,000 calls per day; Cut operator workforce from 6000 to 3000; Saved AT&T billions over 10 years. Technology: Speaker-independent; Wordspotting ("Collect call, please"); OOV rejection (1986 trial); Barge-in (1991).

18 Digit recognition progress [Chart: accuracy (70% to 100%) by year (1982, 1985, 1988, 1991, 1994, 1997, 2000); series: Isolated Word, Connected digit strings; annotations: Digital Networks, Pin-drop ad 1986, Cell phones, VoIP]

19 Wireless digit dialing 1991-1995 [Diagram labels: Cell Site; Network; Voice Adjunct] Speech: "Three-one-two-seven-one-three-five-two-eight-three". Number: 312-713-5283

20 Connected digit progress 1990-1993 [Chart: Per-digit Error Rate (0% to 9%) from 10/1990 to 4/1993]
  Trial #1 – Wireless models (4.8%)
  Trial #2 – Used speakerphone models, faster processor (3.3%)
  Optimized on Trial #2 data (2.1%)
  What we promised the customer (6%)
  Our predicted accuracy (3%)
  What the customer wanted (1.5%)
  Initial simulations – wireless data with landline models (8.3%)
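
The per-digit error rates on this slide compound over a full phone number. As a rough illustration only, the sketch below converts a per-digit error rate into the chance that a whole 10-digit string is recognized correctly. It assumes digit errors are independent, which the slide does not claim; the function name and the choice of 10 digits are illustrative, not taken from the talk.

```python
def string_accuracy(per_digit_error: float, num_digits: int = 10) -> float:
    """Probability that every digit in a string is recognized correctly.

    Assumes each digit fails independently with the same probability,
    a simplification that real connected-digit recognizers do not obey exactly.
    """
    if not 0.0 <= per_digit_error <= 1.0:
        raise ValueError("per_digit_error must be between 0 and 1")
    if num_digits < 0:
        raise ValueError("num_digits must be non-negative")
    return (1.0 - per_digit_error) ** num_digits


if __name__ == "__main__":
    # Per-digit error rates quoted on the slide, from worst to best.
    for label, rate in [
        ("Initial simulations", 0.083),
        ("Trial #1", 0.048),
        ("Trial #2", 0.033),
        ("Optimized on Trial #2 data", 0.021),
        ("What the customer wanted", 0.015),
    ]:
        print(f"{label}: {string_accuracy(rate):.1%} of 10-digit strings correct")
```

Under this assumption, even a small per-digit error rate leaves a noticeable share of full numbers with at least one wrong digit, which may help explain why the customer's target sat below what the trials achieved.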

21 Multilingual digit dialing - 1992. Low international touch-tone penetration; International numbers have unknown length; Lacked good multilingual ASR models. Technology: Isolated digit w/beeps; New languages. Lesson learned: Validate the market; Test with live users.

22 Wireless name dialing - 1995. SBC trial: 48 participants from SBC; 4300 files; Assumed cellular calls. Accuracy: Correct 89.5%; Errors 4.0%; Rejection 6.5%. Technology: Noise preprocessing; Wireless subword models; OOV rejection ("Dial tone"). "Calling home" "Call home"

23 Network service: connected digits - 1996. Corporate phone calling cards; Credit card help desk; 300M calls/year. Technology: Discriminative training; Head-body-tail models; Reliable connected digit; Early decision. "Please enter your card number" 108-081-8214-8748. "Please say the number you would like to call" 630-713-5283.

24 Early Decision. Example grammar: (Make | Cancel) my (plane | rental car | hotel) reservations for (Madrid | Chicago | Tokyo). [Diagram labels: Energy Contour; Start Point; End Point; Declare Answer]

25 ATM speaker verification - 1993. Technology: Two levels of security: PIN and SV; Random strings prevent eavesdropping; Cohort normalization. Deployment impediments: Cost: $2000 per ATM; Accuracy less than 100%; Enrollment; Shy customers; Germs. "Pick up the phone and say the following digit string: 3594." "3594"
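
The slide names cohort normalization without saying which variant was used. As a minimal sketch of one common form, the score for the claimed speaker's model is compared against the average score of a set of similar-sounding "cohort" speakers, and the caller is accepted only if the difference clears a threshold. The function names, the use of the mean, and the threshold value are all assumptions for illustration, not details from the talk.

```python
from statistics import mean
from typing import Sequence


def cohort_normalized_score(claimed_loglik: float,
                            cohort_logliks: Sequence[float]) -> float:
    """Log-likelihood of the claimed speaker minus the mean cohort log-likelihood.

    Subtracting the cohort average reduces the effect of channel and
    utterance conditions that raise or lower every model's score alike.
    """
    if not cohort_logliks:
        raise ValueError("at least one cohort score is required")
    return claimed_loglik - mean(cohort_logliks)


def verify(claimed_loglik: float,
           cohort_logliks: Sequence[float],
           threshold: float = 0.0) -> bool:
    """Accept the identity claim when the normalized score reaches the threshold.

    The default threshold is a placeholder; a deployed system would tune it
    to trade false accepts against false rejects.
    """
    return cohort_normalized_score(claimed_loglik, cohort_logliks) >= threshold


if __name__ == "__main__":
    # Hypothetical scores for a spoken random digit string such as "3594".
    print(verify(-10.0, [-14.0, -16.0], threshold=3.0))
    print(verify(-20.0, [-14.0, -16.0], threshold=0.0))
```

Other variants replace the mean with the best cohort score; the slide gives no basis for preferring one over the other.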

26 WILDFIRE - 1994. Similar services: Webley; Mandi (SpeechPhone); Lucy (Lucent). Technology: Internet portals; Movie Locator; Weather Line; Messages; Shopping; VoiceXML (online news); Voice E-mail; Business Directory; Voice Dialing.

27 MovieFone - 1995 & 1999. Business: 777-FILM was well-known; ASR replaced touch-tone service; Dial first 3 letters of movie name; 80 million calls per year. Technology: Complex grammars (not SLM); Beam search; Barge-in with strong echoes. "Hello and welcome to MovieFone..."

28 Movie Locator demo
"What science fiction movies are playing?"
"Near what city?"
"Wheaton."
"Near Wheaton, Starship Troopers is playing at the Ogden 6 theater."
"What time is it showing?"
"At the Ogden 6 theater, Starship Troopers is showing at 7:30."

29 Menu-based vs. natural language [Chart: how often subjects used each option (never, sometimes, often, always) for Newspaper, Phone the Theater, MovieFone (DTMF), MovieLocator (NL), and Menu-based ASR; bar values not recoverable] Total = 22 subjects. Trained participants used NL version for months after trial. Untrained participants liked menus better.

30 USAA natural language call router - 1998 [Diagram labels: Caller; Network; NL Call Router; 40 departments] "How may I direct your call?" "I'm going to refinance my house."

31 USAA natural language call routing - 1998. 75% of callers state problem, not destination; System asks follow-on questions; Recognizer tested more accurate than live agents; Expensive development. Technology: SLM; Statistical parsing; Data-based dialog. "How may I direct your call?" "What's your fax number?"

32 Demonstration: overview. Source: Jason Williams, AT&T Labs

33 Video Games - 2003. Uses headset designed for player communication; Players communicate with virtual team members. Technology: Isolated word; Very low cost hardware; Noise resistance; Fast response; Developer API.

34 Voice commands: Change weapons; Manipulate objects; Fire weapon; Change location. Commands to virtual players: Movement and positioning; Advance & attack; Restrain enemy combatants; Open door (controls, kick, blast).

35 Context-Sensitive Menu (Opening Doors)

36 Acknowledgement: "Yes, sir!"

37 Video clip – SWAT: Global Strike Team (2003)

38 SpinVox: Voice message conversion. Founded in 2003; Speech recognition in Cambridge, U.K.; VMCS live-learning ensures high quality; Converting 60 million messages a month; ASR trained on two billion word corpus; 18 patents + 71 pending. [Diagram labels: Voicemail server; Convert to text] Sample message: "From: Mr Smith. Hope you're still on for lunch. It's been too long. See you soon - via SpinVox. Listen to me: 1234. Speak SMS to me: 1567"

39 Speech-to-Text Markets and Applications [Column headings: Business Automation; Consumer Feature]
  Share / Social Networking: Most connected! Always online; Monetize chatter; Voice, ads…
  UC & CC / Enterprise: Professional; In control, time saved; Productivity $; Mobilize workforce; Workflow automation; Cost-reduction
  Messaging (SMS, eMail, IM…): Cool, fast, anywhere; Speak or type messages; Messaging $ drive; New voice $
  Connect: Satisfaction; Connect first time, every time with Mail and Messenger; Monetize dead air; Missed calls/call continuity
  Web 2.0 / Audio/Video Indexing: 100% search; Audio & video included; New Ad-Search $
  Compliance / Financial/Legal Audio indexing: Professional; Automatically compliant; Assured; Closing compliance loop-holes
  Government / Audio indexing: Better protection; Scalable security; Voice black-hole closed
  Multi-play 3x/4x / Cable Cos: Family networked; Always in-touch; Monetize home phone; Brand on every screen

40 Extrapolating the past into the future. Technology drivers: Short term: Look at current technological gaps; Long term: Assume machines can do anything humans can. Market drivers: Ignore hardware and software costs; Focus on tasks people already do (not what they could do); Estimate time savings for useful tasks; Estimate entertainment value for non-useful tasks; Assume unlimited data is available anywhere.

41 AT&T's vision of the future - 1993

42 What are we still missing? Match human accuracy; Intelligently filter out noise; Natural error handling; Understand the semantic meaning; Applications with real world knowledge; Reasonable cost to build advanced dialogs.

43 Task sensitivity in speech recognition [Chart; values shown: 40%, 20%]

44 The Future: Viability of agents+ASR presents new service opportunities; Bootstrap human-assisted services to improve ASR; ASR capability may out-pace dialog technology; AI systems learning from the Internet (world knowledge); Machines that people want to talk to.

45 Helping agents and learning from them [Diagram labels: Audio input; ASR; NLP; Confidence check; Fully automated; Auto-assist; Language Query Tool; Query; Options list; Agent input; Validate; Conversion Software; Training data; Metadata; Text out]
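
The diagram labels on this slide suggest a flow in which a confidence check decides whether a message is converted fully automatically or passed to a human agent, with agent input fed back as training data. The sketch below is one hypothetical reading of that flow, not SpinVox's implementation: the class name, the threshold value, and the callback signature are all invented for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass
class AssistedConverter:
    """Routes each ASR hypothesis by confidence (hypothetical design)."""

    threshold: float = 0.85  # assumed cut-off; the slide gives no value
    training_data: List[Tuple[str, str]] = field(default_factory=list)

    def convert(self,
                audio_id: str,
                hypothesis: str,
                confidence: float,
                ask_agent: Callable[[str, str], str]) -> str:
        """Return the final text for one message.

        High-confidence hypotheses go out unchanged (fully automated path).
        Low-confidence ones are shown to an agent, and the agent's text is
        stored so the recognizer can later be retrained on it.
        """
        if confidence >= self.threshold:
            return hypothesis
        corrected = ask_agent(audio_id, hypothesis)
        self.training_data.append((audio_id, corrected))
        return corrected


if __name__ == "__main__":
    converter = AssistedConverter(threshold=0.8)
    # Stand-in for a human agent who fixes the draft text.
    agent = lambda audio_id, draft: "see you soon"
    print(converter.convert("msg-1", "see you soon", 0.95, agent))
    print(converter.convert("msg-2", "sea you son", 0.40, agent))
    print(converter.training_data)
```

Keeping only agent-corrected messages as training data is a design choice made here for brevity; a real system might also sample confident outputs for validation.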

46 SpinVox global footprint. 2009 Projected Users: 125 Million; 20 R&D Specialists in Cambridge, UK-based Advanced Speech Group (ASG); Six Languages; 5 Continents; 700+ SpinVox Create API registrations.

47 Confidential © SpinVox Limited 2009. All rights reserved. SpinVox® and all related trademarks and logos are proprietary to SpinVox and the property of SpinVox Limited.
