The Case for Embedded Speech Recognition
Jordan Cohen, CTO, Voice Signal Technologies
VOX 2002
Voice Signal Technologies
VST develops efficient speech-centric multimodal interfaces for embedded devices.
– Focus on Mobile Communications
– Emphasis on performance while minimizing power, footprint, and computation
What VST is about
Our Vision: Every handheld device of the future will have an intuitive, speech-centric, multimodal interface.
Our Mission: Harness speech and multimodal technology to make mobile communications devices intuitive and usable.
The potential in Telecommunications?
Handset sales are stagnant (Yankee Group)
– 2000: 420M handsets sold
– 2001: 395.8M handsets sold
– 2002: 435M handsets to be sold
– 2010: 800M handsets
What drives the business?
– The Network?
– Information Services?
– Location Services?
– Entertainment?
– Messaging?
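To put the handset figures above in perspective, the following minimal sketch computes the year-over-year change and the compound annual growth rate implied by the 2002 and 2010 projections. The numbers are the ones quoted on the slide; the function names are illustrative and not part of the talk.

```python
# Handset unit sales in millions, as quoted on the slide (Yankee Group).
# The 2002 and 2010 values are projections, not actuals.
sales = {2000: 420.0, 2001: 395.8, 2002: 435.0, 2010: 800.0}


def pct_change(old, new):
    """Percentage change from old to new."""
    return (new - old) / old * 100.0


def cagr(old, new, years):
    """Compound annual growth rate, in percent, over the given number of years."""
    return ((new / old) ** (1.0 / years) - 1.0) * 100.0


yoy_2001 = pct_change(sales[2000], sales[2001])   # decline from 2000 to 2001
yoy_2002 = pct_change(sales[2001], sales[2002])   # projected rebound in 2002
growth_2002_2010 = cagr(sales[2002], sales[2010], 2010 - 2002)

print(f"2000 to 2001: {yoy_2001:+.1f}%")
print(f"2001 to 2002: {yoy_2002:+.1f}%")
print(f"2002 to 2010 (per year): {growth_2002_2010:+.1f}%")
```

On these figures, sales fell by a little under 6% in 2001, and the 2010 projection implies roughly 8% growth per year from 2002.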
Serving Customers
– Figure out what they want and what they need
– Decide what can be delivered (at the right cost)
– Find the most efficient implementation
[Diagram labels: User, Service provider, Technologist, Deliver]
Features Requested in New Phones
Source: “Wireless Data Services Attract Strong Interest Among Subscribers Upgrading to New Phones,” Telephia/Harris Interactive Study, 8 July 2002
Embedded Voice Enhances Applications
Voice Dialing
Current Network Implementation
– Speaker Independent
– Dial-by-Name
– Dial by number
With Embedded Speech Interface
– “Name Lookup” rather than “Voice Dialing”
– Dial by name or number
– Launch SMS application with address filled in
  – Enter message using buttons or speech-to-text
– Enter a name in a calendar application
– Lookup by first name, last name, employer
  – Display choices using text or voice
  – Choose using voice or buttons
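The “Name Lookup” idea above can be sketched as a search over several contact fields, with the chosen result handed to another application such as SMS. This is an illustration only: the Contact type, the sample phonebook, and the lookup and sms_draft functions are hypothetical, and the recognized word is assumed to arrive as plain text from the on-device recognizer.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Contact:
    first: str
    last: str
    employer: str
    number: str


# Hypothetical on-device phonebook used only for this sketch.
PHONEBOOK = [
    Contact("Ann", "Lee", "Acme", "555-0100"),
    Contact("Bob", "Stone", "Acme", "555-0101"),
    Contact("Lee", "Park", "Initech", "555-0102"),
]


def lookup(contacts, spoken):
    """Return contacts whose first name, last name, or employer equals the recognized word."""
    key = spoken.strip().lower()
    return [
        c for c in contacts
        if key in (c.first.lower(), c.last.lower(), c.employer.lower())
    ]


def sms_draft(contact):
    """Start an SMS with the address filled in; the body is entered later by buttons or speech."""
    return {"to": contact.number, "body": ""}


# Several matches are shown to the user, who picks one by voice or buttons.
choices = lookup(PHONEBOOK, "Lee")
for index, contact in enumerate(choices, start=1):
    print(index, contact.first, contact.last, contact.employer)

draft = sms_draft(choices[0])
```

Because the lookup result is ordinary data on the handset, the same match can feed dialing, an SMS address field, or a calendar entry.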
Embedded Speech Services
The embedded speech recognizer is part of the handset software infrastructure
– Results are available as actions or text
The handset knows its state
– Language
– Software
– Network Status
– (User)
The handset is multi-communicative
– Screen
– Buttons
– Audio
– Buzz/vibrate
What Speech Technologies are Available
We can deliver multimodal interfaces using three speech capabilities on the device
1. Digit Recognition: PIM, Dialing, Internet, Appointments
2. Choose from a list: Phone management, PIM, Dialing (Lookup), e-mail, Appointments/meetings, PDA
3. Speech-to-text: PIM, SMS, e-mail, Internet, Appointments/meetings, PDA
Wins for the Carrier
Stickiness
Small Support Costs
Efficiency
– Real Time Services require high bandwidth support and seamless coverage
– Local Data and Messaging Services require low bandwidth and occasional coverage
SMS = Secure Money Stream
Roaming
In the Short and Intermediate Term
Very competent digit recognition and name dialing (Sprint Voice Command)
– Voice interactions have voice response
– Network connectivity required
– Unimodal
Multimodal Solutions use GPRS or UMTS
– Carriers must install infrastructure
– Multimodal capabilities require simultaneous voice and data
2.5G Network build out not before 2007
– Bear Stearns Equity Research, July 2002, “Overhang Still Outweighing Potential Catalyst”
– Tower and cell building reduced from 105,000 to 66,000 before 2007
– Not enough infrastructure for national 2.5G network
Today’s Server-Based Speech Application
– Store your phonebook
– Dial your connection (but not your phone)
  – By number
  – By name (Speaker Independent)
– (If you are connected to the network)
Today’s Embedded Solutions
– Dial your phone
– Navigate and operate your phone by voice
– Look up your PIM information for use in dialing or as an address for SMS
– Access the internet using HTTP or other text-based protocols
– Manage your local data
– Roam across networks without losing functionality
– Save your battery for use in voice calls
– Create SMS or e-mail messages using speech-to-text
The Global Market
[Chart; units: billions]
Source: “Global Wireless Device Market,” Strategy Analytics, October 1, 2001
Speech and Embedded Technology
If you heard it in this talk, you can see it in our laboratory today on a phone.
If you wait only a few months, you will be able to buy it in your local phone store.
If you don’t know that it is coming, you haven’t talked with us!
The End
Embed It!
Jordan R. Cohen, CTO, Voice Signal Technologies
firstname.lastname@example.org