Media Manager Mail Access Unified Messaging Barbara Hohlt UC Berkeley Ericsson Presentation August 22, 2000.

1 Media Manager Mail Access Unified Messaging Barbara Hohlt UC Berkeley Ericsson Presentation August 22, 2000

2 Messages from many sources
Diagram labels: PSTN Phone, Cell-Phone, Desktop, Pager, MediaManager Mail Access

3 Project Overview
Make messages more accessible
–Get all types of messages
–Access from different devices with different capabilities
–Enable faster browsing of many voicemails
Media Mail services
–A unified messaging infrastructure
–Voicemail is email encoded in MIME
Transcoding services
–Enhance voicemail interaction
–Includes: skimmed audio, transcript, text/audio summary, and outline

4 Related Work
Universal Inboxes/Unified Messaging
–onebox.com
–CoolMail.net
–Lucent/Octel Unified Messenger
–Stanford Mobile People Architecture
Audio Content Extraction Techniques
–SpeechSkimmer, MIT’s MultiMedia Lab [Arons95]
–Auto-Summarization, Microsoft Research
–CueVideo, IBM

5 Architecture
Transcoder Service
–Voicemail -> Text Transcript
–Voicemail -> Text Summary
–Voicemail -> Text Outline
–Email -> Plain Audio
–Email -> GSM Audio
–Voicemail -> GSM Summary
–Voicemail -> Audio Summary
–Voicemail -> Skimmed Audio
Mail Access Interface: NinjaMail
Mail Access Interface: POP
Mail Access Interface: IMAP
Client
Folder Store
Media Manager Interface
Media Manager Service

6 Applications
Conventional GUIs
Context-Aware Applications
Iceberg Universal Inbox Component
Diagram labels: Desktop, MediaManager Mail Access
A conventional desktop GUI can contact the Media Manager directly and request messages as text. The Media Manager returns both emails and voicemails as text.

7 Context-Aware Application
Diagram labels: Palm Device, Desktop, Redirection Proxy, MediaManager Mail Access
1. The Palm device asks for a list of messages as text and selects a voicemail.
2. The Palm device requests a redirection from the proxy, which forwards the redirection request to the desktop.
3. The desktop asks for the voicemail and plays it.

8 Iceberg Universal Inbox
Diagram labels: Bhaskar’s Cell-Phone; Barbara’s PSTN Phone; Automatic Path Creation Service; 800-MEDIA-MGR; UID: mediamgr@cs.berkeley.edu; Naming Service; Preference Registry; mediamgr: Cluster locn.; Universal Inbox; MediaManager Mail Access. The diagram numbers three steps (1, 2, 3) without captions.

10 MediaManagerServiceIF
getFolders( ) and getFoldersAs( )
–Given a username, getFolders( ) returns a list of folder names
–getFoldersAs( ) returns the list as audio or GSM
getList( ) and getListAs( )
–Given a username, foldername, and count
–getList( ) returns a list of messages (sendername, title, date)
–getListAs( ) returns the list as audio or GSM
getMessage( )
–Given a Message Ref, returns the entire message
getMessageContent( )
–Given a Content ID and return type
–Returns one part of the message as the return type

11 Messages and Content Objects
Media Message
–Media Reference id
–Array of Content Objects
Content Object
–Content ID
–Data
Content ID
–Media Reference id
–Content Part index
–Content Type
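
The three structures on this slide can be written down as plain Java classes. This is only a sketch: the slide names the fields but not their types, so the `String`, `int`, and `byte[]` choices, the field names, and the `MediaMessage.of` helper are all assumptions made for illustration.

```java
/** Identifies one part of one message. */
final class ContentID {
    final String mediaRefId;   // reference id of the owning message
    final int partIndex;       // position of the part within the message
    final String contentType;  // e.g. "audio/wav"; the value format is assumed

    ContentID(String mediaRefId, int partIndex, String contentType) {
        this.mediaRefId = mediaRefId;
        this.partIndex = partIndex;
        this.contentType = contentType;
    }
}

/** One part of a message: its identifier plus the raw bytes. */
final class ContentObject {
    final ContentID id;
    final byte[] data;

    ContentObject(ContentID id, byte[] data) {
        this.id = id;
        this.data = data;
    }
}

/** A whole message: a reference id and an array of parts. */
final class MediaMessage {
    final String mediaRefId;
    final ContentObject[] contents;

    MediaMessage(String mediaRefId, ContentObject[] contents) {
        this.mediaRefId = mediaRefId;
        this.contents = contents;
    }

    /** Hypothetical helper: builds a message whose parts carry ids that point back to it. */
    static MediaMessage of(String mediaRefId, String[] types, byte[][] payloads) {
        if (types.length != payloads.length) {
            throw new IllegalArgumentException("one type per payload");
        }
        ContentObject[] parts = new ContentObject[types.length];
        for (int i = 0; i < types.length; i++) {
            parts[i] = new ContentObject(new ContentID(mediaRefId, i, types[i]), payloads[i]);
        }
        return new MediaMessage(mediaRefId, parts);
    }
}
```

Because a Content ID repeats the Media Reference id and adds a part index, a client that holds only a Content ID has enough information to ask for that one part again later.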

12 Interface Example
Diagram labels: Cell-Phone, MediaManager Mail Access, Media Message, Header, Content Object, Content ID
–User asks for a list of messages as GSM
–Media Manager returns a list of message headers
–Cell phone sends a Content ID back
–Media Manager sends a voicemail Content Object

13 Audio Tools
Speech Recognition/Synthesis
–Transcribe voicemail to text
–IBM ViaVoice SDK and custom audio libs
Natural Language Processing
–Directed word spotting by “understanding” content
–ViaVoice SRCL
Pitch
–Detecting important words by emphasized pitch
Pause
–Compression through pause removal
Spurts
–Retrieve sentence structure of voicemail

14 Transcoding Techniques
Voice Mail -> Text Transcript: Speech recognition
Voice Mail -> Text Summary: NLP, pitch detection and recognition
Voice Mail -> Text Outline: Pause detection and speech recognition
E Mail -> Plain Audio: Speech synthesis
E Mail -> GSM Audio: Speech synthesis and toast
Voice Mail -> Skimmed Audio: Pause detection
Voice Mail -> Audio Summary: Text summary and speech synthesis
Voice Mail -> GSM Summary: Audio summary and toast
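
Several rows of this table build on other rows: the GSM Summary needs the Audio Summary, which in turn needs the Text Summary. The sketch below encodes the table as a map and expands an output into the basic techniques it depends on. The class and method names are hypothetical, "recognition" in the Text Summary row is read as speech recognition, and `toast` is presumably the GSM 06.10 command-line encoder of that name.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

final class TranscodingPlan {
    // One entry per table row. A step that names another output
    // (for example "Text Summary") is expanded recursively.
    private static final Map<String, List<String>> STEPS = new LinkedHashMap<>();
    static {
        STEPS.put("Text Transcript", Arrays.asList("speech recognition"));
        STEPS.put("Text Summary", Arrays.asList("NLP", "pitch detection", "speech recognition"));
        STEPS.put("Text Outline", Arrays.asList("pause detection", "speech recognition"));
        STEPS.put("Plain Audio", Arrays.asList("speech synthesis"));
        STEPS.put("GSM Audio", Arrays.asList("speech synthesis", "toast"));
        STEPS.put("Skimmed Audio", Arrays.asList("pause detection"));
        STEPS.put("Audio Summary", Arrays.asList("Text Summary", "speech synthesis"));
        STEPS.put("GSM Summary", Arrays.asList("Audio Summary", "toast"));
    }

    /** Returns the basic techniques needed for an output, in first-use order, without repeats. */
    static List<String> primitives(String output) {
        Set<String> result = new LinkedHashSet<>();
        expand(output, result);
        return new ArrayList<>(result);
    }

    private static void expand(String output, Set<String> result) {
        List<String> steps = STEPS.get(output);
        if (steps == null) {
            throw new IllegalArgumentException("unknown output: " + output);
        }
        for (String step : steps) {
            if (STEPS.containsKey(step)) {
                expand(step, result);   // the step is itself a table row
            } else {
                result.add(step);       // the step is a basic technique
            }
        }
    }
}
```

Under these assumptions, `TranscodingPlan.primitives("GSM Summary")` yields NLP, pitch detection, speech recognition, speech synthesis, and toast, in that order.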

15 Examples
Original Voicemail: “Hello, This is Barbara. How are you and the cats doing? I was wondering if you would feed them a little more the first time in case they eat too much. My number is (713) 465-5155. You can call me anytime. Have a very good holiday. Bye bye”
Processed Voicemail
–Translated talk spurts (pitch-emphasized words shown in green on the slide): Phyllis Barbara Area in the cat staring And then if you run but feed them A little more the first time in case they eat too much On my number is (713) 465-5155 You can call me anytime. Have every holiday Of light
–(Skimmed) (Just pitch)
–Translated using NLP: Hello this is Barbara My number is (713) 465-5155

16 Examples continued...
Original Voicemail: “Faced with a seemingly inevitable engineering task authors tend to adopt one of two strategies for adding new services to the Internet landscape: inflexible, highly tuned, hand-constructed services….”
Processed Voicemail
–Translated talk spurts (pitch-emphasized words shown in green on the slide)
–(Skimmed) (Just pitch)
–Translated using NLP: Faced with a seemingly inevitable engineering task authors tend to adopt what it to strategies for adding new services to the internet landscape. Inflexible, highly Tate, had constructed services….

17 Results
Pause detection
–Worked well for given applications
–Playback speedup by 50-70%
Pitch detection
–Problems due to high pitch sounds and transitions
Speech recognition
–Performance decrease in conversational settings
Natural Language Processing
–Performed well with small grammar

18 Example: Adding GSM Access
Define specific types, i.e. GSMAudio, GSMSummary
Optionally create new Content Objects
Add Content Object definition to MediaManager
Add GSM transcoder to TranscoderService

19 Detail: Adding GSM Access
Add Content Object definition to MediaManager
–Define GSMAUDIO and GSMSUMMARY
–Add cases to createObject() in Content Object
–Add cases to Media Manager
Add GSM to Transcoder
–Add method toGSM() to Transcoder
–Edit .config file External.transcoder.gsmrungsm
–Edit related transcoders speechSynthesizer and audioSummary()
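
The first group of steps, defining the two types and adding cases to `createObject()`, might look roughly like the sketch below. Only the names `GSMAUDIO`, `GSMSUMMARY`, and `createObject()` come from the slide. The `ContentType` enum, the `TEXT` and `AUDIO` members, and the length check are assumptions; the check relies on the GSM 06.10 reference library packing each 20 ms of speech into a 33-byte frame.

```java
/** Content types; GSMAUDIO and GSMSUMMARY are the two new ones, the others are assumed. */
enum ContentType { TEXT, AUDIO, GSMAUDIO, GSMSUMMARY }

final class ContentObject {
    static final int GSM_FRAME_BYTES = 33; // one packed GSM 06.10 frame (20 ms of speech)

    final ContentType type;
    final byte[] data;

    private ContentObject(ContentType type, byte[] data) {
        this.type = type;
        this.data = data;
    }

    /** Factory with one case per type; supporting a new type means adding a case here. */
    static ContentObject createObject(ContentType type, byte[] data) {
        switch (type) {
            case TEXT:
            case AUDIO:
                return new ContentObject(type, data);
            case GSMAUDIO:
            case GSMSUMMARY:
                // Illustrative sanity check: a GSM payload should be whole frames.
                if (data.length % GSM_FRAME_BYTES != 0) {
                    throw new IllegalArgumentException("GSM data must be whole 33-byte frames");
                }
                return new ContentObject(type, data);
            default:
                throw new IllegalArgumentException("unsupported type: " + type);
        }
    }
}
```

The transcoder side (`toGSM()` and the .config entry) is left out because the slide does not give enough detail to sketch it without guessing.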

20 Implementing Other Mail Stores
Examples: IMAP, POP, Microsoft Exchange Server
Implement MailAccessIF
–String [] getMAFolders( userName )
–MediaMessage [] getMAList( userName, folderName, count )
–MediaMessage getMAMessage( MediaRef )
–ContentObject getMAMessageContent( ContentID )
Add new protocol to Media Manager protocol table
Optionally add protocol for users to FolderStore
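
As a rough illustration of these steps, the sketch below declares `MailAccessIF` with the four methods listed, adds a stub store, and registers it in a protocol table. The parameter types, the empty placeholder classes, `StubImapAccess`, and the whole `ProtocolTable` class are assumptions; the slides do not show how the real protocol table or FolderStore is implemented.

```java
import java.util.HashMap;
import java.util.Map;

// Opaque placeholders for the message types named in the signatures.
final class MediaRef {}
final class ContentID {}
final class ContentObject {}
final class MediaMessage {}

/** The four methods from the slide; String and int parameter types are assumed. */
interface MailAccessIF {
    String[] getMAFolders(String userName);
    MediaMessage[] getMAList(String userName, String folderName, int count);
    MediaMessage getMAMessage(MediaRef ref);
    ContentObject getMAMessageContent(ContentID id);
}

/** Stub store with one empty folder, standing in for a real IMAP client. */
final class StubImapAccess implements MailAccessIF {
    public String[] getMAFolders(String userName) { return new String[] {"INBOX"}; }
    public MediaMessage[] getMAList(String userName, String folderName, int count) {
        return new MediaMessage[0];
    }
    public MediaMessage getMAMessage(MediaRef ref) { return null; }
    public ContentObject getMAMessageContent(ContentID id) { return null; }
}

/** Hypothetical protocol table plus a per-user protocol lookup. */
final class ProtocolTable {
    private final Map<String, MailAccessIF> stores = new HashMap<>();
    // Stands in for the per-user protocol kept in FolderStore.
    private final Map<String, String> protocolOfUser = new HashMap<>();

    void register(String protocol, MailAccessIF store) { stores.put(protocol, store); }
    void assign(String userName, String protocol) { protocolOfUser.put(userName, protocol); }

    /** Looks up the store for a user via the user's assigned protocol. */
    MailAccessIF storeFor(String userName) {
        MailAccessIF store = stores.get(protocolOfUser.get(userName));
        if (store == null) {
            throw new IllegalStateException("no mail store for " + userName);
        }
        return store;
    }
}
```

With this shape, supporting a new store is a matter of writing one `MailAccessIF` implementation and calling `register` once; the rest of the service keeps calling `storeFor(userName)`.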

21 Conclusion
Overall
–System useful as navigational hints
–To achieve total comprehension, need better voice recognition
What works well
–Skimming using pause removal
–Detecting spurts for structure
What needs work
–Speech detection in conversational settings
–Pitch emphasis needs refining
Future Directions
–Implementing more mail stores
–Enhancing interfaces
–Pause detection/word boundaries using speech detection
–Developing voicemail grammars
–Using NLP feedback with pitch emphasis detection
–Improved speech detection in noisy environments

23 MediaManagerServiceIF
String[] getFolders( userName )
byte[][] getFoldersAs( userName, returnType )
MediaMessage [] getList( userName, folderName, count )
byte[][] getListAs( userName, folderName, count, returnType )
MediaMessage getMessage( MediaRef )
ContentObject getMessageContent( ContentID, returnType )
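
A minimal sketch of how these six signatures could be declared and exercised follows. The return types and method names are taken from the slide; the parameter types, the fields of the stand-in message classes, and the `InMemoryMediaManager` toy implementation are assumptions. The toy does no transcoding: the `...As` methods return UTF-8 bytes as a placeholder for synthesized audio.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Minimal stand-ins for the message types; their fields are assumptions.
final class MediaRef { final String id; MediaRef(String id) { this.id = id; } }
final class ContentID {
    final String mediaRefId; final int partIndex;
    ContentID(String mediaRefId, int partIndex) { this.mediaRefId = mediaRefId; this.partIndex = partIndex; }
}
final class ContentObject {
    final ContentID id; final String type; final byte[] data;
    ContentObject(ContentID id, String type, byte[] data) { this.id = id; this.type = type; this.data = data; }
}
final class MediaMessage {
    final MediaRef ref; final String title; final ContentObject[] contents;
    MediaMessage(MediaRef ref, String title, ContentObject[] contents) { this.ref = ref; this.title = title; this.contents = contents; }
}

/** The six methods from the slide; String and int parameter types are assumed. */
interface MediaManagerServiceIF {
    String[] getFolders(String userName);
    byte[][] getFoldersAs(String userName, String returnType);
    MediaMessage[] getList(String userName, String folderName, int count);
    byte[][] getListAs(String userName, String folderName, int count, String returnType);
    MediaMessage getMessage(MediaRef ref);
    ContentObject getMessageContent(ContentID id, String returnType);
}

/** Toy implementation backed by a map; a real service would call mail stores and transcoders. */
final class InMemoryMediaManager implements MediaManagerServiceIF {
    // Key is "user/folder"; TreeMap keeps folder listings in a stable order.
    private final Map<String, List<MediaMessage>> folders = new TreeMap<>();

    void deliver(String userName, String folderName, MediaMessage message) {
        folders.computeIfAbsent(userName + "/" + folderName, k -> new ArrayList<>()).add(message);
    }
    public String[] getFolders(String userName) {
        List<String> names = new ArrayList<>();
        for (String key : folders.keySet()) {
            if (key.startsWith(userName + "/")) names.add(key.substring(userName.length() + 1));
        }
        return names.toArray(new String[0]);
    }
    public byte[][] getFoldersAs(String userName, String returnType) {
        return encode(getFolders(userName));
    }
    public MediaMessage[] getList(String userName, String folderName, int count) {
        List<MediaMessage> all = folders.getOrDefault(userName + "/" + folderName, new ArrayList<>());
        int n = Math.max(0, Math.min(count, all.size()));
        return all.subList(0, n).toArray(new MediaMessage[0]);
    }
    public byte[][] getListAs(String userName, String folderName, int count, String returnType) {
        MediaMessage[] list = getList(userName, folderName, count);
        String[] titles = new String[list.length];
        for (int i = 0; i < list.length; i++) titles[i] = list[i].title;
        return encode(titles);
    }
    public MediaMessage getMessage(MediaRef ref) {
        for (List<MediaMessage> messages : folders.values()) {
            for (MediaMessage m : messages) {
                if (m.ref.id.equals(ref.id)) return m;
            }
        }
        return null; // unknown reference
    }
    public ContentObject getMessageContent(ContentID id, String returnType) {
        MediaMessage m = getMessage(new MediaRef(id.mediaRefId));
        if (m == null || id.partIndex < 0 || id.partIndex >= m.contents.length) return null;
        ContentObject part = m.contents[id.partIndex];
        if (!part.type.equals(returnType)) {
            // A real service would hand the part to the Transcoder Service here.
            throw new UnsupportedOperationException(part.type + " -> " + returnType);
        }
        return part;
    }
    // Placeholder for speech synthesis: one UTF-8 byte array per entry.
    private static byte[][] encode(String[] entries) {
        byte[][] out = new byte[entries.length][];
        for (int i = 0; i < entries.length; i++) out[i] = entries[i].getBytes(StandardCharsets.UTF_8);
        return out;
    }
}
```

A client would typically call `getList` first, pick a message, and then pass one of its Content IDs to `getMessageContent` with the type its device can play.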

24 Pitch Detection
The Idea
–A speaker’s pitch naturally changes when introducing topics or emphasizing words [Hirshberg92]
–Use pitch increases as hints for “important” words
Algorithm [Arons95]
–Determine pitch for each 20 ms frame (FFT with SHS)
–Set emphasis threshold to be top 1% of pitch values (by histogram)
–Mark 1 sec interval as emphasized if it contains >=3 emphasized frames
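
The thresholding and marking steps can be sketched as below. This is not the project's code. It assumes the per-frame pitch values have already been computed (the FFT/SHS step is not shown), it finds the top 1% by sorting rather than by histogram, and it uses non-overlapping one-second windows. The class and method names are hypothetical.

```java
import java.util.Arrays;

final class PitchEmphasis {
    static final int FRAME_MS = 20;
    static final int FRAMES_PER_SECOND = 1000 / FRAME_MS; // 50 frames per window

    /** Smallest pitch value that still belongs to the top 1% of frames (at least one frame). */
    static double emphasisThreshold(double[] pitchHz) {
        if (pitchHz.length == 0) {
            throw new IllegalArgumentException("no frames");
        }
        double[] sorted = pitchHz.clone();
        Arrays.sort(sorted);
        int topCount = Math.max(1, (sorted.length + 99) / 100); // ceil(n / 100)
        return sorted[sorted.length - topCount];
    }

    /** One flag per one-second window: true if it holds at least 3 emphasized frames. */
    static boolean[] emphasizedSeconds(double[] pitchHz) {
        if (pitchHz.length == 0) {
            return new boolean[0];
        }
        double threshold = emphasisThreshold(pitchHz);
        int seconds = (pitchHz.length + FRAMES_PER_SECOND - 1) / FRAMES_PER_SECOND;
        boolean[] emphasized = new boolean[seconds];
        for (int s = 0; s < seconds; s++) {
            int end = Math.min(pitchHz.length, (s + 1) * FRAMES_PER_SECOND);
            int hits = 0;
            for (int i = s * FRAMES_PER_SECOND; i < end; i++) {
                if (pitchHz[i] >= threshold) hits++;
            }
            emphasized[s] = hits >= 3;
        }
        return emphasized;
    }
}
```

One caveat of this simplification: if many frames share the threshold value (for example a recording with nearly constant pitch), more than 1% of frames count as emphasized. A real implementation would also need to decide how unvoiced frames enter the pitch distribution.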

25 Pause Detection
Why is pause detection useful?
–Removing pauses speeds up playback, typically to 50-70% of original time [Foulke71]
–Long pauses signify groups (talk spurts)
Noise and soft sounds create difficulties
Algorithm: Smoothed Histogram [Lamel81]
–Calculate energy per 10 ms frame
–Threshold based on smoothed histogram (5 dB after first peak)
–Use heuristics to remove artifacts
Figure: histogram with axes Average energy (dB) and Percent of Frames
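
A simplified sketch of the smoothed-histogram idea is given below. It is not the algorithm from [Lamel81] nor the project's code. It assumes samples scaled to [-1, 1], 1 dB histogram bins from -100 to 0 dB, a three-bin triangular smoothing kernel, and that the "first peak" is the lowest-energy local maximum (taken here as the background noise level). The artifact-removal heuristics are omitted, and all names are hypothetical.

```java
final class PauseDetector {
    private static final int MIN_DB = -100;  // lowest histogram bin
    private static final int BINS = 101;     // 1 dB bins covering -100..0 dB
    private static final int MARGIN_DB = 5;  // threshold sits 5 dB above the first peak

    /** Mean energy of each 10 ms frame in dB relative to full scale. */
    static double[] frameEnergyDb(double[] samples, int sampleRate) {
        int frameLen = sampleRate / 100; // samples per 10 ms
        int frames = samples.length / frameLen;
        double[] db = new double[frames];
        for (int f = 0; f < frames; f++) {
            double sum = 0;
            for (int i = 0; i < frameLen; i++) {
                double s = samples[f * frameLen + i];
                sum += s * s;
            }
            db[f] = 10 * Math.log10(sum / frameLen + 1e-10); // epsilon avoids log(0)
        }
        return db;
    }

    /** Pause threshold: 5 dB above the first peak of the smoothed energy histogram. */
    static double threshold(double[] db) {
        double[] hist = new double[BINS];
        for (double e : db) {
            int bin = (int) Math.round(e) - MIN_DB;
            hist[Math.max(0, Math.min(BINS - 1, bin))]++;
        }
        double[] smooth = new double[BINS];
        for (int b = 0; b < BINS; b++) {
            double lo = b > 0 ? hist[b - 1] : 0;
            double hi = b < BINS - 1 ? hist[b + 1] : 0;
            smooth[b] = 0.25 * lo + 0.5 * hist[b] + 0.25 * hi;
        }
        for (int b = 0; b < BINS; b++) {
            double left = b > 0 ? smooth[b - 1] : 0;
            double right = b < BINS - 1 ? smooth[b + 1] : 0;
            if (smooth[b] > 0 && smooth[b] >= left && smooth[b] > right) {
                return MIN_DB + b + MARGIN_DB;
            }
        }
        return MIN_DB; // no frames: nothing is classified as a pause
    }

    /** Marks each frame whose energy falls below the threshold as a pause. */
    static boolean[] isPause(double[] db) {
        double t = threshold(db);
        boolean[] pause = new boolean[db.length];
        for (int i = 0; i < db.length; i++) {
            pause[i] = db[i] < t;
        }
        return pause;
    }
}
```

For a recording with background noise around -60 dB and speech around -20 dB, the first peak lands at -60 dB, so the threshold is -55 dB and only the noise frames are marked as pauses. Dropping those frames is what shortens playback.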

28 Works Cited
[Arons95] B. Arons. Interactively Skimming Recorded Speech. Ph.D. dissertation, MIT, 1994.
[Foulke71] E. Foulke. The Perception of Time Compressed Speech. Ch. 4 in Perception of Language, edited by P.M. Kjeldergaard, D.L. Horton, and J.J. Jenkins. Charles E. Merrill Publishing Company, 1971, pp. 79-107.
[Hirshberg92] J. Hirschberg and B. Grosz. Intonational Features of Local and Global Discourse. In Proceedings of the Speech and Natural Language Workshop (Harriman, NY, Feb. 23-26). Morgan Kaufmann Publishers, 1992, pp. 441-446.
[Lamel81] L.F. Lamel, L.R. Rabiner, A.E. Rosenberg, and J.G. Wilpon. An Improved Endpoint Detector for Isolated Word Recognition. IEEE Transactions on Acoustics, Speech, and Signal Processing ASSP-29, 4 (Aug. 1981), 771-785.
