SpeechBuilder: Facilitating Spoken Dialogue System Creation. Eugene Weinstein, Project Oxygen Core Team, MIT Laboratory for Computer Science


Bridging the Experience Gap
Eugene Weinstein – MIT Lab for Computer Science. Oxygen Alliance 2003 Workshop – February 24-28, 2003
Developing robust, mixed-initiative spoken dialogue systems is difficult
  – Complex systems can be created by human-language technology experts
  – Novice developers must overcome a considerable technical challenge
SpeechBuilder aims to help novices rapidly create speech-based systems
  – Uses intuitive methods for specifying domain-specific constraints
  – Automatically configures HLT components using MIT GALAXY architecture
    * Leverages future technical advances
    * Encourages research on portability
[Diagram: SpeechBuilder and the Galaxy Hub, which connects Speech Recognition, Language Processing, Context Resolution, Dialogue Management, Language Generation, Speech Synthesis, Database Server, and Audio]

Baseline Configuration
Gives developer total control over application functionality
Communication with Galaxy via simple HTTP protocol
“Turn on the lights in the kitchen”
  action=set&frame=(object=lights, room=kitchen, value=on)
“Show me the banks on Main Street”
  action=identify&frame=(object=(type=bank, on=(street=Main, ext=Street)))
[Diagram: SpeechBuilder Server with Hub, CGI Parameter Generation, Speech Recognition, Speech Synthesis, Language Processing, and Audio Server, connected over HTTP to the Developer Application]
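The two parameter strings above can be unpacked on the application side with a few lines of Python. This is a minimal sketch, assuming the query arrives as one string shaped exactly like the slide's examples; `parse_frame` and `parse_cgi` are hypothetical helper names, not part of SpeechBuilder.

```python
def parse_frame(text):
    """Parse '(key=value, key=(nested, ...))' into a nested dict.

    Hypothetical helper: the syntax is inferred from the slide's two examples.
    """
    pos = 0

    def skip_ws():
        nonlocal pos
        while pos < len(text) and text[pos].isspace():
            pos += 1

    def parse_group():
        nonlocal pos
        if pos >= len(text) or text[pos] != "(":
            raise ValueError("expected '(' at position %d" % pos)
        pos += 1
        result = {}
        while True:
            skip_ws()
            if pos >= len(text):
                raise ValueError("unbalanced parentheses")
            if text[pos] == ")":
                pos += 1
                return result
            eq = text.index("=", pos)          # key runs up to the next '='
            key = text[pos:eq].strip()
            pos = eq + 1
            skip_ws()
            if pos < len(text) and text[pos] == "(":
                result[key] = parse_group()    # nested concept
            else:
                end = pos
                while end < len(text) and text[end] not in ",)":
                    end += 1
                result[key] = text[pos:end].strip()
                pos = end
            skip_ws()
            if pos < len(text) and text[pos] == ",":
                pos += 1

    skip_ws()
    return parse_group()


def parse_cgi(query):
    """Split 'action=...&frame=(...)' into (action, frame_dict)."""
    params = dict(part.split("=", 1) for part in query.split("&"))
    return params["action"], parse_frame(params["frame"])


action, frame = parse_cgi(
    "action=identify&frame=(object=(type=bank, on=(street=Main, ext=Street)))"
)
print(action, frame)
```

A real deployment would more likely read these values from the web framework's request object; the hand-written parser is only meant to show how the nested `frame` value maps onto a dictionary.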

Modified Baseline Configuration (this class)
Still gives developer total control over application functionality
Frame Relay server exposes Galaxy meaning representation to app
“Turn on the lights in the kitchen”
  {c turn_management :parse_frame {c turn :object “lights” :room “kitchen” :value “on”}}
“Show me the banks on Main Street”
  {c turn_management :parse_frame {c identify :type “bank” :pred {p :on {:street “Main” :ext “Street”}}}}
[Diagram: Frame Relay Server with Hub, CGI Parameter Generation, Speech Recognition, Speech Synthesis, Language Processing, and Audio Server, passing the semantic frame over a TCP socket to the Developer Application]

Database Access Configuration
For a speech-based interface to structured data
No programming required; specify table(s) and constraints
[Diagram: Hub connecting Speech Recognition, Language Processing, Discourse Resolution, Dialogue Management, Language Generation, Speech Synthesis, Database Server (INFO), I/O Server, and Audio Server]

Creating a Speech-Based Application
Step 1: Off-line creation and compilation (upload, compile)
Step 2: On-line deployment (query, response)
[Diagram: SpeechBuilder (SB) configuring a Hub with ASR, NLU, Discourse, Dialog, NLG, TTS, Audio, and the INFO database]

Human Language Technologies
Galaxy programmable hub controls interactions between all components
Speech Recognition
  – Generic acoustic models
  – Unknown word model
  – Class or hierarchical n-gram
Language Processing
  – N-best interface with ASR
  – Grammar from attributes & actions
  – Backs off to concept spotting
Context Resolution
  – New component performs concept inheritance & masking
  – Processes ‘E-form’
Dialogue Management
  – Generic server handles interaction
Language Generation
  – Generates ‘E-form’, SQL, & responses
  – Default entries made
Speech Synthesis
  – Commercial product
Database Server
  – Accesses back-end database
Audio Server
  – Telephone or lightweight audio server

Extracting Database Information
Some columns are used to access entries (e.g., Name)
  – Column entries must be incorporated into ASR & NLU
Some columns are only used in responses (e.g., Phone)
  – Column names must be incorporated into ASR & NLU
Example table: columns Name, Phone, Office; rows Jim, Stephanie, Victor
Example query: “What is the phone number for Victor Zue?”
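The distinction between access columns and response-only columns can be illustrated with a small script. This is a sketch only, assuming an SQLite table; `extract_vocabulary`, `build_demo_db`, the table name `people`, and all phone and office values are hypothetical placeholders, and the `property` grouping simply borrows the name used on the Knowledge Representation slide.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory table with placeholder rows (all values invented)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE people (Name TEXT, Phone TEXT, Office TEXT)")
    conn.executemany(
        "INSERT INTO people VALUES (?, ?, ?)",
        [
            ("Jim", "555-0101", "601"),
            ("Stephanie", "555-0102", "602"),
            ("Victor", "555-0103", "603"),
        ],
    )
    return conn


def extract_vocabulary(conn, table, access_columns, response_columns):
    """Collect the words a recognizer and parser would need to know.

    Access columns contribute their *entries*; response-only columns
    contribute their *names*, grouped under a single 'property' concept.
    Table and column names are trusted here; they are not user input.
    """
    concepts = {}
    for column in access_columns:
        rows = conn.execute(
            f'SELECT DISTINCT "{column}" FROM "{table}"'
        ).fetchall()
        concepts[column.lower()] = sorted(
            str(row[0]) for row in rows if row[0] is not None
        )
    concepts["property"] = [column.lower() for column in response_columns]
    return concepts


conn = build_demo_db()
print(extract_vocabulary(conn, "people", ["Name"], ["Phone", "Office"]))
```

With this split, a query such as “What is the phone number for Victor?” needs `Victor` from the Name entries and `phone` from the column names, while the phone numbers themselves never enter the recognizer vocabulary.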

Knowledge Representation
Concepts and actions form basis for understanding
  – Concepts become key/value entries in meaning representation
    * city: Boston, New York…
    * day: Monday, Tuesday…
  – Actions provide sentence-level patterns of specific queries
    * “I want to fly from Boston to Taipei…” action=lookup_flight
  – Action text can be bracketed to define hierarchical concepts
    * “I want to fly source=(from Boston) destination=(to Taipei)”
    * source=Boston destination=Taipei
  – Concepts and actions used to configure the following components
    * Speech Recognition
    * Natural Language Understanding
    * Discourse
Database columns define basic concepts
  – Column names can be grouped into concepts
    * property: phone, …
    * weather: snow, rain…
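The bracketing notation can be read mechanically: each `name=(phrase)` span names a hierarchical concept, and the flat concept values found inside the phrase fill it. The sketch below illustrates that reading, assuming the notation is exactly as shown on the slide; `expand_action` and `fill_concepts` are hypothetical helpers and do not reflect how SpeechBuilder compiles its grammars.

```python
import re

# Matches one bracketed span such as 'source=(from Boston)'.
BRACKET = re.compile(r"(\w+)=\(([^()]*)\)")


def expand_action(example):
    """Return (plain sentence, {hierarchical concept: phrase})."""
    spans = {name: phrase for name, phrase in BRACKET.findall(example)}
    plain = BRACKET.sub(lambda match: match.group(2), example)
    return plain, spans


def fill_concepts(spans, concept_values):
    """Assign each hierarchical concept the first flat value found in its phrase."""
    filled = {}
    for name, phrase in spans.items():
        for value in concept_values:
            if value.lower() in phrase.lower():
                filled[name] = value
                break
    return filled


plain, spans = expand_action(
    "I want to fly source=(from Boston) destination=(to Taipei)"
)
print(plain)
print(fill_concepts(spans, ["Boston", "New York", "Taipei"]))
```

The plain sentence is what a developer would otherwise have typed as an unbracketed action example, and the filled dictionary corresponds to the slide's `source=Boston destination=Taipei`.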

Language Modeling and Understanding
Concept usage can be fine-tuned to improve performance:
By default, concepts are used for language modeling, parsing grammar, and meaning representation
  – For language modeling and parsing grammar only (i.e., no meaning)
  – For keyword spotting only (i.e., no role in language modeling)
  – For fine-grained language modeling with coarser meaning representation
Example: “Will it snow?” yields weather: snow
Example vocabulary: rain, hail, snow, sprinkles, flurries, showers, breezy, rainy, snowy, snowfall, accumulation, rainfall, snowstorm, thunderstorm, blizzard
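The last option, fine-grained language modeling with a coarser meaning representation, amounts to a many-to-one mapping from surface words to concept values. The following sketch shows the idea, assuming a simple word lookup; the `Concept` class, its flags, and the particular grouping of weather words are illustrative assumptions, since the slide does not say which words map to which value.

```python
import re
from dataclasses import dataclass


@dataclass
class Concept:
    name: str
    words: dict                     # surface word -> coarse meaning value
    in_language_model: bool = True  # words contribute to the recognizer vocabulary
    in_meaning: bool = True         # concept appears in the meaning representation


def lm_vocabulary(concepts):
    """Every surface word the language model should cover."""
    return sorted(
        {word for c in concepts if c.in_language_model for word in c.words}
    )


def meaning(utterance, concepts):
    """Map an utterance to coarse key/value pairs by word lookup."""
    tokens = re.findall(r"[a-z']+", utterance.lower())
    result = {}
    for concept in concepts:
        if not concept.in_meaning:
            continue
        for token in tokens:
            if token in concept.words:
                result[concept.name] = concept.words[token]
                break
    return result


# Assumed grouping, for illustration only.
weather = Concept(
    "weather",
    {
        "snow": "snow", "flurries": "snow", "snowfall": "snow", "blizzard": "snow",
        "rain": "rain", "showers": "rain", "sprinkles": "rain",
    },
)

print(meaning("Will it snow?", [weather]))
print(meaning("Any flurries tonight?", [weather]))
```

Setting `in_meaning=False` corresponds to the "language modeling and parsing grammar only" option: the words stay in the vocabulary but produce no key/value entry.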

Current Status
SpeechBuilder has been operational for over two years
  – Used by over 50 developers from MIT and elsewhere
  – Used in undergraduate classes at MIT and Georgetown University
ASR capabilities benchmarked against main systems
  – Achieves same ASR performance as MIT Jupiter weather information system (6.8% word error rate on clean data)
Several prototype systems have been developed
  – Information about faculty, staff and students at LCS and AI Labs (phone, room, voice messages, transfer, etc.)
  – Application to control the various physical items in a typical office (lights, curtains, TV, VCR, projector, etc.)
  – Others include TV schedules, real-time weather forecasts, hotel and restaurant information, etc.
SpeechBuilder used for initial design of many more complex domains
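For readers unfamiliar with the metric quoted above, word error rate is conventionally the word-level edit distance between the reference transcript and the recognizer output, divided by the number of reference words. The sketch below implements that standard definition; it is not the scoring tool used for the Jupiter comparison, and the example sentences are made up.

```python
def word_error_rate(reference, hypothesis):
    """(substitutions + deletions + insertions) / number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] = edit distance between the reference words seen so far and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, 1):
        cur = [i]
        for j, hyp_word in enumerate(hyp, 1):
            cur.append(
                min(
                    prev[j] + 1,                           # deletion
                    cur[j - 1] + 1,                        # insertion
                    prev[j - 1] + (ref_word != hyp_word),  # match or substitution
                )
            )
        prev = cur
    return prev[-1] / len(ref)


wer = word_error_rate(
    "turn on the lights in the kitchen",
    "turn on the light in kitchen",
)
print(f"{wer:.1%}")
```

In this example one word is substituted and one is dropped out of seven reference words, so the rate is 2/7.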

Ongoing and Future Work
Increase sophistication of discourse and dialogue manager to handle more complex dialogues
  – Enable finer specification of discourse capabilities
  – Add generic capabilities for times, dates, etc.
Incorporate confidence scoring and implement unsupervised training of acoustic and language models
Create functionality to allow developers to create domain-specific concatenative speech synthesis
Create alternative methods of domain specifications to streamline development
  – Advanced developers don’t necessarily use web interface
  – Allow for more efficient automatic generation of SpeechBuilder domains

Acknowledgements
Issam Bazzi, Scott Cyphers, Ed Filisko, Jim Glass, TJ Hazen, Lee Hetherington, Joe Polifroni, Stephanie Seneff, Michelle Spina, Eugene Weinstein, Jon Yi, Misha Zitser

SpeechBuilder Hands-on Activity. Eugene Weinstein, Project Oxygen Core Team, MIT Laboratory for Computer Science

Modified Baseline Configuration (this class)
Still gives developer total control over application functionality
Frame Relay server exposes Galaxy meaning representation to app
[Diagram: Frame Relay Server with Hub, CGI Parameter Generation, Speech Recognition, Speech Synthesis, Language Processing, and Audio Server, passing the semantic frame over a TCP socket to the Developer Application (Jaim)]

SpeechBuilder API
Galaxy meaning representation provided through frame relay
Applications connect via TCP sockets
API provided in Perl, Python, and Java
  – This class: Python API
Python class galaxy.server.Server, methods:
  – Constructor(machine, port, ID)
  – connect()
  – processMessage(blocking)
  – disconnect()
Python class galaxy.frame.Frame, methods:
  – getAction()
  – getAttribute(attr_name)
  – getText()
  – toString()
[Diagram: Galaxy, Frame Relay, TCP socket, Python API, Application]
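Application logic written against this API mostly consists of inspecting a frame and acting on it. The sketch below shows one such handler. It is written against a stand-in class because the `galaxy` package is not assumed to be installed: `StubFrame` is hypothetical and mirrors only the `getAction`, `getAttribute`, and `getText` method names listed above. The `set` action and the `object`, `room`, and `value` keys are taken from the earlier HTTP example, and returning `None` for a missing attribute is an assumption.

```python
class StubFrame:
    """Hypothetical stand-in exposing the Frame method names from the slide."""

    def __init__(self, action, attributes, text=""):
        self._action = action
        self._attributes = dict(attributes)
        self._text = text

    def getAction(self):
        return self._action

    def getAttribute(self, attr_name):
        # Assumption: a missing attribute yields None.
        return self._attributes.get(attr_name)

    def getText(self):
        return self._text


def handle_frame(frame, state):
    """Update device state from a 'set' frame and return a spoken reply."""
    if frame.getAction() != "set":
        return "Sorry, I can't do that yet."
    obj = frame.getAttribute("object")
    room = frame.getAttribute("room")
    value = frame.getAttribute("value")
    if obj is None or value is None:
        return "What would you like me to change?"
    state[(room, obj)] = value
    reply = f"Turning {value} the {obj}"
    if room:
        reply += f" in the {room}"
    return reply + "."


state = {}
frame = StubFrame(
    "set",
    {"object": "lights", "room": "kitchen", "value": "on"},
    text="turn on the lights in the kitchen",
)
print(handle_frame(frame, state))
print(state)
```

In an actual application the frame would presumably come from a connected `galaxy.server.Server` instance via `processMessage(blocking)` rather than from a stub; what that call returns is not stated on the slide, so it is left out of the sketch.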