Should Intelligent Agents Listen and Speak to Us? James A. Larson Larson Technical Services


Expectations for Human Agents
– Use multiple means of communication
– Recognize me
– Learn my preferences and choices
– Understand my requests
– Resolve ambiguous requests
– Identify required resources
– Perform my requests

Expectations for Human Agents: Automated
– Use multiple means of communication
– Recognize me
– Learn my preferences and choices
– Understand my requests
– Resolve ambiguous requests
– Identify required resources
– Perform my requests

Use Multiple Means of Communication
– Speak and listen
– Text
– Illustrations and pictures

Text Capture
– Speech recognition
– Handwriting recognition
– Keyboarding

Pickers and Palettes

Pattern Recognition
– UPC codes
– QR codes
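The UPC codes mentioned above carry a built-in check digit, which is what lets a scanner reject a misread. A minimal sketch of the standard UPC-A check-digit computation:

```javascript
// Compute the UPC-A check digit from the first 11 digits of a code.
// Digits in odd positions (1st, 3rd, ...) are weighted 3, even positions 1;
// the check digit brings the weighted sum up to a multiple of 10.
function upcCheckDigit(digits11) {
  const sum = digits11
    .split("")
    .reduce((acc, ch, i) => acc + Number(ch) * (i % 2 === 0 ? 3 : 1), 0);
  return (10 - (sum % 10)) % 10;
}

// "036000291452" is a commonly cited example UPC; the final 2 is the check digit.
console.log(upcCheckDigit("03600029145")); // → 2
```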

Multiple Ways to Present Information
– Maps
– Charts
– Video
– Text
– Documents

Use Multiple Means of Communication
– Speak and listen
– Text
– Illustrations and pictures
Use the appropriate mode (or modes) of communication.

Recognize Me
User identification devices:
– Something the user has
– Something the user knows
– Something about the user
A combination of factors increases security.

Recognize Me: Something the User Has

Recognize Me: Something the User Knows
Account names and passwords:
– Complex strings of digits and characters
– Must be changed frequently
– Difficult for the user to remember
Challenge dialog:
– System: What is your mother’s maiden name?
– User: Lewis
– System: What is the name of your first pet?
– User: Fido
– System: Apply your secret function to the number 7.
– User: 4
– System: Access approved.
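The challenge dialog above is a form of challenge-response authentication: the system issues a challenge and checks that the response matches what the shared secret predicts. A minimal sketch; the secret function here (subtract 3, chosen so that f(7) = 4 as in the dialog) is purely illustrative:

```javascript
// Minimal challenge-response sketch. The secret function stands in for any
// transformation the user and system agree on out of band; subtracting 3 is
// purely illustrative (it makes f(7) = 4, matching the dialog above).
const secretFunction = (n) => n - 3;

// Verify that the user's response is the secret function applied to the challenge.
function verifyResponse(challenge, response) {
  return response === secretFunction(challenge);
}

console.log(verifyResponse(7, 4)); // → true: access approved
console.log(verifyResponse(7, 5)); // → false: challenge failed
```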

Recognize Me: Something About the User
– Signature recognition
– Speaker recognition
– Face recognition
– Fingerprint recognition

Learn My Preferences and Choices
– User profile
– History

Understand My Requests: Semantic Interpretation
To create smart voice user interfaces, we need to extract the semantic information from speech utterances.
Example:
– Utterance: “I want to fly from Boston to Chicago”
– Semantic interpretation: { origin: "BOS", destination: "ORD" }
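One way to picture this step is a small slot-filling function that pulls the origin and destination out of the utterance and maps city names to airport codes. A minimal sketch, not the grammar-based mechanism the following slides illustrate; the pattern matching and the two-entry city table are assumptions for illustration:

```javascript
// Minimal slot-filling sketch for utterances like "I want to fly from X to Y".
// The two-entry city-to-airport-code table is illustrative, not a real database.
const AIRPORT_CODES = { boston: "BOS", chicago: "ORD" };

function interpret(utterance) {
  const result = {};
  const from = utterance.match(/\bfrom\s+(\w+)/i); // origin: "from <city>"
  if (from) result.origin = AIRPORT_CODES[from[1].toLowerCase()];
  const to = utterance.match(/\bto\s+(\w+)\s*$/i); // destination: trailing "to <city>"
  if (to) result.destination = AIRPORT_CODES[to[1].toLowerCase()];
  return result;
}

const semantics = interpret("I want to fly from Boston to Chicago");
// semantics is the semantic object with origin "BOS" and destination "ORD"
```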

Semantic Interpretation
[Pipeline diagram: utterance → ASR (grammar with semantic interpretation scripts) → text → Semantic Interpretation Processor → ECMAScript object → VoiceXML Interpreter → Application]
Walking through “I want to fly from Boston to Chicago”:
– The ASR recognizes “I want to fly from Boston”; the grammar’s tag script $.origin = "BOS"; fires, producing the ECMAScript object { origin: "BOS" }.
– The ASR then recognizes “to Chicago”; the tag script $.destination = "ORD"; fires, extending the object to { origin: "BOS", destination: "ORD" }.
– The VoiceXML interpreter passes the completed ECMAScript object to the application.
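The tag-script mechanism in the pipeline above can be mimicked in plain ECMAScript: each matched grammar rule runs a small script that writes into a shared result object, the role the $ variable plays in SISR tags. A sketch under that assumption, not a real SRGS/SISR processor:

```javascript
// Sketch of semantic interpretation tags building up one result object.
// Each "rule" pairs a phrase with a tag script that writes into the shared
// result (the role $ plays in SISR). Not a real SRGS/SISR processor.
const rules = [
  { phrase: /from Boston/i, tag: (r) => { r.origin = "BOS"; } },
  { phrase: /to Chicago/i, tag: (r) => { r.destination = "ORD"; } },
];

function evaluateTags(utterance) {
  const result = {};
  for (const rule of rules) {
    if (rule.phrase.test(utterance)) rule.tag(result); // fire the tag script
  }
  return result; // the ECMAScript object handed to the VoiceXML interpreter
}

const obj = evaluateTags("I want to fly from Boston to Chicago");
// obj holds origin "BOS" and destination "ORD"
```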

Resolve Ambiguous Requests
– Consult my user profile and histories
– Ask me if you are uncertain

Resolve Ambiguous Requests: Consult My User Profile and History
– User profile: preferred airline: United
– Travel history: from Chicago to Portland, Maine; from Chicago to Portland, Oregon
User: “I want to fly from Chicago to Portland”
Agent: “Do you want to fly to Portland, Maine or Portland, Oregon?”
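The disambiguation step in this example can be sketched as a lookup against the travel history: if more than one past destination matches the spoken city name, the agent asks rather than guesses. The helper names are hypothetical; the history entries come from the slide:

```javascript
// Sketch: resolve an ambiguous city name against the user's travel history.
// Helper names are hypothetical; the history entries are the slide's example.
const travelHistory = ["Portland, Maine", "Portland, Oregon"];

function resolveDestination(city, history) {
  const matches = history.filter((d) => d.startsWith(city));
  if (matches.length === 1) return { destination: matches[0] };
  // More than one plausible destination: ask the user to choose.
  return { clarify: `Do you want to fly to ${matches.join(" or ")}?` };
}

console.log(resolveDestination("Portland", travelHistory).clarify);
// → Do you want to fly to Portland, Maine or Portland, Oregon?
```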

Resolve Ambiguous Requests: Ask Me if You Are Not Certain
User: “I want to fly to Austin”
Agent: “Do you want to fly to Austin or Boston?”
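This kind of clarification typically falls out of the recognizer’s N-best list: when the top two hypotheses have nearly equal confidence, the agent should ask rather than guess. A sketch; the hypotheses, confidence scores, and the 0.1 gap threshold are illustrative assumptions:

```javascript
// Sketch: accept the top ASR hypothesis, or ask a clarification question when
// the top two are near-tied. Scores and the 0.1 threshold are illustrative.
function decide(nbest, gapThreshold = 0.1) {
  const sorted = [...nbest].sort((a, b) => b.confidence - a.confidence);
  const [best, second] = sorted;
  if (!second || best.confidence - second.confidence >= gapThreshold) {
    return { action: "accept", city: best.city };
  }
  return {
    action: "clarify",
    prompt: `Do you want to fly to ${best.city} or ${second.city}?`,
  };
}

const nbest = [
  { city: "Austin", confidence: 0.52 },
  { city: "Boston", confidence: 0.47 },
];
console.log(decide(nbest).prompt); // → Do you want to fly to Austin or Boston?
```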

Identify Required Resources
– Local resources
– Remote resources

Perform Required Tasks
Carry out instructions accurately and efficiently.

Expectations for Human Agents: Automated
– Use multiple means of communication
– Recognize me
– Learn my preferences and choices
– Understand my requests
– Resolve ambiguous requests
– Identify required resources
– Perform my requests

Should Intelligent Agents Listen and Speak to Us? Yes! And a whole lot more.

Discussion Questions
Will intelligent multimodal agents replace:
– Voice-only IVR applications?
– GUIs on mobile devices?
– GUIs on desktops?
What architecture do you recommend for intelligent multimodal agents?
What new UI guidelines are needed for intelligent multimodal agents?