Creating User Interfaces
Directed Speech. XML. VoiceXML
Classwork/Homework: Sign up to be a Voxeo developer. Do tutorials.


Speech recognition
Encompasses a variety and range of activities:
Totally open-ended as to content and audience
  – May claim more than really exists
Restricted to a small[er] set of phrases
  – Phrases within longer sections of speech
Restricted to require training, OR the system learns
  – Dictation systems learn your voice

Speech recognition
User speaks. System 'understands', at least enough to perform some action.
Related to (but not the same as)
  – Natural language understanding
  – Voice print identification
  – Record information to be re-played to human in compressed form for later interaction
  – Speech synthesis (other direction): words to speech
  – ?

Natural language understanding
Skip speech altogether, but type in statements or phrases in normal language
  – What is normal? We tend not to speak that grammatically
  – Many 'natural language systems' actually use keywords
Historical Moon rocks example
Combine speech to natural language …
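To make the point about keywords concrete, here is a minimal sketch of a keyword-spotting 'natural language' front end. The keyword table and action names are hypothetical, invented only for illustration; the sketch ignores grammar entirely, which is exactly what many such systems do.

```python
# Hypothetical keyword table: a real system would have a far larger vocabulary.
KEYWORDS = {
    "weather": "report_weather",
    "train": "train_schedule",
    "balance": "account_balance",
}

def keyword_action(sentence):
    """Return the action for the first known keyword in the sentence, or None."""
    for word in sentence.lower().split():
        word = word.strip(".,?!")  # drop punctuation attached to the word
        if word in KEYWORDS:
            return KEYWORDS[word]
    return None

# The sentence is not grammatical, but the keyword is enough to act on.
print(keyword_action("Um, what's my balance today?"))
```

Everything except the keyword is ignored, so "balance my what today" would trigger the same action.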

Continuous versus discrete
Speaker speaks 'naturally'
versus
Speaker separates words

Examples
Dictation: no understanding as such, produce words/sentences in a program
(Telephone) Help desk / Information: generally restricted or directed speech, choosing from alternatives (may or may not be given). Advances the process
[Restricted] commands: actually carrying out operations
  – Factory example: start and stop
  – Car: radio, heat/AC
  – Phone: call specific number

Training
Dictation application: user takes time to read a specific text to train the system
  – Note: some systems also adapt with use. If & when the user corrects the results, the system may do better next time.
Phone lookup: user records names. No 'understanding', just record for matching.

Audience & content
Some systems may allow adapting to audiences, for example, male versus female
Some systems have restrictions on types of content
  – Historical note: IBM system in 1980s & 1990s was restricted to male, American-born speakers (no speech impediments) and legal text.

Speech recognition concepts
Air pressure → diaphragm in phone → electrical signal → (Fourier Transform) → wave pattern
Wave pattern is matched against sets of canonical patterns (native speaker of English, perhaps male/female & young/old alternatives) generated for the specified grammar (using a segmentation = dividing up of the parts)
Note: interplay of grammar and statistics distinguishes different approaches

Fourier Transform (Fast Fourier Transform -- FFT)
Takes data representing a signal
And produces numbers representing the combination of sine and cosine waves that make up the signal
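A small sketch may help show what 'numbers representing the combination of sine and cosine waves' means. This uses a naive discrete Fourier transform rather than the FFT (the FFT computes the same numbers, only faster). The test signal, window size, and function names are assumptions chosen for illustration.

```python
import cmath
import math

def dft(samples):
    """Naive discrete Fourier transform: O(n^2), fine for a small demonstration."""
    n = len(samples)
    return [
        sum(samples[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]

def amplitudes(samples):
    """Amplitude per frequency bin k = 0 .. n/2 - 1 (scaling is meaningful for k >= 1)."""
    n = len(samples)
    spectrum = dft(samples)
    return [2 * abs(spectrum[k]) / n for k in range(n // 2)]

# Test signal: a sine with 3 cycles per window plus a weaker cosine with 5 cycles.
N = 32
signal = [
    math.sin(2 * math.pi * 3 * t / N) + 0.5 * math.cos(2 * math.pi * 5 * t / N)
    for t in range(N)
]

amps = amplitudes(signal)
peaks = [k for k, a in enumerate(amps) if a > 0.1]
print(peaks)  # [3, 5]
```

The transform recovers the two component frequencies (bins 3 and 5) and their amplitudes (about 1.0 and 0.5) from the raw samples. A recognizer works on this kind of frequency description rather than on the raw signal.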

Speech recognition
Works on the product of the FFT
Uses (in most cases)
  – Segmentation: attempt to break up into pieces, perhaps syllables or words
  – Grammar: definition of what is to be expected
  – Probabilities: if the first part matched X, then there is a greater probability that the next would match Y
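The interplay of grammar and probabilities can be sketched as follows. This is an illustrative toy, not how any particular recognizer works: the grammar, the probabilities, and the 'acoustic scores' (how well a segment matched each word's pattern) are all made-up numbers.

```python
# Hypothetical grammar for a car command system: which word may follow which,
# and with what probability.
GRAMMAR = {
    "<start>": {"radio": 0.5, "heat": 0.5},
    "radio": {"on": 0.6, "off": 0.4},
    "heat": {"up": 0.5, "down": 0.5},
}

def pick_word(previous, acoustic_scores):
    """Combine acoustic match scores with grammar probabilities for the next word."""
    allowed = GRAMMAR.get(previous, {})
    best_word, best_score = None, 0.0
    for word, acoustic in acoustic_scores.items():
        score = acoustic * allowed.get(word, 0.0)  # words outside the grammar score 0
        if score > best_score:
            best_word, best_score = word, score
    return best_word

# The segment sounds slightly more like "up" than "on", but after "radio"
# the grammar only expects "on" or "off".
scores = {"up": 0.55, "on": 0.45, "off": 0.10}
print(pick_word("radio", scores))  # on
print(pick_word("heat", scores))   # up
```

The same sound is resolved differently depending on what came before, which is why a restricted grammar makes recognition so much more reliable.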

Current State of the Art
General, no restrictions, speech reco, good enough to act on the speech?
  – Always about to happen?
Dictation / substitute for keyboard+ exists and satisfies many
  – Is this the most important application for most users?
  – May not be the killer app, but may be good for motivating research
Extra credit posting: prepare a brief report on [a] current product or application. Can be one you use yourself.

Speech synthesis
aka TTS (text to speech)
Application determines that the computer needs to say certain words
lexical units (syllables of words) → phonemes → pre-recorded (wav) files of phonemes

Speech synthesis
This is again a segmentation process: need to divide up the words and then put together so speech sounds 'natural'.
  – particular phoneme may [need to] sound different in different context.
  – also need to deal with abbreviations & local accents
  – Place names (important in travel & weather applications)
Special case: detect and use wav file for each name.
Older methods were all synthesized
  – similar distinction between all synthesized and samples of music
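The pipeline described above (expand abbreviations, use a whole-word recording for special names, otherwise fall back to phoneme files) can be sketched roughly as follows. All data here is hypothetical: the tiny lexicon, the phoneme labels, and the file paths are placeholders, and the sketch ignores the harder problem of a phoneme sounding different in different contexts.

```python
# Hypothetical data: real systems use large pronunciation dictionaries.
ABBREVIATIONS = {"st.": "street", "dr.": "doctor"}
LEXICON = {
    "hello": ["HH", "AH", "L", "OW"],
    "doctor": ["D", "AA", "K", "T", "ER"],
}
# Special case: place names get their own whole-word recording.
WHOLE_WORD_RECORDINGS = {"poughkeepsie": "names/poughkeepsie.wav"}

def plan_audio(text):
    """Return the list of (hypothetical) wav files to play for the given text."""
    files = []
    for token in text.lower().split():
        word = ABBREVIATIONS.get(token, token).strip(".,?!")
        if word in WHOLE_WORD_RECORDINGS:
            files.append(WHOLE_WORD_RECORDINGS[word])
        elif word in LEXICON:
            files.extend(f"phonemes/{p}.wav" for p in LEXICON[word])
        else:
            raise KeyError(f"no pronunciation known for {word!r}")
    return files

print(plan_audio("Hello, Dr. Poughkeepsie"))
```

Playing the files back to back would sound choppy. Making the joins sound 'natural' is where most of the real work lies.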

Speech synthesis
Is essentially ‘the computer’ reading ‘out loud’.
Easy to do most things
More and more difficult to do the complete job
Different languages may be easier than English. People who are not monolingual please comment!

Restricted / directed speech applications
The language is VoiceXML
We will use evolution.voxeo.com to create directed speech applications.
  – Free facility: put in URL pointing to a VoiceXML document. Supplies phone numbers to call in to test.
  – You need to register.
  – Note: previously used Tellme studios but they stopped offering service.
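As a rough idea of what such a VoiceXML document looks like, the sketch below builds a minimal 'hello, world' document using Python's standard library. The element names (vxml, form, block, prompt) and the namespace come from the W3C VoiceXML specification; the function name is made up, and nothing here is specific to the Voxeo service, whose tutorials remain the reference for what it accepts.

```python
import xml.etree.ElementTree as ET

def hello_vxml(message):
    """Build a minimal VoiceXML document that speaks one prompt and ends."""
    vxml = ET.Element(
        "vxml", {"version": "2.1", "xmlns": "http://www.w3.org/2001/vxml"}
    )
    form = ET.SubElement(vxml, "form")      # a form is one dialog
    block = ET.SubElement(form, "block")    # a block holds things to do or say
    prompt = ET.SubElement(block, "prompt") # the prompt is read out by TTS
    prompt.text = message
    return ET.tostring(vxml, encoding="unicode")

document = hello_vxml("Hello, world")
print(document)
```

The printed text is what would be saved to a file on a web server, with that file's URL given to the hosting service.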

XML
Generalization of HTML
XML documents have markup.
  – Tag indicating type of element (possibly with attributes), content, tag closer.
Document must be well-formed.
  – Elements nested in other elements
  – Quotation marks around attribute values
Developers decide on element types.
  – So, we need to obey rules of VoiceXML
Each element type can only have certain child elements
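The two well-formedness rules above can be checked mechanically. Here is a small sketch using the XML parser in Python's standard library; the sample fragments are invented for illustration. Note that this only tests well-formedness: it does not check the VoiceXML rules about which child elements each element type may have.

```python
import xml.etree.ElementTree as ET

def is_well_formed(text):
    """Return True if the text parses as XML, False otherwise."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

good = '<prompt bargein="false">Say <emphasis>yes</emphasis> or no.</prompt>'
bad_nesting = '<prompt>Say <emphasis>yes</prompt></emphasis>'  # tags overlap
bad_quotes = '<prompt bargein=false>Say yes or no.</prompt>'    # value not quoted

print(is_well_formed(good))         # True
print(is_well_formed(bad_nesting))  # False
print(is_well_formed(bad_quotes))   # False
```

HTML browsers traditionally forgive both kinds of mistake. An XML parser, and so a VoiceXML platform, will generally reject the whole document.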

Screen shot from Voxeo

Screen shot: phone numbers

Homework (over break)
Sign up to be a Voxeo developer.
  – Start VoiceXML tutorials.
  – Do your own hello, world application.
Start planning your VoiceXML project.