Conversational Technologies 1 Natural Language Processing August 23, 2007 SpeechTEK University Deborah Dahl Conversational Technologies.

1 Conversational Technologies 1 Natural Language Processing August 23, 2007 SpeechTEK University Deborah Dahl Conversational Technologies

2 2 Description of the Tutorial An introduction to the principles of natural language processing and the role of natural language processing in current and future speech applications 9:00-9:15 Introduction: what is natural language 9:15-10:15 Part 1: Overview and Principles 10:15-10:45 (30 minute break) 10:45-12:00 Part 2: Detailed Examples

3 Conversational Technologies 3 Attendees Backgrounds and goals

4 Conversational Technologies 4 Audience and Background A general technical background. No natural language processing background will be assumed, but experience developing speech applications would be helpful.

5 Conversational Technologies 5 What is Natural Language? Natural language is the kind of language that's used to communicate between people. It can be spoken, written or gestural (in the case of sign languages). There are several thousand currently spoken human languages.

6 Conversational Technologies 6 Why are We Interested in Natural Language? Support for more natural and effective computer-human interactions by accommodating the ways that people already communicate

7 Conversational Technologies 7 Natural Language Processing Natural language understanding Natural language generation Machine translation

8 Conversational Technologies 8 Part 1: Overview and Principles

9 Conversational Technologies 9 Goals Understand what natural language is Learn about the most common techniques for processing natural language Their strengths and weaknesses Understand where natural language processing technology is headed in the future. Focus is on commercial applications

10 Conversational Technologies 10 Topics What is natural language? Issues in spoken natural language and how to handle them Statistical Language Models (SLM's) speech grammars with semantic tags Variability in expression, pronouns, and filling multiple slots from a single utterance How emerging standards such as EMMA will contribute to more sophisticated future applications Recent topics in natural language research and how this research may eventually be utilized in future applications

11 Conversational Technologies 11 Natural Language Understanding The task of automatically assigning meaning to language

12 Conversational Technologies 12 What natural language processing isn't: Speech recognition, which turns the sounds of spoken language into the words of written language; Dialog management, which manages a natural language interaction between a user and a computer; Artificial intelligence, which studies how to provide intelligent capabilities to computers

13 Conversational Technologies 13 Assigning Meaning to Language In most applications, the developer decides what the set of possible meanings is Meanings can be simple or complex Language can be simple or complex Current commercial techniques can Assign simple meanings to simple language Assign simple meanings to complex language Research systems can handle more complex meanings and language, but no existing system can handle all meanings and all language for even one human language

14 Conversational Technologies 14 Examples of Complex Language: Shakespeare; religious texts; the United States Constitution. We don't have to worry about assigning meaning to these texts!

15 Conversational Technologies 15 Simple to Slightly More Complex Language: "yes"; "New York"; "call home"; "a red t-shirt, size large"; "I want to go from Philadelphia to New York on Sunday, August 19". The more complex language becomes, the more we need special techniques to process it.

16 Conversational Technologies 16 Human Communication Process? Person A's Thought is passed to Person B through language.

17 Conversational Technologies 17 More Realistic Communication Process: Person A has Thought 1 and puts it into language; Person B ends up with a thought somewhat similar to Thought 1. Person A asks: How should I express this? Is this something I really need to say? What does B already know? Why do I want to express this thought? Do I want to impress B? Might I offend B by saying this? What language should I use? Person B asks: Should I believe this? Could A be lying or lacking credibility? If I think A is lying should I say so? Did I hear it right? Did I understand it? Why did Person A say that?

18 Conversational Technologies 18 Issues in Natural Language Variability of expression Infinite number of meanings that can be expressed Infinite number of possible sentences in a language Many ways to say the same thing The same thing can have different meanings in different contexts

19 Conversational Technologies 19 What is a Meaning? Many approaches to representing meanings in traditional linguistics and philosophy of language Most widely used commercial representation is as a token or as a set of slot/value pairs (also called key/value or attribute/value pairs) Often structured into a set of related slot/value pairs (for example, the fields of a VoiceXML, or a traditional frame)

20 Conversational Technologies 20 Tokens: "my printer is printing horizontal bands and everything is printing in blue" → printer problem; "I can't connect to the internet" → internet problem

21 Conversational Technologies 21 What is a Meaning? Slot/Value Pairs: "I want to go from Chicago to New York on August 19 midafternoon on United" → Form/frame (airline reservation): Destination: New York; Departure city: Chicago; Departure date: August 19; Departure time: midafternoon; Airline: United
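
The frame on this slide could be held in a program roughly as follows. This is a minimal Python sketch, not any platform's actual data model: the class name `AirlineReservation`, its field names, and the `unfilled_slots` helper are all illustrative assumptions.

```python
from dataclasses import dataclass, asdict
from typing import Optional


@dataclass
class AirlineReservation:
    """Hypothetical frame: one field per slot, None while unfilled."""
    departure_city: Optional[str] = None
    destination: Optional[str] = None
    departure_date: Optional[str] = None
    departure_time: Optional[str] = None
    airline: Optional[str] = None

    def unfilled_slots(self):
        """Names of the slots a dialog would still have to ask about."""
        return [name for name, value in asdict(self).items() if value is None]


# Meaning of: "I want to go from Chicago to New York on August 19
# midafternoon on United"
frame = AirlineReservation(
    departure_city="Chicago",
    destination="New York",
    departure_date="August 19",
    departure_time="midafternoon",
    airline="United",
)

# A shorter utterance such as "I want to go to New York" fills one slot only.
partial = AirlineReservation(destination="New York")
```

Here `frame.unfilled_slots()` is empty, while `partial.unfilled_slots()` lists the four slots that a follow-up prompt would need to collect.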

22 Conversational Technologies 22 Information Available for Extracting Meaning. Used by today's commercial systems: words of the utterance; word order; grammatical endings; specific grammar for the application; information about what previous instances of that utterance have meant. Used by research systems and people: prosody (intonation, pauses, loudness, stress, timing); general information about the language itself (dictionaries, grammars, thesauri); context of the utterance; information about the topic; facial expressions, gestures

23 Conversational Technologies 23 Traditional Tasks in Natural Language Understanding (Recognition – speech, handwriting, OCR…) Lexical lookup Part of speech tagging Sense disambiguation Syntactic parsing Semantic analysis Pragmatic analysis

24 Conversational Technologies 24 Problems with Traditional Approaches: They try to describe the full language and a broad set of meanings. For practical applications, it's much easier to just write a small grammar for a specific application.

25 Conversational Technologies 25 Natural Language Tasks in Commercial Speech Systems: (Recognition – speech, handwriting, OCR…); Lexical lookup (part of recognition); Part of speech tagging – parts of speech not used; Sense disambiguation – not needed, constrained application; Syntactic parsing – syntactic structure used indirectly; Semantic analysis; Pragmatic analysis. Several of these tasks are done in parallel.

26 Conversational Technologies 26 Extracting Meaning in Commercial Applications Filling slots by using semantically tagged grammars (CFGs) Mapping complex utterances to categories (SLMs)

27 Conversational Technologies 27 Semantically Tagged Grammars A grammar defines what the recognizer can recognize (recognized strings) Tags define return values for different recognized strings Information used: words of the utterance and a special-purpose grammar

28 Conversational Technologies 28 Context-Free Grammar Formats: Represent what a speech recognizer can recognize. Example: Request → PoliteWord + Action + Item ("please open the door"). Formats: Speech Recognition Grammar Specification (SRGS) (ABNF and XML formats); Java Speech Grammar Format (JSGF); Nuance GSL; Microsoft Speech Application Programming Interface (SAPI)
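
To make the Request example concrete, here is a small Python sketch of a grammar that both defines the recognizable strings and attaches a return value to each. It is not SRGS, JSGF, GSL, or SAPI syntax; the word lists, the tag values such as `OPEN`, and the function names are assumptions made for illustration, and the polite word is treated as optional.

```python
from itertools import product

# Hypothetical toy grammar: Request -> [PoliteWord] Action Item
POLITE_WORDS = ["", "please"]                      # "" means the word is omitted
ACTIONS = {"open": "OPEN", "close": "CLOSE", "shut": "CLOSE"}
ITEMS = {"the door": "DOOR", "the window": "WINDOW"}


def expand_grammar():
    """Enumerate every string the grammar accepts, with its tagged meaning."""
    table = {}
    for polite, (action, action_tag), (item, item_tag) in product(
        POLITE_WORDS, ACTIONS.items(), ITEMS.items()
    ):
        words = " ".join(part for part in (polite, action, item) if part)
        table[words] = {"action": action_tag, "item": item_tag}
    return table


RECOGNIZED = expand_grammar()


def interpret(utterance):
    """Return the tagged meaning, or None if the string is out of grammar."""
    return RECOGNIZED.get(utterance.lower().strip())
```

With this table, "please open the door" and "shut the window" both return a small slot/value structure, while anything outside the twelve enumerated strings returns `None`. That brittleness is the trade-off of a grammar: precise meanings, but no tolerance for unanticipated wording.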

29 Conversational Technologies 29 Semantic Tags Reduce variability of expression Assign return values to recognized strings W3C Semantic Interpretation for Speech Recognition (SISR) JSGF tags SAPI tags IBM ECMAScript tags Nuance GSL

30 Conversational Technologies 30 Capabilities of Tag Formats: Assign tokens to strings (JSGF): "yeah" → yes. Create key-value pairs (SAPI): "to chicago" → ord. Perform computations (SISR, IBM, GSL): "three days from now" → August 26, 2007; "two medium and three large pizzas" → 5 pizzas
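
The "perform computations" capability can be sketched in Python. The two functions below are hypothetical stand-ins for what a tag script might compute, not SISR or GSL code; the number-word table covers only the words needed here, and the date example assumes the utterance is spoken on the tutorial date, August 23, 2007.

```python
from datetime import date, timedelta

# Only the number words needed for these examples (an assumption).
NUMBER_WORDS = {"one": 1, "two": 2, "three": 3, "four": 4, "five": 5}


def relative_date(utterance, today):
    """Turn '<number> days from now' into a calendar date, else None."""
    words = utterance.lower().split()
    if (
        len(words) == 4
        and words[0] in NUMBER_WORDS
        and words[1:] == ["days", "from", "now"]
    ):
        return today + timedelta(days=NUMBER_WORDS[words[0]])
    return None


def count_pizzas(utterance):
    """Add up every number word in an order such as
    'two medium and three large pizzas'."""
    return sum(NUMBER_WORDS.get(word, 0) for word in utterance.lower().split())


spoken_on = date(2007, 8, 23)
delivery = relative_date("three days from now", spoken_on)
total = count_pizzas("two medium and three large pizzas")
```

`delivery` comes out as August 26, 2007 and `total` as 5, matching the return values on the slide.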

31 Conversational Technologies 31 SISR Tags for yes and no: "yes", "yeah", "you bet" and "oui" all return yes; "no", "nope" and "no way" all return no

32 Conversational Technologies 32 GSL Token: DigitValue [ ([zero oh] one) { return (01) }...]; "oh one" → 01

33 Conversational Technologies 33 SISR Slot/Value "I would like a small coca cola and three large pizzas with pepperoni and mushrooms." I would like a out.drink = new Object(); out.drink.liquid=rules.drink.type; out.drink.drinksize=rules.drink.drinksize; and;

34 Conversational Technologies 34 GSL Slot/Value ;GSL 2.0; ColoredObject:public (Color Object) Color [ [red pink] { } [yellow canary] { } [green khaki] { } ] Object [ [truck car] { } [ball block] { } [shirt blouse] { } ]

35 Conversational Technologies 35 SAPI Slot-Value elvis presley the king

36 Conversational Technologies 36 Problems with Tagged Grammars Hard to maintain when complex Hard to anticipate all the variations in how someone might say something Can use wildcards/garbage to ignore parts of utterance Speech recognition suffers when grammars are too complex Speech recognition suffers when wildcards are used

37 Conversational Technologies 37 Statistical Language Models (SLMs) Speech recognition is based on statistical models, not grammars In commercial systems, natural language processing is a process of classification, relatively coarse meaning extraction Works well if goal is to extract very simple meanings

38 Conversational Technologies 38 Stages in SLM Processing. N-gram speech recognition: probabilities of word sequences, usually 2-3 words; much more flexible (but less accurate) than a grammar; however, accuracy is not as critical with SLMs because you don't have to get every single word right. Text classification: given a text, assign it to categories based on training from previous texts; there are many algorithms for classification

39 Conversational Technologies 39 Problems with SLMs Less accurate than CFGs Expensive to implement and maintain Require a lot of data for good performance

40 Conversational Technologies 40 Tagged Grammars or SLMs? Deeply nested menus → SLMs. Complex applications with many slots to fill and precise meanings needed → grammars. Can combine both approaches in one application: front-end SLM followed by grammar; prompt asks a specific question to catch the most common tasks but has an "other" category

41 Conversational Technologies 41 Other Combination Approaches: Use SLM technology to recognize but grammar to interpret; rules combined with SLMs; robust parsing; rules combined with wildcards, for example "I want um make that a large pizza with pepperoni and onions"
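
One simple way to picture rules combined with wildcards is keyword spotting: fill slots from the words the application knows and skip the rest. The Python sketch below is a simplified stand-in for robust parsing, not a description of any vendor's implementation; the vocabulary lists and slot names are assumptions.

```python
import re

# Hypothetical application vocabulary.
SIZES = ("small", "medium", "large")
TOPPINGS = ("pepperoni", "mushrooms", "onions", "sausage")


def spot_slots(utterance):
    """Fill pizza slots from known words; every other word acts as a wildcard."""
    words = re.findall(r"[a-z]+", utterance.lower())
    slots = {"size": None, "toppings": []}
    for word in words:
        if word in SIZES:
            slots["size"] = word  # a later size overrides an earlier one
        elif word in TOPPINGS and word not in slots["toppings"]:
            slots["toppings"].append(word)
    return slots


order = spot_slots("I want um make that a large pizza with pepperoni and onions")
```

The hesitation and restart ("um make that") are ignored, and `order` holds size `large` with toppings `pepperoni` and `onions`. Letting a later size override an earlier one is a design choice that also handles self-corrections such as "a small, no, a large pizza".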

42 Conversational Technologies 42 Emerging Standards: EMMA. EMMA (Extensible Multi-Modal Annotation) was developed by the World Wide Web Consortium Multimodal Interaction Working Group. It is an XML format for representing users' inputs and the results of processing them.

43 Conversational Technologies 43 How does EMMA relate to natural language understanding? EMMA represents the results of a natural language understanding process

44 Conversational Technologies 44 EMMA Benefits (1): EMMA's standard format lets all kinds of EMMA producers (multimodal modality components) exchange results: handwriting recognizers; speech recognizers; text classifiers; face recognizers; speaker identification and verification; …

45 Conversational Technologies 45 EMMA Benefits (2) Through, provides a way for specialist processing components to cooperate in processing a single input Speech recognition Lexical lookup Part of Speech tagging Parsing Semantic analysis Ngram speech recognition Classification

46 Conversational Technologies 46 EMMA Example – (1) Annotation Elements airline from philadelphia to boston and i want a vegetarian meal

47 Conversational Technologies 47 EMMA Example – (2) Annotation Attributes

48 Conversational Technologies 48 EMMA Example (3) Application Semantics philadelphia boston vegetarian
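
Since the XML on the three example slides did not survive in this transcript, the following Python sketch shows roughly what such a document could look like. It uses the EMMA 1.0 namespace and the standard `emma:medium`, `emma:mode`, `emma:confidence` and `emma:tokens` annotations. The application element names (`origin`, `destination`, `meal`), the confidence value, and the `build_emma` helper are assumptions, not a reconstruction of the original slides.

```python
import xml.etree.ElementTree as ET

EMMA_NS = "http://www.w3.org/2003/04/emma"
ET.register_namespace("emma", EMMA_NS)


def q(name):
    """Qualify a name with the EMMA namespace for ElementTree."""
    return f"{{{EMMA_NS}}}{name}"


def build_emma(tokens, confidence, semantics):
    """Wrap application semantics in a single emma:interpretation."""
    root = ET.Element(q("emma"), {"version": "1.0"})
    interpretation = ET.SubElement(
        root,
        q("interpretation"),
        {
            "id": "interp1",
            q("medium"): "acoustic",
            q("mode"): "voice",
            q("confidence"): str(confidence),  # illustrative value
            q("tokens"): tokens,
        },
    )
    # Application semantics: element names are chosen by the application.
    for slot, value in semantics.items():
        ET.SubElement(interpretation, slot).text = value
    return root


doc = build_emma(
    "from philadelphia to boston and i want a vegetarian meal",
    0.75,
    {"origin": "philadelphia", "destination": "boston", "meal": "vegetarian"},
)
xml_text = ET.tostring(doc, encoding="unicode")
```

The annotation attributes describe how the input was captured and processed, while the child elements carry the slot values that the application itself defines.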

49 Conversational Technologies 49 Part 2: Detailed Examples

50 Conversational Technologies 50 SAPI XML Grammar Examples: Windows Speech Recognition (Vista); Office 2003 Speech Recognition. Example – music player interface: "I'd like to hear Beethoven's 5th"; "Please play Brandenburg Concertos by Bach"; "Play something by Elvis"

51 Conversational Technologies 51 Canonicalizing Forms elvis presley the king

52 Conversational Technologies 52 Canonicalizing Forms (2) ninth symphony seventh symphony fifth symphony Brandenburg Concertos third symphony hound dog something anything symphony in d major opus 3

53 Conversational Technologies 53 Disambiguating J S Johann Sebastian Bach J C Johann Christian Bach
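
Canonicalizing and disambiguating can both be pictured as lookups from spoken variants to a single stored name. The Python sketch below is not SAPI grammar syntax; the alias table, the `AMBIGUOUS` table and `resolve_artist` are hypothetical, and treating a bare "Bach" as ambiguous is an assumption about how the application would behave.

```python
# Spoken variants mapped to one canonical form (illustrative entries).
ARTIST_ALIASES = {
    "elvis": "Elvis Presley",
    "elvis presley": "Elvis Presley",
    "the king": "Elvis Presley",
    "j s bach": "Johann Sebastian Bach",
    "j c bach": "Johann Christian Bach",
}

# Phrases that match more than one artist and need a follow-up question.
AMBIGUOUS = {"bach": ["Johann Sebastian Bach", "Johann Christian Bach"]}


def resolve_artist(phrase):
    """Return (canonical_name, candidates).

    canonical_name is None when the phrase is unknown or ambiguous;
    candidates lists the choices a dialog could offer the user.
    """
    key = " ".join(phrase.lower().replace(".", " ").split())
    if key in ARTIST_ALIASES:
        return ARTIST_ALIASES[key], []
    if key in AMBIGUOUS:
        return None, AMBIGUOUS[key]
    return None, []
```

`resolve_artist("the king")` yields the canonical name directly, whereas `resolve_artist("Bach")` yields no name and two candidates, which is the point at which a dialog would ask "J S or J C?".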

54 Conversational Technologies 54 SLM Examples: Meta-utterances for channel control: "I'm confused"; "Speak louder please"; "Could you say that again?"

55 Conversational Technologies 55 Training Data: Find out how people ask these questions; manually tag them with their categories. Category:repeat: "could you say that again please"; "i didn't catch that"; "sorry"; "pardon me?"; "repeat that please"; "say that again"; "what?". Category:operator: "I need to speak to a human"; "are there any humans I can talk to?"; "please get me an operator"; "I want an operator"; "operator please"; "I need an agent"

56 Conversational Technologies 56 Use NGram Speech Grammar: Ngrams are sets of two or three words and the probabilities that they'll occur together in that order. Much less constrained than CFGs; less accurate. Used in "How may I help you?" applications, dictation systems, and research
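
As an illustration of what an n-gram model stores, the Python sketch below estimates bigram (two-word) probabilities by simple counting. It is an unsmoothed maximum-likelihood estimate over a four-sentence toy corpus, which is an assumption for the sake of the example; production recognizers train on far more data and smooth the estimates.

```python
from collections import Counter


def train_bigrams(sentences):
    """Count single words and adjacent word pairs, with sentence markers."""
    unigrams, bigrams = Counter(), Counter()
    for sentence in sentences:
        words = ["<s>"] + sentence.lower().split() + ["</s>"]
        unigrams.update(words[:-1])            # every word that has a successor
        bigrams.update(zip(words, words[1:]))  # adjacent pairs, in order
    return unigrams, bigrams


def bigram_probability(unigrams, bigrams, first, second):
    """Estimate P(second | first) as count(first second) / count(first)."""
    if unigrams[first] == 0:
        return 0.0
    return bigrams[(first, second)] / unigrams[first]


corpus = [
    "say that again",
    "repeat that please",
    "could you say that again please",
    "operator please",
]
unigrams, bigrams = train_bigrams(corpus)
p_again = bigram_probability(unigrams, bigrams, "that", "again")
p_please = bigram_probability(unigrams, bigrams, "that", "please")
```

In this corpus "that" is followed by "again" twice and by "please" once, so `p_again` is 2/3 and `p_please` is 1/3. Unlike a grammar, nothing here forbids an unseen word order; unseen pairs simply get a low (here zero) probability.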

57 Conversational Technologies 57 Use Text Classification Software Uses training data to develop probabilities that a new text is in one of the training categories Many algorithms and approaches to text classification Similar to the technology used in spam filters, but input is speech
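
A minimal Python sketch of text classification follows, using the repeat and operator training utterances from the tutorial. The slides do not name the algorithm, so the choice here, bag-of-words vectors compared by cosine similarity, is an assumption and only one of the many approaches mentioned; the function names are likewise illustrative.

```python
import math
import re
from collections import Counter

TRAINING = {
    "repeat": [
        "could you say that again please", "i didn't catch that", "sorry",
        "pardon me", "repeat that please", "say that again", "what",
    ],
    "operator": [
        "i need to speak to a human", "are there any humans i can talk to",
        "please get me an operator", "i want an operator",
        "operator please", "i need an agent",
    ],
}


def bag_of_words(text):
    """Word counts, ignoring case and punctuation other than apostrophes."""
    return Counter(re.findall(r"[a-z']+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(count * b[word] for word, count in a.items() if word in b)
    norm_a = math.sqrt(sum(count * count for count in a.values()))
    norm_b = math.sqrt(sum(count * count for count in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# One vector per category, built from all of its training utterances.
CATEGORY_VECTORS = {
    category: bag_of_words(" ".join(examples))
    for category, examples in TRAINING.items()
}


def classify(text):
    """Return (category, score) pairs, best match first."""
    query = bag_of_words(text)
    scores = {cat: cosine(query, vec) for cat, vec in CATEGORY_VECTORS.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)
```

Because the classifier only needs overlapping words, a misrecognized input such as "party may i didn't catch that" still ranks `repeat` first: "didn't", "catch" and "that" survive the recognition errors.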

58 Conversational Technologies 58 Example: User says: "Pardon me, I didn't catch that". Speech recognizer hears: "party may i didn't catch that". Classifier classifies: increase_volume 0.4595725150090289; decrease_volume 0.0; slower 0.0; faster 0.0; confused 0.4447495899966607; repeat 0.567774973957669; operator 0.5163977794943222

59 Conversational Technologies 59 EMMA Text Input Example boston philadelphia Tuesday

60 Conversational Technologies 60 EMMA: Classification Example internet connectivity

61 Conversational Technologies 61 Natural Language Research Natural language processing is an active area of academic and industrial research Topics studied include spoken dialog processing, text understanding, natural language generation, automatic translation, acquisition of natural language information such as words and grammars, information extraction, summarization and support for search

62 Conversational Technologies 62 Natural Language Research: Most interesting to this audience are topics such as: broadening domains (sense disambiguation and parsing disambiguation); handling spoken dialog phenomena such as pronouns and ellipses; handling speech errors such as hesitations and false starts; multimodal communication, such as integrating speech and gestures; extracting information provided by prosody and other suprasegmentals. The main academic organization is The Association for Computational Linguistics

63 Conversational Technologies 63 More Information: Websites: W3C Voice Browser WG SISR; W3C Multimodal Interaction WG (EMMA); Association for Computational Linguistics; Loquendo Café (for testing SISR grammars); Voxeo Prophecy Platform (for testing Nuance grammars); SAPI XML grammars (test with Windows Speech Recognition or Office 2003 Microsoft 6.1 recognizer); Conversational Technologies

64 Conversational Technologies 64 More Information: Books, Journals, Articles: Natural Language Processing: the Next Steps (September 2006); Speech and Language Processing: An Introduction to Natural Language Processing, Computational Linguistics and Speech Recognition by Daniel Jurafsky and James H. Martin (2000); Computational Linguistics; Natural Language Engineering
