1 LING 439/539: Statistical Methods in Speech and Language Processing
Ying Lin, Department of Linguistics, University of Arizona

2 Welcome!
- Get the syllabus
- Fill out and return the information sheet
- Office: Douglass 224. Office hours: MW 2:00--3:00 or by appointment (I am also teaching another undergraduate class)
- Course webpage: see the syllabus
- Listserv coming soon

3 438/538 and 439/539
LING 438/538 (Computational Linguistics):
- Symbolic representations (mostly syntax), e.g. FSA, CFG
- Focus on logic
- Simple probabilistic models, e.g. N-grams

4 438/538 and 439/539
This class complements 438/538:
- Numerical representations (speech signals): requires digital signal processing
- Focus on statistics/learning
- More sophisticated probabilistic models, e.g. HMM, PCFG

5 Main reference texts (!)
- Huang, Acero and Hon (2001). Spoken Language Processing: A Guide to Theory, Algorithm, and System Development. Prentice-Hall.
- Manning and Schütze (1999). Foundations of Statistical Natural Language Processing. MIT Press.
- Rabiner and Juang (1993). Fundamentals of Speech Recognition. Prentice-Hall.
- Duda, Hart and Stork (2001). Pattern Classification (2nd ed.). John Wiley & Sons.
- Rabiner and Schafer (1978). Digital Processing of Speech Signals. Prentice-Hall.
- Hastie, Tibshirani and Friedman (2001). The Elements of Statistical Learning. Springer.

6 Guidelines for course reading
No single book covers all of our material, and most books are written for an EE or CS audience only. A few chapters are selected from each book (see the reading list), and lecture notes will summarize the reading. Expect a rough ride the first time through -- feedback is greatly appreciated!

7 Three skills for this class
1. Linguistics: understanding the sources of particular patterns
2. Math/statistics: the principles underlying the models
3. Programming: implementation
This class emphasizes skill 2, for two reasons: the models are built from simple structures, and programming skill comes mainly from practice.

8 What is a "statistical approach"?
Narrow sense: work based on statistical principles, i.e. the probability calculus or other theories of inductive inference (compare logic: deductive inference).
Broad sense: any work that uses a quantitative measure of success -- the sense taken in this course.
Relevant to both language engineering and linguistic science.

10 Language engineering: speech recognition
Tasks come at increasing levels of difficulty, evaluated by word error rate (WER).
[Table/figure: recognition tasks at increasing levels of difficulty and their word error rates]
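Word error rate is the standard ASR evaluation metric: the minimum number of substitutions, deletions, and insertions needed to turn the reference transcript into the hypothesis, divided by the number of reference words. A minimal sketch (the function name and example strings are illustrative, not from the slides):

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / #reference words,
    computed as word-level edit distance via dynamic programming."""
    ref, hyp = reference.split(), hypothesis.split()
    # d[i][j] = edit distance between ref[:i] and hyp[:j]
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i                      # i deletions
    for j in range(len(hyp) + 1):
        d[0][j] = j                      # j insertions
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = d[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            d[i][j] = min(sub, d[i - 1][j] + 1, d[i][j - 1] + 1)
    return d[len(ref)][len(hyp)] / max(len(ref), 1)

print(word_error_rate("i recognize speech", "i wreck a nice beach"))
# 4 errors (2 substitutions + 2 insertions) over 3 words = 1.33
```

Note that WER can exceed 100%, as here, when the hypothesis is longer than the reference.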

11 A brief history of speech recognition
1950s: the U.S. government started funding research on automatic recognition of speech.
1960s-70s: isolated words, digit strings. The debate: rules vs. statistics. Dynamic time warping.
1980-now: continuous speech, speech understanding, spoken dialog. The hidden Markov model dominates.

12 Why didn't the rules work?
Completely bottom-up approach: rules are hand-coded by experts.
Problem: variability in speech. Sophisticated, symbolic rules are not flexible enough to handle continuous speech.
[Diagram: "How are you?" mapped through phonological and phonetic rules to a phone string]

13 The rise of statistical methods in speech
Initial solution: hire many linguists to continually improve the rule system. This turned out to be costly and slow, failing to meet high expectations.
Advantages of statistical models:
- Can be trained on different data: flexible, scalable
- Computing power is much cheaper than expert labor
- Drives the move to less and less constrained tasks
Bitterness: "Every time I fire a linguist, the word error rate goes up" -- F. Jelinek (IBM)

14 The rise of statistics in NLP
A very similar scenario played out in NLP, e.g. in tagging, parsing, and machine translation.
"Old" NLP: deductive systems, hand-coded rules.
"New" NLP: broad-coverage, corpus-based, with an emphasis on training and evaluation.
Speech is now merging with NLP: many tools originated in speech, then were copied to NLP. New tasks keep emerging, e.g. the web as an (unstructured) data source.

15 Basic architecture of today's ASR system
[Diagram] audio speech -> feature extraction -> features X -> acoustic modeling: likelihoods p(X|M1), p(X|M2), ... -> scoring, which combines them with language-model probabilities p(M1), p(M2), ... -> ANSWER
Model parameters are trained offline. Example hypotheses: M1 = "I recognize speech", M2 = "I wreck a nice beach", ...
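In log space, the scoring step simply adds the acoustic log-likelihood to the language-model log-prior and picks the best hypothesis. A minimal sketch of that combination (the numeric scores below are invented for illustration, not real model outputs):

```python
# Hypothetical scores for the two hypotheses above.
acoustic_loglik = {"I recognize speech": -120.0, "I wreck a nice beach": -118.5}
lm_logprob = {"I recognize speech": -8.2, "I wreck a nice beach": -16.7}

def best_hypothesis(lm_weight: float = 1.0) -> str:
    # argmax_M  log p(X|M) + lm_weight * log p(M)
    return max(acoustic_loglik,
               key=lambda m: acoustic_loglik[m] + lm_weight * lm_logprob[m])

print(best_hypothesis())  # the language model rescues "I recognize speech"
```

The acoustically better hypothesis loses once the language model weighs in, which is exactly the point of combining the two knowledge sources.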

16 Component 1: signal processing / feature extraction
The first 1/3 of the course (also useful for understanding speech synthesis).

17 Examples of some common features [figure]
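As a preview of the signal-processing unit, here is a minimal numpy sketch of the first steps shared by most front ends: pre-emphasis, framing, windowing, and a log power spectrum. The frame length, shift, and pre-emphasis coefficient are typical values, not prescribed by the slides:

```python
import numpy as np

def log_power_spectrum(signal, sample_rate=16000, frame_ms=25, shift_ms=10, preemph=0.97):
    """Pre-emphasize, slice into overlapping frames, window, take log |FFT|^2."""
    # Pre-emphasis boosts high frequencies: y[n] = x[n] - 0.97 * x[n-1]
    emphasized = np.append(signal[0], signal[1:] - preemph * signal[:-1])
    frame_len = int(sample_rate * frame_ms / 1000)
    shift = int(sample_rate * shift_ms / 1000)
    n_frames = 1 + max(0, (len(emphasized) - frame_len) // shift)
    frames = np.stack([emphasized[i * shift : i * shift + frame_len]
                       for i in range(n_frames)])
    frames *= np.hamming(frame_len)          # taper edges to reduce spectral leakage
    spectrum = np.abs(np.fft.rfft(frames, axis=1)) ** 2
    return np.log(spectrum + 1e-10)          # small floor avoids log(0)

# One second of a 440 Hz tone as a toy input.
t = np.arange(16000) / 16000.0
print(log_power_spectrum(np.sin(2 * np.pi * 440 * t)).shape)  # (98 frames, 201 bins)
```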

18 Component 2: Acoustic models
Mixture of Gaussians: p(o_t | q_i) = sum_k c_ik N(o_t; mu_ik, Sigma_ik)
Dimension reduction: principal component analysis, linear discriminant analysis, parameter tying
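A minimal numpy sketch of that emission density: the log-likelihood of an observation under a diagonal-covariance Gaussian mixture. The weights, means, and variances below are made-up placeholders:

```python
import numpy as np

def gmm_loglik(o, weights, means, variances):
    """log p(o | q) for a diagonal-covariance Gaussian mixture:
    p(o | q) = sum_k c_k N(o; mu_k, Sigma_k)."""
    o = np.asarray(o)
    # Log-density of o under each component (diagonal covariance).
    log_norm = -0.5 * np.sum(np.log(2 * np.pi * variances), axis=1)
    log_exp = -0.5 * np.sum((o - means) ** 2 / variances, axis=1)
    log_comp = np.log(weights) + log_norm + log_exp
    # Log-sum-exp for numerical stability.
    m = log_comp.max()
    return m + np.log(np.exp(log_comp - m).sum())

# Two components in two dimensions (placeholder parameters).
w = np.array([0.6, 0.4])
mu = np.array([[0.0, 0.0], [3.0, 3.0]])
var = np.array([[1.0, 1.0], [0.5, 0.5]])
print(gmm_loglik([0.1, -0.2], w, mu, var))
```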

19 Component 3: Pronunciation modeling
A model for the different pronunciations of "you" in continuous speech.
[Diagram: pronunciation network for "you": start -> j -> (ou | a) -> end; each unit is an HMM]
Other types of units: triphones, syllables.
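One way to picture this: a lexicon maps each word to alternative phone strings, and a word sequence expands into every phone path through the resulting network. A toy sketch (the lexicon entries beyond "you" are invented for illustration):

```python
from itertools import product

# Hypothetical lexicon: each word maps to its alternative pronunciations.
lexicon = {
    "you": [("j", "ou"), ("j", "a")],   # full vs. reduced form, as in the diagram
    "how": [("h", "au")],
    "are": [("a", "r"), ("@", "r")],
}

def phone_paths(words):
    """Enumerate all phone sequences for a word sequence; in a real
    recognizer each phone would itself be a small HMM."""
    for choice in product(*(lexicon[w] for w in words)):
        yield tuple(p for pron in choice for p in pron)

for path in phone_paths(["how", "are", "you"]):
    print(" ".join(path))   # 1 x 2 x 2 = 4 alternative phone strings
```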

20 Component 4: Language model
Provides the probability p(M) of a word sequence, to combine with the acoustic model p(X|M).
Common: N-grams with smoothing and backoff -- a hard, highly specialized business. Parsing is just starting to be integrated.
Fundamental equation: M* = argmax_M p(M|X) = argmax_M p(X|M) p(M)
Search: Viterbi, beam, A*, N-best.
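A minimal sketch of a bigram model with add-one (Laplace) smoothing, the simplest of the smoothing schemes the slide alludes to. The corpus is a toy placeholder, and a real model would also reserve vocabulary mass for unseen words:

```python
from collections import Counter

corpus = ["<s> i recognize speech </s>", "<s> i like speech </s>"]  # toy data
unigrams, bigrams = Counter(), Counter()
for sent in corpus:
    toks = sent.split()
    unigrams.update(toks)
    bigrams.update(zip(toks, toks[1:]))
V = len(unigrams)   # vocabulary size (in practice, includes unseen words too)

def p_bigram(w_prev, w):
    # Add-one smoothing: (count(w_prev, w) + 1) / (count(w_prev) + V)
    return (bigrams[(w_prev, w)] + 1) / (unigrams[w_prev] + V)

def p_sentence(sent):
    toks = sent.split()
    p = 1.0
    for a, b in zip(toks, toks[1:]):
        p *= p_bigram(a, b)
    return p

print(p_sentence("<s> i recognize speech </s>"))
print(p_sentence("<s> i wreck a nice beach </s>"))  # unseen bigrams get smoothed mass
```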

21 ASR: an example of a generative model
Each component provides an instance of a generative model:
- The language model M generates word sequences
- Word sequences generate pronunciations
- Pronunciations generate acoustic features
Unsupervised learning/training: maximum likelihood estimation; the Expectation-Maximization algorithm (in different incarnations). Main focus of this class.
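To preview EM: a minimal sketch for about the simplest generative model we will fit, a mixture of two 1-D Gaussians, alternating posterior "responsibilities" (E-step) with weighted maximum-likelihood updates (M-step). The data and initial parameters are made up:

```python
import numpy as np

rng = np.random.default_rng(0)
# Toy data drawn from two Gaussians (the "true" model we hope to recover).
x = np.concatenate([rng.normal(-2, 1, 200), rng.normal(3, 1, 300)])

w, mu, var = np.array([0.5, 0.5]), np.array([-1.0, 1.0]), np.array([1.0, 1.0])
for _ in range(50):
    # E-step: responsibility of each component for each data point.
    dens = w * np.exp(-0.5 * (x[:, None] - mu) ** 2 / var) / np.sqrt(2 * np.pi * var)
    resp = dens / dens.sum(axis=1, keepdims=True)
    # M-step: weighted maximum-likelihood re-estimates.
    n_k = resp.sum(axis=0)
    w = n_k / len(x)
    mu = (resp * x[:, None]).sum(axis=0) / n_k
    var = (resp * (x[:, None] - mu) ** 2).sum(axis=0) / n_k

print(w, mu, var)  # should approach weights 0.4/0.6, means -2/3, variances 1/1
```

The Baum-Welch algorithm for HMMs is the same idea, with the hidden component label replaced by a hidden state sequence.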

22 Other models to look at
Descriptive/maximum entropy models: started in vision, then copied to speech, then NLP.
Discriminative models: directly use the data to construct classifiers, with weak assumptions about the probability distribution. Supervised learning, focused on the classification perspective.
[Diagram: input string -> count feature vector -> classifier -> output labels]
The "machine learning approach to NLP".
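That pipeline in code: a logistic-regression classifier over word-count features, a standard discriminative setup (logistic regression is in fact the same model family as maximum entropy). scikit-learn is assumed here, and the tiny dataset is invented:

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Invented toy data: classify strings as speech-related or beach-related.
texts = ["recognize speech", "wreck a nice beach", "speech recognition works",
         "a day at the beach", "speech and language", "nice sandy beach"]
labels = ["speech", "beach", "speech", "beach", "speech", "beach"]

# input string -> count feature vector -> discriminative classifier -> label
clf = make_pipeline(CountVectorizer(), LogisticRegression())
clf.fit(texts, labels)
print(clf.predict(["speech on the beach"]))
```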

23 Problem solved? No -- improvements are mostly due to larger training sets and faster machines. Driven by Moore's law?

24 Challenges
- Environmental distortion (microphone, noise, the cocktail-party effect) breaks feature extraction: acoustic condition mismatch
- Between- and within-speaker variability breaks pronunciation modeling and acoustic modeling
- Conversational speech breaks the language model
Understanding these problems is crucial for improving the performance of ASR.

25 Dreaming
"2001: A Space Odyssey" (1968)
Dave: "Open the pod bay doors, HAL."
HAL 9000: "I'm sorry, Dave. I'm afraid I can't do that."

26 The reality, before the problem is solved
Speech is used as a user interface only when people can't use their hands:
- Driving a car (use speech to drive?)
- Devices too small for typing (cellphones)
- Customer service (who will tolerate touch tone?)
- Dictation (how many people actually use it?)

27 For next time
We will start with signal processing. It uses engineering math: power series (including convergence), trigonometric functions, integration, and the representation of complex numbers. If you have forgotten or never learned this material, please find references and review it before class.