Machine Learning in Spoken Language Processing Lecture 21 Spoken Language Processing Prof. Andrew Rosenberg.


1 Machine Learning in Spoken Language Processing Lecture 21 Spoken Language Processing Prof. Andrew Rosenberg

2 Spoken Language Processing
How machines can interact with speech.
Speech Recognition
Speech Synthesis
Analysis of Speech
Lexical Processing
Acoustic Signal Processing

3 Processing Tools
In many applications, a common trajectory can be observed.
–Manually written rule-based systems
–Corpus-based analysis
–Automatic training of systems via Machine Learning

4 Rule-based systems
Expert knowledge of a domain
Often based on untested hypotheses
Brittle
–These are difficult to modify
–Often have complex interdependencies
–Are rarely able to determine “confidence”

5 Machine Learning Systems
Learn the relationship between
–A Feature Vector, and
–A dependent variable {label, or number}
[Figure: Training data (Feature Vectors, Labels), Learning Algorithm, Classifier; Feature Vector, Classifier, Hypothesis]
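The mapping from feature vectors to labels can be sketched with a 1-nearest-neighbor classifier, one of the simplest learners. This is an illustrative sketch only: the function names `train` and `classify`, the two features (mean pitch, duration), and the toy labels are assumptions, not taken from the lecture.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def train(feature_vectors, labels):
    """1-NN 'training' simply stores the labeled examples."""
    return list(zip(feature_vectors, labels))

def classify(model, feature_vector):
    """Return the label of the stored example closest to feature_vector."""
    nearest = min(model, key=lambda example: euclidean(example[0], feature_vector))
    return nearest[1]

# Hypothetical toy data: [mean pitch in Hz, duration in seconds]
train_x = [[220.0, 0.30], [210.0, 0.28], [120.0, 0.10], [115.0, 0.12]]
train_y = ["accented", "accented", "unaccented", "unaccented"]

model = train(train_x, train_y)
prediction = classify(model, [200.0, 0.25])  # the classifier's hypothesis
```

Because the features are on very different scales here, pitch dominates the distance; in practice features would usually be normalized first.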

6 Where do we use machine learning?
The trajectory in natural language processing and speech has been from
–manually written rules, to
–automatically generated rules learned from an abundance of data
Speech Recognition, Speech Synthesis, Prosodic Analysis, Segmentation, Grapheme to phoneme conversion, Speech act classification, Disfluency Identification, Emotion classification, Speech segmentation, Part of speech tagging, Parsing, Translation, Turn-taking, Information Extraction

7 How do we use machine learning?
The Standard Approach to learning
–Identify labeled training data
–Decide what to label: syllables or words
–Extract aggregate acoustic features based on the labeling region
–Train a supervised classifier
–Evaluate using cross-validation or a held-out test set.
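The feature-extraction step above can be sketched as follows: frame-level acoustic values are summarized over each labeled region (for example, a word). The function name `aggregate_features`, the choice of statistics, and the pitch values are assumptions made for illustration.

```python
def aggregate_features(frame_values, start, end):
    """Summarize frame-level values over the region [start, end)."""
    region = frame_values[start:end]
    if not region:
        raise ValueError("empty labeling region")
    return {
        "mean": sum(region) / len(region),
        "min": min(region),
        "max": max(region),
        "range": max(region) - min(region),
    }

# Hypothetical pitch track (Hz per frame) and two word regions as frame indices
pitch = [100.0, 110.0, 120.0, 180.0, 200.0, 190.0]
word_regions = [(0, 3), (3, 6)]

features = [aggregate_features(pitch, start, end) for start, end in word_regions]
```

Each resulting dictionary would become one feature vector, paired with the label of its word or syllable.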

8 What’s the role of linguistics?
How do the rule-based systems inform machine learning?
Feature Representations.
–The way we represent an entity or phenomenon is informed by intuitions and prior study.
–The process of hand generating rules has moved to hand generation of Feature Extraction methods

9 What are the favorite tools in SLP?
Decision Trees
Support Vector Machines
–Conditional Random Fields
Neural Networks
Hidden Markov Models
k-means
k-nearest neighbors
Graphical Models
Expectation maximization

10 Training, Development and Testing
Available data is commonly divided into three sets
–Training: used to train the model
–Development: used to learn the best settings for parameters
–Testing: used to evaluate the performance of the model trained on the training data with parameters l
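A three-way split can be sketched as below. The 80/10/10 proportions, the fixed seed, and the function name `split_data` are assumptions; the lecture does not prescribe particular sizes.

```python
import random

def split_data(examples, train_frac=0.8, dev_frac=0.1, seed=0):
    """Shuffle examples and split them into training, development, and test sets."""
    shuffled = list(examples)
    random.Random(seed).shuffle(shuffled)  # seeded so the split is reproducible
    n_train = int(len(shuffled) * train_frac)
    n_dev = int(len(shuffled) * dev_frac)
    train = shuffled[:n_train]
    dev = shuffled[n_train:n_train + n_dev]
    test = shuffled[n_train + n_dev:]  # whatever remains is held out for testing
    return train, dev, test

train_set, dev_set, test_set = split_data(range(100))
```

The test set is only touched once, after parameter settings have been chosen on the development set.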

11 Cross-Validation
Cross-validation is a technique to estimate the generalization performance of a classifier.
Identify n “folds” of the available data.
Train on n-1 folds
Test on the remaining fold.
Calculate average performance
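The procedure can be sketched in a few lines. This is a minimal illustration, assuming at least as many examples as folds; `cross_validate`, `train_fn`, and `predict_fn` are hypothetical names, and a majority-class baseline stands in for a real classifier.

```python
from collections import Counter

def k_fold_indices(n_examples, n_folds):
    """Assign example indices to folds round-robin."""
    folds = [[] for _ in range(n_folds)]
    for i in range(n_examples):
        folds[i % n_folds].append(i)
    return folds

def cross_validate(examples, labels, n_folds, train_fn, predict_fn):
    """Train on n-1 folds, test on the remaining fold, and average the accuracy."""
    accuracies = []
    for held_out in k_fold_indices(len(examples), n_folds):
        held = set(held_out)
        train_idx = [i for i in range(len(examples)) if i not in held]
        model = train_fn([examples[i] for i in train_idx],
                         [labels[i] for i in train_idx])
        correct = sum(predict_fn(model, examples[i]) == labels[i] for i in held_out)
        accuracies.append(correct / len(held_out))
    return sum(accuracies) / len(accuracies)

# Stand-in classifier: always predict the most frequent training label
def train_majority(xs, ys):
    return Counter(ys).most_common(1)[0][0]

def predict_majority(model, x):
    return model

xs = list(range(10))
ys = ["a"] * 8 + ["b"] * 2
mean_accuracy = cross_validate(xs, ys, 5, train_majority, predict_majority)
```

In this toy run the two "b" examples land in different folds, so three folds score 1.0 and two score 0.5.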

12 Stratified Cross-validation
Some classes have skewed distributions
–For example, parts of speech.
When creating cross validation folds, the class distribution is maintained across all folds
[Figure: class distribution per fold: Function, Verb, Adj.]
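One way to build stratified folds is to deal the examples of each class across the folds in turn. The sketch below assumes this round-robin scheme and uses a made-up label distribution; `stratified_folds` is a hypothetical helper, not a function from the lecture.

```python
from collections import Counter, defaultdict

def stratified_folds(labels, n_folds):
    """Build folds whose class proportions approximate the overall distribution."""
    by_class = defaultdict(list)
    for i, label in enumerate(labels):
        by_class[label].append(i)
    folds = [[] for _ in range(n_folds)]
    for indices in by_class.values():
        # Deal each class's examples across the folds one at a time
        for position, i in enumerate(indices):
            folds[position % n_folds].append(i)
    return folds

# Hypothetical skewed part-of-speech labels
pos_tags = ["Function"] * 6 + ["Verb"] * 3 + ["Adj"] * 3
folds = stratified_folds(pos_tags, 3)
fold_counts = [Counter(pos_tags[i] for i in fold) for fold in folds]
```

When a class count is not divisible by the number of folds, this scheme places the leftovers in the earliest folds, so the match is approximate rather than exact.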

13 Dimensionality
In general, the more dimensions that a feature vector has, the more training data is necessary for reliable learning.
–Some classifiers are more sensitive to this than others.
When we have a vocabulary of size N, this is often converted to N binary variables.
This can quickly lead to an enormous feature space.
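The conversion of a vocabulary of size N into N binary variables can be sketched as follows. The helper names and the six-token example are assumptions for illustration.

```python
def build_vocabulary(tokens):
    """Map each distinct word to a column index (sorted for reproducibility)."""
    return {word: i for i, word in enumerate(sorted(set(tokens)))}

def one_hot(word, vocabulary):
    """Encode a word as N binary variables, one per vocabulary entry."""
    vector = [0] * len(vocabulary)
    if word in vocabulary:  # unseen words are left as all zeros in this sketch
        vector[vocabulary[word]] = 1
    return vector

tokens = ["the", "cat", "sat", "on", "the", "mat"]
vocab = build_vocabulary(tokens)
encoded = one_hot("cat", vocab)
```

With five distinct words each token becomes a 5-dimensional vector; a realistic vocabulary of tens of thousands of words produces vectors of that many dimensions, which is where the enormous feature space comes from.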

14 Dimensionality Reduction techniques
Dimensionality reduction techniques are commonly used to reduce the number of dimensions, while keeping as much information as possible
Regularization
Principal Components Analysis
Multi-dimensional scaling
Quantization
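Of the techniques listed, quantization is the simplest to sketch: a continuous feature is mapped to a small number of bins, which shrinks the set of values a learner must model. The equal-width binning, the 50 to 350 Hz range, and the function name `quantize` are assumptions, not details from the lecture.

```python
def quantize(value, low, high, n_bins):
    """Map a continuous value to one of n_bins equal-width bins over [low, high]."""
    if value <= low:
        return 0
    if value >= high:
        return n_bins - 1  # clamp out-of-range values into the last bin
    width = (high - low) / n_bins
    return min(int((value - low) / width), n_bins - 1)

# Hypothetical pitch values in Hz, quantized into low / mid / high
pitch_hz = [75.0, 120.0, 180.0, 260.0, 400.0]
bins = [quantize(v, 50.0, 350.0, 3) for v in pitch_hz]
```

Bin boundaries could also be chosen from the data (for example, by percentiles) rather than fixed in advance.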

15 Next Time
Working Session
Anonymous Course Feedback

