
Hyeonsoo, Kang

▫ Structure of the algorithm ▫ Introduction 1. Model learning algorithm 2. [Review: HMM] 3. Feature selection algorithm ▫ Results

What is “supervised learning?”

 It is the approach in which the algorithm designers manually identify important structures, collect labeled data for training, and apply supervised learning tools to learn the classifiers.

Good: works for domain-specific problems at a small scale.
Bad: burden of labeling and training; cannot be readily extended to diverse new domains at a large scale.
Let’s aim at an automated method that works just as well for domain-specific problems, yet is also flexible and scalable! But is that possible…?

A temporal sequence of nine shots, each one second apart. Observations?

Similar color & movements — a temporal sequence of nine shots, each one second apart.

Observations?

Different color — a temporal sequence of nine shots, each one second apart.

Observations?

Different camera work — a temporal sequence of nine shots, each one second apart.

Let’s focus on a particular domain of videos, such that:
(1) The video structure is in a discrete state space.
(2) The features, i.e., the observations computed from the data, are stochastic (there are small statistical variations on the raw features within a state).
(3) The sequence is highly correlated in time.
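To make these three assumptions concrete, here is a minimal Python sketch (not from the paper) that generates such a sequence: discrete hidden states, noisy observations around a per-state mean, and temporal correlation via self-transitions. All names and numbers are made up for illustration.

```python
import numpy as np

# Toy generator illustrating the assumptions above: discrete hidden states
# ("play"/"break"), stochastic observations (Gaussian noise around a
# per-state mean), and temporal correlation (states persist through a
# high self-transition probability). All values are illustrative.
rng = np.random.default_rng(0)

states = ["play", "break"]
transition = np.array([[0.9, 0.1],    # P(next state | current = play)
                       [0.2, 0.8]])   # P(next state | current = break)
obs_mean = {"play": 0.7, "break": 0.2}   # e.g., a stand-in for dominant color ratio
obs_std = 0.05

def sample_sequence(length=50):
    s = 0  # start in "play"
    labels, obs = [], []
    for _ in range(length):
        labels.append(states[s])
        obs.append(rng.normal(obs_mean[states[s]], obs_std))
        s = rng.choice(2, p=transition[s])
    return labels, np.array(obs)

labels, features = sample_sequence()
print(labels[:10])
print(features[:10].round(2))
```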

Our unsupervised learning approach is chiefly twofold: (a) a model learning algorithm and (b) a feature selection algorithm.

(a) Model learning algorithm: using a fixed feature set manually selected based on heuristics, build a model that performs well at distinguishing the high-level structures of the given video.
(b) Feature selection algorithm: using both the model learning algorithm and the feature selection algorithm, obtain a model and a set of features that together distinguish the high-level structures of the given video well.

(a) Model learning algorithm
1. Baseline: uses a two-level HHMM to model structures in video.
2. HHMM ::= Hierarchical Hidden Markov Model. The HHMM is a statistical model derived from the Hidden Markov Model (HMM); it exploits its hierarchical structure to solve a subset of problems more efficiently, but it can be transformed into a standard HMM. Therefore, the coverage of HHMM and HMM is the same, but their efficiency differs.
Wait, what is HMM then?

[Quick Review: HMM]

Stated more formally, we define the observation sequence O as
O = {S3, S3, S3, S1, S1, S3, S2, S3}
(“sunny-sunny-sunny-rain-rain-sunny-cloudy-sunny”)
corresponding to t = 1, 2, …, 8, and we wish to determine the probability of O, given the model. This probability can be expressed (and evaluated) as
P(O | Model) = P[S3, S3, S3, S1, S1, S3, S2, S3 | Model]
             = P[S3] · P[S3|S3] · P[S3|S3] · P[S1|S3] · P[S1|S1] · P[S3|S1] · P[S2|S3] · P[S3|S2]
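As a worked illustration of this chain-rule evaluation, here is a small Python sketch; the transition-matrix values are assumed for demonstration (in the spirit of Rabiner's weather example in [1]), not taken from these slides.

```python
import numpy as np

# States: S1 = rain, S2 = cloudy, S3 = sunny (0-indexed below).
# A[i][j] = P(next state = j | current state = i). Values are illustrative only.
A = np.array([[0.4, 0.3, 0.3],
              [0.2, 0.6, 0.2],
              [0.1, 0.1, 0.8]])
pi = np.array([0.0, 0.0, 1.0])   # assume the sequence starts in S3 (sunny)

# Observation sequence O = {S3, S3, S3, S1, S1, S3, S2, S3}, 0-indexed:
O = [2, 2, 2, 0, 0, 2, 1, 2]

# P(O | Model) = P[S3] * P[S3|S3] * P[S3|S3] * P[S1|S3] * P[S1|S1] * ...
p = pi[O[0]]
for prev, cur in zip(O, O[1:]):
    p *= A[prev, cur]

print(f"P(O | Model) = {p:.6e}")
```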

[Quick Review: HMM]

[Figure: a Markov model (MM) whose states are directly observable]

(a) Model learning algorithm
1. Baseline: uses HHMM.
2. HHMM ::= Hierarchical Hidden Markov Model, a statistical model derived from the Hidden Markov Model (HMM); it exploits its hierarchical structure to solve a subset of problems more efficiently, but it can be transformed into a standard HMM. Therefore, the coverage of HHMM and HMM is the same, but their efficiency differs.

(a) Model learning algorithm

An example of HHMM: the top-level states are sunny, rain, and cloudy, and the lower-level nodes represent some variations within each.
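To illustrate why an HHMM can be transformed into a standard HMM, here is a minimal sketch that flattens a toy two-level model (top-level weather states, each with two sub-states) into one ordinary HMM state space. The structure and all probabilities are assumed for illustration; the paper's HHMM uses explicit entry/exit probabilities rather than the single exit probability used here.

```python
from itertools import product

import numpy as np

# Toy two-level HHMM: top-level states, each with two lower-level sub-states.
# Flattening = enumerating every (top, sub) pair as one state of a plain HMM.
# All probabilities are made up for illustration.
top_states = ["sunny", "rain", "cloudy"]
sub_states = [0, 1]

# Transitions between *different* top-level states (diagonal handled by staying).
top_A = np.array([[0.0, 0.5, 0.5],
                  [0.4, 0.0, 0.6],
                  [0.6, 0.4, 0.0]])
# Sub-state transitions while staying inside a top-level state.
sub_A = np.array([[0.7, 0.3],
                  [0.4, 0.6]])
exit_prob = 0.3   # probability of leaving the current top-level state

flat_states = list(product(top_states, sub_states))   # 6 flattened states
n = len(flat_states)
flat_A = np.zeros((n, n))

for i, (ti, si) in enumerate(flat_states):
    for j, (tj, sj) in enumerate(flat_states):
        if ti == tj:
            # stay at the top level, move between sub-states
            flat_A[i, j] = (1 - exit_prob) * sub_A[si, sj]
        else:
            # leave the top state; enter the new top state in sub-state 0
            flat_A[i, j] = exit_prob * top_A[top_states.index(ti),
                                             top_states.index(tj)] * (sj == 0)

print(flat_states)
print(flat_A.round(3))
print("row sums:", flat_A.sum(axis=1))   # each row sums to 1, i.e. a valid HMM
```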

(a) Model learning algorithm
3. To estimate parameters we use:
(1) the Expectation Maximization (EM) algorithm
(2) Bayesian learning techniques
(3) Reverse-Jump Markov Chain Monte Carlo (RJ-MCMC)
(4) the Bayesian Information Criterion (BIC)
→ Model parameters are updated using EM.
→ Model structure learning uses MCMC: parameter learning for the HHMM with EM is known to converge only to a local maximum of the data likelihood, since EM is a hill-climbing algorithm, and searching for the global maximum of the likelihood landscape is intractable → so we adopt randomized search.
However, I will not go through them one by one… if you are interested, you can find the details in the paper by Xie, Lexing, et al. [3].
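To make the BIC part concrete, here is a minimal sketch that fits HMMs with different numbers of states by EM and scores each candidate with BIC (one common convention: lower is better). The hmmlearn library and the synthetic data are assumptions of this example, not something used in the original work.

```python
import numpy as np
from hmmlearn import hmm  # assumption: hmmlearn provides EM-trained Gaussian HMMs

# Synthetic 1-D feature stream (a stand-in for, e.g., dominant color ratio).
rng = np.random.default_rng(0)
X = np.concatenate([rng.normal(0.7, 0.05, 200),
                    rng.normal(0.2, 0.05, 150),
                    rng.normal(0.7, 0.05, 200)]).reshape(-1, 1)

def bic(model, X):
    """BIC = -2 * logL + k * log(N); smaller is better."""
    logL = model.score(X)
    n_states = model.n_components
    # free parameters: transitions, initial probs, means, diagonal variances
    k = n_states * (n_states - 1) + (n_states - 1) + 2 * n_states * X.shape[1]
    return -2.0 * logL + k * np.log(len(X))

for n_states in (1, 2, 3, 4):
    m = hmm.GaussianHMM(n_components=n_states, covariance_type="diag",
                        n_iter=100, random_state=0)
    m.fit(X)   # EM: converges to a local maximum of the likelihood
    print(n_states, "states, BIC =", round(bic(m, X), 1))
```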

(a) Model learning algorithm (b) Feature selection algorithm — now turning to (b): using both the model learning algorithm and the feature selection algorithm, we obtain a model and a set of features that together distinguish the high-level structures of the given video well.

Into what aspects can feature selection be divided, and why?
→ Feature selection is divided into two aspects:
(1) Eliminating irrelevant features — irrelevant features usually disturb the classifier and degrade classification accuracy.
(2) Eliminating redundant features — redundant features add to computational cost without bringing in new information.

(b) Feature selection algorithm
1. We use a filter-wrapper method; the wrapper step corresponds to eliminating irrelevant features, and the filter step corresponds to eliminating redundant ones.
(a) Wrapper step — partitions the feature pool into consistent groups
(b) Filter step — eliminates redundant dimensions
2. For example, there are features like Dominant Color Ratio (DCR), Motion Intensity (MI), the least-square estimation of camera translation (MX, MY), and five audio features — Volume, Spectral Roll-off (SR), Low-band Energy (LE), High-band Energy (HE), and Zero-Crossing Rate (ZCR).
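Here is a minimal sketch of the wrapper-step idea under simplifying assumptions: each feature induces a state-label sequence (here via a simple 2-means quantization instead of the HHMM/Viterbi decoding used in the paper), and features whose label sequences share high normalized mutual information are grouped as "consistent". The data, feature names, and thresholds are made up for illustration.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import normalized_mutual_info_score

# Toy feature streams: DCR and Volume follow the same hidden structure,
# MX is unrelated noise. Values are synthetic, for illustration only.
rng = np.random.default_rng(1)
hidden = np.repeat([0, 1, 0, 1], 100)                  # play/break-like pattern
dcr = 0.6 * hidden + rng.normal(0, 0.05, hidden.size)
vol = 0.4 * hidden + rng.normal(0, 0.05, hidden.size)
mx  = rng.normal(0, 1.0, hidden.size)

def labels(feature):
    # Stand-in for the paper's HHMM/Viterbi labeling: quantize into 2 states.
    return KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(
        feature.reshape(-1, 1))

feats = {"DCR": labels(dcr), "Volume": labels(vol), "MX": labels(mx)}

# Pairwise agreement between the label sequences each feature induces.
for a in feats:
    for b in feats:
        if a < b:
            nmi = normalized_mutual_info_score(feats[a], feats[b])
            print(f"NMI({a}, {b}) = {nmi:.2f}")
# High NMI (e.g., DCR vs Volume) -> same consistent group; low NMI -> separate.
```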

(b) Feature selection algorithm
3. Algorithm structure. The big picture:
HHMM → Viterbi state sequence → information gain → Markov blanket filtering → BIC fitness
In detail: [flowchart figure in the slides]

Experiments and Results
For soccer videos, the main evaluation focused on distinguishing the two semantic events, play and break.
(a) Model learning algorithm
We use a fixed set of features manually selected based on heuristics (dominant color ratio and motion intensity) (Xu et al., 2001; Xie et al., 2002b).
We built four different learning schemes and evaluated them against the ground truth:
(1) Supervised HMM
(2) Supervised HHMM
(3) Unsupervised HHMM without model adaptation
(4) Unsupervised HHMM with model adaptation
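One practical detail when scoring the unsupervised schemes against ground truth is that the learned states carry no semantic names; a common convention (assumed here, the slides do not spell it out) is to map learned states to play/break by the label permutation that maximizes agreement. A minimal sketch:

```python
from itertools import permutations

import numpy as np

def best_mapped_accuracy(pred_states, true_labels):
    """Accuracy after relabeling learned states to best match the ground truth."""
    states = np.unique(pred_states)
    labels = np.unique(true_labels)
    best = 0.0
    for perm in permutations(labels, len(states)):
        mapping = dict(zip(states, perm))
        mapped = np.array([mapping[s] for s in pred_states])
        best = max(best, float(np.mean(mapped == np.asarray(true_labels))))
    return best

# Tiny made-up example (1 = play, 0 = break); cluster ids are arbitrary.
truth = np.array([1, 1, 1, 0, 0, 1, 1, 0, 0, 0])
pred  = np.array([0, 0, 0, 1, 1, 0, 0, 1, 1, 1])
print(best_mapped_accuracy(pred, truth))   # -> 1.0 after relabeling
```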

Experiments and Results

For soccer videos, the main evaluation focused on distinguishing the two semantic events, play and break.
(b) Feature selection algorithm
Based on the good performance of the model parameter and structure learning algorithm, we test the performance of the automatic feature selection method that iteratively wraps around and filters.
A 9-dimensional feature vector is sampled every 0.1 seconds, including: Dominant Color Ratio (DCR), Motion Intensity (MI), the least-square estimation of camera translation (MX, MY), and five audio features — Volume, Spectral Roll-off (SR), Low-band Energy (LE), High-band Energy (HE), and Zero-Crossing Rate (ZCR).

Experiments and Results
Evaluation against the play/break labels showed 74.8% accuracy.
For clip Spain, the final selected feature set was {DCR, Volume}, with 74.8% accuracy.
For clip Korea, the final selected feature set was {DCR, MX}, with 74.5% accuracy.
[Testing on the baseball video]
→ Yielded three consistent compact feature groups: {HE, LE, ZCR}, {DCR, MX}, {Volume, SR}.
→ The resulting segments have consistent perceptual properties: one cluster of segments corresponds mostly to pitching shots and other field shots when the game is in play, while the other cluster contains most of the cutaway shots, scoreboards, and game breaks.

Summary
With a specific domain of videos (sports: soccer and baseball), our unsupervised learning method can perform well. Our method is chiefly twofold: one part is the model learning algorithm and the other the feature selection algorithm. In the model learning algorithm, we used the HHMM as the basic model and used other techniques such as the Expectation Maximization (EM) algorithm, Bayesian learning techniques, Reverse-Jump Markov Chain Monte Carlo (RJ-MCMC), and the Bayesian Information Criterion (BIC) to set the parameters and structure of the model. In the feature selection algorithm, together with a model of good performance, we used filter-wrapper methods to eliminate irrelevant and redundant features.

Questions
1. What is supervised learning?
→ The algorithm designers manually identify important structures, collect labeled data for training, and apply supervised learning tools to learn the classifiers.
2. What is the benefit of using unsupervised learning?
→ (A) It alleviates the burden of labeling and training. (B) It also provides a scalable solution for generalizing video indexing techniques.
3. Into what aspects can feature selection be divided, and why?
→ Feature selection is divided into two aspects:
(1) Eliminating irrelevant features: irrelevant features usually disturb the classifier and degrade classification accuracy.
(2) Eliminating redundant features: redundant features add to computational cost without bringing in new information.

Bibliography
[1] Rabiner, Lawrence R. "A tutorial on hidden Markov models and selected applications in speech recognition." Proceedings of the IEEE 77.2 (1989).
[2] Xie, Lexing, et al. "Structure analysis of soccer video with hidden Markov models." Acoustics, Speech, and Signal Processing (ICASSP), 2002 IEEE International Conference on. Vol. 4. IEEE, 2002.
[3] Xie, Lexing, et al. "Unsupervised mining of statistical temporal structures in video." Video Mining. Springer US.
[4] Xu, Peng, et al. "Algorithms and system for segmentation and structure analysis in soccer video." IEEE International Conference on Multimedia and Expo, 2001.

THANK YOU!

Q & A