
# Hyeonsoo, Kang. ▫ Structure of the algorithm ▫ Introduction 1.Model learning algorithm 2.[Review HMM] 3.Feature selection algorithm ▫ Results.


Hyeonsoo, Kang

▫ Structure of the algorithm
▫ Introduction
1. Model learning algorithm
2. [Review HMM]
3. Feature selection algorithm
▫ Results

What is “supervised learning?”

In supervised learning, the algorithm designers manually identify important structures, collect labeled data for training, and apply supervised learning tools to learn the classifiers.

Good: Works for domain-specific problems at a small scale.
Bad: Burden of labeling and training. Cannot be readily extended to diverse new domains at a large scale.

Let’s aim at an automated method which works just as well for domain-specific problems but is also flexible & scalable! But is that possible …?

A temporal sequence of nine shots, each shot a second apart. Observations?

Similar color & movements

Different color

Different camera walk

Let’s focus on a particular domain of videos, such that (1) the video structure lies in a discrete state space, (2) the features, i.e., observations from data, are stochastic (small statistical variations on the raw features), and (3) the sequence is highly correlated in time.

Unsupervised learning approaches are chiefly twofold: (a) Model learning algorithm (b) Feature selection algorithm

(a) Model learning algorithm: Using a fixed feature set manually selected based on heuristics → build a model that performs well at distinguishing high-level structures of the given video.
(b) Feature selection algorithm: Using both the model learning algorithm and the feature selection algorithm → obtain a model and a set of features that distinguish high-level structures of the given video well.

(a) Model learning algorithm
1. Baseline: uses a two-level HHMM to model structures in video.
2. HHMM ::= Hierarchical Hidden Markov Model. The Hierarchical Hidden Markov Model is a statistical model derived from the Hidden Markov Model (HMM). The HHMM utilizes its structure to solve a subset of the problems more efficiently, but can be transformed into a standard HMM. Therefore, the coverage of HHMM and HMM is the same, but their performance differs.

Wait, what is HMM then?

[Quick Review: HMM]

Stated more formally, we define the observation sequence O as O = {S3, S3, S3, S1, S1, S3, S2, S3}, i.e., “sunny-sunny-sunny-rain-rain-sunny-cloudy-sunny” with S1 = rain, S2 = cloudy and S3 = sunny as in Rabiner [1], corresponding to t = 1, 2, …, 8, and we wish to determine the probability of O, given the model. This probability can be expressed (and evaluated) as

$$
\begin{aligned}
P(O \mid \text{Model}) &= P[S_3, S_3, S_3, S_1, S_1, S_3, S_2, S_3 \mid \text{Model}] \\
&= P[S_3] \cdot P[S_3 \mid S_3] \cdot P[S_3 \mid S_3] \cdot P[S_1 \mid S_3] \cdot P[S_1 \mid S_1] \cdot P[S_3 \mid S_1] \cdot P[S_2 \mid S_3] \cdot P[S_3 \mid S_2]
\end{aligned}
$$

MM (Markov model): the states are directly observable. HMM: the states are hidden and can only be inferred from the observations.
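The product above can be checked numerically with a minimal sketch. The transition matrix `A` and the initial distribution `PI` below are illustrative assumptions (the slides give no numbers), and the function name `sequence_probability` is hypothetical.

```python
# Probability of a fully observed state sequence in a first-order Markov chain.
# States are 0-indexed: 0 = S1, 1 = S2, 2 = S3.
# A and PI are assumed values for illustration only; they are not from the slides.
A = [
    [0.4, 0.3, 0.3],  # transitions out of S1
    [0.2, 0.6, 0.2],  # transitions out of S2
    [0.1, 0.1, 0.8],  # transitions out of S3
]
PI = [0.0, 0.0, 1.0]  # assumed: the chain starts in S3 with certainty


def sequence_probability(states, initial, transition):
    """Multiply the initial probability by one transition probability per step."""
    prob = initial[states[0]]
    for prev, cur in zip(states, states[1:]):
        prob *= transition[prev][cur]
    return prob


# O = {S3, S3, S3, S1, S1, S3, S2, S3}
O = [2, 2, 2, 0, 0, 2, 1, 2]
p = sequence_probability(O, PI, A)
print(p)
```

Each factor in the loop corresponds to one conditional term P[S_j | S_i] in the expression above, so the sequence of eight states yields one initial term and seven transition terms.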


(a) Model learning algorithm: an example of HHMM

The top-level nodes are sunny, rain and cloudy, and the lower nodes represent some variations …

(a) Model learning algorithm
3. To estimate parameters we use
(1) Expectation Maximization (EM) algorithm
(2) Bayesian learning techniques
(3) Reversible-Jump Markov Chain Monte Carlo (RJ MCMC)
(4) Bayesian Information Criterion (BIC)

→ Model parameters are updated using EM.
→ Model structure learning uses MCMC: parameter learning for HHMM using EM is known to converge only to a local maximum of the data likelihood, since EM is a hill-climbing algorithm, and searching for a global maximum in the likelihood landscape is intractable → we adopt randomized search.

However, I will not go through them one by one… if you are interested, you can find the details in the paper: Xie, Lexing, et al. [3].
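To make the role of BIC concrete, here is a minimal sketch of how it trades data likelihood against model size when comparing candidate model structures. It uses the common "higher is better" form, log-likelihood minus half the parameter count times the log of the sample count; the exact form and weighting used in the paper are not given on the slides, and the candidate names and numbers below are hypothetical.

```python
import math


def bic_score(log_likelihood, num_params, num_samples):
    """BIC in 'higher is better' form: fit minus a complexity penalty."""
    return log_likelihood - 0.5 * num_params * math.log(num_samples)


# Hypothetical candidates: (name, log-likelihood, number of free parameters).
candidates = [
    ("2-state", -1250.0, 10),
    ("3-state", -1200.0, 18),
    ("6-state", -1190.0, 54),
]
NUM_SAMPLES = 1000  # assumed length of the observation sequence

scores = {name: bic_score(ll, k, NUM_SAMPLES) for name, ll, k in candidates}
best = max(scores, key=scores.get)
print(best, scores)
```

With these made-up numbers the largest model has the best likelihood but loses on the penalty term, so the mid-sized model is preferred. A randomized structure search would evaluate such a score for each proposed structure.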


Into what aspects can feature selection be divided, and why?

Feature selection is divided into two aspects:
(1) Eliminating irrelevant features: irrelevant features usually disturb the classifier and degrade classification accuracy.
(2) Eliminating redundant ones: redundant features add to computational cost without bringing in new information.

(b) Feature selection algorithm
1. We use a filter-wrapper method: the wrapper step corresponds to eliminating irrelevant features, and the filter step corresponds to eliminating redundant ones.
(a) Wrapper step: partitions the feature pool into consistent groups
(b) Filter step: eliminates redundant dimensions
2. For example, there are features like … Dominant Color Ratio (DCR), Motion Intensity (MI), the least-square estimation of camera translation (MX, MY), and five audio features: Volume, Spectral roll-off (SR), Low-band energy (LE), High-band energy (HE), and Zero-crossing rate (ZCR)
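As an illustration of what one of these low-level features measures, the sketch below computes a zero-crossing rate for a short audio frame. This is the textbook definition (fraction of adjacent sample pairs whose signs differ), not necessarily the exact variant used in the paper; the function name and the sample frame are hypothetical.

```python
def zero_crossing_rate(samples):
    """Fraction of adjacent sample pairs whose signs differ.

    Zero-valued samples are treated as non-negative.
    """
    if len(samples) < 2:
        return 0.0
    crossings = sum(
        1 for a, b in zip(samples, samples[1:]) if (a >= 0) != (b >= 0)
    )
    return crossings / (len(samples) - 1)


# Hypothetical audio frame with three sign changes across five adjacent pairs.
frame = [0.5, -0.2, -0.4, 0.1, 0.3, -0.6]
zcr = zero_crossing_rate(frame)
print(zcr)
```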

(b) Feature selection algorithm
3. Algorithm structure. The big picture would be: HHMM Viterbi state sequence → information gain → Markov blanket filtering → BIC fitness
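The information-gain step can be sketched as follows. This is the generic definition, the entropy of a state sequence minus its conditional entropy given a discretized feature; how the paper pairs features with Viterbi state sequences is not spelled out on the slides, so the pairing, the labels, and the function names here are assumptions.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a sequence of discrete labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())


def information_gain(states, feature_values):
    """H(states) - H(states | feature) for two time-aligned sequences."""
    total = len(states)
    groups = {}
    for s, f in zip(states, feature_values):
        groups.setdefault(f, []).append(s)
    conditional = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(states) - conditional


# Hypothetical state sequence and two discretized features.
states = ["play", "play", "break", "break", "play", "break", "play", "break"]
informative = ["hi", "hi", "lo", "lo", "hi", "lo", "hi", "lo"]
uninformative = ["hi", "lo", "hi", "lo", "lo", "hi", "hi", "lo"]

print(information_gain(states, informative))
print(information_gain(states, uninformative))
```

A feature that tracks the state sequence perfectly recovers the full entropy of the states, while one that is independent of it yields no gain; a wrapper step can use such a score to group features that agree with each other.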

Experiments and Results

For soccer videos, the main evaluation focused on distinguishing the two semantic events, play and break.

(a) Model learning algorithm
We use a fixed set of features manually selected based on heuristics (dominant color ratio and motion intensity) (Xu et al., 2001; Xie et al., 2002b). We evaluated four different learning schemes against the ground truth:
(1) Supervised HMM
(2) Supervised HHMM
(3) Unsupervised HHMM without model adaptation
(4) Unsupervised HHMM with model adaptation

(b) Feature selection algorithm
Based on the good performance of the model parameter and structure learning algorithm, we test the performance of the automatic feature selection method that iteratively wraps around and filters. The input is a 9-dimensional feature vector sampled every 0.1 seconds, comprising: Dominant Color Ratio (DCR), Motion Intensity (MI), the least-square estimation of camera translation (MX, MY), and five audio features: Volume, Spectral roll-off (SR), Low-band energy (LE), High-band energy (HE), and Zero-crossing rate (ZCR)

Evaluation against the play/break labels showed 74.8% accuracy.
For clip Spain, the final selected feature set was {DCR, Volume}, with 74.8% accuracy.
For clip Korea, the final selected feature set was {DCR, MX}, with 74.5% accuracy.

[Testing on the baseball video]
→ Yielded three consistent compact feature groups: {HE, LE, ZCR}, {DCR, MX}, {Volume, SR}
→ The resulting segments have consistent perceptual properties: one cluster of segments mostly corresponds to pitching shots and other field shots when the game is in play, while the other cluster contains most of the cutaway shots, scoreboards and game breaks.
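Because an unsupervised model produces anonymous clusters rather than named play/break labels, scoring it against ground truth needs a cluster-to-label mapping. The sketch below takes the better of the two possible mappings for a two-state output. Whether the authors scored their results this way is not stated on the slides, so this is an assumption, and the function name and sample sequences are hypothetical.

```python
def two_state_accuracy(predicted, truth):
    """Accuracy of a 2-cluster labelling (0/1) against binary ground truth (0/1),
    taking the better of the two possible cluster-to-label mappings."""
    if len(predicted) != len(truth) or not truth:
        raise ValueError("sequences must be non-empty and of equal length")
    agree = sum(1 for p, t in zip(predicted, truth) if p == t)
    # Swapping the cluster names turns every disagreement into an agreement.
    return max(agree, len(truth) - agree) / len(truth)


# Hypothetical per-segment labels: 1 = play, 0 = break in the ground truth;
# the predicted cluster ids happen to be numbered the other way round.
truth = [1, 1, 1, 0, 0, 1, 1, 0, 0, 0]
predicted = [0, 0, 1, 1, 1, 0, 0, 1, 0, 1]
print(two_state_accuracy(predicted, truth))
```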

Summary

With a specific domain of videos (sports: soccer and baseball), our unsupervised learning method can perform well. Our method was chiefly twofold: one part was the model learning algorithm and the other the feature selection algorithm. In the model learning algorithm, we used HHMM as the basic model and used other techniques such as the Expectation Maximization (EM) algorithm, Bayesian learning techniques, Reversible-Jump Markov Chain Monte Carlo (RJ MCMC), and the Bayesian Information Criterion (BIC) to set the parameters for the model. In the feature selection algorithm, together with a model of good performance, we used a filter-wrapper method to eliminate irrelevant and redundant features.

Questions

1. What is supervised learning?
→ The algorithm designers manually identify important structures, collect labelled data for training, and apply supervised learning tools to learn the classifiers.
2. What is the benefit of using unsupervised learning?
→ (A) It alleviates the burden of labelling and training. (B) It also provides a scalable solution for generalizing video indexing techniques.
3. Into what aspects can feature selection be divided, and why?
→ Feature selection is divided into two aspects: (1) eliminating irrelevant features: irrelevant features usually disturb the classifier and degrade classification accuracy; (2) eliminating redundant ones: redundant features add to computational cost without bringing in new information.

Bibliography

[1] Rabiner, Lawrence R. "A tutorial on hidden Markov models and selected applications in speech recognition." Proceedings of the IEEE 77.2 (1989): 257-286.
[2] Xie, Lexing, et al. "Structure analysis of soccer video with hidden Markov models." Acoustics, Speech, and Signal Processing (ICASSP), 2002 IEEE International Conference on. Vol. 4. IEEE, 2002.
[3] Xie, Lexing, et al. "Unsupervised mining of statistical temporal structures in video." Video mining. Springer US, 2003. 279-307.
[4] Xu, Peng, et al. "Algorithms and system for segmentation and structure analysis in soccer video." IEEE International Conference on Multimedia and Expo. 2001.

THANK YOU!

Q & A
