The Chinese University of Hong Kong Department of Computer Science and Engineering Lyu0202 Advanced Audio Information Retrieval System.



Network-based AdvAIR System
The system consists of a client side and a server side.
Client Side (consists of 2 parts)
- Advanced Part
  - Audio Data Mining
  - Audio Data Retrieval and Indexing
- Basic Part
  - Audio streaming from the server side
Server Side
- Audio streaming and searching on the server

Advanced Part of the AdvAIR System
Audio Data Mining
- Segmentation
- Recognition Engine
- Segmentation with Speaker Recognition
Audio Retrieval and Indexing
- Query by Humming
- Pattern Matching
- Search on Server

Audio Data Mining – Recognition Engine
Consists of three functions:
- Speaker Recognition
- Language Recognition
- Gender Recognition
Speaker Recognition engine
- Open-set system with 10 models and 1 general model
Language Recognition engine
- Closed-set system with 3 models (Cantonese, English, Mandarin)
Gender Recognition engine
- Closed-set system with 2 models (Male and Female)
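The difference between an open-set and a closed-set engine can be illustrated with a small sketch. It assumes every model produces a numeric score where higher means a better match. It also assumes that the general model serves as a reference for rejecting unknown speakers, which the slides do not state. The function names and the `margin` parameter are hypothetical.

```python
def closed_set_decision(scores):
    """Closed set: the input always belongs to one of the known models."""
    return max(scores, key=scores.get)


def open_set_decision(scores, general_key="general", margin=0.0):
    """Open set: return the best speaker, or None if no speaker model
    beats the general model by more than `margin` (assumed rejection rule)."""
    speakers = {name: s for name, s in scores.items() if name != general_key}
    best = max(speakers, key=speakers.get)
    if speakers[best] - scores[general_key] > margin:
        return best
    return None  # treated as an unknown speaker


# Made-up scores, for illustration only.
gender = closed_set_decision({"male": -3.0, "female": -2.0})
known = open_set_decision({"speaker_1": -41.0, "speaker_2": -39.5, "general": -40.0})
unknown = open_set_decision({"speaker_1": -45.0, "general": -40.0})
```

A closed-set engine never answers "none of the above", which suits gender and language recognition with a fixed list of classes. An open-set engine needs some rejection rule, since the speaker may not be one of the 10 enrolled models.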

Audio Data Mining - Segmentation
[Figure: segmentation output, with segments labelled Group 1, Group 2 and Group 3]

The Bayesian Information Criterion (BIC) is used to determine the acoustic change points of the input MPEG file.
- First, input an MPEG file
- Next, extract the features
- Use the BIC to calculate the change points
- Finally, obtain a list of segments cut at the acoustic change points
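The BIC step above can be sketched as follows. This is a minimal illustration, not the AdvAIR implementation: it assumes the features are already extracted as lists of floats, models each segment as a single Gaussian with diagonal covariance (full covariance matrices are also commonly used), and searches for one change point only. All function names, `penalty_weight` and `margin` are hypothetical.

```python
import math
import random


def log_det_diag(frames):
    """Log-determinant of a diagonal covariance: the sum of log variances."""
    n, dims = len(frames), len(frames[0])
    total = 0.0
    for d in range(dims):
        column = [f[d] for f in frames]
        mean = sum(column) / n
        var = sum((x - mean) ** 2 for x in column) / n
        total += math.log(max(var, 1e-10))  # floor avoids log(0)
    return total


def delta_bic(frames, t, penalty_weight=1.0):
    """Delta BIC for splitting at index t; a positive value favours a change."""
    n, dims = len(frames), len(frames[0])
    left, right = frames[:t], frames[t:]
    gain = 0.5 * (n * log_det_diag(frames)
                  - len(left) * log_det_diag(left)
                  - len(right) * log_det_diag(right))
    extra_params = 2 * dims  # one extra mean vector and variance vector
    return gain - 0.5 * penalty_weight * extra_params * math.log(n)


def find_change_point(frames, margin=10):
    """Best split index, or None if no split has a positive Delta BIC."""
    best_t, best_score = None, 0.0
    for t in range(margin, len(frames) - margin):
        score = delta_bic(frames, t)
        if score > best_score:
            best_t, best_score = t, score
    return best_t


# Synthetic 2-D features with a change of source after 100 frames.
random.seed(0)
frames = [[random.gauss(0, 1), random.gauss(0, 1)] for _ in range(100)]
frames += [[random.gauss(5, 1), random.gauss(5, 1)] for _ in range(100)]
change_point = find_change_point(frames)
```

To get a full list of segments, the same test would be repeated on the audio after each detected change point.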

Audio Data Mining – Recognition Engine
[Flowchart: Input MPEG -> Extract features -> Calculate a score for each trained model -> Select the most suitable model]

Audio Data Mining – Recognition Engine
- Uses a Gaussian Mixture Model (GMM): text-independent, robust, computationally efficient
- 256 mixtures for each model
- Needs pre-processing (training)
- First, input an MPEG file
- Next, extract the features
- Then calculate a score for each model and select the model with the best score
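The scoring and selection steps can be sketched as below. This is a sketch under stated assumptions: the models are already trained, each GMM uses diagonal covariances, and the score is the average log-likelihood per frame (the slides do not specify the score). The toy models have 2 mixtures instead of 256, and all names and values are made up for illustration.

```python
import math


def gmm_log_likelihood(frames, gmm):
    """Average per-frame log-likelihood under a diagonal-covariance GMM.

    `gmm` is a list of (weight, means, variances) tuples, one per mixture.
    """
    total = 0.0
    for x in frames:
        component_logs = []
        for weight, means, variances in gmm:
            log_p = math.log(weight)
            for xi, m, v in zip(x, means, variances):
                log_p += -0.5 * (math.log(2 * math.pi * v) + (xi - m) ** 2 / v)
            component_logs.append(log_p)
        # log-sum-exp keeps the mixture sum numerically stable
        peak = max(component_logs)
        total += peak + math.log(sum(math.exp(c - peak) for c in component_logs))
    return total / len(frames)


def recognise(frames, models):
    """Score the frames against every model and return the best model name."""
    scores = {name: gmm_log_likelihood(frames, gmm) for name, gmm in models.items()}
    return max(scores, key=scores.get), scores


# Toy 1-D models, made up for illustration.
models = {
    "male": [(0.5, [0.0], [1.0]), (0.5, [1.0], [1.0])],
    "female": [(0.5, [3.0], [1.0]), (0.5, [4.0], [1.0])],
}
best, scores = recognise([[3.2], [3.9], [2.8]], models)
```

Because the score depends only on how well the feature distribution fits each mixture, nothing in it relies on what was said, which is what makes the approach text-independent.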

Audio Data Mining – Segmentation with Speaker Recognition
Automatic speaker recognition engine
- First, perform segmentation
- Next, send each segment to the speaker recognition engine
- Finally, obtain a list of segments in which the speaker of each segment is known
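The three steps above amount to a small pipeline, sketched here. It assumes the change points have already been found and that `identify` is any function mapping a list of frames to a speaker label; both `label_segments` and `toy_identify` are hypothetical stand-ins, not part of AdvAIR.

```python
def label_segments(frames, change_points, identify):
    """Cut `frames` at the change points and label each segment.

    Returns a list of (start, end, speaker) tuples, with `end` exclusive.
    """
    bounds = [0] + sorted(change_points) + [len(frames)]
    labelled = []
    for start, end in zip(bounds, bounds[1:]):
        if end > start:  # skip empty segments
            labelled.append((start, end, identify(frames[start:end])))
    return labelled


def toy_identify(segment):
    """Stand-in for the speaker recognition engine: thresholds the mean."""
    mean = sum(segment) / len(segment)
    return "Speaker 1" if mean < 2.5 else "Speaker 2"


# Scalar "features" made up for illustration, with changes at indices 3 and 6.
frames = [0.1, 0.2, 0.0, 5.1, 4.9, 5.0, 0.2]
segments = label_segments(frames, [3, 6], toy_identify)
```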

Speaker Identification Process
[Figure: the segments Group 1, Group 2 and Group 3 pass through speaker identification; labels shown: Speaker 1, Speaker 2, Speaker 3, Speaker 2, Speaker 1, Speaker 2]

Audio Retrieval and Indexing - Query by Humming
First step:
- Perform pitch tracking on the input audio clip using the time-domain autocorrelation function (ACF)
- Track the trend of the input audio clip as "Up", "Down" or "Same"
- Intermediate output: a file consisting of a list of "Up", "Down" and "Same" symbols
Second step:
- Perform largest substring matching between the intermediate output of the input audio clip and that of each audio clip in the database, and calculate a score
Last step:
- List the audio clips in the database according to the score

[Flowchart: Hummed song -> Pitch tracker -> Intermediate representation -> Largest substring matching against the intermediate database]

Pitch tracker
- Tracks the pitch of the hummed voice and converts it into a representation of the relative change in pitch
- E.g. Do Mi Fa So Fa Re Mi -> U U U D D U
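A minimal sketch of the pitch tracker is given below. It assumes the audio is available as a list of samples, estimates one pitch per frame from the peak of the time-domain autocorrelation, and maps consecutive pitches to U, D or S using a tolerance in semitones. The search range of 80 to 500 Hz, the tolerance and the function names are assumptions, not values from the slides.

```python
import math


def acf_pitch(frame, sample_rate, min_hz=80.0, max_hz=500.0):
    """Estimate the pitch of one frame from its autocorrelation peak."""
    min_lag = int(sample_rate / max_hz)
    max_lag = min(int(sample_rate / min_hz), len(frame) - 1)
    best_lag, best_value = None, 0.0
    for lag in range(min_lag, max_lag + 1):
        value = sum(frame[n] * frame[n + lag] for n in range(len(frame) - lag))
        if value > best_value:
            best_lag, best_value = lag, value
    return sample_rate / best_lag if best_lag else None


def contour(pitches_hz, tolerance_semitones=0.5):
    """Convert a pitch sequence into a string of U (up), D (down), S (same)."""
    symbols = []
    for prev, cur in zip(pitches_hz, pitches_hz[1:]):
        change = 12.0 * math.log2(cur / prev)  # interval in semitones
        if change > tolerance_semitones:
            symbols.append("U")
        elif change < -tolerance_semitones:
            symbols.append("D")
        else:
            symbols.append("S")
    return "".join(symbols)


# A 200 Hz sine sampled at 8 kHz, used as a stand-in for a hummed frame.
frame = [math.sin(2 * math.pi * 200 * n / 8000) for n in range(400)]
estimated = acf_pitch(frame, 8000)

# Do Mi Fa So Fa Re Mi, as approximate frequencies in Hz starting at C4.
notes = [261.63, 329.63, 349.23, 392.00, 349.23, 293.66, 329.63]
symbols = contour(notes)
```

Real humming would also need silence detection and smoothing of the per-frame estimates before the contour is taken; both are left out here.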

Audio Retrieval and Indexing – Direct Audio Search
First step:
- A covariance matrix is calculated from the feature vectors of the cue audio and of a clip in the database
Second step:
- The AHS (arithmetic harmonic sphericity) distance measure is used to calculate a score
Last step:
- List the audio clips in the database according to the score
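The first two steps can be sketched as below. The sketch uses the usual definition of the arithmetic harmonic sphericity measure, d(X, Y) = log(tr(X Y^-1) tr(Y X^-1)) - 2 log D, where X and Y are the two covariance matrices and D is the feature dimension. It is zero for identical covariances and grows as they differ. The helper names are hypothetical and the feature vectors are made up.

```python
import math


def covariance(frames):
    """Covariance matrix of a list of feature vectors."""
    n, d = len(frames), len(frames[0])
    means = [sum(f[i] for f in frames) / n for i in range(d)]
    return [[sum((f[i] - means[i]) * (f[j] - means[j]) for f in frames) / n
             for j in range(d)] for i in range(d)]


def inverse(matrix):
    """Matrix inverse by Gauss-Jordan elimination with partial pivoting."""
    d = len(matrix)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(d)]
           for i, row in enumerate(matrix)]
    for col in range(d):
        pivot = max(range(col, d), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("singular covariance matrix")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(d):
            if r != col:
                factor = aug[r][col]
                aug[r] = [v - factor * p for v, p in zip(aug[r], aug[col])]
    return [row[d:] for row in aug]


def trace_of_product(a, b):
    """tr(A B) without forming the full product."""
    d = len(a)
    return sum(a[i][k] * b[k][i] for i in range(d) for k in range(d))


def ahs_distance(frames_x, frames_y):
    """AHS distance between two clips; 0 means identical covariance shape."""
    cx, cy = covariance(frames_x), covariance(frames_y)
    d = len(cx)
    product = trace_of_product(cx, inverse(cy)) * trace_of_product(cy, inverse(cx))
    return math.log(product) - 2.0 * math.log(d)


# Made-up 2-D feature vectors for a cue clip and a database clip.
cue = [[1, 1], [1, -1], [-1, 1], [-1, -1]]
clip = [[1, 3], [1, -3], [-1, 3], [-1, -3]]
distance = ahs_distance(cue, clip)
```

For the last step, clips would be listed in order of increasing distance, since a smaller AHS value means a closer match.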

[Figure: the source clip is compared with target clips of the same size using the AHS distance]

Audio Retrieval and Indexing – Search on Server
Direct Audio Search on the server
- The server side has a database
- The client connects to the server
- The client selects a cue audio clip and uploads it to the server
- The server performs the direct audio search and sends back the result
- The client can use audio streaming to retrieve the result file

Basic Part - Audio Streaming
- AdvAIR is an N-to-N system: it allows N servers and N clients
- Clients and servers can be added at any time
- It is fault tolerant

Basic Part – Server Side
Has two parts:
- Audio streaming
- Searching on the server (direct search on the server)
The parts are kept separate because searching on the server uses a lot of resources.
A server cannot process requests from too many users at the same time.
Only privileged users are allowed to use the search-on-server function.

Basic Part – Client Side
- When a client requests a download, the audio clip is divided into many small parts
- The servers send the small parts to the client simultaneously to speed up the download
- The client combines all the small parts to form the whole file
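The split, parallel fetch and reassembly described above can be sketched as follows. The servers are modelled as plain callables `fetch(start, end)` that return bytes from an in-memory copy of the file, standing in for real network requests. The sketch assigns one byte range per server, which is an assumption. The function names are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def split_ranges(total_size, num_parts):
    """Divide [0, total_size) into num_parts contiguous (start, end) ranges."""
    base, extra = divmod(total_size, num_parts)
    ranges, start = [], 0
    for i in range(num_parts):
        length = base + (1 if i < extra else 0)
        ranges.append((start, start + length))
        start += length
    return ranges


def download(servers, total_size):
    """Fetch one range from each server in parallel and join the parts."""
    ranges = split_ranges(total_size, len(servers))
    with ThreadPoolExecutor(max_workers=len(servers)) as pool:
        futures = [pool.submit(fetch, start, end)
                   for fetch, (start, end) in zip(servers, ranges)]
        parts = [future.result() for future in futures]  # kept in range order
    return b"".join(parts)


# Three stand-in servers that all hold the same clip.
clip = bytes(range(256)) * 4


def make_server(data):
    return lambda start, end: data[start:end]


servers = [make_server(clip) for _ in range(3)]
received = download(servers, len(clip))
```

Collecting the results in the order the ranges were issued, rather than in completion order, is what lets the client rebuild the file by simple concatenation.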

The End