Technical Aspects of the CALO Recorder
By Satanjeev Banerjee, Thomas Quisel, Jason Cohen, Arthur Chan, Yitao Sun, David Huggins-Daines, Alex Rudnicky

Role of the CALO Recorder
A centralized mechanism for collecting all perceptual events (speech, text).
CMU provides technology for:
  Event recording
  Speech recognition

Role of the CALO Recorder
One of the components of CAMPER. The four:
  CALO recorder
  Speechalizer
    End-pointing information
    Prosodic information
    Speech recognition
  CAMSeg
    Speech segmentation
  Understanding

An Architecture Diagram (Client Side)
[Diagram components: Audio Capturing; Text Capturing through Keyboard; Ring Buffers; End-Pointer; VU Meter; Speech Decoder; Other Events; Storage]

Persistence of Data
Background Intelligent Transfer Service (BITS)
  Used to transfer data off-line

Technical Challenges in the Recorder
  Threading
  Audio buffering
  Time synchronization
  Real-time processing
  End-pointing
  Speech processing
  Portability
  Maintenance/Distribution

Threading
Several kinds of processing need to run concurrently:
  VU meter
  Speech processing and higher-level understanding
  Graphical user interface
Considerable development time was invested in making the communication between threads correct (by Thomas Quisel).
See the architecture diagram on the next slides.
Example issue: on some platforms, the WX implementation makes the GUI thread disallow calls to its drawing functions from other threads.
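The GUI-thread restriction above is commonly handled by having worker threads post events to a queue that only the GUI thread drains. The following is a minimal sketch of that pattern, not the recorder's actual code: the names vu_meter_worker, run_gui_loop, and the event tuples are hypothetical, and a plain loop stands in for the real WX event loop.

```python
import queue
import threading


def vu_meter_worker(levels, events):
    """Worker thread: never draws, only posts events for the GUI thread."""
    for level in levels:
        events.put(("vu_level", level))
    events.put(("done", None))  # hypothetical sentinel that ends the loop


def run_gui_loop(events, draw):
    """Stand-in for the GUI thread: the only place where draw() is called."""
    while True:
        kind, payload = events.get()  # blocks until a worker posts an event
        if kind == "done":
            break
        draw(kind, payload)


events = queue.Queue()  # thread-safe FIFO shared by worker and GUI loop
drawn = []              # records what the fake GUI was asked to draw

worker = threading.Thread(target=vu_meter_worker, args=([0.1, 0.5, 0.9], events))
worker.start()
run_gui_loop(events, lambda kind, payload: drawn.append((kind, payload)))
worker.join()
```

With this arrangement, all drawing happens on one thread regardless of how many workers produce events.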

Audio Buffering
Sphinx 2 and 3.X libaudio require the application to:
  Capture audio
  Do processing on the audio buffer
If the processing thread is even slightly slower than 1xRT, audio will be lost.
(By Jason Cohen) A ring buffer structure is implemented.
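To illustrate the idea of a ring buffer between the capture thread and the processing thread, here is a minimal sketch. It is an assumption-laden illustration rather than the recorder's implementation: the class name RingBuffer, its methods, and the drop-when-full policy are all hypothetical choices.

```python
import threading


class RingBuffer:
    """Fixed-capacity byte ring buffer shared by a capture and a processing thread."""

    def __init__(self, capacity):
        self._buf = bytearray(capacity)
        self._capacity = capacity
        self._read = 0   # index of the oldest unread byte
        self._size = 0   # number of unread bytes currently stored
        self._lock = threading.Lock()

    def write(self, data):
        """Store as much of data as fits; return the number of bytes dropped."""
        with self._lock:
            n = min(len(data), self._capacity - self._size)
            write_pos = (self._read + self._size) % self._capacity
            first = min(n, self._capacity - write_pos)  # bytes before wrap-around
            self._buf[write_pos:write_pos + first] = data[:first]
            self._buf[0:n - first] = data[first:n]      # wrapped remainder
            self._size += n
            return len(data) - n

    def read(self, max_bytes):
        """Remove and return up to max_bytes of the oldest stored audio."""
        with self._lock:
            n = min(max_bytes, self._size)
            first = min(n, self._capacity - self._read)
            out = bytes(self._buf[self._read:self._read + first])
            out += bytes(self._buf[0:n - first])
            self._read = (self._read + n) % self._capacity
            self._size -= n
            return out
```

The buffer absorbs short periods in which processing falls behind capture; audio is dropped only once the whole capacity is filled.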

Time Synchronization
By David Huggins-Daines
Simple NTP (SNTP) is used to get Coordinated Universal Time (UTC) from an arbitrary NTP server
  Clone of the standard NTP implementation
Internal synchronization
  Synchronization time between machines: 50-60 ms
The major challenge is the delay imposed by the OS and the audio capturing software.
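For reference, an SNTP client estimates its clock offset from four timestamps of one request/response exchange. The sketch below shows the standard offset and round-trip delay arithmetic only; the function name and argument names are hypothetical, and it does not model the OS or audio-capture delays mentioned above.

```python
def sntp_offset_and_delay(t0, t1, t2, t3):
    """Standard SNTP arithmetic on one request/response exchange.

    t0: client clock when the request was sent
    t1: server clock when the request arrived
    t2: server clock when the reply was sent
    t3: client clock when the reply arrived
    All values are in seconds.
    """
    # Offset to add to the client clock, assuming symmetric network paths.
    offset = ((t1 - t0) + (t2 - t3)) / 2.0
    # Round-trip network delay, excluding the server's processing time.
    delay = (t3 - t0) - (t2 - t1)
    return offset, delay


# Example exchange: the client clock is about 0.4 s behind the server.
offset, delay = sntp_offset_and_delay(10.0, 10.5, 10.6, 10.3)
```

The symmetric-path assumption is what limits accuracy: any asymmetry in the two directions shows up directly as offset error.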

Real-time Processing
Role of end-pointing and recognition
  After a long debate, a two-stage end-pointing and recognition architecture was chosen
(By Ziad) A high-performance end-pointing routine was created
  Gaussian Mixture Model-based end-pointer, implemented as a frame voter within segments
  The parameters are further manually tuned
  Speed optimized
  Now in s3ep, a customized version of Sphinx
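The slide does not spell out how the frame voter works, so the following is only one plausible reading: each frame is labelled speech or non-speech by comparing log-likelihoods under two models, and each fixed-length segment takes a vote over its frames. The function names, the window of 5 frames, and the vote threshold are all assumptions for illustration, not values from s3ep.

```python
def classify_frames(speech_ll, nonspeech_ll):
    """Label a frame as speech when the speech model scores it higher."""
    return [s > n for s, n in zip(speech_ll, nonspeech_ll)]


def vote_segments(frame_is_speech, window=5, min_votes=3):
    """Split frames into fixed-length segments and vote within each one."""
    decisions = []
    for start in range(0, len(frame_is_speech), window):
        segment = frame_is_speech[start:start + window]
        decisions.append(sum(segment) >= min_votes)
    return decisions


def endpoints(decisions, window=5):
    """Return (start_frame, end_frame) pairs for runs of speech segments.

    end_frame is exclusive and counted in whole segments, so it can exceed
    the true frame count when the last segment is shorter than window.
    """
    spans, start = [], None
    for i, is_speech in enumerate(decisions):
        if is_speech and start is None:
            start = i * window
        elif not is_speech and start is not None:
            spans.append((start, i * window))
            start = None
    if start is not None:
        spans.append((start, len(decisions) * window))
    return spans
```

Voting over a segment smooths out isolated misclassified frames, so a single noisy frame neither starts nor ends an utterance.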

Speech Recognizer
The resulting output is fed to the recognizer.
Speech recognition in meetings
  Regarded as one of the biggest challenges in the field
  Results vary widely with meeting style, number of attendants, topics, and disfluencies of the speakers

Accuracy
Performance still under heavy work. Currently:
In the cleanest meeting (Bdb001)
  With one very dominating male speaker
  With one very dominating female speaker
  Speaker speaking rate entropy is lowest
  Error rate 29.4%
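The slide does not say how the error rate is computed. Assuming it is the usual word error rate (substitutions, deletions, and insertions divided by the number of reference words), a minimal sketch looks like this; the function name and the example sentences are hypothetical.

```python
def word_error_rate(reference, hypothesis):
    """Word-level edit distance divided by the number of reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] = edit distance between the processed reference prefix and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        cur = [i]
        for j, h in enumerate(hyp, 1):
            cur.append(min(prev[j] + 1,               # deletion
                           cur[j - 1] + 1,            # insertion
                           prev[j - 1] + (r != h)))   # match or substitution
        prev = cur
    return prev[-1] / len(ref)


# One substitution ("starts" -> "start") and one deletion ("noon") in 5 words.
wer = word_error_rate("the meeting starts at noon", "the meeting start at")
```

Because insertions count as errors, this measure can exceed 100% on very poor hypotheses.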

Phase IV of Accuracy Improvement (Core)
  Boosting-based training
  Confidence-based N-best re-ranking
  Speaker adaptation based on transformation
  Speaker normalization
  Include BN, SWB material in LM training
  Dictionary refinement

Phase IV of Accuracy Improvement (Optional)
  STC
  MLLT
  DT
  PLP, TRAP
  LM with disfluencies and back-channeling

Speed
2.2G machine
Communicator
  S2: 17.3%, 0.34xRT
  S3.X BL: 11.8%, 4xRT
  S3.X Tuned: 12.8%, 0.87xRT
WSJ 5k
  S3.X BL: 7.4%, 1.61xRT
  S3.X BL: 8.3%, 0.5xRT
ICSI
  With tuning of SVQ and CIGMMS, 0.7xRT is achieved.
  Further tuning may improve the results.
  Benchmarking results need time to be prepared.
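The xRT figures above are real-time factors: processing time divided by the duration of the audio processed, so values below 1 mean the decoder keeps up with live input. A small sketch of the arithmetic, with hypothetical function names and example durations:

```python
def real_time_factor(processing_seconds, audio_seconds):
    """Return processing time as a multiple of audio duration (xRT)."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return processing_seconds / audio_seconds


def keeps_up_with_live_audio(processing_seconds, audio_seconds):
    """True when decoding is faster than the audio arrives (below 1xRT)."""
    return real_time_factor(processing_seconds, audio_seconds) < 1.0


# Example: decoding 100 s of audio in 34 s corresponds to 0.34xRT.
rtf = real_time_factor(34.0, 100.0)
```

A decoder running above 1xRT falls steadily behind a live stream, which is why buffering alone cannot compensate for it.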

Maintenance and Distribution
All code (C, Java) is in a local CVS repository
  Will soon move to SRI
Regular releases are created; use of SRI's CVS will blur this line.

Conclusion
Engineering work on the recorder is mostly done.
It is time to improve the individual components.
Everyone is welcome to join the effort.