Feature Selection for Automated Speech Scoring
Anastassia Loukina, Klaus Zechner, Lei Chen, Michael Heilman*
Educational Testing Service; *Civis Analytics
Copyright © 2015 by Educational Testing Service.

Overview
- Motivation
- Data
- Scoring models
- Results
- Conclusion

Context and Motivation
- Scoring of constructed responses: speech
- Features computed with NLP and speech technology from speech recognition and signal processing output
- Scores predicted using supervised machine learning
- Educational measurement requires managing a trade-off:
  - maximize empirical performance
  - maximize model interpretability

Ideal Properties of Scoring Models
- High empirical performance
- Contains features that evaluate all relevant aspects of the test construct
- The relative contribution of each feature is transparent
- Inter-correlations between features are not too high
- The polarity of feature weights corresponds to their meaning
- Smaller and simpler is better (interpretability)

Linear Regression Scoring Models Built by Human Experts
- Straightforward and well known across disciplines
- Can address most requirements of an ideal scoring model
- Disadvantage: development is cumbersome, since features are selected manually and every constraint must be checked by hand

Proposed Model
- Explore alternative regression models, e.g. shrinkage methods
- These can perform feature selection automatically while still addressing the constraints of an ideal scoring model

Data
- Spoken English proficiency test
- Spontaneous speech, ~1 minute per response
- Score scale: 1-4

Data Set    Speakers    Responses    H-H Correlation
Train       9,312       9,…          …
Eval        8,101       47,…         …

Features
- 75 features extracted for each response via SpeechRater
- Construct dimensions covered:
  - fluency
  - pronunciation accuracy
  - prosody
  - grammar
  - vocabulary
- Dimensions not covered: content, discourse

Scoring Models
1. Baseline: human expert selection (12 features)
2. All features, OLS regression
3. Hybrid stepwise regression
4. Non-negative least-squares regression
5. Non-negative LASSO regression (LASSO*; lambda optimized to obtain a feature set of about 25 features)
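The comparison below is a minimal sketch of how models (2), (4) and (5) could be fit with scikit-learn and SciPy; it is not the ETS implementation, the hybrid stepwise procedure (3) is omitted, and the load_features() helper and its outputs are hypothetical.

```python
# Sketch of models (2), (4) and (5); load_features() is a hypothetical
# helper returning a (responses x 75) feature matrix and human scores.
import numpy as np
from scipy.optimize import nnls
from sklearn.linear_model import LinearRegression, Lasso

X_train, y_train = load_features("train")  # hypothetical data loader

# (2) Ordinary least squares on all 75 features
ols = LinearRegression().fit(X_train, y_train)

# (4) Non-negative least squares: all coefficients constrained to be >= 0
nnls_coef, _ = nnls(X_train, y_train)

# (5) Non-negative LASSO: L1 penalty plus positivity constraint;
#     alpha plays the role of lambda and controls how many coefficients survive
lasso = Lasso(alpha=0.01, positive=True, max_iter=10000).fit(X_train, y_train)
print("features selected by LASSO*:", np.count_nonzero(lasso.coef_))
```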

LASSO
- Shrinkage model used for dimensionality reduction
- Penalizes large coefficients
- Sets a subset of coefficients to exactly zero
- Lambda parameter: lambda = 0 yields the OLS model; lambda approaching infinity yields a model with no features
- Optimal lambda determined empirically (target: the number of features at which performance flattens out)
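A minimal sketch of the empirical lambda search, assuming scikit-learn's Lasso with a positivity constraint; the alpha grid, the 5-fold cross-validation, and the stopping rule are illustrative assumptions rather than the study's actual settings.

```python
# Sweep lambda (alpha in scikit-learn), tracking model size and
# cross-validated correlation with human scores.
import numpy as np
from scipy.stats import pearsonr
from sklearn.linear_model import Lasso
from sklearn.model_selection import cross_val_predict

for alpha in np.logspace(-4, 0, num=20):
    model = Lasso(alpha=alpha, positive=True, max_iter=10000)
    preds = cross_val_predict(model, X_train, y_train, cv=5)
    n_features = np.count_nonzero(model.fit(X_train, y_train).coef_)
    r, _ = pearsonr(y_train, preds)
    print(f"lambda={alpha:.4f}  features={n_features}  r={r:.3f}")

# Choose the largest lambda at which correlation has flattened out,
# here targeting a model of roughly 25 features.
```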

Cross-validation Results

Model              Features    Negative Coeffs    Correlation
Expert baseline    12          No                 0.606
All OLS            75          Yes                0.667
Hybrid stepwise    ~40         Yes                0.667
Non-neg LS         ~35         No                 0.655
LASSO*             ~25         No                 0.649

Results on Evaluation Set

Model              Features    Item Corr    Speaker Corr
Expert baseline    12          …            …
All OLS            75          …            …
LASSO*             ~25         …            …
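The item-level and speaker-level correlations could be computed as sketched below; the eval_df DataFrame and its column names are hypothetical stand-ins for the evaluation data, and speaker-level scores are assumed to be averages over each speaker's responses.

```python
# Hypothetical evaluation DataFrame with one row per response:
# columns speaker_id, human (human score), predicted (machine score).
import pandas as pd
from scipy.stats import pearsonr

item_r, _ = pearsonr(eval_df["human"], eval_df["predicted"])

# Speaker-level: average human and predicted scores per speaker first
per_speaker = eval_df.groupby("speaker_id")[["human", "predicted"]].mean()
speaker_r, _ = pearsonr(per_speaker["human"], per_speaker["predicted"])
```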

Construct Coverage Comparison
Adding up relative standardized beta-weights per construct

Construct                  Expert    LASSO*
Fluency                    …         …
Pronunciation accuracy     …         …
Prosody                    …         …
Total for Delivery         …         …
Grammar                    …         …
Vocabulary                 …         …
Total for Language Use     …         …
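One plausible way to produce the table's entries, sketched under the assumption that standardized beta-weights are the fitted coefficients rescaled by the feature and score standard deviations; feature_names and the construct_of mapping (feature name to construct dimension) are hypothetical.

```python
# Standardized beta-weight: coefficient * sd(feature) / sd(score),
# expressed as a share of the total and summed per construct dimension.
import numpy as np
import pandas as pd

betas = lasso.coef_ * X_train.std(axis=0) / y_train.std()
rel_betas = np.abs(betas) / np.abs(betas).sum()

coverage = (pd.Series(rel_betas, index=feature_names)   # feature_names: hypothetical list
              .groupby(construct_of)                    # construct_of: {feature -> dimension}
              .sum())
print(coverage)
```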

Summary
- Building scoring models for constructed responses in line with best practices in educational measurement is a complex constraint-satisfaction task
- This task has therefore typically been performed by human experts
- Our study demonstrates the viability of automated feature selection methods that can satisfy multiple requirements of ideal scoring models
- The LASSO* model is more accurate than the expert baseline, has very similar construct coverage, and remains highly interpretable