Non-Monotonic Parsing of Fluent Umm I mean Disfluent Sentences
Mohammad Sadegh Rasooli (Columbia University), Joel Tetreault (Yahoo Labs)
This work was conducted while both authors were at Nuance's NLU Research Lab in Sunnyvale, CA.

Mohammad Sadegh Rasooli and Joel Tetreault. "Joint Parsing and Disfluency Detection in Linear Time." EMNLP 2013.
Mohammad Sadegh Rasooli and Joel Tetreault. "Non-Monotonic Parsing of Fluent Umm I mean Disfluent Sentences." EACL 2014.

Motivation
Two issues for spoken language processing:
- ASR errors
- Speech disfluencies
About 10% of the words in conversational speech are disfluent, an extreme case of "noisy" input. Error propagation from these two sources can wreak havoc on downstream modules such as parsing and semantics.

Disfluencies
Three types:
- Filled pauses (FP): e.g., uh, um
- Discourse markers (DM) and parentheticals: e.g., I mean, you know
- Reparandum (edited phrase)
Example: "I want a flight to Boston uh I mean to Denver"
  reparandum: "to Boston" | interregnum: "uh" (FP) + "I mean" (DM) | repair: "to Denver"
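To make the structure concrete, here is a minimal sketch of one way this annotation could be represented in code (our own illustration; the class and field names are assumptions, not the paper's):

```python
from dataclasses import dataclass

@dataclass
class Disfluency:
    """One reparandum/interregnum/repair region in an utterance.
    Spans are (start, end) token indices, end-exclusive."""
    reparandum: tuple   # the abandoned words
    interregnum: tuple  # filled pauses and/or discourse markers
    repair: tuple       # the words that replace the reparandum

tokens = "I want a flight to Boston uh I mean to Denver".split()
d = Disfluency(reparandum=(4, 6), interregnum=(6, 9), repair=(9, 11))

# The fluent sentence keeps everything except the reparandum and the
# interregnum (assumed contiguous here, as in this example).
fluent = [t for i, t in enumerate(tokens)
          if not (d.reparandum[0] <= i < d.interregnum[1])]
print(" ".join(fluent))  # I want a flight to Denver
```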

Processing Disfluent Sentences
Most approaches treat disfluency detection solely as a pre-processing step before parsing, and such a serialized pipeline of disfluency detection followed by parsing can be slow. Why not parse disfluent sentences at the same time as detecting disfluencies?
- Advantage: speeds up processing, especially for dialogue systems

Our Approach
Dependency parsing and disfluency detection with high accuracy and processing speed.
Source: I want a flight to Boston uh I mean to Denver
Output: I want a flight to Denver
[Figure: the dependency parse of the output (subj, root, det, dobj, prep, pobj arcs), the real output of our system!]

Our Work: EMNLP '13
Our approach is based on arc-eager transition-based parsing [Nivre, 2004]. Parsing is the process of choosing the best action at a particular stack and buffer configuration. We extend the 4 standard actions {shift, reduce, left-arc, right-arc} with three additional actions:
- IJ[w_i..w_j]: interjections
- DM[w_i..w_j]: discourse markers
- RP[w_i..w_j]: reparandum
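As a rough sketch of the resulting action inventory (our own rendering; the paper does not prescribe this representation):

```python
from enum import Enum

class Action(Enum):
    # standard arc-eager actions [Nivre, 2004]
    SHIFT = "shift"
    REDUCE = "reduce"
    LEFT_ARC = "left-arc"
    RIGHT_ARC = "right-arc"
    # additional disfluency actions from the EMNLP '13 model;
    # each is parameterized by a span of words w_i..w_j
    IJ = "interjection"
    DM = "discourse-marker"
    RP = "reparandum"
```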

Example: IJ[7]
Sentence: I_1 want_2 a_3 flight_4 to_5 Boston_6 uh_7 I_8 mean_9 to_10 Denver_11
Stack: [Root_0, want_2, flight_4, to_5, Boston_6]
Buffer: [uh_7, I_8, mean_9, to_10, Denver_11]
(uh_7 is tagged as an interjection and removed from the buffer)

Example: DM[8:9]
Stack: [Root_0, want_2, flight_4, to_5, Boston_6]
Buffer: [I_8, mean_9, to_10, Denver_11]
(I_8 mean_9 are tagged as a discourse marker and removed from the buffer)

Example: RP[5:6]
Stack: [Root_0, want_2, flight_4, to_5, Boston_6]
Buffer: [to_10, Denver_11]
(to_5 Boston_6 are tagged as reparandum)

Example: deleting words and dependencies
Stack: [Root_0, want_2, flight_4]
Buffer: [to_10, Denver_11]
(to_5 and Boston_6 are deleted from the stack together with their prep and pobj arcs)

Example: Right-arc: prep
Stack: [Root_0, want_2, flight_4]
Buffer: [to_10, Denver_11]
(arc flight_4 → to_10 is added and to_10 is pushed)

Example: Right-arc: pobj
Stack: [Root_0, want_2, flight_4, to_10]
Buffer: [Denver_11]
(arc to_10 → Denver_11 is added and Denver_11 is pushed)

Example: Reduce (repeated until parsing ends)
Stack: [Root_0]
Buffer: []
Final fluent parse: I want a flight to Denver, with arcs subj, root, det, dobj, prep, pobj

The Cliffhanger
Our method parsed at high accuracy, but on the disfluency detection task it was 1.1% behind Qian et al. '13 (81.4 vs. 82.5 F-score).
- How can we improve disfluency detection performance?
- How can we make the model faster and more compact so it works in real-time SLU applications?

EACL 2014
Two extensions to our prior work to achieve state-of-the-art disfluency detection performance:
- Novel disfluency-focused features (the EMNLP '13 work used standard parse features for all classifiers)
- Cascaded classifiers: a series of nested classifiers for each action, improving both speed and performance

Nested classifiers: two designs
[Figure: the two nested-classifier cascade designs]

New Features
New disfluency-specific features; some of the prominent ones:
- N-gram overlap
- N-grams after a reparandum (RP) candidate is done
- Number of common words and POS-tag sequences between the reparandum candidate and the repair
- Distance features
Different classifiers use different combinations of features.
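As an illustration of the overlap idea, a rough sketch of such features (our own; the exact feature templates in the paper differ) could be:

```python
def ngrams(tokens, n):
    """All contiguous n-grams of a token list, as a set of tuples."""
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}

def overlap_features(reparandum, repair, max_n=3):
    """Rough word-overlap features between a reparandum candidate and
    the repair that follows it (hypothetical feature set)."""
    feats = {}
    for n in range(1, max_n + 1):
        shared = ngrams(reparandum, n) & ngrams(repair, n)
        feats[f"ngram_overlap_{n}"] = len(shared)
    feats["same_first_word"] = int(bool(reparandum) and bool(repair)
                                   and reparandum[0] == repair[0])
    feats["length_diff"] = abs(len(reparandum) - len(repair))
    return feats

print(overlap_features(["to", "Boston"], ["to", "Denver"]))
# {'ngram_overlap_1': 1, 'ngram_overlap_2': 0, 'ngram_overlap_3': 0,
#  'same_first_word': 1, 'length_diff': 0}
```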

Joint Model: First Pass
Augmenting the arc-eager classifier with three additional actions is inaccurate and slow:
- Each new action type should be modeled with its own set of features
- More candidates mean more memory fetches
Instead of a one-level model, we use a sequence of increasingly complex models that progressively filter the space of possible outputs. Each classifier has its own unique feature space.
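A minimal sketch of the cascade control flow (ours, with a hypothetical classifier interface; not the paper's exact M2/M6 design): cheap classifiers rule out most actions early, and only surviving candidates reach the more expensive, feature-rich models.

```python
def choose_action(state, stages):
    """`stages` is an ordered list of classifiers, cheapest first.
    Each stage prunes the candidate actions using its own feature
    space; the final stage scores whatever survives. Illustrative only."""
    candidates = state.legal_actions()
    for clf in stages[:-1]:
        candidates = [a for a in candidates if clf.keep(state, a)]
        if len(candidates) == 1:
            return candidates[0]  # decided early: skip costlier models
    return max(candidates, key=lambda a: stages[-1].score(state, a))
```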

Evaluation: Disfluency Detection
Corpus: parsed section of Switchboard (mrg); conversion: T-surgeon and Penn2Malt
Metric: F-score of detecting reparandum
Classifier: averaged structured perceptron

Model                            | Description          | F-score
[Miller and Schuler, 2008]       | Joint + PCFG parsing | 30.6
[Lease and Johnson, 2006]        | Joint + PCFG parsing | 62.4
[Kahn et al., 2005]              | TAG + LM reranking   | 78.2
[Qian and Liu, 2013] (opt.)      | IOB tagging          | 82.5* (previous best)
Flat model                       | Arc-eager parsing    | 41.5
EMNLP '13, two classifiers (M2)  | Arc-eager parsing    | 81.4
EACL '14, two classifiers (M2)   | Arc-eager parsing    | 82.2
EACL '14, six classifiers (M6)   | Arc-eager parsing    | 82.6

* We also performed 10-fold cross-validation tests on Switchboard; M2 outperforms Qian and Liu by 0.6.

Other Evaluations
- Speed: M6 is 4 times faster than M2
- Features: M6 has 50% fewer features than M2
- Parse score: M6 is slightly better than M2 and within 2.5 points of the gold-standard trees

Conclusions
A state-of-the-art disfluency detection algorithm that also produces accurate dependency parses:
- New features + engineering → improved performance
- Runs in linear time → very fast!
- Incremental, so it could be coupled with incremental speech and dialogue processing
- Future work: acoustic features, beam search, etc.
Special note: our approach has since been surpassed by Honnibal et al. (TACL, to appear; 84%).

Thanks!
Mohammad S. Rasooli:
Joel Tetreault:

Evaluation: Parsing

Model          | MS/Sentence | # Features | Unlabeled Acc. F-score
Flat model     | 184         | 9M         | –
2 classifiers  | 128         | 2M         | –
6 classifiers  | 36          | 1M         | –

Lower-bound F-score: 70.2; upper-bound F-score: 90.2

Action: IJ
IJ[i]: remove the first i words from the buffer and tag them as interjection (IJ)
Before: Stack: [want, flight, to, Boston] | Buffer: [uh, I, mean, to, Denver]
Action: IJ[1]
After:  Stack: [want, flight, to, Boston] | Buffer: [I, mean, to, Denver]

Action: DM
DM[i]: remove the first i words from the buffer and tag them as discourse marker (DM)
Before: Stack: [want, flight, to, Boston] | Buffer: [I, mean, to, Denver]
Action: DM[2]
After:  Stack: [want, flight, to, Boston] | Buffer: [to, Denver]

Action: RP
RP[i:j]: from the words outside the buffer, remove the not-yet-removed words i to j and tag them as reparandum (RP)
Before: Stack: [want, flight, to, Boston] | Buffer: [to, Denver]
Action: RP[5:6]
After:  Stack: [want, flight] | Buffer: [to, Denver]
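Putting the three actions together, here is a small self-contained sketch (our own toy implementation, not the authors' parser) that replays the disfluency steps of the running example:

```python
class ParserState:
    """Toy arc-eager state with the three disfluency actions.
    Tokens are (index, word) pairs; arcs are (head, dep, label) triples."""

    def __init__(self, words):
        self.buffer = list(enumerate(words, start=1))
        self.stack = [(0, "Root")]
        self.arcs = []
        self.tags = {}  # token index -> "IJ" / "DM" / "RP"

    def ij(self, i):
        """IJ[i]: drop the first i buffer words as interjections."""
        for idx, _ in self.buffer[:i]:
            self.tags[idx] = "IJ"
        self.buffer = self.buffer[i:]

    def dm(self, i):
        """DM[i]: drop the first i buffer words as discourse markers."""
        for idx, _ in self.buffer[:i]:
            self.tags[idx] = "DM"
        self.buffer = self.buffer[i:]

    def rp(self, i, j):
        """RP[i:j]: among words already outside the buffer, delete words
        i..j (inclusive) as reparandum, plus any arcs touching them."""
        doomed = set(range(i, j + 1))
        for idx in doomed:
            self.tags[idx] = "RP"
        self.stack = [(k, w) for k, w in self.stack if k not in doomed]
        self.arcs = [(h, d, l) for h, d, l in self.arcs
                     if h not in doomed and d not in doomed]

s = ParserState("I want a flight to Boston uh I mean to Denver".split())
# Assume the standard shift/arc actions have already consumed the first
# six words, leaving this mid-parse configuration:
s.stack = [(0, "Root"), (2, "want"), (4, "flight"), (5, "to"), (6, "Boston")]
s.buffer = s.buffer[6:]          # [uh_7, I_8, mean_9, to_10, Denver_11]
s.ij(1)                          # uh
s.dm(2)                          # I mean
s.rp(5, 6)                       # to Boston (and their arcs, if any)
print([w for _, w in s.stack])   # ['Root', 'want', 'flight']
print([w for _, w in s.buffer])  # ['to', 'Denver']
```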

Example
Sentence: I_1 want_2 a_3 flight_4 to_5 Boston_6 uh_7 I_8 mean_9 to_10 Denver_11
(The arcs subj, root, det, dobj, prep, pobj above the sentence show the target parse.)
Initial state: Stack: [Root_0] | Buffer: [I_1, want_2, a_3, flight_4, to_5, Boston_6, uh_7, I_8, mean_9, to_10, Denver_11]
Action: Shift

Example: Shift
Stack: [Root_0]
Buffer: [I_1, want_2, a_3, flight_4, to_5, Boston_6, uh_7, I_8, mean_9, to_10, Denver_11]
(I_1 is shifted onto the stack)

Example: Left-arc: subj
Stack: [Root_0, I_1]
Buffer: [want_2, a_3, flight_4, to_5, Boston_6, uh_7, I_8, mean_9, to_10, Denver_11]
(arc want_2 → I_1 is added and I_1 is popped)

Example: Right-arc: root
Stack: [Root_0]
Buffer: [want_2, a_3, flight_4, to_5, Boston_6, uh_7, I_8, mean_9, to_10, Denver_11]
(arc Root_0 → want_2 is added and want_2 is pushed)

Example: Shift
Stack: [Root_0, want_2]
Buffer: [a_3, flight_4, to_5, Boston_6, uh_7, I_8, mean_9, to_10, Denver_11]
(a_3 is shifted onto the stack)

Example: Left-arc: det
Stack: [Root_0, want_2, a_3]
Buffer: [flight_4, to_5, Boston_6, uh_7, I_8, mean_9, to_10, Denver_11]
(arc flight_4 → a_3 is added and a_3 is popped)

Example: Right-arc: dobj
Stack: [Root_0, want_2]
Buffer: [flight_4, to_5, Boston_6, uh_7, I_8, mean_9, to_10, Denver_11]
(arc want_2 → flight_4 is added and flight_4 is pushed)

Example: Right-arc: prep
Stack: [Root_0, want_2, flight_4]
Buffer: [to_5, Boston_6, uh_7, I_8, mean_9, to_10, Denver_11]
(arc flight_4 → to_5 is added and to_5 is pushed)

Example: Right-arc: pobj
Stack: [Root_0, want_2, flight_4, to_5]
Buffer: [Boston_6, uh_7, I_8, mean_9, to_10, Denver_11]
(arc to_5 → Boston_6 is added and Boston_6 is pushed; the parse then continues with the disfluency actions shown earlier)