Coupled Sequence Labeling on Heterogeneous Annotations (POS tagging)
Zhenghua Li, Jiayuan Chao, Min Zhang, Wenliang Chen
Soochow University, China

An interesting problem in our mind
- Multiple labeled datasets exist, annotated under different guidelines or formulations (heterogeneous annotations).
- How can we effectively utilize such data?
- How can we train one model on heterogeneous data?

An interesting problem in our mind
[figure: CTB data + PD data => can we train a better model?]

Challenges
- How to capture the structure/tag correspondences between the two guidelines? They are usually context-dependent and hard to represent with rules.
- The two datasets (PD/CTB) are typically non-overlapping, so it is difficult to build a model that automatically learns the correspondences.

Previous work
Guide-feature based methods (stacked learning):
- Word segmentation and POS tagging (Jiang+ 09; Sun & Wan 12; Jiang+ 12; Gao+ 14)
- Dependency parsing (Li+ 12)
- Constituent treebank conversion (Zhu+ 11; Jiang+ 13)
- ...

Guide-feature based methods
[figure: PD data (中国/n) trains Tagger (PD); CTB data (中国/NR) trains Tagger (CTB)]

Guide-feature based methods
[figure: Tagger (PD) first tags the CTB data (中国/NR gets the PD tag n), and these PD tags feed Tagger (CTB) as extra guide features]

The problem with guide-feature based methods
- The methodology is not simple/elegant: it requires training and decoding twice (although it is very effective and robust across different problems, and very simple to implement).
- The source data is not fully exploited and does not directly contribute to training: the final target model never learns directly from the source sentences. (Prof. Haifeng Wang, Baidu)

This work
Directly learn from two non-overlapping datasets with heterogeneous annotations:
- Step 1: Bundle the tags from both schemes (a cross product of the two tag sets).
- Step 2: Learn with ambiguous labeling.
[figure: CTB data (中国/NR) and PD data (中国/n) jointly train a unified model: Tagger (CTB & PD)]

The big picture
[figure: CTB data (中国/NR) and PD data (中国/n) are projected into the bundled tag space CTB+PD (中国/NR_n), which trains Tagger (CTB+PD) with ambiguous labeling]
Test sentence: 中国 加油 => Output: 中国/NR_n 加油/VV_v

Illustration of bundled tags

How to create bundled tags?

Mapping functions (Qiu+ 13)
A mapping function specifies the set of bundled tags, covering all possible mappings between the two annotation schemes in both directions, e.g.:
- NN => {n, vn, an, v}: the CTB tag NN may pair with these PD tags
- n <= {NN, NR, NT}: the PD tag n may pair with these CTB tags
A mapping function defines the search/label space of our model.

Mapping functions (Qiu+ 13) Tight mapping function: 145 tags Automatic mapping function: 346 tags Relaxed mapping function: 179 tags Complete mapping function: 1,254 tags (33 × 38)
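For concreteness, here is a minimal Python sketch (not the authors' code; the tag sets are abbreviated and the mapping entries are invented for illustration) of how a mapping function induces a bundled tag space:

```python
from itertools import product

CTB_TAGS = ["NN", "NR", "NT", "VV", "AD"]    # the full CTB set has 33 tags
PD_TAGS = ["n", "nt", "nr", "v", "vn", "d"]  # the full PD set has 38 tags

# One possible (tight-style) mapping: each CTB tag lists its allowed PD tags.
TIGHT_MAP = {
    "NN": {"n", "vn"},
    "NR": {"nr", "n"},
    "NT": {"nt"},
    "VV": {"v", "vn"},
    "AD": {"d"},
}

def bundled_tags(mapping):
    """Enumerate the bundled tags 'CTB_PD' licensed by a mapping function."""
    return {f"{c}_{p}" for c, pds in mapping.items() for p in pds}

def complete_bundled_tags():
    """The complete mapping: the full cross product of the two tag sets."""
    return {f"{c}_{p}" for c, p in product(CTB_TAGS, PD_TAGS)}

print(sorted(bundled_tags(TIGHT_MAP)))   # 8 bundled tags for this toy map
print(len(complete_bundled_tags()))      # 5 x 6 = 30 for the toy sets
```

On the real tag sets, the complete mapping is the full cross product, which is why it yields 33 x 38 = 1,254 bundled tags.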

CTB5/PD now fall in the same bundled tag space

The coupled model in the bundled tag space
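The slide's formula is an image; a standard log-linear (CRF) formulation over bundled tag sequences t = (t_CTB, t_PD), consistent with the surrounding description (a reconstruction, not copied from the slide), is:

```latex
% Coupled CRF over bundled tag sequences; \mathcal{T}(\mathbf{x}) is the
% search space licensed by the mapping function.
p(\mathbf{t} \mid \mathbf{x}; \boldsymbol{\theta})
  = \frac{\exp\big(\boldsymbol{\theta} \cdot \mathbf{f}(\mathbf{x}, \mathbf{t})\big)}
         {\sum_{\mathbf{t}' \in \mathcal{T}(\mathbf{x})}
          \exp\big(\boldsymbol{\theta} \cdot \mathbf{f}(\mathbf{x}, \mathbf{t}')\big)}
```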

Features in the coupled model
Example: 外交部_1 (Foreign Ministry) 发言人_2 (spokesman) 答_3 (answers); for 外交部 with bundled tag NN_n:
- Joint features, e.g. NN_n^外交部: the bundled tag combined with the context
- Separate features, e.g. NN^外交部 and n^外交部: each single-side tag combined with the context
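A hedged sketch of how joint and separate features could be generated for one token (the real templates cover many more context patterns; this is not the authors' full template set):

```python
# Joint features use the bundled tag; separate features use each
# single-side tag, acting as back-off features.

def token_features(word, ctb_tag, pd_tag):
    """Return joint and separate features for one word, following the
    slide's example."""
    bundled = f"{ctb_tag}_{pd_tag}"
    joint = [f"{bundled}^{word}"]        # e.g. "NN_n^外交部"
    separate = [f"{ctb_tag}^{word}",      # e.g. "NN^外交部"
                f"{pd_tag}^{word}"]       # e.g. "n^外交部"
    return joint + separate

print(token_features("外交部", "NN", "n"))
```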

What is the benefit of this model?
- Both datasets are directly used for training.
- It can use both joint and separate features:
  - joint features capture the implicit correspondences between the annotations;
  - separate features function as back-off/base features.

How to train the model?

Ambiguous labeling (also called partial annotation or natural annotation): relaxed/weak supervision
- Multilingual transfer for dependency parsing (Tackstrom+ 13)
- Semi-supervised dependency parsing (Li+ 14a, 14b)
- Word segmentation (Jiang+ 13; Liu+ 14; Yang and Vozila 14)

Ambiguous labeling (a PD sentence)
外交部_1 (Foreign Ministry) 发言人_2 (spokesman) 答_3 (answers) 记者_4 (reporters') 问_5 (questions), with PD tags nt n v n vn.
Each PD tag expands into a set of candidate bundled tags:
- nt => {nt_NT, nt_NN, nt_NR, nt_AD, ...}
- n => {n_NN, n_NT, n_NR, ...}
- v => {v_VV, v_VC, v_VE, v_AD, ...}
- vn => {vn_NN, vn_NR, vn_VV, ...}

Train with ambiguous labeling
- Maximize the likelihood of the data, i.e., the probability of a set of paths: the sum of the probabilities of all paths in the set.
- This can be solved with the forward-backward algorithm.
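Written out (a reconstruction from the description above, not copied from the slides), the objective for a training set D whose sentences x come with candidate path sets Y(x) is:

```latex
% Ambiguous-labeling likelihood: the summed probability of every path in
% the candidate set; both this constrained sum and the partition function
% are computable with the forward-backward algorithm.
\mathcal{L}(\boldsymbol{\theta})
  = \sum_{(\mathbf{x}, \mathcal{Y}(\mathbf{x})) \in \mathcal{D}}
    \log \sum_{\mathbf{t} \in \mathcal{Y}(\mathbf{x})}
    p(\mathbf{t} \mid \mathbf{x}; \boldsymbol{\theta})
```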

How to merge two training datasets?
[figure: CTB data (中国/NR) and PD data (中国/n) feed one unified model: Tagger (CTB & PD)]

SGD training
For each iteration:
- randomly sample N sentences from CTB and M sentences from PD;
- shuffle them into the training data (N+M sentences) for the current iteration.
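A minimal sketch of this merging scheme (N, M, and the update call are assumptions for illustration, not the released code):

```python
import random

def iteration_data(ctb, pd, n, m, rng=random):
    """One SGD iteration's training set: N sentences sampled from CTB
    plus M sentences sampled from PD, shuffled together."""
    batch = rng.sample(ctb, n) + rng.sample(pd, m)
    rng.shuffle(batch)
    return batch

# for it in range(iterations):
#     for sent in iteration_data(ctb_data, pd_data, N, M):
#         sgd_update(model, sent)   # hypothetical CRF gradient step
```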

Previous work (Qiu+ 13)
We are directly inspired by their work. Differences from ours:
- a linear model with perceptron-like training;
- only separate features are explored;
- approximate decoding;
- relies on manually designed mapping functions.

Experiments
[table: data statistics]
Newly annotated data for conversion evaluation (partial annotation: the 20% most difficult tokens).

Effect of mapping functions
The coupled CRF can learn the mapping without linguistic inputs/constraints.

Effect of weighting CTB and PD
Decide the balance when merging the two training datasets.

Final results on CTB5-test
Slightly yet significantly better than the baselines.

Feature study
All features contribute.

Conversion Accuracy (PD => CTB) Significantly better than baselines

Using Converted PD Slight accuracy decrease; much more efficient

Conclusions
- We propose a coupled CRF model for utilizing multiple heterogeneously labeled datasets.
- The model can effectively learn the implicit mappings between annotations, without the need for a manually designed mapping function.
- It is effective on both one-side POS tagging and POS conversion/transfer tasks.
- We have partially annotated 1,000 sentences for POS-tag conversion evaluation.

Future directions
- Annotate more data with both CTB and PD tags, and investigate the coupled model with a small amount of such annotation as extra training data.
- Propose a more principled and theoretically sound method for merging multiple training datasets.
- Address the efficiency issue.
- Handle the word segmentation guidelines, which also differ between the corpora and are ignored in this work.

Thanks for your time! Questions?
Code, newly annotated data, and other resources are released at [link] for non-commercial usage.

Ongoing work
- Our approach is also effective on the word segmentation task.
- We are adapting our approach to dependency parsing.

Coupled model used for conversion: constrained decoding
For PD => CTB conversion, the search space is constrained by the PD-side tags.
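A hedged Python sketch of this constraint (the decoder call is hypothetical; bundled tags are assumed to be 'CTB_PD' strings; only the lattice construction is shown):

```python
# Constrained decoding for PD => CTB conversion: each token keeps only
# the bundled tags whose PD side matches its observed PD tag.

def constrained_lattice(pd_tags, bundled_tag_set):
    """Per-token candidate lists; bundled tags are 'CTB_PD' strings."""
    return [[t for t in bundled_tag_set if t.split("_")[1] == pd]
            for pd in pd_tags]

# best = viterbi(sentence, constrained_lattice(pd_tags, tags))  # hypothetical decoder
# ctb_tags = [t.split("_")[0] for t in best]  # read off the CTB side
```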

The big picture (conversion)
[figure: the same Tagger (CTB+PD), trained with ambiguous labeling in the bundled tag space, is applied with the PD side fixed]
Test sentence: 中国/?_n 加油/?_v => Output: 中国/NR_n 加油/VV_v

Data annotation

Domain adaptation
Previous studies suggest that directly combining out-of-domain and in-domain training data does not lead to an optimal model.