CL Research ACL 2014 1 Pattern Dictionary of English Prepositions (PDEP) Ken Litkowski CL Research 9208 Gue Road Damascus,

Presentation transcript:

Pattern Dictionary of English Prepositions (PDEP)
Ken Litkowski
CL Research, 9208 Gue Road, Damascus, Maryland, USA

PDEP Objectives
A new lexical resource for the study of preposition behavior
An environment for characterizing all English prepositions
  – A detailed examination of their prototypical syntagmatic patterns
  – Based on representative corpus instances (47285 sentences from the BNC)
  – Characterizing the preposition objects (complements) and the point of attachment (governor)
  – Within a semantic framework of traditional English grammar (Quirk et al., 1985)

PDEP Motivations
Need for a representative corpus of prepositions
  – Results from SemEval 2007 preposition WSD did not generalize
  – Decline from 88.4 percent to 39.4 percent accuracy
  – Results skewed by reliance on FrameNet instances
Value of prepositional phrases in joint modeling with verbs for semantic role labeling
Put prepositions into a consistent theoretical lexicographic framework
  – Follow principles of Hanks's theory of norms and exploitations
  – Interface with the Pattern Dictionary of English Verbs (PDEV), built with corpus pattern analysis (CPA)

PDEP Design Considerations
Provide an interface to facilitate examination of corpus evidence
  – Modeled behavior on, and used code from, the CPA interface of PDEV
  – Integrated tagging of corpus instances with PDEP patterns (senses)
  – Add capability to examine features of preposition behavior
Expand the fields of TPP (The Preposition Project) to capture syntactic and semantic features of preposition use
  – Generate dependency parses for corpus instances (Tratz)
  – Exploit semantic and syntactic features (including WordNet)
  – Add other resources (FrameNet and VerbNet)
Add capability for analysis of preposition classes

The Preposition and Pattern Inventories
Preposition inventory
  – Lists 304 single-word and phrasal prepositions
  – Number of patterns for each
  – Number of instances from three corpora (FrameNet, Oxford English Corpus, TPP), with number tagged in TPP
  – Target size for TPP instances was 250
Pattern list for each preposition
  – Each sense shows sense number, number of instances in each corpus, syntagmatic pattern, and primary implicature
Pattern details (pattern box)
  – Syntactic and semantic properties of the complement and the governor (TPP data, feature selectors, ontological categories)
  – Semantic class, semantic type, cluster (Tratz), and relation (Srikumar)
  – Syntactic function and meaning (from Quirk)
  – Substitutable prepositions
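
The fields listed for each pattern can be pictured as a simple record. The sketch below is only an illustration: the class name, the field names, the corpus keys, and all values are assumptions invented for this example, not the actual PDEP schema or data.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class PatternEntry:
    """One pattern (sense) of a preposition. All field names are hypothetical."""
    preposition: str
    sense_number: str
    syntagmatic_pattern: str
    primary_implicature: str
    # Instance counts keyed by corpus label (labels are assumed here)
    instances_by_corpus: Dict[str, int] = field(default_factory=dict)
    semantic_class: str = ""
    substitutable_prepositions: List[str] = field(default_factory=list)

    def total_instances(self) -> int:
        """Sum the instance counts over all corpora."""
        return sum(self.instances_by_corpus.values())


# Toy values made up for illustration; they are not taken from PDEP.
entry = PatternEntry(
    preposition="during",
    sense_number="1(1)",
    syntagmatic_pattern="[[Governor]] during [[Complement]]",
    primary_implicature="throughout the course of a period of time",
    instances_by_corpus={"FrameNet": 10, "OEC": 20, "TPP": 250},
    semantic_class="Temporal",
    substitutable_prepositions=["throughout"],
)
print(entry.preposition, entry.sense_number, entry.total_instances())
```

Keeping the per-corpus counts in a dictionary, rather than in three fixed fields, is a design choice of this sketch so that further corpora could be added without changing the record.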

[Figure: Preposition Inventory (Fragment), with Preposition Pattern List (below)]

[Figure: Preposition Pattern Details]

Tagging Process
Starts with display of TPP instances (sentences) not yet tagged
Examination of complement and governor features (including WordNet, FrameNet, and VerbNet)
Comparison with existing pattern (sense) inventory
Selecting instances and tagging with a sense
  – Adding senses as needed
  – Identifying ill-formed instances
Further analysis of instances with tagged senses to characterize behavior

Feature Examination
Word-Finding Rules
  – Governor (verb or head to the left, head to the left, verb to the left, word to the left, governor)
  – Complement (syntactic preposition complement, heuristic preposition complement)
Feature Extraction Rules
  – Word class, part of speech, lemma, word, WN lexical name, WN synonyms, WN hypernyms, whether capitalized, affixes
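
The pairing of word-finding rules with feature extraction rules can be sketched roughly as follows. This is a minimal illustration, not the PDEP implementation: the token format, the function names, the Penn Treebank-style tags, and the three-character suffix are all assumptions, and the WordNet-based features are left out to keep the example self-contained.

```python
from typing import Dict, List, Optional

Token = Dict[str, str]  # assumed keys: "word", "lemma", "pos"

# Assumed mapping from tag prefix to a coarse word class
COARSE_CLASS = {"NN": "noun", "VB": "verb", "JJ": "adjective", "RB": "adverb"}


def verb_to_the_left(tokens: List[Token], prep_index: int) -> Optional[Token]:
    """Word-finding rule: the nearest verb before the preposition."""
    for tok in reversed(tokens[:prep_index]):
        if tok["pos"].startswith("VB"):
            return tok
    return None


def heuristic_complement(tokens: List[Token], prep_index: int) -> Optional[Token]:
    """Word-finding rule: last token of the first noun/pronoun run to the right."""
    head = None
    for tok in tokens[prep_index + 1:]:
        if tok["pos"].startswith("NN") or tok["pos"] == "PRP":
            head = tok      # keep extending while the run continues
        elif head is not None:
            break           # the run has ended
    return head


def extract_features(tok: Token) -> Dict[str, object]:
    """Feature extraction rules applied to one word found by a rule above."""
    word = tok["word"]
    return {
        "word": word,
        "lemma": tok["lemma"],
        "pos": tok["pos"],
        "word_class": COARSE_CLASS.get(tok["pos"][:2], "other"),
        "capitalized": word[:1].isupper(),
        "suffix": word[-3:].lower(),  # crude stand-in for an affix feature
    }


# Toy, hand-tagged sentence: "She arrived in London"
sentence = [
    {"word": "She", "lemma": "she", "pos": "PRP"},
    {"word": "arrived", "lemma": "arrive", "pos": "VBD"},
    {"word": "in", "lemma": "in", "pos": "IN"},
    {"word": "London", "lemma": "London", "pos": "NNP"},
]
prep_index = 2
governor_features = extract_features(verb_to_the_left(sentence, prep_index))
complement_features = extract_features(heuristic_complement(sentence, prep_index))
print(governor_features)
print(complement_features)
```

Each combination of a word-finding rule and a feature extraction rule yields one feature, so the same extractor can be reused for the governor and the complement.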

[Figure: Examining FrameNet Lexical Units and VerbNet Classes]

[Figure: Selecting and Tagging Instances]

Preposition Class Analyses
Corpus evidence and tagging provide a check on class assignments (and reveal past inconsistencies)
Substitutable prepositions (Yuret) and collapsing semantically related senses across prepositions (Srikumar & Roth)
  – E.g., for the temporal class, 21 senses of 14 prepositions in Srikumar and 62 senses of 50 prepositions in PDEP
Quirk paragraphs provide an organizing principle
  – PDEP enables a bottom-up approach, building details for an individual sense
  – Proceeds by organizing nuances across prepositions
  – Generalizes complement and governor behavior for the class
Provides a basis for enhanced cross-preposition analysis
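
Counts of the form "N senses of M prepositions" for a class amount to a simple grouping over the sense inventory. The sketch below assumes each sense is available as a (preposition, sense id, semantic class) triple; the triples shown are invented toy data, not PDEP assignments.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

# Toy (preposition, sense id, semantic class) triples, invented for illustration
tagged_senses = [
    ("during", "1(1)", "Temporal"),
    ("after", "1(1)", "Temporal"),
    ("after", "2(1a)", "Temporal"),
    ("in", "1(1)", "Spatial"),
    ("in", "5(3)", "Temporal"),
]


def class_summary(
    senses: Iterable[Tuple[str, str, str]]
) -> Dict[str, Tuple[int, int]]:
    """Map each class to (number of senses, number of distinct prepositions)."""
    sense_counts: Dict[str, int] = defaultdict(int)
    prepositions: Dict[str, set] = defaultdict(set)
    for prep, _sense_id, sem_class in senses:
        sense_counts[sem_class] += 1
        prepositions[sem_class].add(prep)
    return {c: (sense_counts[c], len(prepositions[c])) for c in sense_counts}


summary = class_summary(tagged_senses)
for sem_class, (n_senses, n_preps) in sorted(summary.items()):
    print(f"{sem_class}: {n_senses} senses of {n_preps} prepositions")
```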

Future Developments
Completion of tagging (now at 23%)
Identifying complement and governor in sentence display
Additional download options
  – Access to PHP scripts
  – Download of full, up-to-date data sets
Collocation analysis
  – Processing of instances with the USAS tagger (UCREL Semantic Analysis System)

Evaluation of Preposition Data
Essential to drive future developments on the utility of PDEP data
  – All sentences available in SemEval lexical sample format
  – PDEP data available online in JavaScript Object Notation (JSON)
Use of data in SemEval tasks (TempEval, SpaceEval, CauseEval?)
Potential SemEval 2016 task on dictionary entry building (modeled on SemEval 2015 CPA task)
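
Because the data are offered as JSON, they can be read with a standard JSON parser. The sketch below is only an assumption about what such a download might look like: the keys `prep`, `sense`, and `sentence` and the records themselves are hypothetical, and the actual PDEP layout may differ. It computes the share of instances that carry a sense tag and the number of instances per sense.

```python
import json
from collections import Counter

# Hypothetical records; "sense" is null for instances not yet tagged.
raw = """
[
  {"prep": "about", "sense": "1(1)", "sentence": "They talked about the plan."},
  {"prep": "about", "sense": "1(1)", "sentence": "A book about birds."},
  {"prep": "about", "sense": "3(2)", "sentence": "He wandered about the town."},
  {"prep": "about", "sense": null, "sentence": "There was something odd about it."}
]
"""

records = json.loads(raw)

# Keep only the instances that already have a sense tag
tagged = [r for r in records if r["sense"] is not None]
coverage = len(tagged) / len(records)

# Count tagged instances per (preposition, sense) pair
per_sense = Counter((r["prep"], r["sense"]) for r in tagged)

print(f"tagged: {coverage:.0%}")
for (prep, sense), count in per_sense.most_common():
    print(prep, sense, count)
```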

NLP Community Involvement
Volunteers to help with tagging and preposition characterization
What do you want?
Suggestions for incorporation of additional resources
Critiques of existing structures
Suggestions for further analyses

Summary and Conclusions
The Pattern Dictionary of English Prepositions (PDEP) is a new lexical resource for the study of preposition behavior
  – Provides sentences as a representative sample
  – All sentences are dependency-parsed, with features to describe preposition behavior
PDEP has been designed so that users can explore and download any of the available data