UAM CorpusTool: An Overview Debopam Das Discourse Research Group Department of Linguistics Simon Fraser University Feb 5, 2014.



Outline
- UAM CorpusTool (O’Donnell, 2008)
  - Tool description
  - A short tutorial
- Annotating signals of coherence relations with UAM CorpusTool

UAM CorpusTool
- Created by Mick O’Donnell in 2008
- Replaces the earlier Systemic Coder, which allowed coding of single documents at a single layer
- Available at
- Runs on Windows and Mac OS
- “… primarily aimed at the linguist or computational linguist who does not program, and would rather spend their time annotating text than learning how to use the system.” (O’Donnell, 2008: 13)

UAM CorpusTool
- Annotate documents
  - Text type, writer characteristics, register, etc.
- Annotate segments
  - Tagging sections of a text by function (abstract, introduction, body, conclusion)
  - Tagging sentences (active/passive; simple/complex) or clauses (relative/imperative/non-finite)
  - Semantic or pragmatic annotation (synonymy/antonymy; speech acts)
  - Tagging POS (noun, verb, adjective)
- Automatic grammar analysis (English only) using the Stanford parser
- Rhetorical structure annotation

Annotation in UAM CorpusTool: Main Steps
- Start a new project
- Add one or more annotation layers
  - You can use a pre-built annotation scheme or design your own
- Add files
  - Import .txt files and incorporate them
- Annotate

Annotation in UAM CorpusTool: Main Window (screenshot)

Annotation in UAM CorpusTool: Annotation Scheme (screenshots)

Annotation in UAM CorpusTool: Document Coding (screenshot)

Annotation in UAM CorpusTool: Segment Coding (screenshot)

Other Components
- Search
- Autocode
- Statistics
- Explore
- Options
- Help

Annotating Signals of Coherence Relations
- Goal
  - Annotate signals of coherence relations
- Signals of coherence relations
  - E.g., John is tall, but Mary is short.
  - One straightforward signal: the discourse marker ‘but’
  - Two further signals:
    - Antonyms (tall ~ short)
    - Parallel syntactic constructions (subj – copula – adj)
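The example sentence can carry more than one signal at once. The following is a minimal sketch of that idea, assuming two tiny hand-written lexicons (`DISCOURSE_MARKERS` and `ANTONYM_PAIRS` are hypothetical and exist only for illustration). It is not how the annotation was done, which was manual, and it does not attempt to detect the parallel syntactic construction, since that would need a parser.

```python
import re

# Toy lexicons: assumptions for illustration, not part of the project's taxonomy.
DISCOURSE_MARKERS = {"but", "however", "because", "although"}
ANTONYM_PAIRS = {frozenset(("tall", "short")), frozenset(("hot", "cold"))}


def find_signals(sentence):
    """Return a list of (signal_type, evidence) pairs found in one sentence."""
    tokens = re.findall(r"[a-z]+", sentence.lower())
    signals = []
    # Signal 1: any token that appears in the discourse marker lexicon.
    for token in tokens:
        if token in DISCOURSE_MARKERS:
            signals.append(("discourse marker", token))
    # Signal 2: any unordered token pair that appears in the antonym lexicon.
    for i, first in enumerate(tokens):
        for second in tokens[i + 1:]:
            if frozenset((first, second)) in ANTONYM_PAIRS:
                signals.append(("antonymy", f"{first} ~ {second}"))
    return signals


signals = find_signals("John is tall, but Mary is short.")
for signal_type, evidence in signals:
    print(signal_type, "->", evidence)
```

A single sentence yields two entries here, which is the reason the annotation tool needs to allow several tags on one element.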

Annotating Signals of Coherence Relations
- Annotate the RST Discourse Treebank (Carlson et al., 2002)
  - Contains 385 articles from The Wall Street Journal
  - The texts in those articles are already annotated for rhetorical (coherence) relations
  - Approx. 22,000 discourse units and 17,000 relations in total

Annotating Signals of Coherence Relations
- Requirements for an annotation tool
  - Importability: relevant data can be imported into the tool
  - Annotation scheme: support for a three-level hierarchical taxonomy
  - Customizability: easy access to the annotation scheme for editing
  - Multiple annotations: two or more tags for a single element
  - Convertibility: XML output
  - Simplicity: no advanced computational knowledge required; graphical interface
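Three of these requirements (a three-level taxonomy, multiple tags per element, XML output) can be pictured with a small sketch. Everything below is an assumption made for illustration: the taxonomy entries, the level names `class`, `type` and `specific`, and the XML element names are hypothetical and do not reproduce the project's actual scheme or UAM CorpusTool's file format.

```python
import xml.etree.ElementTree as ET

# Hypothetical three-level taxonomy: class -> type -> specific signal.
TAXONOMY = {
    "single": {
        "discourse marker": ["but", "because"],
        "semantic": ["antonymy", "synonymy"],
        "syntactic": ["parallel construction"],
    },
}


def is_valid_tag(path):
    """Check that a (class, type, specific) path exists in the taxonomy."""
    signal_class, signal_type, specific = path
    return specific in TAXONOMY.get(signal_class, {}).get(signal_type, [])


def to_xml(segment_text, tags):
    """Serialize one segment and all of its tags as an XML string."""
    segment = ET.Element("segment", {"text": segment_text})
    for path in tags:
        if not is_valid_tag(path):
            raise ValueError(f"tag not in taxonomy: {path}")
        signal_class, signal_type, specific = path
        ET.SubElement(
            segment,
            "signal",
            {"class": signal_class, "type": signal_type, "specific": specific},
        )
    return ET.tostring(segment, encoding="unicode")


# One segment carrying three tags at once (the multiple-annotations requirement).
tags = [
    ("single", "discourse marker", "but"),
    ("single", "semantic", "antonymy"),
    ("single", "syntactic", "parallel construction"),
]
xml_output = to_xml("John is tall, but Mary is short.", tags)
print(xml_output)
```

Keeping the taxonomy as plain nested data makes it easy to edit, which corresponds to the customizability requirement.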

Signalling Annotation by UAM CorpusTool
- Problem with importing data
  - UAM CorpusTool supports RST annotation and can directly import RST files
  - However, it cannot provide layered annotation on top of the RST-level structure
- Solution to the problem
  - Convert RST base files from LISP to text format
  - Import the converted files
  - This retains discourse structures and all relational information
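The slides do not show how the LISP-to-text conversion was carried out. As a rough sketch of what such a step could look like, the code below flattens a LISP-style RST tree into one text line per discourse unit. The sample string, the `_!` text delimiters, and the tab-separated output layout are assumptions made for illustration; this simplified version keeps only each unit's number, nuclearity and relation, whereas a full conversion would also have to carry the span nodes.

```python
import re

# Toy input in a LISP-like bracketed format (assumed layout, for illustration).
SAMPLE_TREE = """
( Root (span 1 2)
  ( Nucleus (leaf 1) (rel2par Contrast) (text _!John is tall,_!) )
  ( Nucleus (leaf 2) (rel2par Contrast) (text _!but Mary is short._!) )
)
"""

# Matches one leaf node: nuclearity, unit number, relation label, unit text.
LEAF_PATTERN = re.compile(
    r"\(\s*(Nucleus|Satellite)\s+\(leaf\s+(\d+)\)\s+"
    r"\(rel2par\s+([^\s)]+)\)\s+\(text\s+_!(.*?)_!\)",
    re.DOTALL,
)


def lisp_to_text(tree):
    """Return one tab-separated line per leaf: number, role, relation, text."""
    lines = []
    for role, number, relation, text in LEAF_PATTERN.findall(tree):
        lines.append(f"{number}\t{role}\t{relation}\t{text.strip()}")
    return lines


lines = lisp_to_text(SAMPLE_TREE)
print("\n".join(lines))
```

The resulting plain-text lines could then be saved as a .txt file and imported like any other document, with the relation labels still visible to the annotator.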

Signalling Annotation by UAM CorpusTool
- How did we do the rest?

Signalling Annotation by UAM CorpusTool: Annotation Scheme (screenshot)

Signalling Annotation by UAM CorpusTool: Annotation Window (screenshot)

References
Carlson, L., Marcu, D., & Okurowski, M. E. (2002). RST Discourse Treebank, LDC2002T07 [Corpus]. Philadelphia, PA: Linguistic Data Consortium.
O'Donnell, M. (2008). The UAM CorpusTool: Software for corpus annotation and exploration. Paper presented at the XXVI Congreso de AESLA, Almeria, Spain.

Thank You!