a Terminological and Statistical Approach

Slides:



Advertisements
Similar presentations
The impact of Grey Literature in the web environment: A citation analysis using Google Scholar Rosa Di Cesare, Daniela Luzi, Roberta Ruggieri Consiglio.
Advertisements

Introduction to Computational Linguistics
Introduction to Computational Linguistics
Proceedings of the Conference on Intelligent Text Processing and Computational Linguistics (CICLing-2007) Learning for Semantic Parsing Advisor: Hsin-His.
New Technologies Supporting Technical Intelligence Anthony Trippe, 221 st ACS National Meeting.
SEARCHING QUESTION AND ANSWER ARCHIVES Dr. Jiwoon Jeon Presented by CHARANYA VENKATESH KUMAR.
Prof. Carolina Ruiz Computer Science Department Bioinformatics and Computational Biology Program WPI WELCOME TO BCB4003/CS4803 BCB503/CS583 BIOLOGICAL.
INTERNATIONAL CONFERENCE ON NATURAL LANGUAGE PROCESSING NLP-AI IIIT-Hyderabad CIIL, Mysore ICON DECEMBER, 2003.
Languages & The Media, 5 Nov 2004, Berlin 1 New Markets, New Trends The technology side Stelios Piperidis
Dialogue – Driven Intranet Search Suma Adindla School of Computer Science & Electronic Engineering 8th LANGUAGE & COMPUTATION DAY 2009.
The Unreasonable Effectiveness of Data Alon Halevy, Peter Norvig, and Fernando Pereira Kristine Monteith May 1, 2009 CS 652.
Introduction to Computational Linguistics Lecture 2.
Semantic Web and Web Mining: Networking with Industry and Academia İsmail Hakkı Toroslu IST EVENT 2006.
Flow Network Models for Sub-Sentential Alignment Ying Zhang (Joy) Advisor: Ralf Brown Dec 18 th, 2001.
3-1 Chapter 3 Data and Knowledge Management
Introduction to CL Session 1: 7/08/2011. What is computational linguistics? Processing natural language text by computers  for practical applications.
1 Information Retrieval and Extraction 資訊檢索與擷取 Chia-Hui Chang, Assistant Professor Dept. of Computer Science & Information Engineering National Central.
Information retrieval Finding relevant data using irrelevant keys Example: database of photographic images sorted by number, date. DBMS: Well structured.
تمرين شماره 1 درس NLP سيلابس درس NLP در دانشگاه هاي ديگر ___________________________ راحله مکي استاد درس: دکتر عبدالله زاده پاييز 85.
Chapter 14 The Second Component: The Database.
1/23 Applications of NLP. 2/23 Applications Text-to-speech, speech-to-text Dialogues sytems / conversation machines NL interfaces to –QA systems –IR systems.
Methodology Conceptual Database Design
Enhance legal retrieval applications with an automatically induced knowledge base Ka Kan Lo.
Tools and resources supporting the cultural tourism Istituto di Linguistica Computazionale “Antonio Zampolli” CNR - Pisa GL14: November 28, Sassolini.
GL12 Conf. Dec. 6-7, 2010NTL, Prague, Czech Republic Extending the “Facets” concept by applying NLP tools to catalog records of scientific literature *E.
Semantic Web Technologies Lecture # 2 Faculty of Computer Science, IBA.
ELN – Natural Language Processing Giuseppe Attardi
Evaluating the Contribution of EuroWordNet and Word Sense Disambiguation to Cross-Language Information Retrieval Paul Clough 1 and Mark Stevenson 2 Department.
Claudia Marzi Institute for Computational Linguistics, “Antonio Zampolli” – Italian National Research Council University of Pavia – Dept. of Theoretical.
Gabriella Pardelli, Manuela Sassi, Sara Goggi Istituto di Linguistica Computazionale “Antonio Zampolli”, ILC Consiglio Nazionale delle Ricerche CNR- Pisa,
Automatic Lexical Annotation Applied to the SCARLET Ontology Matcher Laura Po and Sonia Bergamaschi DII, University of Modena and Reggio Emilia, Italy.
Machine Translation, Digital Libraries, and the Computing Research Laboratory Indo-US Workshop on Digital Libraries June 23, 2003.
Nancy Lawler U.S. Department of Defense ISO/IEC Part 2: Classification Schemes Metadata Registries — Part 2: Classification Schemes The revision.
Natural Language Processing Guangyan Song. What is NLP  Natural Language processing (NLP) is a field of computer science and linguistics concerned with.
Suléne Pilon & Danie Prinsloo Overview: Teaching and Training in South Africa 25 November 2008;
L’età della parola Giuseppe Attardi Dipartimento di Informatica Università di Pisa ESA SoBigDataPisa, 24 febbraio 2015.
NLP And The Semantic Web Dainis Kiusals COMS E6125 Spring 2010.
1 Statistical NLP: Lecture 9 Word Sense Disambiguation.
WORD SENSE DISAMBIGUATION STUDY ON WORD NET ONTOLOGY Akilan Velmurugan Computer Networks – CS 790G.
W ORD S ENSE D ISAMBIGUATION By Mahmood Soltani Tehran University 2009/12/24 1.
RCDL Conference, Petrozavodsk, Russia Context-Based Retrieval in Digital Libraries: Approach and Technological Framework Kurt Sandkuhl, Alexander Smirnov,
Virach Sornlertlamvanich Information R&D Division (iTech) National Electronics and Computer Technology Center (NECTEC) THAILAND 19 January 2001 Symposium.
Terminology and documentation*  Object of the study of terminology:  analysis and description of the units representing specialized knowledge in specialized.
Decision Support Systems
BAA - Big Mechanism using SIRA Technology Chuck Rehberg CTO at Trigent Software and Chief Scientist at Semantic Insights™
GREY LITERATURE AND COMPUTATIONAL LINGUISTICS: FROM PAPER TO NET Claudia Marzi, Gabriella Pardelli, Manuela Sassi Istituto di Linguistica Computazionale.
1 CSI 5180: Topics in AI: Natural Language Processing, A Statistical Approach Instructor: Nathalie Japkowicz Objectives of.
Collocations and Information Management Applications Gregor Erbach Saarland University Saarbrücken.
NLP ? Natural Language is one of fundamental aspects of human behaviors. One of the final aim of human-computer communication. Provide easy interaction.
October 2005CSA3180 NLP1 CSA3180 Natural Language Processing Introduction and Course Overview.
Computational Linguistics. The Subject Computational Linguistics is a branch of linguistics that concerns with the statistical and rule-based natural.
For Monday Read chapter 24, sections 1-3 Homework: –Chapter 23, exercise 8.
For Monday Read chapter 26 Last Homework –Chapter 23, exercise 7.
Terminology and documentation*  Object of the study of terminology:  analysis and description of the units representing specialized knowledge in specialized.
CS460/IT632 Natural Language Processing/Language Technology for the Web Lecture 1 (03/01/06) Prof. Pushpak Bhattacharyya IIT Bombay Introduction to Natural.
Mining Dependency Relations for Query Expansion in Passage Retrieval Renxu Sun, Chai-Huat Ong, Tat-Seng Chua National University of Singapore SIGIR2006.
GL15 Grey Literature Bratislava 2-3 december 2013 Industrial Philology: problems and techniques of data and archives preservation for future generations.
NATURAL LANGUAGE PROCESSING Zachary McNellis. Overview  Background  Areas of NLP  How it works?  Future of NLP  References.
1 An Introduction to Computational Linguistics Mohammad Bahrani.
Digital University of Pisa Alessandro Lenci CoLing Lab – Laboratorio di Linguistica Computazionale Università di Pisa Aix-Marseille Université.
Multilingual Search Shibamouli Lahiri
For Monday Read chapter 26 Homework: –Chapter 23, exercises 8 and 9.
Overview of Statistical NLP IR Group Meeting March 7, 2006.
Semantic Wiki: Automating the Read, Write, and Reporting functions Chuck Rehberg, Semantic Insights.
Semantic Web Technologies Readings discussion Research presentations Projects & Papers discussions.
Introduction to Machine Translation
TERMINOLOGY AND TRANSLATION
How to publish in a format that enhances literature-based discovery?
LINGUA INGLESE 2A – a.a. 2018/2019 Computer-Aided Translation Technology LESSON 3 prof. ssa Laura Liucci –
Knowledge Representation for Natural Language Understanding
Presentation transcript:

a Terminological and Statistical Approach Istituto di Linguistica Computazionale - Pisa Grey Literature for Natural Language Processing: a Terminological and Statistical Approach Laura Cignoni, Gabriella Pardelli, Manuela Sassi Istituto di Linguistica Computazionale (ILC) Consiglio Nazionale delle Ricerche (CNR) - Pisa, Italy

Natural Language Processing (NLP) and Computational Linguistics (CL) Natural Language Processing is a branch of computer science that studies computer systems for processing natural languages (Cunningham: 1999) Computational Linguistics is a branch of linguistics in which computational techniques and concepts are applied to the elucidation of linguistic and phonetic problems (Crystal: 1991) The two expressions are often used indifferently

AIM Our aim is to contribute to the creation of language resources of grey literature terms of the last decades. This can help prevent the disappearance of documents containing words that have undergone rapid changes and that represent the main anchors for information retrieval

Statistical representation The most significant old and new terms relative to grey literature in the field of natural language processing and other interrelated disciplines have been associated, highlighting the terminological changes that have taken place in the course of time

ROLE OF TERMINOLOGY As the queries are often incorrect, inappropriate, or simply far too general, it is necessary to integrate pre-existing or obsolete words and expressions used by specialists in the different domains to create a synonym relationship between the terms contained in the different NLP documents. In this way a term, even if dated and no longer in use, can become the key to enter the world of knowledge

GREY LITERATURE CORPUS Our grey literature corpus is composed of ca 13,000 records corresponding to the titles of papers presented at International Conferences in the field of natural language processing (1950 to June 2008)

SOURCES - ACL Anthology - LREC Conferences - Weaver Memorial The main sources for our Corpus include: - ACL Anthology - LREC Conferences - Weaver Memorial - Alpac Report - Conferences on Automatic Translation

Methodology The methodology used is the following: Search and saving of the most common single terms which are the object of this study Extraction of the contexts with year and abbreviation of the conference Generation of tables according to the chronological use of these terms Creation of charts

Words extracted from GL Corpus automated, automatic, automatically, automatically-extracted, automating, automation, automatique, automatisation, automatischen, automatisée, automatism, automatized, computability, computation, computationally, computational, computationally, computational-semantic, computations, compute, computed, computer, computer-aided, computer-assisted, computer-based, computerization, computerized, computer-mediated, computers, computes, computing, mechanical, mechanized, machina, machine, machine-aided, machine-guided, machine-induced, machine-learning, machine-mediated, machine-readable, machines, machine-tractable, machine-translation, electronic translation

Trend of Computational Linguistics (CL), Machine Translation (MT) and Natural Language Processing (NLP)

MOST FREQUENT CO-OCCURRENCES DS Dialogue System(s) IE Information Extraction IR Information Retrieval LG Language Generation LR Language Resource(s) ML  Machine Learning NE Named Entity PC  Parallel Corpora QA  Question Answering SD Spoken Dialogue(s) SR Recognition Speech WSD  Word Sense Disambiguation

Use of co-occurrences