IAC (ACCESS INTERFACE CORPUS) DEVELOPED BY BARCELONA MEDIA & UNIVERSITAT POMPEU FABRA TONI BADIA (BARCELONA MEDIA - UNIVERSITAT POMPEU FABRA) JUDITH DOMINGO.

Presentation transcript:

IAC (ACCESS INTERFACE CORPUS)
DEVELOPED BY BARCELONA MEDIA & UNIVERSITAT POMPEU FABRA
TONI BADIA (BARCELONA MEDIA - UNIVERSITAT POMPEU FABRA)
JUDITH DOMINGO (BARCELONA MEDIA)
CARME COLOMINAS (UNIVERSITAT POMPEU FABRA)
UCCTS 2010 (Ormskirk)

IAC CORPORA USE: REQUIREMENTS
It's easy to build a corpus from the web, but difficult to search it.
We need tools that allow frequency statistics, sorting of results, searches over linguistically annotated sequences, etc.

IAC CORPORA: SEARCHING METHODS
Concordance software (MonoConc, Concordance)
Databases
Corpus query systems (e.g. CQP, Emdros)
Useful but hard to learn
Not suitable for training, as students spend too much time learning the query system

IAC CORPORA: INTERFACES (SEARCHING METHODS)
DISADVANTAGES
From the user's point of view, more than one interface has to be learned
A background in programming and interface design is needed (external resources)
If new attribute types are added, the interface has to be redesigned, which requires new funding
Usually more expensive than other options
ADVANTAGES
User-friendly
No training necessary

IAC (ACCESS INTERFACE CORPUS)
The Translation Department (UPF) had many corpora, which were constantly changing and growing.
IAC was born (developed by Barcelona Media and UPF).
GOALS
Support for monolingual and aligned corpora
Fast and easy creation of interfaces for corpora
One interface design for all the corpora

IAC INTERFACES
Simple: Key Words Out of Context (KWOC)
Advanced: Key Words In Context (KWIC)
Statistics: KWIC and frequency-based results
For corpus searching and indexing, IAC uses the Corpus WorkBench (CWB), developed by IMS Stuttgart.
EXAMPLES
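
As an illustration of the KWIC view, here is a minimal Python sketch of a keyword-in-context concordance over a tokenized text, with results sorted by right context (sorting being one of the requirements listed earlier). It is not IAC's code, since IAC delegates searching to CWB; the function names `kwic` and `sort_by_right` are hypothetical.

```python
from typing import List, Tuple

Line = Tuple[str, str, str]  # (left context, keyword, right context)

def kwic(tokens: List[str], keyword: str, width: int = 3) -> List[Line]:
    """Collect a KWIC line for every case-insensitive match of `keyword`."""
    target = keyword.lower()
    lines: List[Line] = []
    for i, tok in enumerate(tokens):
        if tok.lower() == target:
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            lines.append((left, tok, right))
    return lines

def sort_by_right(lines: List[Line]) -> List[Line]:
    """Sort concordance lines alphabetically by their right context."""
    return sorted(lines, key=lambda line: line[2].lower())

if __name__ == "__main__":
    text = "The boy buys pencils and the girl buys books".split()
    # Print each hit with the left context right-aligned, as concordancers usually do.
    for left, kw, right in sort_by_right(kwic(text, "buys", width=2)):
        print(f"{left:>12}  [{kw}]  {right}")
```

The context width is counted in tokens rather than characters, which keeps the sketch independent of word length.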

IAC CORPUS FORMAT
Tabular, verticalized format (one token per line, annotations in columns):
The      Det    sg
boy      Noun   sg
buys     Verb   sg
pencils  Noun   pl
XML for metadata
XML for structural data
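
This layout matches the kind of input read by CWB's `cwb-encode`: one token per line with tab-separated positional attributes, and XML-style tags marking structural units. Below is a minimal Python sketch of producing such a file, assuming a `<text>` element that carries metadata as attributes and `<s>` elements for sentences; the element and attribute names (`text`, `s`, `domain`) are illustrative assumptions, not IAC's documented schema.

```python
from typing import Dict, List, Tuple
from xml.sax.saxutils import quoteattr

# One token = (word, part of speech, number), mirroring the slide's example columns.
Token = Tuple[str, str, str]

def to_vertical(metadata: Dict[str, str], sentences: List[List[Token]]) -> str:
    """Serialize one text as verticalized input: XML tags for structure and
    metadata, one tab-separated token per line."""
    # quoteattr adds the surrounding quotes and escapes special characters.
    attrs = "".join(f" {key}={quoteattr(value)}" for key, value in metadata.items())
    lines = [f"<text{attrs}>"]
    for sentence in sentences:
        lines.append("<s>")
        lines.extend("\t".join(token) for token in sentence)
        lines.append("</s>")
    lines.append("</text>")
    return "\n".join(lines)

if __name__ == "__main__":
    sample = [[("The", "Det", "sg"), ("boy", "Noun", "sg"),
               ("buys", "Verb", "sg"), ("pencils", "Noun", "pl")]]
    print(to_vertical({"domain": "Economics"}, sample))
```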

IAC CORPORA: INSERTING A CORPUS INTO IAC
Upload the corpus (as a .txt file) to the server.
Design the search interface with a graphical tool (included in IAC), according to the corpus type and the linguistic annotation it contains.
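
One way to realize "one interface design for all the corpora" is to translate a single generic search form into a query for the CWB back end, with the available fields taken from each corpus's annotation. The Python sketch below assumes exactly that; the form representation, the function names and the tag `PREP` are hypothetical, and only the basic CQP token syntax (`[attr="regex" & ...]`, with `[]` matching any token) is standard.

```python
from typing import Dict, List, Set

def cqp_token(constraints: Dict[str, str]) -> str:
    """Turn one token's form fields into a CQP pattern such as [lemma="poder" & pos="V.*"]."""
    if not constraints:
        return "[]"  # an empty pattern matches any token in CQP
    parts = []
    for attr, value in constraints.items():
        if '"' in value:
            # Quoting of embedded double quotes is left out of scope in this sketch.
            raise ValueError(f"double quotes not supported in value: {value!r}")
        parts.append(f'{attr}="{value}"')
    return "[" + " & ".join(parts) + "]"

def cqp_query(form: List[Dict[str, str]], corpus_attrs: Set[str]) -> str:
    """Build a CQP query from a generic form, rejecting fields the corpus lacks."""
    for constraints in form:
        unknown = set(constraints) - corpus_attrs
        if unknown:
            raise ValueError(f"corpus has no attribute(s): {sorted(unknown)}")
    return " ".join(cqp_token(c) for c in form)

if __name__ == "__main__":
    attrs = {"word", "lemma", "pos"}  # would come from the corpus configuration
    print(cqp_query([{"lemma": "pensar"}, {"pos": "PREP"}], attrs))
```

Because the allowed fields are passed in per corpus, the same form logic serves every corpus; only the attribute set changes.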

IAC CONCLUSIONS
IAC is a flexible and powerful tool that goes beyond the limitations of current corpus interfaces:
User-friendly tool
Access to multiple corpora from the same platform
No need for external developers or a programming background
Fast interface creation; interfaces can be modified easily

Thank you!
Temporary website: webconsultaiactemporal.barcelonamedia.org

SOME EXAMPLES…

ADVANCED SEARCH
To demonstrate the advanced search, we use an annotated translation corpus. Let's look at examples of sequences of one or more words containing syntax errors.
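
Assuming the errors are encoded as a token-level attribute (here a hypothetical `error` column whose value `syn` marks a syntax error), a CQP query along the lines of `[error="syn"]+` would retrieve sequences of one or more such tokens. The Python sketch below does the equivalent over an in-memory list, returning maximal runs; the attribute name, tag value and toy tokens are illustrative assumptions.

```python
from typing import List, Optional, Tuple

Token = Tuple[str, str]  # (word, error tag); "-" means no error (assumed convention)

def error_spans(tokens: List[Token], tag: str = "syn") -> List[Tuple[int, int, str]]:
    """Return (start, end, text) for each maximal run of tokens tagged `tag`.
    `end` is exclusive, like a Python slice."""
    spans = []
    start: Optional[int] = None
    # A sentinel with no tag closes a run that reaches the end of the list.
    for i, (_, err) in enumerate(tokens + [("", None)]):
        if err == tag and start is None:
            start = i  # a new run begins
        elif err != tag and start is not None:
            spans.append((start, i, " ".join(w for w, _ in tokens[start:i])))
            start = None  # the run has ended
    return spans

if __name__ == "__main__":
    toy = [("They", "-"), ("is", "syn"), ("here", "-"), ("and", "-"),
           ("she", "-"), ("have", "syn"), ("left", "syn")]
    print(error_spans(toy))
```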

ALIGNED CORPORA WITH METADATA
As an example of aligned corpora, we use a Spanish > English corpus.
Poder (verb): can, could, may, might
Our goal is to get examples of poder (verb) translated as may or might in Economics texts.
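
This search combines three filters: a source-side lemma and part of speech, a set of target-side words, and a metadata value. A minimal Python sketch over sentence-aligned pairs follows; the `AlignedPair` structure and the tag names are hypothetical, and an aligned CWB corpus would be queried through CQP rather than filtered in memory. Checking the target side as a bag of words, as here, shows only that may/might occurs in the aligned segment, not that it is the translation of poder.

```python
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class AlignedPair:
    source: List[Tuple[str, str]]  # Spanish side as (lemma, POS) pairs
    target: List[str]              # English side as word forms
    domain: str                    # text-level metadata, e.g. "Economics"

def find_translations(pairs: List[AlignedPair], lemma: str, pos: str,
                      target_words: List[str], domain: str) -> List[AlignedPair]:
    """Keep pairs whose source contains (lemma, pos), whose target contains
    one of target_words, and whose domain metadata matches."""
    wanted = {w.lower() for w in target_words}
    return [
        p for p in pairs
        if p.domain == domain
        and (lemma, pos) in p.source
        and any(w.lower() in wanted for w in p.target)
    ]

if __name__ == "__main__":
    # Toy pairs for illustration only.
    pairs = [
        AlignedPair([("el", "Det"), ("mercado", "Noun"), ("poder", "Verb"), ("crecer", "Verb")],
                    ["The", "market", "may", "grow"], "Economics"),
        AlignedPair([("yo", "Pron"), ("poder", "Verb"), ("venir", "Verb")],
                    ["I", "can", "come"], "Economics"),
    ]
    for hit in find_translations(pairs, "poder", "Verb", ["may", "might"], "Economics"):
        print(" ".join(hit.target))
```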

STATISTICS
Statistics are useful for getting quantitative results for sequences. In this case, our goal is to get frequency counts of the prepositions that follow the Spanish verb pensar (to think).
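
A frequency table of this kind amounts to counting what appears immediately after each occurrence of the lemma. The Python sketch below assumes tokens given as (word, lemma, POS) triples and a hypothetical preposition tag `P`; it counts only the token directly following pensar, so constructions with intervening words are not captured. The toy sentences are illustrative only.

```python
from collections import Counter
from typing import List, Tuple

Token = Tuple[str, str, str]  # (word, lemma, POS)

def following_counts(tokens: List[Token], lemma: str, next_pos: str) -> Counter:
    """Count word forms tagged next_pos that directly follow a token with the given lemma."""
    counts: Counter = Counter()
    # Pair every token with its right neighbour.
    for (_, lem, _), (word, _, pos) in zip(tokens, tokens[1:]):
        if lem == lemma and pos == next_pos:
            counts[word.lower()] += 1
    return counts

if __name__ == "__main__":
    toy = [
        ("Pienso", "pensar", "V"), ("en", "en", "P"), ("ti", "tú", "Pron"), (".", ".", "Punct"),
        ("Pensamos", "pensar", "V"), ("en", "en", "P"), ("el", "el", "Det"),
        ("futuro", "futuro", "N"), (".", ".", "Punct"),
        ("¿Qué", "qué", "Pron"), ("piensas", "pensar", "V"), ("de", "de", "P"),
        ("esto", "esto", "Pron"), ("?", "?", "Punct"),
    ]
    for prep, n in following_counts(toy, "pensar", "P").most_common():
        print(f"{prep}\t{n}")
```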
