Enrichment and Structuring of Archival Description Metadata
Kalliopi Zervanou*, Ioannis Korkontzelos**, Antal van den Bosch* & Sophia Ananiadou**
* Tilburg Centre for Cognition & Communication, The University of Tilburg, NL
** National Centre for Text Mining, The University of Manchester, UK

ACL/LaTeCH-Portland, June 24th 2011

Research on Metadata
Developing standards:
– collection specific (e.g. EAD, MARC21)
– cross-collection (e.g. Dublin Core)
Provide mappings:
– across schemas
– ontologies (ad hoc or standard, e.g. CIDOC-CRM)
Discard metadata for IR (Koolen et al., 2007)
Exploit metadata for IR (Zhang & Kamps, 2009)

The IISH EAD dataset
EAD: XML standard for encoding archival descriptions
Challenges:
– Variety of languages used
– Varying type and amount of information
– Style: enumerations, lists, incomplete sentences
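To make the EAD format concrete, here is a minimal Python sketch of reading free text from an EAD-like fragment. The sample document and the helper extract_free_text are hypothetical. The tag names (unittitle, bioghist, scopecontent) are real EAD elements, but choosing these three is an assumption, since the slides do not list the elements actually used.

```python
import xml.etree.ElementTree as ET

# Hypothetical EAD-like fragment; real finding aids are much larger
# and may declare XML namespaces, which this sketch does not handle.
SAMPLE_EAD = """<ead>
  <archdesc level="fonds">
    <did>
      <unittitle>Papers of a trade union</unittitle>
    </did>
    <bioghist><p>Founded in 1905; merged with other unions in 1921.</p></bioghist>
    <scopecontent><p>Minutes, correspondence, leaflets.</p></scopecontent>
  </archdesc>
</ead>"""

# Assumed selection of free-text elements (not taken from the slides).
FREE_TEXT_TAGS = ("unittitle", "bioghist", "scopecontent")


def extract_free_text(xml_string, tags=FREE_TEXT_TAGS):
    """Return (tag, text) pairs for every selected element that has text."""
    root = ET.fromstring(xml_string)
    result = []
    for tag in tags:
        for element in root.iter(tag):
            # Join the text of the element and its children, normalising whitespace.
            text = " ".join("".join(element.itertext()).split())
            if text:
                result.append((tag, text))
    return result


snippets = extract_free_text(SAMPLE_EAD)
for tag, text in snippets:
    print(f"{tag}: {text}")
```

The third snippet, an enumeration without a verb, is typical of the style challenge listed above.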

Motivation & Objectives
Improved search and retrieval:
– content-based metadata document clustering
– content-based/semantic search
– support exploratory search
– link across collections, metadata formats & institutions
– create unified metadata knowledge resources

Method overview


Pre-processing
EAD/XML element selection & extraction:
– EAD elements containing free-text & archive content information
Language identification (n-gram method):
– Identifier trained on Europarl corpus
Text snippet length: ~20 tokens
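The slides do not say which n-gram method was used for language identification. As one plausible illustration, the sketch below ranks character n-grams per language and picks the language whose ranking is closest to that of the snippet (the rank-order "out-of-place" comparison of Cavnar and Trenkle). The two short training strings are invented stand-ins for a real corpus such as Europarl, and all function names are hypothetical.

```python
from collections import Counter


def ngram_profile(text, n_max=3, top_k=300):
    """Map the top_k most frequent character n-grams (1..n_max) to their rank."""
    counts = Counter()
    for word in text.lower().split():
        padded = f"_{word}_"  # mark word boundaries
        for n in range(1, n_max + 1):
            for i in range(len(padded) - n + 1):
                counts[padded[i:i + n]] += 1
    ranked = [gram for gram, _ in counts.most_common(top_k)]
    return {gram: rank for rank, gram in enumerate(ranked)}


def out_of_place(doc_profile, lang_profile, penalty):
    """Sum of rank differences; n-grams unknown to the language cost `penalty`."""
    return sum(
        abs(rank - lang_profile[gram]) if gram in lang_profile else penalty
        for gram, rank in doc_profile.items()
    )


def identify_language(snippet, lang_profiles, top_k=300):
    """Return the language whose profile is closest to the snippet's profile."""
    doc_profile = ngram_profile(snippet, top_k=top_k)
    return min(
        lang_profiles,
        key=lambda lang: out_of_place(doc_profile, lang_profiles[lang], top_k),
    )


# Toy training data (assumption); a real identifier needs far more text.
TRAINING = {
    "en": "the minutes of the meeting and the correspondence of the union "
          "with other organisations and the reports of the committee",
    "nl": "de notulen van de vergadering en de correspondentie van de vakbond "
          "met andere organisaties en de verslagen van het bestuur",
}
PROFILES = {lang: ngram_profile(text) for lang, text in TRAINING.items()}

guess = identify_language("brieven en verslagen van het bestuur", PROFILES)
print(guess)
```

With snippets of about 20 tokens, an identifier like this has little evidence to work with, which is one reason to train it on a large corpus.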

Snippet length based on language

Method overview


Enrichment & Structuring
Topic detection: automatic term recognition using the C-value method
Agglomerative hierarchical term clustering:
– complete, single & average linkage criteria
– document co-occurrence & lexical similarity measures
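For readers unfamiliar with C-value, the score of a candidate term a with frequency f(a) is log2|a| · f(a) when a is never nested in a longer candidate. Otherwise the mean frequency of the longer candidates containing a is subtracted from f(a) first. The sketch below is a simplified version that uses raw frequencies of the longer candidates and skips the bookkeeping for nesting within nested terms. The term frequencies are invented, and the function names are hypothetical.

```python
import math


def contains(longer, shorter):
    """True if word tuple `shorter` occurs contiguously inside the longer tuple."""
    n = len(shorter)
    return len(longer) > n and any(
        longer[i:i + n] == shorter for i in range(len(longer) - n + 1)
    )


def c_value(freq):
    """Score candidates; `freq` maps word tuples to total corpus frequency."""
    scores = {}
    for term, f in freq.items():
        # Frequencies of all longer candidates in which `term` is nested.
        nesting = [f_b for other, f_b in freq.items() if contains(other, term)]
        if nesting:
            f = f - sum(nesting) / len(nesting)
        # log2 of the length in words: single-word candidates score 0 in this form.
        scores[term] = math.log2(len(term)) * f
    return scores


# Invented frequencies for illustration only.
freq = {
    ("trade", "union"): 10,
    ("trade", "union", "congress"): 4,
    ("international", "trade", "union"): 2,
    ("union", "congress"): 4,
}
scores = c_value(freq)
for term, score in sorted(scores.items(), key=lambda item: -item[1]):
    print(" ".join(term), round(score, 2))
```

In this toy data, "union congress" only ever appears inside "trade union congress", so its score drops to zero, while "trade union" keeps most of its weight because it also occurs on its own.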

Method overview


Term results (automatic evaluation)

Results
C-value best performance: candidates that occur as non-nested at least once
Average linkage criterion & document co-occurrence: provide broader and richer hierarchies
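As an illustration of the average linkage and document co-occurrence combination, the following sketch clusters terms bottom-up, merging at each step the two clusters with the highest average pairwise similarity. Measuring co-occurrence as the Jaccard overlap of the document sets in which two terms appear is an assumption, since the slides do not give the exact measure. The term-to-document data and all names are hypothetical.

```python
from itertools import combinations


def jaccard(docs_a, docs_b):
    """Overlap of two non-empty document sets (assumed co-occurrence measure)."""
    return len(docs_a & docs_b) / len(docs_a | docs_b)


def average_linkage(cluster_a, cluster_b, term_docs):
    """Mean similarity over all term pairs drawn from the two clusters."""
    pairs = [(a, b) for a in cluster_a for b in cluster_b]
    return sum(jaccard(term_docs[a], term_docs[b]) for a, b in pairs) / len(pairs)


def cluster_terms(term_docs):
    """Agglomerate until one cluster is left; return the merge history."""
    clusters = [(term,) for term in term_docs]
    merges = []
    while len(clusters) > 1:
        # Pick the most similar pair of clusters under average linkage.
        a, b = max(
            combinations(clusters, 2),
            key=lambda pair: average_linkage(pair[0], pair[1], term_docs),
        )
        merges.append((a, b, average_linkage(a, b, term_docs)))
        clusters = [c for c in clusters if c not in (a, b)] + [a + b]
    return merges


# Invented mapping from terms to the ids of documents that contain them.
term_docs = {
    "trade union": {1, 2, 3},
    "strike": {2, 3},
    "annual report": {4, 5},
    "minutes": {4, 5},
}
merges = cluster_terms(term_docs)
for a, b, similarity in merges:
    print(a, "+", b, round(similarity, 2))
```

Replacing the mean in average_linkage with max or min would give single or complete linkage respectively, the other two criteria compared in the talk.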

Questions? Check out our poster!