Mark M Hall, Information School / Computer Science, Sheffield University, Sheffield, UK. EuropeanaTech 2011, Vienna, 4th - 6th October 2011. Aggregating Cultural Heritage Collections using Automatically Generated Topic Hierarchies

Presentation transcript:

Mark M Hall
Information School / Computer Science, Sheffield University, Sheffield, UK
EuropeanaTech 2011, Vienna, 4th - 6th October 2011
Aggregating Cultural Heritage Collections using Automatically Generated Topic Hierarchies

Accessing aggregated collections
Searching works, if you know what you are looking for
EuropeanaTech 2011, Vienna, 4th - 5th October 2011

Accessing aggregated collections
Recommendation works, if you want to have an area of investigation suggested

Accessing aggregated collections
What if you want neither of those options?
What if you just want to browse around?
What if you want to know what is available in the collection?

Providing collection overviews
Use an existing thesaurus
– Only parts of the aggregated collection will refer to it
Use multiple thesauri
– No unified overview
Create a manual overview
– Time/resource consuming
Use an automated approach
– Not perfect, but good enough

Automatic overview approaches
Using statistical topic models
– Creates an entirely new, custom hierarchy
– Latent Dirichlet Allocation (flat or hierarchical)
Using a unifying thesaurus
– Uses a known thesaurus
– Links items to the thesaurus

Latent Dirichlet Allocation (LDA)
[Diagram relating topics, documents and words, with parts marked as inferred and observed]
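In LDA the words in each document are what is observed, and the topics are inferred from them. The following is a minimal sketch of that inference using a toy collapsed Gibbs sampler. It is not the implementation used in the talk; the function name `fit_lda`, the hyperparameters `alpha` and `beta`, and the iteration count are all assumptions chosen for illustration.

```python
import random
from collections import Counter


def fit_lda(docs, num_topics, iterations=200, alpha=0.1, beta=0.01, seed=0):
    """Toy collapsed Gibbs sampler for LDA (illustrative, not production code).

    docs: list of token lists (the observed words of each document).
    Returns (doc_topic, topic_word): per-document topic counts and
    per-topic word counts (the inferred structure).
    """
    rng = random.Random(seed)
    vocab_size = len({w for doc in docs for w in doc})
    doc_topic = [[0] * num_topics for _ in docs]         # tokens of doc d in topic k
    topic_word = [Counter() for _ in range(num_topics)]  # count of word w in topic k
    topic_total = [0] * num_topics                       # total tokens in topic k
    assignments = []

    # Start from a random topic assignment for every token.
    for d, doc in enumerate(docs):
        zs = []
        for w in doc:
            k = rng.randrange(num_topics)
            zs.append(k)
            doc_topic[d][k] += 1
            topic_word[k][w] += 1
            topic_total[k] += 1
        assignments.append(zs)

    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Remove the token's current assignment from the counts.
                k = assignments[d][i]
                doc_topic[d][k] -= 1
                topic_word[k][w] -= 1
                topic_total[k] -= 1
                # Resample its topic given all other assignments.
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][w] + beta)
                    / (topic_total[t] + vocab_size * beta)
                    for t in range(num_topics)
                ]
                k = rng.choices(range(num_topics), weights=weights)[0]
                assignments[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][w] += 1
                topic_total[k] += 1

    return doc_topic, topic_word
```

Calling `fit_lda(docs, num_topics)` on tokenised item descriptions gives, for each item, how many of its words fall into each topic, and for each topic, which words it favours. A real collection would normally use an established topic-modelling library rather than a sampler like this one.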

Flat LDA
1. Generate topics
2. Select image for each topic
3. Place in grid
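Steps 2 and 3 can be sketched as follows, assuming the topics from step 1 are already available as a list of per-topic weights for each item. The slide does not say how the image is chosen; picking the item with the largest weight for the topic is an assumption, as are the function name `topic_grid` and the `columns` parameter.

```python
def topic_grid(item_topic_weights, item_images, columns=3):
    """Pick one image per topic and lay the topics out in a grid.

    item_topic_weights: {item_id: [weight for topic 0, topic 1, ...]}
    item_images: {item_id: image reference}
    Returns a list of rows, each a list of (topic_index, image) cells.
    """
    num_topics = len(next(iter(item_topic_weights.values())))
    # Only items that actually have an image can represent a topic.
    candidates = [i for i in item_topic_weights if i in item_images]

    cells = []
    for topic in range(num_topics):
        # Assumed heuristic: the item most strongly associated with the topic.
        best = max(candidates, key=lambda i: item_topic_weights[i][topic])
        cells.append((topic, item_images[best]))

    # Chunk the flat list of cells into rows of `columns` cells.
    return [cells[r:r + columns] for r in range(0, len(cells), columns)]
```

Each cell would then be rendered as a clickable thumbnail that leads to the items of that topic.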

LDA Hierarchies
Generate a small set of topics
Assign each item to one topic
For each of the topics generate a new set of sub-topics using the items assigned to that topic
Repeat
Display as a navigational grid
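The recursion described above can be sketched as a skeleton that takes the topic model as a pluggable function. The stopping rules (`min_items`, `max_depth`) are assumptions, since the slide only says "Repeat", and `placeholder_assign` is a hypothetical stand-in that splits items by position; it is not a topic model and only exists to make the skeleton runnable.

```python
def build_hierarchy(items, assign_topics, num_topics=3, min_items=4, max_depth=3):
    """Recursively split items into topics, then sub-topics.

    assign_topics(items, num_topics) must return one topic index per item,
    e.g. the dominant topic from an LDA run over just these items.
    Returns a nested dict: {"items": [...], "children": [...]}.
    """
    node = {"items": items, "children": []}
    # Assumed stopping rules: too few items, or the hierarchy is deep enough.
    if max_depth == 0 or len(items) < min_items:
        return node

    # Generate a small set of topics and assign each item to one of them.
    labels = assign_topics(items, num_topics)
    groups = {}
    for item, label in zip(items, labels):
        groups.setdefault(label, []).append(item)
    if len(groups) < 2:
        return node  # everything landed in one topic, so no useful split

    # Generate sub-topics from the items assigned to each topic.
    for label in sorted(groups):
        child = build_hierarchy(
            groups[label], assign_topics, num_topics, min_items, max_depth - 1
        )
        node["children"].append(child)
    return node


def placeholder_assign(items, num_topics):
    """Hypothetical stand-in for a topic model: round-robin by position."""
    return [index % num_topics for index in range(len(items))]
```

Each level of the resulting tree can be displayed as one navigational grid, with a cell per child node.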

Unifying Thesaurus
Pick a thesaurus that covers all / most subjects in the aggregated collection
Use Flat LDA topics to identify the most important keywords in the collection
Map these keywords into the thesaurus
Use the mapped keywords to link all items to thesaurus entries
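One way to picture the mapping and linking steps is the sketch below. It assumes the thesaurus is available as entry identifiers with a list of labels each, and that a keyword maps to an entry when it matches a label exactly, ignoring case. The talk does not describe the matching method, so this matching rule, the function name and the data shapes are all assumptions.

```python
def link_items_to_thesaurus(topic_keywords, thesaurus_labels, item_tokens):
    """Link items to thesaurus entries via the keywords of flat LDA topics.

    topic_keywords: list of keyword lists, one list per topic
    thesaurus_labels: {entry_id: [label, synonym, ...]}
    item_tokens: {item_id: [token, ...]}
    Returns {item_id: sorted list of linked entry_ids}.
    """
    # Index thesaurus labels for case-insensitive lookup.
    label_to_entry = {}
    for entry_id, labels in thesaurus_labels.items():
        for label in labels:
            label_to_entry[label.lower()] = entry_id

    # Map the important keywords into the thesaurus (assumed: exact match).
    keyword_to_entry = {}
    for keywords in topic_keywords:
        for keyword in keywords:
            entry_id = label_to_entry.get(keyword.lower())
            if entry_id is not None:
                keyword_to_entry[keyword.lower()] = entry_id

    # Link every item to the entries of the mapped keywords it contains.
    links = {}
    for item_id, tokens in item_tokens.items():
        entries = {
            keyword_to_entry[t.lower()] for t in tokens if t.lower() in keyword_to_entry
        }
        links[item_id] = sorted(entries)
    return links
```

Items that contain none of the mapped keywords end up with an empty list, which makes gaps in thesaurus coverage easy to spot.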

Conclusion
It works
It’s flexible
It can deal with large collection sizes
It is language neutral (except for the thesaurus work)
It is not as good as a manual classification
– But it is a lot faster and cheaper

Contact
Thank you for your attention.
I have leaflets if you are interested in our work.
See how it could work in our Hackathon contribution!