Determining and Mapping Locations of Study in Scholarly Documents: A Spatial Representation and Visualization Tool for Information Discovery James Creel.


Determining and Mapping Locations of Study in Scholarly Documents: A Spatial Representation and Visualization Tool for Information Discovery
James Creel, Katherine H. Weimer
TCDL, May 7, 2013

Geospatial Information Retrieval Challenges
How to utilize locations represented in text?
  – 20% of web queries have a geographic relation (Ahlers)
Traditional catalog subjects and keywords do not suffice
Location information is increasingly in demand (Reid)
Ex total ETDs in 2005
  – 300 included locations (< 30%)
  – 130 contained international locations (> 10%)

What about a Visual Search?
Searching collections with a map interface?
  – Visual representation of research
  – Enable serendipitous cross-disciplinary collaborations and networking
  – Enhance access to the collection

Map Prototype

2011 – Geoparsing Work Begins
1. Overarching goal is to automate geocoding
2. Find toponyms in scholarly documents
3. Look up toponyms in a gazetteer
4. Disambiguate homonymous toponyms
5. Obtain geographic coordinates from the gazetteer
6. Encode coordinates in item surrogates for map-based view
7. Create map with link to original text

Desired Map Functionality
1. Base map: use Google Maps and other available interfaces
2. Cluster placemarks according to zoom level
3. List the displayed placemarks
4. Dropdown menu for countries and states in the US
5. Dropdown menu for departments grouped by college
   1. Selection of multiple departments in more than one college
   2. If selecting the college, then select all departments within the college
6. Search by author
7. Time range slider (by year)
8. Use the Web-friendly University Brand color palette

Geocoding with KML files
The KML file with locations includes:
  – Author
  – Title
  – Academic department
  – Advisor
  – Degree level
  – Year
  – Place
  – Keywords
  – URL to document
Info box displays:
  – Author
  – Title
  – Academic department
  – Degree level
  – Year
  – Place
  – URL to document
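
As a rough illustration of the KML encoding listed above, the following Python sketch writes one Placemark per document, with the metadata fields stored as KML ExtendedData. The slides do not show the actual file layout or the DSpace Generate KML task, so the function name `build_kml`, the dictionary keys, and the choice of ExtendedData are assumptions made for illustration only.

```python
import xml.etree.ElementTree as ET

KML_NS = "http://www.opengis.net/kml/2.2"

# Hypothetical field names mirroring the list on the slide.
FIELDS = ("author", "title", "department", "advisor", "degree_level",
          "year", "place", "keywords", "url")


def build_kml(records):
    """Return a KML document string with one Placemark per record.

    Each record is a dict holding the FIELDS above plus 'lat' and 'lon'.
    """
    ET.register_namespace("", KML_NS)
    kml = ET.Element(f"{{{KML_NS}}}kml")
    doc = ET.SubElement(kml, f"{{{KML_NS}}}Document")
    for rec in records:
        pm = ET.SubElement(doc, f"{{{KML_NS}}}Placemark")
        ET.SubElement(pm, f"{{{KML_NS}}}name").text = rec["title"]
        # Metadata goes into ExtendedData so a map client can fill the info box.
        ext = ET.SubElement(pm, f"{{{KML_NS}}}ExtendedData")
        for field in FIELDS:
            data = ET.SubElement(ext, f"{{{KML_NS}}}Data", name=field)
            ET.SubElement(data, f"{{{KML_NS}}}value").text = str(rec[field])
        point = ET.SubElement(pm, f"{{{KML_NS}}}Point")
        # KML orders coordinates as longitude,latitude.
        coords = f"{rec['lon']},{rec['lat']}"
        ET.SubElement(point, f"{{{KML_NS}}}coordinates").text = coords
    return ET.tostring(kml, encoding="unicode")
```

An info box that omits advisor and keywords, as on the slide, would simply read a subset of these Data elements.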

Beta Version of Map: Showing Google Street Maps

Clustering Mechanism
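
The slides do not say which clustering algorithm the map uses. One common approach, sketched here in Python purely as an assumption, is to bucket placemarks into a latitude/longitude grid whose cell size halves with each zoom level, then show one marker per occupied cell with a count. The function name `cluster_placemarks` and the cell-size formula are illustrative choices, not the project's implementation.

```python
import math
from collections import defaultdict


def cluster_placemarks(points, zoom):
    """Group (lat, lon) points into grid cells that shrink as zoom grows."""
    # Assumed cell width in degrees: the whole longitude range split 2**zoom ways.
    cell = 360.0 / (2 ** zoom)
    cells = defaultdict(list)
    for lat, lon in points:
        key = (math.floor(lat / cell), math.floor(lon / cell))
        cells[key].append((lat, lon))
    clusters = []
    for members in cells.values():
        n = len(members)
        clusters.append({
            # Place the cluster marker at the mean position of its members.
            "lat": sum(p[0] for p in members) / n,
            "lon": sum(p[1] for p in members) / n,
            "count": n,
        })
    return clusters
```

At a low zoom level nearby placemarks collapse into a single counted marker; zooming in splits them apart again.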

User clicks on Point of Interest → Title and Metadata Appear with Link to Text

Automated Process / Geoparser
Geoparsing addresses two key problems:
1) Name extraction
2) Name disambiguation
Document text → Extracted names → Disambiguated names → Geospatial metadata

Geoparser: Comparable Models
Edinburgh Geoparser
  – Grover et al. used OCR with historic records, provided the GeoCrossWalk gazetteer
DIGMAP Geoparser
  – Martins et al., originally used for the DIGMAP digital library of historic maps

Geoparser: Setting
DSpace 1.7 supports curation tasks
  – Custom Java programs
Our instantiation:
  – Suggest New Metadata
  – Generate KML

Geoparser Workflow

Geoparser: Pre-Processing
  – DSpace filter-media script extracts plain text from PDFs.
  – Suggest New Metadata curation task
      Partitions the document into sections using regular expressions
      Excludes sections containing non-topical toponyms (author-affiliation locations, conference locations, etc.)
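
The partitioning step above can be sketched in Python as follows. The actual regular expressions and the list of excluded sections are not given in the slides; the all-caps heading pattern `HEADING` and the `EXCLUDED` set below are hypothetical stand-ins.

```python
import re

# Assumed convention: a section heading is a line of capital letters and spaces.
HEADING = re.compile(r"^(?P<title>[A-Z][A-Z ]{2,})$", re.MULTILINE)

# Hypothetical sections likely to hold non-topical toponyms.
EXCLUDED = {"ACKNOWLEDGEMENTS", "VITA", "REFERENCES"}


def topical_sections(text):
    """Split text at heading lines and drop the excluded sections.

    Text that appears before the first heading is ignored in this sketch.
    """
    sections = []
    matches = list(HEADING.finditer(text))
    for i, m in enumerate(matches):
        # A section body runs until the next heading or the end of the text.
        end = matches[i + 1].start() if i + 1 < len(matches) else len(text)
        title = m.group("title").strip()
        body = text[m.end():end].strip()
        if title not in EXCLUDED:
            sections.append((title, body))
    return sections
```

Only the surviving section bodies would then be passed on to name extraction.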

Geoparser: Name Extraction
‘Named Entity Recognition’ or NER
  – Various open-source tools/training data
Current version uses Apache OpenNLP or Stanford NER
Classifies substrings of the text as names
Toponym occurrences are recorded in context and counted
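
The bookkeeping in the last bullet, recording each toponym occurrence with its context and counting it, might look like the Python sketch below. It does not call Apache OpenNLP or Stanford NER: a plain list of candidate names stands in for the classifier output, and the function name `record_toponyms` and the 30-character context window are assumptions.

```python
import re
from collections import Counter


def record_toponyms(text, candidate_names, window=30):
    """Count occurrences of each candidate name and keep surrounding context.

    candidate_names is a stand-in for the names an NER classifier would emit.
    """
    counts = Counter()
    contexts = {}
    for name in candidate_names:
        pattern = r"\b" + re.escape(name) + r"\b"
        for m in re.finditer(pattern, text):
            counts[name] += 1
            start = max(0, m.start() - window)
            # Keep a slice of text around the match for later heuristics.
            snippet = text[start:m.end() + window]
            contexts.setdefault(name, []).append(snippet)
    return counts, contexts
```

The stored snippets are what later context-based heuristics could inspect, for example to spot an extended name such as “Paris, France”.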

Name Disambiguation
Requires a reliable database or knowledge base
We employ the Geonames dataset
  – Conglomeration of international gazetteers
      Includes GNIS (USGS)
Several complementary methods
  – Rule-based
  – Heuristic
  – Statistical
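
A gazetteer lookup for disambiguation could be prepared roughly as in this Python sketch, which indexes rows in the tab-delimited layout of the GeoNames dump files (geonameid, name, asciiname, alternatenames, latitude, longitude, feature class, feature code, country code, ..., population in the fifteenth column). How the project actually stores or queries the dataset is not stated in the slides, so `load_gazetteer` and the chosen record keys are assumptions.

```python
import csv
from collections import defaultdict


def load_gazetteer(lines):
    """Index GeoNames-style tab-delimited rows by lowercased name.

    Returns a dict mapping a name to every candidate location sharing it.
    """
    index = defaultdict(list)
    reader = csv.reader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        if len(row) < 15:
            continue  # skip malformed or truncated rows
        index[row[1].lower()].append({
            "geonameid": int(row[0]),
            "name": row[1],
            "lat": float(row[4]),
            "lon": float(row[5]),
            "feature_class": row[6],
            "country": row[8],
            "population": int(row[14] or 0),
        })
    return index
```

A toponym with more than one entry in the index is homonymous and needs one of the disambiguation methods listed above.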

Heuristics: Overview
Various heuristics can help indicate the probable referent of a given toponym
Other heuristics can help pick out false positives from the classifiers
Heuristics are based on context clues in the text or on general observations about human discourse

Heuristics: Context-based
One document, one sense
Unambiguous extended names, e.g. “Paris, France”
Favor locations close to other mentioned locations
Favor locations contained in other mentioned locations
Favor locations of mentioned feature types
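
The proximity heuristic (“favor locations close to other mentioned locations”) can be illustrated with a great-circle distance comparison. This Python sketch is one possible reading, not the geoparser's code: the helper names and the use of mean haversine distance as the score are assumptions.

```python
import math


def haversine_km(a, b):
    """Great-circle distance in kilometres between two (lat, lon) pairs."""
    lat1, lon1, lat2, lon2 = map(math.radians, (a[0], a[1], b[0], b[1]))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(h))


def pick_nearest_candidate(candidates, resolved):
    """Return the candidate with the smallest mean distance to resolved places.

    candidates: dicts with 'lat' and 'lon'; resolved: non-empty list of
    (lat, lon) pairs for toponyms already disambiguated in the same document.
    """
    def mean_distance(c):
        total = sum(haversine_km((c["lat"], c["lon"]), r) for r in resolved)
        return total / len(resolved)

    return min(candidates, key=mean_distance)
```

For a document that already mentions Austin and Dallas, an ambiguous “Paris” would resolve to the Texas candidate rather than the French one.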

Heuristics: Generalized
Favor higher-level administrative units (countries, states, cities)
Favor locations of larger population
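
The two generalized preferences can be combined into a single ranking key, as in the Python sketch below. The slides do not say how the two are weighted against each other; ranking by administrative level first and breaking ties by population is an assumption, as are the `feature_type` labels and the function name.

```python
# Assumed ordering of administrative levels, highest first.
ADMIN_RANK = {"country": 0, "state": 1, "city": 2}


def pick_by_general_preference(candidates):
    """Prefer higher-level administrative units, then larger populations.

    candidates: dicts with 'feature_type' and 'population' keys.
    """
    def key(c):
        # Unknown feature types rank below every known administrative level.
        level = ADMIN_RANK.get(c["feature_type"], len(ADMIN_RANK))
        return (level, -c["population"])

    return min(candidates, key=key)
```

Under this ordering an unqualified “Georgia” resolves to the country before the US state, and between two cities named Paris the more populous one wins.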

Heuristics: Application
Heuristics are grouped into refinement iterations and then applied sequentially
Resolve obvious cases first in order to provide better data for subsequent heuristics
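
A minimal Python sketch of this sequential application follows. The grouping of heuristics into iterations is taken from the slide; the function signatures, the two example heuristics, and the `country` key are hypothetical.

```python
def resolve_in_iterations(toponyms, iterations):
    """Apply groups of heuristics in order, resolving easy cases first.

    toponyms: dict mapping a name to its list of candidate dicts.
    iterations: list of lists of heuristics; each heuristic takes
    (name, candidates, resolved) and returns a candidate or None.
    """
    resolved = {}
    for heuristics in iterations:
        for name, candidates in toponyms.items():
            if name in resolved:
                continue
            for heuristic in heuristics:
                choice = heuristic(name, candidates, resolved)
                if choice is not None:
                    resolved[name] = choice
                    break
    return resolved


def only_candidate(name, candidates, resolved):
    """Obvious case: the toponym has exactly one gazetteer entry."""
    return candidates[0] if len(candidates) == 1 else None


def same_country_as_resolved(name, candidates, resolved):
    """Pick the single candidate lying in a country already seen, if any."""
    countries = {c["country"] for c in resolved.values()}
    matches = [c for c in candidates if c["country"] in countries]
    return matches[0] if len(matches) == 1 else None
```

Running `only_candidate` in the first iteration lets an unambiguous “Lyon” supply the country context that a later iteration needs to settle an ambiguous “Paris”.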

Geoparser Evaluation
Comparison of human annotations to geoparser output
Precision/Recall of name extraction
Accuracy of name disambiguation
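
These measures follow the standard definitions and can be computed as in the Python sketch below. How annotations are represented in the actual evaluator is not described in the slides; treating extracted names as sets of character spans and disambiguation results as name-to-identifier mappings is an assumption.

```python
def precision_recall(gold, predicted):
    """Precision and recall of extracted names against human annotations.

    gold and predicted are collections of hashable items, e.g. (start, end) spans.
    """
    gold, predicted = set(gold), set(predicted)
    true_positives = len(gold & predicted)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall


def disambiguation_accuracy(gold_ids, predicted_ids):
    """Share of names found by both sides that resolve to the same identifier."""
    shared = [name for name in gold_ids if name in predicted_ids]
    if not shared:
        return 0.0
    correct = sum(1 for name in shared if gold_ids[name] == predicted_ids[name])
    return correct / len(shared)
```

Scoring disambiguation only on names that both annotator and geoparser found keeps extraction errors from being counted twice.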

Evaluator Workflow

Future Work
Explore statistical disambiguation
Explore relevance of toponyms to the subject matter
Expand to TDL collections
Expand to other digital collections or collection types, even the library catalog?
Much more work to be done!

References
Ahlers & Boll, “Location Based Web Search” in The Geospatial Web (London: Springer 2007)
Apache OpenNLP.
DigMap.
Leidner, Jochen L. “Toponym Resolution in Text” (Univ. Edinburgh 2007)
Finkel, Jenny Rose, Trond Grenager, and Christopher Manning. “Incorporating Non-local Information into Information Extraction Systems by Gibbs Sampling” (ACL 2005)
Reid, James. “GeoXwalk – A Gazetteer Server and Service for UK Academia” (ECDL 2003)

Contact:
  – James Creel
  – Kathy Weimer