Using a Controlled Vocabulary for Managing a Digital Library Platform Sean Boisen Logos Bible Software SemTech 2010 Slides:

Presentation transcript:

Outline
Introduce the Logos digital library
Logos Controlled Vocabulary (LCV)
– What it is
– How we use it
– What’s interesting about it
Next steps

Who Am I?
19 years with BBN Technologies
– Information extraction, human language technology
– Scientist, technology manager
3+ years with Logos Bible Software
– Senior Information Architect
– Manager of Design & Editorial Dept.
– Academic Products Manager

The Importance of the Bible
The most widely distributed book
– ~83M per year worldwide
The most widely translated work
– > 2000 languages
– 50 languages at
Spans 1000s of years of ancient history

Logos Bible Software
High-end desktop digital library
– > 10k titles
– > 100k users in 180 countries
– Extensive cross-indexing and hyperlinking
– Resources in a dozen languages
– Windows/Mac/iPhone/mobile
Leading publisher and developer of digital resources for Bible study
Content categories: Original Language Tools; Commentaries; Dictionaries, Maps, Reference Works; Other Related Texts; Bibles

Network Effects
Rich markup and original content
Information integration

Added Value Strategy
Domain-specific focus
Task-oriented guides that automate research
Integrated tools and content
Unique digital assets that integrate information and provide answers

Controlled Vocabularies
Organized system for labeling content
– Using English terms
Consistent representation of content
More effective search

Logos Controlled Vocabulary (LCV)
Domain-specific (Biblical studies)
Semantic organization of reference book content, not just terms
Mitigates problems of ambiguity, homographs, synonyms, spelling variation

LCV Value Proposition
Recognizes key terms in the knowledge domain
Provides alternate search terms and query expansion
Supports user-created content and reading lists
Integrates reference content
Provides semantic “glue” for the library

Example: Ambiguity

Example: Homographs

Example: Variation

Scope

TimBL's rules for Linked Data
Use URIs to identify things (= Identity)
– Use HTTP URIs so people can look things up
Provide useful information in a standard format when someone references a URI (= Utility)
Include links to other URIs (= Relationships)

LCV as Linked Data: Prisca
Id: Prisca_Person
Label: “Prisca”
Type: Person
Name: True
PrefLabel: “Prisca”
Extra-biblical: False
AltLabel: “Priscilla”
Entities: agent:Prisca.1
Articles: Anchor.PRISCAPERSON, Tyndale.L4559, …
Topics:
Wikipedia: Priscilla and Aquila
(Slide annotations: Identity, Utility, Relationships)

LCV as Linked Data: Deceit
Id: deceit
Label: “Deceit”
Type:
Name: False
PrefLabel: “Deceit”
Extra-biblical: False
AltLabel: “Deception”, “Deceitful”, “Deceive”
Articles: ISBE.DECEIT, NBD.R494, …
Topics:
(Slide annotations: Identity, Utility, Relationships)
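
The two records above share one shape, which can be sketched as a small Python data structure. This is only an illustration under stated assumptions: the class name `LcvConcept`, its field names, and the `http://example.org/lcv/` base URI are hypothetical and do not come from the slides; only the field values are taken from the Prisca and Deceit records.

```python
from dataclasses import dataclass, field


@dataclass
class LcvConcept:
    """One vocabulary entry, mirroring the fields shown on the two slides."""
    id: str                       # Identity: stable identifier, e.g. "Prisca_Person"
    pref_label: str               # preferred display label
    term_type: str = ""           # e.g. "Person"; left empty on the Deceit slide
    is_name: bool = False
    extra_biblical: bool = False
    alt_labels: list[str] = field(default_factory=list)
    articles: list[str] = field(default_factory=list)

    def uri(self, base: str = "http://example.org/lcv/") -> str:
        # Hypothetical base URI: the slides do not state the real one.
        return base + self.id

    def labels(self) -> list[str]:
        # Preferred label first, then the alternates.
        return [self.pref_label, *self.alt_labels]


prisca = LcvConcept(
    id="Prisca_Person",
    pref_label="Prisca",
    term_type="Person",
    is_name=True,
    alt_labels=["Priscilla"],
    articles=["Anchor.PRISCAPERSON", "Tyndale.L4559"],
)
deceit = LcvConcept(
    id="deceit",
    pref_label="Deceit",
    alt_labels=["Deception", "Deceitful", "Deceive"],
    articles=["ISBE.DECEIT", "NBD.R494"],
)
```

The identifier gives identity, the labels and flags give utility, and the article list carries the relationships to reference content.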

Example Semantics
lcvinst:Aaron_Person rdf:type skos:Concept ;
    skos:prefLabel ;
    lcv:isname "true"^^xsd:boolean ;
    lcv:termType lcv:Person ;
    skos:related lcvinst:aaronsRod ;
    lcv:bkentity bk:Aaron.
res:anch.AARONPERSON rdf:type foaf:Document ;
    dct:subject lcvinst:Aaron_Person.
res:TYNBIBDCT.L1 rdf:type foaf:Document ;
    dct:subject lcvinst:Aaron_Person.
res:isbe.AARON rdf:type foaf:Document ;
    dct:subject lcvinst:Aaron_Person.

Semantic Inter-relationships
Concrete
Conceptual

LCV Development
Developed by merging content from 7 Bible dictionaries
– Extract headwords
– Do automatic alignment (conservative)
– Review manually
Reduced > 40k concepts down to ~10k
Initial (>40k) → Automatic (23k) → Manual (10k)
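
The slide does not say how the automatic alignment works. As a minimal sketch of one conservative approach, assuming headwords are merged only when their normalized spellings are identical, the step might look like this in Python. The function names and the sample dictionaries `dict_a` and `dict_b` are hypothetical.

```python
import unicodedata
from collections import defaultdict


def normalize(headword: str) -> str:
    """Fold case and strip accents and punctuation so trivial variants compare equal."""
    decomposed = unicodedata.normalize("NFKD", headword)
    no_marks = "".join(c for c in decomposed if not unicodedata.combining(c))
    kept = "".join(c for c in no_marks if c.isalnum() or c.isspace())
    return " ".join(kept.casefold().split())


def align(headwords_by_source: dict[str, list[str]]) -> dict[str, list[tuple[str, str]]]:
    """Group (source, headword) pairs whose normalized forms are identical."""
    groups = defaultdict(list)
    for source, headwords in headwords_by_source.items():
        for headword in headwords:
            groups[normalize(headword)].append((source, headword))
    return dict(groups)


# Toy input: two made-up dictionaries with overlapping headwords.
sample = {
    "dict_a": ["Aaron", "Aaron's Rod"],
    "dict_b": ["AARON", "Aarons Rod", "Abaddon"],
}
candidates = align(sample)
```

Exact matching after normalization cannot tell homographs apart (two different people with the same name would land in one group), which is one reason a manual review pass is still needed after the automatic step.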

LCV Development Continues
Additional resources suggest new concepts:
– Archaeol. Dict. of the Holy Land: 90/547 (16%)
  Mostly very specific locations (%EinSamiya_Place)
– Nelson's Illus. Bible Dictionary: 200/4833 (4%)
– Harper's Bible Dictionary: 81/2962 (3%)
Adding alternate terms
Subject areas for further expansion:
– Individuals from church history
– Specialized theological concepts
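
The percentages above can be rechecked with a few lines of Python. This assumes each fraction means "new concepts / headwords in that resource", which is an interpretation of the slide rather than something it states; the function name `new_concept_rate` is hypothetical.

```python
def new_concept_rate(new_concepts: int, headwords: int) -> int:
    """Percentage of a resource's headwords that are new to the vocabulary."""
    return round(100 * new_concepts / headwords)


# Counts copied from the slide: (new concepts, total headwords).
resources = {
    "Archaeol. Dict. of the Holy Land": (90, 547),
    "Nelson's Illus. Bible Dictionary": (200, 4833),
    "Harper's Bible Dictionary": (81, 2962),
}
rates = {title: new_concept_rate(*pair) for title, pair in resources.items()}
```

The rounded values agree with the slide: 16, 4, and 3 percent. The specialized archaeological dictionary contributes proportionally far more new concepts than the two general dictionaries.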

Use Case: Improved Topic Search
Link to the same concept regardless of how originally labeled
Provide consistent semantics for content
Suggest alternate concepts for the same term
Provide query expansions for full text search
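
Query expansion from alternate labels can be sketched in Python as follows. This is an illustration only, not the Logos implementation: the function names are hypothetical, and the two concepts reuse the labels from the Prisca and Deceit records.

```python
def build_label_index(concepts: dict[str, list[str]]) -> dict[str, set[str]]:
    """Map each case-folded label to the ids of every concept that uses it."""
    index: dict[str, set[str]] = {}
    for concept_id, labels in concepts.items():
        for label in labels:
            index.setdefault(label.casefold(), set()).add(concept_id)
    return index


def expand_query(term: str, concepts: dict[str, list[str]],
                 index: dict[str, set[str]]) -> dict[str, list[str]]:
    """Return, for each concept the term may refer to, all labels to search for."""
    matches = sorted(index.get(term.casefold(), ()))
    return {concept_id: concepts[concept_id] for concept_id in matches}


# Concept id -> [preferred label, alternate labels...]
concepts = {
    "Prisca_Person": ["Prisca", "Priscilla"],
    "deceit": ["Deceit", "Deception", "Deceitful", "Deceive"],
}
index = build_label_index(concepts)
expansion = expand_query("priscilla", concepts, index)
```

A search for "priscilla" resolves to the `Prisca_Person` concept and expands to both of its labels. A term shared by several concepts would return one entry per concept, which is where alternate concepts could be suggested to the user.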

Use Case: Information Discovery
Automatically link
– Reference to concepts
– Concept to related concepts
– Concept to references

Text Mining: Reference to Concepts
Aggregate reference counts
– Each article votes on most likely references
– Each concept votes on the most likely concepts for a reference
Reverse index from reference to concepts
Estimates should improve with more content
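
A minimal Python sketch of the counting and reverse-index idea follows. It assumes each article is already tagged with one concept and has a list of the Bible references it cites; the function names and the toy input are made up for illustration and are not mined data.

```python
from collections import Counter, defaultdict


def count_references(articles):
    """articles: iterable of (concept_id, [references cited in that article])."""
    counts = defaultdict(Counter)
    for concept_id, references in articles:
        counts[concept_id].update(references)   # each citation is one vote
    return counts


def reverse_index(counts):
    """Invert concept -> reference counts into reference -> concepts, best first."""
    by_reference = defaultdict(Counter)
    for concept_id, ref_counts in counts.items():
        for reference, n in ref_counts.items():
            by_reference[reference][concept_id] += n
    return {
        reference: [cid for cid, _ in sorted(votes.items(),
                                             key=lambda kv: (-kv[1], kv[0]))]
        for reference, votes in by_reference.items()
    }


# Toy input: three articles, two concepts (illustrative only).
articles = [
    ("Aaron_Person", ["Exod 4:14", "Exod 7:10", "Num 17:8"]),
    ("Aaron_Person", ["Exod 4:14", "Exod 7:10"]),
    ("aaronsRod", ["Exod 7:10", "Num 17:8", "Num 17:8"]),
]
ranked = reverse_index(count_references(articles))
```

With this input, "Num 17:8" ranks `aaronsRod` ahead of `Aaron_Person`, while "Exod 7:10" ranks them the other way round. Adding more articles adds more votes, which is why the estimates should improve with more content.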

Text Mining: Related Concepts
Extract and aggregate key terms
Cluster documents

Conclusions
Controlled vocabulary coupled with parallel content
Platform for text mining, user contribution
Future Work
– Continue adding resources
– Additional content extraction
– Add hierarchy (LCSH, WordNet)
– Crowdsourcing

Resources
A Controlled Vocabulary for Biblical Studies (Boisen). Presentation at BibleTech:2010.
Domain-Specific Tools to Add Value to E-Books (Pritchett). Presentation at O'Reilly Tools of Change for Publishing Conference 2010.
Deploying Semantic Technologies for Digital Publishing (Boisen). Presentation at SemTech:2007.