Using a Controlled Vocabulary for Managing a Digital Library Platform Sean Boisen Logos Bible Software SemTech 2010 Slides:


1 Using a Controlled Vocabulary for Managing a Digital Library Platform Sean Boisen (sean@logos.com) Logos Bible Software SemTech 2010 Slides: http://semanticbible.com/other/talks/2010/semtech/lcv.html

2 Outline Introduce the Logos digital library Logos Controlled Vocabulary (LCV) –What it is –How we use it –What’s interesting about it Next steps

3 Who Am I? 19 years with BBN Technologies –Information extraction, human language technology –Scientist, technology manager 3+ years with Logos Bible Software –Senior Information Architect –Manager of Design & Editorial Dept. –Academic Products Manager

4 The Importance of the Bible The most widely distributed book –~83M per year worldwide The most widely translated work –> 2000 languages –50 languages at www.biblegateway.com Spans 1000s of years of ancient history

5 Logos Bible Software Leading publisher and developer of digital resources for Bible study (http://logos.com) High-end desktop digital library –> 10k titles –> 100k users in 180 countries –Extensive cross-indexing and hyperlinking –Resources in a dozen languages –Windows/Mac/iPhone/mobile Content categories: Bibles; Original Language Tools; Commentaries; Dictionaries, Maps, Reference Works; Other Related Texts

6 Network Effects Rich markup and original content Information integration

7 Added Value Strategy Domain-specific focus Task-oriented guides that automate research Integrated tools and content Unique digital assets that integrate information and provide answers

8 Controlled Vocabularies Organized system for labeling content –Using English terms Consistent representation of content More effective search

9 Logos Controlled Vocabulary (LCV) Domain-specific (Biblical studies) Semantic organization of reference book content – not just terms Mitigates problems of ambiguity, homographs, synonyms, spelling variation

10 LCV Value Proposition Recognizes key terms in the knowledge domain Provides alternate search terms and query expansion Supports user-created content and reading lists Integrates reference content Provides semantic “glue” for the library

11 Example: Ambiguity

12 Example: Homographs

13 Example: Variation

14 Scope

15 TimBL's rules for Linked Data: Use URIs to identify things (= Identity) –Use HTTP URIs so people can look things up Provide useful information in a standard format when someone references a URI (= Utility) Include links to other URIs (= Relationships)

16 LCV as Linked Data: Prisca Id: Prisca_Person; Label: “Prisca”; Type: Person; Name: True; PrefLabel: “Prisca”; Extra-biblical: False; AltLabel: “Priscilla”; Entities: agent:Prisca.1; Articles: Anchor.PRISCAPERSON, Tyndale.L4559, …; Topics: http://topics.logos.com/Prisca; Wikipedia: Priscilla and Aquila. Annotations: Identity, Utility, Relationships

17 LCV as Linked Data: Deceit Id: deceit; Label: “Deceit”; Type: (blank); Name: False; PrefLabel: “Deceit”; Extra-biblical: False; AltLabel: “Deception”, “Deceitful”, “Deceive”; Articles: ISBE.DECEIT, NBD.R494, …; Topics: http://topics.logos.com/deceit. Annotations: Identity, Utility, Relationships
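
The two records above can be read as instances of one small data structure. The following is a minimal Python sketch, not the Logos implementation: the class name `Concept`, its field names, and the helper `build_label_index` are assumptions made for illustration, and only the field values come from the slides. It shows how preferred and alternate labels might be indexed so that any label resolves to its concept.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Concept:
    """One vocabulary entry; field names are hypothetical, values come from the slides."""
    concept_id: str
    pref_label: str
    term_type: str = ""          # empty when the slide leaves Type blank
    is_name: bool = False
    extra_biblical: bool = False
    alt_labels: list = field(default_factory=list)
    articles: list = field(default_factory=list)


def build_label_index(concepts):
    """Map each case-folded preferred or alternate label to the ids of concepts using it."""
    index = defaultdict(set)
    for concept in concepts:
        for label in [concept.pref_label, *concept.alt_labels]:
            index[label.casefold()].add(concept.concept_id)
    return index


prisca = Concept("Prisca_Person", "Prisca", term_type="Person", is_name=True,
                 alt_labels=["Priscilla"],
                 articles=["Anchor.PRISCAPERSON", "Tyndale.L4559"])
deceit = Concept("deceit", "Deceit",
                 alt_labels=["Deception", "Deceitful", "Deceive"],
                 articles=["ISBE.DECEIT", "NBD.R494"])

label_index = build_label_index([prisca, deceit])
print(sorted(label_index["priscilla"]))  # ['Prisca_Person']
print(sorted(label_index["deceive"]))    # ['deceit']
```

Mapping each label to a set of ids, rather than a single id, leaves room for ambiguous labels that belong to more than one concept.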

18 Example Semantics
  lcvinst:Aaron_Person rdf:type skos:Concept ;
      skos:prefLabel "Aaron"@en ;
      lcv:isname "true"^^xsd:boolean ;
      lcv:termType lcv:Person ;
      skos:related lcvinst:aaronsRod ;
      lcv:bkentity bk:Aaron .
  res:anch.AARONPERSON rdf:type foaf:Document ;
      dct:subject lcvinst:Aaron_Person .
  res:TYNBIBDCT.L1 rdf:type foaf:Document ;
      dct:subject lcvinst:Aaron_Person .
  res:isbe.AARON rdf:type foaf:Document ;
      dct:subject lcvinst:Aaron_Person .
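
To illustrate how statements like these could be queried, here is a small Python sketch that holds a subset of the slide's triples as plain tuples and finds every document whose `dct:subject` is a given concept. The helper names `match` and `articles_about` are hypothetical, and a real deployment would more likely use an RDF store and SPARQL; this only shows the shape of the lookup.

```python
# A subset of the triples on the slide, as (subject, predicate, object) strings.
TRIPLES = [
    ("lcvinst:Aaron_Person", "rdf:type", "skos:Concept"),
    ("lcvinst:Aaron_Person", "skos:prefLabel", '"Aaron"@en'),
    ("lcvinst:Aaron_Person", "skos:related", "lcvinst:aaronsRod"),
    ("res:anch.AARONPERSON", "dct:subject", "lcvinst:Aaron_Person"),
    ("res:TYNBIBDCT.L1", "dct:subject", "lcvinst:Aaron_Person"),
    ("res:isbe.AARON", "dct:subject", "lcvinst:Aaron_Person"),
]


def match(triples, s=None, p=None, o=None):
    """Return the triples that fit the pattern; None acts as a wildcard."""
    return [t for t in triples
            if (s is None or t[0] == s)
            and (p is None or t[1] == p)
            and (o is None or t[2] == o)]


def articles_about(triples, concept):
    """List the documents whose dct:subject is the given concept."""
    return [s for s, _, _ in match(triples, p="dct:subject", o=concept)]


print(articles_about(TRIPLES, "lcvinst:Aaron_Person"))
```

The same `match` helper answers the reverse question (which concepts are related to Aaron) by fixing the subject and the `skos:related` predicate instead.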

19 Semantic Inter-relationships Concrete Conceptual

20 LCV Development Developed by merging content from 7 Bible dictionaries –Extract headwords –Do automatic alignment (conservative) –Review manually Reduced > 40k concepts down to ~10k Initial (>40k), Automatic (23k), Manual (10k)
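
The slide does not say how the automatic alignment works, so the following Python sketch is only one plausible reading of "conservative": headwords are merged only when they are identical after normalization, and any group in which a single source contributes the same form twice (a likely homograph) is held back for manual review. The dictionary names, headwords, and function names are all invented for illustration.

```python
import unicodedata
from collections import defaultdict


def normalize(headword):
    """Case-fold, strip diacritics, and drop punctuation and spaces."""
    decomposed = unicodedata.normalize("NFKD", headword)
    no_marks = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return "".join(ch for ch in no_marks.casefold() if ch.isalnum())


def align(dictionaries):
    """Group (source, headword) pairs by normalized headword."""
    groups = defaultdict(list)
    for source, headwords in dictionaries.items():
        for headword in headwords:
            groups[normalize(headword)].append((source, headword))
    return dict(groups)


def split_for_review(groups):
    """Auto-merge a group only if no source contributes more than one headword."""
    auto, review = {}, {}
    for key, members in groups.items():
        sources = [source for source, _ in members]
        target = auto if len(sources) == len(set(sources)) else review
        target[key] = members
    return auto, review


# Hypothetical input: two dictionaries, one listing "Dan" twice (person and place).
dictionaries = {
    "dict_a": ["Aaron", "Beth-el", "Dan", "Dan"],
    "dict_b": ["AARON", "Bethel", "Dan"],
}
auto, review = split_for_review(align(dictionaries))
print(sorted(auto))    # ['aaron', 'bethel']
print(sorted(review))  # ['dan']
```

Exact matching after normalization will miss genuine variants with different spellings; under this reading those are left for the manual review pass rather than risked in the automatic step.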

21 LCV Development Continues Additional resources suggest new concepts: –Archaeol. Dict. of the Holy Land: 90/547 (16%) Mostly very specific locations (%EinSamiya_Place) –Nelson's Illus. Bible Dictionary: 200/4833 (4%) –Harper's Bible Dictionary: 81/2962 (3%) Adding alternate terms Subject areas for further expansion: –Individuals from church history –Specialized theological concepts

22 Use Case: Improved Topic Search Link to the same concept regardless of how originally labeled Provide consistent semantics for content Suggest alternate concepts for the same term Provide query expansions for full text search
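
As a rough illustration of query expansion for full-text search, the Python sketch below looks up a user's term among the labels of each concept and, on a match, returns every label of that concept. The label data comes from the Prisca and Deceit slides; the function names and the quoted `OR` query syntax are assumptions, not the Logos search syntax.

```python
# Labels per concept, taken from the Prisca and Deceit slides (preferred label first).
LABELS = {
    "Prisca_Person": ["Prisca", "Priscilla"],
    "deceit": ["Deceit", "Deception", "Deceitful", "Deceive"],
}


def expand_query(term, labels=LABELS):
    """Return all labels of every concept that has `term` as a label."""
    wanted = term.casefold()
    expanded = []
    for concept_labels in labels.values():
        if any(label.casefold() == wanted for label in concept_labels):
            for label in concept_labels:
                if label not in expanded:
                    expanded.append(label)
    # Fall back to the original term when the vocabulary does not know it.
    return expanded or [term]


def to_or_query(terms):
    """Join terms into a quoted OR query (hypothetical syntax)."""
    return " OR ".join(f'"{t}"' for t in terms)


print(to_or_query(expand_query("priscilla")))  # "Prisca" OR "Priscilla"
print(to_or_query(expand_query("Jericho")))    # "Jericho"
```

A term shared by several concepts would expand to the labels of all of them, which is where suggesting alternate concepts to the user becomes useful.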

23 Use Case: Information Discovery Automatically link –Reference to concepts –Concept to related concepts –Concept to references

24 Text Mining: Reference to Concepts Aggregate reference counts –Each article votes on most likely references –Each concept votes on the most likely concepts for a reference Reverse index from reference to concepts Estimates should improve with more content
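
The voting scheme can be sketched in a few lines of Python. This is an interpretation of the slide rather than the actual pipeline: each article about a concept casts one vote per Bible reference it cites, the votes are aggregated per concept, and the result is inverted into an index from reference to concepts. The article data and function names are invented for illustration.

```python
from collections import Counter, defaultdict

# (concept id, references cited by one article about that concept); invented data.
ARTICLES = [
    ("Aaron_Person", ["Exod 4:14", "Exod 7:1", "Num 17:8"]),
    ("Aaron_Person", ["Exod 4:14", "Num 17:8"]),
    ("aaronsRod", ["Num 17:8", "Heb 9:4"]),
]


def reference_votes(articles):
    """Count, per concept, how many articles cite each reference."""
    votes = defaultdict(Counter)
    for concept, refs in articles:
        for ref in set(refs):  # one vote per article per reference
            votes[concept][ref] += 1
    return votes


def reverse_index(votes):
    """Invert the counts into reference -> Counter of concepts."""
    index = defaultdict(Counter)
    for concept, counts in votes.items():
        for ref, n in counts.items():
            index[ref][concept] = n
    return index


votes = reference_votes(ARTICLES)
ref_index = reverse_index(votes)
print(ref_index["Num 17:8"].most_common())
# [('Aaron_Person', 2), ('aaronsRod', 1)]
```

With only three articles the counts are noisy, which matches the slide's point that estimates should improve as more content is added.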

25 Text Mining: Related Concepts Extract and aggregate key terms Cluster documents

26 Conclusions Controlled vocabulary coupled with parallel content Platform for text mining, user contribution Future Work –Continue adding resources –Additional content extraction –Add hierarchy (LCSH, WordNet) –Crowdsourcing

27 Resources –A Controlled Vocabulary for Biblical Studies (Boisen). Presentation at BibleTech:2010. –Domain-Specific Tools to Add Value to E-Books (Pritchett). Presentation at O'Reilly Tools of Change for Publishing Conference 2010. –Deploying Semantic Technologies for Digital Publishing (Boisen). Presentation at SemTech:2007.

