OCLC Research Library Partners, Works in Progress Series, 12 August 2015 Looking inside the Library Knowledge Vault Bruce Washburn Consulting Software.

Presentation transcript:

OCLC Research Library Partners, Works in Progress Series, 12 August 2015 Looking inside the Library Knowledge Vault Bruce Washburn Consulting Software Engineer, OCLC Research Jeff Mixter Software Engineer, OCLC Research

An Overview of Work in Progress
Describing the Google Knowledge Vault
Considering how the Knowledge Vault could apply to Library data
Touring the experimental EntityJS application, for discovery of entities through the Library Knowledge Vault
Summarizing our experimentation to date, and where we’re headed

The Knowledge Graph
A Google blog post from 2012 describes the Knowledge Graph, which supports searching for the things, people and places that Google knows about, along with suggestions for relevant related things. The Graph powers the Google Knowledge Panel in search results.

Estimating Trustworthiness and Finding Truth
A series of recent Google Research papers describe the use of probabilistic models and machine learning to assess the truth of statements made by multiple sources.
Li, X., Dong, X. L., Lyons, K., Meng, W., Srivastava, D. (2013). Truth Finding on the Deep Web: Is the Problem Solved?
Dong, X. L., Gabrilovich, E., Heitz, G., Horn, W., Murphy, K., Sun, S., Zhang, W. (2013). From Data Fusion to Knowledge Fusion.
Dong, X. L., Murphy, K., Gabrilovich, E., Heitz, G., Horn, W., Lao, N., ... & Zhang, W. (2014). Knowledge Vault: A Web-scale approach to probabilistic knowledge fusion.
Dong, X. L., Gabrilovich, E., Murphy, K., Dang, V., Horn, W., ... & Zhang, W. (2015). Knowledge-Based Trust: Estimating the Trustworthiness of Web Sources.

Understanding “RDF Triples”
A triple is a statement that relates one thing to another, specifying a Subject, Predicate, and Object. RDF triples use URIs for those three elements.
Subject: Barack Obama
Predicate: Was born in
Object: Honolulu, Hawaii
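As a rough illustration of the Subject, Predicate, Object pattern, the Barack Obama example can be written as a small Python structure and serialized in N-Triples style. The `http://example.org/` URIs and the `Triple` and `to_ntriples` names are hypothetical placeholders chosen for this sketch; they are not the identifiers shown on the slide.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    """One statement: subject, predicate and object, each given as a URI."""
    subject: str
    predicate: str
    obj: str


# Hypothetical namespace, used only for illustration.
EX = "http://example.org/"

obama_birthplace = Triple(
    subject=EX + "person/barack-obama",
    predicate=EX + "property/wasBornIn",
    obj=EX + "place/honolulu-hawaii",
)


def to_ntriples(t: Triple) -> str:
    """Serialize a URI-only triple as one N-Triples line."""
    return f"<{t.subject}> <{t.predicate}> <{t.obj}> ."


print(to_ntriples(obama_birthplace))
```

A plain tuple keeps the sketch dependency-free; an RDF library would normally handle serialization and URI validation.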

The 3 Main Components of the Google Knowledge Vault
1. Extractors
(Image caption: Threshing the Crop)

The 3 Main Components of the Google Knowledge Vault
2. Graph-based Priors
(Image caption: Students at Library reference desk at University of Illinois at Chicago Navy Pier Campus)

The 3 Main Components of the Google Knowledge Vault
3. Knowledge Fusion
(Image caption: Hollerith Census Machine Dials)

Extraction, Graph-based Priors, Knowledge Fusion

A “Knowledge Vault” for Libraries?
OCLC research scientists and software engineers are evaluating a similar model for bibliographic and authority data sources, in combination with user-contributed content and Linked Data from other providers, as the basis for a “knowledge vault” of statements about entities and their relationships, including people, groups, places, events, concepts, and works.

Library data sources
WorldCat: thousands of libraries, museums and archives contribute to the aggregation, and OCLC adds FRBR clustering, algorithmically deduced connections of strings to Linked Data identifiers, and new work entities.
VIAF: 30 or more authority systems contribute, and OCLC merges and links records into new VIAF clusters.
FAST: OCLC transforms Library of Congress subject headings into a new controlled vocabulary, friendly to faceted navigation.
OCLC produces persistent identifiers and RDF Linked Data for all of these sources.

Knowledge Vault data flow
Diagram labels: Data Sources (WorldCat, VIAF, FAST); Extraction (Extractor)

Knowledge Vault data flow
Diagram labels: Data Sources (WorldCat, VIAF, FAST); Extraction (Extractor, Graph-based Priors); Knowledge Triples

Knowledge Vault data flow
Diagram labels: Data Sources (WorldCat, VIAF, FAST); Extraction (Extractor, Graph-based Priors); Knowledge Triples; Fusion (Fusers); Scored Triples; Knowledge Vault
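The flow from data sources through extraction and priors to scored triples can be sketched in a few lines of Python. This is a toy model under stated assumptions: the record fields, the noisy-OR combination of extractor confidences, and the fixed prior values are stand-ins chosen for illustration, not the method used by Google or by OCLC.

```python
from collections import defaultdict


def extract(records):
    """Yield (triple, source, confidence) candidates from source records.

    The record layout here is hypothetical.
    """
    for rec in records:
        yield (rec["s"], rec["p"], rec["o"]), rec["source"], rec["confidence"]


def prior(triple, known_triples):
    """Toy graph-based prior: statements already in the graph are more plausible."""
    return 0.9 if triple in known_triples else 0.5


def fuse(candidates, known_triples):
    """Combine per-source confidences and the prior into one score per triple."""
    by_triple = defaultdict(list)
    for triple, _source, confidence in candidates:
        by_triple[triple].append(confidence)

    scored = {}
    for triple, confidences in by_triple.items():
        # Noisy-OR: probability that at least one extraction is correct.
        miss = 1.0
        for c in confidences:
            miss *= 1.0 - c
        scored[triple] = (1.0 - miss) * prior(triple, known_triples)
    return scored


records = [
    {"s": "ex:work1", "p": "ex:creator", "o": "ex:person1",
     "source": "WorldCat", "confidence": 0.5},
    {"s": "ex:work1", "p": "ex:creator", "o": "ex:person1",
     "source": "VIAF", "confidence": 0.5},
]
scored_triples = fuse(extract(records), known_triples=set())
```

The point of the sketch is the shape of the pipeline: agreement between sources raises a statement's score, and the prior adjusts it according to what the graph already holds.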

Creating Knowledge Triples from record-oriented data
Diagram labels: MARC Records (MARC Record, Enhanced WorldCat MARC Record); FRBR Clustering; String matching with controlled vocabularies; Addition of standard identifiers

Creating Knowledge Triples from record-oriented data
Diagram labels: MARC Records (MARC Record, Enhanced WorldCat MARC Record); FRBR Clustering; String matching with controlled vocabularies; Addition of standard identifiers; RDF Entities (Persons, Organizations, Places, Concepts, Events, Works)

Creating Knowledge Triples from record-oriented data
Diagram labels: MARC Records (MARC Record, Enhanced WorldCat MARC Record); FRBR Clustering; String matching with controlled vocabularies; Addition of standard identifiers; RDF Entities (Persons, Organizations, Places, Concepts, Events, Works); Triples (Subject, Predicate, Object)
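To make the record-to-triples step concrete, here is a minimal sketch that turns one simplified record into Subject, Predicate, Object statements. The dictionary layout, the `ex:` identifiers, and the predicate names are hypothetical; a real MARC record is far richer, and the FRBR clustering and string matching steps that assign identifiers are assumed to have already run.

```python
def record_to_triples(record):
    """Turn one simplified, already-enhanced record into a list of triples.

    Assumes earlier steps have replaced name and subject strings
    with entity identifiers.
    """
    work = record["work_id"]
    triples = [(work, "ex:name", record["title"])]
    for person_id in record.get("creators", []):
        triples.append((work, "ex:creator", person_id))
    for concept_id in record.get("subjects", []):
        triples.append((work, "ex:about", concept_id))
    return triples


# Hypothetical enhanced record, for illustration only.
enhanced_record = {
    "work_id": "ex:work/1",
    "title": "Papers of an example collector",
    "creators": ["ex:person/42"],
    "subjects": ["ex:concept/7", "ex:place/3"],
}

for triple in record_to_triples(enhanced_record):
    print(triple)
```

Each record fans out into several statements, which is what lets the same person, place or concept be shared across many records.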

Using the Library Knowledge Vault
Triples in a library knowledge vault provide opportunities for applications supporting discovery, editing, visualization, and more.
OCLC Research is investigating what it’s like to assemble and work with this kind of data in an experimental discovery system we call “EntityJS”.

The EntityJS Research Project
Get some real-life experience with using Linked Data, test entity refinement and editing, and push triples back to the knowledge vault.

Testing with a subset of Knowledge
Just the “ArchiveGrid” WorldCat MARC records
Diagram labels: WorldCat, ArchiveGrid

Testing with a subset of Knowledge
Just the “ArchiveGrid” WorldCat MARC records
Diagram labels: ArchiveGrid, Extractor, Extraction, Knowledge Triples, Scored Triples

Testing with a subset of Knowledge
Just the “ArchiveGrid” WorldCat MARC records
Diagram labels: ArchiveGrid, Extractor, Extraction, Knowledge Triples, Scored Triples, Vault Services, EntityJS

Testing with a subset of Knowledge
Just the “ArchiveGrid” WorldCat MARC records
Diagram labels: WorldCat, ArchiveGrid, Extractor, Knowledge Triples, Scored Triples, Vault Services, EntityJS, Wikidata, DBPedia, VIAF, FAST

Testing with a subset of Knowledge
Just the “ArchiveGrid” WorldCat MARC records
Diagram labels: WorldCat, ArchiveGrid, Extractor, Extraction, Knowledge Triples, Scored Triples, Vault Services, EntityJS, Application Triples, Wikidata, DBPedia, VIAF, FAST

Testing with a subset of Knowledge
Just the “ArchiveGrid” WorldCat MARC records
Diagram labels: WorldCat, ArchiveGrid, Extractor, Extraction, Knowledge Triples, Scored Triples, Fusers, Knowledge Vault, Vault Services, EntityJS, Application Triples, Wikidata, DBPedia, VIAF, FAST

Vault Services
Streamline the interaction between the EntityJS client application and the Scored Triples on the server
API to interact with the Triplestore
API to interact with ElasticSearch Index
“PageRank”-like sorting, for entity results
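The slide does not say how the “PageRank”-like sorting is computed. One plausible reading, sketched here as an assumption, is to run the standard PageRank power iteration over the links between entities and order search results by the resulting score. The function names, the damping factor, and the tiny example graph are all illustrative.

```python
def rank_entities(links, damping=0.85, iterations=50):
    """Standard PageRank by power iteration over an entity link graph.

    `links` maps each entity ID to the list of entity IDs it points to.
    """
    nodes = set(links) | {t for targets in links.values() for t in targets}
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        new_rank = {node: (1.0 - damping) / n for node in nodes}
        for node in nodes:
            targets = links.get(node, [])
            if targets:
                share = damping * rank[node] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # Dangling entity: spread its rank evenly over all entities.
                share = damping * rank[node] / n
                for other in nodes:
                    new_rank[other] += share
        rank = new_rank
    return rank


def sort_results(entity_ids, rank):
    """Order search hits so that better-connected entities come first."""
    return sorted(entity_ids, key=lambda e: rank.get(e, 0.0), reverse=True)


# Hypothetical graph: two works both link to the same person.
links = {"ex:work/1": ["ex:person/42"], "ex:work/2": ["ex:person/42"]}
scores = rank_entities(links)
ordered = sort_results(["ex:work/1", "ex:person/42"], scores)
```

In this example the person entity ranks above either work because both works point to it, which is the behavior a link-based ranking is meant to capture.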

Search across entities

Show related entities

User-contributed “same as” relationships

INSERT DATA { GRAPH ;.}
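The URIs in the statement above are not preserved in this transcript. As a sketch of what such an update could look like, the helper below builds a SPARQL 1.1 `INSERT DATA` request that records a “same as” link in a named graph. The use of `owl:sameAs` as the predicate, the function name, and every URI in the example call are assumptions made for illustration.

```python
# Standard OWL property for asserting that two URIs identify the same thing.
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"


def same_as_update(graph_uri, entity_uri, other_uri):
    """Build a SPARQL 1.1 Update string asserting entity_uri sameAs other_uri.

    No escaping or validation is done; the inputs are assumed to be
    well-formed absolute URIs.
    """
    return (
        "INSERT DATA { "
        f"GRAPH <{graph_uri}> {{ "
        f"<{entity_uri}> <{OWL_SAME_AS}> <{other_uri}> . "
        "} }"
    )


# Hypothetical URIs, for illustration only.
update = same_as_update(
    "http://example.org/graph/user-contributions",
    "http://example.org/entity/person/42",
    "http://example.org/other-source/Q1",
)
print(update)
```

Writing user contributions to their own named graph keeps them separable from extracted statements, which matters if they are later scored differently.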

User-contributed “same as” relationships

An end-to-end test of the Knowledge Vault
Diagram labels: Extractors, Collective Knowledge Triples, Scored Triples, Fusion, Fusers, Knowledge Vault, WorldCat, ArchiveGrid, Extractor, Vault Services, EntityJS, Application Triples, Wikidata, DBPedia, VIAF, FAST

Continued Experimentation
Build a way to assign confidence levels to data contributed by EntityJS
Use confidence levels as input to a Fusion process to create Scored Triples
Extend the EntityJS application to incorporate additional Linked Data resources and support further entity relationship refining and editing

Contact us
Jeff Mixter, Software Engineer, OCLC Research
Bruce Washburn, Consulting Software Engineer, OCLC Research
Looking inside the Library Knowledge Vault