Lushan Han, Tim Finin, Cynthia Parr, Joel Sachs, and Anupam Joshi RDF123: from Spreadsheets to RDF.

Road Map: Motivation; Related Work; Translation Design; Incorporating Metadata; RDF123 Graphical Application; RDF123 Web Service; RDF123 Map Layer; Problems and Future Work

Motivation One bottleneck of the Semantic Web is the lack of data. We hope end users can participate in building the Semantic Web by contributing their own data. At the same time, a significant amount of the world’s data is maintained in spreadsheets, which are easy to understand and use, have representational power adequate for many common purposes, and, in their online form, support collaboration. Spreadsheets thus provide a good medium that end users can maintain directly and that can be translated into RDF automatically.

Related Work Existing programs that convert spreadsheets to RDF, such as ConvertToRDF, map only to star-shaped RDF graphs and are not flexible enough for general-purpose spreadsheets. GRDDL (spreadsheet → XML → RDF) involves an additional step to push the spreadsheet data into XML, and the XSLT transforms on which GRDDL relies are hard to create for users who are not XSLT specialists.

Translation Design – Overview RDF123’s translation from a spreadsheet to an RDF graph is driven by a map. The map permits a rich schema to apply to a row, rather than just creating a single instance of an RDF/OWL class, and it allows different rows to use fairly different schemata.

Translation Design – In Detail Every row of a spreadsheet generates a row graph. The RDF graph produced for the whole spreadsheet is the merge of all row graphs, with duplicated resources and triples eliminated. If we overlay the row graphs by unifying similar vertices and edges, we end up with a graph that is a supergraph of every row graph, in which similar vertices and edges from different row graphs converge on a single vertex or edge. We call this supergraph the map graph.
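The merge step can be sketched in Python as a plain set union over triples. This is only an illustration under stated assumptions: the `Triple` type, the `merge_row_graphs` helper and the `ex:`/`foaf:` labels are hypothetical and do not come from RDF123 itself.

```python
from typing import Iterable, Set, Tuple

# A triple is (subject, predicate, object); all labels here are made up.
Triple = Tuple[str, str, str]


def merge_row_graphs(row_graphs: Iterable[Set[Triple]]) -> Set[Triple]:
    """Merge row graphs; set union drops duplicated triples automatically."""
    merged: Set[Triple] = set()
    for graph in row_graphs:
        merged |= graph
    return merged


row1 = {("ex:alice", "rdf:type", "foaf:Person"),
        ("ex:alice", "ex:memberOf", "ex:club")}
row2 = {("ex:bob", "rdf:type", "foaf:Person"),
        ("ex:bob", "ex:memberOf", "ex:club")}
row3 = set(row1)  # a repeated spreadsheet row yields an identical row graph

merged = merge_row_graphs([row1, row2, row3])
```

Because resources are identified by their labels, the shared vertex `ex:club` appears once in the merged graph even though every row graph mentions it.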

Translation Design – In Detail When the map graph should produce different labels for a converged vertex or edge in different row graphs, an expression is used for the vertex or edge rather than a static label. Expressions can use if-then-else sub-expressions and string manipulation operators to compute a label. Since the map graph is a supergraph of every row graph, some vertices and edges are in the map graph but absent from a given row graph; for these, the expressions output empty strings, which signal that no vertex or edge should be created.
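A minimal Python sketch of this behaviour follows. It assumes the map graph is a list of edges whose subject, predicate and object are each computed by a function of the row; plain Python callables stand in for RDF123 expressions, so none of the names or labels below reflect the real RDF123 syntax.

```python
from typing import Callable, List, Set, Tuple

Row = List[str]
Expr = Callable[[Row], str]      # stand-in for an RDF123 expression
Triple = Tuple[str, str, str]
MapEdge = Tuple[Expr, Expr, Expr]


def const(label: str) -> Expr:
    """A static label: the same string for every row."""
    return lambda row: label


def row_graph(map_edges: List[MapEdge], row: Row) -> Set[Triple]:
    """Instantiate the map graph for one row, skipping empty labels."""
    triples: Set[Triple] = set()
    for s_expr, p_expr, o_expr in map_edges:
        s, p, o = s_expr(row), p_expr(row), o_expr(row)
        if s and p and o:  # an empty string means: create nothing here
            triples.add((s, p, o))
    return triples


# Hypothetical map graph: columns are id, name, email.
map_edges: List[MapEdge] = [
    (lambda r: "ex:" + r[0], const("foaf:name"), lambda r: r[1]),
    # if-then-else: emit a mailbox only when the email cell is non-empty
    (lambda r: "ex:" + r[0], const("foaf:mbox"),
     lambda r: ("mailto:" + r[2]) if r[2] else ""),
]

with_email = row_graph(map_edges, ["p1", "Alice", "alice@example.org"])
without_email = row_graph(map_edges, ["p2", "Bob", ""])
```

The second row has no email, so its mailbox expression returns an empty string and the corresponding edge is simply absent from that row graph.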

Translation Design – How to Find the Map Graph Typically, the map graph resembles a diagram of entities and their relationships that captures what users have interpreted from a spreadsheet. Spreadsheets provide a convenient way for users to capture the similarity of data, grouping and storing similar data together in a succinct, informal but intuitive schema. An RDF123 map graph can be a template that copies the intuitive schema of a spreadsheet and allows subtleties and dissimilarities within that similarity to be expressed with RDF123 expressions.

Translation Design - Expression The role of an RDF123 expression is to produce the final label for a converged vertex or edge. The expression language has a context-free grammar and supports branching, arithmetic and string-processing operations. String concatenation and equality use infix notation, while other operations use functional notation, such as … arg2; arg3) and … arg2). Expressions can be recursively embedded in other expressions.

Translation Design - Vertex Type We need to know the RDF data type of a converged vertex before we can output its data as RDF. The type could be one of several data types (e.g., rdf:Resource, rdf:Literal, XML data types) or even a composite type such as an RDF container or collection. We allow users to explicitly append a vertex type to the end of a static label or RDF123 expression, for example Ex:$1^^integer. When no explicit data type is given, we apply the following heuristic: vertices that have outgoing edges become rdf:Resource; a leaf vertex becomes rdf:Resource if its final label is a valid URI, and rdf:Literal otherwise.
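The typing heuristic can be sketched in Python as below. The function names are hypothetical, and the URI check is a crude assumption (a scheme followed by non-whitespace characters); the slides do not say how RDF123 decides that a label is a valid URI.

```python
import re
from typing import Optional

# Crude stand-in for "is a valid URI": scheme, colon, then no whitespace.
_URI_PATTERN = re.compile(r"[A-Za-z][A-Za-z0-9+.-]*:\S+")


def looks_like_uri(label: str) -> bool:
    return _URI_PATTERN.fullmatch(label) is not None


def vertex_type(label: str, has_outgoing_edges: bool,
                explicit_type: Optional[str] = None) -> str:
    """Pick a vertex type: explicit annotation first, then the heuristic."""
    if explicit_type is not None:
        return explicit_type       # e.g. the part after ^^ in a label
    if has_outgoing_edges:
        return "rdf:Resource"      # non-leaf vertices are resources
    if looks_like_uri(label):
        return "rdf:Resource"      # leaf whose label is a URI
    return "rdf:Literal"           # any other leaf


leaf_text = vertex_type("Alice Smith", has_outgoing_edges=False)
leaf_uri = vertex_type("http://example.org/alice", has_outgoing_edges=False)
inner = vertex_type("Alice Smith", has_outgoing_edges=True)
typed = vertex_type("42", has_outgoing_edges=False, explicit_type="integer")
```

An explicit type always wins, which mirrors the slide's point that the heuristic applies only when no type has been appended.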

Translation Design – Example A simple spreadsheet for the members of a research club, and the corresponding map graph.

Translation Design – Example This is the map graph serialized in RDF/XML syntax

Translation Design - Summary The approach is highly expressive, since the map graph can be an arbitrary graph. It is also more intuitive than an XSLT transformation, because the map is expressed as a graph and can be visualized and authored with the RDF123 graphical application.

Incorporating Metadata RDF123 allows users to specify metadata both in map files and in spreadsheets. The metadata serves two functions. One is to provide parameters to the translation procedure, such as the spreadsheet region containing the table to be translated and the map file’s URL. The other is to add RDF descriptions to the produced RDF graph, such as title, author, and comment. Besides functioning as annotations, the descriptions also provide an identifier via a map file or spreadsheet template to facilitate search.

Metadata in a Spreadsheet Spreadsheet metadata is embedded in a contiguous, isolated tabular area with two columns and the header ‘rdf123:metadata’. This way of specifying metadata is preferred when you are the owner of the spreadsheet.

Metadata in the Map Graph The RDF123 expression ‘Ex:?’ stands for the base URI of the online RDF document that the translation produces. The properties ‘rdf123:startRow’ and ‘rdf123:endRow’ are used to specify the translation metadata. This way of specifying metadata is preferred when the map file is applied to other people’s online spreadsheets.

RDF123 Architecture RDF123 consists of two components, the RDF123 application and the RDF123 web service. The application provides a graphical interface for authoring RDF123 maps. The web service automatically generates RDF documents from online spreadsheets, with the location of the RDF123 map specified either in the service request or in the spreadsheet itself.

RDF123 Graphical Application The RDF123 application provides a graphical interface for creating, inspecting and editing RDF123 maps and for using them to generate RDF documents from local spreadsheets.

RDF123 Web Service The RDF123 web service has a simple syntax. The service URL is … and it takes three basic parameters: ‘src’, ‘map’ and ‘out’. If a spreadsheet has an embedded link to its online map file, we only need to specify the URL of the spreadsheet with the ‘src’ parameter. The ‘out’ parameter specifies the output syntax; the default is rdf/xml. Two spreadsheet formats are currently supported: CSV and Google Spreadsheet.
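As a rough illustration of how such a request could be assembled, the Python sketch below builds a query string from the three parameters named on the slide. The service address is not given in this transcript, so `SERVICE_URL` is a placeholder, and the spreadsheet URL, the helper name and the exact spelling of the `out` value are likewise assumptions.

```python
from typing import Optional
from urllib.parse import parse_qs, urlencode, urlparse

# Placeholder only: the real service URL is missing from the transcript.
SERVICE_URL = "http://example.org/rdf123/service"


def build_request_url(src: str, map_url: Optional[str] = None,
                      out: Optional[str] = None,
                      service_url: str = SERVICE_URL) -> str:
    """Build a request URL from the 'src', 'map' and 'out' parameters."""
    params = {"src": src}
    if map_url:
        params["map"] = map_url  # omit when the spreadsheet links to its map
    if out:
        params["out"] = out      # omit to accept the service default
    return service_url + "?" + urlencode(params)


# Spreadsheet with an embedded map link: only 'src' (and here 'out') needed.
url = build_request_url("http://example.org/members.csv", out="rdf/xml")
query = parse_qs(urlparse(url).query)
```

Using `urlencode` keeps the nested spreadsheet and map URLs correctly percent-encoded inside the request.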

RDF123 Map Layer Adding a map layer between the original data in spreadsheets and the converted data in RDF eases data reuse and maintenance. With RDF123 maps, the same spreadsheet data can be made available in different domains just by associating it with different map files. Data maintenance is eased, since data is maintained directly by spreadsheet owners and the RDF data is always rendered current. The map layer can also play a role in integrating data from heterogeneous spreadsheets created by different organizations.

An Easy Way to Publish and Harvest RDF Data from Spreadsheets First, many RDF123 spreadsheet templates on different subjects can be distributed among end users. End users fill in their own data and publish the instantiated spreadsheets online. Then, we query Google for spreadsheet files using keywords particular to RDF123 metadata, such as ‘rdf123:metadata’ and the identifiers in the templates, and convert the results to RDF through the RDF123 Web service.

Problems and Future Work Problem 1: Although drawing a map graph in the RDF123 application is not hard, choosing proper Semantic Web terms and dealing with URIs would be very hard for end users. Problem 2: Different people, without communication between them, may use different sets of terms in authoring a map graph even though the concepts in their spreadsheets are the same, which makes data integration very hard. Future work: We are developing a system that lets users simply use English words for class and property names when authoring their map graphs; the system then maps these English names to the most standard and consistent Semantic Web terms, despite the slightly different names people may give to their concepts. (Part of this work is published as a student abstract at AAAI 2008.)

End Thank you! Questions? RDF123 can be downloaded from the ebiquity website (search Google for ‘rdf123’).