Gleaning Resource Descriptions from Dialects of Languages (GRDDL) W3C Team Submission 16 May 2005 Dominique Hazaël-Massieux, Dan Connolly Summarized by Umer Fareed
Background: Knowledge Representation
Document management systems and knowledge management systems have historically been isolated from each other. XML and RDF are the contemporary manifestations of these two traditions. A unified infrastructure for document and knowledge management is needed. [Figure: older ideas at the corners, newer ideas along the sides.] IDS Lab. Seminar
Where Does the Problem Lie?
How can data be made presentable and available to humans as well as machines? The goals are to improve the quality and availability of data on the Web, and to make possible more intensive data integration and more diverse applications that can scale to the size of the Web. However, there are two approaches, and they rely on (almost) completely different standards, languages, protocols and ideals. Reactive Web: focus on community and usability (Web 2.0). Proactive Web: machine-processable semantics (Semantic Web).
GRDDL: The Solution
GRDDL (pronounced "griddle") is a W3C Recommendation. It works as a bridge for turning data expressed in an XML format (such as XHTML) into Semantic Web data. It makes web pages, microformat tags, XML documents, and data more readable to Semantic Web applications, and opens more data to new possibilities and creative reuse.
Gleaning Resource Descriptions: GRDDL uses markup for two purposes: to declare that an XML document includes gleanable data, and to link to an algorithm, typically expressed in XSLT, for gleaning the resource descriptions from the document.
Dialects of Markup Languages
Many dialects (domain-specific languages) are in use among the XML documents on the Web.
Example: the same musical work ("Are You Experienced?" by The Jimi Hendrix Experience) described in different XML dialects: Atom, iTunes Music Library, Audioscrobbler, Open Office. [The XML markup of the four examples was not preserved in this transcript.]
GRDDL Transformations
A GRDDL transformation is associated with a well-formed XML document by adding to the root element a grddl namespace declaration and a grddl:transformation attribute whose value is an IRI reference (or a space-separated list of IRI references). This is suitable for any XML dialect that can accommodate an extra namespace-qualified attribute on the root element.
<html xmlns="http://www.w3.org/1999/xhtml" xmlns:grddl='http://www.w3.org/2003/g/data-view#' grddl:transformation="glean_title.xsl http://www.w3.org/2001/sw/grddl-wg/td/getAuthor.xsl" > Are You Experienced? [...]
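The root-element mechanism on this slide can be sketched in Python with the standard library only. This is a minimal illustration, not a GRDDL implementation: the function name `root_transformations`, the base IRI `http://example.org/doc.html`, and the `head`/`title`/`body` markup used to complete the sample document are assumptions made for the example.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urljoin

GRDDL_NS = "http://www.w3.org/2003/g/data-view#"

def root_transformations(xml_text, base_iri):
    """Return absolute transformation IRIs declared on the root element."""
    root = ET.fromstring(xml_text)
    # ElementTree exposes namespaced attributes as "{namespace}localname".
    value = root.get("{%s}transformation" % GRDDL_NS, "")
    # The attribute holds whitespace-separated IRI references;
    # relative references are resolved against the document's base IRI.
    return [urljoin(base_iri, ref) for ref in value.split()]

# Sample document; the base IRI below is a hypothetical location for it.
doc = """<html xmlns="http://www.w3.org/1999/xhtml"
      xmlns:grddl="http://www.w3.org/2003/g/data-view#"
      grddl:transformation="glean_title.xsl
          http://www.w3.org/2001/sw/grddl-wg/td/getAuthor.xsl">
  <head><title>Are You Experienced?</title></head>
  <body/>
</html>"""

transformations = root_transformations(doc, "http://example.org/doc.html")
for iri in transformations:
    print(iri)
```

The sketch only discovers the transformations; running them would additionally need an XSLT processor, which the Python standard library does not provide.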
GRDDL Transformations
Namespace Transformation: relates a namespace to a transformation for all documents in that namespace.
Profile Transformation: relates a profile document to a transformation for all documents bearing that profile.
Transformation Property: relates a transformation to the algorithm, specified by the property, that computes an RDF graph from an XML document node.
GRDDL Transformations
[Figures: a single transformation of HTML data to RDF; multiple transformations.]
GRDDL Mechanisms
GRDDL defines various mechanisms for identifying which transformations may be used to extract RDF data from XML documents. These can be used individually or together, with the results of all transformations of the original document being merged.
Mechanism #1: Using an HTML Profile. The easiest way of publishing GRDDL-friendly documents is to reuse an existing profile. ... Joe Lambda ...
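The merging of results mentioned on this slide can be pictured as a set union of RDF triples. The sketch below is a simplification under stated assumptions: each graph is modelled as a Python set of (subject, predicate, object) tuples, blank nodes are ignored, and the document IRI and the two result graphs are hypothetical values invented for the illustration.

```python
def merge_results(*graphs):
    """Merge several RDF graphs, each modelled as a set of triples."""
    merged = set()
    for graph in graphs:
        # Set union: a triple produced by several transformations
        # appears only once in the merged graph.
        merged.update(graph)
    return merged

DOC = "http://example.org/doc.html"       # hypothetical source document
DC = "http://purl.org/dc/elements/1.1/"   # Dublin Core element namespace

# Hypothetical outputs of two transformations applied to the same document.
from_title = {(DOC, DC + "title", "Are You Experienced?")}
from_author = {
    (DOC, DC + "title", "Are You Experienced?"),
    (DOC, DC + "creator", "The Jimi Hendrix Experience"),
}

merged = merge_results(from_title, from_author)
print(len(merged))
```

A real agent would hold the graphs in an RDF library and would have to keep blank nodes from different results apart, which this sketch does not attempt.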
GRDDL Mechanisms
Mechanism #2: Using the GRDDL HTML Profile. The data-view namespace can be used directly as a profile in an HTML document. The XSLT transformation(s) that convert the document into RDF can then be specified in link and anchor (a) elements within the document.
Some Document <link rel="transformation" href="http://www.w3.org/2000/06/dc-extract/dc-extract.xsl" /> ...
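A rough Python sketch of this profile mechanism follows. It assumes the page is well-formed XHTML, takes the profile IRI to be `http://www.w3.org/2003/g/data-view`, and uses a hypothetical function name (`profile_transformations`) and base IRI; the sample markup around the `link` element is likewise filled in for illustration.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urljoin

XHTML = "{http://www.w3.org/1999/xhtml}"
GRDDL_PROFILE = "http://www.w3.org/2003/g/data-view"

def profile_transformations(xhtml_text, base_iri):
    """List transformation links in an XHTML page bearing the GRDDL profile."""
    root = ET.fromstring(xhtml_text)
    head = root.find(XHTML + "head")
    # The head's profile attribute is a space-separated list of IRIs.
    if head is None or GRDDL_PROFILE not in head.get("profile", "").split():
        return []
    found = []
    for element in root.iter():
        if element.tag not in (XHTML + "link", XHTML + "a"):
            continue
        # rel is also a space-separated list of tokens.
        if "transformation" in element.get("rel", "").split() and element.get("href"):
            found.append(urljoin(base_iri, element.get("href")))
    return found

page = """<html xmlns="http://www.w3.org/1999/xhtml">
  <head profile="http://www.w3.org/2003/g/data-view">
    <title>Some Document</title>
    <link rel="transformation"
          href="http://www.w3.org/2000/06/dc-extract/dc-extract.xsl" />
  </head>
  <body><p>Sample body.</p></body>
</html>"""

links = profile_transformations(page, "http://example.org/some-document.html")
print(links)
```

Pages that do not declare the profile yield an empty list, so links that merely happen to use rel="transformation" are not picked up.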
GRDDL Mechanisms
Mechanism #3: Transformation Attribute on the XML Root Node. GRDDL can be added directly to well-formed XML. Multiple transformations can be specified, space-separated, in the transformation attribute's value.
<feed xmlns="http://www.w3.org/2005/Atom" xmlns:grddl='http://www.w3.org/2003/g/data-view#' grddl:transformation="http://djpowell.net/atomrdf/0.1/files/atom2rdf-16.xsl"> Example Atom Feed...
GRDDL Mechanisms
Mechanism #4: Transformation Specified in the XML Namespace Document. Transformations can be associated not only with individual documents but also with whole dialects that share an XML namespace or XHTML profile. For example, the namespace document for P3Q relates the grokP3Q.xsl transformation to all P3Q documents.
GRDDL-Aware Agent
A GRDDL-aware agent is a software module that computes the GRDDL results of information resources. To find a namespace transformation, the agent retrieves the namespace document and applies the GRDDL mechanisms to it. This yields RDF pointing to the transformation, which can then be used to convert all documents using that namespace into RDF.
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:dataview="http://www.w3.org/2003/g/data-view#">
  <rdf:Description rdf:about="http://example.org/myxml">
    <dataview:namespaceTransformation rdf:resource="http://example.org/myxml2rdfxml.xsl"/>
  </rdf:Description>
</rdf:RDF>
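The lookup an agent performs on such RDF can be sketched as follows. This is a simplified illustration, assuming the RDF is available as RDF/XML in exactly the rdf:Description shape shown on the slide; a real agent would use an RDF parser rather than walking the XML tree, and the function name `namespace_transformations` is hypothetical.

```python
import xml.etree.ElementTree as ET

RDF = "{http://www.w3.org/1999/02/22-rdf-syntax-ns#}"
DATAVIEW = "{http://www.w3.org/2003/g/data-view#}"

def namespace_transformations(rdfxml_text, namespace_iri):
    """Find transformations that the RDF relates to the given namespace."""
    root = ET.fromstring(rdfxml_text)
    result = []
    for description in root.iter(RDF + "Description"):
        # Only statements about the namespace itself are relevant.
        if description.get(RDF + "about") != namespace_iri:
            continue
        for prop in description.findall(DATAVIEW + "namespaceTransformation"):
            resource = prop.get(RDF + "resource")
            if resource:
                result.append(resource)
    return result

namespace_rdf = """<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:dataview="http://www.w3.org/2003/g/data-view#">
  <rdf:Description rdf:about="http://example.org/myxml">
    <dataview:namespaceTransformation
        rdf:resource="http://example.org/myxml2rdfxml.xsl"/>
  </rdf:Description>
</rdf:RDF>"""

found = namespace_transformations(namespace_rdf, "http://example.org/myxml")
print(found)
```

The returned IRIs would then be applied to every document whose root element is in that namespace.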
Security Considerations
Because of the expressive power of XSLT, and the possibility of accessing external resources from an XSLT style sheet (e.g. through the xsl:import mechanism), appropriate measures should be taken to prevent malicious usage. Implementers should pay special attention to the security implications of any media types that can cause the remote execution of actions in the recipient's environment.
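One possible precaution, sketched below, is to check each transformation IRI against a local policy before fetching or running it. The policy shown (http/https only, with a fixed set of trusted hosts) is an assumption made for the example and is not prescribed by GRDDL; the names `ALLOWED_SCHEMES`, `TRUSTED_HOSTS` and `is_trusted_transformation` are hypothetical.

```python
from urllib.parse import urlparse

# Hypothetical local policy: which transformations this agent will run.
ALLOWED_SCHEMES = {"http", "https"}
TRUSTED_HOSTS = {"www.w3.org"}

def is_trusted_transformation(iri):
    """Accept a transformation IRI only if scheme and host are allowed."""
    parts = urlparse(iri)
    return parts.scheme in ALLOWED_SCHEMES and parts.hostname in TRUSTED_HOSTS

candidates = [
    "http://www.w3.org/2000/06/dc-extract/dc-extract.xsl",
    "file:///etc/passwd",
    "http://example.org/unknown.xsl",
]
accepted = [iri for iri in candidates if is_trusted_transformation(iri)]
print(accepted)
```

A check like this covers only the top-level style sheet. Resources that the style sheet itself pulls in (for instance through xsl:import) would have to be restricted in the XSLT processor's own configuration.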
Conclusion: A Bridge from Flexible Web Applications to the Semantic Web
GRDDL makes web pages, microformat tags, XML documents, and data more readable to Semantic Web applications. It increases the value of existing data by porting it to the Semantic Web at very low cost. It also allows data to be gleaned from Web 2.0 applications, which are based on combining (in mashups) various types of data spread across the Web.