Swoogle Tutorial (Part I: Swoogle R & D)

Presentation transcript:

Swoogle Tutorial (Part I: Swoogle R & D)
- A brief introduction to Swoogle
- An overview of Swoogle research
- A summary of Swoogle development
Presented by eBiquity Lab, CSEE, UMBC

1. Introduction
- Motivation
- Swoogle in the Semantic Web
- Glossary
- Swoogle Architecture

Motivation
- (Google + Web) has made us all smarter.
- Something similar is needed by people and software agents for information on the Semantic Web.

The Role of Swoogle in the Semantic Web
[Figure: diagram with the labels Semantic Web Services; Semantic web data; Software Agents, Applications; SW data service; database; (Web) document; RDF document; uses; Directory/Digest Service; Service Finder; digests; searches; Data Finder; Swoogle]

Concepts Explained
[Figure: example RDF graph. Node and edge labels: wordNet:Agent, rdf:type, rdfs:Class, rdfs:subClassOf, foaf:Person, foaf:mbox, rdfs:domain, rdf:Property. Annotations: Property, Class, Term, Individual, SWO, SWI, SWD]
NOTE: Qualified Names (QNames) are used to shorten well-known namespaces as follows: rdf: => rdfs: => foaf: => wordNet: =>

Glossary
Document
- A Semantic Web Document (SWD) is an online document written in semantic web languages (i.e. RDF and OWL).
- An ontology document (SWO) is an SWD that contains mostly term definitions (i.e. classes and properties). It corresponds to the T-Box in Description Logic.
- An instance document (SWI or SWDB) is an SWD that contains mostly class individuals. It corresponds to the A-Box in Description Logic.
Term
- A term is a non-anonymous RDF resource which is the URI reference of either a class or a property.
Individual
- An individual is a non-anonymous RDF resource which is the URI reference of a class member.
In Swoogle, a document D is a valid SWD iff JENA* correctly parses D and produces at least one triple.
*JENA is a Java framework for writing Semantic Web applications.
[Figure labels: foaf:Person rdf:type rdfs:Class (term); rdf:type foaf:Person (individual)]
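The glossary definitions can be illustrated with a minimal Python sketch. It assumes a document has already been parsed into (subject, predicate, object) triples written as QName strings. The function names, the DEFINING_TYPES set, the "NSWD" label for a document with no triples, and the 0.5 threshold standing in for "mostly" are hypothetical choices for illustration, not Swoogle's implementation (which relies on Jena for parsing).

```python
# Hypothetical helpers; triples are (subject, predicate, object) QName strings.
DEFINING_TYPES = {"rdfs:Class", "owl:Class", "rdf:Property"}


def is_swd(triples):
    """A parsed document counts as an SWD if it produced at least one triple."""
    return len(triples) > 0


def classify(triples, threshold=0.5):
    """Label a parsed document 'SWO', 'SWI', 'SWD' (undetermined), or 'NSWD'."""
    if not is_swd(triples):
        return "NSWD"
    typed_objects = [o for _, p, o in triples if p == "rdf:type"]
    if not typed_objects:
        return "SWD"  # valid, but nothing to base the SWO/SWI decision on
    definitions = sum(1 for o in typed_objects if o in DEFINING_TYPES)
    ratio = definitions / len(typed_objects)  # assumed notion of "mostly"
    return "SWO" if ratio > threshold else "SWI"


# Toy documents (illustrative data only).
ontology_doc = [
    ("foaf:Person", "rdf:type", "rdfs:Class"),
    ("foaf:Person", "rdfs:subClassOf", "wordNet:Agent"),
    ("foaf:mbox", "rdf:type", "rdf:Property"),
    ("foaf:mbox", "rdfs:domain", "foaf:Person"),
]
instance_doc = [
    ("ex:tim", "rdf:type", "foaf:Person"),
    ("ex:tim", "foaf:mbox", "mailto:tim@example.org"),
]

labels = [classify(ontology_doc), classify(instance_doc), classify([])]
```

Here the first toy document only defines terms and is labeled SWO, the second only describes an individual and is labeled SWI, and an empty parse result is rejected.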

Swoogle Architecture
[Figure: architecture diagram. Stage labels: SWD discovery, metadata creation, data analysis, interface. Component labels: The Web, Candidate URLs, Web Crawler, SWD Reader, SWD Cache, SWD Metadata, IR analyzer, SWD analyzer, Web Server, Web Service, Agent Service]

2. Swoogle Research
- Discovery
- Digest
- Search & Navigation
- Rank
- Statistics

Discovery -- research
Discovering URLs of possible SWDs automatically
- Google-crawler
- Focused-crawler
- Semantic-Web-crawler, e.g. scutter
Revisiting URLs

Discovery -- results
Crawler performance
- Google crawler is the best
- Focused crawler needs to be improved
Verified pure SWDs are only 1/3 of discovered URLs
Some NSWDs contain embedded RDF graphs.

Crawler           SWD             NSWD            Undecided   TOTAL
Focused Crawler     1,465 (7%)     10,580 (52%)       8,292      20,337
google crawler    273,023 (36%)   369,371 (49%)     110,794     753,188
swd_crawler        61,870 (15%)   285,506 (70%)      57,709     405,085
TOTAL             336,…           …                 …,795     1,178,610

Source: Swoogle (2005-Jan-05)
SELECT `discovered_by`, sum(isRDF), sum(1-isRDF), count(*) FROM `digest_url` WHERE 1 group by discovered_by
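To show how the source query yields the SWD, NSWD, and Undecided columns, here is a small sketch that runs an equivalent query with Python's built-in sqlite3 module on toy rows. The table layout beyond the three names in the query (digest_url, discovered_by, isRDF), the sample URLs, and the assumption that undecided URLs carry a NULL isRDF value are illustrative guesses, not facts about Swoogle's database.

```python
import sqlite3

# In-memory toy table; only the names digest_url, discovered_by and isRDF
# come from the slide, everything else is assumed for illustration.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE digest_url (url TEXT, discovered_by TEXT, isRDF INTEGER)"
)
rows = [
    ("http://example.org/a.rdf", "google crawler", 1),
    ("http://example.org/b.html", "google crawler", 0),
    ("http://example.org/c", "google crawler", None),  # not yet decided
    ("http://example.org/d.owl", "swd_crawler", 1),
    ("http://example.org/e.html", "swd_crawler", 0),
]
conn.executemany("INSERT INTO digest_url VALUES (?, ?, ?)", rows)

query = """
SELECT discovered_by, sum(isRDF), sum(1 - isRDF), count(*)
FROM digest_url
GROUP BY discovered_by
ORDER BY discovered_by
"""

summary = {}
for crawler, swd, nswd, total in conn.execute(query):
    # sum() skips NULLs, so undecided URLs are what remains of the total.
    summary[crawler] = {
        "SWD": swd,
        "NSWD": nswd,
        "Undecided": total - swd - nswd,
        "TOTAL": total,
    }
conn.close()
```

Under this reading, the Undecided column is not selected directly; it is the difference between count(*) and the two sums.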

Digest -- research
Document metadata
- Annotative
  - General metadata
  - SWD metadata
  - Ontology metadata
- Inter-document relations
- Document-term relations
Term metadata
- Term Definition
- Inter-term Relation
  - Class-property bond (C-P bond): rdfs:domain
  - Property-Class bond (P-C bond): rdfs:range

Document Metadata
Web document metadata
- When/how discovered/fetched
- Suffix of URL
- Last modified time
- Document size
SWD metadata
- Language features
  - OWL species
  - RDF encoding
- Statistical features
  - # of defined/used terms
  - # of declared/used namespaces
  - Ontology Ratio
- Ontology Rank
Ontology annotation
- Label
- Version
- Comment
Relations
- Links to other SWDs
  - Imported SWDs
  - Referenced SWDs
  - Extended SWDs
  - Prior version
- Links to terms
  - Classes/properties defined
  - Classes/properties used

Demo 2(a): Digest “Time” Ontology (document view)

Document-Term Relation
[Figure: example RDF graph annotated with document-term relations. Node and edge labels: foaf:mbox, rdf:type, foaf:Person, wordNet:Agent, rdfs:Class, rdfs:subClassOf, rdfs:domain, rdf:Property. Annotations: populated Class, defined Class, populated Property, defined Property, defined Individual]
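One possible reading of these document-term relations, sketched in Python: a term is "defined" when the document types it as a class or property, a class is "populated" when it appears as the object of rdf:type, and a property is "populated" when it is used as a predicate. This interpretation, the function name, and the type sets are assumptions made for the example, not a description of Swoogle's code.

```python
# Assumed meta-level types that mark a term definition.
CLASS_TYPES = {"rdfs:Class", "owl:Class"}
PROPERTY_TYPES = {"rdf:Property"}


def document_term_relations(triples):
    """Collect defined/populated classes and properties from QName triples."""
    relations = {
        "defined class": set(),
        "defined property": set(),
        "populated class": set(),
        "populated property": set(),
    }
    for subject, predicate, obj in triples:
        # Every predicate that occurs in a triple is used, hence populated.
        relations["populated property"].add(predicate)
        if predicate == "rdf:type":
            # The object of rdf:type gains a member, hence is populated.
            relations["populated class"].add(obj)
            if obj in CLASS_TYPES:
                relations["defined class"].add(subject)
            elif obj in PROPERTY_TYPES:
                relations["defined property"].add(subject)
    return relations


# Toy documents (illustrative data only).
swo = [
    ("foaf:Person", "rdf:type", "rdfs:Class"),
    ("foaf:mbox", "rdf:type", "rdf:Property"),
    ("foaf:mbox", "rdfs:domain", "foaf:Person"),
]
swi = [
    ("ex:tim", "rdf:type", "foaf:Person"),
    ("ex:tim", "foaf:mbox", "mailto:tim@example.org"),
]

swo_relations = document_term_relations(swo)
swi_relations = document_term_relations(swi)
```

With this toy data the ontology document defines foaf:Person and foaf:mbox, while the instance document populates them without defining anything.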

Demo 2(b): Digest “Time” Ontology (term view)

Term Metadata
Term Definition
- rdfs:subClassOf -- foaf:Agent
- rdfs:label -- “Person”
C-P bond (from SWI)
- foaf:name
- dc:title
C-P bond (from SWO)
- foaf:mbox
- foaf:name
[Figure: example graph for foaf:Person spanning three documents labelled Onto 1, Onto 2, and SWD3. Node and edge labels: foaf:mbox, rdfs:domain, owl:Class, rdf:type, “Person”, rdfs:label, foaf:Agent, rdfs:subClassOf, foaf:name, “Tim Finin”, foaf:Person]
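The two sources of class-property bonds can be sketched as follows. The sketch assumes that a bond from an SWO comes from an rdfs:domain statement, and that a bond from an SWI is observed whenever an individual typed with a class uses a property. The function names and the toy triples are hypothetical.

```python
def cp_bonds_from_swo(triples):
    """(class, property) pairs declared through rdfs:domain."""
    return {(obj, subj) for subj, pred, obj in triples if pred == "rdfs:domain"}


def cp_bonds_from_swi(triples):
    """(class, property) pairs observed when a typed individual uses a property."""
    types = {}
    for subj, pred, obj in triples:
        if pred == "rdf:type":
            types.setdefault(subj, set()).add(obj)
    return {
        (cls, pred)
        for subj, pred, _ in triples
        if pred != "rdf:type"
        for cls in types.get(subj, ())
    }


# Toy data only; the domain statements mirror the slide's example lists.
ontology = [
    ("foaf:mbox", "rdfs:domain", "foaf:Person"),
    ("foaf:name", "rdfs:domain", "foaf:Person"),
]
instances = [
    ("ex:tim", "rdf:type", "foaf:Person"),
    ("ex:tim", "foaf:name", "Tim Finin"),
    ("ex:tim", "dc:title", "example title"),
]

swo_bonds = cp_bonds_from_swo(ontology)
swi_bonds = cp_bonds_from_swi(instances)
```

Bonds found only in instance data (here dc:title) show how a class is actually used, which may differ from what any ontology declares.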

Demo 4: Digest Term “Person”

Term Distribution (grouped by local name)

Rank  case-insensitive   case-sensitive
1     Name 656           name            source 129
2     Person 399         Person
3     Title 349          title           Book 124
4     Location 334       description     address 121
5     Description 288    location        Event 117
6     Date 257           type            Location 114
7     Type 242           date            author 111
8     country 236        value           Animal 111
9     Address 212        Organization    Country 104
      organization       country         language 103
      total              total           76827
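Grouping terms by local name can be sketched in a few lines of Python. The sketch assumes the local name is whatever follows the last "#" or "/" in a URIref; the helper name and the sample URIrefs are illustrative, and Swoogle's own splitting rule may differ.

```python
from collections import Counter


def local_name(uriref):
    """Return the part of a URIref after its last '#' or '/' (assumed rule)."""
    cut = max(uriref.rfind("#"), uriref.rfind("/"))
    return uriref[cut + 1:]


# Toy term list (illustrative URIrefs only).
terms = [
    "http://xmlns.com/foaf/0.1/Person",
    "http://example.org/onto#person",
    "http://example.org/other#Person",
    "http://purl.org/dc/elements/1.1/title",
    "http://example.org/onto#Title",
]

case_sensitive = Counter(local_name(t) for t in terms)
case_insensitive = Counter(local_name(t).lower() for t in terms)
```

In the case-sensitive count "Person" and "person" stay separate, while the case-insensitive count merges them, which is why the two rankings differ.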

Digest -- result
Ontological Term Distribution (populated, defined)

type      Pop.  Def.  # terms         Total terms  # populated         Total populated
class     0     1     83,602 (88%)                 0 (0%)
class     1     0      3,954 (4%)                  1,002,961 (13%)
class     1     1      7,065 (7%)     94,621       6,483,485 (87%)     7,486,446
property  0     1     42,853 (73%)                 0 (0%)
property  1     0      8,312 (14%)                 2,438,455 (6%)
property  1     1      7,836 (13%)    59,001       36,899,842 (94%)    39,338,297

Source: Swoogle (2005-Jan-05)
SELECT res_type, sign(cnt_instance_populate>0), sign(cnt_swd_def>0), count(*), sum(cnt_instance_populate) FROM `digest_term` WHERE 1 group by res_type, sign(cnt_instance_populate>0), sign(cnt_swd_def>0)

Search & Navigation -- research
The Semantic Web is not the Web
Search service
- Document search: an RDF document is not free text
- Term search: URIref and compound local name
Navigation service
- The RDF graph: typed links
- The web of RDF documents: few hyperlinks
- The social network of agents: trust & provenance

Demo 1: Find “Time” Ontology
We can use a set of keywords to search for an ontology. For example, “time”, “before”, and “after” are basic concepts of a “Time” ontology.

Demo 3: Find Term “Person”
Not capitalized! A URIref is case-sensitive!

Current Swoogle Navigation Model
A URIref refers to
- A term, i.e. an instance of an RDFS class/property
- An individual, i.e. populated terms
An SWD could be
- SWO: term definition
- SWI: individuals
Observations
- RDF resources are semantically linked in the RDF graph
- SWDs are poorly linked due to the absence of an explicit hyperlink concept
- Ontologies are more interesting
Approach
- Build inter-document relations
- Rational surfing model
[Figure labels: SWOs, SWIs, HTML documents, Images, Audio files, Video files]

Semantic Web Navigation Model (new!)
[Figure: navigation graph. Node labels: URL, URIref, Resource, RDF Document, Ontology, Namespace, RDF Graph Navigation, Term Search, Document Search. Edge labels: populatesClass, populatesProperty, refersClass, refersProperty, definesClass, definesProperty, rdfsOntology, owldlOntology, owl:imports, owl:priorVersion, owl:backwardCompatibleWith, owl:incompatibleWith, rdfs:seeAlso, rdfs:isDefinedBy, isDefinedBy, isUsedBy, usesNamespace, rdfs:subClassOf, sameNamespace, sameLocalname]

Ranking -- research
Surfing models

Model                   What to rank     Scope         Idea
Rational surfing model  SWD              Semantic Web  Summarize inter-document relations as EX, TM, IM, PV
Plain Graph Model       Resource         RDF graph     RDF graph is browsed as a weighted directed graph
RDFS-based Model        Resource         RDF graph     RDF graph is browsed only with RDFS semantics
SW navigation model     Resource & SWD   Semantic Web  Assume Swoogle is used in navigation

Ranking method
- PageRank variation

Ranking with Rational Surfing Model: An Example
[Figure: example documents connected by TM and EX links. Node and edge labels: foaf:mbox, rdf:type, foaf:Person, wordNet:Person, rdfs:Class, rdfs:subClassOf, rdf:Property, wordNet:Individual]

Demo 6: Swoogle's top 10
This report is dynamically generated from the latest data and takes 5 to 10 seconds to produce. Swoogle uses a PageRank-like algorithm to rank semantic web documents. Well-known ontologies are highly ranked.
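A PageRank-like ranking over typed inter-document links can be sketched as a weighted PageRank. This is a minimal illustration, not Swoogle's actual algorithm: the relation weights, the damping factor, the document names, and the reading of EX, TM, IM, PV as extends, uses-term, imports, and prior-version are all assumptions made for the example.

```python
def rank_documents(links, weights, damping=0.85, iterations=50):
    """Weighted PageRank over (source, target, relation) links.

    A surfer on a document follows an outgoing link with probability
    proportional to the (assumed) weight of its relation type.
    """
    docs = sorted({d for src, dst, _ in links for d in (src, dst)})
    out = {d: {} for d in docs}
    for src, dst, relation in links:
        out[src][dst] = out[src].get(dst, 0.0) + weights[relation]

    n = len(docs)
    scores = {d: 1.0 / n for d in docs}
    for _ in range(iterations):
        new_scores = {d: (1.0 - damping) / n for d in docs}
        for src in docs:
            total = sum(out[src].values())
            if total == 0:
                # Dangling document: spread its score evenly over all documents.
                for d in docs:
                    new_scores[d] += damping * scores[src] / n
            else:
                for dst, weight in out[src].items():
                    new_scores[dst] += damping * scores[src] * weight / total
        scores = new_scores
    return scores


# Hypothetical weights and toy link graph.
weights = {"IM": 1.0, "EX": 0.75, "TM": 0.5, "PV": 0.25}
links = [
    ("alice.rdf", "foaf.rdf", "TM"),
    ("bob.rdf", "foaf.rdf", "TM"),
    ("myonto.owl", "foaf.rdf", "EX"),
    ("myonto.owl", "wordnet.owl", "IM"),
    ("myonto-v2.owl", "myonto.owl", "PV"),
]
scores = rank_documents(links, weights)
top_document = max(scores, key=scores.get)
```

In this toy graph the ontology that many documents use or extend ends up with the highest score, which matches the observation that well-known ontologies are highly ranked.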

Statistics -- research
Summarize the dataset collected by Swoogle
- Swoogle Watch
  - Swoogle Today
  - Distribution of visited URLs
  - Document discovery log
  - Term discovery log
- Semantic Web Watch
  - SWD distribution by last-modified month
  - SWD distribution by website
  - SWD distribution by suffix
- Ontology Watch
  - Term (class/property) usage
  - Namespace usage

Demo 5(a): Swoogle Today

Demo 5(b): Swoogle Statistics
[Figure labels: FOAF, Trustix, W3C, Stanford]

Demo 5(c): Swoogle Statistics

Miscellaneous
Submit URL for focused crawler
Swoogle Web Service (delivered in Sept.)
- Search document
- Search term
- Term digest

Demo 7: Submit URL for focused crawler
If you cannot find your ontologies in Swoogle, they may not have been indexed yet. Please submit their URLs to increase their visibility.
[Screenshot callouts: From site map; When your query fails]

3. Summary
- Summary
- Current Status

Summary
Swoogle (Mar, 2004), Swoogle2 (Sep, 2004), Swoogle3
- Automated SWD discovery
- SWD metadata creation and search
- Ontology rank (rational surfer model)
- Swoogle watch
- Web Interface
- Ontology dictionary
- Swoogle statistics
- Web service interface (WSDL)
- Bag of URIref IR search
- Better discovery & revisit strategies
- Better navigation models
- Semantic web dataset
- Index instance data
- More metadata (ontology mapping)
- Better web service interfaces

Current Status
Swoogle Watch reported (Jan 6, 2005)
- 46.7M triples
- 336K SWDs: 4K ontologies
- 153K terms: 94K classes & 59K properties
Ongoing work
- Research
  - Self-adaptive SWD Discovery
  - Efficient SWD digest and RDF Graph Abstract Semantic Web navigation model
- Engineering
  - Enhancing Web Service interface