Semantic Wikis: Fusing Two Strands of the Semantic Web Dr. Mark Greaves Vulcan Inc. © 2008 Vulcan Inc.


1 Semantic Wikis: Fusing Two Strands of the Semantic Web Dr. Mark Greaves Vulcan Inc. markg@vulcan.com © 2008 Vulcan Inc.

2 Talk Outline
The Argument for Semantic Wikis
– Two Strands of the Semantic Web
– Semantic Wikis: Bridging the Gap
– Lessons from the Design of SMW+
Semantic Wiki Experience with Vulcan’s Project Halo
– Question Answering in Science
– Wikis for Question Answering
Semantic MediaWiki+

4 Strand 1: The Semantic Strand of the Semantic Web
Semantic Web as RDBMS Integration Technology
– Semantic representation of schema relations
– Centralized workflows for ontology/data definition and management
– Powerful reasoning and inference
– Enterprise-oriented
Rooted in the original software/tools of the Semantic Web
– Initial triplestores and authoring systems were (mostly) stand-alone or within the confines of a controlled data set
– Early DARPA use cases were oriented around data integration
    – EII-style applications: BBN’s Foreign Clearance Guide for AMC
    – More XML-oriented than Web-oriented
The Primary Commercial use of Semantic Web for many years
– Examples: Siderean Seamark, Oracle RDF
– Still the most well-understood use cases for the semantic web
– Still extremely important commercially

5 Strand 2: The Web Strand of the Semantic Web
Semantic Web as a web-scale knowledge publishing technology
– Uncontrolled data dynamics, imperfect and voluminous data
– Anyone can publish with limited/no knowledge engineer involvement
– A massive base of socially-curated semantic data
– Balance between quantity and purity (issue with owl:sameAs links)
– Semantic data doesn’t have to be associated with HTML web text
Rooted in the original vision of the Semantic Web
– Took several years to start to be realized
– Difficulty conceiving of massive numbers of overlapping ontologies and class hierarchies, and uncoordinated data publishing
– Hard problem is maintaining a set of informal, evolving, and partial agreements on vocabularies and ontologies
An exciting and emerging data set
– Examples: Yahoo!, Sindice, Linking Open Data
– Fairly poorly understood use cases (especially commercially)
– Web-oriented and web-scale is extremely attractive

6 What do Strand 2 Semantic Web Applications Do?
Strand 1 semantic web applications have enterprise use cases
– EII, E-science, Enterprise content management...
– Success of use cases requires unified data models, familiar to DB thinking
Strand 2 semantic web applications address a brand new use case type
– “Semantic Web should allow people to have a better online experience” – Alex Iskold, CEO of AdaptiveBlue
– Enhance the human activities of content creation, publishing, linking my data to other data, forming community, purchasing satisfying things, browsing, etc.
– Strongly linked to Web 2.0 business models (such as they are)
    – Improve the effectiveness/targeting of advertising
    – Knowledge management tools for communities
Strand 2 use cases still require Strand 1-style data consistency and vocabulary agreement
Can Strand 2 Semantic Web Applications Overcome the Data Chaos of the Emerging Semantic Web?

7 Semantic Wikis are in both Strands
Wikis are tools for Publication and Consensus
MediaWiki (software for Wikipedia, Wikimedia, Wikibooks, etc.)
– Most successful Wiki software
    – High performance: 10K pages/sec served, scalability demonstrated
    – LAMP web server architecture, GPL license
– Publication: simple distributed authoring model
    – Wikipedia: >2.5M English articles, >250M edits, >2.5M images, #8 Alexa traffic rank in August
– Consensus achieved by global editing and rollback
    – Fixpoint hypothesis, although consensus is not static
    – Gardener/admin role for contentious cases
Semantic Wikis apply the wiki idea to structured (typically RDFS) information
– Authoring includes instances, data types, vocabularies, classes
– Natural language text used for explanations
– Automatic list generation from structured data, basic analytics, database imports
– See, e.g., http://wiki.ontoprise.com for one powerful semantic wiki
Semantic Wiki Hypotheses:
(1) Significant interesting non-RDBMS Semantic Data can be collected cheaply
(2) Wiki mechanisms can be used to maintain consensus on vocabularies and classes
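
As a rough illustration of "automatic list generation from structured data", the sketch below pulls property annotations out of wiki text and builds a list from them. It assumes Semantic MediaWiki's inline `[[Property::value]]` annotation syntax; the page contents, the property names, and the helper names `extract_triples` and `list_pages` are invented for this example and are not taken from the talk or from SMW+ itself.

```python
import re

# Inline annotation pattern: [[Property::value]] or [[Property::value|label]].
# Plain links such as [[Carbon]] or [[Category:Foo]] do not match.
ANNOTATION = re.compile(r"\[\[([^:\]|]+)::([^\]|]+)(?:\|[^\]]*)?\]\]")


def extract_triples(page_title, wikitext):
    """Return (subject, property, value) triples annotated in the wiki text."""
    return [
        (page_title, prop.strip(), value.strip())
        for prop, value in ANNOTATION.findall(wikitext)
    ]


def list_pages(triples, prop, value):
    """Automatic list generation: pages whose `prop` has the given `value`."""
    return sorted({s for s, p, v in triples if p == prop and v == value})


# Hypothetical page contents, invented for illustration only.
pages = {
    "Carbon": "Carbon is a [[is a::Chemical element]] with [[atomic number::6]].",
    "Oxygen": "Oxygen is a [[is a::Chemical element|element]] with [[atomic number::8]].",
    "Water": "Water is a [[is a::Compound]].",
}

triples = [t for title, text in pages.items() for t in extract_triples(title, text)]
elements = list_pages(triples, "is a", "Chemical element")
print(elements)
```

The natural-language sentence stays readable for humans while the same text yields machine-usable triples, which is the combination the slide describes.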

8 Example: Semantic MediaWiki with Halo Extensions (SMW+)
Knowledge Authoring Capabilities
– Syntax highlighting when editing a page
– Semantic toolbar in edit mode
    – Displays annotations present on the page that is edited
    – Allows changing annotation values without locating the annotation in the wiki text
– Autocompletion for all instances, properties, categories and templates
– Increased expressivity through n-ary relations (available with the SMW 1.0 release)

9 Example: Semantic MediaWiki with Halo Extensions (SMW+)
Semantic Navigation Capabilities
– GUI-based ontology browser, enables browsing of the wiki's taxonomy and lookup of instance and property information
– Linklist in edit mode, enables quick access to pages that are within the context of the page currently being edited
– Search input field with autocompletion, to prevent typing errors and give a fast overview of relevant content

10 Example: Semantic MediaWiki with Halo Extensions (SMW+)
Knowledge Retrieval Capabilities
– Combined text-based and semantic search
– Basic reasoning in queries with sub-/super-category/-property reasoning and resolution of redirects (equality reasoning)
– GUI-based query formulation interface
Web service integration and import/export support for popular formats
Rule system developed for OWL-DLP and most of OWL-R
Fully open source under GPL, supported by Ontoprise
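
The "basic reasoning in queries" bullet can be pictured with a small sketch: a query for a category also returns members of its subcategories, and redirects are followed so that two names for the same page are treated as equal. This is only an illustration of the idea, not how SMW+ implements it; the function names, the single-parent category table, and the sample data are all assumptions made for the example.

```python
def resolve(name, redirects):
    """Follow redirects to a canonical page name (equality reasoning)."""
    seen = set()
    while name in redirects and name not in seen:
        seen.add(name)  # guard against redirect loops
        name = redirects[name]
    return name


def subcategories(category, parent_of):
    """Return `category` together with all of its transitive subcategories."""
    result = {category}
    frontier = [category]
    while frontier:
        current = frontier.pop()
        for child, parent in parent_of.items():
            if parent == current and child not in result:
                result.add(child)
                frontier.append(child)
    return result


def instances_of(category, membership, parent_of, redirects):
    """Pages in `category` or any subcategory, with redirects resolved."""
    wanted = subcategories(resolve(category, redirects), parent_of)
    return sorted(
        resolve(page, redirects)
        for page, cat in membership
        if resolve(cat, redirects) in wanted
    )


# Hypothetical wiki data, invented for illustration (one parent per category).
parent_of = {
    "Eukaryotic cell": "Cell",
    "Prokaryotic cell": "Cell",
    "Plant cell": "Eukaryotic cell",
}
redirects = {"Eukaryote cell": "Eukaryotic cell"}
membership = [
    ("Yeast cell", "Eukaryote cell"),
    ("E. coli cell", "Prokaryotic cell"),
    ("Leaf mesophyll cell", "Plant cell"),
    ("Water", "Compound"),
]

all_cells = instances_of("Cell", membership, parent_of, redirects)
eukaryotes = instances_of("Eukaryote cell", membership, parent_of, redirects)
print(all_cells)
print(eukaryotes)
```

A real wiki allows several parent categories per category; the single-parent table here only keeps the sketch short.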

11 Cool Idea... But Does it Work?
User tests were performed in Chemistry
– 20 graduate students were each paid for 20 hours (over 1 month) to collaborate on semantic annotation for chemistry
– ~700 Wikipedia base articles
– US high-school AP exams were provided as content guidance
Initial Results (SMW+ 1.0)
– Sparse: 1164 pages (entities), avg 5 assertions per entity
    – 226 relations (1123 relation-statements) and 281 attributes (4721 attribute-statements)
– Many bizarre attributes and relations
– Very difficult to use with a reasoner
User testing and quality results for SMW+ 1.1 extensions
– Initial SUS scoring (6 SMEs, AP science task) went from 43 to 61; final scores in the 70s
– 3 sessions using the Intrinsic Motivation Inventory (interest/value/usefulness); up 14%
– Aided by the consistency bot, users corrected 2072 errors (80% of those found) over 3 months
We have continued to build on this framework
[Figure: Gardening Statistics for Test Wiki]

12 Some Lessons Learned from SMW+ (and Freebase)
User Interface design matters
– This is core to MediaWiki’s success
– Formal usability testing with SMEs matters a lot
– Zero-training matters a lot
Gardening matters
– Users need support for debugging
– Gardeners can do large scale ontology editing
– Supports “Schema Last” data engineering
User-created ontologies are not always well-designed
– Flatter than normal
– Cheaper than normal
Natural language is necessary to augment bare RDF(S) semantics
– Supplemental semantics can be usefully carried in natural language

13 From Strand 2 Web to Strand 1 Semantics
Well-designed semantic wikis make possible certain Strand 2 applications
– They enable local consensus-building on socially-published data
– They allow Strand 2 knowledge publication to go beyond search
Strand 1 semantic data can certainly support Strand 2 applications
– Example: use of other triplestore data in SMW+
How can you use Strand 2-collected data to support Strand 1 applications?
– Corporate uses of socially-curated data (Metaweb)
– Project Halo: Scientific question-answering

14 Talk Outline
The Argument for Semantic Wikis
– Two Strands of the Semantic Web
– Semantic Wikis: Bridging the Gap
– Lessons from the Design of SMW+
Semantic Wiki Experience with Vulcan’s Project Halo
– Question Answering in Science
– Wikis for Question Answering

15 Envisioning the Digital Aristotle for Scientific Knowledge
Inspired by Dickson’s Final Encyclopedia, the HAL-9000, and the broad SF vision of computing
– The “Big AI” Vision of computers that work with people
The volume of scientific knowledge has outpaced our ability to manage it
– This volume is too great for researchers in a given domain to keep abreast of all the developments
– Research results may have cross-domain implications that are not apparent due to terminology and knowledge volume
“Shallow” information retrieval and keyword indexing systems are not well suited to scientific knowledge management because they cannot reason about the subject matter
– Example: “What are the reaction products if metallic copper is heated strongly with concentrated sulfuric acid?” (Answer: Cu2+, SO2(g), and H2O)
Response to a query should supply the answer (possibly coupled with conceptual navigation) rather than simply list 1000s of possibly relevant documents

16 The Halo Project in One Slide
Project Halo: SME-based Authoring for scientific question-answering systems
Project Halo Goal: To determine whether tools can be built to facilitate robust knowledge formulation, query and evaluation by domain experts, with ever-decreasing reliance on knowledge engineers
– Can SMEs build robust question-answering systems that demonstrate excellent coverage of a given syllabus, the ability to answer novel questions, and produce readable domain appropriate justifications using reasonable computational resources?
– Will SMEs be capable of posing questions and complex problems to these systems?
– Do these systems address key failure, scalability and cost issues encountered in expert systems?
Experimental Scope: Selected portions of the AP syllabi for chemistry, biology and physics
– Example: Balance the following reactions, and indicate whether they are examples of combustion, decomposition, or combination
    (a) C4H10 + O2 → CO2 + H2O
    (b) KClO3 → KCl + O2
    (c) CH3CH2OH + O2 → CO2 + H2O
    (d) P4 + O2 → P2O5
    (e) N2O5 + H2O → HNO3
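
To make the example question concrete, here is a minimal sketch that checks candidate coefficients for the five reactions by counting atoms on each side. It is not part of Halo or Aura. The coefficients and the reaction-type labels are worked out for this illustration rather than taken from the slides, and the helper names and the simple formula parser (no parentheses, no charges) are assumptions.

```python
import re
from collections import Counter


def atom_counts(formula):
    """Count atoms in a simple formula such as 'C4H10' (no parentheses)."""
    counts = Counter()
    for element, n in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[element] += int(n) if n else 1
    return counts


def side_counts(side):
    """Sum atom counts over (coefficient, formula) pairs on one side."""
    total = Counter()
    for coeff, formula in side:
        for element, n in atom_counts(formula).items():
            total[element] += coeff * n
    return total


def is_balanced(reactants, products):
    """A reaction is balanced when both sides hold the same atoms."""
    return side_counts(reactants) == side_counts(products)


# Candidate answers: (label, reaction type, reactants, products).
answers = [
    ("a", "combustion", [(2, "C4H10"), (13, "O2")], [(8, "CO2"), (10, "H2O")]),
    ("b", "decomposition", [(2, "KClO3")], [(2, "KCl"), (3, "O2")]),
    ("c", "combustion", [(1, "CH3CH2OH"), (3, "O2")], [(2, "CO2"), (3, "H2O")]),
    ("d", "combination", [(1, "P4"), (5, "O2")], [(2, "P2O5")]),
    ("e", "combination", [(1, "N2O5"), (1, "H2O")], [(2, "HNO3")]),
]

results = {label: is_balanced(lhs, rhs) for label, _, lhs, rhs in answers}
print(results)
```

Checking a proposed answer is the easy half of the task. Producing the coefficients and a readable justification is the part the project asks of a question-answering system.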

17 AURA – Automated User-centered Reasoning and Acquisition System
Aura is a tool to help users formalize AP-level scientific knowledge
Aura can then reason with that knowledge
So users can ask questions and understand the answers

18 2006 Experimental Results for the Aura System
Pilot Group (Halo Pilot System)
– Professional KE KBs
– No natural language
– ~$10K per syllabus page
– Percent correct: Cycorp 37%, SRI 44%, Ontoprise 47%
VS.
SME Group
– Science grad student KBs
– Extensive natural language
– ~$100 per syllabus page
Percentage correct by domain
    Domain   Number of questions   SME1   SME2   Avg     KE
    Bio      146                   52%    24%    38%     51%
    Chem     86                    42%    33%    37.5%   40%
    Phy      131                   16%    22%    19%     21%
Knowledge Formulation
Time for KF
– Concept: ~20 mins for all SMEs
– Equation: ~70 sec (Chem) to ~120 sec (Physics)
– Table: ~10 mins (Chem)
– Reaction: ~3.5 mins (Chem)
– Constraint: 14 sec (Bio); 88 sec (Chem)
SME need for help
– 68 requests over 480 person hours (33%/55%/12%) = 1/day
Question Formulation
Avg time for SME to formulate a question
– 2.5 min (Bio)
– 4 min (Chem)
– 6 min (Physics)
– Avg 6 reformulation attempts
Usability
– SMEs requested no significant help
– Pipelined errors dominated failure analysis
System Responsiveness
– Biology: 90% answer < 10 sec
– Chem: 60% answer < 10 sec
– Physics: 45% answer < 10 sec
    Domain   Interpretation (Median/Max)   Answer (Median/Max)
    Bio      3s / 601s                     1s / 569s
    Chem     7s / 493s                     7s / 485s
    Phy      34s / 429s                    14s / 252s
How Can We Increase the Efficiency of SME Authoring?

19 Symbiosis Between Aura and SMW+
Classical Knowledge Engineering
– Expressive knowledge representation
– Sophisticated testing and debugging
Knowledge Engineering in Aura
– Acquires knowledge for deductive Q/A that can be used for answering AP questions in sciences
    – Uses a DL-style class taxonomy, and logic-programming-style rules with many extensions
– Requires 40 hours of training for knowledge formulation
Semantic Web Knowledge Engineering
– Simple knowledge representation
– Quantity at some expense of quality
Knowledge Engineering in SMW+
– Tool for online authoring and consensus-building around semantic web content
– Captures knowledge at the level of RDFS
– Collective editing for quality control
– Gardening appropriate for scientific knowledge
– Almost a walk-up-and-use system
Can we use the Semantic MediaWiki to capture knowledge that could be used for Q/A in AURA?
– Factual knowledge (e.g., atomic number for carbon is 6, solubility constraints, etc.)
– Taxonomic knowledge (e.g., eukaryotic and prokaryotic are two types of cells)
Knowledge creation would be faster, distributed, and cheaper

20 Example: Wikipedia Article on Organelle

21 Source Text of Article on Organelle in SMW+

22 Fact Box Summarizing the Annotations in SMW+

23 Ontology Browser for Test Biology Data in SMW+

24 Aura/SMW+ Use Case
Semantic Wiki includes relevant knowledge
Aura knowledge formulation engineer (KFE) searches for knowledge during knowledge formulation
The KFE notices useful information in SMW+
The KFE maps the knowledge into Aura
– Currently uses a derivative of Ontomap
– Experimenting with FOAM support
– ETL workflow
The knowledge is translated into Aura and available for querying
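
The mapping step can be pictured as a small extract-transform-load pass: wiki triples are renamed into the target knowledge base's vocabulary through a mapping table, and anything without a mapping is set aside for a human to review. The sketch below is purely illustrative. It does not use Ontomap, FOAM, or any Aura interface, and every name in it (the functions, the property and class mappings, the slot names, the sample triples) is hypothetical.

```python
def map_triples(wiki_triples, property_map, class_map):
    """Translate wiki triples into target-KB vocabulary; keep unmapped ones aside."""
    mapped, unmapped = [], []
    for subject, prop, value in wiki_triples:
        if prop not in property_map:
            unmapped.append((subject, prop, value))  # left for manual review
            continue
        mapped.append((subject, property_map[prop], class_map.get(value, value)))
    return mapped, unmapped


def load(kb, triples):
    """Load mapped triples into a dict-based stand-in for the target KB."""
    for subject, slot, value in triples:
        kb.setdefault(subject, {}).setdefault(slot, []).append(value)
    return kb


# Hypothetical wiki triples and mapping tables, invented for illustration.
wiki_triples = [
    ("Carbon", "atomic number", "6"),
    ("Carbon", "is a", "Chemical element"),
    ("Carbon", "see also", "Graphite"),
]
property_map = {"atomic number": "atomic-number", "is a": "instance-of"}
class_map = {"Chemical element": "Chemical-Element"}

mapped, unmapped = map_triples(wiki_triples, property_map, class_map)
kb = load({}, mapped)
print(kb)
print(unmapped)
```

Keeping the unmapped triples visible, rather than silently dropping them, matches the slide's picture of a person deciding which wiki knowledge is worth carrying over.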

25 Aura User Searches for Information

26 Aura User Notices Useful Information in Wiki

27 Aura User Maps Wiki Knowledge into Aura KB

28 Wiki Knowledge Available in Aura for Question-Answering

29 Conclusions
Two strands of semantic web applications
– Strand 1: Structured, enterprise-quality semantic data
    – Designed for powerful analytics and easier data fusion
– Strand 2: Lightweight web-scale semantic publishing
    – A revolution in AI if we can keep the quality up
Semantic Wikis have features from both strands
– Easy to see how semantic wikis can leverage Strand 1 data for Strand 2 support
– Harder to see how semantic wikis can leverage Strand 2 data for Strand 1 support
Vulcan’s Project Halo
– Use of SMW+ to use web-collected data in a question-answering application
– Addresses very hard AI problems in scaling up knowledge authoring
– Full evaluation of SMW+ and Aura in early 2009
Is mapping easier than authoring?

30 Thank You
Disclaimer: The preceding slides represent the views of the author only. All brands, logos and products are trademarks or registered trademarks of their respective companies.
