26/10/2008 SWESE'08 1 Enhanced Semantic Access to Software Artefacts Danica Damljanović and Kalina Bontcheva.


2 Outline (University of Sheffield NLP, SWESE'08, 26/10/2008)
- Motivation
- The GATE case study
- Semantic-based prototype
  - Data collection
  - Automatic content augmentation
  - Storing implicit annotations
  - Querying using text-based queries
  - Example
- Conclusion and future work

3 Motivation
Large software frameworks are hard to maintain:
- there is never enough documentation;
- specific information is hard to find;
- there is a significant learning curve, both for new developers working on software extensions and for software engineers who integrate relevant parts into their applications.

4 Can semantic technologies help?
[Figure: software documentation scattered across many sources: forum posts, Web sites, papers and source code]

5 The GATE case study
GATE (gate.ac.uk), the General Architecture for Text Engineering, is open source.
- Development team: over 15 people at present, over 30 over the years.
- Documentation about the GATE software is dispersed across the Web, so new and existing developers and users have trouble finding it.
- There is no unified interface: users search via Google, gate.ac.uk, the gmane mailing-list search, etc.

6 The GATE case study: requirements
Automatic generation of reference pages from the ontology:
- provide users with a single point of access to all knowledge, continuously kept up to date;
- automatically generate a web page that is shown either on its own or alongside the ontology tree, with the searched concept selected.
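A reference page of this kind could be assembled from the mentions stored for a concept. The sketch below is hypothetical. The `reference_page` function, the tuple layout of the mention records and the example.org URLs are assumptions made for illustration, and they are not the prototype's code. The function groups the documents that mention a concept by resource type and renders a minimal HTML page.

```python
from collections import defaultdict
from html import escape


def reference_page(concept, mentions):
    """Render a minimal HTML reference page for `concept`.

    `mentions` is a hypothetical list of (concept, resource_type, document_url)
    tuples, e.g. as retrieved from a semantic repository.
    """
    by_type = defaultdict(set)  # resource type -> URLs of documents mentioning the concept
    for label, resource_type, url in mentions:
        if label == concept:
            by_type[resource_type].add(url)

    lines = [f"<h1>{escape(concept)}</h1>"]
    for resource_type in sorted(by_type):
        lines.append(f"<h2>{escape(resource_type)}</h2>")
        lines.append("<ul>")
        for url in sorted(by_type[resource_type]):
            lines.append(f'<li><a href="{escape(url)}">{escape(url)}</a></li>')
        lines.append("</ul>")
    return "\n".join(lines)


mentions = [
    ("FlexibleGazetteer", "Source Code",
     "http://gate.ac.uk/gate/doc/java2html/gate/creole/gazetteer/FlexibleGazetteer.java.html"),
    ("FlexibleGazetteer", "Forum Post", "http://example.org/forum/post-1"),  # hypothetical URL
    ("POS Tagger", "Forum Post", "http://example.org/forum/post-2"),          # hypothetical URL
]
print(reference_page("FlexibleGazetteer", mentions))
```

Grouping by resource type mirrors the kinds of artefacts the case study collects. A real page would also need the ontology tree view described above.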

7 Semantic-based prototype
[Figure: pipeline from software documentation: learn domain ontology, annotate content, store in the semantic repository, query with text-based queries]
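The "annotate content" step can be pictured as a gazetteer lookup of ontology labels in each artefact. This is a minimal sketch under that assumption, because the slides do not say how annotation is implemented. `annotate` and the `ex:` URIs are hypothetical. Each match yields start and end offsets plus the URI of the matched resource.

```python
import re


def annotate(text, lexicon):
    """Gazetteer-style annotation: return sorted (start, end, resource_uri) spans.

    `lexicon` maps an ontology label to a resource URI. Matching is
    case-insensitive and whole-word; longer labels are tried first so that
    "Flexible Gazetteer" wins over the shorter "gazetteer" it contains.
    """
    taken = [False] * len(text)  # characters already covered by a longer match
    spans = []
    for label in sorted(lexicon, key=len, reverse=True):
        pattern = re.compile(r"\b" + re.escape(label) + r"\b", re.IGNORECASE)
        for m in pattern.finditer(text):
            if not any(taken[m.start():m.end()]):
                spans.append((m.start(), m.end(), lexicon[label]))
                taken[m.start():m.end()] = [True] * (m.end() - m.start())
    return sorted(spans)


lexicon = {  # hypothetical labels and URIs
    "gazetteer": "ex:Gazetteer",
    "Flexible Gazetteer": "ex:FlexibleGazetteer",
}
print(annotate("The Flexible Gazetteer wraps a gazetteer.", lexicon))
# [(4, 22, 'ex:FlexibleGazetteer'), (31, 40, 'ex:Gazetteer')]
```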

8 Semantic-based prototype: detailed view
[Figure components: Content Augmentation Service (annotate), annotations, Content Augmentation Index, semantic repository]
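One way to read the Content Augmentation Index is as an inverted index from ontology resources to the places where they are mentioned. The class below is a hypothetical sketch of that reading only. Its name, its methods and the document identifiers are assumptions, and the real index may be organised differently.

```python
from collections import defaultdict


class AugmentationIndex:
    """Hypothetical in-memory index from ontology resource URI to its mentions."""

    def __init__(self):
        self._mentions = defaultdict(list)  # resource URI -> [(doc_url, start, end)]

    def add(self, doc_url, start, end, resource_uri):
        self._mentions[resource_uri].append((doc_url, start, end))

    def mentions_of(self, resource_uri):
        # .get avoids inserting empty entries for unknown URIs
        return sorted(self._mentions.get(resource_uri, []))

    def documents_mentioning(self, resource_uri):
        return sorted({doc for doc, _, _ in self.mentions_of(resource_uri)})


index = AugmentationIndex()
index.add("doc-a", 4, 22, "ex:FlexibleGazetteer")  # hypothetical documents and URIs
index.add("doc-b", 10, 28, "ex:FlexibleGazetteer")
index.add("doc-a", 31, 40, "ex:Gazetteer")
print(index.documents_mentioning("ex:FlexibleGazetteer"))  # ['doc-a', 'doc-b']
```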

9 Data collection
Downloaded around 10,000 software artefacts about GATE: source code, source documentation, the GATE manual, forum posts and publications.

10 Annotate content

11 Export annotations
Merge the document metadata and annotations into the OWL file using an information-extraction ontology, PROTON KM (http://proton.semanticweb.org/2005/04/protonkm).

12 Information-extraction ontology
Document class:
- resourceType property: the type of the document;
- informationResourceIdentifier property: the URL of the annotated document.
Mention class:
- occursIn: the Document in which the mention occurs;
- hasStartOffset and hasEndOffset (new): the position of the annotation;
- refersAnything: preserves the URI of the resource that the mention refers to.
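The two classes above can be modelled as plain records and flattened into triples before export. This is a hedged sketch. The `Document`/`Mention` dataclasses and `to_triples` are hypothetical, and the `protont:`/`protonkm:`/`gate:` prefixes chosen for each property are assumptions. The example values come from the Document and Mention listings on slides 14 and 15, and reading 404 and 409 as the start and end offsets is also an assumption.

```python
from dataclasses import dataclass


@dataclass
class Document:
    uri: str
    resource_type: str  # resourceType, e.g. "Source Code"
    identifier: str     # informationResourceIdentifier: URL of the annotated document


@dataclass
class Mention:
    uri: str
    occurs_in: Document
    start: int          # hasStartOffset
    end: int            # hasEndOffset
    refers_to: str      # refersAnything: URI of the resource mentioned


def to_triples(doc, mentions):
    """Flatten a document and its mentions into (subject, predicate, object) triples.

    The prefixed property names are assumptions; literals and URIs are both plain
    Python values here, whereas a real exporter would keep them distinct.
    """
    triples = [
        (doc.uri, "rdf:type", "protont:Document"),
        (doc.uri, "protont:resourceType", doc.resource_type),
        (doc.uri, "protont:informationResourceIdentifier", doc.identifier),
    ]
    for m in mentions:
        triples += [
            (m.uri, "rdf:type", "protonkm:Mention"),
            (m.uri, "protonkm:occursIn", m.occurs_in.uri),
            (m.uri, "protonkm:hasStartOffset", m.start),
            (m.uri, "protonkm:hasEndOffset", m.end),
            (m.uri, "gate:refersAnything", m.refers_to),
        ]
    return triples


doc = Document("gate:id_ee7ba66b-cd71-4993-9635-777b24f46372", "Source Code",
               "http://gate.ac.uk/gate/doc/java2html/gate/creole/gazetteer/FlexibleGazetteer.java.html")
mention = Mention("gate:mention_0c45b1dc-efab-48a2-8242-bb78c1ddd3b5", doc, 404, 409,
                  "http://gate.ac.uk/ns/gate-ontology#NA")
for triple in to_triples(doc, [mention]):
    print(triple)
```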

13 Export annotations

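14 Document class
```xml
<rdf:Description rdf:about="gate:id_ee7ba66b-cd71-4993-9635-777b24f46372">
  <rdf:type rdf:resource="http://proton.semanticweb.org/2005/04/protont#Document"/>
  <!-- The property tags around the two values below were lost in the transcript;
       the names follow slide 12, and the protont: prefix is an assumption. -->
  <protont:informationResourceIdentifier>http://gate.ac.uk/gate/doc/java2html/gate/creole/gazetteer/FlexibleGazetteer.java.html</protont:informationResourceIdentifier>
  <protont:resourceType>Source Code</protont:resourceType>
</rdf:Description>
```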

15 Mention class
```xml
<rdf:Description rdf:about="gate:mention_0c45b1dc-efab-48a2-8242-bb78c1ddd3b5">
  <rdf:type rdf:resource="http://proton.semanticweb.org/2005/04/protonkm#Mention"/>
  <protonkm:occursIn rdf:resource="gate:id_ee7ba66b-cd71-4993-9635-777b24f46372"/>
  <!-- The offset tags were lost in the transcript; the names follow slide 12,
       and the protonkm: prefix is an assumption. -->
  <protonkm:hasStartOffset>404</protonkm:hasStartOffset>
  <protonkm:hasEndOffset>409</protonkm:hasEndOffset>
  <gate:refersAnything rdf:resource="http://gate.ac.uk/ns/gate-ontology#NA"/>
</rdf:Description>
```
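The Mention listing above can be produced with Python's standard `xml.etree.ElementTree`. This is a sketch, not the authors' exporter. The function name `mention_rdfxml` is hypothetical. Two namespace choices are also assumptions: the expansion of the `gate:` prefix to `http://gate.ac.uk/ns/gate-ontology#`, and placing the offset properties in the PROTON KM namespace.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
PROTONKM = "http://proton.semanticweb.org/2005/04/protonkm#"
GATE = "http://gate.ac.uk/ns/gate-ontology#"  # assumed expansion of the gate: prefix

for prefix, uri in (("rdf", RDF), ("protonkm", PROTONKM), ("gate", GATE)):
    ET.register_namespace(prefix, uri)


def q(ns, local):
    """Clark-notation qualified name, as ElementTree expects: {namespace}local."""
    return f"{{{ns}}}{local}"


def mention_rdfxml(mention_uri, doc_uri, start, end, refers_to):
    """Serialise one Mention as RDF/XML in the shape of the listing above."""
    root = ET.Element(q(RDF, "RDF"))
    desc = ET.SubElement(root, q(RDF, "Description"), {q(RDF, "about"): mention_uri})
    ET.SubElement(desc, q(RDF, "type"), {q(RDF, "resource"): PROTONKM + "Mention"})
    ET.SubElement(desc, q(PROTONKM, "occursIn"), {q(RDF, "resource"): doc_uri})
    # Offset property names follow slide 12; the protonkm namespace is an assumption.
    ET.SubElement(desc, q(PROTONKM, "hasStartOffset")).text = str(start)
    ET.SubElement(desc, q(PROTONKM, "hasEndOffset")).text = str(end)
    ET.SubElement(desc, q(GATE, "refersAnything"), {q(RDF, "resource"): refers_to})
    return ET.tostring(root, encoding="unicode")


print(mention_rdfxml(
    "gate:mention_0c45b1dc-efab-48a2-8242-bb78c1ddd3b5",
    "gate:id_ee7ba66b-cd71-4993-9635-777b24f46372",
    404, 409,
    "http://gate.ac.uk/ns/gate-ontology#NA",
))
```

Building the tree with ElementTree rather than concatenating strings guarantees well-formed output and correct namespace declarations on the root element.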

16 Access knowledge using text-based queries
QuestIO (Question-based Interface to Ontologies) accepts both keyword-based queries and full-blown questions.
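A question-based interface of this kind has to relate the words of a query to ontology resources. The function below is a simplified, hypothetical illustration of that idea. It takes the greedy longest match of query word sequences against ontology labels, with a crude plural strip. It is not QuestIO's actual algorithm, and the `ex:` URIs are made up.

```python
import re


def _tokens(text):
    """Lower-cased word tokens with a crude plural strip ("resources" -> "resource")."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    return [w[:-1] if w.endswith("s") and len(w) > 3 else w for w in words]


def match_concepts(query, labels, max_len=3):
    """Map a text query to ontology resource URIs.

    `labels` maps an ontology label to a URI. At each position the longest
    run of up to `max_len` query tokens that equals a label is taken.
    """
    index = {tuple(_tokens(label)): uri for label, uri in labels.items()}
    tokens = _tokens(query)
    found, i = [], 0
    while i < len(tokens):
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            uri = index.get(tuple(tokens[i:i + n]))
            if uri is not None:
                found.append(uri)
                i += n
                break
        else:  # no label starts here: skip this token
            i += 1
    return found


labels = {  # hypothetical labels and URIs
    "Java Class": "ex:JavaClass",
    "parameter": "ex:Parameter",
    "processing resource": "ex:ProcessingResource",
    "ANNIC": "ex:ANNIC",
}
print(match_concepts("Java Class for parameters for processing resources in ANNIC?", labels))
# ['ex:JavaClass', 'ex:Parameter', 'ex:ProcessingResource', 'ex:ANNIC']
```

Normalising labels and query words with the same function keeps the matching symmetric. The same code therefore serves keyword queries and full questions, because unmatched words such as "for" and "in" are simply skipped.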

17 QuestIO: text-based query to SeRQL
Text-based query: "Java Class for parameters for processing resources in ANNIC?"
Generated SeRQL query. The class URIs inside the rdf:type { } patterns were lost in the transcript. The where clause also ended with a condition on i6, and its value was lost too.
```
select c0, "[inverseProperty]", p1, c2, "[inverseProperty]", p3, c4, "[inverseProperty]", p5, i6
from {c0} rdf:type { },
     {c2} p1 {c0}, {c2} rdf:type { },
     {c4} p3 {c2}, {c4} rdf:type { },
     {i6} p5 {c4}, {i6} rdf:type { }
where p1 = <http://gate.ac.uk/ns/gate-ontology#parameterHasType>
  and p3 = <http://gate.ac.uk/ns/gate-ontology#hasRunTimeParameter>
  and p5 = <http://gate.ac.uk/ns/gate-ontology#containsResource>
```
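The generated query is a chain in which each class is linked to the previous one by an ontology property. The sketch below shows how such a chain could be turned into a query of the same shape, assuming the concepts have already been recognised and ordered. It is a hypothetical illustration, not QuestIO's implementation, and it differs from the slide in several ways:

- The `example.org` class URIs and the `Plugin` class in the last position are invented, because the slide's class URIs were lost.
- Only the three property URIs come from the slide.
- Every node is treated as a class, whereas the slide names the last one i6.
- The `"[inverseProperty]"` markers are omitted.

```python
def linking_property(schema, subject_cls, object_cls):
    """Return a property p such that (subject_cls, p, object_cls) is in the schema."""
    for s, p, o in schema:
        if s == subject_cls and o == object_cls:
            return p
    return None


def chain_query(concepts, schema):
    """Build a SeRQL-style query linking concepts[0] <- concepts[1] <- ... .

    Variables follow the slide's naming: classes c0, c2, c4, ... and properties
    p1, p3, p5, ...; each later concept is the subject of the linking property.
    """
    select, from_, where = ["c0"], [f"{{c0}} rdf:type {{<{concepts[0]}>}}"], []
    for k in range(1, len(concepts)):
        prev, cur, pv = f"c{2 * k - 2}", f"c{2 * k}", f"p{2 * k - 1}"
        prop = linking_property(schema, concepts[k], concepts[k - 1])
        if prop is None:
            raise ValueError(f"no property links {concepts[k]} to {concepts[k - 1]}")
        select += [pv, cur]
        from_ += [f"{{{cur}}} {pv} {{{prev}}}", f"{{{cur}}} rdf:type {{<{concepts[k]}>}}"]
        where.append(f"{pv} = <{prop}>")
    query = "select " + ", ".join(select) + "\nfrom " + ",\n     ".join(from_)
    if where:
        query += "\nwhere " + "\n  and ".join(where)
    return query


G = "http://gate.ac.uk/ns/gate-ontology#"
EX = "http://example.org/onto#"  # hypothetical class URIs
schema = [
    (EX + "Parameter", G + "parameterHasType", EX + "JavaClass"),
    (EX + "ProcessingResource", G + "hasRunTimeParameter", EX + "Parameter"),
    (EX + "Plugin", G + "containsResource", EX + "ProcessingResource"),
]
concepts = [EX + "JavaClass", EX + "Parameter", EX + "ProcessingResource", EX + "Plugin"]
print(chain_query(concepts, schema))
```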

18 An example

19 Demo: http://gate.ac.uk/document-search

20 Evaluation of coverage and correctness
- 36 questions were extracted from the GATE mailing list.
- 22 of the 36 questions were answerable (the answer was in the knowledge base). Of these:
  - 12 were answered correctly (54.5%);
  - 6 received a partially correct answer (27.3%);
  - for 4 (18.2%), the system failed to create a SeRQL query or created a wrong one.
- Total score: 68% answered correctly; 32% not answered at all or not answered correctly.
- In a similar evaluation, AquaLog correctly answered 58%.
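The percentages on this slide can be checked from the raw counts. The per-category figures are shares of the 22 answerable questions. The 68% total matches counting each partially correct answer as half a correct one: (12 + 0.5 × 6) / 22 ≈ 68.2%. The slide does not state this scoring rule, so the `partial_credit` parameter below is an assumption.

```python
def evaluation_summary(correct, partial, failed, partial_credit=0.5):
    """Percentages over the answerable questions, plus an overall score in which
    each partially correct answer earns `partial_credit` (an assumption)."""
    answerable = correct + partial + failed
    pct = lambda n: round(100 * n / answerable, 1)
    score = round(100 * (correct + partial_credit * partial) / answerable)
    return {
        "answerable": answerable,
        "correct": pct(correct),
        "partial": pct(partial),
        "failed": pct(failed),
        "score": score,
    }


print(evaluation_summary(correct=12, partial=6, failed=4))
# {'answerable': 22, 'correct': 54.5, 'partial': 27.3, 'failed': 18.2, 'score': 68}
```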

21 Comparison with AquaLog
6 questions not supported by AquaLog were removed:
- 1 conjunction query: "What are the run parameters of POS Tagger and Sentence splitter?"
- 1 query with brackets: "Does GATE have a coreference resolution component (PR)?"
- 1 query starting with "How many..."
- 3 queries that were not full-blown questions, e.g. "I cannot get Wordnet plugin to work."

22 Future work
- Optimise query execution time: migrate from SeRQL to SPARQL.
- Include simple ontology-driven data in the interface.
- Evaluation to follow: a user-centric evaluation with GATE users.
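The planned SeRQL-to-SPARQL migration can be illustrated on a property chain like the one in the slide 17 query. This is a hedged sketch. The function `chain_to_sparql` and the `example.org` class URIs are hypothetical, and only the property URIs come from the SeRQL query. The SeRQL query binds property variables and compares them in its `where` clause. This sketch instead writes the property URIs directly into the SPARQL triple patterns, using `a` as SPARQL's shorthand for `rdf:type`.

```python
def chain_to_sparql(classes, properties):
    """SPARQL form of a property chain.

    classes[k] is the rdf:type of ?c{k}; properties[k] links ?c{k+1} to ?c{k},
    i.e. the pattern is ?c{k+1} <properties[k]> ?c{k}.
    """
    if len(properties) != len(classes) - 1:
        raise ValueError("need exactly one property between consecutive classes")
    patterns = [f"?c{k} a <{cls}> ." for k, cls in enumerate(classes)]
    patterns += [f"?c{k + 1} <{prop}> ?c{k} ." for k, prop in enumerate(properties)]
    variables = " ".join(f"?c{k}" for k in range(len(classes)))
    return "SELECT " + variables + "\nWHERE {\n  " + "\n  ".join(patterns) + "\n}"


G = "http://gate.ac.uk/ns/gate-ontology#"
EX = "http://example.org/onto#"  # hypothetical class URIs
print(chain_to_sparql(
    [EX + "JavaClass", EX + "Parameter", EX + "ProcessingResource"],
    [G + "parameterHasType", G + "hasRunTimeParameter"],
))
```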

23 Thank you! Questions?

