1
Technical environment for the project
Andrea Giovanni Nuzzolese (1,2)
Jönköping, January
(1) Semantic Technology Laboratory, ISTC-CNR
(2) Dipartimento di Scienze dell’Informazione, Università di Bologna
2
Overview OWL API
3
What is a Search Engine?
- a piece of software that
  - builds an index on text
  - answers queries using that index
- a search engine offers
  - scalability
  - relevance ranking
  - integration of different data sources
4
Indexing process
- convert document
- extract text and metadata
- normalize text
- write inverted index
Example documents
  document 1: “Apache Lucene at Jönköping”
  document 2: “The Jönköping University”
Resulting inverted index
  apache -> 1
  university -> 2
  lucene -> 1
  Jönköping -> 1,2
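The indexing steps above can be sketched in plain Java, without Lucene. This is only an illustration under stated assumptions: the class name `InvertedIndexSketch`, the tiny stop-word list and the tokenization rule are all hypothetical, and a real analyzer does considerably more. Terms are lowercased during normalization, so the index key is `jönköping` rather than `Jönköping`.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

public class InvertedIndexSketch {
    // Hypothetical stop-word list; real analyzers ship their own.
    private static final Set<String> STOP_WORDS = Set.of("the", "at");

    /** Maps each normalized term to the sorted ids of the documents that contain it. */
    public static Map<String, Set<Integer>> build(List<String> documents) {
        Map<String, Set<Integer>> index = new LinkedHashMap<>();
        for (int i = 0; i < documents.size(); i++) {
            int docId = i + 1; // document ids start at 1, as on the slide
            // Normalize: lowercase, then split on anything that is not a letter or digit.
            String[] tokens = documents.get(i)
                    .toLowerCase(Locale.ROOT)
                    .split("[^\\p{L}\\p{N}]+");
            for (String token : tokens) {
                if (token.isEmpty() || STOP_WORDS.contains(token)) {
                    continue; // skip empty fragments and stop words
                }
                index.computeIfAbsent(token, t -> new TreeSet<>()).add(docId);
            }
        }
        return index;
    }

    public static void main(String[] args) {
        Map<String, Set<Integer>> index = build(List.of(
                "Apache Lucene at Jönköping",
                "The Jönköping University"));
        index.forEach((term, ids) -> System.out.println(term + " -> " + ids));
    }
}
```

Answering a query then amounts to looking up each query term in the map and intersecting (AND) or merging (OR) the resulting id sets.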
5
Apache Lucene
- a Java library for indexing and searching
- stores the index as files on disk
- the input for indexing is a set of Documents
- each Document is composed of a set of Fields
- no document converter
- no web crawler
6
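Index creation
    import java.io.File;
    import org.apache.lucene.analysis.Analyzer;
    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.document.Document;
    import org.apache.lucene.document.Field;
    import org.apache.lucene.index.IndexWriter;
    import org.apache.lucene.store.Directory;
    import org.apache.lucene.store.SimpleFSDirectory;
    import org.apache.lucene.util.Version;

    // the analyzer tokenizes and normalizes the text of each field
    Analyzer analyzer = new StandardAnalyzer(Version.LUCENE_30);
    // the index is stored as files under /tmp/index
    Directory directory = new SimpleFSDirectory(new File("/tmp/index"));
    // 'true' creates a new index, overwriting any existing one
    IndexWriter indexWriter = new IndexWriter(directory, analyzer, true,
            IndexWriter.MaxFieldLength.UNLIMITED);

    // a document is a set of fields; both fields are stored and analyzed
    Document document = new Document();
    document.add(new Field("title", "This is my title",
            Field.Store.YES, Field.Index.ANALYZED));
    document.add(new Field("body", "This is the body of my document about Lucene",
            Field.Store.YES, Field.Index.ANALYZED));

    indexWriter.addDocument(document);
    indexWriter.optimize();
    indexWriter.close();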
7
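Text search
    import java.io.File;
    import org.apache.lucene.analysis.Analyzer;
    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.queryParser.QueryParser;
    import org.apache.lucene.search.IndexSearcher;
    import org.apache.lucene.search.Query;
    import org.apache.lucene.search.ScoreDoc;
    import org.apache.lucene.search.TopDocs;
    import org.apache.lucene.store.Directory;
    import org.apache.lucene.store.SimpleFSDirectory;
    import org.apache.lucene.util.Version;

    // use the same analyzer and index location as at indexing time
    Analyzer analyzer = new StandardAnalyzer(Version.LUCENE_30);
    Directory directory = new SimpleFSDirectory(new File("/tmp/index"));
    IndexSearcher indexSearcher = new IndexSearcher(directory);

    // parse the user query; "body" is the default field
    QueryParser qp = new QueryParser(Version.LUCENE_30, "body", analyzer);
    String userSearch = "document AND lucene";
    Query query = qp.parse(userSearch);

    // retrieve the 10 best hits and print document id and score
    TopDocs topDocs = indexSearcher.search(query, 10);
    ScoreDoc[] scoreDocs = topDocs.scoreDocs;
    for (ScoreDoc scoreDoc : scoreDocs) {
        System.out.println(scoreDoc.doc + " - " + scoreDoc.score);
    }
    indexSearcher.close();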
8
References
- Lucene home page
- Lucene Wiki
- getting started
- source code (SVN)
- javadoc
9
What is Solr?
- a Web application for
  - text indexing
  - text search
- a full text search server based on Lucene
  - Lucene is a Java library
  - Solr is a Web wrapper around Lucene
- a REST layer for Apache Lucene
  - REST means Representational State Transfer
  - requests and responses are built around the transfer of representations of resources
10
Why a Web wrapper?
- Lucene’s full text search offers
  - no backup/restore mechanism
  - no remote query interface
  - no administration user interface for troubleshooting
- a lot of Lucene consultants write infrastructure code
  - looks like reinventing the wheel!
- RESTful is beautiful
11
Solr in a Nutshell
with Solr you can
- put documents in the local index via XML/CSV over HTTP
- query the index via HTTP GET and receive XML/JSON results
- administer the server via an HTML GUI
- use replication
12
Solr architecture
13
XML commands
- add/update documents using <add>
- delete documents by id or query using <delete>
- commit your changes using <commit/>
- optimize the Lucene index using <optimize/>
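As a rough illustration of what these commands look like on the wire, the sketch below assembles an `<add>` message as a string. The class `SolrAddMessageSketch`, its methods and the field names (`id`, `type`, `track`) are assumptions made for this example; the actual field names depend on the Solr schema in use, and the message would normally be sent to Solr's update handler in an HTTP POST.

```java
public class SolrAddMessageSketch {
    /** Escapes the characters that are special in XML text and attribute values. */
    static String escapeXml(String text) {
        return text.replace("&", "&amp;")
                   .replace("<", "&lt;")
                   .replace(">", "&gt;")
                   .replace("\"", "&quot;");
    }

    /** Builds an <add> message holding one <doc>; arguments are name/value pairs. */
    public static String addMessage(String... nameValuePairs) {
        if (nameValuePairs.length % 2 != 0) {
            throw new IllegalArgumentException("expected name/value pairs");
        }
        StringBuilder xml = new StringBuilder("<add><doc>");
        for (int i = 0; i < nameValuePairs.length; i += 2) {
            xml.append("<field name=\"").append(escapeXml(nameValuePairs[i])).append("\">")
               .append(escapeXml(nameValuePairs[i + 1]))
               .append("</field>");
        }
        return xml.append("</doc></add>").toString();
    }

    public static void main(String[] args) {
        // Hypothetical document: field names are illustrative only.
        System.out.println(addMessage("id", "track-1", "type", "track", "track", "Rock & Roll"));
        // Delete by id, then make the changes visible to searchers.
        System.out.println("<delete><id>track-1</id></delete>");
        System.out.println("<commit/>");
    }
}
```

Escaping the values matters in practice: an unescaped `&` or `<` in a title would make the whole message malformed XML.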
14
Let’s index iTunes
15
iTunes XML index
16
Add command
- schema defined for indexing documents with Solr
- add tracks to the index via XML
17
iTunes Sample: Solr queries
- search for all tracks sounding like "mozilla"
    type:track track:mozilla~
- get all albums from 'Nick Cave' sorted by release year
    type:album artist:(nick cave); year asc
- get all tracks containing/starting with 'rose' from 'Nick Cave' sorted by album name
    type:track track:(rose*) artist:(nick cave); album asc
- get all tracks from Nick Cave's album 'The Best Of' sorted by track number
    type:track artist:(nick cave) album:(the best of); tracknr asc
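Since Solr is queried via HTTP GET, each of the queries above has to be URL-encoded before it is sent. The following is a minimal sketch, assuming a Solr instance reachable at `http://localhost:8983/solr/select` (an assumed address, not one given in the slides) and the standard `q` request parameter; the class and method names are hypothetical.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

public class SolrQueryUrlSketch {
    // Assumed endpoint of a local Solr instance; adjust host, port and path as needed.
    static final String SELECT_URL = "http://localhost:8983/solr/select";

    /** Returns the GET URL for the given query string, with the query URL-encoded. */
    public static String selectUrl(String query) {
        return SELECT_URL + "?q=" + URLEncoder.encode(query, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Fuzzy search for tracks sounding like "mozilla".
        System.out.println(selectUrl("type:track track:mozilla~"));
        // Albums by Nick Cave, sorted by release year.
        System.out.println(selectUrl("type:album artist:(nick cave); year asc"));
    }
}
```

The resulting URL can be opened in a browser or fetched with any HTTP client; the response is the XML or JSON result list.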
18
References
- home page
- wiki
- tutorial
- javadoc
- source code (SVN)
19
What is Nutch?
- open source web-search software
- an Apache project
- builds on Lucene and Solr
- provides, on top of the indexing capability provided by Lucene,
  - crawling services
  - parsing services
  - a link-graph database
  - content management services
- allows you to stand up a web-scale infrastructure
20
Why Nutch?
value-added capabilities
- improved fetching speed
- parsing and handling of the hundreds of different content types available on the Internet
- handling of different protocols for obtaining content
- better ranking algorithms (OPIC, PageRank)
In Nutch, these capabilities more or less all map to extension points available via Nutch’s plugin framework.
21
Nutch’s Architecture
- Nutch Core facilities
  - parsing
  - indexing
  - crawling
  - content acquisition
  - querying
  - plugin framework
- Nutch’s extension points
  - scoring, parsing, indexing, querying, URL filtering
22
Nutch’s Architecture
23
Getting Started
- download Nutch (current version is 1.2)
- indexing
  - add start URLs to a text file
  - configure conf/crawl-urlfilter.txt
  - configure conf/nutch-site.xml
  - command line call
      bin/nutch crawl urls -dir crawl -depth 3 -topN 50
24
Nutch Search
25
Nutch Search
26
References
- home page: http://nutch.apache.org/
- wiki
- tutorial
- code (SVN)
27
What is Jena?
- open-source Java implementation of core Semantic Web
  - RDF graph manipulation API
  - RDFS and OWL reasoning API
- includes the de facto reference RDF/XML parser
  - RDF/XML, N3 and N-Triples input/output
- SPARQL query engine
- rule-based inference engine
- in-memory or persistent storage
  - Jena TDB
  - Jena SDB
- developed by HP Laboratories
- now an Apache Incubator project
28
Key concepts
- the Jena API builds on RDF
- the core API classes are
  - Model: the representation of an RDF model, composed of a set of statements
  - Statement: a subject-predicate-object triple
    - the subject is a resource
    - the predicate is a property
    - the object is a value (literal) or a resource
  - Resource: an object identified by a URI
  - Property: a particular resource that describes relations between resources
29
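RDF API
    import com.hp.hpl.jena.rdf.model.Model;
    import com.hp.hpl.jena.rdf.model.ModelFactory;
    import com.hp.hpl.jena.rdf.model.Resource;
    import com.hp.hpl.jena.sparql.vocabulary.FOAF;

    // basic definitions
    String uri = "…"; // the URI is missing from the source
    String fullName = "Tim Berners-Lee";

    // create an empty (default) model via ModelFactory
    Model model = ModelFactory.createDefaultModel();

    // create the resource that identifies Tim Berners-Lee as a person
    Resource timBernersLee = model.createResource(uri, FOAF.Person);

    // add the name property to the resource identifying TBL
    timBernersLee.addProperty(FOAF.name, fullName);

Resulting graph (figure): the resource has rdf:type foaf:Person and foaf:name "Tim Berners-Lee".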
30
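SPARQL API
    import com.hp.hpl.jena.query.Query;
    import com.hp.hpl.jena.query.QueryExecution;
    import com.hp.hpl.jena.query.QueryExecutionFactory;
    import com.hp.hpl.jena.query.QueryFactory;
    import com.hp.hpl.jena.query.QuerySolution;
    import com.hp.hpl.jena.query.ResultSet;
    import com.hp.hpl.jena.rdf.model.Literal;
    import com.hp.hpl.jena.rdf.model.Model;
    import com.hp.hpl.jena.rdf.model.Resource;

    Model model = …;

    // define the SPARQL query string
    // (the property IRI between < and > is missing from the source)
    String queryString = "SELECT ?person ?name WHERE {" +
                         "?person <…> ?name" +
                         "}";

    // create the query for Jena
    Query query = QueryFactory.create(queryString);
    QueryExecution qExec = QueryExecutionFactory.create(query, model);

    // execute the query
    ResultSet rs = qExec.execSelect();
    while (rs.hasNext()) {
        QuerySolution qSolution = rs.next();
        // get the person and the name
        Resource person = qSolution.getResource("person");
        Literal name = qSolution.getLiteral("name");
        System.out.println(person.getURI() + " has name " + name.getString());
    }
    // release the resources held by the query execution
    qExec.close();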
31
Storage and named graphs
// basic definitions String bernersLeeName = " String stebeJobsName = " // create models named and respectively Model bernersLeeModel = TDBFactory.createNamedModel(bernersLeeName, "tmp/storage"); Model steveJobsModel = TDBFactory.createNamedModel(stebeJobsName, "tmp/storage"); // get the reference to the union graph of the graphs into the triple store Model unionModel = TDBFactory.createNamedModel("urn:x-arq:UnionGraph", "tmp/storage");
32
What is the OWL API?
- a Java framework for OWL 2
- a reference implementation for creating, manipulating and serializing OWL ontologies
- open source, available under the LGPL license
- provides an implementation of SWRL
- open to reasoners
33
Key concepts
- the OWL API builds completely on OWL 2
- the root object is the OWLOntologyManager, which manages a set of OWL ontologies in memory
- an OWLOntology is a set of OWL axioms
- an OWLAxiom is a proposition that is not proved but considered to be self-evident
34
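Adding OWL axioms (OWL API)
    import org.semanticweb.owlapi.apibinding.OWLManager;
    import org.semanticweb.owlapi.model.AddAxiom;
    import org.semanticweb.owlapi.model.IRI;
    import org.semanticweb.owlapi.model.OWLAxiom;
    import org.semanticweb.owlapi.model.OWLClass;
    import org.semanticweb.owlapi.model.OWLDataFactory;
    import org.semanticweb.owlapi.model.OWLOntology;
    import org.semanticweb.owlapi.model.OWLOntologyManager;

    // create the IRI for the ontology (the IRI string is missing from the source)
    String iriString = "…";
    IRI ontologyIRI = IRI.create(iriString);

    // create the ontology manager
    OWLOntologyManager manager = OWLManager.createOWLOntologyManager();

    // create the ontology identified by ontologyIRI
    // (createOntology throws the checked OWLOntologyCreationException)
    OWLOntology ontology = manager.createOntology(ontologyIRI);

    // get the factory used to create classes and axioms
    OWLDataFactory factory = manager.getOWLDataFactory();
    OWLClass clsA = factory.getOWLClass(IRI.create(ontologyIRI + "#A"));
    OWLClass clsB = factory.getOWLClass(IRI.create(ontologyIRI + "#B"));

    // create the axiom: A is a subclass of B
    OWLAxiom axiom = factory.getOWLSubClassOfAxiom(clsA, clsB);

    // wrap the axiom in a change object for the ontology
    AddAxiom addAxiom = new AddAxiom(ontology, axiom);

    // apply the change using the manager
    manager.applyChange(addAxiom);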
35
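Reasoning (OWL API)
    import org.semanticweb.HermiT.Reasoner.ReasonerFactory;
    import org.semanticweb.owlapi.model.OWLClass;
    import org.semanticweb.owlapi.reasoner.Node;
    import org.semanticweb.owlapi.reasoner.NodeSet;
    import org.semanticweb.owlapi.reasoner.OWLReasoner;
    import org.semanticweb.owlapi.reasoner.OWLReasonerConfiguration;
    import org.semanticweb.owlapi.reasoner.SimpleConfiguration;

    // 'ontology' and 'clsA' are the objects created on the previous slide

    // get the simple configuration for the reasoner
    OWLReasonerConfiguration config = new SimpleConfiguration();

    // we need a concrete reasoner:
    // here there is a dependency on HermiT
    ReasonerFactory risfactory = new ReasonerFactory();

    // create the reasoner
    OWLReasoner reasoner = risfactory.createReasoner(ontology, config);

    // ask the reasoner to do all the necessary work now
    reasoner.precomputeInferences();

    // ask if the ontology is consistent
    boolean consistent = reasoner.isConsistent();
    if (consistent) {
        // 'true' restricts the result to the direct subclasses of A
        NodeSet<OWLClass> subClses = reasoner.getSubClasses(clsA, true);
    } else {
        Node<OWLClass> bottomNode = reasoner.getUnsatisfiableClasses();
    }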
36
References
- OWL API: http://owlapi.sourceforge.net/index.html
- Matthew Horridge, Sean Bechhofer. The OWL API: A Java API for Working with OWL 2 Ontologies. OWLED 2009, 6th OWL Experiences and Directions Workshop, Chantilly, Virginia, October 2009.
37
What is Stanbol?
- a modular software stack
- a reusable set of components for semantic content management
- it offers:
  - semantic lifting (FISE)
  - knowledge representation and reasoning (KReS)
  - reference infrastructure and entity hub (RICK)
- an Apache Incubator project
- comes from the IKS project
38
The IKS stack
39
Semantic lifting (FISE)
- FISE is the Furtwangen IKS Semantic Engine
- created during the IKS Hackathon at Furtwangen (Germany) in March 2010
- it implements a simple OSGi-based RESTful engine that can enhance textual content using pluggable enhancement engines
40
KReS
41
Ontology Network
42
RICK
43
References
- the IKS wiki
- the Apache Stanbol home
- the source code