Technical environment for the project


1 Technical environment for the project
Andrea Giovanni Nuzzolese (1,2). Jönköping, January.
(1) Semantic Technology Laboratory, ISTC-CNR
(2) Dipartimento di Scienze dell'Informazione, Università di Bologna

2 Overview
Apache Lucene, Apache Solr, Apache Nutch, Jena, the OWL API, Apache Stanbol

3 What is a Search Engine?
a piece of software that
builds an index on text
answers queries using that index
a search engine offers
scalability
relevance ranking
integration of different data sources

4 Indexing process
convert document
extract text and metadata
normalize text
write inverted index
document 1: "Apache Lucene at Jönköping"
document 2: "The Jönköping University"
apache -> 1
university -> 2
lucene -> 1
jönköping -> 1,2
(a minimal Java sketch of this inverted-index structure follows)
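The inverted index above maps each normalized term to the set of identifiers of the documents containing it. As a minimal, hedged sketch (not from the slides: plain Java collections rather than Lucene, with hypothetical class and method names):

import java.util.*;

public class InvertedIndex {
    // term -> identifiers of the documents that contain it
    private final Map<String, Set<Integer>> index = new HashMap<String, Set<Integer>>();

    // normalize the text of one document and add its terms to the index
    public void add(int docId, String text) {
        for (String token : text.toLowerCase().split("[^\\p{L}\\p{N}]+")) {
            if (token.isEmpty()) continue;
            Set<Integer> postings = index.get(token);
            if (postings == null) {
                postings = new TreeSet<Integer>();
                index.put(token, postings);
            }
            postings.add(docId);
        }
    }

    // answer a single-term query using the index
    public Set<Integer> search(String term) {
        Set<Integer> postings = index.get(term.toLowerCase());
        return postings == null ? Collections.<Integer>emptySet() : postings;
    }

    public static void main(String[] args) {
        InvertedIndex idx = new InvertedIndex();
        idx.add(1, "Apache Lucene at Jönköping");
        idx.add(2, "The Jönköping University");
        System.out.println(idx.search("jönköping")); // prints [1, 2]
    }
}

Lucene builds (and persists) exactly this kind of structure, plus the statistics needed for relevance ranking.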

5 Apache Lucene
a Java library for indexing and searching
stores the index as files on disk
the input for indexing is a set of Documents
each document is composed of a set of Fields
no document converter
no web crawler

6 Index creation
// create an analyzer and an on-disk index directory
Analyzer analyzer = new StandardAnalyzer(Version.LUCENE_30);
Directory directory = new SimpleFSDirectory(new File("/tmp/index"));
IndexWriter indexWriter = new IndexWriter(directory, analyzer, true, MaxFieldLength.UNLIMITED);

// a document is a set of fields
Document document = new Document();
document.add(new Field("title", "This is my title", Field.Store.YES, Field.Index.ANALYZED));
document.add(new Field("body", "This is the body of my document about Lucene", Field.Store.YES, Field.Index.ANALYZED));

// index the document, merge the index segments and close the writer
indexWriter.addDocument(document);
indexWriter.optimize();
indexWriter.close();

7 Text search
// open the index created in the previous slide
Analyzer analyzer = new StandardAnalyzer(Version.LUCENE_30);
Directory directory = new SimpleFSDirectory(new File("/tmp/index"));
IndexSearcher indexSearcher = new IndexSearcher(directory);

// parse the user query against the "body" field
QueryParser qp = new QueryParser(Version.LUCENE_30, "body", analyzer);
String userSearch = "document AND lucene";
Query query = qp.parse(userSearch);

// retrieve the top 10 hits with their scores
TopDocs topDocs = indexSearcher.search(query, 10);
ScoreDoc[] scoreDocs = topDocs.scoreDocs;
for (ScoreDoc scoreDoc : scoreDocs) {
    System.out.println(scoreDoc.doc + " - " + scoreDoc.score);
}
indexSearcher.close();

8 References
Lucene home page
Lucene Wiki
getting started
source code (SVN)
javadoc

9 What is Solr?
a Web application for text indexing and text search
a full-text search server based on Lucene
Lucene is a Java library
Solr is a Web wrapper around Lucene
a REST layer for Apache Lucene
REST means Representational State Transfer
requests and responses are built around the transfer of representations of resources

10 Why a Web wrapper?
Lucene's full-text search offers
no backup/restore mechanism
no remote query interface
no administration user interface for troubleshooting
a lot of Lucene consultants write infrastructure code: it looks like reinventing the wheel!
RESTful is beautiful

11 Solr in a Nutshell
with Solr you can
put documents in the local index via XML/CSV over HTTP
query the index via HTTP GET and receive XML/JSON results
administer it via an HTML GUI
it also supports replication

12 Solr architecture

13 XML commands
add/update documents using <add>
delete documents by id or query using <delete>
commit your changes using <commit/>
optimize the Lucene index using <optimize/>
(a hedged example of posting these commands over HTTP follows)
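As an illustration of slides 11 and 13, the sketch below posts an <add> command followed by a <commit/> to a Solr update handler over plain HTTP. The host, port, the /solr/update path and the field names (title, body) are assumptions for a default local Solr installation, not taken from the slides:

import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;

public class SolrXmlUpdate {

    // POST an XML command to the (assumed) local Solr update handler
    static void post(String xml) throws Exception {
        URL url = new URL("http://localhost:8983/solr/update"); // assumed default endpoint
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        conn.setDoOutput(true);
        conn.setRequestMethod("POST");
        conn.setRequestProperty("Content-Type", "text/xml; charset=UTF-8");
        OutputStream out = conn.getOutputStream();
        out.write(xml.getBytes("UTF-8"));
        out.close();
        System.out.println("Solr answered: HTTP " + conn.getResponseCode());
        conn.disconnect();
    }

    public static void main(String[] args) throws Exception {
        // hypothetical document; the field names must match the schema defined for the index
        post("<add><doc>"
           + "<field name=\"title\">This is my title</field>"
           + "<field name=\"body\">This is the body of my document about Lucene</field>"
           + "</doc></add>");
        post("<commit/>"); // make the new document visible to searches
    }
}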

14 Let’s index iTunes

15 iTunes XML index

16 Add command
a schema is defined for indexing documents with Solr
add tracks to the index via XML

17 iTunes Sample: Solr queries
search for all tracks sounding like "mozilla"
type:track track:mozilla~
get all albums from 'Nick Cave' sorted by release year
type:album artist:(nick cave); year asc
get all tracks containing/starting with 'rose' from 'Nick Cave' sorted by album name
type:track track:(rose*) artist:(nick cave); album asc
get all tracks from Nick Cave's album 'The Best Of' sorted by track nr
type:track artist:(nick cave) album:(the best of); tracknr asc
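These query strings go into the q parameter of a Solr search request. As a hedged sketch of issuing the second query over HTTP GET (the host, port and /solr/select path are assumptions for a default local installation; the trailing "; year asc" sort syntax is the old Solr 1.x style shown on the slide):

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.URL;
import java.net.URLEncoder;

public class SolrQueryGet {
    public static void main(String[] args) throws Exception {
        // query taken from the slide: all albums from 'Nick Cave' sorted by release year
        String q = URLEncoder.encode("type:album artist:(nick cave); year asc", "UTF-8");
        URL url = new URL("http://localhost:8983/solr/select?q=" + q + "&rows=10"); // assumed endpoint
        BufferedReader in = new BufferedReader(new InputStreamReader(url.openStream(), "UTF-8"));
        String line;
        while ((line = in.readLine()) != null) {
            System.out.println(line); // Solr answers with an XML result list by default
        }
        in.close();
    }
}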

18 References
home page
wiki
tutorial
javadoc
source code (SVN)

19 What is Nutch?
open-source web-search software
an Apache project
builds on Lucene and Solr
provides
crawling services
parsing services
a link-graph database
content-management services on top of the indexing capability provided by Lucene
allows you to stand up a web-scale infrastructure

20 Why Nutch?
value-added capabilities:
improving fetching speed
parsing and handling of the hundreds of different content types available on the internet
handling different protocols for obtaining content
better ranking algorithms (OPIC, PageRank)
more or less, in Nutch, these capabilities all map to extension points available via Nutch's plugin framework

21 Nutch's Architecture
Nutch core facilities: parsing, indexing, crawling, content acquisition, querying, plugin framework
Nutch's extension points: scoring, parsing, indexing, querying, URL filtering

22 Nutch’s Architecture

23 Getting Started
download Nutch (current version is 1.2)
indexing:
add start URLs to a text file
configure conf/crawl-urlfilter.txt
configure conf/nutch-site.xml
command line call: bin/nutch crawl urls -dir crawl -depth 3 -topN 50
(a hedged configuration sketch follows)
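As a hedged sketch of the three configuration steps above (the file contents follow the standard Nutch 1.x tutorial rather than the slides; MY.DOMAIN.NAME, the seed URL file name and the agent name are placeholders):

# urls/seed.txt -- one start URL per line
http://nutch.apache.org/

# conf/crawl-urlfilter.txt -- keep only URLs inside the domain to crawl
+^http://([a-z0-9]*\.)*MY.DOMAIN.NAME/

<!-- conf/nutch-site.xml -- the crawler must identify itself -->
<configuration>
  <property>
    <name>http.agent.name</name>
    <value>my-test-crawler</value>
  </property>
</configuration>

With these in place, the bin/nutch crawl command above fetches up to 50 pages per round for 3 rounds and writes the crawl data under the crawl directory.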

24 Nutch Search

25 Nutch Search

26 References
home page: http://nutch.apache.org/
wiki
tutorial
source code (SVN)

27 What is Jena?
an open-source Java implementation of the core Semantic Web standards
RDF graph manipulation API
RDFS and OWL reasoning API
includes the de facto reference RDF/XML parser
RDF/XML, N3 and N-Triples input/output
SPARQL query engine
rule-based inference engine
in-memory or persistent storage (Jena TDB, Jena SDB)
developed by HP Laboratories, now an Apache Incubator project
(a small serialization sketch follows)
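The input/output support listed above is not exercised in the following slides, so here is a minimal, hedged sketch of reading and writing a model (the file names are placeholders; "N-TRIPLE" is the Jena name for the N-Triples serialization):

import java.io.FileInputStream;
import java.io.FileOutputStream;

import com.hp.hpl.jena.rdf.model.Model;
import com.hp.hpl.jena.rdf.model.ModelFactory;

public class JenaIO {
    public static void main(String[] args) throws Exception {
        Model model = ModelFactory.createDefaultModel();

        // read an RDF/XML file into the model (placeholder file name)
        model.read(new FileInputStream("data.rdf"), null);

        // write the same model back out as N-Triples
        model.write(new FileOutputStream("data.nt"), "N-TRIPLE");
    }
}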

28 Key concepts
the Jena API builds on RDF
core API classes are:
Model: the representation of an RDF model, composed of a set of statements
Statement: a subject-predicate-object triple
the subject is a resource
the predicate is a property
the object is a value (literal) or a resource
Resource: an object identified by a URI
Property: a particular resource that describes relations between resources

29 RDF API
// basic definitions
// placeholder URI (the original URL was stripped in the transcript)
String uri = "http://example.org/people#timBernersLee";
String fullName = "Tim Berners-Lee";

// create an empty (default) model via ModelFactory
Model model = ModelFactory.createDefaultModel();

// create the resource that identifies Tim Berners-Lee as a person
Resource timBernersLee = model.createResource(uri, FOAF.Person);

// add the name property to the resource identifying TBL
timBernersLee.addProperty(FOAF.name, fullName);

// resulting graph: <uri> rdf:type foaf:Person ; foaf:name "Tim Berners-Lee"

30 SPARQL API
Model model = ...;

// define the SPARQL query string
// the predicate URL was stripped in the transcript; foaf:name is assumed from the previous slide
String queryString =
    "SELECT ?person ?name WHERE {" +
    "  ?person <http://xmlns.com/foaf/0.1/name> ?name" +
    "}";

// create the query for Jena
Query query = QueryFactory.create(queryString);
QueryExecution qExec = QueryExecutionFactory.create(query, model);

// execute the query
ResultSet rs = qExec.execSelect();
while (rs.hasNext()) {
    QuerySolution qSolution = rs.next();
    // get the person and the name
    Resource person = qSolution.getResource("person");
    Literal name = qSolution.getLiteral("name");
    System.out.println(person.getURI() + " has name " + name.getString());
}

31 Storage and named graphs
// basic definitions
// placeholder graph names (the original URLs were stripped in the transcript)
String bernersLeeName = "http://example.org/graphs/bernersLee";
String steveJobsName = "http://example.org/graphs/steveJobs";

// create two named models in the TDB store under tmp/storage
Model bernersLeeModel = TDBFactory.createNamedModel(bernersLeeName, "tmp/storage");
Model steveJobsModel = TDBFactory.createNamedModel(steveJobsName, "tmp/storage");

// get the reference to the union graph of all the graphs in the triple store
Model unionModel = TDBFactory.createNamedModel("urn:x-arq:UnionGraph", "tmp/storage");

32 What is the OWL API?
a Java framework for OWL 2
a reference implementation for creating, manipulating and serializing OWL ontologies
open source, available under the LGPL license
provides an implementation of SWRL
is open to reasoners

33 Key concepts
the OWL API builds completely on OWL 2
the root object is the OWLOntologyManager, which manages a set of OWL ontologies in memory
an OWLOntology is a set of OWL axioms
an OWLAxiom is a proposition that is not proved but considered to be self-evident

34 Adding OWL axioms (OWL API)
// create the IRI for the ontology
// placeholder IRI (the original URL was stripped in the transcript)
String iriString = "http://example.org/ontologies/myOntology";
IRI ontologyIRI = IRI.create(iriString);

// create the ontology manager
OWLOntologyManager manager = OWLManager.createOWLOntologyManager();

// create the ontology identified by the ontologyIRI
OWLOntology ontology = manager.createOntology(ontologyIRI);

// get the factory that allows to specify axioms
OWLDataFactory factory = manager.getOWLDataFactory();
OWLClass clsA = factory.getOWLClass(IRI.create(ontologyIRI + "#A"));
OWLClass clsB = factory.getOWLClass(IRI.create(ontologyIRI + "#B"));

// create the axiom A subClassOf B
OWLAxiom axiom = factory.getOWLSubClassOfAxiom(clsA, clsB);

// add the axiom to the ontology and apply the change using the manager
AddAxiom addAxiom = new AddAxiom(ontology, axiom);
manager.applyChange(addAxiom);

35 Reasoning (OWL API)
// get the simple configuration for the reasoner
OWLReasonerConfiguration config = new SimpleConfiguration();

// we need a concrete reasoner: here there is a dependency on HermiT
ReasonerFactory risfactory = new ReasonerFactory();

// create the reasoner
OWLReasoner reasoner = risfactory.createReasoner(ontology, config);

// ask the reasoner to do all the necessary work now
reasoner.precomputeInferences();

// ask if the ontology is consistent
boolean consistent = reasoner.isConsistent();
if (consistent) {
    NodeSet<OWLClass> subClses = reasoner.getSubClasses(clsA, true);
} else {
    Node<OWLClass> bottomNode = reasoner.getUnsatisfiableClasses();
}
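A short follow-up not shown on the slide: the NodeSet returned by getSubClasses can be flattened into a plain set of classes. This fragment assumes the reasoner and clsA variables from the slides above:

// print the direct subclasses of clsA inferred by the reasoner
for (OWLClass sub : reasoner.getSubClasses(clsA, true).getFlattened()) {
    System.out.println(sub.getIRI());
}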

36 References
http://owlapi.sourceforge.net/index.html
Matthew Horridge, Sean Bechhofer. The OWL API: A Java API for Working with OWL 2 Ontologies. OWLED 2009, 6th OWL Experiences and Directions Workshop, Chantilly, Virginia, October 2009.

37 What is Stanbol?
a modular software stack
a reusable set of components for semantic content management
it offers:
semantic lifting (FISE)
knowledge representation and reasoning (KReS)
reference infrastructure and entity hub (RICK)
an Apache Incubator project
comes from the IKS project

38 The IKS stack

39 Semantic lifting (FISE)
FISE is the Furtwangen IKS Semantic Engine
created during the IKS Hackathon in Furtwangen (Germany) in March 2010
it implements a simple OSGi-based RESTful engine that can enhance textual content, using pluggable enhancement engines
(a hedged example of calling such an engine follows)
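As a hedged sketch of what "enhancing textual content over REST" looks like: the snippet posts plain text to an enhancement endpoint and prints the RDF enhancements that come back. The host, port and the /engines path are assumptions about a locally running FISE/Stanbol instance, not taken from the slides:

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;

public class EnhanceText {
    public static void main(String[] args) throws Exception {
        // assumed endpoint of a locally running enhancement engine
        URL url = new URL("http://localhost:8080/engines");
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        conn.setDoOutput(true);
        conn.setRequestMethod("POST");
        conn.setRequestProperty("Content-Type", "text/plain; charset=UTF-8");
        conn.setRequestProperty("Accept", "application/rdf+xml");

        // the text to be semantically lifted
        OutputStream out = conn.getOutputStream();
        out.write("Tim Berners-Lee works at the W3C.".getBytes("UTF-8"));
        out.close();

        // the engine answers with RDF describing the detected entities
        BufferedReader in = new BufferedReader(new InputStreamReader(conn.getInputStream(), "UTF-8"));
        String line;
        while ((line = in.readLine()) != null) {
            System.out.println(line);
        }
        in.close();
    }
}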

40 KReS

41 Ontology Network

42 RICK

43 References
the IKS wiki
the Apache Stanbol home
the source code

