Technical environment for the project


1 Technical environment for the project
Andrea Giovanni Nuzzolese (1,2). Jönköping, January.
(1) Semantic Technology Laboratory, ISTC-CNR
(2) Dipartimento di Scienze dell'Informazione, Università di Bologna

2 Overview
Apache Lucene, Apache Solr, Apache Nutch, Jena, the OWL API, Apache Stanbol

3 What is a Search Engine?
a piece of software that
builds an index on text
answers queries using that index
a search engine offers
scalability
relevance ranking
integration of different data sources

4 Indexing process
convert document
extract text and metadata
normalize text
write inverted index
document 1: "Apache Lucene at Jönköping"
document 2: "The Jönköping University"
apache -> 1
university -> 2
lucene -> 1
jönköping -> 1,2
(a minimal Java sketch of this inverted-index structure follows)
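The inverted index above maps each normalized term to the set of identifiers of the documents containing it. As a minimal, hedged sketch (not from the slides: plain Java collections rather than Lucene, with hypothetical class and method names):

import java.util.*;

public class InvertedIndex {
    // term -> identifiers of the documents that contain it
    private final Map<String, Set<Integer>> index = new HashMap<String, Set<Integer>>();

    // normalize the text of one document and add its terms to the index
    public void add(int docId, String text) {
        for (String token : text.toLowerCase().split("[^\\p{L}\\p{N}]+")) {
            if (token.isEmpty()) continue;
            Set<Integer> postings = index.get(token);
            if (postings == null) {
                postings = new TreeSet<Integer>();
                index.put(token, postings);
            }
            postings.add(docId);
        }
    }

    // answer a single-term query using the index
    public Set<Integer> search(String term) {
        Set<Integer> postings = index.get(term.toLowerCase());
        return postings == null ? Collections.<Integer>emptySet() : postings;
    }

    public static void main(String[] args) {
        InvertedIndex idx = new InvertedIndex();
        idx.add(1, "Apache Lucene at Jönköping");
        idx.add(2, "The Jönköping University");
        System.out.println(idx.search("jönköping")); // prints [1, 2]
    }
}

Lucene builds (and persists) exactly this kind of structure, plus the statistics needed for relevance ranking.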

5 Apache Lucene
a Java library for indexing and searching
stores the index as files on disk
the input for indexing is a set of Documents
each document is composed of a set of Fields
no document converter
no web crawler

6 Index creation
// create an analyzer and an on-disk index directory
Analyzer analyzer = new StandardAnalyzer(Version.LUCENE_30);
Directory directory = new SimpleFSDirectory(new File("/tmp/index"));
IndexWriter indexWriter = new IndexWriter(directory, analyzer, true, MaxFieldLength.UNLIMITED);

// a document is a set of fields
Document document = new Document();
document.add(new Field("title", "This is my title", Field.Store.YES, Field.Index.ANALYZED));
document.add(new Field("body", "This is the body of my document about Lucene", Field.Store.YES, Field.Index.ANALYZED));

// index the document, merge the index segments and close the writer
indexWriter.addDocument(document);
indexWriter.optimize();
indexWriter.close();

7 Text search
// open the index created in the previous slide
Analyzer analyzer = new StandardAnalyzer(Version.LUCENE_30);
Directory directory = new SimpleFSDirectory(new File("/tmp/index"));
IndexSearcher indexSearcher = new IndexSearcher(directory);

// parse the user query against the "body" field
QueryParser qp = new QueryParser(Version.LUCENE_30, "body", analyzer);
String userSearch = "document AND lucene";
Query query = qp.parse(userSearch);

// retrieve the top 10 hits with their scores
TopDocs topDocs = indexSearcher.search(query, 10);
ScoreDoc[] scoreDocs = topDocs.scoreDocs;
for (ScoreDoc scoreDoc : scoreDocs) {
    System.out.println(scoreDoc.doc + " - " + scoreDoc.score);
}
indexSearcher.close();

8 References
Lucene home page
Lucene Wiki
getting started
source code (SVN)
javadoc

9 What is Solr?
a Web application for text indexing and text search
a full-text search server based on Lucene
Lucene is a Java library
Solr is a Web wrapper around Lucene
a REST layer for Apache Lucene
REST means Representational State Transfer
requests and responses are built around the transfer of representations of resources

10 Why a Web wrapper?
Lucene's full-text search offers
no backup/restore mechanism
no remote query interface
no administration user interface for troubleshooting
a lot of Lucene consultants write infrastructure code: it looks like reinventing the wheel!
RESTful is beautiful

11 Solr in a Nutshell
with Solr you can
put documents in the local index via XML/CSV over HTTP
query the index via HTTP GET and receive XML/JSON results
administer it via an HTML GUI
it also supports replication

12 Solr architecture

13 XML commands
add/update documents using <add>
delete documents by id or query using <delete>
commit your changes using <commit/>
optimize the Lucene index using <optimize/>
(a hedged example of posting these commands over HTTP follows)
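As an illustration of slides 11 and 13, the sketch below posts an <add> command followed by a <commit/> to a Solr update handler over plain HTTP. The host, port, the /solr/update path and the field names (title, body) are assumptions for a default local Solr installation, not taken from the slides:

import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;

public class SolrXmlUpdate {

    // POST an XML command to the (assumed) local Solr update handler
    static void post(String xml) throws Exception {
        URL url = new URL("http://localhost:8983/solr/update"); // assumed default endpoint
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        conn.setDoOutput(true);
        conn.setRequestMethod("POST");
        conn.setRequestProperty("Content-Type", "text/xml; charset=UTF-8");
        OutputStream out = conn.getOutputStream();
        out.write(xml.getBytes("UTF-8"));
        out.close();
        System.out.println("Solr answered: HTTP " + conn.getResponseCode());
        conn.disconnect();
    }

    public static void main(String[] args) throws Exception {
        // hypothetical document; the field names must match the schema defined for the index
        post("<add><doc>"
           + "<field name=\"title\">This is my title</field>"
           + "<field name=\"body\">This is the body of my document about Lucene</field>"
           + "</doc></add>");
        post("<commit/>"); // make the new document visible to searches
    }
}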

14 Let’s index iTunes

15 iTunes XML index

16 Add command
a schema is defined for indexing documents with Solr
add tracks to the index via XML

17 iTunes Sample: Solr queries
search for all tracks sounding like "mozilla"
type:track track:mozilla~
get all albums from 'Nick Cave' sorted by release year
type:album artist:(nick cave); year asc
get all tracks containing/starting with 'rose' from 'Nick Cave' sorted by album name
type:track track:(rose*) artist:(nick cave); album asc
get all tracks from Nick Cave's album 'The Best Of' sorted by track nr
type:track artist:(nick cave) album:(the best of); tracknr asc
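These query strings go into the q parameter of a Solr search request. As a hedged sketch of issuing the second query over HTTP GET (the host, port and /solr/select path are assumptions for a default local installation; the trailing "; year asc" sort syntax is the old Solr 1.x style shown on the slide):

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.URL;
import java.net.URLEncoder;

public class SolrQueryGet {
    public static void main(String[] args) throws Exception {
        // query taken from the slide: all albums from 'Nick Cave' sorted by release year
        String q = URLEncoder.encode("type:album artist:(nick cave); year asc", "UTF-8");
        URL url = new URL("http://localhost:8983/solr/select?q=" + q + "&rows=10"); // assumed endpoint
        BufferedReader in = new BufferedReader(new InputStreamReader(url.openStream(), "UTF-8"));
        String line;
        while ((line = in.readLine()) != null) {
            System.out.println(line); // Solr answers with an XML result list by default
        }
        in.close();
    }
}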

18 References
home page
wiki
tutorial
javadoc
source code (SVN)

19 What is Nutch?
open-source web-search software
an Apache project
builds on Lucene and Solr
provides
crawling services
parsing services
a link-graph database
content-management services on top of the indexing capability provided by Lucene
allows you to stand up a web-scale infrastructure

20 Why Nutch?
value-added capabilities:
improving fetching speed
parsing and handling of the hundreds of different content types available on the internet
handling different protocols for obtaining content
better ranking algorithms (OPIC, PageRank)
more or less, in Nutch, these capabilities all map to extension points available via Nutch's plugin framework

21 Nutch's Architecture
Nutch core facilities: parsing, indexing, crawling, content acquisition, querying, plugin framework
Nutch's extension points: scoring, parsing, indexing, querying, URL filtering

22 Nutch’s Architecture

23 Getting Started
download Nutch (current version is 1.2)
indexing:
add start URLs to a text file
configure conf/crawl-urlfilter.txt
configure conf/nutch-site.xml
command line call: bin/nutch crawl urls -dir crawl -depth 3 -topN 50
(a hedged configuration sketch follows)
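As a hedged sketch of the three configuration steps above (the file contents follow the standard Nutch 1.x tutorial rather than the slides; MY.DOMAIN.NAME, the seed URL file name and the agent name are placeholders):

# urls/seed.txt -- one start URL per line
http://nutch.apache.org/

# conf/crawl-urlfilter.txt -- keep only URLs inside the domain to crawl
+^http://([a-z0-9]*\.)*MY.DOMAIN.NAME/

<!-- conf/nutch-site.xml -- the crawler must identify itself -->
<configuration>
  <property>
    <name>http.agent.name</name>
    <value>my-test-crawler</value>
  </property>
</configuration>

With these in place, the bin/nutch crawl command above fetches up to 50 pages per round for 3 rounds and writes the crawl data under the crawl directory.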

24 Nutch Search

25 Nutch Search

26 References
home page: http://nutch.apache.org/
wiki
tutorial
source code (SVN)

27 What is Jena?
an open-source Java implementation of the core Semantic Web standards
RDF graph manipulation API
RDFS and OWL reasoning API
includes the de facto reference RDF/XML parser
RDF/XML, N3 and N-Triples input/output
SPARQL query engine
rule-based inference engine
in-memory or persistent storage (Jena TDB, Jena SDB)
developed by HP Laboratories, now an Apache Incubator project
(a small serialization sketch follows)
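The input/output support listed above is not exercised in the following slides, so here is a minimal, hedged sketch of reading and writing a model (the file names are placeholders; "N-TRIPLE" is the Jena name for the N-Triples serialization):

import java.io.FileInputStream;
import java.io.FileOutputStream;

import com.hp.hpl.jena.rdf.model.Model;
import com.hp.hpl.jena.rdf.model.ModelFactory;

public class JenaIO {
    public static void main(String[] args) throws Exception {
        Model model = ModelFactory.createDefaultModel();

        // read an RDF/XML file into the model (placeholder file name)
        model.read(new FileInputStream("data.rdf"), null);

        // write the same model back out as N-Triples
        model.write(new FileOutputStream("data.nt"), "N-TRIPLE");
    }
}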

28 Key concepts
the Jena API builds on RDF
core API classes are:
Model: the representation of an RDF model, composed of a set of statements
Statement: a subject-predicate-object triple
the subject is a resource
the predicate is a property
the object is a value (literal) or a resource
Resource: an object identified by a URI
Property: a particular resource that describes relations between resources

29 RDF API
// basic definitions
// placeholder URI (the original URL was stripped in the transcript)
String uri = "http://example.org/people#timBernersLee";
String fullName = "Tim Berners-Lee";

// create an empty (default) model via ModelFactory
Model model = ModelFactory.createDefaultModel();

// create the resource that identifies Tim Berners-Lee as a person
Resource timBernersLee = model.createResource(uri, FOAF.Person);

// add the name property to the resource identifying TBL
timBernersLee.addProperty(FOAF.name, fullName);

// resulting graph: <uri> rdf:type foaf:Person ; foaf:name "Tim Berners-Lee"

30 SPARQL API
Model model = ...;

// define the SPARQL query string
// the predicate URL was stripped in the transcript; foaf:name is assumed from the previous slide
String queryString =
    "SELECT ?person ?name WHERE {" +
    "  ?person <http://xmlns.com/foaf/0.1/name> ?name" +
    "}";

// create the query for Jena
Query query = QueryFactory.create(queryString);
QueryExecution qExec = QueryExecutionFactory.create(query, model);

// execute the query
ResultSet rs = qExec.execSelect();
while (rs.hasNext()) {
    QuerySolution qSolution = rs.next();
    // get the person and the name
    Resource person = qSolution.getResource("person");
    Literal name = qSolution.getLiteral("name");
    System.out.println(person.getURI() + " has name " + name.getString());
}

31 Storage and named graphs
// basic definitions
// placeholder graph names (the original URLs were stripped in the transcript)
String bernersLeeName = "http://example.org/graphs/bernersLee";
String steveJobsName = "http://example.org/graphs/steveJobs";

// create two named models in the TDB store under tmp/storage
Model bernersLeeModel = TDBFactory.createNamedModel(bernersLeeName, "tmp/storage");
Model steveJobsModel = TDBFactory.createNamedModel(steveJobsName, "tmp/storage");

// get the reference to the union graph of all the graphs in the triple store
Model unionModel = TDBFactory.createNamedModel("urn:x-arq:UnionGraph", "tmp/storage");

32 What is the OWL API?
a Java framework for OWL 2
a reference implementation for creating, manipulating and serializing OWL ontologies
open source, available under the LGPL license
provides an implementation of SWRL
is open to reasoners

33 Key concepts
the OWL API builds completely on OWL 2
the root object is the OWLOntologyManager, which manages a set of OWL ontologies in memory
an OWLOntology is a set of OWL axioms
an OWLAxiom is a proposition that is not proved but considered to be self-evident

34 Adding OWL axioms (OWL API)
// create the IRI for the ontology
// placeholder IRI (the original URL was stripped in the transcript)
String iriString = "http://example.org/ontologies/myOntology";
IRI ontologyIRI = IRI.create(iriString);

// create the ontology manager
OWLOntologyManager manager = OWLManager.createOWLOntologyManager();

// create the ontology identified by the ontologyIRI
OWLOntology ontology = manager.createOntology(ontologyIRI);

// get the factory that allows to specify axioms
OWLDataFactory factory = manager.getOWLDataFactory();
OWLClass clsA = factory.getOWLClass(IRI.create(ontologyIRI + "#A"));
OWLClass clsB = factory.getOWLClass(IRI.create(ontologyIRI + "#B"));

// create the axiom A subClassOf B
OWLAxiom axiom = factory.getOWLSubClassOfAxiom(clsA, clsB);

// add the axiom to the ontology and apply the change using the manager
AddAxiom addAxiom = new AddAxiom(ontology, axiom);
manager.applyChange(addAxiom);

35 Reasoning (OWL API)
// get the simple configuration for the reasoner
OWLReasonerConfiguration config = new SimpleConfiguration();

// we need a concrete reasoner: here there is a dependency on HermiT
ReasonerFactory risfactory = new ReasonerFactory();

// create the reasoner
OWLReasoner reasoner = risfactory.createReasoner(ontology, config);

// ask the reasoner to do all the necessary work now
reasoner.precomputeInferences();

// ask if the ontology is consistent
boolean consistent = reasoner.isConsistent();
if (consistent) {
    NodeSet<OWLClass> subClses = reasoner.getSubClasses(clsA, true);
} else {
    Node<OWLClass> bottomNode = reasoner.getUnsatisfiableClasses();
}
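A short follow-up not shown on the slide: the NodeSet returned by getSubClasses can be flattened into a plain set of classes. This fragment assumes the reasoner and clsA variables from the slides above:

// print the direct subclasses of clsA inferred by the reasoner
for (OWLClass sub : reasoner.getSubClasses(clsA, true).getFlattened()) {
    System.out.println(sub.getIRI());
}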

36 References
http://owlapi.sourceforge.net/index.html
Matthew Horridge, Sean Bechhofer. The OWL API: A Java API for Working with OWL 2 Ontologies. OWLED 2009, 6th OWL Experiences and Directions Workshop, Chantilly, Virginia, October 2009.

37 What is Stanbol?
a modular software stack
a reusable set of components for semantic content management
it offers:
semantic lifting (FISE)
knowledge representation and reasoning (KReS)
reference infrastructure and entity hub (RICK)
an Apache Incubator project
comes from the IKS project

38 The IKS stack

39 Semantic lifting (FISE)
FISE is the Furtwangen IKS Semantic Engine
created during the IKS Hackathon in Furtwangen (Germany) in March 2010
it implements a simple OSGi-based RESTful engine that can enhance textual content, using pluggable enhancement engines
(a hedged example of calling such an engine follows)
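As a hedged sketch of what "enhancing textual content over REST" looks like: the snippet posts plain text to an enhancement endpoint and prints the RDF enhancements that come back. The host, port and the /engines path are assumptions about a locally running FISE/Stanbol instance, not taken from the slides:

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;

public class EnhanceText {
    public static void main(String[] args) throws Exception {
        // assumed endpoint of a locally running enhancement engine
        URL url = new URL("http://localhost:8080/engines");
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        conn.setDoOutput(true);
        conn.setRequestMethod("POST");
        conn.setRequestProperty("Content-Type", "text/plain; charset=UTF-8");
        conn.setRequestProperty("Accept", "application/rdf+xml");

        // the text to be semantically lifted
        OutputStream out = conn.getOutputStream();
        out.write("Tim Berners-Lee works at the W3C.".getBytes("UTF-8"));
        out.close();

        // the engine answers with RDF describing the detected entities
        BufferedReader in = new BufferedReader(new InputStreamReader(conn.getInputStream(), "UTF-8"));
        String line;
        while ((line = in.readLine()) != null) {
            System.out.println(line);
        }
        in.close();
    }
}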

40 KReS

41 Ontology Network

42 RICK

43 References
the IKS wiki
the Apache Stanbol home
the source code

