Compass Semantic search www.ovitas.no
Basics: Knowledge-model-based information retrieval. Full-text search enhanced with Topic Maps = semantic search. Search-driven navigation. 12.10.2006 TMRA '06
Search technologies (chart): level of precision ("intelligence") plotted against data volume (domain size), showing full-text search, conceptual search, semantic search, and Compass. Speaker notes: To put this in context with the different search methods: Full-text: large domain (the Internet), fast and simple implementation. Conceptual: analysis of combinations of words. Semantic: analysis of the search string and the data based on a knowledge model. Problems with "traditional" search: based on statistics, too many hits, imprecise answers, little flexibility with regard to phrasing, misspellings, etc. Example: "Skiing holiday in Northern Norway?"
Given... a web site with a lot of unstructured text (no markup, no tags), a controlled domain (we know what the discourse domain is), and an inadequate search engine...
We would like to... get relevant hits within a meaningful context, spare ourselves the work of structuring our data, and add semantics to the content by defining a knowledge model.
Compass-bowl: Take a full-text search engine. Take a Topic Maps engine. Add a hint of semantics. Define the correct processes for orchestrating the components. Mix them thoroughly. Serve to the public!
Full-text search engine: Apache Lucene (open source). Can index most file formats: html, asp, php, jsp, pdf, rtf, txt, doc, ppt, xls, pst… The index is independent of the model: no need to re-index when changes are made to the model. Small index size: typically less than 10% of the size of the data. Fast index lookup: less than 20 ms for index size >20000.
The knowledge model: Based on the ISO International Standard for Topic Maps. A semantic model of the discourse domain. Concept words = topic names/synonyms. Semantic relationships are expressed through associations. The Compass Weight (CW) defines the “closeness” between topics; it is a property on association types.
Example (diagram): Ovitas hasProduct Compass. Ovitas hasEmployee Christopher. An association type carries CW=0.8.
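The example above can be sketched as a small data model. This is a minimal illustration, not the TMCore API: the class names, the `related` and `find_topic` helpers, the synonym "Chris", and the 0.6 weight on hasEmployee are all assumptions made for the sketch. The slide only shows a single CW=0.8 on an association type, and it is not recoverable here which type carries it.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class AssociationType:
    name: str
    compass_weight: float  # CW: "closeness" of topics joined by this type


@dataclass
class Topic:
    name: str
    synonyms: list = field(default_factory=list)

    def matches(self, term):
        # Concept words are topic names and synonyms, compared case-insensitively.
        wanted = term.casefold()
        return wanted in [n.casefold() for n in [self.name, *self.synonyms]]


@dataclass(frozen=True)
class Association:
    kind: AssociationType
    source: str
    target: str


# Weights are illustrative; only one CW=0.8 appears on the slide.
HAS_PRODUCT = AssociationType("hasProduct", 0.8)
HAS_EMPLOYEE = AssociationType("hasEmployee", 0.6)  # hypothetical value

TOPICS = [
    Topic("Ovitas"),
    Topic("Compass"),
    Topic("Christopher", synonyms=["Chris"]),  # hypothetical synonym
]

ASSOCIATIONS = [
    Association(HAS_PRODUCT, "Ovitas", "Compass"),
    Association(HAS_EMPLOYEE, "Ovitas", "Christopher"),
]


def find_topic(term):
    """Return the first topic whose name or synonym matches, else None."""
    return next((t for t in TOPICS if t.matches(term)), None)


def related(name):
    """Topics reachable from `name` in one hop, with the CW of the association type."""
    return [(a.target, a.kind.compass_weight)
            for a in ASSOCIATIONS if a.source == name]
```

Because the weight lives on the association type rather than on each association, changing the closeness of every hasProduct link is a single edit to the model.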
Compass orchestrator: Guides the processes of the search: (1) search for the term in the topic map; (2) expand the map to relevant/related topics; (3) send all these terms to the full-text search; (4) calculate relevance (based on the combination of CW and Lucene weights) and prepare the result list as an XML instance; (5) render the XML as desired.
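The orchestration steps above can be sketched roughly as follows. Several things here are assumptions, since the slides do not say how CW and Lucene weights are combined: closeness is taken as the product of the weights along the path, relevance as closeness times the full-text score, and the full-text engine is replaced by an in-memory dictionary rather than Lucene. The topic names, documents, and scores are made up for illustration, and the XML rendering step is left out.

```python
from collections import deque

# Hypothetical topic map: topic -> list of (related topic, Compass Weight).
TOPIC_MAP = {
    "Ovitas": [("Compass", 0.8), ("Christopher", 0.6)],
    "Compass": [("Ovitas", 0.8)],
    "Christopher": [("Ovitas", 0.6)],
}

# Stand-in for the full-text engine: term -> {document: score}.
FULLTEXT = {
    "Ovitas": {"about.html": 0.9},
    "Compass": {"products.html": 0.7, "about.html": 0.4},
    "Christopher": {"staff.html": 0.5},
}


def expand(term, max_hops, threshold):
    """Breadth-first expansion from `term`.

    Closeness is the product of the weights along the path (an assumption);
    topics whose closeness falls below `threshold` are not followed.
    """
    closeness = {term: 1.0}
    queue = deque([(term, 0)])
    while queue:
        topic, hops = queue.popleft()
        if hops == max_hops:
            continue
        for neighbour, cw in TOPIC_MAP.get(topic, []):
            score = closeness[topic] * cw
            if score >= threshold and score > closeness.get(neighbour, 0.0):
                closeness[neighbour] = score
                queue.append((neighbour, hops + 1))
    return closeness


def search(term, max_hops=1, threshold=0.5):
    """Return (document, via topic, relevance) tuples, most relevant first."""
    results = []
    for topic, closeness in expand(term, max_hops, threshold).items():
        for doc, text_score in FULLTEXT.get(topic, {}).items():
            results.append((doc, topic, round(closeness * text_score, 3)))
    return sorted(results, key=lambda r: r[2], reverse=True)
```

Each hit keeps the topic it was found through, so a rendering step could group hits by related topic. A term that is missing from the topic map is still passed to the full-text lookup on its own.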
Search term. Topic Map expansion. Hits in the full text grouped by the related topics. Relevant documents ranked by the weighting result.
Search term in the topic map, but not in the text. Relevant information about “Chris Searle”.
Synonym search
Creating/maintaining the model: An MS Excel plug-in serves as the topic map editor. The model can be put under version control. Importing the model into the topic map engine takes one click only. For complex topic maps, a custom user interface can be used to enter instance data.
Navigation: Navigation through the associations between topics. Navigation by search.
User configurations: Which pages to index. Which topic map to use. The number of hops to perform. The threshold for relevance.
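The four settings above could be held in a small configuration object. This is a hedged sketch only: the field names, the defaults, the file name `ovitas.xtm`, and the assumption that the relevance threshold lies between 0 and 1 are not taken from the slides.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CompassConfig:
    pages_to_index: tuple            # URLs of the pages to index
    topic_map: str                   # identifier of the topic map to use
    max_hops: int = 1                # how far to expand from the search term
    relevance_threshold: float = 0.5 # assumed to lie in [0, 1]

    def __post_init__(self):
        # Reject values that cannot describe a meaningful search.
        if not self.pages_to_index:
            raise ValueError("at least one page to index is required")
        if self.max_hops < 0:
            raise ValueError("max_hops must be zero or greater")
        if not 0.0 <= self.relevance_threshold <= 1.0:
            raise ValueError("relevance_threshold must be between 0 and 1")


# Hypothetical configuration for the site named on the title slide.
config = CompassConfig(
    pages_to_index=("http://www.ovitas.no/",),
    topic_map="ovitas.xtm",  # hypothetical file name
    max_hops=2,
    relevance_threshold=0.4,
)
```

Raising the number of hops widens the set of related topics, while raising the threshold prunes weakly related ones, so the two settings pull in opposite directions.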
Content lifecycle management: Easy to integrate with content repositories. A content management or publishing system can send a request to the indexer to re-index a particular resource. Incremental indexing: add, update, or delete documents. HTTP is used as the basic mechanism to address content.
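Incremental indexing as described above can be illustrated with a toy inverted index keyed by URL. This is not Lucene and not the Compass indexer: the class, its method names, and the whitespace tokenizer are assumptions chosen to show why add, update, and delete can each touch a single resource without rebuilding the whole index.

```python
class IncrementalIndex:
    """Toy inverted index in which each document is addressed by its URL."""

    def __init__(self):
        self._docs = {}      # url -> set of terms in that document
        self._postings = {}  # term -> set of urls containing the term

    def delete(self, url):
        # Remove only this document's entries; other documents are untouched.
        for term in self._docs.pop(url, set()):
            urls = self._postings[term]
            urls.discard(url)
            if not urls:
                del self._postings[term]

    def add(self, url, text):
        # Dropping any previous version first makes add double as update.
        self.delete(url)
        terms = set(text.lower().split())
        self._docs[url] = terms
        for term in terms:
            self._postings.setdefault(term, set()).add(url)

    update = add

    def search(self, term):
        return sorted(self._postings.get(term.lower(), set()))


index = IncrementalIndex()
index.add("http://www.ovitas.no/a", "semantic search with topic maps")
index.add("http://www.ovitas.no/b", "full-text search")
index.update("http://www.ovitas.no/a", "topic maps engine")
index.delete("http://www.ovitas.no/b")
```

A publishing system would trigger one of these operations whenever a resource changes, passing the resource's URL as the key.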
Architecture: SOA (service-oriented architecture), no dependency on platform or components. Web service interface (HTTP/REST). .NET platform. Integrated components: TMCore Topic Maps engine by NetworkedPlanet; Apache Lucene full-text engine.
Architecture diagram: User, Publishing System, Services, Compass Service, TM Nav, TM Core, Full Text, Excel Editor, TM editor person.
Compass - Summary: Semantic search based on Topic Maps. Search in any document format. Organize information in a topic-oriented manner. Link to relevant information without touching the data content. Conceptual navigation by Topic Maps. Tools for maintaining/evolving the classification. Fast and easy implementation.