Iccha Sethi Serdar Aslan Team 1 Virginia Tech Information Storage and Retrieval CS 5604 Instructor: Dr. Edward Fox 10/11/2010.

Outline
- History
- What’s Lucene
- What’s Solr
- Getting Started with Solr (indexing, updating, deleting)
- Querying Data
- Other features of Solr
- IR Concepts and Solr
- Light demo of Solr
- Questions

History
- Search for a replacement search platform
  - commercial: high license fees
  - open-source: no full solutions
- CNET grants code to Apache, Solr enters Incubator 17 Jan 2006
- Solr is a Lucene sub-project

What is Lucene?
- Solr uses the Lucene search library and extends it.
- Open source, high-performance text search engine library.
- Lucene is not a server and not a web crawler either.
- Uses scoring algorithms based on Information Retrieval principles.
- Uses a rich set of text analyzers and a query syntax with a parser.

Lucene’s index (conceptual)
Figure 1: Lucene index, showing Index, Document, Field, Name, and Value (Kataria S., Khabsa M., Document Indexing and Scoring Algorithm, 2010)

What is Solr
- Solr is an open source enterprise search platform.
- Used by iTunes, CNET, Zappos, and Netflix, as well as intranet sites.
- Written in Java.
- XML/HTTP interface.
- Schema to define types and fields.
- Web administration interface.
Figure 2: Common Solr Usage (diagram with DB, Web, Data, and Solr)

Major Features of Solr
- Powerful full-text search
- Hit highlighting
- Faceted search
- Dynamic clustering
- Database integration

Architecture of Solr
Figure 3: Architecture of Solr (Seeley Y., Apache Solr, 2006). Components shown: Solr Core, Lucene, Admin Interface, Standard Request Handler, Disjunction Max Request Handler, Custom Request Handler, Update Handler, Caching, XML Update Interface, Config, Analysis, HTTP Request Servlet, Concurrency, Update Servlet, XML Response Writer, Replication, Schema.

Solr Documents
Solr accepts well-formed XML documents. Field values of an example document:
- CNN Breaking News – Obama wins
- Barack Obama is the 44th president of the USA
- T23:59:59.999Z
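As a rough sketch, a document like this can be written in Solr's XML update format, where an add element wraps one or more doc elements made of named field elements. The field names (id, title, text) and the id value below are assumptions for illustration only; the real names come from the schema, and the date field is left out.

```python
import xml.etree.ElementTree as ET


def build_add_message(doc_fields):
    """Return a Solr <add> message (as a string) for a single document."""
    add = ET.Element("add")
    doc = ET.SubElement(add, "doc")
    for name, value in doc_fields.items():
        # Each field becomes <field name="...">value</field>.
        field = ET.SubElement(doc, "field", {"name": name})
        field.text = value
    return ET.tostring(add, encoding="unicode")


# Hypothetical field names and id; they must match the fields in the schema.
message = build_add_message({
    "id": "news-001",
    "title": "CNN Breaking News – Obama wins",
    "text": "Barack Obama is the 44th president of the USA",
})
print(message)
```

Building the message with an XML library rather than string concatenation keeps special characters in field values properly escaped.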

Getting Started with Solr
How to run Solr on the IBM cloud system:
- Log in to the system using PuTTY and the generated private key
- Go to team1->apache-solr->example
- Start the Solr server
- Load http://localhost:8983/solr/admin/ in your web browser

Indexing Data
The Solr server is up and running. To index data:
- Open a new terminal
- Go to team1/apache-solr/example/example-docs/
- Run "java -jar post.jar" on some of the XML files in that directory

Indexing Data Cont’d
To index all data:
- Run "java -jar post.jar *.xml"
- This indexes all sample files in the example directory
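post.jar is essentially an HTTP client for Solr's XML update interface, so the same step can be sketched in Python. This is a minimal sketch under stated assumptions: the update URL is assumed to be the example server's default, the document is hypothetical, and the requests are only built here, not sent.

```python
import urllib.request

# Assumed update endpoint of the example server started earlier.
SOLR_UPDATE_URL = "http://localhost:8983/solr/update"


def build_update_request(xml_message, url=SOLR_UPDATE_URL):
    """Wrap an XML update message in an HTTP POST request (not sent yet)."""
    return urllib.request.Request(
        url,
        data=xml_message.encode("utf-8"),
        headers={"Content-Type": "text/xml; charset=utf-8"},
    )


# Hypothetical document; the field name must exist in the schema.
add_request = build_update_request(
    '<add><doc><field name="id">example-1</field></doc></add>'
)
# A commit message makes the added documents visible to searches.
commit_request = build_update_request("<commit/>")

print(add_request.get_method(), add_request.full_url)

# To actually send the requests, the Solr server must be running:
# urllib.request.urlopen(add_request)
# urllib.request.urlopen(commit_request)
```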

Solr Admin page
Open http://localhost:8983/solr/admin in your web browser.

Updating Data
- Edit the existing XML file to change the data
- Run the "java -jar post.jar" command on the edited file

Deleting Data
A delete operation can be done by:
- Posting a delete command that specifies the value of a document’s unique key field:
  java -Ddata=args -Dcommit=no -jar post.jar "<delete><id>SP2514N</id></delete>"
- Posting a delete command with a query that matches multiple documents:
  java -Ddata=args -jar post.jar "<delete><query>name:DDR</query></delete>"
Don’t forget to run "java -jar post.jar" afterwards to update the data.
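Solr's XML update format expresses these two kinds of delete as a delete element containing either an id element or a query element. The helper names below are made up for this sketch; they only build the message strings that would be posted.

```python
from xml.sax.saxutils import escape


def delete_by_id(doc_id):
    """Delete message for one document, identified by its unique key value."""
    return "<delete><id>{}</id></delete>".format(escape(doc_id))


def delete_by_query(query):
    """Delete message for every document matching the given query."""
    return "<delete><query>{}</query></delete>".format(escape(query))


print(delete_by_id("SP2514N"))
print(delete_by_query("name:DDR"))
```

Escaping matters mostly for queries, which may contain characters such as & or < that are not allowed unescaped in XML text.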

Querying Data
- Searches are done with the query string in the q parameter. Example query: q=video
- A number of request parameters can be passed to control what information is returned. Example: the "fl" parameter controls which stored fields are returned.
- Example query: q=video&fl=name,id,score (also returns the estimated relevancy score)
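A query is just a URL with request parameters, so it can be assembled programmatically. The sketch below assumes the example server exposes its search handler at /solr/select on localhost:8983; the helper name is hypothetical.

```python
from urllib.parse import urlencode

# Assumed search endpoint of the example server.
SOLR_SELECT_URL = "http://localhost:8983/solr/select"


def solr_query_url(q, **params):
    """Build a search URL from the q parameter plus optional request parameters."""
    query = {"q": q}
    query.update(params)
    # Keep commas readable so field lists look like fl=name,id,score.
    return SOLR_SELECT_URL + "?" + urlencode(query, safe=",")


print(solr_query_url("video"))
print(solr_query_url("video", fl="name,id,score"))
```

Other characters, such as the space in a sort clause like "price desc", are percent-encoded or turned into "+" by urlencode, which is what an HTTP server expects.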

Querying Data cont’d
Example query: q=video
The response shows the query, the number of documents found in the collection, and the different fields of each retrieved document.

Querying Data cont’d
Example query: q=name:video

Querying Data cont’d
Example query: q=video&fl=name,id,score

Querying Data cont’d
Example query: q=video&fl=*,score (returns all stored fields, as well as the estimated relevancy score)

Querying Data cont’d
Example query: q=video&sort=price desc&fl=name,id,price

Querying Data cont’d
Example query: q=video&wt=json
The wt value can also be python, php, ruby, or xml.
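With wt=json the response is easy to consume from a script. The sample below is a hand-written stand-in for a real response: the overall layout (responseHeader, then response with numFound, start, and docs) follows Solr's JSON writer, but the documents and their values are invented for illustration.

```python
import json

# Illustrative response only; the ids and names are made up.
sample_response = """
{
  "responseHeader": {"status": 0, "QTime": 1},
  "response": {
    "numFound": 2,
    "start": 0,
    "docs": [
      {"id": "doc-1", "name": "First video item"},
      {"id": "doc-2", "name": "Second video item"}
    ]
  }
}
"""

data = json.loads(sample_response)
num_found = data["response"]["numFound"]          # total matches in the collection
names = [doc["name"] for doc in data["response"]["docs"]]

print(num_found)
print(names)
```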

Highlighting
Example query: ...&q=video card&fl=name,id&hl=true&hl.fl=name,features
Highlighted fields are listed at the bottom of the page.

Faceted Search
- A dynamic clustering of search results into categories
- Allows users to refine their search results
- Generates counts for various properties or categories
- Also called faceted browsing or faceted navigation
The benefits:
- Superior feedback
- No surprises or dead ends
- No selection hierarchy is imposed

Faceted Search Example : CNET website

Faceted Search
Example query: ...&q=*:*&facet=true&facet.field=cat
Here q=*:* refers to all documents, and the response lists the generated counts.

Faceted Search
Example query: ...&q=ipod&facet=true&facet.query=price:[0 TO 100]&facet.query=price:[100 TO *]
The response lists the generated counts.
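Two details of faceting are worth a small sketch: facet.query may be repeated in one request, and in the default JSON output the counts for a facet field come back as a flat list that alternates value and count. The counts below are invented, and pair_counts is a hypothetical helper name.

```python
from urllib.parse import urlencode, parse_qsl

# A list of pairs (not a dict) so that facet.query can appear twice.
facet_params = [
    ("q", "ipod"),
    ("facet", "true"),
    ("facet.query", "price:[0 TO 100]"),
    ("facet.query", "price:[100 TO *]"),
]
query_string = urlencode(facet_params)
decoded = parse_qsl(query_string)  # round trip back to the original pairs


def pair_counts(flat):
    """Turn ["value1", count1, "value2", count2, ...] into a dict."""
    return dict(zip(flat[0::2], flat[1::2]))


# Hypothetical counts for the "cat" field.
sample_cat_counts = ["electronics", 14, "memory", 3, "camera", 1]
cat_counts = pair_counts(sample_cat_counts)

print(query_string)
print(cat_counts)
```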

Search Relevancy
Figure 4: Search Relevancy (Seeley Y., Apache Solr, 2006). Document analysis: "PowerShot SD 500" passes through WhitespaceTokenizer, WordDelimiterFilter (catenateWords=1), and LowercaseFilter. Query analysis: "power-shot sd500" passes through WhitespaceTokenizer, WordDelimiterFilter (catenateWords=0), and LowercaseFilter. The resulting tokens match.
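The idea that a document containing "PowerShot SD 500" and a query for "power-shot sd500" end up with matching tokens can be imitated in a few lines. This is a deliberately simplified imitation of the three analysis steps, not Lucene's implementation: it ignores token positions and most of the real WordDelimiterFilter options, and all function names are made up.

```python
import re


def whitespace_tokenize(text):
    """Split on whitespace only."""
    return text.split()


def word_delimiter(token, catenate_words):
    """Split a token on case changes, letter/digit boundaries, and punctuation."""
    parts = re.findall(r"[A-Z]?[a-z]+|[A-Z]+|[0-9]+", token)
    words = [p for p in parts if p.isalpha()]
    if catenate_words and len(words) > 1:
        # Also emit the word parts joined together, e.g. Power + Shot -> PowerShot.
        parts.append("".join(words))
    return parts


def analyze(text, catenate_words):
    """Whitespace tokenizer, then word delimiter splitting, then lowercasing."""
    tokens = []
    for token in whitespace_tokenize(text):
        for part in word_delimiter(token, catenate_words):
            tokens.append(part.lower())
    return tokens


doc_tokens = analyze("PowerShot SD 500", catenate_words=True)
query_tokens = analyze("power-shot sd500", catenate_words=False)
is_match = all(token in doc_tokens for token in query_tokens)

print(doc_tokens)
print(query_tokens)
print(is_match)
```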

What we’ve Covered
- Basic information about Solr
- Structure of Solr
- How to run a Solr instance
- Adding, deleting, updating documents
- Making changes to the index
- Making a query and running it
- Using the Solr admin interface

Other features of Solr
- Distributed search
- Numeric field statistics
- Search result clustering
- Function queries
- Boosting
- More Like This

Relation with IR Concepts
- Tokenization
- Scoring: tf-idf (Lucene class Similarity)
- Lucene Practical Scoring: boosting of documents and queries
- Wildcard queries (te?t, test*, te*t)
- Clustering (result clustering via Carrot2)
- Lucene’s conjunctive search algorithm uses skip pointers
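To make the tf-idf item concrete, here is a small sketch of the tf and idf factors in the form documented for Lucene's default Similarity implementation of that era (square root of the term frequency, and one plus the log of the collection size over the document frequency plus one). The remaining factors of Lucene's practical scoring function, such as coord, queryNorm, boosts, and length norms, are left out, so the numbers are not real Lucene scores.

```python
import math


def tf(freq):
    """Term frequency factor: grows with the square root of the raw count."""
    return math.sqrt(freq)


def idf(doc_freq, num_docs):
    """Inverse document frequency: rarer terms get a larger weight."""
    return 1.0 + math.log(num_docs / (doc_freq + 1))


def tf_idf(freq, doc_freq, num_docs):
    """Simplified weight of one term in one document."""
    return tf(freq) * idf(doc_freq, num_docs)


# Hypothetical collection of 1000 documents.
num_docs = 1000
rare_term = tf_idf(freq=4, doc_freq=9, num_docs=num_docs)
common_term = tf_idf(freq=4, doc_freq=499, num_docs=num_docs)

# With the same in-document frequency, the rarer term weighs more.
print(rare_term > common_term)
```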

Relation with IR Concepts
Figure 5: Chapter 7, Information Storage and Retrieval (Christopher D. Manning, Prabhakar Raghavan and Hinrich Schütze)
Figure 6: Chapter 1, Lucene in Action (Otis Gospodnetic and Erik Hatcher)

Video
file:///C:/Users/Sethi/Documents/Camtasia%20Studio/Apache-solr-team1/Apache-solr-team1.html

Questions
Any questions? Are you ready for the exercises?