Iccha Sethi Serdar Aslan Team 1 Virginia Tech Information Storage and Retrieval CS 5604 Instructor: Dr. Edward Fox 10/11/2010.


1 Iccha Sethi Serdar Aslan Team 1 Virginia Tech Information Storage and Retrieval CS 5604 Instructor: Dr. Edward Fox 10/11/2010

2 Outline: History; What is Lucene; What is Solr; Getting Started with Solr (indexing, updating, deleting); Querying Data; Other Features of Solr; IR Concepts and Solr; Light Demo of Solr; Questions

3 History: Search for a replacement search platform (commercial: high license fees; open-source: no full solutions). CNET grants code to Apache; Solr enters Incubator, 17 Jan 2006. Solr is a Lucene sub-project.

4 What is Lucene? An open-source, high-performance text search engine library; Solr uses the Lucene search library and extends it. Lucene is not a server, and it is not a web crawler either. It uses scoring algorithms based on Information Retrieval principles, and offers a rich set of text analyzers and a query syntax with a parser.

5 Lucene’s index (conceptual): an Index holds Documents; a Document holds Fields; a Field is a Name/Value pair. Figure 1: Lucene index (Kataria S., Khabsa M., Document Indexing and Scoring algorithm, 2010)

6 What is Solr? Solr is an open-source enterprise search platform. Used by iTunes, CNET, Zappos, and Netflix, as well as intranet sites. Written in Java. XML/HTTP interface. Schema to define types and fields. Web administration interface. Figure 2: Common Solr usage (diagram elements: DB, Solr, Web, Data)

7 Major Features of Solr: powerful full-text search; hit highlighting; faceted search; dynamic clustering; database integration.

8 Architecture of Solr. Figure 3: Architecture of Solr (Seeley Y., Apache Solr, 2006). Components shown: HTTP Request Servlet, Update Servlet, Admin Interface, Standard Request Handler, Disjunction Max Request Handler, Custom Request Handler, XML Response Writer, XML Update Interface, Update Handler, Solr Core, Config, Schema, Analysis, Caching, Concurrency, Replication, Lucene.

9 Solr Documents: Solr accepts well-formed XML documents. Example document values: www.cnn.com; CNN Breaking News – Obama wins; Barack Obama is the 44th president of the USA; 2008-11-06T23:59:59.999Z
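The XML markup of the slide's example document did not survive in this transcript, so the following is only a sketch of what such a document might look like in Solr's `<add><doc><field name="...">` update format. The field names (`url`, `title`, `content`, `date`) are assumptions made for illustration; the real names depend on the schema in use.

```python
import xml.etree.ElementTree as ET


def build_add_xml(fields):
    """Return a Solr <add><doc>...</doc></add> message as a string.

    `fields` is a list of (field_name, value) pairs.
    """
    add = ET.Element("add")
    doc = ET.SubElement(add, "doc")
    for name, value in fields:
        field = ET.SubElement(doc, "field", name=name)
        field.text = value
    return ET.tostring(add, encoding="unicode")


# Values come from the slide; the field names are hypothetical.
cnn_doc = [
    ("url", "www.cnn.com"),
    ("title", "CNN Breaking News - Obama wins"),
    ("content", "Barack Obama is the 44th president of the USA"),
    ("date", "2008-11-06T23:59:59.999Z"),
]

add_xml = build_add_xml(cnn_doc)
print(add_xml)
```

Building the message with an XML library rather than string concatenation keeps it well-formed even when a value contains characters such as `&` or `<`.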

10 Getting Started with Solr. How to run Solr on the IBM cloud system: log in to the system using PuTTY and the generated private key; go to team1->apache-solr->example; start the Solr server; load http://localhost:8983/solr/admin/ in your web browser.

11 Indexing Data. Once the Solr server is up and running, to index data: open a new terminal; follow the path team1/apache-solr/example/example-docs/; run "java -jar post.jar" on some of the XML files in that directory.

12 Indexing Data cont’d. To index all data: run "java -jar post.jar *.xml". This indexes all sample files in the example directory.
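As a rough illustration of what the post.jar tool does, the sketch below prepares HTTP POST requests carrying XML update messages. It assumes the update handler is at http://localhost:8983/solr/update (the usual default for the example server; adjust it for your setup), and the document id `example-1` is made up. The requests are only constructed here, not sent.

```python
import urllib.request

# Assumed location of the update handler on the example server.
SOLR_UPDATE_URL = "http://localhost:8983/solr/update"


def make_update_request(xml_message, url=SOLR_UPDATE_URL):
    """Build (but do not send) a POST request carrying an XML update message."""
    return urllib.request.Request(
        url,
        data=xml_message.encode("utf-8"),
        headers={"Content-Type": "text/xml; charset=utf-8"},
        method="POST",
    )


# A hypothetical one-field document, followed by a commit message.
add_request = make_update_request(
    '<add><doc><field name="id">example-1</field></doc></add>'
)
commit_request = make_update_request("<commit/>")

# With a Solr server running, each request could be sent like this:
#   urllib.request.urlopen(add_request)
#   urllib.request.urlopen(commit_request)
print(add_request.get_method(), add_request.full_url)
```

Added documents become searchable only after a commit, which is why the sketch prepares a separate `<commit/>` message.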

13 Solr Admin page: open http://localhost:8983/solr/admin in your web browser.

14 Updating Data: the user can edit an existing XML file to change its data, then run the "java -jar post.jar" command on it again.

15 Deleting Data. A delete operation can be done in two ways. (1) Post a delete command that specifies the value of a document’s unique key field: java -Ddata=args -Dcommit=no -jar post.jar "<delete><id>SP2514N</id></delete>". (2) Post a delete command with a query that matches multiple documents: java -Ddata=args -jar post.jar "<delete><query>name:DDR</query></delete>". Don’t forget to run "java -jar post.jar" afterwards to update the data!
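The two kinds of delete message can be sketched as small XML builders. This is a minimal illustration using the id and query from the slide; sending the messages to the server is left out.

```python
import xml.etree.ElementTree as ET


def delete_by_id(doc_id):
    """Delete message for one document, addressed by its unique key value."""
    delete = ET.Element("delete")
    ET.SubElement(delete, "id").text = doc_id
    return ET.tostring(delete, encoding="unicode")


def delete_by_query(query):
    """Delete message for every document matching a query."""
    delete = ET.Element("delete")
    ET.SubElement(delete, "query").text = query
    return ET.tostring(delete, encoding="unicode")


print(delete_by_id("SP2514N"))
print(delete_by_query("name:DDR"))
```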

16 Querying Data. Searches are done with the query string in the q parameter. Example query: q=video. A number of request parameters can be passed to control what information is returned. Example: the "fl" parameter controls which stored fields are returned. Example query: q=video&fl=name,id,score (also returns the estimated relevancy score).
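The query strings on these slides are appended to a search URL. The sketch below shows one way to assemble such a URL; it assumes the search handler lives at http://localhost:8983/solr/select, which is the usual default for the example server but may differ in other setups.

```python
from urllib.parse import urlencode

# Assumed location of the search handler on the example server.
SOLR_SELECT_URL = "http://localhost:8983/solr/select"


def build_query_url(q, **params):
    """Return a search URL for query string `q` plus optional request parameters."""
    query = {"q": q}
    query.update(params)
    return SOLR_SELECT_URL + "?" + urlencode(query)


# q=video, returning only the name and id fields plus the relevancy score.
url = build_query_url("video", fl="name,id,score")
print(url)
```

`urlencode` percent-encodes the commas in the `fl` value; the server decodes them back, so the request is equivalent to q=video&fl=name,id,score.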

17 Querying Data cont’d. Example query: q=video. The response shows the number of documents found in the collection and the different fields of each retrieved document.

18 Querying Data cont’d. Example query: q=name:video

19 Querying Data cont’d. Example query: q=video&fl=name,id,score

20 Querying Data cont’d. Example query: q=video&fl=*,score (returns all stored fields, as well as the estimated relevancy score).

21 Querying Data cont’d. Example query: q=video&sort=price desc&fl=name,id,price

22 Querying Data cont’d. Example query: q=video&wt=json. The response format (wt) can also be python, php, ruby, or xml.
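With wt=json the response can be read using any JSON parser. The sketch below parses a hand-written sample that follows the general shape of a Solr JSON response (a `response` object with `numFound` and `docs`). The documents and scores in it are invented for illustration and are not real output.

```python
import json

# Hand-written illustrative sample, not actual server output.
sample_response = """
{
  "responseHeader": {"status": 0, "QTime": 1},
  "response": {
    "numFound": 2,
    "start": 0,
    "docs": [
      {"id": "doc-1", "name": "Example video card", "score": 1.5},
      {"id": "doc-2", "name": "Example video player", "score": 0.75}
    ]
  }
}
"""


def summarize(response_text):
    """Return (number of documents found, list of (id, name) pairs)."""
    data = json.loads(response_text)
    result = data["response"]
    return result["numFound"], [(d["id"], d["name"]) for d in result["docs"]]


num_found, hits = summarize(sample_response)
print(num_found, hits)
```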

23 Highlighting. Example query: ...&q=video card&fl=name,id&hl=true&hl.fl=name,features. Highlighted fields are listed at the bottom of the page.
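A highlighting request only adds the hl and hl.fl parameters to an ordinary query. The sketch below assembles the slide's example; the base URL http://localhost:8983/solr/select is an assumption about the example server.

```python
from urllib.parse import urlencode

# Assumed location of the search handler on the example server.
SOLR_SELECT_URL = "http://localhost:8983/solr/select"


def build_highlight_url(q, fl, hl_fields):
    """Return a search URL that asks for highlighted snippets from `hl_fields`."""
    params = [
        ("q", q),
        ("fl", ",".join(fl)),            # stored fields to return
        ("hl", "true"),                  # switch highlighting on
        ("hl.fl", ",".join(hl_fields)),  # fields to highlight
    ]
    return SOLR_SELECT_URL + "?" + urlencode(params)


highlight_url = build_highlight_url("video card", ["name", "id"], ["name", "features"])
print(highlight_url)
```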

24 Faceted Search. It is a dynamic clustering of search results into categories. It allows users to refine their search results and generates counts for various properties or categories. Also called faceted browsing or faceted navigation. The benefits: superior feedback; no surprises or dead ends; no selection hierarchy is imposed.

25 Faceted Search. Example: CNET website

26 Faceted Search. Example query: ...&q=*:*&facet=true&facet.field=cat. Here q=*:* refers to all documents, and the response lists the generated counts.

27 Faceted Search. Example query: ...&q=ipod&facet=true&facet.query=price:[0 TO 100]&facet.query=price:[100 TO *]. The response lists the generated counts for each facet query.
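Facet requests may repeat the same parameter several times, as facet.query does above, so the parameters are easier to handle as a list of pairs than as a dictionary. A minimal sketch, again assuming the search handler is at http://localhost:8983/solr/select:

```python
from urllib.parse import urlencode

# Assumed location of the search handler on the example server.
SOLR_SELECT_URL = "http://localhost:8983/solr/select"


def build_facet_url(q, facet_fields=(), facet_queries=()):
    """Return a search URL with faceting on; parameters may repeat."""
    params = [("q", q), ("facet", "true")]
    params += [("facet.field", field) for field in facet_fields]
    params += [("facet.query", query) for query in facet_queries]
    return SOLR_SELECT_URL + "?" + urlencode(params)


# Counts per category over all documents.
by_category = build_facet_url("*:*", facet_fields=["cat"])

# Counts for two price ranges among documents matching "ipod".
by_price = build_facet_url(
    "ipod", facet_queries=["price:[0 TO 100]", "price:[100 TO *]"]
)
print(by_category)
print(by_price)
```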

28 Search Relevancy. Document analysis: "PowerShot SD 500" -> WhitespaceTokenizer -> [PowerShot] [SD] [500] -> WordDelimiterFilter (catenateWords=1) -> [Power] [Shot] [PowerShot] [SD] [500] -> LowercaseFilter -> [power] [shot] [powershot] [sd] [500]. Query analysis: "power-shot sd500" -> WhitespaceTokenizer -> [power-shot] [sd500] -> WordDelimiterFilter (catenateWords=0) -> [power] [shot] [sd] [500] -> LowercaseFilter -> [power] [shot] [sd] [500]. A match! Figure 4: Search Relevancy (Seeley Y., Apache Solr, 2006)
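The analysis chain in Figure 4 can be imitated in a few lines. This is a simplified sketch rather than Solr's implementation: the `word_delimiter` function below only splits on non-alphanumeric characters, lower-to-upper case changes, and letter/digit boundaries, and its `catenate_words` option is a rough stand-in for catenateWords.

```python
import re


def whitespace_tokenize(text):
    """Split text on whitespace, like a whitespace tokenizer."""
    return text.split()


def word_delimiter(tokens, catenate_words=False):
    """Split each token into sub-words; optionally also emit the joined word parts."""
    out = []
    for token in tokens:
        # Runs of capitals, capitalized or lowercase words, and digit runs.
        parts = re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|[0-9]+", token)
        out.extend(parts)
        words = [part for part in parts if part.isalpha()]
        if catenate_words and len(words) > 1:
            out.append("".join(words))
    return out


def lowercase(tokens):
    return [token.lower() for token in tokens]


def analyze(text, catenate_words):
    return lowercase(word_delimiter(whitespace_tokenize(text), catenate_words))


doc_tokens = analyze("PowerShot SD 500", catenate_words=True)
query_tokens = analyze("power-shot sd500", catenate_words=False)

# The query matches when every query token also occurs in the document.
is_match = set(query_tokens) <= set(doc_tokens)
print(doc_tokens, query_tokens, is_match)
```

Running the document and the query through the same splitting and lowercasing steps is what lets differently written forms of the same product name meet on common tokens.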

29 What We’ve Covered: basic information about Solr; structure of Solr; how to run a Solr instance; adding, deleting, and updating documents; making changes to the index; making a query and running it; using the Solr admin interface.

30 Other Features of Solr: distributed search; numeric field statistics; search result clustering; function queries; boosting; More Like This.

31 Relation with IR Concepts: tokenization; scoring with tf-idf (Lucene class Similarity); Lucene practical scoring: boosting of documents and queries; wildcard queries (te?t, test*, te*t); clustering (result clustering via Carrot2); Lucene’s conjunctive search algorithm uses skip pointers.
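To make the tf-idf item concrete, here is a small sketch that uses the tf and idf definitions from Lucene's classic default Similarity (tf is the square root of the term frequency; idf is 1 + ln(numDocs / (docFreq + 1))). The way the two are combined below is deliberately simplified: the full practical scoring formula also includes coordination, query normalization, boosts, and length norms, all omitted here. The counts in the example are made up.

```python
import math


def tf(freq):
    """Term-frequency factor: square root of the in-document frequency."""
    return math.sqrt(freq)


def idf(doc_freq, num_docs):
    """Inverse document frequency: rarer terms receive a higher weight."""
    return 1.0 + math.log(num_docs / (doc_freq + 1))


def simple_score(term_freqs, doc_freqs, num_docs):
    """Sum of tf * idf^2 over the query terms (simplified: no boosts or norms)."""
    return sum(
        tf(term_freqs.get(term, 0)) * idf(doc_freq, num_docs) ** 2
        for term, doc_freq in doc_freqs.items()
    )


# Made-up counts: "video" occurs 4 times in this document and appears in
# 9 of 100 documents; "card" occurs once and appears in 49 of 100.
score = simple_score(
    term_freqs={"video": 4, "card": 1},
    doc_freqs={"video": 9, "card": 49},
    num_docs=100,
)
print(score)
```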

32 Relation with IR Concepts. Figure 5: Chapter 7, Introduction to Information Retrieval (Christopher D. Manning, Prabhakar Raghavan and Hinrich Schütze). Figure 6: Chapter 1, Lucene in Action (Otis Gospodnetic and Erik Hatcher)

33 Video: file:///C:/Users/Sethi/Documents/Camtasia%20Studio/Apache-solr-team1/Apache-solr-team1.html

34 Questions: Any questions? Are you ready for exercises?

