1 Practical Solr Guide for Developers

2 First…some questions. How many of you in the room know what Solr is? How many have worked with Solr? How many will be using Solr or text search technology in their upcoming projects?

3 Why am I here speaking to you about this?
- Several projects in 2011/2012 involving search technology
- One of the most visited recipe sites in the US, with 200,000 hits per hour during peak times
- Resource portal for the world's leading vendor of large-format printers
- First encounter was with Lucene.NET, which led to Solr
- Second encounter was with Solr on Azure
- Afterwards, Jetty and Tomcat configurations
- Currently working on

4 Solr and Lucene
Solr is the popular, blazing fast open source enterprise search platform from the Apache Lucene project. Its major features include powerful full-text search, hit highlighting, faceted search, dynamic clustering, database integration, rich document (e.g., Word, PDF) handling, and geospatial search. Solr is highly scalable, providing distributed search and index replication, and it powers the search and navigation features of many of the world's largest internet sites.
Apache Lucene(TM) is a high-performance, full-featured text search engine library written entirely in Java.

5 Not Frictionless
- Java
- Complex configuration
- Still evolving documentation
- Too many brief tutorials

6 What we will talk about today
- Getting up and running
- Setting up as a service
- Importing data
- Spelling
- Stopwords, Synonyms, Elevate
- Facets
- Replication, ZooKeeper (Cloud setup)
- Integration deep dives
- Etc.

7 Solr and Lucene

8 (Diagram)
- Web Clients
- CMS, Bash/PowerShell etc., PHP
- Web Server
- Solr web application (Solr.war)
  - Core1 (recipes): data-config.xml, solrconfig.xml, schema.xml
  - Core2 (food articles): data-config.xml, solrconfig.xml, schema.xml
  - Core3 (etc.): data-config.xml, solrconfig.xml, schema.xml
- Document Repositories

9 Solr Terminology
Solr Core: Also referred to as just a "Core". This is a running instance of a Solr index along with all of its configuration (SolrConfigXml, SchemaXml, etc.). A single Solr application can contain 0 or more cores, which are run largely in isolation but can communicate with each other if necessary via the CoreContainer. From a historical perspective: Solr initially only supported one index, and the SolrCore class was a singleton for coordinating the low-level functionality at the "core" of Solr. When support was added for creating and managing multiple Cores on the fly, the class was refactored to no longer be a Singleton, but the name stuck.
Facet: A distinct feature or aspect of a set of objects; "a way in which a resource can be classified".
Request Handler: A Solr component that processes requests. For example, the DisMaxRequestHandler processes search queries by calling the DisMax Query Parser. Request Handlers can perform other functions as well.

10 Solr Terminology
- Solr Core: searchable grouping of documents (index), e.g. Core 1 = Recipes, Core 2 = Articles about Food
- Facet: categorisation
- Request Handler: functional grouping under a URL, a lot like a route in PHP frameworks, e.g.
  /core1/search -> searches recipes
  /core1/importxml -> triggers importing from XML files
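To make the "request handler as a route" idea concrete, here is a minimal Python sketch that composes handler URLs from a core name and a handler path. The base URL (port 8983 is the default of the bundled example server) and the handler names are assumptions taken from the examples on this slide; a real installation defines its own handlers in solrconfig.xml.

```python
from urllib.parse import urlencode

# Assumption: Solr runs locally on the example server's default port.
SOLR_BASE = "http://localhost:8983/solr"


def handler_url(core, handler, **params):
    """Build the URL for a request handler registered under a core."""
    url = f"{SOLR_BASE}/{core}/{handler.lstrip('/')}"
    # Append a URL-encoded query string only when parameters are given.
    return f"{url}?{urlencode(params)}" if params else url


if __name__ == "__main__":
    print(handler_url("core1", "search", q="beef stew"))
    print(handler_url("core1", "/importxml"))
```

Each core gets its own URL namespace, so the same handler name can exist under several cores without conflict.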

11 Starting Solr in under 1 minute
Requirements:
- Solr downloaded and unpacked
- JRE installed
1. Via command line, navigate to /apache-solr-3.6.1/example
2. Run java -Dsolr.solr.home=multicore -jar start.jar
* Also see README.txt in /apache-solr-3.6.1/example

12 Solr With Tomcat C:\Program Files\Apache Software Foundation\Tomcat 6.0\conf\Catalina\localhost

13 Files and Directories
solr
  core0
    conf
      schema.xml
      solrconfig.xml
      data-config.xml
    data
  core1
  solr.xml

14 solr.xml
- Solr web application settings
- Define your cores here along with a few global settings
- Tip: use the sharedLib="global_libs" attribute
- Other options:
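The file where cores are defined in Solr 3.x is solr.xml. As a rough illustration of its shape, the Python sketch below generates a minimal multicore solr.xml with a sharedLib attribute. The core names and the "global_libs" directory are taken from these slides; the remaining attribute values are typical defaults and should be treated as assumptions for your setup.

```python
import xml.etree.ElementTree as ET


def build_solr_xml(core_names, shared_lib="global_libs"):
    """Return a legacy (Solr 3.x style) solr.xml document as a string."""
    # sharedLib points at a directory of jars visible to every core.
    solr = ET.Element("solr", {"persistent": "true", "sharedLib": shared_lib})
    cores = ET.SubElement(solr, "cores", {"adminPath": "/admin/cores"})
    for name in core_names:
        # Assumption: each core lives in a directory named after the core.
        ET.SubElement(cores, "core", {"name": name, "instanceDir": name})
    return ET.tostring(solr, encoding="unicode")


if __name__ == "__main__":
    print(build_solr_xml(["core0", "core1"]))
```

Putting shared jars (for example a JDBC driver) in the sharedLib directory avoids copying them into every core.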

15 schema.xml
schema.xml is where you describe your data.
- Lucene field definitions with analysis chain
- Column names and their respective Lucene types
- Unique key
- Default search field
- Default operator (AND/OR), to be deprecated in the future
See: Field Types Included with Solr
Gotcha: Multivalued fields cannot be sorted
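The elements listed on this slide can be seen together in a small example. The Python sketch below embeds a hypothetical schema.xml fragment in the Solr 3.x layout (the field names are invented, and the types section is omitted) and reads out the unique key, default search field, default operator, and which fields are multivalued. It only inspects the multiValued flag; other requirements for sorting, such as tokenization, are ignored.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment; field names are illustrative, <types> is left out.
SCHEMA = """
<schema name="recipes" version="1.5">
  <fields>
    <field name="id" type="string" indexed="true" stored="true" required="true"/>
    <field name="title" type="string" indexed="true" stored="true"/>
    <field name="ingredients" type="text_general" indexed="true" stored="true" multiValued="true"/>
  </fields>
  <uniqueKey>id</uniqueKey>
  <defaultSearchField>title</defaultSearchField>
  <solrQueryParser defaultOperator="OR"/>
</schema>
"""


def describe_schema(schema_xml):
    """Summarise the parts of a schema that the slide calls out."""
    root = ET.fromstring(schema_xml)
    fields = root.findall("./fields/field")
    return {
        "unique_key": root.findtext("uniqueKey"),
        "default_search_field": root.findtext("defaultSearchField"),
        "default_operator": root.find("solrQueryParser").get("defaultOperator"),
        # Multivalued fields are the ones that cannot be used for sorting.
        "multi_valued": [f.get("name") for f in fields if f.get("multiValued") == "true"],
        "single_valued": [f.get("name") for f in fields if f.get("multiValued") != "true"],
    }


if __name__ == "__main__":
    print(describe_schema(SCHEMA))
```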

16 Status file
- Managed by Solr
- Contains import information such as the last import
- Contains core-specific settings assigned by the developer
- Settings can be passed to the data import definition file, e.g.:
  mycore.languagegroup=en
  mycore.filenamefilter=.*(en|eew|enw|eez|eep)\.(xml)
- In the data config, these options can be retrieved as ${mycore.languagegroup}, ${mycore.filenamefilter}, etc.
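To show how key=value settings and ${...} placeholders relate, here is a small Python sketch that parses the two settings from this slide and substitutes them into a template string. This is only an illustration of the idea, not how Solr resolves properties internally; the template text and the file names in the usage example are hypothetical.

```python
import re


def parse_properties(text):
    """Parse simple key=value lines, skipping blanks and # comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props


def resolve(template, props):
    """Replace ${name} placeholders; unknown names are left untouched."""
    return re.sub(
        r"\$\{([^}]+)\}",
        lambda m: props.get(m.group(1).strip(), m.group(0)),
        template,
    )


SETTINGS = "mycore.languagegroup=en\nmycore.filenamefilter=.*(en|eew|enw|eez|eep)\\.(xml)"

if __name__ == "__main__":
    props = parse_properties(SETTINGS)
    print(resolve("language=${mycore.languagegroup}", props))
    # The filter is a regular expression applied to file names.
    for name in ["recipes_en.xml", "recipes_de.xml"]:
        print(name, bool(re.fullmatch(props["mycore.filenamefilter"], name)))
```

Keeping such values per core lets several cores share one data import definition while importing different language groups.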

17 Importing
From XML
- XML can originate in a single file, multiple files (same schema) or HTTP
- Solr will loop over common data nodes using its for-each mechanism
- Gotcha: The XPathEntityProcessor implements a streaming parser which supports a subset of XPath syntax. Complete XPath syntax is not supported, but most of the common use cases are covered, for example xpath="/a/b/c"
From Database
- You will need a JDBC driver for your database
- Can run multiple queries, with reference variables passed from one entity to another
- Gotcha: SQL timeouts
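The for-each idea (pick a repeating node, then read fields from simple absolute paths) can be imitated in a few lines of Python. This is an analogy using the standard library's ElementTree, which also supports only a subset of XPath; it is not the XPathEntityProcessor itself, and the XML document and field names are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical input document with one repeating record node.
XML = """<recipes>
  <recipe><id>1</id><title>Beef stew</title></recipe>
  <recipe><id>2</id><title>Apple pie</title></recipe>
</recipes>"""


def extract_records(xml_text, for_each, field_paths):
    """Return one dict per node matched by for_each.

    for_each and every field path are simple absolute paths such as
    /recipes/recipe and /recipes/recipe/title; each field path is
    assumed to start with for_each.
    """
    root = ET.fromstring(xml_text)
    parts = for_each.strip("/").split("/")
    if parts[0] != root.tag:
        return []
    nodes = root.findall("/".join(parts[1:])) if len(parts) > 1 else [root]
    records = []
    for node in nodes:
        record = {}
        for name, path in field_paths.items():
            # Make the absolute field path relative to the record node.
            relative = path[len(for_each):].strip("/")
            record[name] = node.findtext(relative)
        records.append(record)
    return records


if __name__ == "__main__":
    fields = {"id": "/recipes/recipe/id", "title": "/recipes/recipe/title"}
    print(extract_records(XML, "/recipes/recipe", fields))
```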

18 JDBC Timeouts

19 Stop Words
a an and are as at be but by for if in into is it no not of on or s such t that the their then there these they this to was will with
- Stop words list is in /apache-solr-3.6.1/example/example-DIH/solr/solr/conf
- You can find more stop words using the schema browser

20 Spellcheck
- Solr will build a spell index from the existing index
- The spell index is a separate set of index files, and building it needs to be triggered
- Spell index generation should be called only once; do not call it with every query
rows=10&indent=on&
Note: the build parameter is needed only once, to build the spellcheck index from the main Solr index. Building takes time and should not be specified with each request.
Note: Combine multiple fields into single spell field using
Gotcha: solr.PorterStemFilterFactory
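The "build once, query many times" rule can be sketched as a URL builder that only adds the build flag on request. The sketch uses Solr's spellcheck and spellcheck.build request parameters; the handler URL in the usage example is an assumption, since the handler name depends on your solrconfig.xml.

```python
from urllib.parse import urlencode


def spellcheck_url(base, query, build=False):
    """Build a spellcheck request; pass build=True only for the one-off build."""
    params = {"q": query, "rows": 10, "indent": "on", "spellcheck": "true"}
    if build:
        # Triggers (re)building the spell index; slow, so not for every query.
        params["spellcheck.build"] = "true"
    return f"{base}?{urlencode(params)}"


if __name__ == "__main__":
    # Assumption: a spellcheck-enabled handler is mapped at /core0/spell.
    base = "http://localhost:8983/solr/core0/spell"
    print(spellcheck_url(base, "bef stew", build=True))  # one-off build
    print(spellcheck_url(base, "bef stew"))              # normal queries
```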

21 Faceting
Just facets: start=0&rows=5&indent=on&facet=true&facet.field=ProductScale&facet.field=ProductLine
For predictive search: start=0&rows=0&indent=on&facet=true&facet.field=Keywords&facet.prefix=a
More with Facets:
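The two query strings on this slide can be generated programmatically, and facet counts can be turned into a dictionary for predictive search. In this Python sketch the parameter builder reproduces the slide's query strings; the sample counts are hypothetical, and the parser assumes Solr's default flat JSON layout for facet fields (alternating value and count).

```python
from urllib.parse import urlencode


def facet_params(fields, prefix=None, rows=5):
    """Build the facet part of a query string for one or more fields."""
    params = [("start", 0), ("rows", rows), ("indent", "on"), ("facet", "true")]
    # facet.field may be repeated, once per field to facet on.
    params += [("facet.field", field) for field in fields]
    if prefix is not None:
        params.append(("facet.prefix", prefix))
    return urlencode(params)


def facet_counts(flat):
    """Convert [value, count, value, count, ...] into {value: count}."""
    return dict(zip(flat[0::2], flat[1::2]))


if __name__ == "__main__":
    print(facet_params(["ProductScale", "ProductLine"]))
    print(facet_params(["Keywords"], prefix="a", rows=0))
    # Hypothetical counts as they might appear under facet_fields.
    print(facet_counts(["apple", 12, "avocado", 7]))
```

Setting rows=0 for predictive search skips the documents themselves and returns only the facet values that start with the typed prefix.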

22 Transformers
- RegexTransformer
- ScriptTransformer
- DateFormatTransformer
- NumberFormatTransformer
- TemplateTransformer
- HTMLStripTransformer
- ClobTransformer
- LogTransformer

23 Synonyms: beefstew = Beef stew
Query Elevate: bring certain documents to the top based on query
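The synonym mapping on this slide is written loosely; in Solr's synonyms.txt an explicit mapping uses "=>" and a comma-separated line lists equivalent terms. The Python sketch below parses lines in that format into a lookup table. It is a simplified illustration (everything is lowercased, and multi-word terms are kept as single strings), not Solr's synonym filter.

```python
def parse_synonyms(lines):
    """Parse synonyms.txt-style lines into {term: [replacement, ...]}."""
    mapping = {}
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        if "=>" in line:
            # Explicit mapping: terms on the left are replaced by the right.
            left, right = line.split("=>", 1)
            sources = [s.strip().lower() for s in left.split(",")]
            targets = [t.strip().lower() for t in right.split(",")]
        else:
            # Equivalent terms: each one expands to the whole group.
            sources = targets = [s.strip().lower() for s in line.split(",")]
        for source in sources:
            bucket = mapping.setdefault(source, [])
            for target in targets:
                if target not in bucket:
                    bucket.append(target)
    return mapping


if __name__ == "__main__":
    print(parse_synonyms(["# recipe synonyms", "beefstew => Beef stew", "tv, television"]))
```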

24 Documentation +Types#SolrFieldTypes-FieldTypesIncludedwithSolr

25 Gotchas
- Form content type: application/xml (not application/x-www-form-urlencoded); see query-in-solr-select
- Multivalued fields cannot be sorted
- Dates (use date transformers)
- JDBC timeouts
- Slow indexing with multiple database entities
- XPath limitations
- Can you recreate your updates?
- Are you storing enough data?


27 Thank You! Radek Zajkowski
