Word Up! Using Lucene for full-text search of your data set.


1 Word Up! Using Lucene for full-text search of your data set

2 Full-text search
- Review of full-text search options
- Focus on Lucene
- Integrating Lucene with JPA/Hibernate

3 Full-text search options
- ‘LIKE’ queries
- SQL extensions
- Kludge with web search engine
- Kludge with web search appliance
- Embeddable search library

4 ‘LIKE’ queries

5 ‘LIKE’ queries
- Simple, straightforward
- Fast, easy to implement
- Can return large, unranked result sets
- Limited fuzziness (wildcard or regex only)
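
To make the "limited fuzziness" point concrete, here is a minimal sketch that emulates LIKE matching in memory. `LikeDemo` and its methods are hypothetical names invented for this illustration, and the case-insensitive comparison is an assumption (real databases differ by collation). It only shows that wildcards tolerate unknown characters in known positions, not typos in general.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical helper: emulates SQL LIKE in memory (not a database call).
class LikeDemo {

    // Translate a LIKE pattern to a regex: % = any run of characters, _ = exactly one character.
    static Pattern likeToRegex(String like) {
        StringBuilder sb = new StringBuilder();
        for (char c : like.toCharArray()) {
            if (c == '%') {
                sb.append(".*");
            } else if (c == '_') {
                sb.append('.');
            } else {
                sb.append(Pattern.quote(String.valueOf(c)));
            }
        }
        // Assumption: case-insensitive matching, as with many default collations.
        return Pattern.compile(sb.toString(), Pattern.CASE_INSENSITIVE | Pattern.DOTALL);
    }

    // Return the rows that match the LIKE pattern, in their original order (no ranking).
    static List<String> like(List<String> rows, String pattern) {
        Pattern p = likeToRegex(pattern);
        List<String> hits = new ArrayList<>();
        for (String row : rows) {
            if (p.matcher(row).matches()) {
                hits.add(row);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        List<String> rows = Arrays.asList(
                "Lucene in Action", "Lucine tutorial", "Hibernate Search guide");
        System.out.println(like(rows, "%lucene%")); // exact substring only
        System.out.println(like(rows, "%luc_ne%")); // one unknown character at a fixed position
        System.out.println(like(rows, "%lucnee%")); // a transposition typo finds nothing
    }
}
```

The misspelled row "Lucine tutorial" is found only when the pattern author already knows where the uncertain character sits; a transposed query such as `%lucnee%` matches no rows at all.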

6 Full-text search extensions
- No standard syntax (Sybase, MSSQL, DB2, etc. all different)
- Administrative overhead for text search indices
- Other limitations

7 Kludge with search engine
- External indexing/search software
  - ht://Dig
  - mnoGoSearch
  - Sphinx
  - Xapian
- Not necessarily pure Java
- Can be database-intensive
- Lag in updating search index

8 Kludge with search appliance
- “Black-box” solutions
  - Thunderstone
  - Google Search Appliance
- Your data set mixes with public content
- Doesn’t always work as advertised
- Can’t fine-tune search

9 Embeddable search library

10 Search library
- Example: Apache Lucene
- Deploys as part of your application
- 100% Java
- Fuzzy full-text search (Levenshtein algorithm)
- Searches against text, numeric, boolean fields with multiple options
- Can be integrated with JPA/Hibernate via Hibernate Search, Compass
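
For readers unfamiliar with the Levenshtein algorithm named above, the following is a minimal textbook sketch of edit distance in plain Java. It is not Lucene's internal implementation, and the class name `Levenshtein` is made up for this example. It only illustrates the measure that fuzzy matching is based on: the number of single-character insertions, deletions, and substitutions needed to turn one term into another.

```java
// Illustrative only: classic dynamic-programming edit distance, not Lucene's own code.
class Levenshtein {

    // Minimum number of single-character inserts, deletes, or substitutions to turn a into b.
    static int distance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];

        // Distance from the empty prefix of a to each prefix of b is the prefix length.
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;
        }

        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i; // deleting i characters of a yields the empty string
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                int insert = curr[j - 1] + 1;
                int delete = prev[j] + 1;
                int substitute = prev[j - 1] + cost;
                curr[j] = Math.min(Math.min(insert, delete), substitute);
            }
            // Reuse the two rows instead of allocating a full matrix.
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    public static void main(String[] args) {
        System.out.println(distance("lucene", "lucine"));
        System.out.println(distance("kitten", "sitting"));
    }
}
```

A fuzzy search accepts terms whose distance from the query term falls under some threshold, which is why a misspelling such as "lucine" (one substitution away from "lucene") can still be found.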

11 About Lucene
- Search index stored on file system (also JDBC and BDB options)
- Can store/retrieve data to/from search index (Lucene Projections)
- Can index HTML, XML, Office docs, PDFs, Exchange mail with external tools
- Supports extended and multi-byte character sets by default

12 More about Lucene
- Indexes records as Lucene Document objects
- A Lucene Document doesn’t have to be a literal document; it can be any arbitrary object
- A Document can have any number of name-value pairs
- Synchronizing your data with the search index is someone else’s problem …
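
The idea of a document as a bag of name-value pairs can be sketched without Lucene at all. The toy class below is not Lucene's API; `ToyIndex` and its methods are hypothetical names, and the tokenizer is a deliberately naive split on non-word characters. It only shows how arbitrary records can be indexed per field and then looked up by term.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Toy inverted index: a conceptual stand-in, not Lucene's Document/IndexWriter API.
class ToyIndex {

    // "field:token" -> ids of the documents whose field contains that token
    private final Map<String, Set<Integer>> postings = new HashMap<>();

    // Stored copies of each document's name-value pairs, addressed by id
    private final List<Map<String, String>> docs = new ArrayList<>();

    // Index one record given as name-value pairs; returns its document id.
    int add(Map<String, String> fields) {
        int id = docs.size();
        docs.add(new HashMap<>(fields));
        for (Map.Entry<String, String> e : fields.entrySet()) {
            // Naive analysis: lowercase, then split on runs of non-word characters.
            for (String token : e.getValue().toLowerCase().split("\\W+")) {
                if (token.isEmpty()) {
                    continue;
                }
                postings.computeIfAbsent(e.getKey() + ":" + token, k -> new TreeSet<>()).add(id);
            }
        }
        return id;
    }

    // Ids of documents whose given field contains the term (exact token match).
    Set<Integer> search(String field, String term) {
        return postings.getOrDefault(field + ":" + term.toLowerCase(), Collections.<Integer>emptySet());
    }

    // Retrieve the stored name-value pairs without consulting any other data store.
    Map<String, String> get(int id) {
        return docs.get(id);
    }
}
```

Note that nothing here keeps the index in step with the original data: if a record changes, the caller has to re-index it, which is the gap the last bullet above points at.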

13 Integrating with JPA/Hibernate
- Most common method: Hibernate Search
- Supports only the Hibernate provider
- Automatically updates the search index when an object is persisted to the database
- Entity classes mapped to separate indexes
- Entity fields mapped to Lucene index fields using Java annotations

14 Integrating with JPA/Hibernate …
- Alternate method: Compass Project
- Supports Hibernate, OpenJPA, others
- No release since 2009; effectively unsupported

15 Annotated class example …
    @Indexed
    @Entity
    @Cacheable(true)
    @Table(name="MARKER", schema="MAPLINK")
    public class Marker extends MarkerA implements Serializable {
        @Id
        @Column(name="MKR_MARKERID")
        @Field(store=Store.YES)
        private long mkrMarkerid;
        @Column(name="MKR_LAT", nullable = true)
        @Field(store=Store.YES)
        @NumericField
        private Double mkrLat;
        @Column(name="MKR_LONG", nullable = true)
        @Field(store=Store.YES)
        @NumericField
        private Double mkrLong;
        …
- @Indexed: tells Hibernate Search that this entity class should be indexed

16 Annotated class example … (same Marker class as slide 15)
- @Field: tells Hibernate Search to create a matching name-value pair for the annotated property in this entity's search index
- Store.YES: stores the value for retrieval directly from the index, without touching the database

17 Annotated class example … (same Marker class as slide 15)
- @NumericField: indexes the property as a numeric value, which enables greater than / less than / range searches
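
Why numeric indexing matters for range searches can be shown with a small plain-Java sketch. This is not Hibernate Search or Lucene code; `RangeDemo`, its methods, and the sample latitude values are all made up for illustration. It contrasts a range filter over values compared as text with one over values kept in numeric order.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.NavigableSet;
import java.util.TreeSet;

// Illustrative contrast between text ordering and numeric ordering (hypothetical names).
class RangeDemo {

    // Range filter when values are treated as text: comparison is character by character.
    static List<String> textRange(List<String> values, String from, String to) {
        List<String> hits = new ArrayList<>();
        for (String v : values) {
            if (v.compareTo(from) >= 0 && v.compareTo(to) <= 0) {
                hits.add(v);
            }
        }
        return hits;
    }

    // Range filter when values are kept in a numerically sorted structure.
    static List<Double> numericRange(NavigableSet<Double> index, double from, double to) {
        return new ArrayList<>(index.subSet(from, true, to, true));
    }

    public static void main(String[] args) {
        // Made-up latitude values, first as text and then as numbers.
        List<String> asText = Arrays.asList("9.5", "10.2", "41.7", "5.0");
        NavigableSet<Double> asNumbers = new TreeSet<>(Arrays.asList(9.5, 10.2, 41.7, 5.0));

        System.out.println(textRange(asText, "9", "42"));       // text comparison
        System.out.println(numericRange(asNumbers, 9.0, 42.0)); // numeric comparison
    }
}
```

With text comparison, "10.2" sorts before "9" because '1' precedes '9', so a 9-to-42 range misses values that are numerically inside it. Keeping values in numeric order avoids that, which is the behavior a numeric index field is meant to provide.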

18 Let’s take a Luke at the index …

19 Practical search exercise

20 Questions!
