
Lucene/SOLR 2: Lucene search API. TU Delft Library, Digitale Productontwikkeling, Egbert Gramsbergen. Starter: Searcher, Term, Sort, Filter. Main course:


1 Lucene/SOLR 2: Lucene search API. TU Delft Library, Digitale Productontwikkeling, Egbert Gramsbergen. Starter: Searcher, Term, Sort, Filter. Main course: Query, Similarity, QueryParser. Dessert: Hits, Highlighter, SpellChecker.

2 org.apache.lucene.search.Searcher (a loosely adapted UML class diagram). Legend: class; methods; constructor argument (---); return value (-->); optional (...); legend example: Document, int i. Searcher *: doc, docFreq, explain, search, getSimilarity, setSimilarity, plus lower-level methods (performance tuning). Types shown as arguments and return values: Document, Term ([]), Query, Explanation, Hits, Filter, Sort, Similarity, int i, int ([]), int doc.

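3 org.apache.lucene.search.Searcher. Searcher subclasses: IndexSearcher *, MultiSearcher *, ParallelMultiSearcher *. IndexSearcher * takes a Directory, an IndexReader or a String path. MultiSearcher * takes Searchable []. IndexReader variants: FilterIndexReader, MultiReader, ParallelReader. Directory variants: FSDirectory, RAMDirectory, DbDirectory, JEDirectory. Searchable variant: RemoteSearchable.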

4 org.apache.lucene.index.Term. Term *: createTerm, field, text, compareTo. Types shown: String field, String text, int. Use: among other things, a building block of Query and Filter.

5 org.apache.lucene.search.Sort. Sort *: setSort, getSort; arguments shown: String field, boolean reverse, SortField ([]). SortField *: String field, boolean reverse, int type (AUTO, CUSTOM, DOC, SCORE, INT, LONG, FLOAT, DOUBLE, STRING), SortComparatorSource, Locale. Locale *: String language, String country, String variant. N.B. Lucene has no strongly typed fields; SOLR does.

6 org.apache.lucene.search.Filter. Filter variants: TermsFilter * (addTerm, takes a Term), BooleanFilter, ChainedFilter, DuplicateFilter, PrefixFilter, QueryWrapperFilter, RangeFilter, SpanFilter, CachingWrapperFilter, more… Use: e.g. in faceted search. Example:

7 org.apache.lucene.search.Query. Query variants: TermQuery, MultiTermQuery, BooleanQuery, WildcardQuery, PhraseQuery, PrefixQuery, MultiPhraseQuery, FuzzyQuery, RegexQuery, SpanFirstQuery, SpanNearQuery, SpanNotQuery, SpanOrQuery, SpanRegexQuery, SpanTermQuery, BoostingTermQuery, FieldScoreQuery, RangeQuery, SpanQuery, BoostingQuery, ConstantScoreQuery, ConstantScoreRangeQuery, DisjunctionMaxQuery, FilteredQuery, FuzzyLikeThisQuery, MatchAllDocsQuery, ValueSourceQuery, MoreLikeThisQuery, CustomScoreQuery.

8 org.apache.lucene.search.Query. Query: setBoost, getBoost, rewrite. TermQuery *: getTerm. PhraseQuery *: add, getTerms, setSlop. Types shown: Float boost, Term ([]), IndexReader, int slop, int position.

9 org.apache.lucene.search.BooleanQuery. BooleanQuery *: add, getClauses, setMinimumNumberShouldMatch; boolean disableCoord. BooleanClause *: Query, BooleanClause.Occur (MUST, MUST_NOT, SHOULD). Types shown: int, BooleanClause []. Example, an and/or-ish query:
BooleanQuery bq;
float andNess = 0.5f; // 0.0: OR (default), 1.0: AND
…
BooleanClause[] clauses = bq.getClauses();
int numOpt = 0;
for (int i = 0; i
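The slide's example breaks off inside the loop, so its exact logic is not known. As a hedged sketch of the idea only: assuming the loop counts the optional (SHOULD) clauses and the result feeds setMinimumNumberShouldMatch, the number of optional clauses that must match could be derived from andNess as below. The class name, the method name and the rounding rule are assumptions made for illustration, and no Lucene classes are used.

```java
/** Sketch: map an "and-ness" in [0, 1] to a minimum number of matching optional clauses. */
final class AndOrIsh {
    private AndOrIsh() {}

    /**
     * @param numOptional number of optional (SHOULD) clauses in the query
     * @param andNess     0.0 behaves like OR, 1.0 like AND
     * @return value one could pass to setMinimumNumberShouldMatch (assumed rounding: nearest int)
     */
    static int minimumShouldMatch(int numOptional, float andNess) {
        if (numOptional < 0) {
            throw new IllegalArgumentException("numOptional must be >= 0");
        }
        if (andNess < 0f || andNess > 1f) {
            throw new IllegalArgumentException("andNess must lie in [0, 1]");
        }
        return Math.round(andNess * numOptional);
    }

    public static void main(String[] args) {
        // Four optional clauses at three different and-ness settings.
        for (float andNess : new float[] {0f, 0.5f, 1f}) {
            System.out.println("andNess=" + andNess + " -> " + minimumShouldMatch(4, andNess));
        }
    }
}
```

With andNess = 0 nothing beyond the default is required, and with andNess = 1 every optional clause has to match, which is the OR-to-AND range the slide's comment describes.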

10 org.apache.lucene.search.function.CustomScoreQuery. CustomScoreQuery *: Query, ValueSourceQuery ([]). Override customScore (int doc, float subQueryScore, float ([]) valSrcScore(s)), which returns a float. FieldScoreQuery *: String field, FieldScoreQuery.Type (BYTE, SHORT, INT, FLOAT). Use cases: weighting by publication type and year (library); geographic proximity (search "pizza"). Default: subQueryScore * valSrcScores[0] * valSrcScores[1] * … Publication year: score = 1 + a/(1 + τ), with τ = (t − t_p)/t_0. (Plot of the score against t − t_p, with a, 1 and t_0 marked on the axes.)
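The two formulas on this slide can be tried out without Lucene. The sketch below assumes that t is the current year, t_p the publication year, t_0 a time scale and a the extra weight of a brand-new publication; the slide does not define the symbols, so these readings, the class and method names, and the clamping of future publication years are assumptions.

```java
/** Sketch of the slide's default score product and its publication-year factor. */
final class PubYearBoost {
    private PubYearBoost() {}

    /** score = 1 + a / (1 + tau), tau = (t - t_p) / t_0; negative ages are clamped to 0 (assumption). */
    static double recencyFactor(double now, double pubYear, double a, double t0) {
        if (t0 <= 0) {
            throw new IllegalArgumentException("t0 must be positive");
        }
        double tau = Math.max(0.0, now - pubYear) / t0;
        return 1.0 + a / (1.0 + tau);
    }

    /** Default combination from the slide: subQueryScore * valSrcScores[0] * valSrcScores[1] * ... */
    static float customScore(float subQueryScore, float... valSrcScores) {
        float score = subQueryScore;
        for (float v : valSrcScores) {
            score *= v;
        }
        return score;
    }

    public static void main(String[] args) {
        // A publication from this year gets factor 1 + a; older ones decay towards 1.
        System.out.println(recencyFactor(2008, 2008, 1.0, 5.0));
        System.out.println(recencyFactor(2008, 2003, 1.0, 5.0));
        System.out.println(customScore(2f, 1.5f, 2f));
    }
}
```

The factor never drops below 1, so old publications are not penalised relative to the plain query score; they only miss out on the recency bonus.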

11 org.apache.lucene.search.Similarity. This is where the real work is done: org/apache/lucene/search/Similarity.html. Query, Document → Score according to the Vector Space model.
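To give a feel for what the Similarity class computes, here is a minimal sketch of the per-term factors, assuming the formulas documented for Lucene's DefaultSimilarity (tf = sqrt(freq), idf = 1 + ln(numDocs / (docFreq + 1)), lengthNorm = 1 / sqrt(numTerms)). Boosts, coord and queryNorm are left out, and the class below is an illustrative stand-in, not a Lucene class.

```java
/** Sketch of the per-term scoring factors, assuming DefaultSimilarity's documented formulas. */
final class SimilaritySketch {
    private SimilaritySketch() {}

    /** Term frequency factor: grows with the square root of the in-document frequency. */
    static double tf(int freq) {
        return Math.sqrt(freq);
    }

    /** Inverse document frequency: rare terms weigh more. */
    static double idf(int docFreq, int numDocs) {
        return Math.log((double) numDocs / (docFreq + 1)) + 1.0;
    }

    /** Length normalisation: a match in a short field counts for more. */
    static double lengthNorm(int numTerms) {
        return 1.0 / Math.sqrt(numTerms);
    }

    /** One term's contribution: tf * idf^2 * lengthNorm (boosts, coord and queryNorm omitted). */
    static double termScore(int freq, int docFreq, int numDocs, int numTerms) {
        double idf = idf(docFreq, numDocs);
        return tf(freq) * idf * idf * lengthNorm(numTerms);
    }

    public static void main(String[] args) {
        // Term occurs 4 times in a 100-term field; 10 of 1000 documents contain it.
        System.out.println(termScore(4, 10, 1000, 100));
    }
}
```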

12 org.apache.lucene.queryParser.QueryParser. String → Query (hooray!). Notation: ::= definition, () nesting, * repetition, [] optional, | or. Grammar: Query ::= ( Clause )* and Clause ::= ["+"|"-"] [<TERM> ":"] ( <TERM> | "(" Query ")" ), where "+" means AND, "-" means NOT, the optional <TERM> ":" names the field, "(" Query ")" is a nested query and <TERM> is a single term or phrase. Examples: aaa bbb ccc; +aaa bbb -ccc; "aaa bbb"; title:aaa; title:(+aaa bbb) AND author:"ddd e f"; aaa* bb*b cc?c; year:[2000 TO 2005] (inclusive); price:{020 TO 100} (not inclusive); aaa^3 bbb (boost); "aaa bbb"^0.5; 1\+1 (\ is the escape character); aaa~0.8 (fuzzy, minimum similarity); "aaa bbb"~10 (proximity/slop). Caution: ranges compare Strings, and as text "20" does not sort before "100", hence the padded 020. Lucene: Strings only; SOLR: strongly typed fields! NOT allowed: "aaa* bbb". NOT allowed: *aaa, ?aaa. The query text also still passes through the Analyzer.
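The range example price:{020 TO 100} uses a padded number because plain Lucene compares terms as Strings. A small stdlib-only sketch of the effect follows; the helper name pad and the fixed width are assumptions for illustration, not part of Lucene.

```java
import java.util.Locale;

/** Sketch: zero-padding makes String order agree with numeric order for non-negative ints. */
final class PaddedTerms {
    private PaddedTerms() {}

    /** Left-pads a non-negative number with zeros to a fixed width, e.g. 20 becomes "020". */
    static String pad(int value, int width) {
        if (value < 0 || width < 1) {
            throw new IllegalArgumentException("value must be >= 0 and width >= 1");
        }
        return String.format(Locale.ROOT, "%0" + width + "d", value);
    }

    public static void main(String[] args) {
        // Unpadded: "20" sorts after "100" because '2' > '1'.
        System.out.println("20".compareTo("100") > 0);
        // Padded to the same width: "020" sorts before "100", matching 20 < 100.
        System.out.println(pad(20, 3).compareTo(pad(100, 3)) < 0);
    }
}
```

The same padding has to be applied when indexing and when building the range query; otherwise the two sides compare different strings.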

13 org.apache.lucene.queryParser.QueryParser. Not every Query can be built by QueryParser (too complex, or to protect performance): "New Yor*"; *ork; "New York" within 10 words of "Broadway" and at most 5 words after the start of the field. Not every Query wants to be built by QueryParser. Do some interface design, e.g.: free-text input (run through QueryParser); separate interface elements for fields, ranges, facets, more like this, …

14 org.apache.lucene.queryParser.QueryParser. QueryParser *: String defaultField, Analyzer. Methods: parse (String query → Query), setDefaultOperator (QueryParser.Operator: AND_OPERATOR, OR_OPERATOR), setPhraseSlop (int), setFuzzyMinSim (float), … Analyzer variants: DutchAnalyzer * (File stopwords, String[] stopwords or HashSet stopwords), BrazilianAnalyzer, StandardAnalyzer, RussianAnalyzer, …

15 org.apache.lucene.search.Hits. Hits (returned by Searcher search): doc, score, iterator, length; types shown: int n, Document, float score, HitIterator, int length. HitIterator: next, hasNext, length; types shown: Hit, boolean hasNext, int length. Hit: getDocument, getScore. Document: get, getFields, …; types shown: String fieldName, String value, List fields. Field: name, getValue, … N.B. use HitCollector (low-level API) for large numbers of hits.

16 org.apache.lucene.search.highlight.Highlighter. Highlighter *: Formatter, Scorer (fragmentScorer). Methods: setTextFragmenter (Fragmenter), getBestFragments (Analyzer, String fieldName, String text, int maxNumFragments → String[] bestFragments), … QueryScorer *: Query, IndexReader, String fieldName. Formatter variants: SimpleHTMLFormatter * (String preTag, String postTag), GradientFormatter, SpanGradientFormatter * (Float maxScore, String minForegroundcolor, String maxForegroundcolor, String minBackgroundcolor, String maxBackgroundcolor). Fragmenter variant: SimpleFragmenter * (int fragmentSize).

17 org.apache.lucene.search.spell.SpellChecker. SpellChecker *: Directory (spellIndex). Methods: indexDictionary (Dictionary), suggestSimilar (String word, int numSug, IndexReader, String field, boolean morePopular → String[] words), setAccuracy (float minScore), … Dictionary variants: LuceneDictionary * (IndexReader, String field), PlainTextDictionary * (File, InputStream or Reader). Backed by an N-gram index.
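The slide only names the N-gram index. As a rough illustration of the underlying idea, the sketch below splits a word into overlapping character n-grams; a misspelled word shares many of these with its correction, which is what makes lookup by n-gram useful. This is plain Java with hypothetical names and does not reproduce how SpellChecker actually builds or queries its index.

```java
import java.util.ArrayList;
import java.util.List;

/** Sketch: overlapping character n-grams of a word, the raw material of an n-gram index. */
final class NGrams {
    private NGrams() {}

    /** Returns all contiguous substrings of length n, in order; empty if the word is shorter than n. */
    static List<String> ngrams(String word, int n) {
        if (n < 1) {
            throw new IllegalArgumentException("n must be >= 1");
        }
        List<String> grams = new ArrayList<>();
        for (int i = 0; i + n <= word.length(); i++) {
            grams.add(word.substring(i, i + n));
        }
        return grams;
    }

    public static void main(String[] args) {
        // "lucene" and the typo "lucne" share the trigram "luc".
        System.out.println(ngrams("lucene", 3));
        System.out.println(ngrams("lucne", 3));
    }
}
```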

