Querying for Metadata 13th November 2013 Andy Hind, Alfresco.


1 Querying for Metadata 13th November 2013 Andy Hind, Alfresco

2 What’s in the box?

3 Where is the box? Context matters: stuff may be added to your query.

4 The bottom of the box … (Alfresco 4.1 / SOLR)
Layers: Java Search API; search sub-system; query languages (AFTS / CMIS QL); abstract form; index engine (SOLR / Lucene / nothing; Lucene / XPath / …).
IN: query + context. OUT: the list of nodes you can see.
The abstract form can be used for other query engines.
Concepts: logical operations, exact match, term match, order, type.
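A minimal Java sketch of the contract on this slide: a query in abstract form plus context (here reduced to the user name) goes in, and the list of nodes that user can see comes out. All names (`SearchSubsystemSketch`, `AbstractQuery`, `QueryEngine`, `InMemoryEngine`) are hypothetical and are not Alfresco's Java Search API; the abstract form is cut down to a type, an AND of exact matches and one sort property.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.stream.Collectors;

// Hypothetical sketch: IN = abstract query + context, OUT = nodes the user can see.
// Not Alfresco's API; the types exist only to illustrate the layering.
class SearchSubsystemSketch {

    // A stored node: id, type, flat string properties, and the users allowed to read it.
    static final class Node {
        final String id, type;
        final Map<String, String> props;
        final Set<String> readers;

        Node(String id, String type, Map<String, String> props, Set<String> readers) {
            this.id = id;
            this.type = type;
            this.props = props;
            this.readers = readers;
        }
    }

    // Abstract form built from AFTS or CMIS QL: a type, an AND of exact matches, an order.
    static final class AbstractQuery {
        final String type;
        final Map<String, String> exactMatches;
        final String orderBy;

        AbstractQuery(String type, Map<String, String> exactMatches, String orderBy) {
            this.type = type;
            this.exactMatches = exactMatches;
            this.orderBy = orderBy;
        }
    }

    // Any engine (index, DB, ...) that can evaluate the abstract form.
    interface QueryEngine {
        List<String> execute(AbstractQuery query, String user);
    }

    // Toy engine over an in-memory list, standing in for SOLR/Lucene or a DB.
    static final class InMemoryEngine implements QueryEngine {
        private final List<Node> nodes;

        InMemoryEngine(List<Node> nodes) {
            this.nodes = nodes;
        }

        @Override
        public List<String> execute(AbstractQuery q, String user) {
            return nodes.stream()
                    .filter(n -> n.type.equals(q.type))
                    // AND of exact matches: every constraint must hold.
                    .filter(n -> q.exactMatches.entrySet().stream()
                            .allMatch(e -> e.getValue().equals(n.props.get(e.getKey()))))
                    // Context: only nodes this user can read are returned.
                    .filter(n -> n.readers.contains(user))
                    .sorted(Comparator.comparing((Node n) -> n.props.getOrDefault(q.orderBy, "")))
                    .map(n -> n.id)
                    .collect(Collectors.toList());
        }
    }
}
```

Because callers only see `QueryEngine`, a second engine over a different store could sit behind the same abstract form without changing the query languages above it.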

5 The new tool in the box … (Alfresco 4.2)
Layers: Java Search API; search sub-system; query languages (AFTS / CMIS QL); abstract form; then the new DB engine (DB) alongside the index engine (SOLR / Lucene / nothing; Lucene / XPath / …).

6 A choice …
Layers: Java Search API; search sub-system; query languages (AFTS / CMIS QL); abstract form; then either (1) the DB engine (DB) or (2) the index engine (SOLR / Lucene / nothing; Lucene / XPath / …).
Also: (1) DB-specific QL, (2) index-specific QL.
Switching between the two.

7 DB vs SOLR: consistency
DB: transactional, immediate consistency.
Compared: Lucene on one node, Lucene in a cluster, a special SOLR index, the DB engine.
The DB engine could replace canned queries, and it uses the existing schema.

8 CMIS QL
Clauses: SELECT, FROM, JOIN, WHERE, ORDER BY.
CMIS QL is the key use case: it is easy to define what is supported.

9 CMIS QL: SOLR vs DB
SOLR only: full text, OR, decimal, boolean, IN_TREE().
DB: the main restrictions are index size and performance. Pain/difficulty.
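A small sketch of the split above: if a CMIS QL query uses any of the SOLR-only features named on this slide, it cannot go to the DB. The `Feature` enum and both methods are hypothetical; a real system would derive the features by parsing the query, and the enum lists only what the slide names plus a few DB-friendly examples.

```java
import java.util.EnumSet;
import java.util.Set;

// Hypothetical check: which CMIS QL features force a query onto SOLR (per slide 9).
class CmisFeatureCheck {

    enum Feature { FULL_TEXT, OR, DECIMAL, BOOLEAN, IN_TREE, EXACT_MATCH, AND, ORDER_BY }

    // Features the slide lists as SOLR only.
    static final Set<Feature> SOLR_ONLY =
            EnumSet.of(Feature.FULL_TEXT, Feature.OR, Feature.DECIMAL, Feature.BOOLEAN, Feature.IN_TREE);

    // The DB engine is an option only if none of the SOLR-only features is used.
    static boolean dbCanAnswer(Set<Feature> used) {
        for (Feature f : used) {
            if (SOLR_ONLY.contains(f)) {
                return false;
            }
        }
        return true;
    }

    // The SOLR-only features in a query: what would have to go for it to run on the DB.
    static Set<Feature> blockers(Set<Feature> used) {
        Set<Feature> result = EnumSet.noneOf(Feature.class);
        for (Feature f : used) {
            if (SOLR_ONLY.contains(f)) {
                result.add(f);
            }
        }
        return result;
    }
}
```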

10 Virtual tables
Supported: cmis:document, cmis:folder, cmis:secondary.
Not supported: cmis:item, cmis:policy, cmis:relationship.

11 Columns/data types
Supported: string <= 1024, integer, id, datetime.
Unsupported: boolean, decimal, uri, html, string > 1024.
DB storage types: boolean, long, float, double, string, BLOB.
Things not in the node properties table: UUID/ID, mimetype, content length.
Properties in general; properties on cmis:document and cmis:folder; mimetype; size.
String length is data dependent.
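The column rules can be written as a lookup. This is a hypothetical helper, not Alfresco code; `maxStringLength` stands for the longest value actually stored in the column, since the slide notes that string length is data dependent.

```java
// Hypothetical lookup of slide 11's rules: can a property of this type be queried on the DB?
class DbColumnSupport {

    enum CmisType { STRING, INTEGER, ID, DATETIME, BOOLEAN, DECIMAL, URI, HTML }

    // Per the slide, only strings up to this length are supported.
    static final int MAX_DB_STRING_LENGTH = 1024;

    static boolean queryableOnDb(CmisType type, int maxStringLength) {
        switch (type) {
            case INTEGER:
            case ID:
            case DATETIME:
                return true;
            case STRING:
                // Data dependent: judged on the longest value actually stored.
                return maxStringLength <= MAX_DB_STRING_LENGTH;
            default:
                // BOOLEAN, DECIMAL, URI, HTML are unsupported.
                return false;
        }
    }
}
```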

12 Logical operators: good (AND), bad (NOT, ANY, NOT ANY), ugly (OR)
AND: generally selective.
NOT: unselective.
ANY: implies multi-valued, so it could match multiple rows.
NOT ANY: will most likely match multiple rows.
OR: excluded; optimisation is difficult because all rows must be considered. SOLR is good at it.
Semi-join: reduces the row count but cannot reuse the join.
Select and order.
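A sketch that walks a WHERE-clause tree and reports the operators this slide warns about. The `Expr` tree and the remark strings are hypothetical; a real CMIS QL parser produces a much richer structure.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical review of a WHERE clause against slide 12's good / bad / ugly list.
class OperatorReview {

    // Minimal tree: an operator ("AND", "OR", "NOT", "ANY", "NOT ANY" or "LEAF") and children.
    static final class Expr {
        final String op;
        final List<Expr> children;

        Expr(String op, Expr... children) {
            this.op = op;
            this.children = List.of(children);
        }
    }

    static Expr leaf() {
        return new Expr("LEAF");
    }

    // One remark per problematic operator, collected depth first.
    static List<String> review(Expr e) {
        List<String> remarks = new ArrayList<>();
        walk(e, remarks);
        return remarks;
    }

    private static void walk(Expr e, List<String> remarks) {
        switch (e.op) {
            case "OR":
                remarks.add("ugly: OR is excluded on the DB (all rows must be considered)");
                break;
            case "NOT":
                remarks.add("bad: NOT is unselective");
                break;
            case "ANY":
                remarks.add("bad: ANY implies multi-valued and could match multiple rows");
                break;
            case "NOT ANY":
                remarks.add("bad: NOT ANY will most likely match multiple rows");
                break;
            default:
                // AND (generally selective) and plain predicates raise no remark.
                break;
        }
        for (Expr child : e.children) {
            walk(child, remarks);
        }
    }
}
```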

13 Predicates
Comparison: = <> < <= > >= and ANY, for all types they apply to, as in the spec.

14 Predicates
Supported: ANY … (NOT) IN, IN_FOLDER(), IS (NOT) NULL, LIKE, (NOT) IN.
Not on the DB: IN_TREE(), SCORE(), CONTAINS().
Clarity/caution: NOT, IS NULL, LIKE with leading wildcards.
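CMIS QL `LIKE` uses `%` for zero or more characters and `_` for exactly one, with `\` as the escape character. The sketch below translates such a pattern into a `java.util.regex.Pattern` and flags leading wildcards, which typically stop a database from using an ordinary index on the column as a prefix range. `LikePatterns` and its methods are hypothetical helpers.

```java
import java.util.regex.Pattern;

// Hypothetical helpers for CMIS QL LIKE: '%' = zero or more chars, '_' = one char, '\' escapes.
class LikePatterns {

    // A leading '%' or '_' means no fixed prefix, so an ordinary index cannot narrow the scan.
    static boolean hasLeadingWildcard(String like) {
        return !like.isEmpty() && (like.charAt(0) == '%' || like.charAt(0) == '_');
    }

    // Translate a LIKE pattern into an equivalent regex that must match the whole value.
    static Pattern toRegex(String like) {
        StringBuilder regex = new StringBuilder();
        StringBuilder literal = new StringBuilder();
        for (int i = 0; i < like.length(); i++) {
            char c = like.charAt(i);
            if (c == '\\' && i + 1 < like.length()) {
                literal.append(like.charAt(++i));      // escaped character is taken literally
            } else if (c == '%' || c == '_') {
                regex.append(Pattern.quote(literal.toString()));
                literal.setLength(0);
                regex.append(c == '%' ? ".*" : ".");
            } else {
                literal.append(c);
            }
        }
        regex.append(Pattern.quote(literal.toString()));
        return Pattern.compile(regex.toString(), Pattern.DOTALL);
    }

    static boolean matches(String value, String like) {
        return toRegex(like).matcher(value).matches();
    }
}
```

For example, the `'%e%'` pattern in the next slide's query is flagged by `hasLeadingWildcard`, while a pattern such as `'abc%'` is not.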

15 Ordering
Ordering varies by DB. Beware large result sets.
Don't order by IDs on the DB at the moment.

16 An example …
select * from cmis:document
where cmis:name like '%e%'
  and cmis:createdBy in ('System', 'admin')
  and cmis:creationDate < TIMESTAMP ' T00:00:00.000Z'
  and cmis:lastModifiedBy not in ('me')
  and cmis:lastModificationDate > TIMESTAMP ' T00:00:00.000Z'
  and cmis:contentStreamLength > 2
  and cmis:contentStreamFileName LIKE '_%'
order by cmis:contentStreamLength DESC, cmis:creationDate ASC, cmis:name DESC
Virtual folders: be selective (a specific type, PARENT, JOINs to aspects, =).

17 CMIS QL: SOLR vs DB, now we understand this a bit better
SOLR (not DB): full text, OR, decimal, boolean, IN_TREE().

18 The two are not the same …
If SOLR and the DB can both run the same query, the answer may still differ.
Default order: DB order, index order, score order. Not a big effect, but short matches score better than long ones.

19 Permissions
In query (SOLR) vs after query (DB). [HIDDEN]
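To see why evaluating permissions after the query hurts with large result sets, here is a hypothetical sketch (it does not describe Alfresco's actual implementation): the engine returns candidates unfiltered and each one is checked until a page of visible results is full, so a user who can read little forces many checks.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical post-query permission filter: check candidates until the page is full.
class PostQueryPermissions {

    static final class Page {
        final List<String> ids;
        final int rowsChecked;

        Page(List<String> ids, int rowsChecked) {
            this.ids = ids;
            this.rowsChecked = rowsChecked;
        }
    }

    static Page firstPage(List<String> candidates, Predicate<String> canRead, int pageSize) {
        List<String> visible = new ArrayList<>();
        int checked = 0;
        for (String id : candidates) {
            if (visible.size() == pageSize) {
                break;                              // page is full: stop checking
            }
            checked++;
            if (canRead.test(id)) {
                visible.add(id);
            }
        }
        return new Page(visible, checked);
    }
}
```

Evaluating permissions inside the query instead lets the engine skip unreadable rows before they are transferred or ordered.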

20 Localisation
Localised order; case sensitivity; d:mltext.
DB: case sensitivity depends on the DB collation; d:mltext ignores locale.
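As an analogy for collation effects, Java's `java.text.Collator` can stand in for a DB collation: the same names come out in a different order under plain binary comparison than under a case-insensitive (`PRIMARY` strength) locale collation. This is an illustration only; the DB engine depends on the database's own collation, not on this class, and `CollationOrder` is a hypothetical helper.

```java
import java.text.Collator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Analogy: identical values, different orders, depending on the collation in use.
class CollationOrder {

    // Natural String order compares UTF-16 code units: every upper-case letter sorts first.
    static List<String> binaryOrder(List<String> values) {
        List<String> copy = new ArrayList<>(values);
        copy.sort(null);
        return copy;
    }

    // Locale-aware order; PRIMARY strength ignores case (and accent) differences.
    static List<String> localeOrder(List<String> values, Locale locale, int strength) {
        Collator collator = Collator.getInstance(locale);
        collator.setStrength(strength);
        List<String> copy = new ArrayList<>(values);
        copy.sort(collator);
        return copy;
    }
}
```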

21 Why I get up in the morning …
Impatient, occasional, technical, new to ECM, too busy for the training.
Fire: Google Docs broken out into Alfresco.
Semantic search in the future.
Generic query: Content: scoring AND Created:2012

22 Alfresco FTS + DB?
=name, TYPE, ASPECT, PARENT, AND; PATH, OR, implicit OR.
Go through each, then:
UI queries will not go to the DB.
Context adds a PATH constraint.
Beware the implicit OR, even if you put + in front of everything.
AFTS: use = for exact match.
IN: term, phrase, prefix.

23 Is it for you?
SOLR: OR, FTS, eventual consistency.
DB …: restricted, but consistent now.

24 system.metadata-query-indexes.ignored
Upgrade: true means no metadata query (MDQ) indexes; false runs the optional patch (DB +25%) and gives MDQ.
New install: the property is ignored.
Repeat to emphasise.
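The decision table on this slide, sketched in Java. The property name comes from the slide and would normally be set in alfresco-global.properties; the class, the method, and the fallback used when the property is absent are assumptions of this sketch.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

// Hypothetical helper for slide 24: will the metadata query (MDQ) indexes be present?
class MdqIndexDecision {

    static final String KEY = "system.metadata-query-indexes.ignored";

    static boolean mdqIndexesPresent(boolean newInstall, String globalProperties) {
        if (newInstall) {
            return true;                            // new install: the property is ignored
        }
        Properties props = new Properties();
        try {
            props.load(new StringReader(globalProperties));
        } catch (IOException e) {
            throw new UncheckedIOException(e);      // not expected from a StringReader
        }
        // Upgrade: true = no MDQ indexes; false = optional patch applied (DB about +25%).
        // Treating a missing property as "true" is an assumption of this sketch.
        return !Boolean.parseBoolean(props.getProperty(KEY, "true"));
    }
}
```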

25 Upgrade 4.0.2 -> 4.2
10M, InnoDB: the upgrade takes 1 hour, the patch 10 minutes.
Extra indexes: +25% (21G – 25 ).
BM: the performance impact of the extra indexes is minimal, maybe a few %.

26 Configuration
solr.query.fts.queryConsistency
solr.query.cmis.queryConsistency
Values: EVENTUAL, TRANSACTIONAL, TRANSACTIONAL_IF_POSSIBLE.
Also available through the Java API.
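A sketch of how the three values might route a query, given whether the DB engine supports it. The values come from the slide (for example `solr.query.cmis.queryConsistency=TRANSACTIONAL_IF_POSSIBLE`); the routing method, the `dbCapable` flag, and the choice to reject a TRANSACTIONAL query that the DB cannot run are assumptions for illustration.

```java
// Hypothetical routing for the queryConsistency values; not Alfresco's implementation.
class ConsistencyRouting {

    enum QueryConsistency { EVENTUAL, TRANSACTIONAL, TRANSACTIONAL_IF_POSSIBLE }

    enum Engine { SOLR, DB }

    // dbCapable: the DB engine supports this query (no full text, OR, IN_TREE(), ...).
    static Engine route(QueryConsistency consistency, boolean dbCapable) {
        switch (consistency) {
            case EVENTUAL:
                return Engine.SOLR;                       // index: eventually consistent
            case TRANSACTIONAL:
                if (!dbCapable) {
                    // Assumed behaviour: refuse rather than silently fall back to SOLR.
                    throw new IllegalArgumentException("query cannot run transactionally on the DB");
                }
                return Engine.DB;                         // DB: immediate consistency
            default:
                // TRANSACTIONAL_IF_POSSIBLE: DB when it can, otherwise SOLR.
                return dbCapable ? Engine.DB : Engine.SOLR;
        }
    }
}
```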


28 Is this a box of worms?
Performance: large result sets (~100k), left outer join, ORDER BY. DB ≈ SOLR.
Transfer, permissions, ordering text; disk vs RAM disk.
Subtle differences.

29 Do Share queries use the DB?
No: the context adds +PATH, and there is the implicit OR.
Unadulterated search: node browser, Java API, Chemistry/OpenCMIS Workbench.

30 The mystery box …
UPPER(), LOWER(), name:woof, FTS syntax, db-cmis, admin console.


32 Summary
SOLR: full text, OR, float/double, boolean, string > 1024.
DB: structure.

33 Future
IN_TREE(), SOLR 4+, permission evaluation, performance, more DBs, simple OR, date math, schema, hybrid, structure in SOLR.
I am not making any promises.


