CQL – a Common Query Language
Mike Taylor

1 CQL – a Common Query Language
Mike Taylor
1. What CQL is
2. Motivation
3. Examples and explanation
4. Applications
5. Implementation

2 Chapter 1: What CQL is
CQL is a query language:
– For humans to type
– For query forms to generate
– For translating other languages into
The only query language of SRW/SRU
Also applicable in other contexts:
– Z39.50 (instead of the Type-1 Query)
– Query boxes for web searches

3 Chapter 2: Motivation
Most query languages fall into one of two camps:
– Complex and powerful, but cryptic and hard to learn: SQL, Prefix Query Format (PQF), XML Query
– Easy to learn and use, but lacking in power: Google, AltaVista, CCL
CQL aims to make simple queries easy, and complex queries possible (to paraphrase Larry Wall, of Perl)

4 [Figure: Learning curves for query languages. Axes: power of query that can be expressed, effort in learning query language. Curve shown: SQL.]

5 [Figure: the same plot with curves for SQL and Google.]

6 [Figure: the same plot with curves for SQL, Google and CQL.]

7 Chapter 3: Examples and explanation
Important concepts: Simple terms, Quoting, Booleans, Parentheses, Pattern matching, Word anchoring, Indexes, Prefixes, Context sets, Relations
Esoteric concepts: Proximity, Relation modifiers, Boolean modifiers, Prefix mapping

8 CQL features: simple terms
Here are some perfectly good CQL queries:
fish
Churchill
dinosaur
comp.sources.misc

9 CQL features: quoting
Double-quote marks remove the special meanings of special characters like space (which otherwise separates tokens) and of keywords such as and and or.
"dinosaur"
"the complete dinosaur"
"ext->u.generic"
"and"
"the \"nuxi\" problem"
(Backslash removes the special meaning of following double-quote characters.)
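The quoting rule above can be sketched as a small tokenizer. This is a minimal illustration, not taken from any of the CQL implementations mentioned in this talk: the function name `tokenize` and the `(kind, text)` tuples are hypothetical, parentheses and relations are not handled, and a backslash inside quotes is simply dropped while the character after it is kept.

```python
def tokenize(query):
    """Split a CQL-like query into (kind, text) tokens.

    Hypothetical sketch: handles whitespace, double quotes and
    backslash escapes only, not parentheses or relations.
    """
    tokens = []
    i, n = 0, len(query)
    while i < n:
        ch = query[i]
        if ch.isspace():
            i += 1
        elif ch == '"':
            # Quoted term: read up to the next unescaped double quote.
            i += 1
            buf = []
            while i < n and query[i] != '"':
                if query[i] == "\\" and i + 1 < n:
                    buf.append(query[i + 1])  # keep the escaped character
                    i += 2
                else:
                    buf.append(query[i])
                    i += 1
            i += 1  # skip the closing quote
            # Anything inside quotes is a term, even "and" or "or".
            tokens.append(("term", "".join(buf)))
        else:
            # Bare word: runs to the next whitespace.
            start = i
            while i < n and not query[i].isspace():
                i += 1
            word = query[start:i]
            kind = "keyword" if word.lower() in ("and", "or", "not") else "term"
            tokens.append((kind, word))
    return tokens
```

For example, `tokenize('dinosaur and "and"')` treats the bare `and` as a keyword and the quoted `"and"` as an ordinary search term.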

10 CQL features: booleans
The keywords and and or are boolean operators. The keyword not is an and-not binary operator. There is no unary negation operator. Case is not significant, so AND and aNd also work.
dinosaur or bird
dinosaur not reptile
dinosaur and bird and reptile
dinosaur and bird or dinobird
dinosaur not theropod not ornithischian

11 CQL features: boolean precedence
The and, or and not booleans all have equal precedence and are evaluated left-to-right.
dinosaur and bird or dinobird MEANS (dinosaur and bird) or dinobird
dinosaur or bird and dinobird MEANS (dinosaur or bird) and dinobird, NOT dinosaur or (bird and dinobird)
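The equal-precedence, left-to-right rule can be made concrete with a short sketch that adds the implied parentheses to a flat query. The helper name `group_left_to_right` is hypothetical, and the sketch assumes single-word terms with no quoting or parentheses in the input.

```python
BOOLEANS = {"and", "or", "not"}

def group_left_to_right(query):
    """Return a fully parenthesised form of a flat boolean query.

    Hypothetical sketch: assumes the input alternates single-word
    terms and booleans, e.g. "a and b or c".
    """
    words = query.split()
    result = words[0]
    # Walk the (boolean, term) pairs in order; each one wraps everything
    # seen so far, which is what left-to-right evaluation means.
    for i in range(1, len(words) - 1, 2):
        op, term = words[i], words[i + 1]
        if op.lower() not in BOOLEANS:
            raise ValueError(f"expected a boolean, got {op!r}")
        result = f"({result} {op} {term})"
    return result
```

Calling `group_left_to_right("dinosaur or bird and dinobird")` groups `dinosaur or bird` first, matching the reading given on the slide.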

12 CQL features: parentheses
Parentheses may be used to override the default left-to-right parsing of boolean operators.
dinosaur and (bird or dinobird)
dinosaur or (bird and dinobird)
(bird or dinosaur) and (feathers or scales)
"feathered dinosaur" and (yixian or jehol)
(((a and b) or (c not d) not (e or f and g)) and h not i) or j

13 CQL features: pattern matching
There are two pattern-matching characters:
* matches any number of characters
? matches any single character
A preceding backslash removes their special meaning.
dinosaur* – matches dinosaurs, dinosauria
*sauria – matches dinosauria, carnosauria
man?raptor – matches maniraptor, manuraptor
man?raptor* – matches the plurals of these
"the comp*saur" – matches the complete dinosaur
char\* – matches literal char*
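One way to picture the two wildcards is as a translation into a regular expression. The sketch below uses Python's standard `re` module; the function names `cql_pattern_to_regex` and `matches` are hypothetical, and matching the pattern against the whole value (rather than word by word) is a simplifying assumption.

```python
import re

def cql_pattern_to_regex(pattern):
    """Translate the * and ? wildcards into a compiled regular expression."""
    parts = []
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        if ch == "\\" and i + 1 < len(pattern):
            # A backslash makes the next character literal.
            parts.append(re.escape(pattern[i + 1]))
            i += 2
            continue
        if ch == "*":
            parts.append(".*")   # any number of characters
        elif ch == "?":
            parts.append(".")    # exactly one character
        else:
            parts.append(re.escape(ch))
        i += 1
    return re.compile("".join(parts))

def matches(pattern, text):
    """Assumption: the pattern must account for the whole of text."""
    return cql_pattern_to_regex(pattern).fullmatch(text) is not None
```

With this sketch, `matches("man?raptor", "maniraptor")` holds, while `matches("man?raptor", "manraptor")` does not, because `?` requires exactly one character.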

14 CQL features: word anchoring
A word beginning with ^ must occur at the start of its field. A word ending with ^ must occur at the end of its field.
dinosaur – matches the complete dinosaur
dinosaur^ – also matches
^dinosaur – does not match
the – matches the complete dinosaur
^the – also matches
the^ – does not match
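The anchoring examples above can be checked with a few lines of code. This is a sketch under stated assumptions: the function name `anchored_match` is hypothetical, the field is split into words on whitespace, and comparison is lower-cased for convenience (the slides do not say how case is handled for terms).

```python
def anchored_match(term, field):
    """Check a single-word term, with optional ^ anchors, against a field."""
    words = field.lower().split()
    at_start = term.startswith("^")   # ^word: must be the first word
    at_end = term.endswith("^")       # word^: must be the last word
    word = term.strip("^").lower()
    if word not in words:
        return False
    if at_start and words[0] != word:
        return False
    if at_end and words[-1] != word:
        return False
    return True
```

Against the field `the complete dinosaur`, the term `dinosaur^` matches and `^dinosaur` does not, as on the slide.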

15 CQL features: indexes
A term of the form name=value is a query for the specified value occurring within the named index.
title=Churchill – finds biographies of Churchill
author=Churchill – finds books written by him
title=dinosaur and author=farlow
title=(dinosaur and bird)
subject=(dinosaur* or pterosaur*)
Index names are case-insensitive, so title is the same index as TITLE, Title or tiTLe.

16 CQL features: prefixes
The meaning of an index can be specified more fully by a prefix indicating what context set it is from. The meaning of title is different in cross-domain searching (Dublin Core), bibliographic searching (Bath Profile) and heraldry.
dc.title="the complete dinosaur"
property.title=freehold
heraldry.title=(viscount or duke)
cql.serverChoice=fruit
cql.resultSet=YXJjaGJpc2hvcAp
Prefixes are case-insensitive.

17 CQL features: context sets
A context set is a set of indexes that are related to a particular area (plus some other more esoteric stuff that you can ignore).
For example, the Dublin Core context set contains indexes for searching against the fifteen DC elements: title, creator, subject, description, publisher, contributor, date, type, format, identifier, source, language, relation, coverage, rights.
The context set prose must define their semantics.

18 CQL features: some context sets
A few core sets created by the SRW editorial board:
CQL – for core indexes such as resultSet
DC – for metadata searching with Dublin Core
Rec – metadata about the record, not the resource
Net – network concepts such as hostname and port
Also, many application-specific sets:
Bath, Zthes, CCG, Music
Rel – deep voodoo for relevance matching
GILS is in development
Where do context sets come from? You can just make them up! No-one can stop you!

19 A digression on the CQL context set
The CQL context set is special. It contains some magic indexes:
cql.anywhere – searches in all the indexes available
cql.serverChoice – allows the server to choose whatever index or indexes are suitable
cql.resultSetId – finds the records obtained in a previous search, e.g. for refinement by combining with other query terms.

20 CQL features: relations
Usually = connects an index with its term, but all the other obvious numeric relations are supported:
Height = 13
numberOfWheels <= 3
numberOfPlates = 18
lengthOfFemur > 2.4
BioMass >= 100
NumberOfToes <> 3 (inequality)
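As a rough illustration of how a server might evaluate these relations, the sketch below maps each relation symbol onto a function from Python's standard `operator` module. The `satisfies` helper and the dictionary-shaped record are assumptions made for the example, not part of CQL itself.

```python
import operator

# The relation symbols listed above, mapped to comparison functions.
RELATIONS = {
    "=": operator.eq,
    "<>": operator.ne,
    "<": operator.lt,
    "<=": operator.le,
    ">": operator.gt,
    ">=": operator.ge,
}

def satisfies(record, index, relation, term):
    """Return True if record[index] stands in the given relation to term.

    Hypothetical sketch: a record is a plain dict of index name to value.
    """
    # Index names are case-insensitive, so compare them in lower case.
    fields = {name.lower(): value for name, value in record.items()}
    return RELATIONS[relation](fields[index.lower()], term)
```

For a record such as `{"numberOfWheels": 3}`, the clause `numberOfWheels <= 3` corresponds to `satisfies(record, "numberOfWheels", "<=", 3)`.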

21 CQL features: special relations
The keywords any and all can be used as relations, indicating that any one of, or all of, the words specified in the term must be found in the index:
author all "kernighan ritchie" – shorthand for author=kernighan and author=ritchie
author any "kernighan ritchie thompson" – shorthand for author=kernighan or author=ritchie or author=thompson
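The shorthand described above is mechanical enough to sketch in a few lines. The function name `expand_relation` is hypothetical, and the term is assumed to be a plain space-separated list of words.

```python
def expand_relation(index, relation, term):
    """Expand an any/all clause into the equivalent boolean query."""
    # "all" joins the per-word clauses with and, "any" joins them with or.
    joiner = {"all": " and ", "any": " or "}[relation.lower()]
    return joiner.join(f"{index}={word}" for word in term.split())
```

So `expand_relation("author", "all", "kernighan ritchie")` yields `author=kernighan and author=ritchie`.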

22 CQL features: esoterica
"You are not expected to understand this." – comment in the Unix Version 6 source code.
The point is that new users are not required to understand this, and may happily use CQL for many years – perhaps forever – without needing to.

23 CQL esoterica: proximity
The prox boolean, by default, requires its operands to be next to each other, in either order:
cervical prox vertebra – equivalent to "cervical vertebra" or "vertebra cervical"
(cervical or dorsal) prox vertebra – equivalent to "cervical vertebra" or "dorsal vertebra" or "vertebra cervical" or "vertebra dorsal"

24 CQL esoterica: proximity II
Modifiers can generalise the semantics of proximity:
cervical prox/distance<=5/ vertebrae – within five words of each other
cervical prox/distance=0/unit=sentence vertebrae – within the same sentence
cervical prox/distance>0/unit=paragraph vertebrae – in different paragraphs
cervical prox/ordered vertebrae – in the specified order: exactly equivalent to "cervical vertebra"
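A word-level proximity test can be sketched as follows. This is an illustration only: the function name `prox` and its keyword arguments are hypothetical, only the word unit and a `<=` distance are modelled, and "distance" is taken to mean the difference between word positions, so adjacent words are at distance 1.

```python
def prox(text, left, right, max_distance=1, ordered=False):
    """Return True if left and right occur within max_distance words.

    Hypothetical sketch: word unit only, distance<=N only.
    """
    words = text.lower().split()
    left_pos = [i for i, w in enumerate(words) if w == left]
    right_pos = [i for i, w in enumerate(words) if w == right]
    for i in left_pos:
        for j in right_pos:
            gap = j - i
            if ordered and gap < 0:
                continue  # right operand appears before the left one
            if 0 < abs(gap) <= max_distance:
                return True
    return False
```

The defaults mirror the default behaviour described for prox (adjacent, either order); passing `max_distance=5` or `ordered=True` corresponds loosely to the `distance<=5` and `ordered` modifiers.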

25 CQL esoterica: relation modifiers
Modifiers can refine the semantics of relations:
title =/stem dig – finds dig, digging, dug, etc.
title any/relevant "dinosaur bird reptile" – finds sauropods, avian, crocodile, snake, etc.
author =/fuzzy tailor – finds Mike Taylor
phoneNumber exact/fuzzy " " – finds

26 CQL esoterica: relation modifiers II
Relation modifiers can be overloaded to specify extra information about the term that the relation joins to the index:
createdDate >/isoDate " :45:00" – the term is in ISO 8601 format.
Location within/geom.polygon "(12,46) (15,52)" – the term indicates a polygon of two points (i.e. a straight line) rather than the corners of a rectangle.

27 CQL esoterica: boolean modifiers
Modifiers can refine the semantics of boolean operators. We've already seen some examples of this in proximity.
cervical prox/distance<=5/ vertebrae – within five words of each other
cervical or/exclusive vertebrae – one or the other, but not both.
denenberg or/rel.mean "information retrieval"
denenberg or/rel.sum "information retrieval"
denenberg or/rel.max "information retrieval" – average, total or maximum relevance of operands

28 CQL esoterica: prefix mapping
So far, we have been free and easy with index prefixes such as dc. But how do we know what they mean?
Why should dc mean Dublin Core rather than Deep Custard?
dc.custardDepth <= 20
Why should bath mean the Bath Profile for bibliographic searching instead of plumbing supplies?
bath.capacityInGallons > 45

29 CQL esoterica: prefix mapping II
Prefixes are just convenient, easy-to-type abbreviations. The real identifier of a context set is its URI. For example, the Dublin Core context set is info:srw/cql-context-set/1/dc-v1.1 but we map that URI to a prefix for convenience.
This is exactly like XML namespaces: they are identified by URIs, but the URIs do not appear in the names of elements or attributes: short prefixes are used instead.

30 CQL esoterica: prefix mapping III
In XML, a prefix is associated with a namespace using:
In CQL, a prefix is associated with a namespace using: >prefix=http://example.org/xyz/ and the rest of the query follows.
The following queries are exactly equivalent:
>dc=info:srw/cql-context-set/1/dc-v1.1 dc.title=fish
>yx=info:srw/cql-context-set/1/dc-v1.1 yx.title=fish
Most applications will have established default mappings.

31 CQL esoterica: prefix mapping IV
It is possible to establish the context set from which indexes with no explicit prefix are taken by omitting the prefix= part from the mapping:
>http://example.org/heraldry/ title=baron and side=sinister
So the following queries are exactly equivalent:
>info:srw/cql-context-set/1/dc-v1.1 title=fish
>yx=info:srw/cql-context-set/1/dc-v1.1 yx.title=fish
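The claim that these queries are "exactly equivalent" can be illustrated by resolving every index to a (context-set URI, index name) pair. The sketch below is a simplification with a hypothetical function name, `resolve_indexes`: it splits on whitespace, assumes mappings are written without spaces around the = sign, assumes the URI itself contains no = sign, and only understands index=value clauses.

```python
def resolve_indexes(query):
    """Resolve each index=value clause to (context_set_uri, index, value)."""
    tokens = query.split()
    prefixes, default_uri = {}, None
    # Leading tokens that start with ">" are prefix mappings.
    while tokens and tokens[0].startswith(">"):
        mapping = tokens.pop(0)[1:]
        if "=" in mapping:
            prefix, uri = mapping.split("=", 1)
            prefixes[prefix.lower()] = uri   # prefixes are case-insensitive
        else:
            default_uri = mapping            # no prefix= part: default set
    resolved = []
    for token in tokens:
        if "=" not in token:
            continue  # booleans and bare terms carry no index
        name, value = token.split("=", 1)
        if "." in name:
            prefix, index = name.split(".", 1)
            uri = prefixes[prefix.lower()]
        else:
            index, uri = name, default_uri
        resolved.append((uri, index.lower(), value))
    return resolved
```

Under these assumptions, the `dc.title=fish`, `yx.title=fish` and unprefixed `title=fish` forms all resolve to the same triple, which is the sense in which the prefix itself carries no meaning.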

32 CQL esoterica: prefix mapping V
Finally... Finally! :-) Prefix mappings can be stacked up:
>dc = info:srw/cql-context-set/1/dc-v1.1
>bath=http://zing.z3950.org/cql/bath/2.0/
>rec=info:srw/cql-context-set/2/rec-1.0
rec.created < and dc.title=ecology and bath.conferenceName=dinosaur
(Yes, this is all one query.)

33 CQL esoterica: prefix mapping VI
Don't try this at home.

34 Chapter 4: Applications
CQL has been deployed in many kinds of application:
Google-like structureless searching
Simple metadata searching with the Dublin Core
Bath Profile for bibliographic data
Zthes profile for hierarchical thesaurus navigation
CCG for collectable card games
Music – musicalKey, arranger, duration, etc.
GILS (Global Information Locator Service)
... your application goes here!

35 Chapter 5: Implementations
There are good-quality free CQL implementations in several important languages:
Java (Mike Taylor's CQL-Java package)
C/C++ (Adam Dickmeiss in Index Data's YAZ)
Python (Rob Sanderson in Cheshire)
Perl (Ed Summers' CQL::Parser module)
Visual Basic is in development (Thomas Habing)
... your language goes here!

36 Conclusion: What to take home
CQL makes easy queries easy and hard ones possible
You can use it well without learning the hard bits
It is used in SRW/SRU but also applicable elsewhere
It is extensible through context sets
Existing context sets support lots of applications
There are free implementations in several languages
Tutorial on-line at:

37 CQL esoterica: relation modifiers II
Relation modifiers can be used to define essentially new relations. Some hypothetical examples:
location /proj.prerequisite uiDesign – tasks that must be performed before the design of the user interface
location =/geography.sameState "Las Vegas" – places in the same state as Las Vegas

