CQL – a Common Query Language
Mike Taylor

Presentation transcript:

CQL – a Common Query Language
1. What CQL is
2. Motivation
3. Examples and explanation
4. Applications
5. Implementation

Chapter 1: What CQL is
CQL is a query language:
  – For humans to type
  – For query forms to generate
  – For translating other languages into
The only query language of SRW/SRU
Also applicable in other contexts:
  – Z39.50 (instead of the Type-1 Query)
  – Vendor-neutral format for Metasearch

Specifications and implementations
CQL is a specification for expressing queries abstractly.
  – you don't need to know the database schema.
It has to be parsed by a CQL parser.
  – parser produces a form easy to program with.
It has to be executed by some specific database engine.
  – implementations will vary in what they support.

Chapter 2: Motivation
Most query languages fall into one of two camps:
Complex and powerful, but cryptic and hard to learn
  – SQL, Prefix Query Format (PQF), XML Query
Easy to learn and use, but lacking in power
  – Google, AltaVista, CCL
CQL aims to make simple queries easy, and complex queries possible (to paraphrase Larry Wall, of Perl)

Learning curves for query languages
[Figure: learning curves for SQL, Google and CQL. Axes: power of query that can be expressed; effort in learning query language.]

Chapter 3: Examples and explanation
Core concepts
  – Simple terms
  – Quoting
  – Booleans
  – Parentheses
  – Pattern matching
  – Indexes
  – Prefixes
  – Context sets
  – Relations
Esoteric concepts (Next session!)
  – Word anchoring
  – Proximity
  – More on relations
  – Relation modifiers
  – Boolean modifiers
  – Profiles
  – Prefix mapping

CQL features: simple terms
Here are some perfectly good CQL queries:
    fish
    Churchill
    dinosaur
    comp.sources.misc

CQL features: quoting
Double-quote marks remove the special meanings of special characters like space (which otherwise separates tokens) and of keywords such as and and or.
    "dinosaur"
    "the complete dinosaur"
    "ext->u.generic"
    "and"
(Backslash removes the special meaning of following double-quote characters.)
    "the \"nuxi\" problem"
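
To make the quoting rules concrete, here is a minimal tokenizer sketch in Python. It is not taken from any real CQL implementation: the function name `tokenize` and the `("word", ...)` / `("quoted", ...)` token shapes are invented for illustration, and the sketch assumes that only a backslash before a double quote needs special handling at this stage.

```python
def tokenize(query):
    """Split a query into ("word", text) and ("quoted", text) tokens."""
    tokens = []
    i, n = 0, len(query)
    while i < n:
        ch = query[i]
        if ch.isspace():
            i += 1  # spaces separate tokens outside quotes
        elif ch == '"':
            i += 1  # skip the opening quote
            buf = []
            while i < n and query[i] != '"':
                if query[i] == "\\" and i + 1 < n and query[i + 1] == '"':
                    buf.append('"')  # \" stands for a literal double quote
                    i += 2
                else:
                    buf.append(query[i])
                    i += 1
            if i >= n:
                raise ValueError("unterminated quoted term")
            i += 1  # skip the closing quote
            tokens.append(("quoted", "".join(buf)))
        else:
            start = i
            while i < n and not query[i].isspace() and query[i] != '"':
                i += 1
            tokens.append(("word", query[start:i]))
    return tokens


print(tokenize('dinosaur and "and"'))
print(tokenize('"the \\"nuxi\\" problem"'))
```

Because quoted tokens are tagged separately, a later parsing stage can treat "and" as a search term while still treating a bare and as a boolean.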

CQL features: booleans
The keywords and and or are boolean operators. The keyword not is an and-not binary operator. There is no unary negation operator. Case is not significant, so AND and aNd also work.
    dinosaur or bird
    dinosaur not reptile
    dinosaur and bird and reptile
    dinosaur and bird or dinobird
    dinosaur not theropod not ornithischian
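
One way to picture the three booleans is as operations on sets of matching record identifiers. The sketch below assumes each sub-query has already been resolved to a Python set of record IDs; the function name `combine` and the sample IDs are hypothetical.

```python
def combine(op, left, right):
    """Combine two sets of record IDs with a CQL boolean keyword."""
    op = op.lower()  # case is not significant: AND, and, aNd are the same
    if op == "and":
        return left & right  # records in both sets
    if op == "or":
        return left | right  # records in either set
    if op == "not":
        return left - right  # and-not: in the left set but not the right
    raise ValueError("unknown boolean: " + op)


dinosaur = {1, 2, 3}
bird = {2, 3, 4}
reptile = {1, 2}

print(combine("or", dinosaur, bird))      # {1, 2, 3, 4}
print(combine("not", dinosaur, reptile))  # {3}
```

Since not always needs a left-hand set to subtract from, this also shows why there is no unary negation.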

CQL features: boolean precedence
The and, or and not booleans all have equal precedence and are evaluated left-to-right.
    dinosaur and bird or dinobird
        MEANS (dinosaur and bird) or dinobird
    dinosaur or bird and dinobird
        MEANS (dinosaur or bird) and dinobird
        NOT dinosaur or (bird and dinobird)

CQL features: parentheses
Parentheses may be used to override the default left-to-right parsing of boolean operators.
    dinosaur and (bird or dinobird)
    dinosaur or (bird and dinobird)
    (bird or dinosaur) and (feathers or scales)
    "feathered dinosaur" and (yixian or jehol)
    (((a and b) or (c not d) not (e or f and g)) and h not i) or j
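
The equal-precedence, left-to-right rule and its override by parentheses can be sketched as a small recursive parser. This is an illustration only, not the algorithm of any of the real CQL parsers: the function `parse` and its nested-tuple output are assumptions, and the sketch handles only bare terms, simply quoted terms (no escaped quotes), booleans and parentheses.

```python
import re

BOOLEANS = {"and", "or", "not"}


def parse(query):
    """Parse a query into nested (operator, left, right) tuples."""
    tokens = re.findall(r'\(|\)|"[^"]*"|[^\s()"]+', query)
    pos = 0

    def parse_operand():
        nonlocal pos
        if pos >= len(tokens):
            raise ValueError("unexpected end of query")
        tok = tokens[pos]
        pos += 1
        if tok == "(":
            node = parse_expr()  # a parenthesised group is parsed as a unit
            if pos >= len(tokens) or tokens[pos] != ")":
                raise ValueError("missing closing parenthesis")
            pos += 1
            return node
        if tok == ")":
            raise ValueError("unexpected closing parenthesis")
        return tok

    def parse_expr():
        nonlocal pos
        left = parse_operand()
        # Equal precedence: each boolean is folded into the tree as soon
        # as it is met, which yields left-to-right grouping.
        while pos < len(tokens) and tokens[pos].lower() in BOOLEANS:
            op = tokens[pos].lower()
            pos += 1
            right = parse_operand()
            left = (op, left, right)
        return left

    tree = parse_expr()
    if pos != len(tokens):
        raise ValueError("unexpected token: " + tokens[pos])
    return tree


print(parse("dinosaur or bird and dinobird"))
print(parse("dinosaur or (bird and dinobird)"))
```

The first call groups as (dinosaur or bird) and dinobird; the second keeps the parenthesised pair together.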

CQL features: pattern matching
There are two pattern-matching characters:
    * matches any number of characters
    ? matches any single character
Examples:
    dinosaur*       – matches dinosaurs, dinosauria
    *sauria         – matches dinosauria, carnosauria
    man?raptor      – matches maniraptor, manuraptor
    man?raptor*     – matches the plurals of these
    "comp* *saur"   – matches complete dinosaur
A preceding backslash removes their special meaning.
    char\*          – matches literal char*
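
A server has to translate these masking characters into whatever its own engine understands. As one possible illustration, the sketch below converts a CQL pattern into a Python regular expression; the helper names `cql_pattern_to_regex` and `matches` are hypothetical, and matching the whole string case-sensitively is an assumption of this sketch rather than something the slides specify.

```python
import re


def cql_pattern_to_regex(pattern):
    """Translate * and ? (with backslash escapes) into a compiled regex."""
    parts = []
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        if ch == "\\" and i + 1 < len(pattern):
            # A backslash makes the next character literal.
            parts.append(re.escape(pattern[i + 1]))
            i += 2
            continue
        if ch == "*":
            parts.append(".*")  # any number of characters
        elif ch == "?":
            parts.append(".")   # exactly one character
        else:
            parts.append(re.escape(ch))
        i += 1
    return re.compile("".join(parts), re.DOTALL)


def matches(pattern, text):
    """Return True if the whole of text matches the CQL pattern."""
    return cql_pattern_to_regex(pattern).fullmatch(text) is not None


print(matches("man?raptor", "maniraptor"))            # True
print(matches("comp* *saur", "complete dinosaur"))    # True
print(matches(r"char\*", "charx"))                    # False
```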

CQL features: indexes
A term of the form name=value is a query for the specified value occurring within the named index.
    title=Churchill     – finds biographies of Churchill
    author=Churchill    – finds books written by him
    title=dinosaur and author=farlow
    title=(dinosaur and bird)
    subject=(dinosaur* or pterosaur*)
Index names are case-insensitive, so title is the same index as TITLE, Title or tiTLe.

CQL features: prefixes
The meaning of an index can be specified more fully by a prefix indicating what context set it is from. The meaning of title is different in cross-domain searching (Dublin Core), bibliographic searching (Bath Profile) and heraldry.
    dc.title="the complete dinosaur"
    property.title=freehold
    heraldry.title=(viscount or duke)
    cql.serverChoice=fruit
    cql.resultSetId=YXJjaGJpc2hvcAp
Prefixes are case-insensitive.
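
Because both prefixes and index names are case-insensitive, an implementation will typically normalise them before looking anything up. The sketch below shows one way to do that; the function name `split_index` and the choice to normalise by lower-casing are assumptions made for illustration.

```python
def split_index(name):
    """Split a qualified index name into (prefix, index), both lower-cased.

    An unqualified name such as "title" yields a prefix of None.
    """
    prefix, dot, base = name.partition(".")
    if not dot:
        return (None, name.lower())
    return (prefix.lower(), base.lower())


print(split_index("dc.title"))        # ('dc', 'title')
print(split_index("HERALDRY.Title"))  # ('heraldry', 'title')
print(split_index("tiTLe"))           # (None, 'title')
```

Keeping the prefix as part of the key means dc.title and heraldry.title stay distinct, which is the point of context sets.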

CQL features: context sets
A context set is a set of indexes that are related to a particular area (plus some other more esoteric stuff that you can ignore).
For example, the Dublin Core context set contains indexes for searching against the fifteen DC elements: title, creator, subject, description, publisher, contributor, date, type, format, identifier, source, language, relation, coverage, rights.
The context set prose must define their semantics.

CQL features: some context sets
A few core sets created by the SRW editorial board:
    CQL – for core indexes such as resultSetId
    DC – for metadata searching with Dublin Core
    Rec – metadata about the record, not the resource
    Net – network concepts such as host-name and port
Also, many application-specific sets:
    Bath, Zthes, CCG, Music
    Rel – deep voodoo for relevance matching
    GILS and GEO are in development

A digression on the CQL context set
The CQL context set is special. It contains some magic indexes:
    cql.anywhere – searches in all the indexes available
    cql.serverChoice – allows the server to choose whatever index or indexes are suitable
    cql.resultSetId – finds the records obtained in a previous search, e.g. for refinement by combining with other query terms.

CQL features: relations
Usually = connects an index with its term, but all the other obvious numeric relations are supported:
    Height = 13
    numberOfWheels <= 3
    numberOfPlates = 18
    lengthOfFemur > 2.4
    BioMass >= 100
    NumberOfToes <> 3    (inequality)
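
On the server side, each relation symbol has to be mapped to a comparison. The following sketch does this with Python's standard `operator` module for an in-memory record; the `record_matches` function, the dictionary-based record and the decision to compare everything as floating-point numbers are assumptions of this example, not part of CQL.

```python
import operator

# Relation symbols from the examples above, plus "<" for completeness.
RELATIONS = {
    "=": operator.eq,
    "<>": operator.ne,
    "<": operator.lt,
    "<=": operator.le,
    ">": operator.gt,
    ">=": operator.ge,
}


def record_matches(record, index, relation, term):
    """Test one numeric clause against a record given as a dict."""
    # Index names are case-insensitive, so compare lower-cased keys.
    fields = {key.lower(): value for key, value in record.items()}
    value = fields.get(index.lower())
    if value is None:
        return False  # the record has no such field
    return RELATIONS[relation](float(value), float(term))


specimen = {"numberOfWheels": 3, "lengthOfFemur": 2.5, "numberOfToes": 4}
print(record_matches(specimen, "numberOfWheels", "<=", "3"))  # True
print(record_matches(specimen, "NumberOfToes", "<>", "3"))    # True
```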

CQL features: special relations
The keywords any and all can be used as relations, indicating that any one of, or all of, the words specified in the term must be found in the index:
    author all "kernighan ritchie"
        – shorthand for author=kernighan and author=ritchie
    author any "kernighan ritchie thompson"
        – shorthand for author=kernighan or author=ritchie or author=thompson
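
The shorthand can be expanded mechanically into the longer boolean form. Here is a small sketch of that rewrite; the function name `expand` is hypothetical, and it assumes the term is a plain space-separated list of words with no quoting or masking characters inside it.

```python
def expand(index, relation, term):
    """Rewrite an any/all clause as explicit = clauses joined by or/and."""
    booleans = {"all": "and", "any": "or"}
    relation = relation.lower()  # keywords are case-insensitive
    if relation not in booleans:
        raise ValueError("expected 'any' or 'all', got " + relation)
    joiner = " " + booleans[relation] + " "
    return joiner.join(index + "=" + word for word in term.split())


print(expand("author", "all", "kernighan ritchie"))
# author=kernighan and author=ritchie
print(expand("author", "any", "kernighan ritchie thompson"))
# author=kernighan or author=ritchie or author=thompson
```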

CQL features: whole-field searching
The keyword exact can be used as a relation, indicating a search for the value of a whole field rather than words within it:
    title=jaws – finds Jaws and The Jaws of Fate.
    title exact jaws – finds Jaws but NOT The Jaws of Fate.
    title exact "The Jaws of Fate" – finds The Jaws of Fate but NOT Jaws.
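
The difference between the two relations can be illustrated with two tiny matching functions over a list of titles. Both function names are invented for this sketch, and it assumes case-insensitive comparison and simple whitespace word-splitting, which a real engine may handle differently.

```python
def matches_word(field_value, term):
    """The = relation here: the term occurs as a word within the field."""
    return term.lower() in field_value.lower().split()


def matches_exact(field_value, term):
    """The exact relation: the term is the value of the whole field."""
    return field_value.lower() == term.lower()


titles = ["Jaws", "The Jaws of Fate"]

print([t for t in titles if matches_word(t, "jaws")])
# ['Jaws', 'The Jaws of Fate']
print([t for t in titles if matches_exact(t, "jaws")])
# ['Jaws']
print([t for t in titles if matches_exact(t, "The Jaws of Fate")])
# ['The Jaws of Fate']
```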

Chapter 4: Applications
CQL has been deployed in many kinds of application:
  – Google-like structureless searching
  – Simple metadata searching with the Dublin Core
  – Bath Profile for bibliographic data
  – Zthes profile for hierarchical thesaurus navigation
  – CCG for collectable card games
  – Music – musicalKey, arranger, duration, etc.
  – GILS (Global Information Locator Service)
  – ... your application goes here!

Chapter 5: Implementations
There are good-quality free CQL implementations in several important languages:
  – Java (Mike Taylor's CQL-Java package)
  – C/C++ (Adam Dickmeiss in Index Data's YAZ)
  – Python (Rob Sanderson in Cheshire)
  – Perl (Ed Summers' CQL::Parser module)
  – Visual Basic is in development (Thomas Habing)
  – ... your language goes here!

Conclusion: What to take home
  – CQL makes easy queries easy and hard ones possible
  – You can use it well without learning the hard bits
  – It is used in SRW/SRU but also applicable elsewhere
  – It is extensible through context sets
  – Existing context sets support lots of applications
  – There are free implementations in several languages
Tutorial on-line at: