CQL Context Set
Version 1.1, 12th January 2004
Introduction
The CQL context set is reserved for features which are broadly applicable across multiple domains or protocols. It supplies a default set of indexes, relations and relation modifiers. The indexes supplied are 'utility' indexes which do not directly reference any data. They exist for cases where a CQL query must express a concept that is not directly related to the records.
Historical note: In CQL version 1.0, this was the 'srw' index set. Implementers may wish to accept 'srw' as a reserved name for the identifier 'http://www.loc.gov/zing/cql/srw-indexes/v1.0/' with the same semantics as below. srw.resultSetName has been renamed to cql.resultSetId for consistency.
The well known name for this context set is: cql
The identifier for this context set is: http://www.loc.gov/zing/cql/context-sets/cql/v1.1/
Indexes
- resultSetId
A search clause may be a result set name. This is a special case, where the index and relation are expressed as "cql.resultSetId =" and the term is the result set name returned by the server in the 'resultSetName' parameter of the response. It may be used by itself in a query to refer to an existing result set from which records are desired. It may also be used in conjunction with other resultSetId clauses or other indexes, combined by boolean operators. The semantics of resultSetId with relations other than "=" are undefined.
- serverChoice
This is the default when the index and relation are omitted from a search clause. 'cql.serverChoice' means that the server will choose an index for the given term. The relation used is 'scr', hence 'cql.serverChoice scr "term"' is an equivalent search clause to '"term"'.
- anywhere
This means "search all indexes from all indexsets you know". (By contrast, cql.serverChoice means essentially "search any index -- your choice -- from any indexset you know".)
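The equivalence between a bare term and its fully spelled-out clause can be sketched as follows. This is a minimal illustration, not part of the context set: the helper names expand_clause and format_clause are hypothetical, and the term is assumed to be already escaped.

```python
# Hypothetical helpers illustrating the defaults described above.
DEFAULT_INDEX = "cql.serverChoice"
DEFAULT_RELATION = "scr"


def expand_clause(term, index=None, relation=None):
    """Return (index, relation, term), filling in the defaults when
    both index and relation were omitted from the search clause."""
    if index is None and relation is None:
        return (DEFAULT_INDEX, DEFAULT_RELATION, term)
    if index is None or relation is None:
        # Assumption: an index without a relation (or vice versa) is rejected.
        raise ValueError("index and relation must be given together")
    return (index, relation, term)


def format_clause(index, relation, term):
    """Render a clause as text; the term is assumed to be already escaped."""
    return f'{index} {relation} "{term}"'


print(format_clause(*expand_clause("term")))
```

Under these assumptions a bare "term" expands to the clause cql.serverChoice scr "term", while a clause that already names an index and relation passes through unchanged.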
Relations
Implicit Relations
These relations are defined as such in the grammar of CQL. The cql context set only defines their meaning, rather than their existence.
- <, >, <=, and >= retain their regular meanings as relations pertaining to ordered terms
- = is used:
- For word adjacency, when the term is a list of words. That is to say that the words appear in that order with no others intervening.
- Otherwise, for exact equality of value.
- <> is 'not equal to'.
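The two meanings of = can be sketched as below. This is only an illustration under stated assumptions: a term is treated as a word list when it contains more than one whitespace-separated word, words are split on whitespace, and comparison is case-sensitive. None of these choices is mandated by the context set, and the function name eq_matches is hypothetical.

```python
def eq_matches(field_value, term):
    """Sketch of '=': word adjacency for multi-word terms,
    exact equality of value otherwise."""
    term_words = term.split()
    if len(term_words) > 1:
        field_words = field_value.split()
        n = len(term_words)
        # The term's words must appear in order with no others intervening.
        return any(
            field_words[i:i + n] == term_words
            for i in range(len(field_words) - n + 1)
        )
    # Assumption: a single-item term is compared for exact equality.
    return field_value == term


print(eq_matches("the cat in the hat", "cat in the"))
print(eq_matches("the cat sat in the hat", "cat in"))
```

A real server would apply its own definition of a 'word' and its own normalisation before comparing.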
Default Relations
These relations are defined as being widely useful as part of a default context set.
- scr is used to mean "server choice relation". It is used when the client wishes the server to choose the most appropriate relation for the index or term. It is assumed when relation is omitted.
- exact is used for exact string matching, when the term is a character string. =/cql.string is synonymous.
- all and any may be used when the term contains multiple items to indicate "all of these items" or "any of these items". These queries could be expressed using boolean AND and OR respectively. These relations have an implicit relation modifier of 'cql.word'.
- within may be used with a search term that has multiple dimensions. It matches if the database's term falls completely within the range, area or volume described by the search term. For example: dc.date within "2002 2003"
- encloses may be used when the index's data has multiple dimensions. It matches if the database's term fully encloses the search term. For example: xxx.dateRange encloses 2002
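For the one-dimensional case, within and encloses can be sketched as interval tests. The sketch assumes that terms are one or two whitespace-separated integer years (a single year standing for a range of one point) and that range ends are inclusive; the function names are hypothetical.

```python
def parse_range(term):
    """Parse '2002' or '2002 2003' into an inclusive (low, high) pair."""
    parts = [int(p) for p in term.split()]
    if len(parts) == 1:
        return (parts[0], parts[0])
    if len(parts) == 2 and parts[0] <= parts[1]:
        return (parts[0], parts[1])
    raise ValueError(f"not a one-dimensional range: {term!r}")


def within(db_term, search_term):
    """True if the database term falls completely within the search term."""
    db_low, db_high = parse_range(db_term)
    low, high = parse_range(search_term)
    return low <= db_low and db_high <= high


def encloses(db_term, search_term):
    """True if the database term fully encloses the search term."""
    return within(search_term, db_term)


print(within("2002", "2002 2003"))    # dc.date within "2002 2003"
print(encloses("2000 2005", "2002"))  # xxx.dateRange encloses 2002
```

Areas and volumes would need the same containment test applied per dimension.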
Relation Modifiers
Term Functions
These relation modifiers request that the server perform some algorithm on each item within the term before processing. If named algorithms are required, then further context sets should define relation modifiers for these.
- stem
The server should apply a stemming algorithm to the words within the term. For example, 'computing' and 'computer' would both match the stem of 'compute'.
- relevant
The server should use a relevancy algorithm for determining matches and the order of the result set.
- phonetic
The server should use a phonetic algorithm for determining words which sound like the term.
- fuzzy
The server should use a 'fuzzy' algorithm for determining matches.
Relation Qualifiers
These modifiers qualify the relation to more precisely determine its semantics.
- partial
When used with within or encloses, some section of one term may extend outside the other. This permits the database term to partially enclose, or to fall partially within, the search term.
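One possible reading of partial for one-dimensional ranges is a plain overlap test, sketched below. The interpretation is an assumption: two inclusive (low, high) ranges are taken to match partially when they share at least one point. The function name is hypothetical.

```python
def partial_overlap(db_range, search_range):
    """True if the two inclusive (low, high) ranges share at least one point."""
    db_low, db_high = db_range
    low, high = search_range
    return db_low <= high and low <= db_high


# A database range of 2001-2002 only partly falls within 2002-2003.
print(partial_overlap((2001, 2002), (2002, 2003)))
print(partial_overlap((1990, 1995), (2002, 2003)))
```

Under this reading, within/partial and encloses/partial accept the same pairs of ranges, since overlap is symmetric.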
Term Format
These relation modifiers describe the format or structure of the term in some fashion.
- word
The term should be broken into words, according to the server's definition of a 'word'.
- string
The term is a single item, and should not be broken up.
- isoDate
Each item within the term conforms to the ISO 8601 specification for expressing dates.
- number
Each item within the term is a number.
- uri
Each item within the term is a URI.
- masked (default modifier)
The following masking rules and special characters apply for search terms, unless overridden in a profile via a relation modifier. To explicitly request this functionality, add 'cql.masked' as a relation modifier.
- A single asterisk (*) is used to mask zero or more characters.
- A single question mark (?) is used to mask a single character, thus N consecutive question marks mask N characters.
- Caret/hat (^) is used as an anchor character for terms that are word lists, that is, where the relation is 'all' or 'any', or '=' when used for word adjacency. It may not be used to anchor a string, that is, when the relation is 'exact' (string matches are, by default, anchored). It may occur at the beginning or end of a word (with no intervening space) to mean right or left anchored. "^" has no special meaning when it occurs within a word (not at the beginning or end) or string, but must be escaped nevertheless.
- Backslash (\) is used to escape '*', '?', quote (") and '^', as well as itself. A backslash not followed immediately by one of these characters is an error.
Masking examples:
- dc.title = c*t (matches cat and coast etc.)
- dc.title = "*fish food*" (matches unanchored 'fish food')
- dc.title = c?t (matches cat and cot, not coast or ct)
- "?" (matches any single character)
- dc.title = "^cat in the hat" (matches 'cat in the hat' where it is at the beginning of the field)
- dc.title any "^cat ^dog eats rat" (matches 'cat eats rat', 'dog eats rat' but not 'cat eats dog' as dog is not at the beginning)
- dc.title = "\"Of Course\" she said"
- dc.identifier exact "\\\"\^\*\?andSomeMoreCharacters"
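The masking rules for a single string can be sketched by translating a masked term into a regular expression. This is an illustration only: the function names are hypothetical, the term is assumed to have had its surrounding quotes removed already, the match is anchored at both ends (as for 'exact'), and an unescaped ^ is rejected because anchoring within word lists is not handled here.

```python
import re

# Characters that a backslash may escape, per the rules above.
ESCAPABLE = set('*?"^\\')


def mask_to_regex(term):
    """Translate a masked term into a compiled regular expression."""
    out = []
    i = 0
    while i < len(term):
        ch = term[i]
        if ch == "\\":
            if i + 1 >= len(term) or term[i + 1] not in ESCAPABLE:
                raise ValueError('backslash must be followed by *, ?, ", ^ or backslash')
            out.append(re.escape(term[i + 1]))  # escaped character is literal
            i += 2
            continue
        if ch == "*":
            out.append(".*")  # zero or more characters
        elif ch == "?":
            out.append(".")   # exactly one character
        elif ch == "^":
            # Assumption: word-list anchoring is out of scope for this sketch.
            raise ValueError("unescaped ^ is not handled in this sketch")
        else:
            out.append(re.escape(ch))
        i += 1
    return re.compile("".join(out), re.DOTALL)


def masked_match(term, value):
    """True if the whole value matches the masked term."""
    return mask_to_regex(term).fullmatch(value) is not None


print(masked_match("c*t", "coast"))  # True
print(masked_match("c?t", "coast"))  # False
```

Handling ^ for 'all', 'any' and word-adjacency terms would require splitting the term into words first and checking each word's position in the field.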
Boolean Modifiers
The CQL context set defines four boolean modifiers, which are only used with the prox boolean operator.
- distance
The distance by which the two terms should be separated. It can take any comparison, and any non-negative integer as a value.
- unit
The type of unit for the distance. CQL defines the following values: 'paragraph', 'sentence', 'word' and 'element'. The value may come from another context set.
- ordered
The order of the two terms must be as per the query.
- unordered
The order of the two terms is unimportant. This is the default.
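How these modifiers interact can be sketched for the 'word' unit. The sketch makes several assumptions that are not stated above: distance is measured as the difference between word positions (so adjacent words are at distance 1), words are split on whitespace, and the distance and comparison are always supplied by the caller. The function name prox_matches is hypothetical.

```python
import operator

# Comparisons that the distance modifier may carry.
COMPARISONS = {
    "<": operator.lt,
    "<=": operator.le,
    "=": operator.eq,
    ">": operator.gt,
    ">=": operator.ge,
    "<>": operator.ne,
}


def prox_matches(text, left, right, distance, comparison, ordered=False):
    """Sketch of prox with unit 'word': True if some occurrence of `left`
    and some occurrence of `right` satisfy the distance comparison."""
    compare = COMPARISONS[comparison]
    words = text.split()
    left_positions = [i for i, w in enumerate(words) if w == left]
    right_positions = [i for i, w in enumerate(words) if w == right]
    for lp in left_positions:
        for rp in right_positions:
            if lp == rp:
                continue
            gap = rp - lp
            if ordered and gap < 0:
                continue  # 'ordered': left must come before right
            if compare(abs(gap), distance):
                return True
    return False


text = "the cat sat on the mat"
print(prox_matches(text, "cat", "mat", 4, "<="))
print(prox_matches(text, "mat", "cat", 4, "<=", ordered=True))
```

The ordered flag defaults to False here to mirror the statement that unordered is the default; other units ('sentence', 'paragraph', 'element') would need a different way of assigning positions.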