Relevance Ranking Context Set

Version 1.0, 4th June 2004

Introduction

The default ordering of a result set is left up to the server, which may apply no explicit ordering at all. SRW addresses this for the most part through the sortKeys parameter. For sophisticated relevance-based ranking, however, boolean operands might be treated differently, and specific methods might be requested to combine the results of evaluating each operand. This context set attempts to address this issue by defining relation and boolean modifiers for the various known algorithms. Documentation for each algorithm is linked in the tables below.

If you wish to have an algorithm added to this set, please contact the maintainer. If you wish to use another algorithm without having it added, then you should create a new context set, but please reference this base set to avoid duplication.

If the 'relevant' relation modifier from the cql context set is given without a named algorithm, the server should continue to use the basic semantics: the server may decide which algorithm to use. It is also legal to include cql.relevant together with an algorithm from this set, in which case that algorithm should be used. Hence there is no need to include an 'any algorithm' relation modifier in this set.

Also, please note that, as with all context sets, these modifiers are case insensitive: "rel.CORI" and "rel.cori" are to be treated the same. This matters in practice because most of the modifiers are acronyms and so may be entered in upper case in queries, even though they are listed in lower case below.
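
The selection rules above can be sketched in a few lines of Python. This is only an illustration, not part of the context set: the function name choose_algorithm and the server_default argument are hypothetical, and the sketch assumes the query uses the recommended short names 'rel' and 'cql' rather than some other prefix binding.

```python
# Algorithm names listed in the Relation Modifiers table of this set.
KNOWN_ALGORITHMS = {
    "lr", "cori", "okapi", "gloss", "ggloss",
    "dtf-cori", "redde", "cdr", "pagerank", "hilltop",
}


def choose_algorithm(modifiers, server_default="okapi"):
    """Pick a ranking algorithm from a list of relation modifier names.

    server_default is a hypothetical stand-in for whatever the server
    would choose on its own; the context set does not name a default.
    """
    # Modifiers are case insensitive, so compare in lower case.
    names = [m.strip().lower() for m in modifiers]

    # A named algorithm from this set takes precedence over cql.relevant.
    for name in names:
        prefix, _, local = name.partition(".")
        if prefix == "rel" and local in KNOWN_ALGORITHMS:
            return local

    # cql.relevant alone: the server decides.
    if "cql.relevant" in names:
        return server_default

    # No relevance ranking requested.
    return None


print(choose_algorithm(["cql.relevant", "rel.CORI"]))
print(choose_algorithm(["cql.relevant"]))
```

With these inputs the first call selects cori and the second falls back to the assumed server default.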

To return relevancy information attached to a record, please see the record metadata extension. (To be written up, à la the 'rec' context set.)

Context Set

The identifier for the context set is: info:srw/cql-context-set/2/relevance-1.0
The recommended short name is: rel
The maintainer of the context set is: Rob Sanderson, azaroth@liv.ac.uk

There are no indexes defined in this context set.

Relation Modifiers

Modifier Name   Description
lr              Logistic Regression algorithm from UC Berkeley
cori            CORI algorithm of Callan et al. (Carnegie Mellon)
okapi           OKAPI BM-25 of Robertson et al. (City University, London)
gloss           Glossary of Servers of Gravano et al. (Stanford)
ggloss          Generalised Glossary of Servers
dtf-cori        Decision-Theoretic Framework extension to CORI of Fuhr, Nottelmann (University of Duisburg-Essen)
redde           Relevant Document Distribution Estimation of Callan et al. (Carnegie Mellon)
cdr             Cover Density Ranking
pagerank        Google's PageRank algorithm of Brin, Page (ex Stanford)
hilltop         The Hilltop algorithm of Bharat, Mihaila (Google, University of Toronto)
const_*         A named constant relevant to the algorithm, e.g. const_k=0.7. This allows constants to be overridden for specific queries or indexes, either to ensure consistency across servers or to fine-tune the results.

Boolean Modifiers

Modifier Name   Description
sum             Add the values
mean            Average the values
nsum            Normalise the summed values
cmbz            Normalise and rescale values
max             Select maximum value
min             Select minimum value
nprv            Normalise values and privilege high ranked documents
pivot           Normalise sub-record retrieval scores based on document scores
const_*         A named constant relevant to the algorithm, as above
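
As an illustration of how a server might apply the simpler boolean modifiers, the following Python sketch combines the scores that two operands assigned to the same record. It covers only sum, mean, max and min, whose meaning is evident from the table; the normalising modifiers (nsum, cmbz, nprv, pivot) are left out because their exact formulas are not given here. The names COMBINERS and combine are hypothetical, and the 'rel' prefix is assumed to be the recommended short name.

```python
from statistics import fmean

# Map each supported modifier to a function over a list of scores.
COMBINERS = {
    "sum": sum,      # add the values
    "mean": fmean,   # average the values
    "max": max,      # select maximum value
    "min": min,      # select minimum value
}


def combine(modifier, left_score, right_score):
    """Combine the scores from two operands according to a boolean modifier."""
    # Modifiers are case insensitive; drop the assumed 'rel.' prefix if present.
    name = modifier.strip().lower()
    if name.startswith("rel."):
        name = name[len("rel."):]
    if name not in COMBINERS:
        raise ValueError(f"unsupported boolean modifier: {modifier!r}")
    return COMBINERS[name]([left_score, right_score])


# A record scored 0.25 by one operand and 0.75 by the other.
for mod in ("rel.sum", "rel.mean", "rel.max", "rel.min"):
    print(mod, combine(mod, 0.25, 0.75))
```

How a server treats a record matched by only one operand of an 'or' is a separate design choice that this sketch does not attempt to settle.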

Examples

Some examples of how the context set might be used.


  dc.title any/rel.lr "fish squid burger cheese"

  cql.anywhere all/rel.cori "sanderson denenberg" or/rel.mean dc.description any/rel.cori "information retrieval"

  dc.title any/rel.lr/rel.const_c0=-0.705 "logistic regression relevance ranking techniques"
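
To show how the relation part of such queries breaks down, here is a minimal Python sketch that splits a relation like the one in the last example into its base relation, named algorithm and constants. It is not a CQL parser: the function parse_relation is hypothetical, it assumes the 'rel' short name, it assumes constants are always given with '=', and it ignores modifiers from other context sets.

```python
def parse_relation(text):
    """Split 'any/rel.lr/rel.const_c0=-0.705' into its parts.

    Returns (base_relation, algorithm_or_None, constants_dict).
    """
    base, *modifiers = [part.strip() for part in text.split("/")]
    algorithm = None
    constants = {}

    for modifier in modifiers:
        name, has_value, value = modifier.partition("=")
        # Modifier names are case insensitive.
        prefix, _, local = name.strip().lower().partition(".")
        if prefix != "rel":
            continue  # not from this context set
        if local.startswith("const_"):
            if has_value:
                constants[local[len("const_"):]] = float(value)
        else:
            algorithm = local

    return base.lower(), algorithm, constants


print(parse_relation("any/rel.lr/rel.const_c0=-0.705"))
print(parse_relation("all/rel.CORI"))
```

Lower-casing the constant name follows from the case insensitivity of modifiers in this set.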