The default ordering of a result set is left to the server, which may also apply no explicit ordering at all. SRU addresses this for the most part through the 'sort' / 'sortKeys' parameter in SRU v1.1 and the 'sortBy' keyword in SRU v1.2 queries. However, for sophisticated relevance-based ranking, different algorithms are available, and a client may want to request a specific method for combining the results of evaluating each operand or clause. This context set addresses the issue by defining relation and boolean modifiers for the various known algorithms and for ways of combining their results. Several known algorithms, with links to their documentation, are listed in the table in Appendix A below.
If the 'relevant' relation modifier from the cql context set is given without a named algorithm, the server should continue to use the basic semantics: the server may decide which algorithm to use. It is also legal to include both cql.relevant and an algorithm from this set, in which case that algorithm should be used. Hence there is no need for an 'any algorithm' relation modifier in this set.
Note also that, as with all context sets, these modifiers are case-insensitive: "rel.algorithm=CORI" and "rel.algorithm=cori" are treated the same. This matters particularly here because most of the modifier values are acronyms and so may be entered in upper case in queries, even though they are listed in lower case below.
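The selection rule above (a named algorithm wins; cql.relevant alone leaves the choice to the server) and the case-insensitivity requirement can be sketched on the server side as follows. This is an illustrative sketch only: the function name `choose_algorithm`, the `SUPPORTED_ALGORITHMS` set, the fallback `DEFAULT_ALGORITHM`, and the choice to raise `ValueError` for an unsupported value (a real SRU server would return a diagnostic instead) are all hypothetical assumptions, not part of the context set.

```python
from typing import Optional

# Hypothetical server configuration: algorithms this server supports and the
# one it falls back to when the client leaves the choice open.
SUPPORTED_ALGORITHMS = {"lr", "cori", "okapi"}
DEFAULT_ALGORITHM = "okapi"


def choose_algorithm(modifiers: dict[str, Optional[str]]) -> Optional[str]:
    """Pick a ranking algorithm from a clause's relation modifiers.

    `modifiers` maps modifier names (e.g. "cql.relevant", "rel.algorithm")
    to their values, or to None for modifiers given without a value.
    Returns None when no relevance ranking was requested.
    """
    # Modifiers and their values are case-insensitive, so normalise both.
    norm = {k.lower(): (v.lower() if v is not None else None)
            for k, v in modifiers.items()}
    requested = norm.get("rel.algorithm")
    if requested is not None:
        # A named algorithm is used whether or not cql.relevant is present.
        if requested not in SUPPORTED_ALGORITHMS:
            # Assumption: signal an error; SRU would return a diagnostic.
            raise ValueError(f"unsupported relevance algorithm: {requested}")
        return requested
    if "cql.relevant" in norm:
        # cql.relevant alone: the server decides which algorithm to use.
        return DEFAULT_ALGORITHM
    return None
```

For example, `{"rel.algorithm": "CORI"}` and `{"cql.relevant": None, "rel.algorithm": "cori"}` both select `cori`, while `{"cql.relevant": None}` falls back to the server's own choice.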
To return relevance information attached to a record, see the record metadata extension (to be written up along the lines of the 'rec' context set).
There are no indexes defined in this context set.
There are no relations defined in this context set. The following relation modifiers are defined:
| Modifier Name | Description |
|---|---|
| algorithm | The algorithm to be used to assign relevance scores to results (see the table in Appendix A for examples). |
| combine | The method to be used to combine scores generated for individual operands (see the table in Appendix B for examples). |
| feedback | Apply blind relevance feedback to increase recall. |
| minRaw | The minimum raw score that must be achieved (after scores from individual operands have been combined) to be included in results. |
| minScaled | The minimum scaled score that must be achieved (after scores from individual operands have been combined) to be included in results. Scaled scores are proportional to the highest score, so 0 <= scaledScore <= 1. |
| const_* | A named constant relevant to the algorithm, e.g. const_k=0.7. This allows constants to be overridden for specific queries or indexes, either to ensure consistency across servers or to fine-tune the results. |
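The minRaw and minScaled modifiers both act on the combined score of each record. A minimal sketch, assuming raw scores are non-negative and that "must be achieved" means the threshold itself is inclusive; the function name `apply_thresholds` and its dictionary-based input are hypothetical.

```python
from typing import Optional


def apply_thresholds(raw_scores: dict[str, float],
                     min_raw: Optional[float] = None,
                     min_scaled: Optional[float] = None
                     ) -> dict[str, tuple[float, float]]:
    """Filter combined raw scores using minRaw and/or minScaled.

    raw_scores maps record id -> combined raw score (assumed non-negative).
    Returns record id -> (raw, scaled) for the records that survive.
    """
    if not raw_scores:
        return {}
    top = max(raw_scores.values())
    kept = {}
    for rec_id, raw in raw_scores.items():
        # Scaled score is proportional to the highest score, so it lies in
        # [0, 1]; if every score is 0 there is nothing to scale against.
        scaled = raw / top if top > 0 else 0.0
        if min_raw is not None and raw < min_raw:
            continue
        if min_scaled is not None and scaled < min_scaled:
            continue
        kept[rec_id] = (raw, scaled)
    return kept
```

With raw scores `{"a": 8.0, "b": 4.0, "c": 1.0}`, a minScaled of 0.5 keeps `a` (scaled 1.0) and `b` (scaled 0.5) and drops `c` (scaled 0.125).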
There are no booleans defined in this context set. The following boolean modifiers are defined:
| Modifier Name | Description |
|---|---|
| combine | Method to be used to combine scores generated for individual clauses. |
| minRaw | The minimum raw score that must be achieved (after scores from individual clauses have been combined) to be included in results. |
| minScaled | The minimum scaled score that must be achieved (after scores from individual clauses have been combined) to be included in results. Scaled scores are proportional to the highest score, so 0 <= scaledScore <= 1. |
| const_* | A named constant relevant to the algorithm, as for the relation modifiers. |
Some examples of how the context set might be used:

`dc.title any/rel.algorithm=lr "fish squid burger cheese"`

`cql.anywhere all/rel.algorithm=cori "sanderson denenberg" or/rel.combine=mean dc.description any/rel.algorithm=cori "information retrieval"`

`dc.title any/rel.algorithm=lr/rel.const_c0=-0.705 "logistic regression relevance ranking techniques"`
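To see how modifiers such as `rel.algorithm` and `rel.const_c0` in these examples attach to a relation or boolean, the following sketch splits a modified token like `any/rel.algorithm=lr/rel.const_c0=-0.705` into its base operator, ordinary modifiers, and named constants. It is a deliberately simplified, hypothetical helper (`parse_modifiers` is not part of any CQL library): it handles only `name=value` and bare `name` modifiers, ignores other comparison symbols, and does not handle values containing `/`.

```python
from typing import Optional


def parse_modifiers(token: str) -> tuple[str, dict[str, Optional[str]], dict[str, float]]:
    """Split e.g. 'any/rel.algorithm=lr/rel.const_c0=-0.705' into
    ('any', {'rel.algorithm': 'lr'}, {'c0': -0.705}).

    Simplified sketch: only '=' modifiers and bare names are supported.
    """
    base, *mods = token.split("/")
    modifiers: dict[str, Optional[str]] = {}
    constants: dict[str, float] = {}
    for mod in mods:
        name, sep, value = mod.partition("=")
        name = name.strip().lower()  # modifier names are case-insensitive
        val = value.strip() if sep else None
        if name.startswith("rel.const_"):
            # e.g. rel.const_c0=-0.705 -> constants["c0"] = -0.705
            if val is None:
                raise ValueError(f"constant without a value: {name}")
            constants[name[len("rel.const_"):]] = float(val)
        else:
            # Values such as CORI/cori are also treated case-insensitively.
            modifiers[name] = val.lower() if val is not None else None
    return base.strip().lower(), modifiers, constants
```

Applied to the boolean in the second example, `parse_modifiers("or/rel.combine=mean")` yields the base `or` with `{"rel.combine": "mean"}` and no constants.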
Appendix A: values for the algorithm modifier

| Modifier Value | Description |
|---|---|
| lr | Logistic Regression algorithm from UC Berkeley |
| cori | CORI algorithm of Callan et al. (Carnegie Mellon) |
| okapi | OKAPI BM-25 of Robertson et al. (City University, London) |
| gloss | Glossary of Servers of Gravano et al. (Stanford) |
| ggloss | Generalised Glossary of Servers |
| dtf-cori | Decision-Theoretic Framework extension to CORI of Fuhr and Nottelmann (University of Duisburg-Essen) |
| redde | Relevant Document Distribution Estimation of Callan et al. (Carnegie Mellon) |
| cdr | Cover Density Ranking |
| pagerank | Google's PageRank algorithm of Brin and Page (originally Stanford) |
| hilltop | The Hilltop algorithm of Bharat and Mihaila (Google, University of Toronto) |
Appendix B: values for the combine modifier

| Modifier Value | Description |
|---|---|
| sum | Add the values |
| mean | Average the values |
| nsum | Normalise the summed values |
| cmbz | Normalise and rescale values |
| max | Select maximum value |
| min | Select minimum value |
| nprv | Normalise values and privilege high-ranked documents |
| pivot | Normalise sub-record retrieval scores based on document scores |
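The simplest combine values can be illustrated with a small dispatch table that merges per-operand (or per-clause) scores into one score per record. This sketch covers only `sum`, `mean`, `max` and `min`, whose meaning is clear from the table; the normalising methods are left out because their exact formulas are not given here. The name `combine_scores` is hypothetical, and treating a record that is missing from an operand as contributing no score (rather than a score of 0) is an assumption.

```python
from statistics import mean

# Hypothetical dispatch table for the unambiguous combine values only.
COMBINERS = {
    "sum": sum,
    "mean": mean,
    "max": max,
    "min": min,
}


def combine_scores(operand_scores: list[dict[str, float]],
                   method: str = "sum") -> dict[str, float]:
    """Combine per-operand score maps (record id -> score) into one map.

    Every record appearing in any operand is kept, as for an 'or'.
    A record missing from an operand contributes nothing for it
    (an assumption; a server could instead count it as 0).
    """
    fn = COMBINERS[method.lower()]  # combine values are case-insensitive
    ids = set().union(*operand_scores)
    return {
        rec_id: fn([s[rec_id] for s in operand_scores if rec_id in s])
        for rec_id in ids
    }
```

For an `and`, a server would presumably restrict `ids` to records present in every operand before combining.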