Result Sets in SRW

Version 1.1, 12th January 2004

Introduction

SRW does not require the support of persistent result sets that may be accessed by a client in subsequent requests. It does require the server to state whether or not it supports them, and if it supports them then they must obey the result set model described below.

There are applications in which result sets are critical, and conversely there are applications in which they are not viable. An example of the first might be scientific investigation of a database, with comparison of data sets produced at different times. An example of the second might be a very frequently used database of web pages, in which persistent result sets would place an impossible burden on the infrastructure.

Even if the server does not make result sets available for public manipulation, the following model is still important to understand, because it is what allows a single request to both match records and then sort them.

Result Set Model

Processing of a search query results in the selection of a set of records. This selection is represented by a result set maintained at the server, which is logically an ordered list of references to the records. Once created, result sets cannot be modified. Any operation which would somehow change a result set instead creates a new result set. Each result set is referenced via a unique identifying string, generated by the server when the result set is created.
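The model above can be illustrated with a minimal sketch. The class and method names (ResultSet, ResultSetStore, create, sort) are hypothetical and not part of SRW, and the use of random UUIDs as identifiers is an assumption; the specification only requires that the server generate a unique string.

```python
import uuid
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, Optional, Tuple


@dataclass(frozen=True)
class ResultSet:
    """An immutable, ordered list of record references (hypothetical type)."""
    id: str
    record_refs: Tuple[str, ...]


class ResultSetStore:
    """Server-side registry of result sets (hypothetical helper)."""

    def __init__(self) -> None:
        self._sets: Dict[str, ResultSet] = {}

    def create(self, record_refs: Iterable[str]) -> ResultSet:
        # Assumption: a random UUID serves as the unique identifying string.
        rs = ResultSet(id=uuid.uuid4().hex, record_refs=tuple(record_refs))
        self._sets[rs.id] = rs
        return rs

    def sort(self, result_set_id: str,
             key: Optional[Callable[[str], object]] = None) -> ResultSet:
        # Sorting never alters the original; it yields a new set with a new id.
        original = self._sets[result_set_id]
        return self.create(sorted(original.record_refs, key=key))


store = ResultSetStore()
first = store.create(["rec-c", "rec-a", "rec-b"])
ordered = store.sort(first.id)
```

After this runs, first still holds its records in their original order, while ordered is a separate result set with its own identifier.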

From the client's point of view, the result set is a set of records each referenced by an ordinal number, beginning at 1. The client may request a given record from a result set according to a specific schema. For example the client may request record 1 in Dublin Core, and subsequently request record 1 in MODS. The requested schema does not persist with the result set, only the ordered list of records.

A record might be deleted or otherwise become unavailable while one or more result sets which reference that record still exist. If a client then requests that record, the server must supply a surrogate diagnostic in place of the record. For example, if the record at position 2 in a result set is deleted and then a client requests records 1 through 3, the server should supply, in order: record 1, a surrogate diagnostic for record 2, record 3.
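The example in the previous paragraph might look like the following on the server side. This is a sketch only: SurrogateDiagnostic and fetch_records are hypothetical names, the diagnostic message text is invented for illustration, and records are modelled as a plain dictionary keyed by reference.

```python
from typing import Dict, List, Sequence, Union


class SurrogateDiagnostic:
    """Stands in for a record that can no longer be supplied (hypothetical)."""

    def __init__(self, position: int, message: str) -> None:
        self.position = position
        self.message = message


def fetch_records(record_refs: Sequence[str],
                  records: Dict[str, str],
                  start: int,
                  count: int) -> List[Union[str, SurrogateDiagnostic]]:
    """Return `count` items starting at 1-based ordinal `start`."""
    out: List[Union[str, SurrogateDiagnostic]] = []
    last = min(start + count, len(record_refs) + 1)
    for position in range(start, last):
        ref = record_refs[position - 1]  # ordinals begin at 1
        if ref in records:
            out.append(records[ref])
        else:
            # The record was deleted after the result set was created:
            # keep its slot and report a diagnostic in its place.
            out.append(SurrogateDiagnostic(position, "record unavailable"))
    return out


refs = ["a", "b", "c"]
available = {"a": "Record A", "c": "Record C"}  # "b" has been deleted
response = fetch_records(refs, available, start=1, count=3)
```

The response keeps three positions, so the client can still match each item to its ordinal in the result set.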

The records in a result set are not necessarily ordered according to any specific or predictable scheme, unless the result set has been created with a request that contains one or more sort keys. See the sort specification for more information regarding the specifics of sorting. If search and sort specifications are supplied in the same request then only the final sorted result set is considered to exist, even if the server internally creates a result set and then sorts it.

Request Parameters

In order to specify a result set in a request to sort or retrieve records, the utility index 'cql.resultSetId' is used in the CQL query. See the CQL Context Set documentation for more information about its use.

The 'resultSetTTL' parameter may be given to explicitly request a time to live for the result set created by the query. This is the number of seconds for which the client is requesting that the server keep the result set around.
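As a rough illustration of the two request parameters above, the sketch below builds a CQL query that refers to an existing result set and attaches a requested time to live. The helper names are hypothetical, the result set id "abc123" is made up, and the backslash escaping of quotes inside the term is an assumption about how an id containing such characters would be quoted. The URL-encoded form at the end is shown only as one possible serialisation; a SOAP request would carry the same values in its message body.

```python
from typing import Dict, Optional
from urllib.parse import urlencode


def result_set_query(result_set_id: str) -> str:
    """Build a CQL query that selects an existing result set by its id."""
    # Assumption: backslash-escape backslashes and double quotes in the term.
    escaped = result_set_id.replace("\\", "\\\\").replace('"', '\\"')
    return f'cql.resultSetId = "{escaped}"'


def build_params(query: str, ttl_seconds: Optional[int] = None) -> Dict[str, str]:
    """Collect request parameters; resultSetTTL is optional."""
    params = {"query": query}
    if ttl_seconds is not None:
        # A request only: the server decides how long it actually keeps the set.
        params["resultSetTTL"] = str(ttl_seconds)
    return params


params = build_params(result_set_query("abc123"), ttl_seconds=300)
encoded = urlencode(params)  # one possible wire form of the parameters
```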

Result sets are not created by the scan operation.

Response Parameters

resultSetId

If the server supports result sets, it may include a resultSetId in the searchRetrieve response, along with an idle time (see resultSetIdleTime below). If another query is submitted then the server will again supply a result set id. If the result of the query would modify an existing result set, then the server must supply a new id for this new set. The server should keep the identifier of every result set it creates unique, even after the result set no longer exists, so that clients do not mistakenly request records from a new set when meaning to refer to a previous set that had the same identifier.
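One simple way to avoid reusing identifiers is a counter that only ever moves forward. The sketch below assumes that approach; ResultSetIdIssuer and the "rs" prefix are hypothetical, and a real server would also need to persist the counter across restarts, which is not shown.

```python
import itertools
from typing import Set


class ResultSetIdIssuer:
    """Issues result set ids that are never handed out twice (hypothetical)."""

    def __init__(self, prefix: str = "rs") -> None:
        self._prefix = prefix
        self._counter = itertools.count(1)  # monotonically increasing
        self.live: Set[str] = set()

    def issue(self) -> str:
        result_set_id = f"{self._prefix}{next(self._counter)}"
        self.live.add(result_set_id)
        return result_set_id

    def expire(self, result_set_id: str) -> None:
        # Forget the set itself, but never return its id to the pool.
        self.live.discard(result_set_id)


issuer = ResultSetIdIssuer()
old_id = issuer.issue()
issuer.expire(old_id)
new_id = issuer.issue()
```

Because old_id is never reissued, a client that still holds it gets a clear "no such result set" failure instead of records from an unrelated set.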

resultSetIdleTime

The server may supply an idle time along with a result set. The server is making a good-faith estimate that the result set will remain available and unchanged (both in content and order) until a timeout (a period of inactivity exceeding the idle time). The idle time is an integer representing seconds; it must be a positive integer, and should not be so small that a client cannot realistically reference the result set again. If the server does not intend that the result set be referenced, it should omit the result set identifier in the response.
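The timeout rule can be sketched as follows, under the assumption that the server records the time of the last access to each result set and compares it with the idle time. TrackedResultSet and its methods are hypothetical names, and times are passed in explicitly as seconds so the behaviour is easy to follow.

```python
class TrackedResultSet:
    """Tracks inactivity for one result set (hypothetical helper)."""

    def __init__(self, result_set_id: str, idle_time: int, now: float) -> None:
        # The idle time is an integer number of seconds and must be positive.
        if not isinstance(idle_time, int) or idle_time <= 0:
            raise ValueError("idle time must be a positive integer")
        self.result_set_id = result_set_id
        self.idle_time = idle_time
        self.last_access = now

    def touch(self, now: float) -> None:
        """Record activity, which restarts the period of inactivity."""
        self.last_access = now

    def is_expired(self, now: float) -> bool:
        """True once the period of inactivity exceeds the idle time."""
        return now - self.last_access > self.idle_time


tracked = TrackedResultSet("rs1", idle_time=60, now=0.0)
tracked.touch(now=50.0)                         # client fetched records at t=50
still_available = not tracked.is_expired(now=100.0)
```

Since the idle time is only a good-faith estimate, a client should still be prepared for a result set to disappear earlier than advertised.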