Sorting in SRW

Version 1.1, 12th January 2004

Introduction

A request may include a sort specification, indicating the desired ordering of the results. This is a request that the server should apply a sorting algorithm to a list of records before returning any. It may be supplied with a new search or, if the server supports result sets, applied to an existing result set.

The sort parameter is part of the main operation, rather than a separate operation, for two reasons. First, a server that knows the desired sort order before processing the query may be able to optimize, rather than sorting a result set after it has been created. Second, a server may be able to sort a result set at creation but not maintain it across multiple requests.

In order to specify the result set(s) which the sort specification applies to, the query should include cql.resultSetId = "resultSetId" where 'resultSetId' is the identifier supplied by the server for the result set. If multiple result set identifiers are supplied, linked by boolean OR, AND or NOT, then the request will combine and sort all of the given sets together. This is documented in the CQL context set.
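
As an illustration of the paragraph above, the following minimal sketch builds a query that names several result sets. The function name result_set_query and the identifiers "rs-17" and "rs-42" are hypothetical, and the sketch assumes identifiers contain no double quotes, so that no escaping is needed.

```python
def result_set_query(result_set_ids, operator="OR"):
    """Build a CQL query that refers to one or more existing result sets.

    Hypothetical helper: the name and signature are not part of SRW.
    """
    if operator not in ("AND", "OR", "NOT"):
        raise ValueError("operator must be AND, OR or NOT")
    if not result_set_ids:
        raise ValueError("at least one result set identifier is required")
    clauses = []
    for rs_id in result_set_ids:
        # Assumption: identifiers are simple tokens without double quotes.
        if '"' in rs_id:
            raise ValueError("identifier contains a double quote")
        clauses.append(f'cql.resultSetId = "{rs_id}"')
    return f" {operator} ".join(clauses)


# Made-up identifiers, as a server might have supplied them earlier.
print(result_set_query(["rs-17", "rs-42"]))
```

The resulting string would be sent as the query, alongside the sort specification that applies to the combined sets.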

Sort Definition

The sort parameter includes one or more keys, each of which includes the following information:

Name           Type         Required   Description
path           xsd:string   Mandatory  An XPath expression describing a tagpath to be used in the sort.
schema         xsd:string   Optional   The URI identifier for a supported schema. This schema is the one to which the XPath expression applies. If it is not supplied then the default value from Explain will be used.
ascending      xsd:boolean  Optional   Should the results be sorted ascending (true, and the default if not supplied) or descending (false).
caseSensitive  xsd:boolean  Optional   Should case be considered as important during the sort. The default value is false if not supplied.
missingValue   xsd:string   Optional   One of 'abort', 'highValue', 'lowValue', 'omit' or a supplied value. The semantics of each are described below and the default is 'highValue'.

XPath and Schema

XPath is a W3C specification for describing a path to an element. To sort by title, for example, one might specify the XPath "/record/title" within the Dublin Core schema. The records need not be stored in this schema in order to be sorted by it, nor does the server even need to be able to return records in it.

SRW has the concept of utility schemas: schemas that are not intended for returning records, but into which records can be transformed in order to sort them in a particular way. For example, if a record contains a geographical location, it may be desirable to sort the records from north to south and east to west. This requires transformation into a schema that allows sorting by a convenient coordinate system, rather than lexically by place name, and this schema may not be available for retrieving the records.
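
To make the role of the path concrete, here is a minimal sketch of sorting records by the text found at an XPath, using Python's standard xml.etree.ElementTree module. The function names and sample records are hypothetical. The sketch assumes that records carry no XML namespaces, that the path uses only the XPath subset ElementTree supports, and that every record contains the path; it simply raises an error otherwise and does not model the missingValue parameter.

```python
import xml.etree.ElementTree as ET


def sort_value(record_xml, path):
    """Return the text found at `path` in one record, or None if absent."""
    root = ET.fromstring(record_xml)
    # ElementTree does not accept absolute paths on an element, so the
    # record is placed under a synthetic parent and the path is evaluated
    # relative to that parent.
    holder = ET.Element("holder")
    holder.append(root)
    return holder.findtext("." + path)


def sort_records(records, path, ascending=True, case_sensitive=False):
    """Sort XML record strings by the value at `path` (hypothetical helper)."""
    def key(record_xml):
        value = sort_value(record_xml, path)
        if value is None:
            # Simplification: the missingValue action is out of scope here.
            raise ValueError("path not present in record")
        return value if case_sensitive else value.casefold()

    return sorted(records, key=key, reverse=not ascending)


# Made-up sample records for illustration.
records = [
    "<record><title>Zebra crossing</title></record>",
    "<record><title>apple pie</title></record>",
    "<record><title>mango salad</title></record>",
]
for record in sort_records(records, "/record/title"):
    print(sort_value(record, "/record/title"))
```

With the defaults from the table above (ascending, not case sensitive), "Zebra crossing" sorts last; with case_sensitive=True it sorts first, because uppercase letters compare lower than lowercase ones in Python's string ordering.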

Missing Value Action

This parameter of a sort key tells the server what to do when the supplied XPath is not present within a record. For example, if the server is instructed to sort by author and a record has no author, the server behaves in accordance with this value.

Its value may be 'abort', 'highValue', 'lowValue', 'omit' or a supplied value.

sortKeys

The textual representation of a sort key is constructed according to the following rules.

  1. The path must be included as the first parameter.
  2. Subsequent parameters follow in the order given above, separated by a comma (,) character.
  3. The path and schema must be quoted if they contain quotes, commas or spaces. Internal quotes must be escaped with a backslash.
  4. Parameters beyond the first may be supplied with no value, in which case the server will use the default.
  5. The last parameter supplied must be present. (In other words, the key may not end in a comma.)
  6. Boolean parameters are expressed as 1 (true) or 0 (false).

Multiple keys are separated by whitespace.
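
The rules above can be sketched as a small serializer. The function names are hypothetical. Two choices in the sketch are assumptions rather than requirements: the path and schema are always quoted, which the rules permit but do not demand, and a supplied missing value is written in quotes while the four keyword values are written bare.

```python
KEYWORDS = {"abort", "highValue", "lowValue", "omit"}


def quote(text):
    """Wrap text in double quotes, escaping internal quotes (rule 3)."""
    return '"' + text.replace('"', '\\"') + '"'


def bool_field(value):
    """Express a boolean as 1 or 0 (rule 6); None means 'use the default'."""
    if value is None:
        return ""
    return "1" if value else "0"


def format_sort_key(path, schema=None, ascending=None,
                    case_sensitive=None, missing_value=None):
    """Serialize one sort key (hypothetical helper, not part of SRW)."""
    fields = [quote(path)]                                   # rule 1
    fields.append(quote(schema) if schema is not None else "")
    fields.append(bool_field(ascending))
    fields.append(bool_field(case_sensitive))
    if missing_value is None:
        fields.append("")
    elif missing_value in KEYWORDS:
        fields.append(missing_value)
    else:
        # Assumption: a supplied value is quoted, keywords are not.
        fields.append(quote(missing_value))
    # Rule 5: a key may not end in a comma, so drop trailing empty fields.
    while fields[-1] == "":
        fields.pop()
    return ",".join(fields)                                  # rules 2 and 4


def format_sort_keys(keys):
    """Join several keys with whitespace; each key is a dict of arguments."""
    return " ".join(format_sort_key(**key) for key in keys)


print(format_sort_keys([
    {"path": "/record/title", "ascending": False},
    {"path": "/record/creator", "missing_value": "omit"},
]))
```

Omitted parameters in the middle of a key produce empty fields between commas, while omitted parameters at the end are dropped altogether.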

Thus a complete sortKeys parameter might be:

  <sortKeys>  
    "/record/title","http://www.loc.gov/zing/srw/dc-record/",1 
    "/record/datafield[@tag=\"100\"]/subfield[@code=\"a\"]","http://www.loc.gov/MARC21/slim/",,,"Smith" 
  </sortKeys>
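
Going the other way, the following minimal sketch parses the content of a sortKeys parameter, such as the example above, into one dictionary per key. The function names are hypothetical, and the sketch assumes the text has already been extracted from the enclosing element. Parameters supplied with no value are simply left out of the dictionary, so the caller can apply the defaults.

```python
NAMES = ("path", "schema", "ascending", "caseSensitive", "missingValue")


def split_fields(text):
    """Split sortKeys text into keys, each a list of raw field strings."""
    keys, fields, current = [], [], []
    in_quotes = False
    started = False  # True once the current key has any content
    i = 0
    while i < len(text):
        ch = text[i]
        if in_quotes:
            if ch == "\\" and i + 1 < len(text) and text[i + 1] == '"':
                current.append('"')  # escaped internal quote
                i += 1
            elif ch == '"':
                in_quotes = False
            else:
                current.append(ch)
        elif ch == '"':
            in_quotes = True
            started = True
        elif ch == ",":
            fields.append("".join(current))
            current = []
            started = True
        elif ch.isspace():
            if started:  # whitespace outside quotes ends the key
                fields.append("".join(current))
                keys.append(fields)
                fields, current, started = [], [], False
        else:
            current.append(ch)
            started = True
        i += 1
    if in_quotes:
        raise ValueError("unterminated quote")
    if started:
        fields.append("".join(current))
        keys.append(fields)
    return keys


def parse_sort_keys(text):
    """Parse sortKeys text into a list of dicts (hypothetical helper)."""
    parsed = []
    for fields in split_fields(text):
        if len(fields) > len(NAMES):
            raise ValueError("too many parameters in sort key")
        if fields[0] == "":
            raise ValueError("the path is mandatory")
        if fields[-1] == "":
            raise ValueError("a sort key may not end in a comma")
        key = {}
        for name, raw in zip(NAMES, fields):
            if raw == "":
                continue  # no value supplied: the default applies
            if name in ("ascending", "caseSensitive"):
                if raw not in ("0", "1"):
                    raise ValueError("booleans must be 1 or 0")
                key[name] = raw == "1"
            else:
                key[name] = raw
        parsed.append(key)
    return parsed


example = r'''
  "/record/title","http://www.loc.gov/zing/srw/dc-record/",1
  "/record/datafield[@tag=\"100\"]/subfield[@code=\"a\"]","http://www.loc.gov/MARC21/slim/",,,"Smith"
'''
for key in parse_sort_keys(example):
    print(key)
```

One limitation of this sketch is that it does not distinguish an explicitly quoted empty string from a parameter supplied with no value.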

Failure to Sort

If the server is unable to create a sorted result set according to the request, then it must supply appropriate diagnostics stating this. See the diagnostics specification for more information.

xSortKeys

The XML representation of sort keys is very simple. Each key is wrapped in a 'sortKey' element. The elements within are the parameters described above. The path element is required, but the others are optional. This is used only when echoing the searchRetrieveRequest for SRU.

An example xSortKeys:


<xSortKeys>
  <sortKey>
    <path>/record/title</path>
    <schema>http://www.loc.gov/zing/srw/dc-record/</schema>
    <ascending>true</ascending>
  </sortKey>
  <sortKey>
    <path>/record/datafield[@tag="100"]/subfield[@code="a"]</path>
    <schema>http://www.loc.gov/MARC21/slim/</schema>
    <missingValue>"Smith"</missingValue>
  </sortKey>
</xSortKeys>
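
As a minimal sketch of how a server might produce this structure when echoing a request, the following uses Python's standard xml.etree.ElementTree module. The function name build_x_sort_keys and the dictionary layout are hypothetical, and XML namespaces are omitted for brevity. Booleans are written as true or false, as in the example above, and the missing value is passed through exactly as given, including any quotes.

```python
import xml.etree.ElementTree as ET

NAMES = ("path", "schema", "ascending", "caseSensitive", "missingValue")


def build_x_sort_keys(keys):
    """Serialize a list of sort key dicts as an xSortKeys element.

    Hypothetical helper: each dict uses the parameter names from the
    Sort Definition section as its keys.
    """
    root = ET.Element("xSortKeys")
    for key in keys:
        if "path" not in key:
            raise ValueError("the path element is required")
        node = ET.SubElement(root, "sortKey")
        for name in NAMES:
            if name not in key:
                continue  # optional elements are omitted when not supplied
            value = key[name]
            if isinstance(value, bool):
                value = "true" if value else "false"
            ET.SubElement(node, name).text = value
    return ET.tostring(root, encoding="unicode")


xml_text = build_x_sort_keys([
    {"path": "/record/title",
     "schema": "http://www.loc.gov/zing/srw/dc-record/",
     "ascending": True},
    {"path": '/record/datafield[@tag="100"]/subfield[@code="a"]',
     "schema": "http://www.loc.gov/MARC21/slim/",
     "missingValue": '"Smith"'},
])
print(xml_text)
```

The output is a single line without indentation; it carries the same elements as the example above.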