A Gentle Introduction to SRW

Author: Rob Sanderson <azaroth@liverpool.ac.uk>

Version 1.1, 12th January 2004

Introduction

This document attempts to provide a more gentle introduction to SRW than simply diving into the service description and related specifications. It will step through the various parts of the SRW protocol in order to inform, rather than define.

SRW, the Search/Retrieve Webservice, is an XML-oriented protocol designed to offer a low barrier to entry for performing searches and other information retrieval operations across the internet. It uses existing, well-tested and easily available technologies such as SOAP and XPath to do what has previously been done with proprietary solutions. The design has been informed by 20 years of experience with the Z39.50 information retrieval protocol; it is robust and easy to understand while still retaining the important aspects of its predecessor.

The protocol can be carried in two ways: via SOAP, or as parameters in a URL. The second form is called SRU, SearchRetrieve by URL. Other transports would also be possible, for example simple XML over HTTP, but these are not defined by the current standard. Because SRU does not carry the request as XML, we talk about request parameters rather than elements within a request XML schema.

Searching

The primary function of SRW is to allow a user to search a remote database of records. This is done via the searchRetrieve operation, in which the client sends a searchRetrieveRequest and the server responds with a searchRetrieveResponse. The request has several parameters, most of which are optional. The response is primarily a list of XML records which matched the search, along with a count of all the records that matched.

Search Request

The most important parameter in the search request is called 'query'. It contains a CQL string representing the search query. CQL has an excellent tutorial provided by Mike Taylor, so that ground will not be covered here. This parameter is required in every search request.

The search request, and all other requests within the protocol, has one other mandatory parameter: 'version'. This parameter specifies the highest version of the SRW protocol understood by the client, and thus the highest version of the protocol the server should use in the response.
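
One way a server might act on the 'version' parameter can be sketched in Python. This is only an illustration of the rule described above, not part of the standard: the function names and the list of supported versions are hypothetical, and what a server does when no version is usable is left to the caller.

```python
def parse_version(text):
    """Turn a version string such as '1.1' into a comparable tuple (1, 1)."""
    return tuple(int(part) for part in text.split("."))


def choose_version(client_version, server_versions):
    """Pick the highest server-supported version not above the client's.

    'server_versions' is a hypothetical list of versions this server knows.
    """
    ceiling = parse_version(client_version)
    usable = [v for v in server_versions if parse_version(v) <= ceiling]
    if not usable:
        return None  # nothing in common; error reporting is left to the caller
    return max(usable, key=parse_version)


if __name__ == "__main__":
    # The client understands up to 1.1, so a newer server version is not chosen.
    print(choose_version("1.1", ["1.0", "1.1", "1.2"]))
```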

All of the other parameters in the request are optional, but clients will typically provide them, since the server otherwise falls back on its own default values, which may not be what was wanted. The main ones appear in the example below; a few more are available.

  <searchRetrieveRequest>
    <version>1.1</version>
    <query>dc.title all "Squirrel Hungry"</query>
    <maximumRecords>1</maximumRecords>
    <startRecord>1</startRecord>
    <recordSchema>dc</recordSchema>
  </searchRetrieveRequest>
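
The same request can also be carried as SRU, that is, as parameters in a URL. The Python sketch below shows roughly what that might look like. The base URL is a made-up placeholder, and the 'operation' parameter is an assumption taken from the SRU specification rather than from this document, so check it against the service description before relying on it.

```python
from urllib.parse import urlencode

# Hypothetical server address, used only for illustration.
BASE_URL = "http://sru.example.org/db"


def build_search_url(base_url, query, start_record=1, maximum_records=1,
                     record_schema="dc", version="1.1"):
    """Build an SRU-style URL carrying the search request parameters."""
    params = {
        # Assumption: SRU names the request type in an 'operation' parameter.
        "operation": "searchRetrieve",
        "version": version,              # mandatory
        "query": query,                  # mandatory, a CQL string
        "startRecord": start_record,
        "maximumRecords": maximum_records,
        "recordSchema": record_schema,
    }
    # urlencode escapes spaces and quotes so the CQL survives in a URL.
    return base_url + "?" + urlencode(params)


if __name__ == "__main__":
    print(build_search_url(BASE_URL, 'dc.title all "Squirrel Hungry"'))
```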

Search Response

The response to the search request is very simple. It must contain two fields: 'version', stating which version of the protocol the response uses, and 'numberOfRecords', the number of records which matched the query. If any records matched the query, and the client requested at least one to be returned, then they should be present in the 'records' field. This 'records' field contains an ordered list of records. Each record must carry its schema and the XML data for the record itself.

  <searchRetrieveResponse>
    <version>1.1</version>
    <numberOfRecords>10</numberOfRecords>
    <records>
      <record>
        <recordSchema>dc</recordSchema>
        <recordData>
          <dc:record>
            <dc:title>Squirrel is Hungry</dc:title>
          </dc:record>
        </recordData>
      </record>
    </records>
  </searchRetrieveResponse>
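
A client might read such a response along the following lines. This is a minimal Python sketch using the standard library XML parser. The sample is the response above with a declaration added for the 'dc' prefix so that it parses; the namespace URI chosen for it is an assumption, and a real SRW response would also place the protocol elements in their own namespace, which this sketch ignores.

```python
import xml.etree.ElementTree as ET

# Assumption: the 'dc' prefix is bound to the Dublin Core elements namespace.
NAMESPACES = {"dc": "http://purl.org/dc/elements/1.1/"}

SAMPLE_RESPONSE = """
<searchRetrieveResponse xmlns:dc="http://purl.org/dc/elements/1.1/">
  <version>1.1</version>
  <numberOfRecords>10</numberOfRecords>
  <records>
    <record>
      <recordSchema>dc</recordSchema>
      <recordData>
        <dc:record>
          <dc:title>Squirrel is Hungry</dc:title>
        </dc:record>
      </recordData>
    </record>
  </records>
</searchRetrieveResponse>
"""


def summarize_response(xml_text):
    """Return (total matches, [(schema, title), ...]) for a search response."""
    root = ET.fromstring(xml_text.strip())
    total = int(root.findtext("numberOfRecords"))
    records = []
    for record in root.findall("records/record"):
        schema = record.findtext("recordSchema")
        title = record.findtext("recordData/dc:record/dc:title",
                                namespaces=NAMESPACES)
        records.append((schema, title))
    return total, records


if __name__ == "__main__":
    total, records = summarize_response(SAMPLE_RESPONSE)
    print(total, records)
```

Note that the total (10) is larger than the number of records returned (1), because the request asked for only one record.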

Browsing

Sometimes it is useful to look through the information in the records piece by piece, regardless of which record it came from, rather than search for a specific set of records. For example, you might not know any of the titles of the records in the database, but it would be interesting to just browse through those titles, like reading along the spines of books on a shelf.

In SRW, this can be done via the scan operation. It returns a slice from the sorted list of terms present in the database for the given index. It has fewer request parameters than searchRetrieve, but the same mandatory ones.

Scan Request

The scan request carries most of the information in a parameter called 'scanClause'. Like the search request's 'query' parameter, it contains a CQL query. In this case, however, it is just a single search clause specifying the index, relation (plus possible modifiers) and term. It also has one other mandatory parameter: 'version', as above.

The index and relation, plus any modifiers, are used to determine the nature of the terms desired. For example, 'dc.title =/cql.word' would be keywords from the 'dc.title' index, as opposed to the complete title strings. The term is used to determine whereabouts in the list of titles the client wants to start browsing. If the client sends 'dc.title =/cql.word fish' then it wants to start browsing where 'fish' would occur in the list of terms in the index.

The maximum number of terms that the client wants returned is carried in the 'maximumTerms' parameter. Like 'maximumRecords', it is an upper limit: the server may return fewer terms than this, for example if it has reached the end of the list of terms. If it is not supplied, the server will determine a default number of terms to return.

The 'responsePosition' parameter determines where the term given in the 'scanClause' should appear in the returned list. It is a 1-based index into the response terms, but 0 may be given, meaning that the term falls immediately before the first term returned. It may also be one greater than the number of terms requested, in which case the term falls immediately after the last term returned. If not given, the default is 1.
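
The interaction between 'maximumTerms' and 'responsePosition' may be easier to see in code. The Python sketch below models a server's index as a sorted list and picks the slice to return. It is one possible reading of the rules above, not the behaviour of any particular server: the function name and term list are hypothetical, and it assumes the scan term is anchored at the first indexed term that is equal to or sorts after it.

```python
import bisect


def scan_slice(sorted_terms, term, maximum_terms, response_position=1):
    """Return the slice of terms a scan might produce.

    Assumption: the anchor is the first indexed term >= the scan term.
    """
    anchor = bisect.bisect_left(sorted_terms, term)
    # responsePosition is 1-based, so position 1 starts at the anchor itself,
    # position 0 starts just after it, and larger values start earlier.
    start = anchor - (response_position - 1)
    start = max(0, min(start, len(sorted_terms)))
    # The server may return fewer terms if the list runs out.
    return sorted_terms[start:start + maximum_terms]


if __name__ == "__main__":
    terms = ["fish", "frog", "squid", "squirrel", "squish", "zebra"]
    for position in (0, 1, 2, 4):
        print(position, scan_slice(terms, "squirrel", 3, position))
```

With three terms requested, position 1 puts 'squirrel' first, position 2 puts it second, position 0 starts just after it, and position 4 returns the three terms that come just before it.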

  <scanRequest>
    <version>1.1</version>
    <scanClause>dc.title =/cql.word "Squirrel"</scanClause>
    <maximumTerms>10</maximumTerms>
    <responsePosition>1</responsePosition>
  </scanRequest>

Scan Response

The response is similar in style to the searchRetrieveResponse, but instead of records, the server returns terms from the indexes in the database. These terms are returned in an ordered list within the 'terms' field, each in a separate 'term' structure. The response will also contain the version, as described above.

Each term listed will contain at least a 'value' field. This value could then be used as the term in a search request with the same index and relation, and would have at least one matching record.

An optional field called 'numberOfRecords' may also be returned. Like its namesake in the search response, it gives the number of records which would match if the term value were used in a search request with the given index and relation.

The 'displayTerm' field in the term structure is also optional, but if present it contains text that should be displayed to the user in place of the term itself. For example, the term might have had diacritics removed for indexing, so any search should use the form without them, but the text shown to the user should have them restored, as otherwise the word is nonsensical.

  <scanResponse>
    <version>1.1</version>
    <terms>
      <term>
        <value>ecole</value>
        <numberOfRecords>17</numberOfRecords>
        <displayTerm>école</displayTerm>
      </term>
      <term>
        <value>squirrel</value>
        <numberOfRecords>20</numberOfRecords>
      </term>
      <term>
        <value>squish</value>
        <numberOfRecords>4</numberOfRecords>
      </term>
    </terms>
  </scanResponse>
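
A client might turn this response into a browse list as sketched below, in Python with the standard library XML parser. The function name is hypothetical, and the sketch ignores XML namespaces, which a real SRW response would use. It shows the text from 'displayTerm' when present and falls back to 'value' otherwise, while keeping 'value' for use in any follow-up search.

```python
import xml.etree.ElementTree as ET

SAMPLE_SCAN_RESPONSE = """
<scanResponse>
  <version>1.1</version>
  <terms>
    <term>
      <value>ecole</value>
      <numberOfRecords>17</numberOfRecords>
      <displayTerm>école</displayTerm>
    </term>
    <term>
      <value>squirrel</value>
      <numberOfRecords>20</numberOfRecords>
    </term>
    <term>
      <value>squish</value>
      <numberOfRecords>4</numberOfRecords>
    </term>
  </terms>
</scanResponse>
"""


def list_terms(xml_text):
    """Return [(text to display, term to search with, record count or None)]."""
    root = ET.fromstring(xml_text.strip())
    rows = []
    for term in root.findall("terms/term"):
        value = term.findtext("value")
        # displayTerm is optional; fall back to the indexed value.
        label = term.findtext("displayTerm", default=value)
        # numberOfRecords is optional too.
        count = term.findtext("numberOfRecords")
        rows.append((label, value, int(count) if count is not None else None))
    return rows


if __name__ == "__main__":
    for label, value, count in list_terms(SAMPLE_SCAN_RESPONSE):
        print(label, value, count)
```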

Server Capabilities

The final operation in SRW, explain, lets a client find out what the server supports in terms of the protocol and in terms of CQL. The request is very simple, and the response is a single record which describes the server. The normal 'version' parameter is present on both the request and the response.

The record returned is a ZeeRex description of the server, and is well documented by the maintainers of the schema.

Explain Request

  <explainRequest>
    <version>1.1</version>
  </explainRequest>

Explain Response

  <explainResponse>
    <version>1.1</version>
    <record>
      <recordSchema>http://explain.z3950.org/1.8/</recordSchema>
      <recordData>
        <zeerex:explain> ...
      </recordData>
    </record>
  </explainResponse>