Requested Term Metadata

Version 1.0, 11th March 2004

Introduction

Many user interfaces that have a 'browse' facility wish to highlight the term which the user requested, or place an entry in the list at the position where their term would be. Recent discussions concerning normalised indexes, for example stemmed words or phonetic searches, have revealed situations where an ordered index may appear to be unordered to the client. This occurs when the term returned is not the same as the term in the index, because the term returned must be able to be used in a subsequent search.

An example will help to clarify this. A stemmed search matches all words which have the same lexical stem as the search term. A search for 'happiness' will match 'happy', 'happily' and 'unhappiness', among others. As indexes cannot contain the same term in multiple locations, if the requested term is 'happiness' then the server may legitimately return 'unhappiness' as the term in the middle of the 'h' section of the index, because the stemmed form is 'happi' using the Porter stemming algorithm. With this 'u' term in the middle of the 'h's, the client cannot perform the regular check that the term returned is the same as the one requested, nor can it check that the requested term falls between the two terms on either side of its expected position. Checking position also fails at the beginning and end of the index, as there may not be enough terms to return.

This being the case, the server must be able to mark a scanTerm to say that it is the one requested, or that it is immediately after or before the one requested.
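The problem can be sketched in a few lines of Python. The index below is a hypothetical one: the terms mirror the example response later in this document, but the normalised keys are illustrative assumptions, not the output of a real stemmer. It shows that a list which is ordered on the server (by key) looks unordered to the client (by returned term), and that an exact-match check for the requested term fails.

```python
# Hypothetical index entries: (normalised key, term returned to the client).
# The keys are assumptions for illustration only.
INDEX = [
    ("hack", "hack"),
    ("hall", "hall"),
    ("ham", "ham"),
    ("hand", "hand"),
    ("happi", "unhappiness"),
    ("hart", "hart"),
    ("hasti", "unhastily"),
    ("hatred", "hatred"),
    ("heart", "disheartened"),
    ("hertz", "hertz"),
]


def is_sorted(values):
    """True if the values are in non-decreasing order."""
    return all(a <= b for a, b in zip(values, values[1:]))


keys = [key for key, _ in INDEX]
terms = [term for _, term in INDEX]

server_sees_order = is_sorted(keys)       # the index itself is ordered
client_sees_order = is_sorted(terms)      # the returned terms are not
exact_match_found = "happiness" in terms  # the requested term never appears

print(server_sees_order, client_sees_order, exact_match_found)
```

Because neither the ordering check nor the equality check is reliable for the client, the position has to be signalled explicitly by the server.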

RequestedTerm Extension

This is version 1.0 of the RequestedTerm Extension.

The RequestedTerm extension has the identifier and namespace: info:srw/extension/2/requestedTerm-1.0
When used with SRU, the prefix 'c3o_rt-' should be put before any parameter name, along with the mandatory 'x-' prefix.

The RequestedTerm extension defines one empty request parameter: markRequestedTerm
This parameter is only valid for use with the scan request. If present, the server is requested to mark the location of the requested term in the response, using the method described below.
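As a minimal sketch of how a client might add the parameter to an SRU scan request, the following uses Python's standard urllib.parse module. The function name build_scan_url and the constant EXT_PREFIX are hypothetical names chosen for this illustration; the base URL and scan clause are taken from the example in this document.

```python
from urllib.parse import quote, urlencode

# 'x-' is the mandatory SRU extension prefix; 'c3o_rt-' is the prefix
# given in the text for this extension.
EXT_PREFIX = "x-c3o_rt-"


def build_scan_url(base_url, scan_clause, response_position, maximum_terms):
    """Build an SRU scan URL that asks the server to mark the requested term."""
    params = {
        "operation": "scan",
        "version": "1.1",
        "scanClause": scan_clause,
        "responsePosition": response_position,
        "maximumTerms": maximum_terms,
        # The parameter is empty: its presence alone makes the request.
        EXT_PREFIX + "markRequestedTerm": "",
    }
    # quote_via=quote encodes spaces as %20 rather than '+'.
    return base_url + "?" + urlencode(params, quote_via=quote)


url = build_scan_url(
    "http://srw.cheshire3.org:8080/l5r", "dc.title any happy", 5, 10
)
print(url)
```

The empty value is sent as a bare "x-c3o_rt-markRequestedTerm=" at the end of the query string.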

The RequestedTerm extension defines one piece of additional response data: requestedTerm. This field is only valid for use in the extraTermData field in the scanResponse message.

The requestedTerm is a string field which, if present, must contain one of the following identifiers:

Identifier       Description
previousTerm     The term immediately before the position the requested term would be, if present.
requestedTerm    The term that was requested, after any normalisation routines and subsequent reconstruction.
subsequentTerm   The term immediately after the position the requested term would be, if present.

If the requested term is present and marked as requestedTerm, the terms on either side do not have to be marked, but may be, at the server's discretion.

Example

SRU Request:

http://srw.cheshire3.org:8080/l5r?operation=scan&version=1.1&scanClause=dc.title%20any%20happy
&responsePosition=5&maximumTerms=10&x-c3o_rt-markRequestedTerm=

SRW Request:

<scanRequest xmlns="http://www.loc.gov/zing/srw/">
  <version>1.1</version>
  <scanClause>dc.title any happy</scanClause>
  <responsePosition>5</responsePosition>
  <maximumTerms>10</maximumTerms>
  <extraRequestData>
    <rt:markRequestedTerm xmlns:rt="info:srw/extension/2/requestedTerm-1.0"/>
  </extraRequestData>
</scanRequest>

Response:

<scanResponse xmlns="http://www.loc.gov/zing/srw/">
  <version>1.1</version>
  <terms>
    <term>
      <value>hack</value>
    </term>
    <term>
      <value>hall</value>
    </term>
    <term>
      <value>ham</value>
    </term>
    <term>
      <value>hand</value>
    </term>
    <term>
      <value>unhappiness</value>
      <extraTermData>
        <rt:requestedTerm xmlns:rt="info:srw/extension/2/requestedTerm-1.0">
          requestedTerm
        </rt:requestedTerm>
      </extraTermData>
    </term>
    <term>
      <value>hart</value>
    </term>
    <term>
      <value>unhastily</value>
    </term>
    <term>
      <value>hatred</value>
    </term>
    <term>
      <value>disheartened</value>
    </term>
    <term>
      <value>hertz</value>
    </term>
  </terms>
</scanResponse>
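
A client could locate the marked term in such a response roughly as follows. This is a sketch using Python's standard xml.etree.ElementTree module, not a reference implementation; the function name find_marked_terms and the shortened SAMPLE document are assumptions made for the illustration.

```python
import xml.etree.ElementTree as ET

SRW_NS = "http://www.loc.gov/zing/srw/"
RT_NS = "info:srw/extension/2/requestedTerm-1.0"


def find_marked_terms(scan_response_xml):
    """Return {identifier: (index, value)} for every marked term.

    The identifier is one of previousTerm, requestedTerm or subsequentTerm;
    the index is the zero-based position of the term in the response.
    """
    root = ET.fromstring(scan_response_xml)
    terms = root.findall(f"{{{SRW_NS}}}terms/{{{SRW_NS}}}term")
    marked = {}
    for index, term in enumerate(terms):
        value = term.findtext(f"{{{SRW_NS}}}value")
        marker = term.find(
            f"{{{SRW_NS}}}extraTermData/{{{RT_NS}}}requestedTerm"
        )
        if marker is not None and marker.text:
            # strip() tolerates whitespace around the identifier.
            marked[marker.text.strip()] = (index, value)
    return marked


# Shortened, hypothetical response in the same shape as the example.
SAMPLE = """<scanResponse xmlns="http://www.loc.gov/zing/srw/">
  <version>1.1</version>
  <terms>
    <term><value>hand</value></term>
    <term>
      <value>unhappiness</value>
      <extraTermData>
        <rt:requestedTerm xmlns:rt="info:srw/extension/2/requestedTerm-1.0">
          requestedTerm
        </rt:requestedTerm>
      </extraTermData>
    </term>
    <term><value>hart</value></term>
  </terms>
</scanResponse>"""

marked = find_marked_terms(SAMPLE)
print(marked)
```

If no term carries the requestedTerm identifier, a client might fall back to the previousTerm or subsequentTerm entries in the returned mapping to decide where to place the user's term in the list.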