osirrc/jass-docker

OSIRRC Docker Image for JASS

Andrew Trotman

This readme is heavily based on (i.e. copied from) the Anserini readme.

This is the Docker image for JASS, conforming to the OSIRRC jig for the Open-Source IR Replicability Challenge (OSIRRC) at SIGIR 2019. This image is available on Docker Hub. The OSIRRC 2019 image library contains a log of successful executions of this image.

JASS is not a fully stand-alone search system. It is only the search engine, and it relies on ATIRE for indexing and other services. As JASS has been forked several times, this is the version found in the JASSv2 repo.

  • Supported test collections: robust04 and core17.
  • Supported hooks: init, index, search

Quick Start

The following jig command can be used to index TREC disks 4/5 for robust04:

python3 run.py prepare \
  --repo osirrc2019/atire \
  --tag v0.1.0 \
  --collections robust04=/path/to/disk45=trectext

For example:

python3 run.py prepare --repo jass/osirrc2019 --tag v0.1.0 \
 --collections robust04=/Users/andrew/programming/JASSv2/docker/osirrc2019/robust04=trectext
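The --collections value packs three fields into one string of the form name=path=format. As a rough illustration of that convention only (this is not the jig's own parsing code, and parse_collection_spec is a hypothetical helper), the string can be split like this:

```python
def parse_collection_spec(spec):
    """Split a 'name=path=format' string as passed to --collections."""
    # Take the name off the front and the format off the back, so that a
    # path which itself contains '=' is still kept in one piece.
    name, rest = spec.split("=", 1)
    path, fmt = rest.rsplit("=", 1)
    return {"name": name, "path": path, "format": fmt}


print(parse_collection_spec("robust04=/path/to/disk45=trectext"))
```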

The following jig command can be used to perform a retrieval run on the robust04 test collection:

python3 run.py search \
  --repo osirrc2019/atire \
  --tag v0.1.0 \
  --output out/atire \
  --qrels qrels/qrels.robust04.txt \
  --topic topics/topics.robust04.txt \
  --collection robust04 \
  --top_k 100

For example:

python3 run.py search --repo jass/osirrc2019 --tag v0.1.0 --collection robust04 \
 --topic topics/topics.robust04.txt --top_k 100 \
 --output /Users/andrew/programming/osirrc2019/jass-docker/output --qrels qrels/qrels.robust04.txt

Retrieval Methods

This instance of JASS uses BM25 from ATIRE with the default parameters. JASS requires an impact-ordered index, which is generated by ATIRE and then converted into the JASS index format.
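To make "impact-ordered" concrete, here is a minimal sketch, not ATIRE's or JASS's actual code. It assumes one common BM25 form with a log(N/df) IDF and a simple uniform quantization of scores into integer impacts; the function names, the 255-level range, and the postings layout are all illustrative assumptions.

```python
import math
from collections import defaultdict

K1, B = 0.9, 0.4  # the BM25 parameters quoted in this readme


def bm25_term_score(tf, df, doc_len, avg_len, n_docs, k1=K1, b=B):
    """One common BM25 form; the exact ATIRE variant may differ."""
    idf = math.log(n_docs / df)
    norm = k1 * ((1 - b) + b * doc_len / avg_len)
    return idf * ((k1 + 1) * tf) / (norm + tf)


def quantize(score, lo, hi, levels=255):
    """Map a score linearly onto an integer impact in 1..levels (assumed scheme)."""
    if hi == lo:
        return levels
    return 1 + int((score - lo) / (hi - lo) * (levels - 1))


def impact_ordered(postings, lo, hi):
    """Group (doc_id, score) postings by impact, highest impact first."""
    segments = defaultdict(list)
    for doc_id, score in postings:
        segments[quantize(score, lo, hi)].append(doc_id)
    return sorted(segments.items(), reverse=True)


# Hypothetical postings for one term, with scores already computed.
print(impact_ordered([(1, 0.0), (2, 1.0), (3, 1.0)], lo=0.0, hi=1.0))
```

Because each term's postings are stored from highest impact to lowest, a search engine over such an index can process the most promising segments first and stop early, which is the idea behind the anytime ranking described by Lin and Trotman (2015).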

Expected Results

The following numbers should be reproducible using the scripts provided by the jig.

robust04

TREC 2004 Robust Track Topics.

  • BM25: k1=0.9, b=0.4 (Robertson et al., 1995)

Metric  Score
MAP     0.1984
P@30    0.2991

core17

TREC 2017 Common Core Track Topics.

  • BM25: k1=0.9, b=0.4 (Robertson et al., 1995)

Metric  Score
MAP     0.1415
P@30    0.4080
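For readers unfamiliar with the two metrics reported above, the following is a small sketch of their textbook definitions. It is not trec_eval itself, which applies additional rules (for example around ties and which topics are averaged), and the document identifiers are made up.

```python
def precision_at_k(ranked, relevant, k=30):
    """Fraction of the top k results that are relevant.

    Divides by k even when fewer than k results were returned.
    """
    return sum(1 for doc in ranked[:k] if doc in relevant) / k


def average_precision(ranked, relevant):
    """Mean of the precision values at each rank where a relevant doc appears."""
    if not relevant:
        return 0.0
    hits = 0
    total = 0.0
    for rank, doc in enumerate(ranked, start=1):
        if doc in relevant:
            hits += 1
            total += hits / rank
    # Relevant documents that were never retrieved contribute zero.
    return total / len(relevant)


def mean_average_precision(runs, qrels):
    """runs: {topic: ranked doc ids}; qrels: {topic: set of relevant doc ids}."""
    return sum(average_precision(runs.get(t, []), rel) for t, rel in qrels.items()) / len(qrels)


ranked = ["d1", "d2", "d3", "d4"]   # hypothetical ranking for one topic
relevant = {"d1", "d3", "d9"}       # hypothetical relevance judgements
print(precision_at_k(ranked, relevant, k=2), average_precision(ranked, relevant))
```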

Implementation

The following is a quick breakdown of what happens in each of the scripts in this repo.

Dockerfile

The Dockerfile installs dependencies (python3, etc.), copies scripts to the root dir, and sets the working dir to /work.

init

The init script is straightforward: it is a shell script (via the #!/usr/bin/env sh shebang) that downloads and builds ATIRE and JASS.

index

The index Python script (via the #!/usr/bin/python3 shebang) reads a JSON string (see here) containing at least one collection to index (including the name, path, and format). The collection is indexed and placed in the current working directory (i.e., /work). At this point, jig takes a snapshot and the indexed collections are persisted for the search hook.
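As a sketch of the input handling described above, and not the actual index script: the --json flag name, the "collections" key, and the example path are assumptions here, while the name, path, and format fields follow the description.

```python
import argparse
import json


def read_collections(argv):
    """Return (name, path, format) for each collection in the JSON argument."""
    parser = argparse.ArgumentParser()
    # Assumed flag name; the value is parsed straight from a JSON string.
    parser.add_argument("--json", type=json.loads, required=True)
    args, _unknown = parser.parse_known_args(argv)
    return [(c["name"], c["path"], c["format"]) for c in args.json["collections"]]


# Hypothetical input; the real paths are supplied by the jig.
example = json.dumps({
    "collections": [
        {"name": "robust04", "path": "/input/collections/robust04", "format": "trectext"}
    ]
})
for name, path, fmt in read_collections(["--json", example]):
    print(f"would index {name} ({fmt}) from {path} into /work")
```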

search

The search script reads a JSON string (see here) containing the collection name (to map back to the index directory from the index hook) and topic path, among other options. The retrieval run is performed and output is placed in /output for the jig to evaluate using trec_eval.
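Since trec_eval consumes run files in the standard TREC format (topic, the literal Q0, document id, rank, score, run tag), the output step can be sketched as follows. This is an illustration rather than the actual search script; the function name, run tag, and document ids are hypothetical.

```python
import io


def write_trec_run(results, out, tag="JASS", top_k=100):
    """Write {topic_id: [(doc_id, score), ...]} as TREC run lines."""
    for topic_id, hits in results.items():
        # Highest score first, truncated to the requested depth.
        ranked = sorted(hits, key=lambda h: h[1], reverse=True)[:top_k]
        for rank, (doc_id, score) in enumerate(ranked, start=1):
            out.write(f"{topic_id} Q0 {doc_id} {rank} {score:.4f} {tag}\n")


buf = io.StringIO()  # stands in for a file under /output
write_trec_run({"301": [("FBIS3-1", 7.5), ("LA010189-0001", 9.25)]}, buf)
print(buf.getvalue(), end="")
```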

References

  • S. E. Robertson, S. Walker, M. Hancock-Beaulieu, M. Gatford, and A. Payne (1995). Okapi at TREC-4. TREC.
  • A. Trotman, X.-F. Jia, and M. Crane (2012). Towards an Efficient and Effective Search Engine. Proceedings of the SIGIR 2012 Workshop on Open Source Information Retrieval, pp. 40-47.
  • Y. Lv and C. Zhai (2011). Lower-Bounding Term Frequency Normalization. CIKM 2011, pp. 7-16.
  • J. Lin and A. Trotman (2015). Anytime Ranking for Impact-Ordered Indexes. ICTIR 2015, pp. 301-304.

Reviews