benchmark on c3potato
Notes on how to run the benchmarks on c3potato.
Ideally, I would like to run asv from outside the source tree. However, that ran into some issues, possibly related to the non-standard setup of the MDAnalysis/mdanalysis repo (we need the new `"repo_subdir": "package"` setting in the JSON config file, from merged asv PR 611, so that we can use `package/setup.py`); this feature is available in asv >= 0.3.0 (we currently run 0.5.1 in a Python 3.9 conda environment). At the moment I run the benchmarks from inside the checked-out repo (in the `benchmarks` directory) and store the results and environments elsewhere; a sketch of the relevant configuration settings follows the directory layout below. This allows making the results a separate repo without fear of interference or having to use git subrepositories.
```
benchmarking/
    benchmarks/            # the MDAnalysis/benchmarks repo
        asv.conf.json
        results/           # all benchmark results
        env/               # cache with asv environments
        html/              # html output, becomes the MDAnalysis/benchmarks gh-pages branch
    repositories/
        asv/               # official asv
        mdanalysis/        # MDAnalysis/mdanalysis
            benchmarks/    # asv benchmark files; benchmarks are run in this directory
            package/       # source code including setup.py
            testsuite/     # unit tests
    miniconda3/            # anaconda environment
```
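For reference, the pieces of the asv JSON config that tie this layout together probably look roughly like the following sketch. This is not the actual `asv_c3potato.conf.json`: the relative paths assume the config file lives in `repositories/mdanalysis/benchmarks/`, and the dependency matrix is omitted. (asv's config format allows `//` comments.)

```
{
    "version": 1,
    "project": "MDAnalysis",
    "repo": "..",                                  // the local mdanalysis clone
    "repo_subdir": "package",                      // build via package/setup.py (asv PR 611)
    "environment_type": "conda",
    "pythons": ["3.8"],
    "env_dir": "../../../benchmarks/env",          // cache with asv environments
    "results_dir": "../../../benchmarks/results",  // committed to MDAnalysis/benchmarks
    "html_dir": "../../../benchmarks/html"         // published to gh-pages
}
```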
I added the function

```bash
function add_miniconda () {
    echo ">> adding miniconda3 to PATH"
    # added by Miniconda3 installer
    export PATH="${HOME}/MDA/miniconda3/bin:$PATH"
}
```

to `.bashrc` so that I can add the miniconda environment on demand.
I installed miniconda3. It is not enabled by default, so to get started do

```bash
add_miniconda
source activate benchmark
```

and work in the `benchmark` environment.
Benchmarks take a long time, so run them in byobu.
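A minimal sketch of such a session (the session name `asv` is arbitrary, and `byobu` is assumed to use its default tmux backend):

```bash
byobu new-session -s asv    # start a named session; re-attach later with: byobu attach -t asv
add_miniconda
source activate benchmark
# ... start the long asv run here, then detach with F6; the run keeps going
```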
Since MDAnalysis 2.0 we only support Python ≥3.7. The benchmark matrix was changed to run all benchmarks in a Python 3.8 environment (see MDAnalysis/benchmarks#11).
The history (merge commits only) was re-run up to release-1.0.1 so that we have a comparison between 3.6 and 3.8.
Starting with the development after 1.0, Python 2.7 support was dropped (June 2020). Benchmarks should run under Python 3 (currently, Python 3.6 – see MDAnalysis/benchmarks# and MDAnalysis/mdanalysis#2747).
- change build matrix from 2.7 to 3.6
- only build history from release 0.17.0 onwards (first full Py 3 support)
- use asv 0.4.2 from conda (instead of pre-0.3.0+Tyler's patch)
Notes:
- asv > 0.3.0 (release) did not work with my version of git 2.1.4 as it wants the `git name-rev --exclude` option, and `--exclude` is only present in later versions of git; our patched version of asv didn't require a newer git. Because updating git is not possible in the distribution, I am installing it via conda and can now use asv 0.4.2.
- Tests failed, saying that numpy was required to build mdanalysis. This was odd because numpy is in the matrix of dependencies and is conda-installed in the initial asv environment. However, it seems that asv created the initial conda environment in a broken state where numpy would not load: `import numpy` failed, indicating some linking error, and sent me to "Troubleshooting ImportError". I manually activated the env and ran `conda update --all` and `conda remove --force mdanalysis`. The update fixed the import issue. (Note that mdanalysis gets installed with mdanalysistests but is typically removed by asv with `pip uninstall mdanalysis`; I am not sure how well this works on a conda package, so I did it explicitly at the conda level. The test package is needed for trajectory data.) Not sure if the numpy issue was a glitch or a more general problem. An initial test run
  ```
  asv run --config asv_c3potato.conf.json --bench GROReadBench -v b468e574146f139f798b685db19740f5c3f58e95..3da643f37bec7ca28a8807f0948e2de3fce79014
  ```
  is working.
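For reference, installing a newer git together with asv 0.4.2 at the conda level would look something like this (the conda-forge channel is an assumption; both packages are available there):

```
conda install -c conda-forge git asv=0.4.2
```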
Build initial history (started 2020-06-12: *Running 8502 total benchmarks (109 commits * 1 environments * 78 benchmarks)*):

```
asv run --config asv_c3potato.conf.json -e -j 4 "release-0.17.0..HEAD --merges" 2>&1 | tee asv_36_log_0.txt
```
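Since the output goes through `tee`, progress can be checked from another byobu window (assuming the run was started in `~/MDA/repositories/mdanalysis/benchmarks`):

```
tail -f ~/MDA/repositories/mdanalysis/benchmarks/asv_36_log_0.txt
```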
The Python 2.7 benchmarks were retired with 1.0 but the data are still present. The following is kept for historical reasons.
```
cd ~/MDA/repositories/mdanalysis/benchmarks
asv run --config asv_c3potato.conf.json -e -j 4 "release-0.11.0..HEAD --merges" 2>&1 | tee asv_log.txt
```
We are running the benchmarks since release 0.11.0 because the transition from 0.10 to 0.11 broke so many things in the API that it is too painful to write the performance tests to also cater to the pre-0.11.0 code. (At least for now.)
Re-run benchmarks from the last release (instead of release 0.18.0), e.g.

```
asv run --config asv_c3potato.conf.json -e -j 4 "release-1.0.0..HEAD --merges"
```
To run just the new benchmark `NAME` from the beginning:

```
asv run --config asv_c3potato.conf.json -e -j 4 "release-0.18.0..HEAD --merges" --bench NAME
```
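asv interprets the `--bench` argument as a regular expression matched against benchmark names, so a distinctive substring is enough; e.g., for the `GROReadBench` benchmark mentioned above:

```
asv run --config asv_c3potato.conf.json -e -j 4 "release-0.18.0..HEAD --merges" --bench GROReadBench
```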
The results are stored in the git repository https://github.com/MDAnalysis/benchmarks/ in the master branch.
```
cd benchmarking/benchmarks
asv publish
git add results/
git commit -m 'updated benchmarks'
git push
```
We want to create HTML pages of the results and push them to the gh-pages branch of the MDAnalysis/benchmarks repo. In principle `asv gh-pages` should do this automatically. In practice it fails for this setup with a cryptic failure at the push step ("asv.util.ProcessError: Command '/usr/bin/git push origin gh-pages' returned non-zero exit status 1").
For right now, force-push manually:
```
cd benchmarking/benchmarks
asv publish              # might be superfluous
asv gh-pages --no-push
git push origin +gh-pages
```
Check the output on https://www.mdanalysis.org/benchmarks/.