Chunk reads from mount need improving #323

Open
sb10 opened this issue Jul 1, 2020 · 0 comments
sb10 commented Jul 1, 2020

S3-mounting jobs were causing millions of 500 errors from Ceph. From the user's end, nothing seems wrong; jobs completed fine. After completing 32k iRODS upload jobs (the ones that actually read the cram from S3), the mean walltime was only 26s.

slowread.pl:

#!/usr/bin/env perl
# read the given file in random-sized chunks with short random pauses,
# to mimic a slow streaming consumer
use warnings;
use strict;
use Time::HiRes qw(usleep);

open my $f, "<", shift or die($!);
binmode($f);
my $buf;
# read between 10 bytes and ~100KB at a time
while (my $len = read($f, $buf, 100000 * rand() + 10))
{
    usleep 1000 * rand(5);    # pause up to 5ms between reads
}
close $f;

irods_startup.sh:

#!/bin/bash
sudo su - <<'SU'
cat >> /etc/resolv.conf <<'RSVCF'
search sanger.ac.uk internal.sanger.ac.uk hinxtonit.com
RSVCF
SU
cd
umask 0077
mkdir -p .irods
cat > .irods/irods_environment.json <<"ICRW"
{
   "irods_cwd": "/Sanger1-dev/home/sb10",
   "irods_home": "/Sanger1-dev/home/sb10",
   "irods_host": "[redacted]",
   "irods_port": 1247,
   "irods_ssl_ca_certificate_file": "/etc/irods/ca.pem",
   "irods_user_name": "sb10",
   "irods_zone_name": "Sanger1-dev"
}
ICRW
export IRODS_ENVIRONMENT_FILE=.irods/irods_environment.json
expect <<'IERW' || echo 'RW iRODS not configured'
set timeout -1
spawn iinit
match_max 100000
expect -exact "Enter your current iRODS password:"
send -- "<password>\r"
expect eof
IERW
wr cloud deploy --debug -f m1.medium -o 'Ubuntu 12.04 Precise + NPG 20180212' --network_dns 172.18.255.1,172.18.255.3 --max_servers 1 -c "~home/tmp/slowread.pl:~/slowread.pl" -s ~home/tmp/irods_startup.sh
wr mount --mount_json '[{"Targets":[{"Path":"npg-cloud-realign-wip/IHTP_ISC_DDD_FY4"}]}]' &
find mnt -name \*.cram | head -n 10000 > crams.10k
fg
ctrl-c
split -l 1000 crams.10k crams.

cat ~home/tmp/crams.af | perl -e 'use File::Basename; while (<>) { chomp; $_ =~ s/^mnt\///; $p = $_; $c = basename($p); $d = $p; $d =~ s/$c//;; print qq~echo \047{"operation": "put", "arguments": {}, "target": {"directory":"$d", "file": "$c", "collection":"/Sanger1-dev/home/sb10", "data_object": "$c"}}\047 | baton-do && irm /Sanger1-dev/home/sb10/$c\n~; }' | wr add -i baton1000ab --mount_json '[{"Targets":[{"Path":"npg-cloud-realign-wip/IHTP_ISC_DDD_FY4"}],"Verbose":true}]' -r 0 -o 2 --memory 200MB

Confirmed the issue! Apparently, even though baton is normally single-threaded and does a simple streaming read, if the file is large it uses the iRODS put code (a call via its C API) to do the putting, which is multi-threaded and reads chunks in parallel.
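To picture what the mount sees, here is a minimal Go sketch (not baton's or iRODS's actual code; the path and the 32MB chunk size are made up) of that access pattern: several goroutines each read their own chunk of the same large file at once, so a mount that answers every read with a range request from that offset to EOF starts one large, mostly-abandoned S3 GET per chunk.

package main

import (
	"fmt"
	"os"
	"sync"
)

const chunkSize = 32 << 20 // illustrative transfer chunk size (32MB)

func main() {
	// a large cram on the wr S3 mount (hypothetical path)
	f, err := os.Open("mnt/some.cram")
	if err != nil {
		panic(err)
	}
	defer f.Close()

	info, err := f.Stat()
	if err != nil {
		panic(err)
	}
	size := info.Size()

	// read every chunk of the file concurrently, as a multi-threaded put does
	var wg sync.WaitGroup
	for off := int64(0); off < size; off += chunkSize {
		wg.Add(1)
		go func(off int64) {
			defer wg.Done()
			buf := make([]byte, chunkSize)
			// each ReadAt lands at a different offset; if the mount
			// serves reads with "offset to EOF" range requests, every
			// chunk becomes its own large S3 GET that is then dropped
			n, _ := f.ReadAt(buf, off)
			fmt.Printf("read %d bytes at offset %d\n", n, off)
		}(off)
	}
	wg.Wait()
}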

Confirm by forcing normal single-threaded behaviour:

cat ~home/tmp/crams.ae | perl -e 'use File::Basename; while (<>) { chomp; $_ =~ s/^mnt\///; $p = $_; $c = basename($p); $d = $p; $d =~ s/$c//;; print qq~echo \047{"operation": "put", "arguments": {}, "target": {"directory":"$d", "file": "$c", "collection":"/Sanger1-dev/home/sb10", "data_object": "$c"}}\047 | baton-do --single-server && irm /Sanger1-dev/home/sb10/$c\n~; }' | wr add -i baton1000abSingle --mount_json '[{"Targets":[{"Path":"npg-cloud-realign-wip/IHTP_ISC_DDD_FY4"}],"Verbose":true}]' -r 0 -o 2 --memory 200MB

No problems. Double-check that slow reads don't cause it on uncached crams:

cat ~home/tmp/crams.ad | perl -e 'use File::Basename; while (<>) { chomp; $_ =~ s/^mnt\///; $p = $_; $c = basename($p); print "perl ~/slowread.pl $p\n" }' | wr add -i slowread1000 --mount_json '[{"Targets":[{"Path":"npg-cloud-realign-wip/IHTP_ISC_DDD_FY4"}],"Verbose":true}]' -r 0 -o 2 --memory 200MB

No problems.

Try turning on caching for the S3 mount?

cat ~home/tmp/crams.ag | perl -e 'use File::Basename; while (<>) { chomp; $_ =~ s/^mnt\///; $p = $_; $c = basename($p); $d = $p; $d =~ s/$c//;; print qq~echo \047{"operation": "put", "arguments": {}, "target": {"directory":"$d", "file": "$c", "collection":"/Sanger1-dev/home/sb10", "data_object": "$c"}}\047 | baton-do && irm /Sanger1-dev/home/sb10/$c\n~; }' | wr add -i baton1000agCached --mounts 'cr:npg-cloud-realign-wip/IHTP_ISC_DDD_FY4' -r 0 -o 2 --memory 200MB

Problem remains. We would have to have a cache mode option that means "if a file is read at all, read and cache all of it; don't do any range requests".
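That mode would amount to something like this sketch (the wholeFileCacher type and the downloadAll helper are hypothetical, not wr's mount API): the first read of any byte triggers one full-object download into the cache, and every later read is served from the local copy.

package mount // a sketch only, not wr's actual code

import (
	"os"
	"sync"
)

// wholeFileCacher implements the mooted cache mode: any read at all causes
// the entire object to be fetched once, with no range requests at all.
type wholeFileCacher struct {
	once        sync.Once
	downloadAll func(dst string) error // one full-object GET (assumed helper)
	cachePath   string
	cached      *os.File
	err         error
}

func (w *wholeFileCacher) ReadAt(p []byte, off int64) (int, error) {
	w.once.Do(func() {
		if w.err = w.downloadAll(w.cachePath); w.err != nil {
			return
		}
		w.cached, w.err = os.Open(w.cachePath)
	})
	if w.err != nil {
		return 0, w.err
	}
	return w.cached.ReadAt(p, off)
}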

Or better: instead of doing range requests to EOF, do them behind the scenes in 4MB chunks (or start with small chunks, learn how much is actually being read, and use that size).
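A rough sketch of that, assuming a fetch(offset, length) callback that issues a single S3 range request (the names and sizes are illustrative, not wr's mount internals): serve reads from fixed-size ranged GETs, and double the readahead while access looks sequential, so a genuine streaming reader converges on a few large requests while a slow or partial reader never asks for more than the current chunk.

package main

import (
	"fmt"
	"io"
)

const (
	minChunk = 4 << 20   // start with 4MB range requests
	maxChunk = 128 << 20 // cap on how large the readahead can grow
)

// chunkedReader serves ReadAt calls from fixed-size ranged GETs instead of a
// single "offset to EOF" request, doubling the chunk size while access looks
// sequential.
type chunkedReader struct {
	fetch    func(offset, length int64) ([]byte, error) // one S3 range request
	size     int64 // total object size
	chunk    int64 // current readahead size
	bufStart int64 // object offset of buf[0]
	buf      []byte
	lastEnd  int64 // where the previous read finished, to detect streaming
}

func newChunkedReader(fetch func(int64, int64) ([]byte, error), size int64) *chunkedReader {
	return &chunkedReader{fetch: fetch, size: size, chunk: minChunk, lastEnd: -1}
}

// ReadAt can return a short read at a chunk boundary; callers loop as usual.
func (r *chunkedReader) ReadAt(p []byte, off int64) (int, error) {
	if off >= r.size {
		return 0, io.EOF
	}
	// reads that continue where the last one stopped look like streaming,
	// so grow the chunk size rather than issuing many small requests
	if off == r.lastEnd && r.chunk < maxChunk {
		r.chunk *= 2
	}

	// refill the buffer if the requested offset is not already cached
	if off < r.bufStart || off >= r.bufStart+int64(len(r.buf)) {
		length := r.chunk
		if off+length > r.size {
			length = r.size - off
		}
		data, err := r.fetch(off, length)
		if err != nil {
			return 0, err
		}
		r.bufStart, r.buf = off, data
	}

	n := copy(p, r.buf[off-r.bufStart:])
	r.lastEnd = off + int64(n)
	return n, nil
}

func main() {
	// usage example against an in-memory "object" standing in for S3
	object := make([]byte, 10<<20)
	r := newChunkedReader(func(off, length int64) ([]byte, error) {
		return object[off : off+length], nil
	}, int64(len(object)))

	buf := make([]byte, 64*1024)
	var off int64
	for {
		n, err := r.ReadAt(buf, off)
		off += int64(n)
		if err == io.EOF {
			break
		}
	}
	fmt.Println("streamed", off, "bytes")
}

Capping the growth bounds the worst case, and a random-access reader only ever costs one small ranged GET per cache miss.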
