
Qs: parallel reading #5

Open
raacampbell opened this issue Nov 18, 2015 · 4 comments
@raacampbell

I think knossos_cuber reads the image files serially. With data on a RAID volume, reading can be sped up substantially by doing it in parallel, so it might be worth adding an option for this.
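
Roughly what I'm imagining, as an untested sketch (`tifffile.imread` here is just a stand-in for whatever reader the cuber actually uses):

```python
# Untested sketch: read a TIFF stack with a pool of threads instead of a plain loop.
# tifffile.imread is only a placeholder for whatever reader knossos_cuber uses.
from concurrent.futures import ThreadPoolExecutor

import tifffile


def read_stack(paths, num_threads=8):
    # pool.map preserves input order, so the slices come back in stack order.
    with ThreadPoolExecutor(max_workers=num_threads) as pool:
        return list(pool.map(tifffile.imread, paths))
```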

@jmrk84 (Member) commented Nov 18, 2015

Reading the TIFF stack is indeed serial, I think, but for a more or less good reason: for reasonably large TIFF/image source files there should be no real advantage to reading in parallel, because the step is mainly I/O-limited rather than CPU-limited. If the image files use some expensive compression, that might actually be different. Writing the cubes, as well as all further operations of the cuber, is already heavily parallelized. Do you use compressed input image files?

@raacampbell (Author)

I use uncompressed TIFFs of about 100 to 200 MB each. I find reading in parallel is much faster when working from a RAID volume. I use btrfs RAID 1 and read with about one to two threads per drive. I've seen similar results with hardware RAID 1+0, though there the optimum was one thread per drive.

Fresh benchmarks:

Hardware: 8× 4 TB btrfs RAID 1; Intel i7 with 8 cores; 64 GB RAM

I read 484 uncompressed TIFFs, each 201 MB in size. The system cache was cleared before each run.

  1. Serial read: 1041 seconds
  2. 8 threads: 281 seconds (3.7x speedup)
  3. 16 threads: 231 seconds (4.5x speedup)

@jmrk84 (Member) commented Nov 19, 2015

Thanks for looking into that; it does look like this could be optimized! Could you make a pull request with your modifications that use multiple threads for reading into the numpy array?
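
Something along these lines is roughly what I have in mind; this is only an untested sketch with placeholder names (including the `tifffile.imread` reader), not the actual cuber code:

```python
# Rough, untested sketch: fill a preallocated numpy volume slice by slice from a
# thread pool. The reader and all names here are placeholders, not the actual
# knossos_cuber internals.
from concurrent.futures import ThreadPoolExecutor

import numpy as np
import tifffile


def read_stack_into_array(paths, num_threads=16):
    # Read one slice up front to learn the shape and dtype of the stack.
    first = tifffile.imread(paths[0])
    volume = np.empty((len(paths),) + first.shape, dtype=first.dtype)
    volume[0] = first

    def load(i):
        # Each worker writes only into its own slice, so no locking is needed.
        volume[i] = tifffile.imread(paths[i])

    with ThreadPoolExecutor(max_workers=num_threads) as pool:
        # list() consumes the iterator so any worker exception is raised here.
        list(pool.map(load, range(1, len(paths))))

    return volume
```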

@raacampbell (Author)

Sorry, I wasn't modifying your code to generate those numbers. I just ran a quick benchmark in MATLAB, reading in TIFF files with different numbers of workers using the Parallel Computing Toolbox.

The benchmark uses very low-level code, i.e. I'm not even using MATLAB's TIFF reader; I'm just using the basic fread command to read the raw binary and then reshaping it into the image. So it really is as simple as possible. I don't see why the speedup shouldn't hold for your code as well, since I wasn't doing any decompression or other CPU-intensive steps: it's pure I/O.
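
For reference, a hypothetical Python analogue of what each worker did would be roughly the following; the image dimensions, dtype, and header offset are example values only (roughly a 200 MB uint16 image), not my actual data:

```python
# Hypothetical Python analogue of the MATLAB fread benchmark: read the raw bytes
# and reshape them, with no TIFF decoding at all. The dimensions, dtype, and
# header offset below are example values only.
import numpy as np

HEIGHT, WIDTH = 10240, 10240   # example image dimensions
DTYPE = np.uint16              # example pixel type
HEADER_BYTES = 8               # example fixed offset to the pixel data


def read_raw(path):
    # Read the whole file as raw values past the header, then reshape.
    pixels = np.fromfile(path, dtype=DTYPE, offset=HEADER_BYTES)
    return pixels[:HEIGHT * WIDTH].reshape(HEIGHT, WIDTH)
```

In the MATLAB version, each worker just runs the equivalent of read_raw on its share of the files.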
