Improve torchcodec's performance for small video sizes #192

ahmadsharif1 · 2024-08-26T14:19:09Z

I ran some benchmarks with smaller videos and found that torchcodec's sampler is slower than torchvision's sampler. I then made some changes that gave me roughly a 5x speedup over the current code. I will be submitting PRs over the next few weeks, but I wanted to document the inefficiencies here:

- The filtergraph takes a while to set up. We could either share the filtergraph across instances or use libswscale to do the color conversion.
- Batch decoding is slow because we incur an extra frame memcpy after the actual decoding.
- The temporary IO buffer is too big. We could get away with a few KB of allocation, not MBs like we currently do.
- For use cases that need approximate frames, we don't need to scan the file. We can just trust the header and get approximate frames that way using the core API.
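For the filtergraph point, here is a minimal sketch of the sharing idea. The expensive setup is cached on the parameters it depends on, so many decoder instances reuse one object instead of each building its own. `build_conversion_context` and `Decoder` are hypothetical stand-ins, not torchcodec APIs. The cache key (width, height, source pixel format) is an assumption about what the setup depends on. Sharing a real FFmpeg filtergraph would also need care around thread safety, since a filtergraph buffers frames internally.

```python
from functools import lru_cache

# Counts how many times the (expensive) setup actually runs.
BUILD_CALLS = {"count": 0}


@lru_cache(maxsize=32)
def build_conversion_context(width: int, height: int, src_pix_fmt: str) -> tuple:
    """Hypothetical stand-in for creating a filtergraph or SwsContext.

    The result depends only on the arguments, so it can be cached and
    reused by every decoder that sees frames with the same shape/format.
    """
    BUILD_CALLS["count"] += 1
    return ("rgb24-converter", width, height, src_pix_fmt)


class Decoder:
    """Hypothetical decoder that fetches its color converter from the cache."""

    def __init__(self, width: int, height: int, src_pix_fmt: str) -> None:
        self.converter = build_conversion_context(width, height, src_pix_fmt)


# Many small videos with the same resolution and pixel format.
decoders = [Decoder(320, 240, "yuv420p") for _ in range(100)]
print(BUILD_CALLS["count"])  # 1: all 100 decoders share one setup
```

The design choice here is to make the setup a pure function of its inputs. If the setup also depends on other stream properties, those properties would have to be part of the key.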
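For the batch decoding point, this sketch shows where the extra copy comes from, using plain byte buffers instead of tensors. `convert` stands in for the color-conversion step (for example `sws_scale` or the filtergraph output), and all names are hypothetical. The first version converts each frame into a temporary buffer and then copies it into the batch. The second allocates the batch once and converts straight into each frame's slice, so every output byte is written only once.

```python
FRAME_BYTES = 4 * 4 * 3  # hypothetical tiny 4x4 RGB24 frame
copies = {"bytes": 0}    # total bytes written across all copy steps


def convert(src: bytes, dst: memoryview) -> None:
    """Hypothetical color-conversion step: writes output pixels into dst."""
    dst[:] = src  # stand-in for sws_scale / filtergraph output
    copies["bytes"] += len(src)


def decoded_frame(index: int) -> bytes:
    """Hypothetical decoder output for frame `index`."""
    return bytes([index % 256]) * FRAME_BYTES


def batch_with_extra_copy(indices):
    batch = bytearray(len(indices) * FRAME_BYTES)
    for i, idx in enumerate(indices):
        frame = bytearray(FRAME_BYTES)  # per-frame temporary
        convert(decoded_frame(idx), memoryview(frame))
        start = i * FRAME_BYTES
        batch[start:start + FRAME_BYTES] = frame  # the extra memcpy
        copies["bytes"] += FRAME_BYTES
    return batch


def batch_in_place(indices):
    batch = bytearray(len(indices) * FRAME_BYTES)  # allocate output once
    view = memoryview(batch)
    for i, idx in enumerate(indices):
        start = i * FRAME_BYTES
        # Convert directly into this frame's slice of the batch.
        convert(decoded_frame(idx), view[start:start + FRAME_BYTES])
    return batch


indices = [0, 2, 4, 6]

copies["bytes"] = 0
a = batch_with_extra_copy(indices)
extra = copies["bytes"]

copies["bytes"] = 0
b = batch_in_place(indices)
direct = copies["bytes"]

print(a == b, extra, direct)  # True 384 192
```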
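For the IO buffer point, the buffer size only bounds how much is read per call, not how much data can be read in total. This sketch assumes the temporary buffer is the one handed to FFmpeg's custom IO (the `buffer_size` argument of `avio_alloc_context`). `InMemoryReader.read_packet` mirrors the shape of such a read callback: it fills at most `len(buf)` bytes and returns the count. The names are hypothetical. Returning 0 at end of data is a simplification, since FFmpeg read callbacks signal end of stream with `AVERROR_EOF`.

```python
class InMemoryReader:
    """Hypothetical read callback over an in-memory encoded video."""

    def __init__(self, data: bytes) -> None:
        self.data = data
        self.pos = 0
        self.calls = 0

    def read_packet(self, buf: memoryview) -> int:
        """Copy up to len(buf) bytes into buf; return 0 once data runs out."""
        self.calls += 1
        n = min(len(buf), len(self.data) - self.pos)
        buf[:n] = self.data[self.pos:self.pos + n]
        self.pos += n
        return n


def drain(reader: InMemoryReader, buffer_size: int) -> bytes:
    """Read everything through one reusable buffer of `buffer_size` bytes."""
    io_buffer = bytearray(buffer_size)  # allocated once, reused for every read
    view = memoryview(io_buffer)
    out = bytearray()
    while True:
        n = reader.read_packet(view)
        if n == 0:
            break
        out += view[:n]
    return bytes(out)


video = bytes(range(256)) * 8000  # ~2 MB stand-in for a small encoded video
reader = InMemoryReader(video)
result = drain(reader, buffer_size=4096)
print(result == video, reader.calls)  # True 501
```

A 4 KB buffer reads the whole 2 MB input in 500 full reads plus one final empty read.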
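For the approximate-frames point, this sketch maps between timestamps and frame indices using only header metadata, with no scan of the file. It assumes a constant frame rate. `StreamHeader` and the helpers are hypothetical and loosely modeled on the per-stream average frame rate and duration that FFmpeg exposes (e.g. `AVStream.avg_frame_rate`). On variable-frame-rate videos the result can be off, which is why this only fits use cases that tolerate approximate frames.

```python
import math
from dataclasses import dataclass


@dataclass
class StreamHeader:
    """Hypothetical subset of stream metadata read from the container header."""
    average_fps: float
    duration_seconds: float

    @property
    def approx_num_frames(self) -> int:
        # Trusting the header instead of counting frames by scanning.
        return int(self.duration_seconds * self.average_fps)


def approx_frame_index(header: StreamHeader, pts_seconds: float) -> int:
    """Map a timestamp to a frame index, assuming a constant frame rate."""
    index = math.floor(pts_seconds * header.average_fps)
    # Clamp to the valid range implied by the header.
    return min(max(index, 0), header.approx_num_frames - 1)


def approx_pts_seconds(header: StreamHeader, index: int) -> float:
    """Timestamp to seek to for frame `index`, again assuming a constant fps."""
    return index / header.average_fps


header = StreamHeader(average_fps=30.0, duration_seconds=10.0)
print(header.approx_num_frames)          # 300
print(approx_frame_index(header, 2.5))   # 75
print(approx_pts_seconds(header, 75))    # 2.5
```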

With all these improvements, the per-batch sampler time went from about 200ms to about 37ms when sampling with frames_per_clip=4, clips_per_video=1, dilation=2, on videos that were all under 2MB each.

(Original time was about 200ms per frame -- so the total gain is more than 5x for short videos).
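The reported drop from about 200ms to about 37ms per batch is roughly a 5.4x speedup (200 / 37 ≈ 5.4). The sampling parameters determine which frames each batch decodes. This sketch assumes `dilation` is the index step between consecutive frames within a clip and that each clip starts at a uniformly random position. `sample_clip_indices` is hypothetical and is not torchcodec's or torchvision's actual sampler. With frames_per_clip=4 and dilation=2, a clip covers indices s, s+2, s+4, s+6, spanning 7 frames.

```python
import random


def sample_clip_indices(num_frames, frames_per_clip, clips_per_video, dilation, rng):
    """Hypothetical random clip sampler.

    Each clip is `frames_per_clip` frame indices spaced `dilation` apart,
    so a clip spans (frames_per_clip - 1) * dilation + 1 frames.
    """
    span = (frames_per_clip - 1) * dilation + 1
    if span > num_frames:
        raise ValueError("video too short for the requested clip")
    clips = []
    for _ in range(clips_per_video):
        # Pick a start so that the last index stays inside the video.
        start = rng.randrange(num_frames - span + 1)
        clips.append([start + k * dilation for k in range(frames_per_clip)])
    return clips


rng = random.Random(0)
clips = sample_clip_indices(num_frames=120, frames_per_clip=4,
                            clips_per_video=1, dilation=2, rng=rng)
print(clips)  # one clip of 4 indices spaced 2 apart
```

For short videos, only these few frames are decoded per batch, so fixed per-video costs such as filtergraph setup and IO buffer allocation make up a large share of the batch time.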

ahmadsharif1 self-assigned this Aug 26, 2024
