Test fixes, upgrade quickcheck #640

Open
wants to merge 20 commits into
base: master

Conversation

PetrGlad
Contributor

  1. quickcheck 1.x generates more varied input values than 0.9, which exposed overflow cases and excessive memory allocation problems in Sample::lerp. Calculations now use f64 to avoid that. Conversion back to 16-bit resolution is done explicitly, since an `as` conversion may silently discard the most significant bits. Alternatively, the interpolation coefficient (numerator/denominator) could use e.g. u16 instead of u32, or maybe even floating point. I believe u32 precision is unnecessary there, but that would require updating the sample rate converter, which at the moment is the only user of this interpolation.
  2. Documentation for Sample::lerp was incorrect. It said that calculations should follow c * first + (1 - c) * second, which gives first at c==1 and second at c==0, but the actual implementations do the opposite: first at c==0 and second at c==1.
  3. Fix documentation examples. Some of the examples are marked as no_run since they require audio devices and may actually make sounds while the cargo test command is running.
  4. Include documentation tests in CI. The --all-targets switch excludes doc tests, so those are executed separately.
  5. Non-experimental builds exclude some code, and some parts of the code use mutually exclusive implementations, so I included those in CI too to keep the experimental code compilable.
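
The interpolation described in items 1 and 2 can be sketched as follows. This is a minimal illustration under stated assumptions, not rodio's actual code: `lerp_i16` is a hypothetical free function, and it widens to `i64` (the integer type mentioned later in this thread) so the intermediate product cannot overflow. It returns `first` when the coefficient is 0 and `second` when it is 1, and it converts back to 16 bits with a checked `try_from` instead of `as`.

```rust
/// Hypothetical helper, not rodio's API: linear interpolation between two
/// 16-bit samples with coefficient `numerator / denominator`.
fn lerp_i16(first: i16, second: i16, numerator: u32, denominator: u32) -> i16 {
    assert!(denominator != 0, "denominator must be non-zero");
    assert!(numerator <= denominator, "coefficient must be within 0..=1");
    // Widen to i64: |second - first| <= 65_535 and numerator <= u32::MAX,
    // so the product stays far below i64::MAX.
    let diff = i64::from(second) - i64::from(first);
    let value = i64::from(first) + diff * i64::from(numerator) / i64::from(denominator);
    // Checked conversion: panics instead of silently dropping the high bits.
    i16::try_from(value).expect("interpolated value should fit in i16")
}
```

With this convention, `lerp_i16(a, b, 0, d)` yields `a` and `lerp_i16(a, b, d, d)` yields `b` for any non-zero `d`.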

Collaborator

@dvdsk dvdsk left a comment

Nice fixes and improvements 👍 , thank you very much :) . This must have taken a while!

There are a few things I am unsure about, as you'll see in the discussion. It's a bit too late for me to check the performance impact of the try_from. I'll see if I can add a new interpolate benchmark to rodio tomorrow.

Restoring len() test, it is actually useful to check
`SampleRateConverter::size_hint()` implementation.
Although its implementation is not precise (see TODOs).
@dvdsk
Collaborator

dvdsk commented Nov 16, 2024

Love the work here; unfortunately I cannot look at it for the next few days. After the weekend I'll see if I can come up with a benchmark for the resampler. We need one anyway (I want to introduce a hi-fi resampler in the future). I'll do a detailed review and answer all the questions that came up then.

@PetrGlad
Contributor Author

PetrGlad commented Nov 16, 2024

Looking at other filters in rodio, I am now curious if rodio should only provide basic functions and use RustAudio/dasp for complex processing instead. Some of the rodio functionality is already implemented there.

As I understand it, rodio can in most cases use generic parameters to avoid dyn references (but Sink and Mixer still use them). That can probably be slightly more efficient. Is there any other reason rodio cannot use existing external libraries for the sound processing graph?

@dvdsk
Collaborator

dvdsk commented Nov 18, 2024

I am now curious if rodio should only provide basic functions and use RustAudio/dasp for complex processing instead.

Rodio and dasp have fundamentally different goals, which makes it hard and maybe unwise to merge them or to require users to be familiar with both.
While rodio is an audio playback crate, dasp is a signal processing suite. When rodio is used, it's usually not the main feature of the application. The app might be a simple game, a UI with sounds on click, or a podcast app. So rodio should focus on doing most things efficiently while getting out of the user's way. That is why, for example, Sink exists. You do not need it; it adds nothing that cannot be done with the other parts of rodio. The Sink API does, however, cover most use cases and is easy to understand and use. (It's also far from perfect and has its own issues, which we are going to address.)

Some of the rodio functionality is already implemented there.

Rodio predates dasp, so it's easy to turn this around and ask why they do not depend on rodio. Again, it's a difference in goals: dasp wants to have zero dependencies, rodio just wants to be easy to use (so no C deps if possible).

use RustAudio/dasp for complex processing instead

I also think rodio has most features dasp has since #602 landed. I might be mistaken there. And a Source that makes it easy to use dasp from rodio might be interesting.

The sinc interpolator in dasp is interesting, rodio needs an option for a slower but more hifi interpolator. See #584.

Collaborator

@dvdsk dvdsk left a comment

Only AGC example and what is possibly a superfluous --doc argument in ci.yml left and then we can merge this 👍

Edit: oh and the extra asserts you added might need a message, if only to make reading the code easier.

@dvdsk
Collaborator

dvdsk commented Nov 18, 2024

I still have not found the time for the benchmark. As soon as I've done that I'll post the results here and we can decide what to do based on them.

@dvdsk
Collaborator

dvdsk commented Nov 21, 2024

Benchmark results from main on my machine
(see bench/resampler.rs, now on master):

If you rebase on (or merge with) main, I'll rerun them on the PR and we can see if something changed.

Timer precision: 20 ns
resampler       fastest       │ slowest       │ median        │ mean          │ samples │ iters
╰─ resample_to                │               │               │               │         │
   ├─ 8000      2.145 ms      │ 2.639 ms      │ 2.158 ms      │ 2.169 ms      │ 100     │ 100
   ├─ 11025     2.404 ms      │ 3.197 ms      │ 2.414 ms      │ 2.423 ms      │ 100     │ 100
   ├─ 16000     3.015 ms      │ 3.096 ms      │ 3.042 ms      │ 3.044 ms      │ 100     │ 100
   ├─ 22050     3.51 ms       │ 3.726 ms      │ 3.524 ms      │ 3.528 ms      │ 100     │ 100
   ├─ 44100     2.301 ms      │ 3.162 ms      │ 2.446 ms      │ 2.527 ms      │ 100     │ 100
   ├─ 48000     6.308 ms      │ 12.88 ms      │ 6.348 ms      │ 6.5 ms        │ 100     │ 100
   ├─ 88200     9.887 ms      │ 10.62 ms      │ 10.21 ms      │ 10.2 ms       │ 100     │ 100
   ├─ 96000     11.56 ms      │ 21.68 ms      │ 11.84 ms      │ 12.19 ms      │ 100     │ 100
   ├─ 176400    19.08 ms      │ 25.25 ms      │ 19.1 ms       │ 19.23 ms      │ 100     │ 100
   ├─ 192000    22.2 ms       │ 22.92 ms      │ 22.23 ms      │ 22.24 ms      │ 100     │ 100
   ├─ 352800    38.78 ms      │ 39.18 ms      │ 38.81 ms      │ 38.83 ms      │ 100     │ 100
   ╰─ 384000    44.21 ms      │ 45.07 ms      │ 44.25 ms      │ 44.28 ms      │ 100     │ 100

@dvdsk
Collaborator

dvdsk commented Nov 24, 2024

Benchmarks for this PR on my machine:

Timer precision: 20 ns
resampler         fastest       │ slowest       │ median        │ mean          │ samples │ iters
├─ no_resampling  2.414 ms      │ 2.701 ms      │ 2.508 ms      │ 2.515 ms      │ 100     │ 100
╰─ resample_to                  │               │               │               │         │
   ├─ 8000        2.671 ms      │ 2.755 ms      │ 2.692 ms      │ 2.693 ms      │ 100     │ 100
   ├─ 11025       2.416 ms      │ 2.682 ms      │ 2.433 ms      │ 2.436 ms      │ 100     │ 100
   ├─ 16000       3.723 ms      │ 3.812 ms      │ 3.775 ms      │ 3.772 ms      │ 100     │ 100
   ├─ 22050       3.505 ms      │ 3.62 ms       │ 3.527 ms      │ 3.53 ms       │ 100     │ 100
   ├─ 44100       2.384 ms      │ 3.039 ms      │ 2.456 ms      │ 2.486 ms      │ 100     │ 100
   ├─ 48000       7.421 ms      │ 7.512 ms      │ 7.435 ms      │ 7.439 ms      │ 100     │ 100
   ├─ 88200       11.22 ms      │ 11.38 ms      │ 11.31 ms      │ 11.31 ms      │ 100     │ 100
   ├─ 96000       12.87 ms      │ 13.2 ms       │ 13.18 ms      │ 13.17 ms      │ 100     │ 100
   ├─ 176400      20.75 ms      │ 20.91 ms      │ 20.85 ms      │ 20.85 ms      │ 100     │ 100
   ├─ 192000      23.73 ms      │ 27.32 ms      │ 23.94 ms      │ 23.99 ms      │ 100     │ 100
   ├─ 352800      40.9 ms       │ 41.99 ms      │ 40.95 ms      │ 41 ms         │ 100     │ 100
   ╰─ 384000      46.4 ms       │ 47.51 ms      │ 46.45 ms      │ 46.48 ms      │ 100     │ 100

main branch:

Timer precision: 20 ns
resampler         fastest       │ slowest       │ median        │ mean          │ samples │ iters
├─ no_resampling  2.294 ms      │ 5.027 ms      │ 2.498 ms      │ 2.591 ms      │ 100     │ 100
╰─ resample_to                  │               │               │               │         │
   ├─ 8000        2.129 ms      │ 2.307 ms      │ 2.151 ms      │ 2.153 ms      │ 100     │ 100
   ├─ 11025       2.386 ms      │ 2.444 ms      │ 2.4 ms        │ 2.402 ms      │ 100     │ 100
   ├─ 16000       3.005 ms      │ 5.783 ms      │ 3.042 ms      │ 3.075 ms      │ 100     │ 100
   ├─ 22050       3.479 ms      │ 4.052 ms      │ 3.49 ms       │ 3.497 ms      │ 100     │ 100
   ├─ 44100       2.283 ms      │ 3.367 ms      │ 2.504 ms      │ 2.574 ms      │ 100     │ 100
   ├─ 48000       6.254 ms      │ 7.076 ms      │ 6.26 ms       │ 6.272 ms      │ 100     │ 100
   ├─ 88200       9.795 ms      │ 20.01 ms      │ 10.09 ms      │ 10.24 ms      │ 100     │ 100
   ├─ 96000       11.52 ms      │ 21.97 ms      │ 11.53 ms      │ 11.74 ms      │ 100     │ 100
   ├─ 176400      19.02 ms      │ 19.15 ms      │ 19.03 ms      │ 19.03 ms      │ 100     │ 100
   ├─ 192000      22.13 ms      │ 22.26 ms      │ 22.14 ms      │ 22.15 ms      │ 100     │ 100
   ├─ 352800      38.63 ms      │ 38.76 ms      │ 38.65 ms      │ 38.66 ms      │ 100     │ 100
   ╰─ 384000      44.03 ms      │ 44.17 ms      │ 44.06 ms      │ 44.06 ms      │ 100     │ 100

So that's a 5 to 10% slowdown. A bit too much. Let's see what causes it.
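
For reference, the relative change per target sample rate can be computed from the median columns of the two tables above. The sketch below is only an illustration: the function and constant names are hypothetical, and the numbers are the medians copied from the "this PR" and "main branch" tables.

```rust
/// Relative change of `pr_ms` against `main_ms`, in percent.
fn slowdown_percent(main_ms: f64, pr_ms: f64) -> f64 {
    (pr_ms - main_ms) / main_ms * 100.0
}

/// (target sample rate, median on main in ms, median on this PR in ms),
/// copied from the two benchmark tables above.
const MEDIANS: [(u32, f64, f64); 12] = [
    (8_000, 2.151, 2.692),
    (11_025, 2.4, 2.433),
    (16_000, 3.042, 3.775),
    (22_050, 3.49, 3.527),
    (44_100, 2.504, 2.456),
    (48_000, 6.26, 7.435),
    (88_200, 10.09, 11.31),
    (96_000, 11.53, 13.18),
    (176_400, 19.03, 20.85),
    (192_000, 22.14, 23.94),
    (352_800, 38.65, 40.95),
    (384_000, 44.06, 46.45),
];

/// One formatted line per sample rate, e.g. for printing in a comment.
fn slowdown_report() -> Vec<String> {
    MEDIANS
        .iter()
        .map(|&(rate, main_ms, pr_ms)| {
            format!("{rate:>6} Hz: {:+.1}%", slowdown_percent(main_ms, pr_ms))
        })
        .collect()
}
```

Printing each line of `slowdown_report()` gives the per-rate picture; single runs like these are noisy, so small differences should be read with care.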

@dvdsk
Collaborator

dvdsk commented Nov 24, 2024

With the try_from in sample.rs replaced with a cast, performance improves a tiny bit; however, it does not go back to what it was.

Timer precision: 20 ns
resampler         fastest       │ slowest       │ median        │ mean          │ samples │ iters
├─ no_resampling  2.378 ms      │ 2.863 ms      │ 2.412 ms      │ 2.441 ms      │ 100     │ 100
╰─ resample_to                  │               │               │               │         │
   ├─ 8000        2.651 ms      │ 2.736 ms      │ 2.681 ms      │ 2.681 ms      │ 100     │ 100
   ├─ 11025       2.374 ms      │ 2.436 ms      │ 2.395 ms      │ 2.397 ms      │ 100     │ 100
   ├─ 16000       3.665 ms      │ 3.826 ms      │ 3.764 ms      │ 3.762 ms      │ 100     │ 100
   ├─ 22050       3.461 ms      │ 5.185 ms      │ 3.501 ms      │ 3.53 ms       │ 100     │ 100
   ├─ 44100       2.377 ms      │ 2.83 ms       │ 2.4 ms        │ 2.428 ms      │ 100     │ 100
   ├─ 48000       7.217 ms      │ 13.47 ms      │ 7.375 ms      │ 7.502 ms      │ 100     │ 100
   ├─ 88200       10.86 ms      │ 15.74 ms      │ 11.03 ms      │ 11.1 ms       │ 100     │ 100
   ├─ 96000       12.44 ms      │ 15.84 ms      │ 12.9 ms       │ 12.94 ms      │ 100     │ 100
   ├─ 176400      20.03 ms      │ 20.73 ms      │ 20.51 ms      │ 20.36 ms      │ 100     │ 100
   ├─ 192000      23.3 ms       │ 23.87 ms      │ 23.44 ms      │ 23.45 ms      │ 100     │ 100
   ├─ 352800      39.94 ms      │ 40.4 ms       │ 40.12 ms      │ 40.13 ms      │ 100     │ 100
   ╰─ 384000      44.42 ms      │ 45.61 ms      │ 44.47 ms      │ 44.59 ms      │ 100     │ 100

This PR with the changes to sample.rs reverted again matches the main branch in performance:

Timer precision: 20 ns
resampler         fastest       │ slowest       │ median        │ mean          │ samples │ iters
├─ no_resampling  2.304 ms      │ 2.825 ms      │ 2.429 ms      │ 2.447 ms      │ 100     │ 100
╰─ resample_to                  │               │               │               │         │
   ├─ 8000        2.123 ms      │ 2.217 ms      │ 2.17 ms       │ 2.168 ms      │ 100     │ 100
   ├─ 11025       2.372 ms      │ 4.046 ms      │ 2.387 ms      │ 2.408 ms      │ 100     │ 100
   ├─ 16000       2.982 ms      │ 3.104 ms      │ 3.035 ms      │ 3.03 ms       │ 100     │ 100
   ├─ 22050       3.486 ms      │ 6.186 ms      │ 3.497 ms      │ 3.525 ms      │ 100     │ 100
   ├─ 44100       2.287 ms      │ 2.83 ms       │ 2.384 ms      │ 2.423 ms      │ 100     │ 100
   ├─ 48000       6.27 ms       │ 6.387 ms      │ 6.279 ms      │ 6.282 ms      │ 100     │ 100
   ├─ 88200       9.841 ms      │ 10.78 ms      │ 10.14 ms      │ 10.15 ms      │ 100     │ 100
   ├─ 96000       11.77 ms      │ 13.12 ms      │ 11.8 ms       │ 11.85 ms      │ 100     │ 100
   ├─ 176400      19.11 ms      │ 20.49 ms      │ 19.14 ms      │ 19.17 ms      │ 100     │ 100
   ├─ 192000      22.23 ms      │ 23.51 ms      │ 22.26 ms      │ 22.28 ms      │ 100     │ 100
   ├─ 352800      38.82 ms      │ 41.43 ms      │ 38.84 ms      │ 38.87 ms      │ 100     │ 100
   ╰─ 384000      44.25 ms      │ 45.22 ms      │ 44.28 ms      │ 44.31 ms      │ 100     │ 100

The switch from i32 to i64 is probably to blame. Using i32 might enable the compiler to emit better/any SIMD. We won't know without looking at the assembly.

We could keep the try_into().expect(...) bit; however, let's make it only use that in debug builds. Do we absolutely need the change from i32 to i64?
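
One way to keep the checked conversion only in debug builds is sketched below. The helper name `narrow_to_i16` and the `i32` input type are assumptions for illustration; `cfg!(debug_assertions)` is the standard way to branch on the build profile.

```rust
/// Hypothetical helper: narrow an intermediate value back to a 16-bit sample.
/// Debug builds check the range and panic with a message; release builds
/// fall back to a plain cast.
fn narrow_to_i16(value: i32) -> i16 {
    if cfg!(debug_assertions) {
        i16::try_from(value).expect("interpolated sample is outside the i16 range")
    } else {
        // Truncates silently if the value is out of range.
        value as i16
    }
}
```

Since `cfg!` evaluates to a compile-time constant, the unused branch can be optimized away, so the release path should cost no more than the bare cast. Whether that recovers the benchmark numbers would still need to be measured.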

@PetrGlad
Contributor Author

@dvdsk Security-wise there is no strict need for a checked cast, but u32 * u32 may overflow in some cases. AFAIK an overflowed value will wrap around in a release build and panic in debug mode. I'd expect some clicks or distortion in case of overflow in release builds, and panics in debug mode (which is what quickcheck triggered). The final interpolated value should be in range, though.
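
The overflow behaviour described here can be made explicit with the standard library's `checked_mul` and `wrapping_mul`, which behave the same in every build profile. The function below is a hypothetical illustration, not code from this PR.

```rust
/// Hypothetical illustration of how a u32 product can overflow.
/// Returns (checked result, wrapped result, exact result in u64).
fn product_report(a: u32, b: u32) -> (Option<u32>, u32, u64) {
    (
        // None when the product does not fit in u32.
        a.checked_mul(b),
        // What `a * b` yields when overflow checks are off
        // (the default for release builds).
        a.wrapping_mul(b),
        // Exact: the product of two u32 values always fits in u64.
        u64::from(a) * u64::from(b),
    )
}
```

For example, `product_report(65_536, 65_536)` returns `(None, 0, 4_294_967_296)`: the true product is 2^32, which wraps to 0 in 32 bits.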

One could use smaller integers for the ratios (I doubt u32 precision is actually necessary there), or use f32 as the interpolation coefficient. I think smaller integers, like u16, can be an approximation, maybe even something crude, like dropping some least significant bits of the u32 numerator and denominator when one of them has significant bits beyond the 16th position. Or finding some other approximate value pair.

E.g. the num crate provides such an algorithm:

let ratio = num::rational::Rational32::approximate_float(2.3f32).unwrap();
dbg!(ratio.numer(), ratio.denom());

Maybe making this more reliable can be a separate task...
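
The cruder option mentioned above, dropping least significant bits until both parts fit in 16 bits, might look like the sketch below. `reduce_ratio_to_u16` is a hypothetical name, and the code assumes `numerator <= denominator` (as for an interpolation coefficient), so a non-zero denominator stays non-zero after the shift.

```rust
/// Hypothetical helper: shrink a u32 ratio so both parts fit in 16 bits by
/// dropping the same number of least significant bits from each part.
fn reduce_ratio_to_u16(numerator: u32, denominator: u32) -> (u16, u16) {
    let widest = numerator.max(denominator);
    // How many significant bits the larger value has beyond the 16th.
    let shift = (32 - widest.leading_zeros()).saturating_sub(16);
    // After the shift both values are at most 65_535, so the casts are lossless.
    ((numerator >> shift) as u16, (denominator >> shift) as u16)
}
```

Ratios that already fit are returned unchanged, for example `reduce_ratio_to_u16(1, 3)` gives `(1, 3)`. Larger ones lose precision in the low bits, so this is an approximation rather than an exact reduction.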
