Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Provide a set of common throughput calculation functions #116

Open
littlespex opened this issue Oct 28, 2024 · 7 comments
Open

Provide a set of common throughput calculation functions #116

littlespex opened this issue Oct 28, 2024 · 7 comments

Comments

@littlespex
Copy link
Collaborator

Provide a set of common throughput calculation functions like ewma, slidingWindow, etc.

@Beraliv
Copy link

Beraliv commented Oct 30, 2024

@littlespex I'd like to implement those functions, if no one picked it up yet.

I already looked at shaka-player (i.e. Simple ABR manager with EWMA bandwidth estimate, which had alpha, sample and estimate calculations) and dash.js (i.e. Throughput Model, similar to shaka, which had alpha, sample, estimate and sliding window built-in).

I saw some network-related calculations, such as current throughput based on Connection API, time calculations (i.e. Bps/Kbps, ms/s), etc

I'll have another look at hls.js later today.

@dsilhavy
Copy link

For dash.js please also check the new ThroughputController: https://github.com/Dash-Industry-Forum/dash.js/blob/development/src/streaming/controllers/ThroughputController.js. A documentation on the different calculation modes is available here: https://dashif.org/dash.js/pages/advanced/abr/throughput-calculation.html.

I think we also need to distinguish between ways to access the current throughput/required information like using the Network Information API, Resource Timing API etc and actually calculating/estimating the current throughput using methods like EWMA, Harmonic Mean, Arithmetic Mean etc.

@Beraliv
Copy link

Beraliv commented Oct 30, 2024

in hls.js, there are similar things (i.e. EwmaBandWidthEstimator with EWMA)

@Beraliv
Copy link

Beraliv commented Oct 30, 2024

Overall, the majority of OSS players are aligned with the exception that dash.js has multiple throughput modes (3x different arithm. mean approaches, 3x harmonic mean approaches, EWMA and Zlema)

@Beraliv
Copy link

Beraliv commented Oct 30, 2024

1st version of a high-level design:

// Generic utilities

type ValueOf<T> = T[keyof T];

const ThroughputCalculationType = {
    EMWA: 'emwa',
    ZLEMA: 'zlema'
} as const;

type ThroughputCalculationType = ValueOf<typeof ThroughputCalculationType>;

// Calculation API

interface ThroughputCalculationState {
    estimate: number;
    sample: number[];
    sampleSize: number;
}

type ThroughputCalculation = (
    state: ThroughputCalculationState
) => number;

// Factory API

interface ThroughputCalculationFactoryOptions {
    type: ThroughputCalculationType;
}

declare const throughputCalculationFactory: (
    options: ThroughputCalculationFactoryOptions
) => ThroughputCalculationFactory;

A journey (from bottom to top):

  • Factory API uses ThroughputCalculationFactoryOptions to specify a type of what calculator API we want to use (the majority of OSS players will use EWMA, while dash.js can specify what it needs)
  • ThroughputCalculationType will contain the type of estimates that we'd like to calculate (enum-like, but using as const + ValueOf), e.g. EWMA, ZLEMA, etc
  • Once, the instance of ThroughputCalculation is created, an engineer will have to create a state of type ThroughputCalculationState (which may be dependent on ThroughputCalculationType, but for now I just simplified it) - this is expected to be scalable, so if we have more types of calculations, we just add more interfaces
  • ThroughputCalculation
    • Initially, I wanted to make ThroughputCalculation a pure function, but we had to call state = calculator(state) and then access an estimate by calling state.estimate
    • I ended up optimising ThroughputCalculation by mutating a state in each call and returning useful information (i.e. calculator focuses on estimate, therefore estimate will be a return value), e.g. estimate = calculator(state)

TS Playground for you to see the whole picture - https://tsplay.dev/w6DJYm

With this solution, there is no "Adding points to the state" API

As possible alternative, I've extended ThroughputCalculation to have both an estimate getter and an add method - https://tsplay.dev/WPQYqw (with a further suggestion to rename ThroughputCalculation to something more meaningful, e.g. ThroughputController, but as it already exists in dash.js, it would be wise to use another name. If you have suggestion or best practices, please LMK)

Another solution is to have a separate function to process data, e.g. of a type ThroughputSlidingWindow - https://tsplay.dev/w8Derw

CC @littlespex

@littlespex
Copy link
Collaborator Author

I'm still reviewing this, but a few questions:

  • With factory function / enum approach, how are configurable parameters, like halfLife, passed to calculation function?
  • What does sampleSize represent? Wouldn't that just be the length of the samples array?
  • Should the samples array be an object instead of just a number? Looking at the ewma classes in hls.js and shaka-player, the sample functions need a weight and a value, in both cases weight is the duration in ms and value is the number of bytes loaded. I think at minimum it would need to be something like:
    type ThroughputSample = { duration: number, bytes: number };
    
    type ThroughputCalculationState = {
      estimate: number;
      samples: ThroughputSample[];
    }
  • Why can't the calculation be a pure function? When a new estimate is needed, a new state object would be passed to the calculation call. It would be the player/controller's job to keep track of the samples and previous estimate.
  • I'm also not clear on the "sliding window" example. Couldn't that be achieved with composition:
    const ewma = ewmaFactory(2);
    const ewmaSliding = (state: ThroughputCalculationState) =>  {
      // only process last 10 samples
      return ewma({...state, samples: state.samples.slice(-10));
    }

@Beraliv
Copy link

Beraliv commented Nov 17, 2024

@littlespex answering your questions below:

With factory function / enum approach, how are configurable parameters, like halfLife, passed to calculation function?

This depends on whether the configurable parameter is static or dynamic.

(1) If it's dynamic, it has to be a part of ThroughputCalculationState (and because we have multiple ThroughputCalculationType, it will be a part of a particular type, e.g. EWMA)

(2) If it's static, it has to be moved to ThroughputCalculationFactoryOptions.

I believe that halfLife is static, therefore it would be placed in ThroughputCalculationFactoryOptions (e.g. option 2)

What does sampleSize represent? Wouldn't that just be the length of the samples array?

sampleSize is expected to be the maximum size of the samples array. I agree that it's unlikely to be dynamic, therefore it's possible to move it to ThroughputCalculationFactoryOptions.

Should the samples array be an object instead of just a number? Looking at the ewma classes in hls.js and shaka-player, the sample functions need a weight and a value, in both cases weight is the duration in ms and value is the number of bytes loaded.

I completely agree.

Why can't the calculation be a pure function? When a new estimate is needed, a new state object would be passed to the calculation call. It would be the player/controller's job to keep track of the samples and previous estimate.

I'd like to elaborate on my decision.

Let's have a look at the high-level requirements and evaluate the benefits/disadvantages.

The requirements are:

  • CML provides a high-order function for throughput calculations (such as EWMA) based on its dependencies (e.g. for EMWA, there are an estimate in bytes/bits and weight in seconds)
    • The dependencies have to be updated on each HTTP request (for both audio and video), so it's important that adding method is efficient in terms of time/space complexity
    • EMWA (used across all projects) depends on a half life (static) and an estimate in bytes/bits and weight in seconds (dynamic)
    • Other estimates, such as Zlema, depend on sample size (configurable for player instance) and an estimate in bytes/bits and weight in seconds (dynamic)
  • As mentioned above, Sliding Window applies to some throughput calculations (everything but EWMA), therefore throughput calculation has to be flexible

A pure function:

  • Immutability (i.e. no side-effects), therefore more readable and easier to debug
  • No need to manage a state (it will be done in the Player/Controller)
  • Possible performance impact (a new array for each entry, i.e. linear time complexity, and possible GC usage spikes) - I'm concerned here

A function with a mutation:

  • Mutable (i.e. side-effects), therefore less readable and more difficult to debug
  • We have to manage a state (no need to move this logic to the Player/Controller)
  • Performance-optimal (i.e. no need to create a new array for each entry, i.e. near-constant time complexity, and less impact on GC)

I know that it doesn't make sense to prematurely optimise the solution, therefore I'm open to suggestions.

I'm also not clear on the "sliding window" example. Couldn't that be achieved with composition:

Yes, you're right. If we agree that the SlidingWindow and ThroughputCalculator APIs are compatible, we will be able to compose these 2 functions to get the expected result.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Status: No status
Development

No branches or pull requests

3 participants