Research: Design of llama-bench #10386

Open
1 of 5 tasks
jumbo-q opened this issue Nov 18, 2024 · 2 comments

jumbo-q commented Nov 18, 2024

Research Stage

  • Background Research (Let's try to avoid reinventing the wheel)
  • Hypothesis Formed (How do you think this will work and what is its effect?)
  • Strategy / Implementation Forming
  • Analysis of results
  • Debrief / Documentation (So people in the future can learn from us)

Previous existing literature and research

Dear maintainers,
How is llama-bench designed to measure performance?
What does "batch" mean in the benchmark, and what exactly is tested?
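For reference, my current understanding (which may well be wrong) is that "batch" here means the number of tokens submitted per decode call when processing the prompt. The toy Python sketch below only illustrates that interpretation; the function and the numbers are mine, not from llama-bench:

```python
# Toy illustration of my current (possibly wrong) understanding of "batch":
# a prompt of n_prompt tokens is fed to the model in chunks of at most
# n_batch tokens per decode call. All numbers here are made up.
def batch_sizes(n_prompt: int, n_batch: int) -> list[int]:
    """Return the number of tokens submitted in each decode call."""
    sizes = []
    remaining = n_prompt
    while remaining > 0:
        step = min(n_batch, remaining)
        sizes.append(step)
        remaining -= step
    return sizes

print(batch_sizes(512, 128))  # -> [128, 128, 128, 128]
```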

Hypothesis

No response

Implementation

No response

Analysis

No response

Relevant log output

No response

@JohannesGaessler (Collaborator) commented:


jumbo-q commented Nov 18, 2024

Thanks, I have seen it before.
I have two questions:

  1. Is the content of the batch input self-defined, similar to some other inference frameworks, or is there a specific dataset for it? Or is something else used?
  2. The output only provides the average and variance of the time per token. How is this time calculated? Is it the mean and variance over multiple runs? Also, which part of the execution is being timed, i.e. from which point to which point is the timing measured? (A minimal sketch of how I imagine that computation follows below.)
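To make question 2 concrete, here is a minimal Python sketch of what I assume "mean and variance over multiple runs" would look like. `run_once` is a dummy stand-in for one timed repetition; the names and numbers are mine, not from llama-bench:

```python
import statistics
import time

def run_once(n_tokens: int) -> float:
    """Placeholder for one timed repetition; returns elapsed seconds.
    In the real benchmark this would be the decode of n_tokens tokens."""
    t0 = time.perf_counter()
    time.sleep(0.01)  # stand-in for the actual model work
    return time.perf_counter() - t0

def summarize(n_tokens: int, repetitions: int) -> tuple[float, float]:
    """Mean and standard deviation of tokens/second over all repetitions."""
    rates = [n_tokens / run_once(n_tokens) for _ in range(repetitions)]
    return statistics.mean(rates), statistics.stdev(rates)

mean_tps, std_tps = summarize(n_tokens=128, repetitions=5)
print(f"{mean_tps:.2f} ± {std_tps:.2f} tokens/s")
```

Is this roughly what llama-bench does, or is the timing taken at a different point?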
