Llm large reference #1915
base: master
Conversation
MLCommons CLA bot: All contributors have signed the MLCommons CLA ✍️ ✅
Force-pushed from 3b33ce1 to 7be9b13.
```python
no_eos_ids = []
for qid, output in tqdm(run_outputs.items()):
    L = list(output)
    # Prune trailing 2s (EOS token)
```
The EOS ID is not 2 for Llama-405B. Need to use tokenizer.eos_token_id.
Same thing below
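A minimal sketch of the suggested fix, assuming `run_outputs` maps query IDs to token-ID sequences and a Hugging Face tokenizer for the checkpoint is available; the pruning loop below is a reconstruction, not the PR's actual code, and `model_dir` is a hypothetical name:

```python
from tqdm import tqdm
from transformers import AutoTokenizer

# Assumption: model_dir points at the Llama checkpoint used by the PR.
tokenizer = AutoTokenizer.from_pretrained(model_dir)
eos_id = tokenizer.eos_token_id  # correct for Llama-405B, unlike a literal 2

no_eos_ids = []
for qid, output in tqdm(run_outputs.items()):
    L = list(output)
    # Prune trailing EOS tokens by ID rather than by the hardcoded value 2.
    while L and L[-1] == eos_id:
        L.pop()
    if len(L) == len(output):
        # Nothing was pruned, so this sequence never emitted EOS.
        no_eos_ids.append(qid)
```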
What is the purpose of consolidate_results.py?

I copied it from Llama2 but don't know why it is needed.
Might not be needed since we don't split the pickle anymore. Please remove it if not needed.
```python
predictions=preds, references=targets, use_stemmer=True, use_aggregator=False
)

assert len(rouge_scores["rouge1"]) == 24576
```
TODO
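For context, a minimal sketch of how this check fits together, assuming the scores come from Hugging Face's `evaluate` ROUGE metric and that `preds`/`targets` hold the decoded outputs and reference texts:

```python
import evaluate

metric = evaluate.load("rouge")
rouge_scores = metric.compute(
    predictions=preds, references=targets, use_stemmer=True, use_aggregator=False
)
# With use_aggregator=False the metric returns one score per sample,
# so each list's length should equal the dataset size (24576 in this PR).
assert len(rouge_scores["rouge1"]) == 24576
```

If the TODO refers to the hardcoded count, asserting against `len(targets)` instead would keep the check dataset-agnostic.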
```python
self.model = LLM(
    self.model_path,
    dtype=self.dtype,
    tensor_parallel_size=self.tensor_parallel_size,
```
Is there a reason we are not using AsyncLLMEngine here? It may be more efficient since it supports continuous batching.
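A hedged sketch of what that could look like with vLLM's async engine; the constructor arguments mirror the `LLM(...)` call above, and the sampling settings are illustrative only:

```python
from vllm import AsyncLLMEngine, SamplingParams
from vllm.engine.arg_utils import AsyncEngineArgs

# model_path, dtype, and tensor_parallel_size are assumed in scope,
# matching the values passed to LLM(...) in the snippet above.
engine = AsyncLLMEngine.from_engine_args(
    AsyncEngineArgs(
        model=model_path,
        dtype=dtype,
        tensor_parallel_size=tensor_parallel_size,
    )
)

async def generate_one(prompt: str, request_id: str) -> str:
    # generate() is an async generator of partial RequestOutputs; the engine
    # batches concurrent requests continuously instead of waiting for a
    # fixed-size batch to fill up.
    params = SamplingParams(temperature=0.0, max_tokens=1024)
    final = None
    async for out in engine.generate(prompt, params, request_id):
        final = out
    return final.outputs[0].text
```

Whether this wins anything depends on the arrival pattern: the synchronous `LLM.generate` already batches a fully materialized query set well, so continuous batching mainly helps when requests trickle in over time, as in a Server-style scenario.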
```python
def main(args):
    # Set up decode and evaluation objects
    tokenizer = LlamaTokenizerFast.from_pretrained(args.model_dir)
```
I may be wrong, but would it be better to use AutoTokenizer here?
I also suggest using AutoTokenizer for robustness.
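The suggested swap is small; `AutoTokenizer` inspects the checkpoint's config and instantiates the matching tokenizer class, so the script keeps working if the model family changes:

```python
from transformers import AutoTokenizer

# Drop-in replacement for LlamaTokenizerFast.from_pretrained(args.model_dir)
tokenizer = AutoTokenizer.from_pretrained(args.model_dir)
```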
Force-pushed from 5da4409 to e44c62a.
Force-pushed from 3fbdbb0 to fe9c189.