How to Analyze Results #135

jonastuttle · 2024-03-05T17:12:57Z

Hello,

I am enjoying this repository; however, I am having trouble analyzing specific results. For example, I can get the base results by following the tutorial, but I cannot find how to access the answers to specific questions (e.g., the model's generation for question 143 of HumanEval, and whether it was correct or incorrect).

I was wondering whether a feature like this exists in the MultiPL-E repository. The tutorial's example page shows information about the generated prompts, but I cannot find where to access that content in my clone.

Answered by arjunguha

arjunguha · 2024-03-06T15:38:32Z · Maintainer

The per-problem results are written to .results.json.gz files. I recommend opening one to see the format.
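
A minimal Python sketch for inspecting one problem, such as HumanEval question 143, might look like this. It assumes each .results.json.gz file is a gzip-compressed JSON object for a single problem, with a "name" field and a "results" list whose entries have "program", "status" and "exit_code" keys, and that a completion counts as correct when its status is "OK" and its exit code is 0. The helper names and the file-name prefix HumanEval_143_ are hypothetical; open a real file first to confirm the schema.

```python
import gzip
import json
import sys
from pathlib import Path

# Assumed schema (verify against a real file):
# {"name": ..., "results": [{"program": ..., "status": ..., "exit_code": ...}, ...]}


def load_results(path):
    """Load one gzip-compressed .results.json.gz file."""
    with gzip.open(path, "rt", encoding="utf-8") as f:
        return json.load(f)


def is_pass(result):
    """Assumed pass criterion: status "OK" and exit code 0."""
    return result.get("status") == "OK" and result.get("exit_code") == 0


def summarize_problem(path):
    """Return the problem name and a list of (passed, program) pairs."""
    data = load_results(path)
    outcomes = [(is_pass(r), r.get("program", "")) for r in data["results"]]
    return data.get("name", Path(path).name), outcomes


def find_problem(results_dir, prefix):
    """Hypothetical lookup: results files whose names start with `prefix`."""
    return sorted(Path(results_dir).glob(prefix + "*.results.json.gz"))


if __name__ == "__main__" and len(sys.argv) == 3:
    # Example: python inspect_results.py <results_dir> HumanEval_143_
    for path in find_problem(sys.argv[1], sys.argv[2]):
        name, outcomes = summarize_problem(path)
        print(f"{name}: {sum(p for p, _ in outcomes)}/{len(outcomes)} passed")
        for i, (passed, program) in enumerate(outcomes):
            print(f"--- completion {i}: {'PASS' if passed else 'FAIL'} ---")
            print(program)
```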

arjunguha · 2024-03-06T15:39:30Z · Maintainer

Alternatively, completions from several models, generated in runs done for BigCode, are available here:

https://huggingface.co/datasets/bigcode/MultiPL-E-completions

jonastuttle · Mar 7, 2024 · Author

Thank you for this great information!

Is there a way to access the overall results for this raw data? For example, running the tutorial gives this information:
humaneval-cpp-codellama_CodeLlama_7b_Instruct_hf-0.8-reworded,10,0.5194978913205424,138,30,40
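
The summary line above appears to come from MultiPL-E's pass@k script. Its columns are assumed here to be the dataset name, k, the pass@k estimate, the number of problems, and the minimum and maximum number of completions per problem. A rough Python sketch for recomputing those numbers from a directory of raw .results.json.gz files follows. It uses the unbiased pass@k estimator 1 - C(n-c, k)/C(n, k) from Chen et al. (2021) and averages it over problems. The function names, the file layout and the pass criterion are assumptions, not the repository's own code.

```python
import gzip
import json
from math import comb
from pathlib import Path


def pass_at_k(n, c, k):
    """Unbiased pass@k estimate for one problem: 1 - C(n-c, k) / C(n, k).

    n = completions sampled, c = completions that passed.
    """
    if n < k:
        raise ValueError(f"need at least k={k} completions, got {n}")
    if n - c < k:
        return 1.0  # every size-k sample contains a passing completion
    return 1.0 - comb(n - c, k) / comb(n, k)


def dataset_pass_at_k(results_dir, k):
    """Mean pass@k over every .results.json.gz file in results_dir.

    Returns (estimate, num_problems, min_completions, max_completions),
    which is what the summary line's columns are assumed to mean.
    """
    scores, sizes = [], []
    for path in sorted(Path(results_dir).glob("*.results.json.gz")):
        with gzip.open(path, "rt", encoding="utf-8") as f:
            results = json.load(f)["results"]
        n = len(results)
        # Assumed pass criterion: status "OK" and exit code 0.
        c = sum(1 for r in results if r.get("status") == "OK" and r.get("exit_code") == 0)
        scores.append(pass_at_k(n, c, k))
        sizes.append(n)
    if not scores:
        raise ValueError(f"no .results.json.gz files found in {results_dir}")
    return sum(scores) / len(scores), len(scores), min(sizes), max(sizes)
```

Whether numbers computed this way match a published leaderboard depends on the downloaded data having the same per-problem format and on the assumptions above.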

I have also found this beautiful page that gives the overall results. Are the results on that page computed from the data you linked?
