Test dataset of questions to score reasoning #4

sapph1re · 2023-08-09T07:26:54Z

This does greatly improve prompting, although a single question may not be representative of the whole approach. To measure the suggested solutions properly, shall we create a test dataset of questions and evaluate the results we get from each prompt?

dave1010 · 2023-08-09T17:57:03Z

A test dataset would be a great idea.

There are many frameworks for testing LLMs available now, such as https://github.com/openai/human-eval
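
As a rough sketch of what scoring each prompt against a shared question set could look like (not taken from any existing framework): the questions, the `Answer:` marker convention, and the `ask` callable standing in for a model call are all assumptions made for illustration.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical test set: (question, expected answer) pairs.
DATASET: List[Tuple[str, str]] = [
    ("What is 17 + 25?", "42"),
    ("Tom has 3 apples and eats 1. How many are left?", "2"),
]


def extract_answer(reply: str) -> str:
    """Return the text after the last 'Answer:' marker, lowercased.

    Assumes the prompt tells the model to finish with 'Answer: <value>'.
    Falls back to the whole reply when the marker is missing.
    """
    marker = "answer:"
    idx = reply.lower().rfind(marker)
    if idx == -1:
        return reply.strip().lower()
    return reply[idx + len(marker):].strip().lower()


def score_prompt(
    template: str,
    ask: Callable[[str], str],
    dataset: List[Tuple[str, str]],
) -> float:
    """Fraction of questions answered correctly with this prompt template."""
    correct = 0
    for question, expected in dataset:
        reply = ask(template.format(question=question))
        if extract_answer(reply) == expected.strip().lower():
            correct += 1
    return correct / len(dataset)


def compare_prompts(
    templates: Dict[str, str],
    ask: Callable[[str], str],
    dataset: List[Tuple[str, str]],
) -> Dict[str, float]:
    """Score every named template against the same dataset."""
    return {name: score_prompt(t, ask, dataset) for name, t in templates.items()}
```

Here `ask` would wrap whatever model is being tested, and each template needs a `{question}` placeholder. Exact matching on the extracted answer is deliberately strict; a real test set would probably need looser matching or a larger, more varied pool of questions.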
