Replies: 7 comments
-
Tagging @mgree
Regarding Case 1: I'm not even sure what the right thing to do here is! What you get will depend on what tools the generated script will shell out to. For example, here is another solution:
    incr_list {
    python3 -c "print(5 * 3 / 2)"
    }
This produces "7.5\n".
-
About Case 2: Here is my hand-written fix. I've edited both the solution and the tests:
This produces:
I am not sure it is reasonable to prompt a model to produce this solution. I also think it's worse than the model-generated solution. I think exact-matching on Bash results is a losing proposition. What we should instead do is something fuzzier, but that will be tricky to automate in the MultiPL-E style.
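To make "something fuzzier" a bit more concrete, here is a minimal sketch of a whitespace-insensitive comparison. The helper names `normalize_ws` and `same_modulo_whitespace` are hypothetical and not part of MultiPL-E; this is only one possible relaxation, and it would not help with numeric formatting differences such as "7.5" versus "7.500000000".

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of MultiPL-E): collapse every run of
# whitespace (spaces, tabs, newlines) into one space and trim both ends.
normalize_ws() {
  printf '%s' "$1" | tr -s '[:space:]' ' ' | sed -e 's/^ //' -e 's/ $//'
}

# Hypothetical assertion: succeeds when the two strings are equal
# after whitespace normalization.
same_modulo_whitespace() {
  [ "$(normalize_ws "$1")" = "$(normalize_ws "$2")" ]
}

# One number per line compares equal to the space-separated form.
same_modulo_whitespace "$(printf '2\n3\n4')" "2 3 4" && echo "match"
```

The trade-off is that a comparison like this can no longer distinguish outputs where whitespace is actually significant, so it would have to be opted into per problem.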
-
Also, I have tons of solutions where it produces Python, from all sorts of models.
-
I see. Yeah, I don't see an obvious solution to these problems; perhaps steering away from Bash might be the best option. Thanks!
-
I'm going to leave this issue open. It's a warning about interpreting the Bash results.
-
I think it would not be too hard to write a tester that uses, e.g.,
    $ printf "0.5 == 1 / 2\n0.5 == 1/3" | bc -l
    1
    0
It would be up to the model to generate code that would produce strings that would be correctly interpreted as floating point numbers, though.
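A rough sketch of what such a tester helper might look like follows. Note that it swaps in awk for bc, purely so the sketch depends on nothing beyond a POSIX awk, and it compares within a tolerance instead of testing exact equality. The name `num_eq` and the default tolerance of 1e-9 are assumptions for illustration, not anything MultiPL-E provides.

```bash
#!/usr/bin/env bash
# Hypothetical helper: succeeds when two decimal strings denote the same
# number within a tolerance (third argument, default 1e-9).
# Assumption: both inputs are well-formed numbers. awk silently treats a
# non-numeric string as 0, so this sketch performs no validation.
num_eq() {
  awk -v a="$1" -v b="$2" -v tol="${3:-1e-9}" 'BEGIN {
    d = a - b
    if (d < 0) d = -d
    if (d <= tol + 0) exit 0
    exit 1
  }'
}

# The triangle_area case from this thread: same value, different formatting.
num_eq "7.500000000" "7.5" && echo "numerically equal"
```

A test could then call `num_eq "$(candidate "5" "3")" "7.5"` instead of comparing the two strings directly.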
-
This loop is outputting a newline after each number. Maybe
The test is failing because of the added newlines. The command substitution
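For reference, the newline behavior can be reproduced in isolation. Bash's `$(...)` removes trailing newlines from the captured output but keeps interior ones, which would also explain why appending `\n` to the expected value does not help. The function `print_per_line` below is a hypothetical stand-in for a generated solution, not the actual model output.

```bash
#!/usr/bin/env bash
# Hypothetical stand-in for a generated incr_list solution that prints
# each incremented number on its own line.
print_per_line() {
  for n in "$@"; do
    printf '%s\n' "$((n + 1))"
  done
}

# Command substitution drops the final newline only; the two newlines
# between the numbers are kept.
out="$(print_per_line 1 2 3)"

[ "$out" = "2 3 4" ] || echo "differs from the space-separated string"
[ "$out" = "$(printf '2\n3\n4')" ] && echo "equals the newline-separated string"
```

So a solution that prints one number per line only passes an exact-match test if the expected value uses the same interior separators.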
-
Failure Case 1
Not sure if this is an expected failure case of unit tests in Bash. Here is an example:
HumanEval_45_triangle_area
If we print the output of $(candidate "5" "3"), it is "7.500000000", which differs from the expected "7.5", so the test fails. Maybe something with bc to evaluate the numeric value of the strings instead of comparing strings?
Failure Case 2
HumanEval_42_incr_list
The first test passes; the second and third tests fail. So I printed out the output of each case.
I tried adding the newline character \n to the end of the expected values and that didn't work. My lack of knowledge in Bash is not giving me any idea how it might be fixed, but I don't think this should fail?