Abnormal TIR experiment results #26
Comments
I ran into the same problem as well.
Hi, same for me, for both 1.5B and 7B. My numbers so far:
The CoT numbers are a bit higher than reported because I used MARIO_EVAL to compare results with the ground truth. I'm using vLLM for inference, with
Failure cases
I checked a few failure cases and they seemed strange to me. For example, in problem 50 of the MATH test benchmark, the model used a function
Does the Qwen team have an in-house version of SymPy that contains functions that do not exist in the main repo?
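One quick way to tell a hallucinated call from a genuinely missing library feature is to scan the generated code for names the installed module does not provide. The snippet below is only a rough sketch: `missing_attributes` is a hypothetical helper, not part of any Qwen or SymPy tooling, and it is demonstrated on the standard-library `math` module so that it runs without extra dependencies. For the case above one would pass `"sympy"` and whatever alias the generated code uses.

```python
import ast
import importlib


def missing_attributes(code: str, module_name: str, alias: str) -> list[str]:
    """Return names used as `alias.<name>` or imported via
    `from module_name import <name>` that the installed module lacks.
    (Hypothetical helper, for illustration only.)"""
    module = importlib.import_module(module_name)
    missing = set()
    for node in ast.walk(ast.parse(code)):
        # Attribute access such as `math.cuberoot`.
        if (
            isinstance(node, ast.Attribute)
            and isinstance(node.value, ast.Name)
            and node.value.id == alias
            and not hasattr(module, node.attr)
        ):
            missing.add(node.attr)
        # Direct imports such as `from math import cuberoot`.
        if isinstance(node, ast.ImportFrom) and node.module == module_name:
            for imported in node.names:
                if imported.name != "*" and not hasattr(module, imported.name):
                    missing.add(imported.name)
    return sorted(missing)


# `math.cuberoot` does not exist, so it is reported; `math.sqrt` is fine.
generated = "import math\nprint(math.sqrt(2) + math.cuberoot(8))"
print(missing_attributes(generated, "math", "math"))  # ['cuberoot']
```

This only catches top-level names; a missing method on an object returned by the library would still need to be found by actually running the code.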
The Python code could be hallucinated, which would explain a function that cannot be found. Maybe check this:
The same problem here. For 7B-instruct, I got 77% on GSM8K with TIR and 95.6% with CoT.
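Following the TIR prompt, I ran experiments on the 1.5B and 7B Qwen2.5-Math models, and the resulting metrics are worse than with CoT. I suspect my implementation is missing some steps. Could you describe the implementation in more detail?
I implemented TIR based on the prompt below.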
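For reference when comparing implementations, a TIR loop generally alternates between generation and code execution. The following is a minimal sketch under stated assumptions, not the Qwen team's implementation: it assumes the model writes code in a fenced `python` block, that generation is stopped right after that block (for example with a stop string), and that the execution result is fed back in a fenced `output` block. `tir_generate`, `run_python`, and `fake_model` are hypothetical names; `fake_model` stands in for a real inference call such as one made through vLLM.

```python
import re
import subprocess
import sys
from typing import Callable

FENCE = "`" * 3  # three backticks, built programmatically
CODE_BLOCK = re.compile(re.escape(FENCE) + r"python\n(.*?)" + re.escape(FENCE), re.DOTALL)


def run_python(code: str, timeout: float = 10.0) -> str:
    """Run code in a fresh interpreter; return stdout, or the last stderr line on error."""
    try:
        proc = subprocess.run(
            [sys.executable, "-c", code],
            capture_output=True, text=True, timeout=timeout,
        )
    except subprocess.TimeoutExpired:
        return "TimeoutError"
    if proc.returncode != 0:
        lines = proc.stderr.strip().splitlines()
        return lines[-1] if lines else "Error"
    return proc.stdout.strip()


def tir_generate(prompt: str, generate: Callable[[str], str], max_rounds: int = 4) -> str:
    """Alternate generation and execution until the model stops emitting code."""
    text = prompt
    for _ in range(max_rounds):
        chunk = generate(text)
        text += chunk
        blocks = CODE_BLOCK.findall(chunk)
        # Stop when the chunk does not end in a code block (assumed final answer).
        if not blocks or not chunk.rstrip().endswith(FENCE):
            break
        output = run_python(blocks[-1])
        # Feed the execution result back so the next round can use it.
        text += "\n" + FENCE + "output\n" + output + "\n" + FENCE + "\n"
    return text[len(prompt):]


def fake_model(text: str) -> str:
    """Stand-in for a real model call (assumption, for demonstration only)."""
    if FENCE + "output" not in text:
        return "Let me compute it.\n" + FENCE + "python\nprint(2**10)\n" + FENCE + "\n"
    return "The answer is $\\boxed{1024}$."


result = tir_generate("What is 2^10?\n", fake_model)
print(result)
```

A common source of a TIR-versus-CoT gap in a loop like this is the execution result never reaching the model, for instance when the stop condition is missing and the model writes its own `output` block. Whether that applies here cannot be determined from the thread.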