Abnormal TIR experiment results #26

Open
wangzhihao-coder opened this issue Oct 9, 2024 · 4 comments

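I ran experiments on the 1.5B and 7B Qwen2.5-Math models using the TIR prompt, and the resulting metrics are worse than with CoT. I suspect my implementation is missing some steps. Could you describe the implementation in more detail?
I implemented TIR based on the prompt below: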

```python
# TIR
messages = [
    {"role": "system", "content": "Please integrate natural language reasoning with programs to solve the problem above, and put your final answer within \\boxed{}."},
    {"role": "user", "content": prompt}
]
```

rangmiao commented Oct 9, 2024

I ran into the same problem.

hav4ik commented Oct 20, 2024

Hi, same for me, for both 1.5B and 7B. My numbers so far:

| Model | MATH CoT | MATH CoT (maj@8) | MATH TIR |
| --- | --- | --- | --- |
| Qwen2.5-1.5B-Instruct | 0.7646 | N/A | 0.7534 |
| Qwen2.5-7B-Instruct | 0.8408 | 0.8804 | 0.8088 |

The CoT numbers are a bit higher than reported because I used MARIO_EVAL to compare results with ground truth. I'm using vLLM for inference, with top_p=1.0 and temperature=0.
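
For readers unfamiliar with the maj@8 column: it presumably denotes majority voting over 8 sampled completions per problem. A minimal sketch of that aggregation follows. The function name, the sample data, and the exact-string comparison are all assumptions made for illustration; a real evaluation would compare answers with a math-aware checker such as the MARIO_EVAL tool mentioned above.

```python
from collections import Counter
from typing import Optional, Sequence


def majority_vote(answers: Sequence[Optional[str]]) -> Optional[str]:
    """Return the most frequent non-empty answer; ties go to the earliest seen."""
    valid = [a.strip() for a in answers if a is not None and a.strip()]
    if not valid:
        return None
    counts = Counter(valid)
    # Counter.most_common keeps first-seen order among equal counts (Python 3.7+).
    return counts.most_common(1)[0][0]


# Eight hypothetical extracted answers for one problem (illustrative data only).
samples = ["5", "5", "25", "5", None, "5", "25", "5"]
print(majority_vote(samples))  # prints 5
```

Voting only helps when the 8 completions differ, so it needs sampled rather than greedy decoding.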


Failure cases

I checked a few failure cases and they looked strange to me. Take problem 50 in the MATH test benchmark: the model called a function, sympy.complete_square, which clearly does not exist:

<|im_start|>system
Please integrate natural language reasoning with programs to solve the problem above, and put your final answer within \boxed{}.<|im_end|>
<|im_start|>user
Find the radius of the circle with equation $x^2 + 8x + y^2 - 6y = 0$.<|im_end|>
<|im_start|>assistant
To find the radius of the circle given by the equation \(x^2 + 8x + y^2 - 6y = 0\), we need to rewrite the equation in the standard form of a circle's equation \((x - h)^2 + (y - k)^2 = r^2\). This process is called completing the square.

Let's break it down step-by-step:

1. Start with the given equation:
   \[
   x^2 + 8x + y^2 - 6y = 0
   \]

2. Complete the square for the \(x\) terms. The coefficient of \(x\) is 8, so we take half of 8 (which is 4) and square it (which is 16). Add and subtract 16 inside the equation:
   \[
   x^2 + 8x + 16 - 16 + y^2 - 6y = 0
   \]
   This can be rewritten as:
   \[
   (x + 4)^2 - 16 + y^2 - 6y = 0
   \]

3. Next, complete the square for the \(y\) terms. The coefficient of \(y\) is -6, so we take half of -6 (which is -3) and square it (which is 9). Add and subtract 9 inside the equation:
   \[
   (x + 4)^2 - 16 + y^2 - 6y + 9 - 9 = 0
   \]
   This can be rewritten as:
   \[
   (x + 4)^2 + (y - 3)^2 - 25 = 0
   \]

4. Finally, move the constant term to the right side of the equation:
   \[
   (x + 4)^2 + (y - 3)^2 = 25
   \]

Now the equation is in the standard form \((x - h)^2 + (y - k)^2 = r^2\), where \((h, k)\) is the center of the circle and \(r\) is the radius. From the equation \((x + 4)^2 + (y - 3)^2 = 25\), we can see that the center of the circle is \((-4, 3)\) and the radius \(r\) is \(\sqrt{25} = 5\).

So, the radius of the circle is \(5\).

Let's confirm this with Python code.
```python
import sympy as sp

# Define the variables
x, y = sp.symbols('x y')

# Define the equation of the circle
equation = x**2 + 8*x + y**2 - 6*y

# Complete the square for x and y terms
x_term = sp.complete_square(x**2 + 8*x, full=True)[0]
y_term = sp.complete_square(y**2 - 6*y, full=True)[0]

# Rewrite the equation
completed_square_eq = x_term + y_term - sp.complete_square(x**2 + 8*x, full=True)[1] - sp.complete_square(y**2 - 6*y, full=True)[1]

# Simplify the equation
completed_square_eq = sp.simplify(completed_square_eq)

# Extract the radius
radius_squared = completed_square_eq.rhs
radius = sp.sqrt(radius_squared)

print(radius)
```

Does the Qwen team have an in-house version of SymPy that contains functions that do not exist in the main repo?
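
For comparison, the check the model was attempting needs no special helper. For a circle written as x^2 + y^2 + D*x + E*y + F = 0, completing the square gives r^2 = D^2/4 + E^2/4 - F. The sketch below uses only the standard library; `circle_radius` is a name chosen here for illustration, not something from the model output or from SymPy.

```python
import math
from fractions import Fraction


def circle_radius(d: int, e: int, f: int) -> float:
    """Radius of x^2 + y^2 + d*x + e*y + f = 0, found by completing the square."""
    # (x + d/2)^2 + (y + e/2)^2 = d^2/4 + e^2/4 - f
    r_squared = Fraction(d * d, 4) + Fraction(e * e, 4) - f
    if r_squared <= 0:
        raise ValueError("equation does not describe a circle with positive radius")
    return math.sqrt(r_squared)


# x^2 + 8x + y^2 - 6y = 0  ->  d=8, e=-6, f=0
print(circle_radius(8, -6, 0))  # prints 5.0
```

This agrees with the radius of 5 the model derived in natural language before it wrote the failing code.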

hengck23 commented Oct 23, 2024

The Python code could be hallucinated, which would explain the function that cannot be found.
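
One way to measure how often this happens is to scan the generated code statically for `module.attr` references that the imported module does not provide. The sketch below is a heuristic and makes several assumptions: the helper name is made up here, it only tracks plain `import x` and `import x as y` statements (not `from x import y`), and it may misreport submodules that have not been imported yet. The example uses the standard `math` module so that it runs without SymPy installed.

```python
import ast
import importlib
from typing import Dict, List


def find_missing_attributes(source: str) -> List[str]:
    """List `module.attr` references in `source` that the imported module lacks."""
    tree = ast.parse(source)
    aliases: Dict[str, str] = {}  # local name -> module name
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            for item in node.names:
                if item.asname:
                    aliases[item.asname] = item.name
                else:
                    # `import a.b` binds only the top-level name `a`.
                    top = item.name.split(".")[0]
                    aliases[top] = top
    missing = set()
    for node in ast.walk(tree):
        if (isinstance(node, ast.Attribute)
                and isinstance(node.value, ast.Name)
                and node.value.id in aliases):
            module_name = aliases[node.value.id]
            try:
                module = importlib.import_module(module_name)
            except ImportError:
                continue  # module not installed here, so nothing can be checked
            if not hasattr(module, node.attr):
                missing.add(f"{module_name}.{node.attr}")
    return sorted(missing)


# Hypothetical generated snippet that mirrors the failure pattern in this thread.
generated = """
import math as m
print(m.sqrt(16))
print(m.complete_square(16))
"""
print(find_missing_attributes(generated))  # prints ['math.complete_square']
```

Run against the failing output above in an environment with SymPy installed, the same check would be expected to flag `sympy.complete_square`.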

Maybe check this:
https://github.com/QwenLM/Qwen-Agent/blob/a9ef165971d37ac37ec1df9e565ca46a38b0afab/examples/tir_math.py

```python
# We use the following two systems to distinguish between COT mode and TIR mode
TIR_SYSTEM = """Please integrate natural language reasoning with programs to solve the problem above, and put your final answer within \\boxed{}."""
COT_SYSTEM = """Please reason step by step, and put your final answer within \\boxed{}."""
```

```python
            'The dependencies for Python Executor support are not installed. '
            'Please install the required dependencies by running: pip install "qwen-agent[python_executor]"') from e
```
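
The Python Executor dependency mentioned in that excerpt suggests that TIR mode is meant to run the generated code and hand the result back to the model, rather than letting the model generate everything in one pass. A minimal sketch of such a loop follows. Everything in it is an assumption for illustration: `generate` is a hypothetical callable wrapping whatever inference backend is used, the `output`-labelled fenced block used to return results is an assumed convention, and a plain subprocess is not a safe sandbox for untrusted code.

```python
import re
import subprocess
import sys
from typing import Callable

FENCE = "`" * 3  # built programmatically so this example nests cleanly in a fenced block
CODE_BLOCK = re.compile(FENCE + r"python\n(.*?)" + FENCE, re.DOTALL)


def run_python(code: str, timeout: float = 10.0) -> str:
    """Run `code` in a fresh interpreter; return stdout, or the last stderr line on failure."""
    try:
        proc = subprocess.run([sys.executable, "-c", code],
                              capture_output=True, text=True, timeout=timeout)
    except subprocess.TimeoutExpired:
        return "TimeoutError"
    if proc.returncode != 0:
        lines = proc.stderr.strip().splitlines()
        return lines[-1] if lines else "Error"
    return proc.stdout.strip()


def tir_solve(generate: Callable[[str], str], prompt: str, max_rounds: int = 4) -> str:
    """Alternate between model generation and code execution.

    `generate` (hypothetical) takes the text so far and returns the model's
    continuation, stopping once it has closed a fenced python block.
    """
    text = prompt
    for _ in range(max_rounds):
        continuation = generate(text)
        text += continuation
        blocks = CODE_BLOCK.findall(continuation)
        if not blocks:
            break  # no new code, so treat this continuation as the final answer
        result = run_python(blocks[-1])
        # Append the real execution result so the next round can build on it.
        text += f"\n{FENCE}output\n{result}\n{FENCE}\n"
    return text


def fake_generate(text: str) -> str:
    """Stand-in for a model: emits code first, then a final answer."""
    if FENCE + "output" not in text:
        return f"Let's compute r^2.\n{FENCE}python\nprint(16 + 9)\n{FENCE}"
    return "So the radius is \\boxed{5}."


print(tir_solve(fake_generate, "Find the radius.\n"))
```

With a loop like this, a call to a nonexistent function comes back to the model as an error line, which gives it a chance to correct itself. If the evaluation generates in a single pass and never executes the code, that feedback is missing, which may be one reason the TIR scores reported here fall below CoT.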


@XinXU-USTC

The same problem here. For 7B-instruct, I got 77% on GSM8K with TIR and 95.6% with CoT.
