The tech report says the RM is trained from the SFT model, but the SFT data is filtered using the RM's top-k. In what order do these two steps happen? #31

Open
HCHCXY opened this issue Oct 23, 2024 · 1 comment

Comments

@HCHCXY

HCHCXY commented Oct 23, 2024

No description provided.

@zhenruzhang
Contributor

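SFT and the RM are improved iteratively, each helping the other: the final RM is trained from the latest SFT model, while the data for SFT is filtered with intermediate versions of the RM.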
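
One way to read this ordering is as an alternating loop. The toy sketch below is only an illustration, not the project's actual pipeline: the function names, the version labels, the seed `sft-v0` model, and the length-based scorer standing in for a real RM are all assumptions made for the example.

```python
from typing import Callable, List, Tuple


def top_k(candidates: List[str], score: Callable[[str], float], k: int) -> List[str]:
    """Keep the k highest-scoring candidates (stand-in for RM-based filtering)."""
    return sorted(candidates, key=score, reverse=True)[:k]


def alternate(rounds: int, candidates: List[str], k: int) -> Tuple[List[str], List[str]]:
    """Return (training-order log, data kept by the last filtering step)."""
    log = ["sft-v0"]  # assumption: a seed SFT model exists before any RM
    sft_version = 0
    kept = list(candidates)
    for r in range(1, rounds + 1):
        # Each RM is trained starting from the newest SFT model available.
        log.append(f"rm-v{r} <- sft-v{sft_version}")
        if r == rounds:
            break  # the final RM is built on the latest SFT model
        # An intermediate RM filters the data for the next SFT model.
        # Placeholder scorer: response length; a real RM would score quality.
        kept = top_k(candidates, score=len, k=k)
        sft_version = r
        log.append(f"sft-v{sft_version} <- top-{k} by rm-v{r}")
    return log, kept


if __name__ == "__main__":
    order, _ = alternate(rounds=3, candidates=["a", "bbb", "cc"], k=2)
    for step in order:
        print(step)
```

Under these assumptions there is no circular dependency: every SFT model is filtered by an RM that already exists, and every RM is trained from an SFT model that already exists.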
