
Question: does modality fusion cause performance degradation? #113

Open
xiaodongyichuan opened this issue Oct 28, 2024 · 1 comment

Comments

@xiaodongyichuan

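First, thank you for this excellent work. It is a genuinely impressive multimodal interaction solution. However, I still have a few questions and would like to know whether there are solutions for them:
1. Is it possible to feed image, audio, and text encodings together in the input? That would make it easier to incorporate RAG.
2. Regarding the current model's comprehension ability: results returned in T1A2 mode are worse than results from pure text-in, text-out. Adding audio output seems to degrade quality. What is the cause?
3. Could the training code be open-sourced? Thanks.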

@xiaodongyichuan
Author

Given the same text input, the T1A2 output is wrong, while the T1T2 output is correct.
