Some Feature Requests, STT, TTS, custom commands #49
Comments
STT: https://github.com/hibobmaster/matrix-stt-bot. It's not perfect, but it can somewhat meet your needs.
The matrix-stt-bot is great, but it can only transcribe and does not support voice dialogue. Flowise is a bit complex, and unfortunately it also does not support voice features.
Voice dialogue: do you mean the TTS function? For custom commands, this is the entrypoint: line 241 in commit 81543d5.
It's hard to maintain new commands at runtime. So which custom commands do you need?
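For illustration, a static command table like the sketch below is one common way to route such commands. The names here (COMMANDS, dispatch) are hypothetical and do not reflect this project's actual entrypoint:

```python
from typing import Optional

# Hypothetical command table; keys are command prefixes, values produce replies.
COMMANDS = {
    "!help": lambda args: "Available commands: !help, !new",
    "!new": lambda args: "Started a new conversation.",
}

def dispatch(body: str) -> Optional[str]:
    """Return a reply if the message body starts with a known command."""
    if not body.startswith("!"):
        return None  # not a command; fall through to normal chat handling
    command, _, args = body.partition(" ")
    handler = COMMANDS.get(command)
    return handler(args) if handler else f"Unknown command: {command}"
```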
Voice dialogue involves both speech-to-text (STT) and text-to-speech (TTS). A user speaks a message, and the bot responds in voice: it first generates a text reply, which is then converted to speech by TTS.

Another common practice is to display a play widget on a text message that, when clicked, reads the message aloud. However, Matrix does not have this feature (although Matrix spec 1.4 includes MSC proposals for widgets, it seems no client has implemented this yet). Asking the bot to generate voice the way it handles images, by quoting and tagging the bot, is inconvenient, since voice dialogue is usually used when typing is not feasible. Therefore, outputting voice directly in the conversation is more appropriate, and perhaps displaying two or three messages at once would be clearer: one for the user's voice converted to text, one for the AI-generated text, and one for the TTS audio.

One can also envision a scenario using voice calls, similar to how ChatGPT and Copilot operate in their mobile apps, with no text interaction at all. The program automatically recognizes pauses in the user's speech (some third-party clients, like LobeChat, have implemented this) and then responds with voice. Of course, this would involve extensive coding work.

I am eager to participate in this project, but unfortunately I am not familiar with Python, which makes it difficult for me to understand the whole codebase. Maybe when I have time, I will study it more thoroughly.
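For reference, here is a minimal sketch of the voice round-trip described above, assuming the OpenAI Python SDK (openai>=1.0) is used for both Whisper STT and TTS. Uploading the resulting audio back to the Matrix room as an m.audio event (e.g. via matrix-nio) is omitted, and the file paths and model names are illustrative assumptions, not this project's code:

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def voice_round_trip(audio_path: str, reply_path: str) -> str:
    """STT -> LLM -> TTS: answer a voice message with a voice message."""
    # 1. STT: transcribe the user's voice message.
    with open(audio_path, "rb") as f:
        transcript = client.audio.transcriptions.create(model="whisper-1", file=f)

    # 2. LLM: generate the text reply from the transcript.
    completion = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": transcript.text}],
    )
    reply_text = completion.choices[0].message.content

    # 3. TTS: synthesize the reply; the file would then be uploaded to the room.
    speech = client.audio.speech.create(model="tts-1", voice="alloy", input=reply_text)
    speech.write_to_file(reply_path)
    return reply_text
```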
Here's another idea: I envision a default dialogue model that can temporarily switch to other models using custom commands, such as !g35 (gpt-3.5) or !c3g (claude-3-opus-20240229). Does this project implement models from providers other than OpenAI? By using a base_url proxy, it should be possible to support models from multiple vendors on a single platform (e.g., one-api), though I haven't tested this yet.
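As an illustration of that idea, here is a hypothetical sketch assuming an OpenAI-compatible proxy such as one-api behind a single base_url. The alias table, PROXY_URL, and API key are assumptions and have not been tested against this project:

```python
from openai import OpenAI

PROXY_URL = "http://localhost:3000/v1"  # e.g. a one-api deployment (assumed)
MODEL_ALIASES = {
    "!g35": "gpt-3.5-turbo",
    "!c3g": "claude-3-opus-20240229",
}
DEFAULT_MODEL = "gpt-3.5-turbo"

client = OpenAI(base_url=PROXY_URL, api_key="sk-proxy-key")  # placeholder key

def ask(body: str) -> str:
    """Route one message, honoring a leading model-switch command."""
    command, _, rest = body.partition(" ")
    model = MODEL_ALIASES.get(command, DEFAULT_MODEL)
    prompt = rest if command in MODEL_ALIASES else body
    completion = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
    )
    return completion.choices[0].message.content
```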
Thank you, @hibobmaster, for developing this incredible project.
Now I would like to ask: would it be possible to add some features?
Text-to-speech, speech-to-text, and custom commands using different prompts and agents would make it perfect.