Make a demo of device speaking ($1500) #1006

Open
kodjima33 opened this issue Oct 8, 2024 · 31 comments
Assignees
Labels
Feature Request firmware firmware work flutter flutter work help wanted Extra attention is needed Paid Bounty 💰

Comments

@kodjima33
Collaborator

kodjima33 commented Oct 8, 2024

Is your feature request related to a problem? Please describe.
People want to talk to the device. Our v2 device is equipped with a speaker but doesn't speak yet

Describe the solution you'd like
The solution should work exactly like our in-app chat, but with audio from the device. For example, I press the button on the device and ask "Hey, what's the capital of the United States?" and the device responds with audio: "Washington DC". Both the prompt and the response should be visible in the chat inside the app.

It should be possible to disable this functionality via settings.

Additional context
A PR was submitted a while ago to make the app speak; take a look at that.

This is a paid task. Reward is $1,500 in cash. Simply link your PR to this issue to receive the money

@kodjima33 kodjima33 moved this to Backlog in omi TODO Oct 8, 2024
@kodjima33 kodjima33 changed the title from "Make a demo of device speaking" to "Make a demo of device speaking ($2000)" Oct 8, 2024
@kodjima33 kodjima33 added flutter flutter work firmware firmware work Paid Bounty 💰 labels Oct 8, 2024
@beastoin
Collaborator

beastoin commented Oct 9, 2024

sweet!

@kodjima33 kodjima33 added help wanted Extra attention is needed Feature Request labels Oct 9, 2024
@DamienDeepgram
Contributor

I can get this working but would be using Deepgram only.

@beastoin
Collaborator

Go ahead with a PR pls @DamienDeepgram, you don't need permission to do great things.

@beastoin
Collaborator

@hoai265 you too - great firmware dev man

@kodjima33 kodjima33 changed the title from "Make a demo of device speaking ($2000)" to "Make a demo of device speaking ($1000)" Oct 14, 2024
@kodjima33
Collaborator Author

Sorry guys, had to update the bounty

@beastoin
Collaborator

so sad, no one has taken this :dead: such a sweet $1K

@kodjima33 kodjima33 changed the title from "Make a demo of device speaking ($1000)" to "Make a demo of device speaking ($1500)" Oct 25, 2024
@Sanchay-T

Hi @kodjima33 , I’d love to take this task. Please assign it to me, and I’ll get started. Thanks!

@beastoin
Collaborator

@Sanchay-T yes, pls keep us updated! every 24h would be great ~ just to keep your motivation up!

@Sanchay-T

Hi @beastoin 👋

I've been diving into the codebase to understand how we can implement voice responses for the device. Really interesting architecture you've built here! While I'm familiar with backend systems, I'm getting up to speed with some of the Flutter and BLE specifics.

Looking at the current audio pipeline, I can see we're handling real-time streaming through WebSockets pretty elegantly. The socket service in app/lib/services/sockets.dart seems to be the core of this:

Future<TranscriptSegmentSocketService?> socket({
    required BleAudioCodec codec,
    required int sampleRate,
    required String language,
    bool force = false,
}) async {

For implementing the voice response feature, I think we can build on this foundation. I see we're already integrated with OpenAI's APIs in backend/http/openai.dart, which could be extended for text-to-speech capabilities.

I have a few questions about the device interaction part though. Looking at app/lib/services/devices/models.dart, I see how we're handling BLE characteristics:

BluetoothCharacteristic? getCharacteristicByUuid(BluetoothService service, String uuid) {
  return service.characteristics.firstWhereOrNull(
    (characteristic) => characteristic.uuid.str128.toLowerCase() == uuid.toLowerCase(),
  );
}

Before I proceed with the implementation, I wanted to check:

  1. For handling button presses - what would be the best way to detect when the user wants to trigger a voice command? Should we use an existing characteristic or define a new one?

  2. Regarding audio playback through the device's speaker - are there any specific format requirements or limitations I should be aware of?

  3. For the chat interface, I see we're using the Memory system to handle conversations. Would adding voice responses require any significant changes to the current schema?

I have some ideas about the implementation, but wanted to validate these core aspects first to make sure I'm heading in the right direction. Happy to elaborate on any part of this!

Thanks for the help! Looking forward to your insights.

@beastoin
Collaborator

1/ what do you propose? pros / cons.
2/ @kevvz could help? but you should try it yourself first.
3/ just do it (to find out whether you're wrong 😏)
no worries man, be creative. let's finish the first draft quickly, then we have something to discuss. embrace the changes (good changes :))

@Sanchay-T

@DamienDeepgram
Contributor

DamienDeepgram commented Oct 31, 2024

Deepgram also has TTS, so I think you could use the same SDK that the speech-to-text is using. Not sure if Omi has a preference there, though.

see: https://pub.dev/packages/deepgram_speech_to_text#text-to-speech

@kodjima33
Collaborator Author

Removed @Sanchay-T from the assignees - no progress

@beastoin let's try not to assign people who haven't submitted PRs yet. We assign only to those who have PRs. Others will need to open a PR first.

@DamienDeepgram try it out bro - looking forward!

@Sanchay-T

Sanchay-T commented Nov 2, 2024

Hi @beastoin and @kodjima33

I wanted to clarify the situation regarding my previous assignment. First, I apologize for the delay in updates - I was away for Diwali celebrations in my hometown, which affected my response time. However, I want to assure you that I've been actively working on this in the background:

  1. I've been going through the codebase thoroughly, particularly focusing on the audio pipeline and BLE integration
  2. While I have less experience with Flutter/Dart specifically, I bring relevant experience with speech/text models which I believe will be valuable for this feature
  3. I'm currently working on implementing a proof-of-concept to address the questions I raised earlier, particularly around:
    • Button press handling for voice command triggering
    • Audio playback implementation
    • Memory system integration for voice responses

I understand the policy about assignments and PRs, and I'm committed to submitting a PR with my implementation soon. I should have communicated my temporary absence better, and I appreciate your patience.

Would it be alright if I continue working on this feature and submit a PR for review? I'm happy to share my current progress in more detail if helpful.

Thanks for understanding!

@beastoin
Collaborator

beastoin commented Nov 2, 2024

checking speaker functions of current firmware...

@hoai265
Contributor

hoai265 commented Nov 2, 2024

> Removed @Sanchay-T from assigned - no progress
>
> @beastoin let's try to not assign people if they didn't yet have PRs submitted. We assign only to those who had PRs. Others will need to do a PR first.
>
> @DamienDeepgram try it out bro - looking forward!

Hi @kodjima33, what if we add a separate option to route audio to the phone's output, like AirPods? I think this would be another option for users, as they could listen privately.

@beastoin
Collaborator

beastoin commented Nov 2, 2024

@DamienDeepgram don't forget to ref your PR ;)

about how to implement this task, be creative. but i think Nik's description is good/simple enough to roll out the first draft. smth like ~

1/ the user presses the button on the device and says something
2/ the device sends that voice to the app
3/ the app sends the voice message to the backend
4/ the backend processes the voice message, then responds to the app with audio bytes
5/ the app sends the audio bytes to the device
6/ the device speaks it out loud.

hope that helps.
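
The six steps above can be sketched roughly as follows. This is only a minimal illustration in Python, not the Omi implementation: `VoiceChatSession`, `ChatMessage`, and the three injected callables (`transcribe`, `answer`, `synthesize`) are hypothetical names standing in for whatever speech-to-text, chat, and text-to-speech services end up being used. It also assumes the settings toggle requested in the issue is a simple boolean.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class ChatMessage:
    sender: str  # "user" or "assistant"
    text: str


@dataclass
class VoiceChatSession:
    # Hypothetical stand-ins for the real services (assumptions, not Omi APIs).
    transcribe: Callable[[bytes], str]   # speech-to-text
    answer: Callable[[str], str]         # chat backend
    synthesize: Callable[[str], bytes]   # text-to-speech
    speak_enabled: bool = True           # the "disable via settings" toggle
    history: List[ChatMessage] = field(default_factory=list)

    def handle_button_recording(self, recorded_audio: bytes) -> Optional[bytes]:
        """Steps 2-5: recorded audio in, reply audio out (or None if muted)."""
        prompt = self.transcribe(recorded_audio)
        self.history.append(ChatMessage("user", prompt))
        reply = self.answer(prompt)
        self.history.append(ChatMessage("assistant", reply))
        # The chat is always updated; audio is produced only when speaking is on.
        if not self.speak_enabled:
            return None
        return self.synthesize(reply)


# Usage with fake services so the sketch runs without any device or backend.
session = VoiceChatSession(
    transcribe=lambda audio: audio.decode("utf-8"),
    answer=lambda prompt: "Washington DC" if "capital" in prompt else "Not sure",
    synthesize=lambda text: text.encode("utf-8"),
)
audio_out = session.handle_button_recording(b"what's the capital of united states")
print(audio_out)  # b'Washington DC'
```

Keeping the three services as injected callables means the same flow can be tried with OpenAI, Deepgram, or a fake, and the prompt and response land in the chat history whether or not the device speaks.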

@vincentkoc
Contributor

speaker should support playback over BT, I don't mind looking into this after the Apple Watch PR, as it will use a similar two-way transport.

@vincentkoc
Contributor

I have started on this (#1243), but need to get some sleep. I think that even before I flashed the new firmware, any button click on my (red) devkit2 caused a fatal crash. Not sure if the shipped devkit2s have a different button setup? A different pin/setup for the button could be causing this issue.

The WIP code includes all the BLE setup to stream and to handle the stream on the desktop side (Python). Once finalised, it can be moved to Dart code.
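
For anyone following along, one common way to stream audio bytes over BLE is to split them into small numbered packets, because a single characteristic write can only carry a payload bounded by the negotiated MTU. The Python sketch below is an assumption-laden illustration, not the protocol from the WIP PR or the Omi firmware: the 2-byte little-endian sequence header, the default `payload_size` of 160, and the function names `chunk_audio` and `reassemble` are all hypothetical.

```python
from typing import Iterator, List


def chunk_audio(audio: bytes, payload_size: int = 160) -> Iterator[bytes]:
    """Yield packets shaped as [2-byte little-endian sequence number][payload].

    payload_size is an assumed value; a real link would derive it from the
    negotiated MTU. The sequence number wraps at 65536, so this sketch
    assumes fewer than 65536 packets per message.
    """
    if payload_size <= 0:
        raise ValueError("payload_size must be positive")
    for seq, start in enumerate(range(0, len(audio), payload_size)):
        header = (seq & 0xFFFF).to_bytes(2, "little")
        yield header + audio[start:start + payload_size]


def reassemble(packets: List[bytes]) -> bytes:
    """Rebuild the audio on the receiving side, tolerating reordered packets."""
    ordered = sorted(packets, key=lambda p: int.from_bytes(p[:2], "little"))
    return b"".join(p[2:] for p in ordered)


# Round trip on dummy data; no BLE hardware is needed to try this.
audio = bytes(range(256)) * 3
packets = list(chunk_audio(audio, payload_size=100))
assert reassemble(packets[::-1]) == audio
```

The sequence header lets the receiver detect gaps or reordering, which can help when debugging playback that does not stream correctly.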

@kodjima33
Collaborator Author

kodjima33 commented Nov 3, 2024

@Sanchay-T bro no worries, just keep building this and try to make it work. No one blames you - it's just we assign issues only after first PR

@vincentkoc @DamienDeepgram guys I believe in you. Let's make this work! (ideally today)

You are both working on this, if you both make it work, I'll make a post about both of you and we will solve the bounty issue

@beastoin
Collaborator

beastoin commented Nov 4, 2024

fighting 💪

@DamienDeepgram
Contributor

> @DamienDeepgram don't forget to ref your PR ;)

Sorry, yes, here is the PR. It currently has an issue with playback not streaming correctly:

#1246

@ombhojane

Hey @kodjima33 @beastoin
Has this issue been resolved? I've gone through the discussion and am willing to develop this; may I proceed?

@beastoin
Collaborator

Please go ahead and keep us updated @ombhojane

@ombhojane

Sure @beastoin

@ombhojane

@beastoin
I'm stuck setting up the project.
I followed Omi's setup instructions; at the last stage it was building the Android Gradle files and downloaded more data than expected.
So setup took more time than planned, and I'm still figuring it out.
I need to see what's going on and how to fix it.

@himmat12

So how do we integrate this feature if we do not have Omi dev kit devices? Or can we implement this feature on our Android/iOS devices, and if it works there, will it work with Omi devices?

@beastoin
Collaborator

@himmat12 you already know the answer man. the ticket title is super clear.

@beastoin
Collaborator

@ombhojane how's it going?

if you want to get this ticket done - building the app / the firmware is a basic requirement.

@ombhojane

Hey @beastoin I'll figure this out today, yesterday was my exam.
I'll try manual installation once.

@ombhojane

Hi @beastoin
I've set up Omi. Now I'm looking into fixing the issue; I'll post updates on my progress.

@vincentkoc
Contributor

@beastoin @kodjima33 is this fixed with #1452, or is work still required?

Projects
Status: No status
Development

No branches or pull requests

8 participants