Utilizing Deepgram as a Websocket (Question about Multichannel streaming) #993
Replies: 8 comments
-
Hey there! It looks like you haven't connected your GitHub account to your Deepgram account. You can do this at https://community.deepgram.com - being verified through this process will allow our team to help you in a much more streamlined fashion.
-
It looks like we're missing some important information to help debug your issue. Would you mind providing us with the following details in a reply?
-
Below are some additional requested details about my use case: Language: Java
-
I was able to come up with a solution for combining the audio streams for both inbound and outbound callers, but now the issue is with returning the transcription results for both legs of a call. Currently, this is the URL I am using for the Deepgram websocket:
wss://api.deepgram.com/v1/listen?encoding=linear16&sample_rate=8000&model=nova-2-phonecall&interim_results=true&endpointing=300&utterance_end_ms=1200&multichannel=true&channels=2&vad_events=true&smart_format=true
I've seen in other discussions that users had issues when only multichannel was set to true, and that is my case as well. When channels is not present in the URL, no transcription results are returned to my client-side application. On the other hand, if channels is set to 2, I only receive the outbound transcription for the call.
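For readers who want to toggle individual parameters (for example `channels`) while debugging, a minimal Java sketch that assembles the same query string is shown here. The class and method names (`ListenUrl`, `build`, `twoChannelPhoneCall`) are hypothetical and not part of any Deepgram SDK; the parameter names and values are copied from the URL in the post above.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical helper, not part of any Deepgram SDK.
final class ListenUrl {
    // Joins the parameters in insertion order. Values are assumed to be
    // URL-safe already (true for the ones used here), so no encoding is done.
    static String build(Map<String, String> params) {
        String query = params.entrySet().stream()
                .map(e -> e.getKey() + "=" + e.getValue())
                .collect(Collectors.joining("&"));
        return "wss://api.deepgram.com/v1/listen?" + query;
    }

    // The exact parameter set quoted in the post.
    static Map<String, String> twoChannelPhoneCall() {
        Map<String, String> p = new LinkedHashMap<>();
        p.put("encoding", "linear16");
        p.put("sample_rate", "8000");
        p.put("model", "nova-2-phonecall");
        p.put("interim_results", "true");
        p.put("endpointing", "300");
        p.put("utterance_end_ms", "1200");
        p.put("multichannel", "true");
        p.put("channels", "2");
        p.put("vad_events", "true");
        p.put("smart_format", "true");
        return p;
    }

    public static void main(String[] args) {
        System.out.println(build(twoChannelPhoneCall()));
    }
}
```

Keeping the parameters in a map makes it easy to remove or change `channels` in one place and compare what the client receives.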
-
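Hi @JlossBustin, streaming audio with multichannel=true returns the channel_index field, which is an array of two items. The first value is the channel index (0-indexing), and the second value is the total number of channels. So a transcription response with "channel_index": [0, 2] means that the transcript is for channel #0 (the first channel), out of 2 total channels. See our docs section (https://developers.deepgram.com/docs/multichannel#streaming-response) for further explanation.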
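With multichannel=true, each streaming result carries a channel_index array of the form [channel, total channels], so the client can route a transcript to the right call leg by reading its first element. The Java sketch below illustrates that routing under some assumptions: the class `ChannelIndex` and the method `legFor` are hypothetical, the sample message is a trimmed illustration rather than a captured response, and the mapping of channel 0 to "inbound" depends on how the sender orders the channels. A regular expression is used only to keep the example dependency-free; a real client would more likely use a JSON library.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: reads "channel_index": [channel, total] from a result message.
final class ChannelIndex {
    private static final Pattern FIELD = Pattern.compile(
            "\"channel_index\"\\s*:\\s*\\[\\s*(\\d+)\\s*,\\s*(\\d+)\\s*\\]");

    // Returns {channel, totalChannels}, or null when the field is absent.
    static int[] parse(String json) {
        Matcher m = FIELD.matcher(json);
        if (!m.find()) {
            return null;
        }
        return new int[] { Integer.parseInt(m.group(1)), Integer.parseInt(m.group(2)) };
    }

    // Assumption: the sender puts the inbound leg on channel 0.
    static String legFor(int channel) {
        return channel == 0 ? "inbound" : "outbound";
    }

    public static void main(String[] args) {
        // Trimmed, illustrative message; real responses contain more fields.
        String message = "{\"type\":\"Results\",\"channel_index\":[1,2]}";
        int[] idx = parse(message);
        if (idx != null) {
            System.out.println("channel " + idx[0] + " of " + idx[1] + " -> " + legFor(idx[0]));
        }
    }
}
```

If only one channel index ever shows up on the client, that suggests checking whether results for the other index are being dropped somewhere in the client-side handling.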
-
Hi Julia,
I understand the functionality of multichannel. However, the issue is with returning both channels’ respective transcripts.
Quoted reply from Julia Kroll (Monday, November 11, 2024):
Hi @JlossBustin, streaming audio with multichannel=true returns the channel_index field, which is an array of two items. The first value is the channel index (0-indexing), and the second value is the total number of channels. So a transcription response with "channel_index": [0, 2] means that the transcript is for channel #0 (the first channel), out of 2 total channels. See our docs section (https://developers.deepgram.com/docs/multichannel#streaming-response) for further explanation.
-
Hi @JlossBustin, if you can provide a Deepgram request ID ( You mentioned you found "a solution for combining audio streams for both inbound and outbound callers" - are you still sending two-channel audio to Deepgram, or are you combining the channels such that it becomes mono audio?
-
Hi there, it appears I am sending two-channel audio to Deepgram. The issue with only receiving the live transcription for one leg of the call is due to an error in my processing outside of the Deepgram websocket.
-
Hello,
I am trying to understand how far Deepgram can be used as a websocket that provides live transcriptions on the fly to my client-side application.
Currently, I am forwarding raw RTP data (inbound and outbound) to the websocket. To my knowledge, there is no way for Deepgram to tell which direction is which and assign a channel (I have channels set to 2 in the URL parameters). I have the option of sending the inbound and outbound RTP data with a header byte defining the direction of the audio (1 or 2). Could I somehow get Deepgram to pick up on this and return the respective channel based on the header byte?
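One common way to carry direction in raw PCM, instead of a custom header byte, is to interleave the two legs into a single two-channel stream so that each leg occupies a fixed channel. The Java sketch below shows that idea under several assumptions: both legs have already been extracted from their RTP packets and decoded to 16-bit linear PCM at the same sample rate, the two buffers cover the same time span, and the receiver treats the first sample of each frame as channel 0. The class name `StereoInterleaver` is hypothetical.

```java
// Hypothetical helper: merges two mono 16-bit PCM buffers into one
// interleaved two-channel buffer (inbound = channel 0, outbound = channel 1).
final class StereoInterleaver {
    static byte[] interleave(byte[] inbound, byte[] outbound) {
        if (inbound.length != outbound.length || inbound.length % 2 != 0) {
            throw new IllegalArgumentException(
                    "buffers must have equal length and hold whole 16-bit samples");
        }
        byte[] out = new byte[inbound.length * 2];
        // Each frame is 4 bytes: one 2-byte inbound sample, then one 2-byte outbound sample.
        for (int i = 0, o = 0; i < inbound.length; i += 2, o += 4) {
            out[o] = inbound[i];
            out[o + 1] = inbound[i + 1];
            out[o + 2] = outbound[i];
            out[o + 3] = outbound[i + 1];
        }
        return out;
    }

    public static void main(String[] args) {
        byte[] in = {1, 2, 3, 4};   // two inbound samples
        byte[] outb = {5, 6, 7, 8}; // two outbound samples
        System.out.println(java.util.Arrays.toString(interleave(in, outb)));
    }
}
```

Copying bytes pairwise keeps the sketch independent of byte order. In practice the two legs rarely arrive perfectly aligned, so some buffering or silence padding per leg would be needed before interleaving.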