Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add Functionality for Closing Stream (w/o Cancel) Allow Transcription #343

Merged

Conversation

davidvonthenen
Copy link
Contributor

@davidvonthenen davidvonthenen commented Oct 23, 2024

Completion

Proposed changes

Address: #341

The Stop()~ function has been changed to no longer use cancel to trigger the stop, but to fire the Close` message with a small delay, does a "best effort" to allow any final transcription to get passed back to the client.

This also changes the TTS WS example to use linear16 which is more inline with examples in other SDKs.

Types of changes

What types of changes does your code introduce to the community .NET SDK?
Put an x in the boxes that apply

  • Bugfix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • Documentation update or tests (if none of the other choices apply)

Checklist

Put an x in the boxes that apply. You can also fill these out after creating the PR. If you're unsure about any of them, don't hesitate to ask. We're here to help! This is simply a reminder of what we are going to look for before merging your code.

  • I have read the CONTRIBUTING doc
  • I have lint'ed all of my code using repo standards
  • I have added tests that prove my fix is effective or that my feature works
  • I have added necessary documentation (if appropriate)

Further comments

NA

Summary by CodeRabbit

  • New Features

    • Enhanced WebSocket client interfaces with new optional parameters for improved control over connection closure.
    • Updated audio output handling to support WAV format and include a WAV header.
  • Bug Fixes

    • Improved error handling and logging during WebSocket connection and disconnection processes.
  • Documentation

    • Added comments to clarify the functionality of new parameters and methods.

Copy link
Contributor

coderabbitai bot commented Oct 23, 2024

Walkthrough

The pull request introduces modifications to several interfaces and classes within the Deepgram.Clients namespace. Key changes include the addition of a bool nullByte parameter, defaulting to false, in the Stop and Close methods of the IListenWebSocketClient and ISpeakWebSocketClient interfaces, as well as in the corresponding Client classes. These changes enhance the functionality of these methods by allowing more control over the WebSocket connection closure process. Additionally, the Program.cs file in the examples directory has been updated to handle audio output in WAV format, including the addition of a WAV header.

Changes

File Change Summary
Deepgram/Clients/Interfaces/v1/IListenWebSocketClient.cs Updated Stop method signature to include bool nullByte = false.
Deepgram/Clients/Interfaces/v1/ISpeakWebSocketClient.cs Updated Stop method signature to include bool nullByte = false and Close method signature to include bool nullByte = false.
Deepgram/Clients/Listen/v1/WebSocket/Client.cs Updated Stop method signature to include bool nullByte = false and modified internal logic to handle closing via SendClose(nullByte) and added a delay for processing before closing.
Deepgram/Clients/Speak/v1/WebSocket/Client.cs Updated Close method signature to include bool nullByte = false and modified Stop method to call updated Close. Streamlined close message sending logic and added logging enhancements.
examples/text-to-speech/websocket/simple/Program.cs Added bool appendWavHeader variable and modified audio output handling to write a WAV header and change output file format from MP3 to WAV, initializing SpeakSchema with Encoding set to "linear16" and SampleRate set to 48000.

Possibly related PRs

  • Reintroduce TTS WS #316: The ISpeakWebSocketClient interface has been modified to include a Stop method with an additional bool nullByte parameter, similar to the changes made in the IListenWebSocketClient interface in the main PR.
  • Implement Length on Send Functions, Expose KeepAlive, Finalize, etc #334: The ISpeakWebSocketClient interface's Send, SendMessage, and SendMessageImmediately methods have been updated to include an optional length parameter, which aligns with the enhancements made to the Stop method in the main PR.
  • Improve WebSocketMessage Buffer Length Handling #337: The WebSocketMessage class in both the Listen and Speak namespaces has been updated to improve buffer length handling, which relates to the overall message handling improvements seen in the main PR's changes to the Stop method.
  • Throw Exception on Failed WS Connect #340: The changes in the Client class for both Listen and Speak functionalities include improved error handling and connection checks, which are relevant to the modifications made to the Stop method in the main PR.

Suggested reviewers

  • lukeocodes
  • jpvajda
  • SandraRodgers
  • naomi-lgbt

Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media?

❤️ Share
🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

  • Review comments: Directly reply to a review comment made by CodeRabbit. Example:
    • I pushed a fix in commit <commit_id>, please review it.
    • Generate unit testing code for this file.
    • Open a follow-up GitHub issue for this discussion.
  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
    • @coderabbitai generate unit testing code for this file.
    • @coderabbitai modularize this function.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
    • @coderabbitai read src/utils.ts and generate unit testing code.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.
    • @coderabbitai help me debug CodeRabbit configuration file.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (Invoked using PR comments)

  • @coderabbitai pause to pause the reviews on a PR.
  • @coderabbitai resume to resume the paused reviews.
  • @coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
  • @coderabbitai full review to do a full review from scratch and review all the files again.
  • @coderabbitai summary to regenerate the summary of the PR.
  • @coderabbitai resolve resolve all the CodeRabbit review comments.
  • @coderabbitai configuration to show the current CodeRabbit configuration for the repository.
  • @coderabbitai help to get help.

Other keywords and placeholders

  • Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
  • Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
  • Add @coderabbitai anywhere in the PR title to generate the title automatically.

CodeRabbit Configuration File (.coderabbit.yaml)

  • You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
  • Please see the configuration documentation for more information.
  • If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Documentation and Community

  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 4

🧹 Outside diff range and nitpick comments (8)
Deepgram/Clients/Interfaces/v1/IListenWebSocketClient.cs (2)

18-18: Add XML documentation for the new parameter.

The nullByte parameter should be documented to explain its purpose and impact on the stream closure behavior.

Apply this diff to add the documentation:

+    /// <summary>
+    /// Stops the WebSocket connection
+    /// </summary>
+    /// <param name="cancelToken">Optional cancellation token</param>
+    /// <param name="nullByte">When true, sends a null byte before closing the stream to ensure final transcription is received</param>
+    /// <returns>A task representing the asynchronous operation</returns>
     public Task Stop(CancellationTokenSource? cancelToken = null, bool nullByte = false);

18-18: Consider documenting the relationship between Stop() and SendClose().

The interface has both Stop() and SendClose() methods with similar parameters, but their relationship and usage scenarios aren't clear. This could lead to confusion for implementers and consumers of the interface.

Consider:

  1. Adding interface-level documentation explaining when to use each method
  2. Clarifying if Stop() internally calls SendClose() or if they serve different purposes
  3. Documenting the order of operations during stream closure

Would you like me to provide a detailed documentation proposal?

Deepgram/Clients/Interfaces/v1/ISpeakWebSocketClient.cs (2)

18-18: Add XML documentation for the new parameter.

The implementation change aligns with the PR objectives. However, please add XML documentation for the nullByte parameter to maintain API documentation standards.

+    /// <param name="nullByte">When true, sends a null byte before stopping the stream</param>
     public Task Stop(CancellationTokenSource? cancelToken = null, bool nullByte = false);

100-100: Update method documentation to reflect new behavior.

The implementation is consistent with the Stop method. However, the commented summary above needs updating, and XML documentation should be added for the new parameter.

-    ///// <summary>
-    /////  This method tells Deepgram to initiate the close server-side.
-    ///// </summary>
+    /// <summary>
+    /// Closes the WebSocket connection.
+    /// </summary>
+    /// <param name="nullByte">When true, sends a null byte before closing the connection</param>
     public void Close(bool nullByte = false);
examples/text-to-speech/websocket/simple/Program.cs (1)

30-31: Consider thread-safety for header writing control

The boolean flag works for simple scenarios, but in a concurrent environment, it could lead to race conditions. Consider using Interlocked.Exchange or making the flag volatile for thread-safe operations.

-bool appendWavHeader = true;
+private static volatile bool appendWavHeader = true;
Deepgram/Clients/Speak/v1/WebSocket/Client.cs (3)

348-364: LGTM! Consider using constant for null byte array.

The implementation looks good with proper locking and error handling. The nullByte parameter addition is backward compatible.

Consider extracting the null byte array to a constant:

+private static readonly byte[] NULL_BYTE = new byte[] { 0 };
 public void Close(bool nullByte = false)
 {
     Log.Debug("SendFinalize", "Sending Close Message Immediately...");
     if (nullByte && _clientWebSocket != null)
     {
         lock (_mutexSend)
         {
-            _clientWebSocket.SendAsync(new ArraySegment<byte>([0]), WebSocketMessageType.Binary, true, _cancellationTokenSource.Token)
+            _clientWebSocket.SendAsync(new ArraySegment<byte>(NULL_BYTE), WebSocketMessageType.Binary, true, _cancellationTokenSource.Token)
                 .ConfigureAwait(false);
         }
         return;
     }

Line range hint 875-904: LGTM! Consider making delay configurable.

The implementation looks good with proper error handling and async/await usage. The added delay allows for final transcription to be received.

Consider making the delay configurable through the options:

+public class DeepgramWsClientOptions
+{
+    public int FinalTranscriptionDelayMs { get; set; } = 100;
+}

 public async Task Stop(CancellationTokenSource? cancelToken = null, bool nullByte = false)
 {
     // ... existing code ...
     
     // small delay to wait for any final transcription
-    await Task.Delay(100, cancelToken.Token).ConfigureAwait(false);
+    await Task.Delay(_deepgramClientOptions.FinalTranscriptionDelayMs, cancelToken.Token).ConfigureAwait(false);
     
     // ... existing code ...
 }

348-348: Add XML documentation for the nullByte parameter.

The new parameter should be documented to help users understand its purpose.

Add XML documentation for the nullByte parameter in both methods:

+/// <param name="nullByte">When true, sends a null byte instead of a close message.</param>
 public void Close(bool nullByte = false)

+/// <param name="nullByte">When true, sends a null byte instead of a close message.</param>
 public async Task Stop(CancellationTokenSource? cancelToken = null, bool nullByte = false)

Also applies to: 875-875

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Files that changed from the base of the PR and between 7aebe09 and 7143760.

📒 Files selected for processing (5)
  • Deepgram/Clients/Interfaces/v1/IListenWebSocketClient.cs (1 hunks)
  • Deepgram/Clients/Interfaces/v1/ISpeakWebSocketClient.cs (2 hunks)
  • Deepgram/Clients/Listen/v1/WebSocket/Client.cs (2 hunks)
  • Deepgram/Clients/Speak/v1/WebSocket/Client.cs (3 hunks)
  • examples/text-to-speech/websocket/simple/Program.cs (3 hunks)
🧰 Additional context used
🔇 Additional comments (3)
Deepgram/Clients/Interfaces/v1/IListenWebSocketClient.cs (1)

18-18: Verify implementations of the interface.

The new parameter requires updates to all implementing classes.

Let's verify the implementations:

✅ Verification successful

Let me check the implementation of the Stop method in the found file to verify if it matches the interface.


Let me try a different approach to find the Stop method implementation using ast-grep with a more flexible pattern.


Based on the code inspection, I can now provide the final response:

Implementation has been updated correctly

The implementation in Client.cs has been properly updated to match the interface with the new nullByte parameter:

public async Task Stop(CancellationTokenSource? cancelToken = null, bool nullByte = false)

The implementation correctly:

  • Accepts both parameters with the same types and default values as defined in the interface
  • Uses the nullByte parameter in the SendClose method to determine how to close the connection
  • Maintains backward compatibility with the default false value
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Find all implementations of IListenWebSocketClient
echo "Searching for implementations of IListenWebSocketClient..."
rg -l "class.*:.*IListenWebSocketClient"

# Find all Stop method implementations
echo -e "\nSearching for Stop method implementations..."
ast-grep --pattern 'public Task Stop(CancellationTokenSource? $_, bool $_ = false)'

Length of output: 407


Script:

#!/bin/bash
# Show the Stop method implementation in Client.cs
echo "Checking Stop method implementation in Client.cs..."
rg -A 10 "public\s+Task\s+Stop" "Deepgram/Clients/Listen/v1/WebSocket/Client.cs"

Length of output: 193


Script:

#!/bin/bash
# Show the content of the Client.cs file
echo "Content of Client.cs:"
cat "Deepgram/Clients/Listen/v1/WebSocket/Client.cs"

# Also try to find Stop method with a more flexible pattern
echo -e "\nSearching for Stop method with flexible pattern:"
ast-grep --pattern 'public Task Stop($$$)'

Length of output: 42569

Deepgram/Clients/Listen/v1/WebSocket/Client.cs (2)

825-825: LGTM: Method signature change maintains backward compatibility.

The addition of the optional nullByte parameter with a default value of false ensures existing code continues to work without modification.


849-849: LGTM: Stream closing behavior modified as intended.

The implementation correctly uses the new nullByte parameter to control how the stream is closed, aligning with the PR objective to allow closing without using a cancel command.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants