[CHA-RL4b5][Room monitoring] Contributors detach on channel FAILED state #405

sacOO7 · 2024-11-15T11:18:58Z

As defined in CHA-RL4b5, when a channel FAILED event is detected, we run _doChannelWindDown for all contributors.
This is an internal operation, similar to the channel-suspended handling in CHA-RL4b9, and should run as an atomic internal operation so that it cannot conflict with externally triggered operations.
Currently, we perform the normal _doChannelWindDown instead of executing it inside _mtx.runExclusive.
This seems to compromise the atomicity of the room lifecycle as per our previous discussion.
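
To illustrate why the wind-down should go through the mutex, here is a minimal sketch. SimpleMutex, windDown, and externalAttach are hypothetical stand-ins and not the actual ably-chat-js code; only the runExclusive pattern comes from the description above.

```typescript
// Minimal promise-chain mutex; an assumed stand-in for whatever backs _mtx.
class SimpleMutex {
  private tail: Promise<void> = Promise.resolve();

  // Runs `task` only after every previously queued task has settled.
  runExclusive<T>(task: () => Promise<T>): Promise<T> {
    const result = this.tail.then(task);
    // Keep the queue usable even if `task` rejects.
    this.tail = result.then(
      () => undefined,
      () => undefined,
    );
    return result;
  }
}

const mtx = new SimpleMutex();
const opLog: string[] = [];

// Hypothetical stand-in for _doChannelWindDown: detaches each contributor in turn.
async function windDown(contributors: string[]): Promise<void> {
  for (const contributor of contributors) {
    opLog.push(`detach:${contributor}:start`);
    await Promise.resolve(); // simulated async detach
    opLog.push(`detach:${contributor}:end`);
  }
}

// Hypothetical externally triggered operation on the same room.
async function externalAttach(): Promise<void> {
  opLog.push('attach:start');
  await Promise.resolve();
  opLog.push('attach:end');
}

// Internal wind-down on channel FAILED, queued as one atomic unit.
const windDownDone = mtx.runExclusive(() => windDown(['messages', 'presence']));
// An external attach requested in the meantime has to wait its turn.
const attachDone = mtx.runExclusive(() => externalAttach());
```

With both operations queued on the same mutex, the attach cannot interleave with the detach steps. Calling windDown directly, without runExclusive, would allow exactly that interleaving.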



sacOO7 · 2024-11-15T11:28:18Z

Also, I'm not sure about this comment:

// We'll make a best effort at detaching all the other channels

This feels like a soft detach that leaves RoomStatus inconsistent with the underlying channel states, because in this case we don't care whether detach fails for one or more of the remaining contributors.
I think operation should be retried till all channels go into Detached or Failed state 🤔
WDYT
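
A rough sketch of what such a retry could look like. The ContributorChannel interface, the function name, and the maxAttempts / retryDelayMs parameters are all assumptions made for illustration; none of them are taken from ably-chat-js.

```typescript
type ChannelState = 'attached' | 'suspended' | 'detached' | 'failed';

// Hypothetical minimal view of a contributor's channel.
interface ContributorChannel {
  state: ChannelState;
  detach(): Promise<void>;
}

const isTerminal = (c: ContributorChannel): boolean =>
  c.state === 'detached' || c.state === 'failed';

// Keeps detaching every non-terminal channel until all of them are
// detached or failed, or until maxAttempts is used up.
// Resolves to true when every channel ended up in a terminal state.
async function windDownUntilTerminal(
  channels: ContributorChannel[],
  maxAttempts = 5,
  retryDelayMs = 250,
): Promise<boolean> {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const pending = channels.filter((c) => !isTerminal(c));
    if (pending.length === 0) return true;

    // Swallow individual errors so one failing detach does not stop the others;
    // the channel states are re-checked afterwards.
    await Promise.all(pending.map((c) => c.detach().catch(() => undefined)));

    if (channels.every(isTerminal)) return true;
    if (attempt < maxAttempts) {
      await new Promise<void>((resolve) => setTimeout(resolve, retryDelayMs));
    }
  }
  return channels.every(isTerminal);
}
```

Unlike a best-effort pass, this only reports success once no channel is left in a state from which it could reattach on its own. The attempt cap is there so the sketch terminates; whether the real operation should retry indefinitely is a separate decision.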

sacOO7 · 2024-11-15T17:19:20Z

The documentation does mention that user intervention is required for the failed state ->

ably-chat-js/src/core/room-status.ts

Lines 45 to 49 in 533c4d0

    
    /**
     * The room is currently detached and will not attempt to re-attach. User intervention is required.
     */
    Failed = 'failed',

However, the room's Failed status is inconsistent with the underlying channel states. The implementation reflects this in multiple places in room-lifecycle; the case above is one of them.

Currently, we mark RoomStatus as Failed when one of the channels goes into the Failed state. Ideally, this shouldn't be the case. RoomStatus should represent deterministic channel states.

sacOO7 · 2024-11-15T17:20:25Z

Attached = 'attached' -> All channels are in the ATTACHED state.
Detached = 'detached' -> All channels are in the DETACHED or FAILED state.
Suspended = 'suspended' -> Represents an ongoing retry operation, which continues until one of the channels goes into the FAILED state.
Failed = 'failed' -> Currently set when one of the channels enters the Failed state. Ideally, this shouldn't happen. Imagine one channel is in the Failed state and another is in the Suspended state. If the user doesn't explicitly ATTACH, the Suspended channel will automatically reattach after reconnection, while the Failed channel will not. A Failed room shouldn't be receiving messages on some channels but not others. RoomStatus should be set to Failed only when all channels go into the Failed or Detached state. This ensures no messages are received in the Failed state. Although not an urgent use-case, it will show deterministic behavior.

Currently, when the Room state is Failed, we don't always perform a full runDownChannels (which makes sure every channel goes into either the Detached or Failed state):

Partial runDownChannels (current case, room monitoring) -> Link
Partial runDownChannels (wind-down for suspended channels) -> Link
Full runDownChannels (first failed attach) -> Link
Full runDownChannels (attach inside retry) -> Link

As a result, we exhibit inconsistent behavior for the Room Failed state.

Released = 'released' -> Channels are in either the Detached or Failed state. Attach and Detach operations are also prohibited.
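
The mapping proposed above could be sketched as a pure function over the channel states. This is only an illustration: deriveRoomStatus, the DerivedRoomStatus type, and the extra 'transitioning' value are hypothetical. Since the list describes both Detached and Failed as "all channels detached or failed", the sketch assumes Failed means "all terminal, at least one failed".

```typescript
type ChannelState =
  | 'initialized'
  | 'attaching'
  | 'attached'
  | 'detaching'
  | 'detached'
  | 'suspended'
  | 'failed';

// 'transitioning' is a made-up catch-all for mixes the proposal doesn't name.
type DerivedRoomStatus = 'attached' | 'detached' | 'suspended' | 'failed' | 'transitioning';

function deriveRoomStatus(states: ChannelState[]): DerivedRoomStatus {
  if (states.length === 0) return 'transitioning';

  // Attached: every channel is attached.
  if (states.every((s) => s === 'attached')) return 'attached';

  // Failed / Detached: only once no channel can receive messages any more.
  const allTerminal = states.every((s) => s === 'detached' || s === 'failed');
  if (allTerminal) {
    // Assumption: at least one failed channel makes the room Failed.
    return states.some((s) => s === 'failed') ? 'failed' : 'detached';
  }

  // Suspended: a retry is still in progress for at least one channel.
  if (states.some((s) => s === 'suspended')) return 'suspended';

  return 'transitioning';
}

// The scenario from the comment: one channel failed, another still suspended.
const mixed = deriveRoomStatus(['failed', 'suspended']);
// After a full wind-down has detached the suspended channel.
const woundDown = deriveRoomStatus(['failed', 'detached']);
```

Under these assumptions the mixed case is not reported as Failed until the wind-down has moved the remaining channel into a terminal state.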

sacOO7 · 2024-11-17T11:09:44Z

Btw, I just went through the spec and found the cases in which a channel goes into the Failed state.

When the connection enters the Failed state (RTL3a) - In this case, all channels enter the Failed state along with the given channel. The partial runDownChannels impl. assumes this is the case, so there's no need to make sure all channels are in the FAILED/DETACHED state.
Lack of permissions during channel ATTACH (RTL4e) - During channel ATTACH, if the user doesn't have sufficient permissions for the channel, the channel can go into the FAILED state. Impl. for the same in ably-java and ably-go. The full runDownChannels impl. assumes this is the case, so all remaining channels are forced into the DETACHED/FAILED state.
Automatic token renewal with lack of permissions for one or more channels - Please note that an ATTACHED channel can also go into the Failed state if the renewed token doesn't carry the permissions that channel needs. In such a case, partial runDownChannels can lead to inconsistent states for the given channels.

So, it's better to support full runDownChannels (atomic internal op - CHA-RL7a1) for all cases.

AndyTWF · 2024-11-18T11:41:41Z

> Currently, we perform the normal _doChannelWindDown instead of executing it inside _mtx.runExclusive.

An oversight, which we can fix.

> I think operation should be retried till all channels go into Detached or Failed state 🤔

Seems reasonable - and shouldn't be too hard.

> RoomStatus should be set to Failed only when all channels go into the Failed or Detached state. This ensures no messages are received in the Failed state. Although not an urgent use-case, it will show deterministic behaviour.

I think that we could argue either way here - what's more important, to tell the user straight away that something has failed, or perhaps have them wait X seconds for the detach procedure to complete (assuming some bad network conditions)?

sacOO7 changed the title ~~[CHA-RL4b5][Room monitoring] contributors detach on channel FAILED state not atomic~~ [CHA-RL4b5][Room monitoring] Contributors detach on channel FAILED state Nov 15, 2024

AndyTWF added the bug label Nov 18, 2024

This was referenced Nov 25, 2024

[CHA-RL4] Implement the Room MONITORING ably/ably-chat-kotlin#25

Open

bug: complete channel wind-down on failed channel in retry loop #416

Open
