SOCKETS_Recv returning an error does not close the MQTT connection #2356
Comments
Hello @lightblu, thank you for raising this issue with us and providing detailed information. I agree that this is not a desirable behavior and have started looking into this. |
@lightblu I found that we have addressed a similar issue in the aws/aws-iot-device-sdk-embedded-C repository in this PR. The PR enables the user to define an optional callback function that is called when the underlying socket gets disconnected. The application writer can take appropriate action in the callback - possibly re-establishing the MQTT connection. Would this solve your problem? |
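The callback idea described above could look roughly like the sketch below. It is only an illustration under stated assumptions: `DemoConnection_t`, `DemoCloseCallback_t`, `demoRecv` and `demoReceiveLoop` are hypothetical names, not the actual API from the referenced PR or from iot_network.h, and the socket read is a stub that fails after a fixed number of reads.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical types for illustration; not the iot_network.h or CSDK API. */
typedef enum { DEMO_CLOSE_REASON_RECV_ERROR = 1 } DemoCloseReason_t;
typedef void ( * DemoCloseCallback_t )( DemoCloseReason_t reason, void * pContext );

typedef struct
{
    DemoCloseCallback_t closeCallback; /* Optional; may be NULL. */
    void * pCloseContext;              /* Passed back to the callback unchanged. */
    int successfulReadsLeft;           /* Stub state: reads before the simulated failure. */
} DemoConnection_t;

/* Stub standing in for a socket read: returns 1 until the budget runs out, then -1. */
static int demoRecv( DemoConnection_t * pConnection )
{
    if( pConnection->successfulReadsLeft > 0 )
    {
        pConnection->successfulReadsLeft--;
        return 1;
    }

    return -1;
}

/* Receive loop: on the first read error, notify the application and stop.
 * Returns the number of successful reads before the error. */
static int demoReceiveLoop( DemoConnection_t * pConnection )
{
    int reads = 0;

    assert( pConnection != NULL );

    while( demoRecv( pConnection ) > 0 )
    {
        reads++;
    }

    if( pConnection->closeCallback != NULL )
    {
        pConnection->closeCallback( DEMO_CLOSE_REASON_RECV_ERROR, pConnection->pCloseContext );
    }

    return reads;
}

/* Example application callback: records that a reconnect is needed. */
static void demoOnClosed( DemoCloseReason_t reason, void * pContext )
{
    bool * pReconnectNeeded = pContext;

    *pReconnectNeeded = ( reason == DEMO_CLOSE_REASON_RECV_ERROR );
}
```

An application would set `closeCallback` to something like `demoOnClosed` before starting the loop and re-establish the MQTT connection once the flag is set, outside the callback itself.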
Need to check this more closely but looks like it would need switching/porting to Still, also the recv in
and I would think the Recv thread on seeing an error indication from that function should initiate closing the socket? (and not need this callback? But will double check). |
I see that Net_Recv task does call SOCKETS_Close: https://github.com/aws/amazon-freertos/blob/master/libraries/abstractions/platform/freertos/iot_network_freertos.c#L124 Is this what you are talking about? Thanks. |
Indeed, you are right. But that I do not overlook from the distance right now where this |
Hello @lightblu, Is this still an issue for you? Do you know if MQTT's receive callback Also, I noticed that in this, MQTT calls its disconnect callback. Have you tried defining your own disconnect callback before making the call to Thanks |
Hi, I just bumped our code to the new IotMqtt_xxx v2 API. I am still seeing the behaviour that an underlying SOCKETS_Recv returning an error stops the NetRecv thread, but the disconnect only happens when the next pingreq/resp is attempted and the SOCKETS_Send also fails (as before). I did not get around to digging far enough to figure out what you asked; maybe this weekend, sorry. (I overlooked the second question: yes, with the v2 API we now also have the disconnect callback defined (that was another callback in v1). It gets called as described above, only after the pingreq/resp fails, so with an IOT_MQTT_KEEP_ALIVE_TIMEOUT). |
Hello @lightblu , I have been reading through the discussion and what you are concerned about. You do not like that _networkReceiveTask() simply stops and exits without closing the connection after a SOCKETS_Recv() returns an error. So _networkReceiveTask() will keep looping on SOCKETS_Recv until these conditions:
If an error is returned from SOCKETS_Recv(), there is no chance for IotMqtt_ReceiveCallback() to be invoked and for the socket to be shut down until we try to send something on the socket. When the socket is shut down, we then need to destroy the socket and clear all resources taken by the connection by calling SOCKETS_Close through IotNetworkAfr_Destroy(). IotNetworkAfr_Destroy() can be called only when no references to the connection are still in use, so it is the application's responsibility to destroy the connection.

The loop in _networkReceiveTask() is also a temporary workaround for the lack of a socket poll and select feature. The transport layer is not supposed to shut down the socket; that is the application layer's responsibility, based on the return values from the socket. And we can't possibly know that the server has disconnected us, or that some other network error occurred, until we try to read on the socket. The issue with this workaround is that it performs the application's responsibility of reading from the socket.

Let's say we did call IotNetworkAfr_Close() when SOCKETS_Recv returns an error. We cannot immediately call _destroyConnection() because there could be outstanding MQTT operations that are referencing the connection. So it comes down to letting the application know the connection was closed in the transport layer. This mechanism doesn't currently exist in the iot_network.h API. But as @yanjos-dev pointed out, the latest CSDK v4_beta_deprecated branch has this: aws/aws-iot-device-sdk-embedded-C#634. If we put this connection close infrastructure into the FreeRTOS iot_network.h port, then we can add code to let MQTT know that the connection was closed. MQTT can then try to clear all of its operations that have been queued in the taskpool (stored in _mqttConnection_t.pendingProcessing and _mqttConnection_t) and try to destroy the connection.

There may also be other implications with QoS1 that I am not addressing when thinking of this change. This would be a hefty change. It is indeed annoying not to know the connection is closed until the keep alive lets you know, but given that the overall MQTT API will return the expected results, we will consider it in our future planning. Thank you so much for bringing this issue to light. |
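The constraint that a closed connection cannot be destroyed while operations still reference it can be pictured with a small reference-counting sketch. This is an assumption-laden illustration, not the library's implementation: `RefConnection_t` and the `ref*` functions are hypothetical, and "destroying" is reduced to setting a flag.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical connection record; not the library's _mqttConnection_t. */
typedef struct
{
    int references;    /* Outstanding operations still using the connection. */
    bool disconnected; /* Set when the transport reports the close. */
    bool destroyed;    /* Set once resources would have been released. */
} RefConnection_t;

/* Destroy only when the connection is closed AND nothing references it. */
static void refTryDestroy( RefConnection_t * pConnection )
{
    if( pConnection->disconnected &&
        ( pConnection->references == 0 ) &&
        !pConnection->destroyed )
    {
        pConnection->destroyed = true;
    }
}

/* An operation starts using the connection. */
static void refAcquire( RefConnection_t * pConnection )
{
    pConnection->references++;
}

/* An operation is done; the last one out triggers the deferred destroy. */
static void refRelease( RefConnection_t * pConnection )
{
    assert( pConnection->references > 0 );
    pConnection->references--;
    refTryDestroy( pConnection );
}

/* The transport layer reports that the socket was closed. */
static void refOnTransportClosed( RefConnection_t * pConnection )
{
    pConnection->disconnected = true;
    refTryDestroy( pConnection );
}
```

With two operations pending, `refOnTransportClosed` only marks the connection as disconnected; the destroy happens when the second `refRelease` brings the count to zero. A real port would also need locking around these fields, which is omitted here.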
Describe the bug
SOCKETS_Recv returning an error does not close the MQTT connection
The NetRecv task just stops polling SOCKETS_Recv until the keep-alive task kicks in, tries to send a pingreq, and then the send fails. Only then is the connection considered "closed" and the disconnection reported up the layers.
This is really a blocker if mqttconfigKEEP_ALIVE_* are configured for pretty high values, while it is still annoying with lower values.
That NetRecv just stops when encountering a recv error, I think, underlines that this is wrong: the connection is already (rightly) considered broken from NetRecv's point of view, it is just not acted on further.
Note: This issue was also observed and described here #2155 - though the reporter saw sending QoS0 messages (which there didn't even return an error in this situation) as a problem, and that was then fixed.
(Right, if you send something and notice a publish error, you can close and reopen the connection from the application layer; but if you have nothing to send you won't notice.)
Note2: This behaviour was also different (and imo correct) in the older version https://github.com/aws/amazon-freertos/blob/master/CHANGELOG.md#v148-05212019 I am coming from. There, when SOCKETS_Recv returned an error, that mqtt connection was immediately considered remotely closed, SOCKETS_Shutdown and SOCKETS_Close were called, and the disconnect callback to the upper layer was called.
System information
Expected behavior
When the NetRecv task sees an error from SOCKETS_Recv, it should consider the connection broken and act immediately (triggering SOCKETS_Shutdown, SOCKETS_Close and calling the disconnected callback).
To reproduce
Steps to reproduce the behavior:
1. Force a remote disconnect somehow (e.g. by connecting with the "same" thing to AWS IoT a second time).
2. Have a SOCKETS implementation that would then return an error in SOCKETS_Recv on the connection closure (I would think any SOCKETS_ implementation should do that; MQTT Publish message with QoS0 always return success even if there is some error at network level #2155 hints that this also happens with the mbedTLS implementation).
3. Have higher values configured for mqttconfigKEEP_ALIVE_INTERVAL_SECONDS and mqttconfigKEEP_ALIVE_ACTUAL_INTERVAL_TICKS to see the closure reported up the layers only when keepalive hits.
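To get a feel for how long the closure can go unreported with the steps above, here is a rough arithmetic sketch. It assumes, as a simplification that is not taken from the library, that pingreqs are sent at exact multiples of the keep-alive interval since connecting and that the failed send is noticed immediately; `demoDetectionDelaySeconds` is a hypothetical helper.

```c
#include <assert.h>

/* Seconds between a recv error and the next keep-alive ping, assuming pings
 * go out at exact multiples of the interval (a simplifying assumption). */
static unsigned long demoDetectionDelaySeconds( unsigned long errorTimeSeconds,
                                                unsigned long keepAliveIntervalSeconds )
{
    unsigned long nextPingSeconds;

    assert( keepAliveIntervalSeconds > 0UL );

    /* First ping strictly after the error. */
    nextPingSeconds = ( errorTimeSeconds / keepAliveIntervalSeconds + 1UL ) *
                      keepAliveIntervalSeconds;

    return nextPingSeconds - errorTimeSeconds;
}
```

For example, under these assumptions a recv error 130 seconds after connecting with a 1200 second interval would only surface 1070 seconds later, whereas acting on the recv error directly would report it right away.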