Detecting Greengrass Core Disconnection on the IoT Thing Side #3094
Hello @Prahar08modi
I'm assuming you are calling the process loop while the connection is idle. What do you mean by GGC disconnection? If a PINGREQ is sent, GGC should not disconnect. If no PINGREQ is sent, then with your keep-alive interval of 30 seconds, GGC should wait up to 45 seconds (1.5 times the keep-alive interval) before it disconnects.
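The 45-second figure follows from the MQTT 3.1.1 keep-alive rule: the broker closes the connection if it receives no control packet from the client within one and a half times the keep-alive period. A minimal sketch of that arithmetic (the helper name is hypothetical, not part of coreMQTT or the MQTT v2 API):

```c
#include <stdint.h>

/* Broker-side keep-alive limit per MQTT 3.1.1, section 3.1.2.10: a server
 * that receives no control packet from the client for 1.5 x Keep Alive
 * closes the connection. A Keep Alive of 0 disables the mechanism, and this
 * helper then simply returns 0. */
static uint32_t brokerKeepAliveLimitMs( uint16_t keepAliveSec )
{
    /* 1.5 s expressed as 1500 ms keeps the calculation in integers. */
    return ( uint32_t ) keepAliveSec * 1500U;
}
```

With the 30-second keep-alive used here, `brokerKeepAliveLimitMs( 30 )` gives 45000 ms.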
In this case, the application has to re-establish the connection. A failed Publish call is the indication that the network connection has gone down. The application should then close the MQTT connection, disconnect the TCP/TLS connection (i.e., close the socket, etc.), and then re-establish both the TLS and MQTT connections. The library does not run a background thread to detect that the connection has gone down; it detects this only when a send (e.g., sending a PUBLISH) or a receive (processLoop) fails.
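The recovery order described above can be sketched as a small state machine. The state names and `nextConnState()` are hypothetical; each state stands for the corresponding call into the MQTT library or the TLS transport, which the application performs before asking for the next state.

```c
#include <stdbool.h>

/* Hypothetical connection states for an application-level reconnect loop. */
typedef enum
{
    CONN_MQTT_UP,    /* TLS and MQTT sessions established. */
    CONN_CLOSE_MQTT, /* Send DISCONNECT / release the MQTT session. */
    CONN_CLOSE_TLS,  /* Close the TLS session and the socket. */
    CONN_OPEN_TLS,   /* Open the socket and perform the TLS handshake. */
    CONN_OPEN_MQTT   /* Send CONNECT and wait for CONNACK. */
} ConnState_t;

/* Returns the next state, given whether the step performed in `state`
 * succeeded. In CONN_MQTT_UP that step is a PUBLISH or a process-loop call. */
static ConnState_t nextConnState( ConnState_t state, bool stepOk )
{
    switch( state )
    {
        case CONN_MQTT_UP:
            /* A failed send or receive is the only sign the link is gone. */
            return stepOk ? CONN_MQTT_UP : CONN_CLOSE_MQTT;

        case CONN_CLOSE_MQTT:
            /* Tear-down continues even if it fails: the peer may be gone. */
            return CONN_CLOSE_TLS;

        case CONN_CLOSE_TLS:
            return CONN_OPEN_TLS;

        case CONN_OPEN_TLS:
            /* Keep retrying the TLS connect (add a backoff delay in practice). */
            return stepOk ? CONN_OPEN_MQTT : CONN_OPEN_TLS;

        case CONN_OPEN_MQTT:
            /* If CONNECT fails, drop TLS and start the connect sequence again. */
            return stepOk ? CONN_MQTT_UP : CONN_CLOSE_TLS;

        default:
            return CONN_CLOSE_MQTT;
    }
}
```

The application runs the step for the current state and then calls `nextConnState()` with the outcome; for example, a failed PUBLISH in `CONN_MQTT_UP` moves to `CONN_CLOSE_MQTT`, and the machine returns to `CONN_MQTT_UP` only after both TLS and MQTT connects succeed.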
Hello @abhidixi11, first of all, sorry for the late reply.
Yes, the assumption is correct.
By GGC disconnection (in this context) I mean the case where the power supply of the GGC is cut or removed.
I think there is some confusion, so let me clarify: GGC does not disconnect on its own. I deliberately remove its power supply.
Yes, I've implemented the reconnection logic following the same flow you described, and it works well. The problem is detecting the MQTT disconnection in the case described above.
Exactly, that's what is expected. The processLoop failure is detected immediately after the keep-alive timeout, but the NETWORK ERROR is not detected immediately when a PUBLISH is sent. Instead, it is detected only after around 5 minutes, which is far too long for my application. Thanks for the help!
Is there any further update on this?
Hello @abhidixi11, I found a similar issue, #2155. I think the fix provided there was for an older version of Amazon FreeRTOS, because the version I am using (202011.00-4-ga83a71b33) publishes PUBLISH packets on the network through the coreMQTT library's MQTT_Publish function, via the MQTT LTS PUBLISH API.
Hello @Prahar08modi
Hi @Prahar08modi, which version of the MQTT library are you using? From the following line in your error log:
It looks like it's coming from this line in the MQTT compatibility layer for the old API. I'd recommend using the latest coreMQTT API, or the recently released coreMQTT Agent library if you are working with multiple threads. Second, can you clarify the issue you are facing? Is the issue:
If it's the former, this is an issue at the transport layer, as @abhidixi11 described. If it's the latter, you can simply close the network connection yourself as soon as the publish returns an error. Additionally, if you are using the old MQTT library, you might find this response helpful: it explains why the network connection isn't closed as soon as a receive failure is detected, and why the library instead waits for the keep-alive job to fail. This could be related to the issue you are facing.
Hello @abhidixi11 and @muneebahmed10, first of all, thank you for the support. I'm confused about the MQTT v2.x.x library and the coreMQTT library; I don't understand how these libraries are linked to each other.
Does it still queue publish messages in the send queue even if they are QoS0?
Also, for managing the keep-alive timeout, _IotMqtt_ProcessKeepAlive is called instead of MQTT_ProcessLoop, as mentioned above.
I'm using the old MQTT shim for the MQTT v2.x.x APIs.
This is the case where I can't immediately detect that the underlying network connection no longer exists (i.e., the GGC's power is disconnected), and IoTMqtt_Publish keeps returning success for up to 5 minutes. After 5 minutes, IoTMqtt_Publish returns NETWORK_ERROR, and the connection is then closed when the keep-alive timeout is detected.
Yes, I'm using multiple threads. I may give it a try.
Hello @Prahar08modi, a quick clarification:
It looks like you are using the shim layer, which is a compatibility layer between the old API and the new API. If you are developing a new application, I would recommend using the coreMQTT library directly. We are also working on adding support for the coreMQTT-Agent library in this repository; meanwhile, you can check out the sample usage of the coreMQTT-Agent library, which supports connection sharing, in the demos repository. It shows how various AWS services can be used. Is there a particular reason you are using the shim layer? Thanks!
Hello @abhidixi11 ,
I had previous experience using iot_mqtt_agent and was confused, as described above, about the new coreMQTT library. So, to reduce development time, I decided to go with the MQTT shim for now and move to coreMQTT later.
Okay. I'll try the new coreMQTT Agent library once I understand the basics: how packets are transmitted and received, how keep-alive is handled, and so on. We can close this issue. If I have any questions about the coreMQTT Agent library, I'll open another issue. Thanks for guiding and helping me out!
Hi @Prahar08modi, closing this issue as suggested. Please don't hesitate to open another one if you run into any problems with coreMQTT or need further clarification on this topic. Thank you.
Describe the bug
My overall system consists of a Greengrass Core (a Raspberry Pi) and multiple other devices (ESP-WROOM-32) that run Amazon FreeRTOS and send sensor readings every 10 s. I also want to cover the case where the Greengrass Core's power supply is accidentally cut: the Thing code must detect this and try to reconnect to the GGC. However, the Thing code is not able to detect this case.
System information
Expected behavior
According to the AWS IoT Greengrass documentation, all local communication uses QoS0 (no acknowledgement is provided), as shown in the screenshot below.
I have configured the keep-alive timeout as 30 s.
As I understand it, because I send sensor readings (QoS0 messages) every 10 s, the MQTT connection's last-message time is reset every 10 s. As a result, the Thing code never detects the keep-alive timeout. Only after around 5 minutes does the code detect a NETWORK ERROR and learn that the QoS0 messages are not actually being published. A screenshot of this is attached below.
I also tested an alternative: idle code that just waits for the keep-alive timeout. In this case, when nothing is being published, the code detects the keep-alive timeout exactly 30 s after the GGC disconnection.
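The difference between the two runs can be illustrated with a small simulation, assuming (as the report describes) that the client's keep-alive timer is reset by every outgoing packet, including QoS0 PUBLISHes that succeed locally. The function name and the one-second step are illustrative choices, not part of any FreeRTOS library.

```c
#include <stdint.h>

/* Simplified model (an assumption, not library code) of a keep-alive timer
 * driven only by outgoing traffic. The core loses power at t = 0 s. Each
 * second: a periodic QoS0 PUBLISH, if due, "succeeds" locally and resets the
 * send timer; if keepAliveSec has passed without any send, a PINGREQ would go
 * out and never be answered, so the dead link is detected (the wait for the
 * missing PINGRESP is ignored here). publishPeriodSec == 0 means "idle".
 * Returns the detection time in seconds, or -1 if not detected by horizonSec. */
static int32_t keepAliveDetectionSec( uint32_t keepAliveSec,
                                      uint32_t publishPeriodSec,
                                      uint32_t horizonSec )
{
    uint32_t lastSendSec = 0U;

    for( uint32_t t = 1U; t <= horizonSec; t++ )
    {
        if( ( publishPeriodSec != 0U ) && ( ( t % publishPeriodSec ) == 0U ) )
        {
            lastSendSec = t; /* QoS0: no PUBACK expected, so nothing looks wrong. */
        }

        if( ( t - lastSendSec ) >= keepAliveSec )
        {
            return ( int32_t ) t; /* PINGREQ due; the core cannot answer it. */
        }
    }

    return -1;
}
```

In this model, `keepAliveDetectionSec( 30, 0, 600 )` (idle) detects the loss at 30 s, while `keepAliveDetectionSec( 30, 10, 600 )` never reaches a PINGREQ, which matches the reported behavior. The eventual NETWORK ERROR after about 5 minutes came from the send path failing, which this model does not attempt to reproduce.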
The expected behavior is that the keep-alive timeout event is detected after 30 s, even while sensor readings are being sent every 10 s.
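One way an application could approach this expected behavior is to judge liveness by incoming traffic instead of outgoing traffic: probe with a PINGREQ whenever nothing has been received for one keep-alive interval, and declare the link dead if no reply arrives within a short wait. The sketch below is a hypothetical, library-independent tracker; the struct, the function names and the 5 s response wait are assumptions. With coreMQTT, the probe could be sent with `MQTT_Ping()`; how received packets are reported back to the application depends on the library version and is not covered in this thread.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical liveness tracker, independent of outgoing PUBLISH traffic.
 * Times are in milliseconds from any monotonic clock (wraparound-safe). */
typedef struct
{
    uint32_t keepAliveMs; /* Probe after this long without receiving anything. */
    uint32_t respWaitMs;  /* How long to wait for the PINGRESP. */
    uint32_t lastRxMs;    /* Last time any packet arrived from the broker. */
    uint32_t lastPingMs;  /* When the outstanding PINGREQ was sent. */
    bool pingOutstanding;
} Liveness_t;

typedef enum { LIVE_IDLE, LIVE_SEND_PING, LIVE_DEAD } LiveAction_t;

static void livenessInit( Liveness_t * l, uint32_t keepAliveMs,
                          uint32_t respWaitMs, uint32_t nowMs )
{
    l->keepAliveMs = keepAliveMs;
    l->respWaitMs = respWaitMs;
    l->lastRxMs = nowMs;
    l->lastPingMs = nowMs;
    l->pingOutstanding = false;
}

/* Call whenever any packet (PINGRESP, incoming PUBLISH, ...) is received. */
static void livenessOnRx( Liveness_t * l, uint32_t nowMs )
{
    l->lastRxMs = nowMs;
    l->pingOutstanding = false;
}

/* Call periodically. Outgoing PUBLISHes deliberately do not reset anything. */
static LiveAction_t livenessPoll( Liveness_t * l, uint32_t nowMs )
{
    if( l->pingOutstanding )
    {
        /* No reply within the wait: treat the connection as dead. */
        return ( ( nowMs - l->lastPingMs ) >= l->respWaitMs ) ? LIVE_DEAD : LIVE_IDLE;
    }

    if( ( nowMs - l->lastRxMs ) >= l->keepAliveMs )
    {
        l->pingOutstanding = true;
        l->lastPingMs = nowMs;
        return LIVE_SEND_PING; /* Caller sends a PINGREQ now. */
    }

    return LIVE_IDLE;
}
```

With a 30 s keep-alive and a 5 s wait, the tracker reports `LIVE_DEAD` 35 s after the last received packet, regardless of how many QoS0 PUBLISHes went out in between; if it is polled only from a 10 s publishing loop, detection is rounded up to the next poll.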
Thank you!