If any MQTT publish message fails with the Error -27648 while sending data, it continuously fails, recovers only on reboot #3359
Hi Team, Can you please help me with this as soon as possible? This issue is being seen at a production site and needs to be fixed immediately. I have no leads on the cause and need your immediate assistance. Thanks |
Hello @kumarfirmware, If you are using coreMQTT APIs from multiple threads and sharing one connection, then thread safety must be handled in the application. Alternatively, you can use the coreMQTT Agent library, which adds thread safety to the coreMQTT library. Please refer to https://freertos.org/mqtt-agent/index.html |
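As a rough illustration of "thread safety must be handled in the application", the sketch below serializes every publish on a shared connection behind one mutex. It is not SDK code: `publish_fn_t`, `shared_mqtt_t`, and `demo_publish` are hypothetical names, and POSIX threads are used only so the example builds on a host. On FreeRTOS the same pattern would typically use a mutex semaphore (`xSemaphoreCreateMutex`, `xSemaphoreTake`, `xSemaphoreGive`).

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical signature standing in for the application's real publish call. */
typedef int (*publish_fn_t)(void *conn, const char *topic,
                            const void *payload, size_t len);

/* One shared connection plus the lock that guards it. */
typedef struct {
    void *conn;
    publish_fn_t publish;
    pthread_mutex_t lock;
} shared_mqtt_t;

/* Returns 0 on success, like pthread_mutex_init. */
static int shared_mqtt_init(shared_mqtt_t *s, void *conn, publish_fn_t publish)
{
    assert(s != NULL && publish != NULL);
    s->conn = conn;
    s->publish = publish;
    return pthread_mutex_init(&s->lock, NULL);
}

/* Every task calls this wrapper instead of the raw publish function,
 * so only one publish touches the connection at a time. */
static int shared_mqtt_publish(shared_mqtt_t *s, const char *topic,
                               const void *payload, size_t len)
{
    int rc;
    assert(s != NULL);
    pthread_mutex_lock(&s->lock);
    rc = s->publish(s->conn, topic, payload, len);
    pthread_mutex_unlock(&s->lock);
    return rc;
}

/* Demo stub (assumption): counts calls through the connection pointer. */
static int demo_publish(void *conn, const char *topic,
                        const void *payload, size_t len)
{
    (void)topic; (void)payload; (void)len;
    (*(int *)conn)++;
    return 0;
}
```

Any subscribe or disconnect call that shares the connection would need to take the same lock for the serialization to be complete.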
Hi Pvyawaha, Thanks for the message. I am using AWS FreeRTOS version 202007 and ESP32. There is no coreMQTT or coreMQTT Agent library in this version. Please let me know how to proceed, with clear and detailed information specific to version 202007. Thanks |
Hi @kumarfirmware, From your logs, it looks like the publish completes successfully, and then on line 54 the timestamp jumps from 1659 to 4628, where it fails to send a PINGREQ packet as part of the keep alive job. Is the connection intended to be left inactive long enough for the keep alive job to kick in? Also, there have been issues in the past with this version of the MQTT library where the network receive task fails and the application isn't notified until the keep alive job fails, resulting in similar log messages to what you are seeing. Can you enable debug logs to try to narrow down where the issue occurs between the send of the PUBLISH and the keep alive attempt? Additionally, as the MQTT library that was part of 202007 was removed, we recommend that you upgrade to the latest release if you are able. |
Hi Muneeb, Thanks for the message. I do not close the MQTT connection, and there is no need to close it. The MQTT PING is a continuous process and keeps running. The worry is that sometimes Error -27648 persists for hours, even after a reboot, and keeps recurring. I am using AWS FreeRTOS version 202007 and ESP32. As you know, it takes a long time to move to AWS FreeRTOS version 202107. Meanwhile, production sites are struggling with this issue on version 202007. How do I seamlessly replace the MQTT library in AWS FreeRTOS version 202007 with the latest, bug-free MQTT library? Please help me with detailed steps. Any immediate help is highly appreciated. Thanks, Regards, |
Hi @kumarfirmware, Can you enable debug logging? As I mentioned earlier, it's possible the connection goes down even before the ping request attempt is made, and more logs could help narrow down when the error occurs. If you aren't able to upgrade to the latest release due to the MQTT API changing, are you able to use the MQTT library from previous tags, like 202012.00 or 20210526_Archive, which was the last tag before the old MQTT API was removed? I ask since both of those tags contain the coreMQTT library, and also have a compatibility layer to interface with the old MQTT API that you are using in 202007.00. I think using coreMQTT along with the compatibility layer might help you since you would still be able to use the same API you currently are. |
Hi Muneeb, Thanks for the message. I will enable the debug logs and send them to you. The best solution seems to be moving to 202012, which has the coreMQTT library, while my 202007 application code will still work because of the compatibility layer that interfaces with the old MQTT API. I do not think this is possible in 202107, so I cannot move to 202107. Moving to 202012 should be easy and straightforward, and it should solve the problems and bring stability. Please confirm this as soon as possible. But if the MQTT library used in 202007 and 202012 is the same, then what is the point of moving to 202012? Please explain this in detail, and I will immediately move to 202012. From a quick diff, I see that 202007 has FreeRTOS MQTT V2.2.0 and 202012 has FreeRTOS MQTT V2.3.1. Could this be a significant difference in terms of stability? Please confirm. Also, for a quick port, I initially thought of copying the MQTT folder from 202012 and replacing it in 202007. Will that work, or does it require the coreMQTT library? Is 20210526_Archive fully tested? Can I rely on it and move to it if it is compatible with the 202007 MQTT API? I would like to hear from you immediately. Thanks, Regards, |
The MQTT library in 202007 and 202012 is not the same. 202012 uses coreMQTT, but only contains a compatibility layer to be able to use the APIs in 202007. Internally, it will still use coreMQTT, so you will need to include the coreMQTT library as well. If 202012 works for you, then I would recommend next using 20210526_Archive, since it contains bug fixes for memory leaks that were in 202012. That tag would have had testing for demo builds and runs when each commit was merged, but it does not have extensive testing for every build combination as our release tags do. |
Hi Muneeb, Thanks for the message. I started using 20210526_Archive. Everything looks good so far. But I think device shadows are handled a bit differently in this version compared to 202007, so I need to adapt my code. I assume I should also handle device shadows using plain MQTT publish and subscribe, to stay compatible with the MQTT API used in AWS V202007. So I cannot use the shadow library here, right? Please confirm this as soon as possible so that I can move forward. Should I use the master branch or the release branch for 20210526_Archive? Which is the latest and most thoroughly tested, master or release? Can I use 20210526_Archive in production to replace my current production build running 202007? Thanks, Regards, |
Hi, I'm not sure why you wouldn't be able to use the Shadow library in that version. Looking at the commit history, there weren't that many changes to the old shadow files in that tag. Since that version of the shadow library used the old MQTT APIs, it will just be using the MQTT compatibility layer instead, so the APIs would be the same. Of course, the recommendation is to use the latest Device Shadow library but the old API is still present in the 20210526 tag. I'm not sure what you mean by master/release for 20210526_Archive. Release branches are tested the most. Our recommendation is still to use the latest release version 202107.00, but if that is not possible then you can use 202012 or 20210526_Archive. It's up to you whether you are willing to use 20210526_Archive in production; if you only want to use release tags, then you should know that 20210526 contains bug fixes for memory leaks that were present in 202012, so you should copy over the changes that were made to the MQTT compatibility layer |
From the error, it looks like you need to include this file. You should add the directory that file is in to your include path before compiling. |
What build system/compiler are you using? For gcc compilers, you can add include paths by using the |
If you are using that version of shadow and cmake, then you should add a dependency on |
Hi Muneeb, I am able to build this successfully now. I will test and see if I can reproduce the [iot_thread] [ERROR][NET][117930] Error -27648 while sending data. I will let you know how the test goes. Thanks, |
Namaste @kumarfirmware,
One of the many possible reasons for the above error is the possibility of the duplication of We had encountered this error wherein two clients - one being our Reason for such behavior: According to the MQTT spec, multiple connections to the same endpoint must have unique client IDs, otherwise it's likely that the first connection will suddenly be dropped by the server, without notification, after the second connection has been made. Requesting you to go through this #2916 (comment), which is part of the AFR & ESP32: NETWORK ERROR encountered, for aperiodic MQTT Publish operation issue, I had raised. Additionally please note, the FreeRTOS version we are using for development is V202002.00, which is older than yours. Hope this helps! Thanks | Regards, |
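To make the unique-client-ID requirement concrete, here is a minimal sketch that derives a per-device client ID from a 6-byte MAC address. `make_client_id` is a hypothetical helper, not part of the SDK, and how the MAC is read is left to the platform.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Writes "<prefix>-<12 hex digits of MAC>" into out.
 * Returns the number of characters written (excluding the terminator),
 * or -1 if the result does not fit in out_len bytes. */
static int make_client_id(char *out, size_t out_len,
                          const char *prefix, const uint8_t mac[6])
{
    int n;
    assert(out != NULL && prefix != NULL && mac != NULL);
    n = snprintf(out, out_len, "%s-%02X%02X%02X%02X%02X%02X", prefix,
                 (unsigned)mac[0], (unsigned)mac[1], (unsigned)mac[2],
                 (unsigned)mac[3], (unsigned)mac[4], (unsigned)mac[5]);
    if (n < 0 || (size_t)n >= out_len) {
        return -1; /* truncated or encoding error: do not use the ID */
    }
    return n;
}
```

With this scheme two devices cannot collide on the ID, but a test tool or second client that reuses a device's ID would still cause the broker to drop the earlier connection.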
Hi Dipen, Thanks for the message. I am quite sure I am using a unique clientcredentialIOT_THING_NAME, and I am not making multiple MQTT connections with the same clientcredentialIOT_THING_NAME. I still sometimes see [iot_thread] [ERROR][NET][117930] Error -27648 while sending data. Thanks, |
Hi Muneeb, As you know I am using AWS 20210526_Archive, as this has a compatibility layer for MQTT API used in AWS V202007. I am using the compatibility MQTT API. Is it thread safe? Can I publish from three different tasks at the same time? Can the OTA run all the time and publish from three different tasks at the same time? Please let me know. Thanks, |
Hi @kumarfirmware, We apologize for taking this long to get back to you. The compatibility layer is thread safe. You can use it to publish from multiple tasks including OTA simultaneously. Thanks |
Hi Muneeb, Thanks a lot for confirming this. This is really great news. Now the question is: in the actual 202107 release, is coreMQTT thread safe, or is the coreMQTT Agent required to achieve thread safety? Please confirm this as soon as possible. Thanks, Regards, |
Hi @kumarfirmware, In the 202107 release, coreMQTT is not thread safe by itself. The coreMQTT Agent library can be used to make thread safe MQTT calls. |
Hi Muneeb, Thanks, this is clear. I am attaching three crash logs here. All are similar. I ran overnight testing and the build crashed. The pattern is the same each time: the PING failed to send, MQTT got disconnected, and it looks like Wi-Fi was lost and then reconnected. The device was trying to establish an MQTT connection when it crashed and rebooted. This happened three times in exactly the same way. Can you please let me know why it crashed? In my opinion, this is not coming from the application but from the underlying layers. This is very critical and is blocking the production release. Please help me urgently. The root cause has to be found from the logs, and since the crash is in the underlying network layer rather than the application, I am not able to understand much of it. Please help. MQTT_Crash_Log1.txt Thanks, Regards, |
Hi @kumarfirmware, Looking at some of your logs,
Some comments:
It looks like you are not yet able to reconnect. You should keep retrying the TLS connection before proceeding further with the MQTT connection. |
Hi Muneeb, Thanks for the message. As you know I am using AWS 20210526_Archive, as this has a compatibility layer for MQTT API used in AWS V202007. To establish MQTT connection, I have two steps here.
Whenever there is an MQTT disconnection, because of a PING request failure or for some other reason, I am doing the following two steps again.
Please confirm whether this is correct, and let me know if I am doing something wrong. "You should not attempt a new MQTT connection if the TLS connection fails" - This is not possible for me with the above two APIs. Please confirm. "You should not call IotMqtt_Init() a second time without calling IotMqtt_Cleanup()" - Please confirm whether this is mandatory when using AWS 20210526_Archive with the compatibility layer for the MQTT API used in AWS V202007. Do you mean that whenever there is an MQTT disconnect and I try to reconnect, I should call IotMqtt_Cleanup()? If so, this is a big mistake on my side, but it is not documented anywhere. Please confirm this with evidence. This is really urgent. Kindly reply as soon as possible. Do you think not calling IotMqtt_Cleanup() is the reason for the crash? Please confirm. Thanks, Regards, |
Hi Muneeb, Can you please reply to my above message as soon as possible. I need to fix this issue as soon as possible. Thanks, |
Hi @kumarfirmware, You shouldn't call init again without calling the cleanup functions first. You should probably not need to call init again in the first place. The error could be due to the fact that the previous connection didn't close successfully:
or that you are attempting a new connection when the network isn't ready. You can try calling |
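The init/cleanup pairing described above could be enforced with a small guard like the following sketch. It assumes nothing about the SDK beyond the pairing rule itself: `lib_guard_t` and the `demo_*` stubs are hypothetical, and the function pointers stand in for calls such as `IotMqtt_Init()` and `IotMqtt_Cleanup()` so that the example does not depend on SDK headers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Wraps a library's init/cleanup pair so init is never run twice
 * without an intervening cleanup. */
typedef struct {
    int (*init)(void);     /* assumed to return 0 on success */
    void (*cleanup)(void);
    bool initialized;
} lib_guard_t;

static int lib_guard_init(lib_guard_t *g)
{
    int rc;
    assert(g != NULL && g->init != NULL);
    if (g->initialized) {
        return 0; /* already initialized: do not call init again */
    }
    rc = g->init();
    if (rc == 0) {
        g->initialized = true;
    }
    return rc;
}

static void lib_guard_cleanup(lib_guard_t *g)
{
    assert(g != NULL && g->cleanup != NULL);
    if (!g->initialized) {
        return; /* nothing to clean up */
    }
    g->cleanup();
    g->initialized = false;
}

/* Demo stubs (assumptions) that count how often each side is called. */
static int demo_init_calls;
static int demo_cleanup_calls;
static int demo_init(void) { demo_init_calls++; return 0; }
static void demo_cleanup(void) { demo_cleanup_calls++; }
```

A reconnect path would then call only the connect steps and leave the guard untouched, since the library is already initialized.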
Hello @kumarfirmware, Closing this issue now. Please reopen it if you have more questions on the same topic, or create a new one for other topics. |
Hi Team,
I am using AWS FreeRTOS version 202007 and ESP32.
The ESP32 is connected to the corporate Wi-Fi network, which is very high speed. There is no chance of an unstable network.
Also, the ESP32 does not lose Wi-Fi connectivity at all. Wi-Fi is very stable.
If any MQTT publish message fails with the following error, it will not recover and it continuously fails, recovers only on reboot.
[iot_thread] [ERROR][NET][117930] Error -27648 while sending data.
This occurs very randomly.
In my opinion, this error generally occurs when two publishes, or one publish and one subscribe, happen back to back or almost together.
Can we publish from multiple threads without worrying about thread safety? I strongly suspect a thread-safety problem here.
How can we use or trust this SDK if this error keeps recurring and creates instability in the system?
Please let me know the fix for this as soon as possible.
Again, this is the same issue that has been around for 2 years.
There is a single MQTT connection shared by all three threads, with a unique client ID.
Thanks,
Kumar.