MQTT connection lost, laggy web interface, restart required #2172

Closed
3 tasks done
broth-itk opened this issue Jul 29, 2024 · 14 comments
Labels
bug Something isn't working

Comments

@broth-itk
Contributor

What happened?

Yesterday OpenDTU stopped publishing data to MQTT.
The web interface was somewhat laggy, and eventually I managed to reboot the unit.
Afterwards everything worked normally again.


This is the second or third time this has happened. The first two times, a power cycle was required to get everything back to normal.

To Reproduce Bug

There is no indication that the issue is reproducible.
It looks like a memory leak or something similar.

Expected Behavior

Well, the system should run without outages :)

There is already another case where the implementation of a watchdog is discussed: #693

Although I think the best approach would be to solve the root cause, a watchdog would help to recover from these situations.
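A minimal sketch of the kind of application-level watchdog discussed in #693, assuming the firmware calls a hypothetical `feed()` whenever something proves the unit is healthy (for example a successful MQTT publish). It escalates in two steps: first it requests a reconnect, and only if the unit stays silent does it request a reboot. The class name, thresholds and actions are illustrative, not OpenDTU code; on an ESP32 the reboot step could map to ESP-IDF's `esp_restart()`.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical recovery actions; on real hardware these would map to e.g. a
// WiFi/MQTT reconnect and a call to esp_restart().
enum class WatchdogAction { None, Reconnect, Reboot };

// Software watchdog sketch: feed() marks the unit as healthy, poll() is called
// periodically with the current time in milliseconds and says what to do.
class ConnectivityWatchdog {
public:
    ConnectivityWatchdog(uint32_t reconnectAfterMs, uint32_t rebootAfterMs)
        : reconnectAfterMs_(reconnectAfterMs), rebootAfterMs_(rebootAfterMs) {}

    void feed(uint32_t nowMs) {
        lastFeedMs_ = nowMs;
        reconnectRequested_ = false;  // a healthy signal ends the outage
    }

    WatchdogAction poll(uint32_t nowMs) {
        // Unsigned subtraction stays correct when the millisecond counter wraps.
        const uint32_t silentMs = nowMs - lastFeedMs_;
        if (silentMs >= rebootAfterMs_) {
            return WatchdogAction::Reboot;
        }
        if (silentMs >= reconnectAfterMs_ && !reconnectRequested_) {
            reconnectRequested_ = true;  // request only one reconnect per outage
            return WatchdogAction::Reconnect;
        }
        return WatchdogAction::None;
    }

private:
    uint32_t reconnectAfterMs_;
    uint32_t rebootAfterMs_;
    uint32_t lastFeedMs_ = 0;  // counts from 0 until the first feed()
    bool reconnectRequested_ = false;
};
```

Time is passed in explicitly so the logic can be tested off-device; the unsigned subtraction keeps it correct when a 32-bit millisecond counter such as `millis()` wraps around.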

At the same time, remote logging would help to collect valuable system information like memory usage to track leaks, see #1819

Install Method

Pre-Compiled binary from GitHub

What git-hash/version of OpenDTU?

v24.6.29

Relevant log/trace output

No response

Anything else?

No response

Please confirm the following

  • I believe this issue is a bug that affects all users of OpenDTU, not something specific to my installation.
  • I have already searched for relevant existing issues and discussions before opening this report.
  • I have updated the title field above with a concise description.
@broth-itk added the bug label Jul 29, 2024
@broth-itk
Contributor Author

Happened again:


The unit still showed as connected to WiFi. As soon as I triggered a "Disconnect" from my wireless infrastructure, it reconnected and was properly available again.

There was no need to reboot or similar.

Have there been any changes to the WiFi code in the last release? I don't remember having had the issue before.

@broth-itk
Contributor Author

Let's see what happens with the new 24.8.1 release, I'll let you know.
Maybe it's just a bug somewhere in the backend libs.

@broth-itk
Contributor Author

It just happened again:


IP connection is down, no connection to wireless infrastructure...
The red LED blinked every 5 seconds, indicating that OpenDTU was still running somehow.

After a power cycle, everything was back to normal.
Strange.

@broth-itk
Contributor Author

Has this (the WiFi reconnect issue) been fixed in the latest version?

I wonder how I can get the unit back online without being on site... hm

@stefan123t

I think this might still be related to some MQTT buffer overload / heap fragmentation. Without further USB serial logs covering the time when the problem occurs (i.e. shortly before and while it starts to lose the connection), this is hard to debug.

That said, the comments in #2185 by @Kroki0815 and by @jstammi might shed some light on your issue too.
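To make the heap fragmentation idea concrete: an allocation can fail even when plenty of heap is free overall, if no single free block is large enough (for example for a large MQTT payload buffer). A common, simple heuristic compares the largest free block with the total free heap. The sketch assumes both figures are read elsewhere (on ESP-IDF, for example, via `heap_caps_get_free_size()` and `heap_caps_get_largest_free_block()`); `HeapSnapshot` and the helper names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Snapshot of heap figures, assumed to be read from the platform elsewhere.
struct HeapSnapshot {
    uint32_t freeBytes;         // total free heap
    uint32_t largestFreeBlock;  // biggest single contiguous free block
};

// Fragmentation in percent: 0 means all free memory is one block, values
// close to 100 mean free memory is split into many small pieces.
uint32_t fragmentationPercent(const HeapSnapshot& h) {
    if (h.freeBytes == 0) {
        return 100;
    }
    if (h.largestFreeBlock >= h.freeBytes) {
        return 0;
    }
    const uint64_t contiguousPct =
        static_cast<uint64_t>(h.largestFreeBlock) * 100 / h.freeBytes;
    return static_cast<uint32_t>(100 - contiguousPct);
}

// A request for `size` bytes only fits if one free block is big enough,
// regardless of how much heap is free in total.
bool canAllocate(const HeapSnapshot& h, uint32_t size) {
    return size <= h.largestFreeBlock;
}
```

With 100000 bytes free but a largest block of only 25000 bytes, the heuristic reports 75 % fragmentation, and a 30000-byte buffer would fail to allocate despite the large total.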

@broth-itk
Contributor Author

First I'm going to install the latest update to see if it helps. As I'm on vacation right now, this will be in 2 weeks since I need to power cycle the unit. Maybe a short power cut might help ;-)

USB serial debugging is the next step.

Thanks!

@stuckinger

Have you tried another ESP32?
I have experienced similar effects on different projects, even with simple setups using ESPHome. The effect was observed on some boards but not on others running the same firmware.
Most boards come back when soft-rebooted remotely once they reappear after a short outage, and then run stable for a while. Some don't and need to be power cycled.
I think the quality of the chips may vary too much...

@trixing

trixing commented Sep 8, 2024

FWIW, I experienced the same failure mode, though without MQTT enabled.

Kicking/blocking it from WiFi made it reconnect and got it unstuck (no reboot required).

v24.8.5 "uptime":965588

@stefan123t

@broth-itk are you back from your holidays, and have you already had time to upgrade to the latest version and do some serial logging?

Follow the link to the documentation to set up USB/serial logging:
https://www.opendtu.solar/firmware/howto/serial_console/

@stefan123t

stefan123t commented Oct 5, 2024

@broth-itk hi Bernhard, there is a working PR for remote logging in #1819 / #2292, although you will need to build and flash the image yourself, as it is not merged into master yet. Maybe this helps you monitor your OpenDTU and analyse this issue?

@ranma
Contributor

ranma commented Oct 5, 2024

> @broth-itk hi Bernhard, there is a working PR for remote logging in #1819 / #2292, although you will need to build and flash the image yourself, as it is not merged into master yet. Maybe this helps you monitor your OpenDTU and analyse this issue?

Additionally newer versions export heap statistics under the ${prefix}/dtu/heap/ topic in case this is a memory issue.
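With heap values collected over time (via MQTT or remote logging), a slowly shrinking free heap is one sign of a leak. A hedged sketch, assuming the samples have already been gathered into (time, free bytes) pairs: it fits a least-squares line and returns the slope in bytes per second. The exact subtopics below `${prefix}/dtu/heap/` are not assumed here, and `HeapSample` / `freeHeapTrend` are illustrative names.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// One free-heap reading, e.g. taken from the published heap statistics.
struct HeapSample {
    double timeSec;
    double freeBytes;
};

// Least-squares slope of free heap over time, in bytes per second.
// Short-term noise from normal allocations averages out over many samples.
double freeHeapTrend(const std::vector<HeapSample>& samples) {
    const std::size_t n = samples.size();
    if (n < 2) {
        return 0.0;  // not enough data for a trend
    }
    double meanT = 0.0;
    double meanF = 0.0;
    for (const auto& s : samples) {
        meanT += s.timeSec;
        meanF += s.freeBytes;
    }
    meanT /= static_cast<double>(n);
    meanF /= static_cast<double>(n);

    double cov = 0.0;   // covariance of time and free heap (unnormalised)
    double varT = 0.0;  // variance of time (unnormalised)
    for (const auto& s : samples) {
        cov += (s.timeSec - meanT) * (s.freeBytes - meanF);
        varT += (s.timeSec - meanT) * (s.timeSec - meanT);
    }
    return varT == 0.0 ? 0.0 : cov / varT;
}
```

For example, losing 1000 bytes per hour gives a slope of about -0.28 bytes/s. A single reading says little; a negative trend that persists over days is more telling, while a stable total free heap combined with a shrinking largest free block points to fragmentation rather than a leak.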

@broth-itk
Contributor Author

broth-itk commented Oct 5, 2024

@stefan123t @ranma Thanks for the PR and the syslog enhancement! This is very much appreciated and will help a lot to gather information from the unit.

I compiled the code & webapp and from what I can tell it looks fine.
Tomorrow I am going to see how it behaves when there are more logs generated from the unit.
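For reference on the syslog side: a syslog message is a single text line with a priority prefix. This thread does not say which syslog format the remote logging PR emits; the sketch below assumes the RFC 5424 format and only builds the line (sending it, typically via UDP to port 514, is left out). `formatSyslog` and the example hostname are hypothetical.

```cpp
#include <cassert>
#include <string>

// Syslog severities as defined in RFC 5424, section 6.2.1.
enum class Severity { Emergency = 0, Alert, Critical, Error, Warning, Notice, Info, Debug };

// RFC 5424 uses "-" (NILVALUE) for header fields that are not available.
static std::string orNil(const std::string& s) {
    return s.empty() ? "-" : s;
}

// Builds one RFC 5424 line: <PRI>VERSION TIMESTAMP HOSTNAME APP-NAME PROCID MSGID SD MSG.
// Timestamp, PROCID, MSGID and structured data are sent as NILVALUE here;
// a real sender would normally fill in the timestamp once the clock is synced.
std::string formatSyslog(int facility, Severity severity,
                         const std::string& hostname,
                         const std::string& appName,
                         const std::string& message) {
    // PRI = facility * 8 + severity.
    const int pri = facility * 8 + static_cast<int>(severity);
    return "<" + std::to_string(pri) + ">1 - " + orNil(hostname) + " " +
           orNil(appName) + " - - - " + message;
}
```

For instance, facility local0 (16) with severity Info gives `<134>1 - opendtu OpenDTU - - - heap ok`.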

@broth-itk
Contributor Author

I am going to close this issue, since it has not happened again after one of the recent updates.
Maybe it was related to the recent WiFi issue?
Heap monitoring is very valuable as well, as it allows tracking down a potential memory leak.

@broth-itk closed this as not planned Oct 5, 2024

github-actions bot commented Nov 5, 2024

This issue has been automatically locked since there has not been any recent activity after it was closed. Please open a new discussion or issue for related concerns.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Nov 5, 2024