Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support remote log level #4413

Open
wants to merge 5 commits into
base: master
Choose a base branch
from

Conversation

europaul
Copy link
Contributor

This PR is a work in progress, but I appreciate if somebody could take a look at the code and the general approach and give me feedback.

Please see the commit messages (especially the second commit) for the overview of the changes.

Before merging this I need to update the pillar first to include the dependency in newlogd. I also need to update eve-api to include the new metrics.

I'm also planning to add more metrics as well as proper tests for newlogd to at least ensure that there is no regression.

@europaul
Copy link
Contributor Author

the build is gonna break of course because the dependencies are missing in pillar

Copy link
Contributor

@naiming-zededa naiming-zededa left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This patch in general looks good to me, I have some comments

  • in collect-info.sh for 'newlog' directory, we should skip the 'devUpload' directory, since those not yet uploaded files will be the duplicates for the keep directory
  • similar in the edgeview in walkLogDirs() function, we need to skip the 'devUpload' directory to avoid duplicates
  • in loguploader.go, there is 'failSendDir' and logic to move the failed to send to controller log files and keep them there. With this change, i don't think it is needed anymore, since everything will be in the keep directory. this can simplify some logic
  • when doing kibana search, i do search for 'Alpine' string for locate when the device has rebooted, now we may not have this. maybe to always log at least the first 5 device log entries?

Copy link
Contributor

@eriknordmark eriknordmark left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you update the description in https://github.com/lf-edge/eve/blob/master/docs/LOGGING.md with the intended new state of things?

@europaul
Copy link
Contributor Author

europaul commented Nov 6, 2024

Can you update the description in https://github.com/lf-edge/eve/blob/master/docs/LOGGING.md with the intended new state of things?

@eriknordmark I made the changes to the documentation, please have a look. I'm not sure if it's an overkill with the debug.local.loglevel and is a feature that nobody wants that only makes the system harder to understand?

@europaul europaul marked this pull request as ready for review November 25, 2024 21:37
@europaul
Copy link
Contributor Author

@naiming-zededa
Thank you very much for the detailed comments ❤️
I addressed this comment

* similar in the edgeview in walkLogDirs() function, we need to skip the 'devUpload' directory to avoid duplicates

but I would like to address the following in the next PR to not bloat the scope of this one.

* in collect-info.sh for 'newlog' directory, we should skip the 'devUpload' directory, since those not yet uploaded files will be the duplicates for the keep directory

* in loguploader.go, there is 'failSendDir' and logic to move the failed to send to controller log files and keep them there. With this change, i don't think it is needed anymore, since everything will be in the keep directory. this can simplify some logic

* when doing kibana search, i do search for 'Alpine' string for locate when the device has rebooted, now we may not have this. maybe to always log at least the first 5 device log entries?

@europaul
Copy link
Contributor Author

@eriknordmark I think the architecture should be right from our discussions. Please give this PR another look now that I adapted the implementation accordingly.
I would like to include proper testing in another PR, as this will require big structural changes to newlog, which would make this PR harder to follow.

@europaul
Copy link
Contributor Author

Btw before merging this I'll need to change the EVE API and pillar first, so that I can update the dependencies here. Please let me know if you are satisfied with the changes, then I'll prepare the respective PRs.

docs/LOGGING.md Outdated Show resolved Hide resolved
docs/LOGGING.md Outdated Show resolved Hide resolved
Copy link
Contributor

@eriknordmark eriknordmark left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approach looks fine to me so please proceeed generating the other PRs/commits.

@naiming-zededa can you take a look at the newlogd code changes?

"testing"
"time"

"github.com/stretchr/testify/assert"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why do we want testify? Can't we use gomega for tests like we do in other golang tests in EVE repository?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it's because I know how to use testify and I don't know how to use gomega 😄
but if you want I'll learn how to use gomega and switch to it. is it better than testify?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Brining go-chi was useful, because they have route parameter parser built-in, so we didn't have to build our own. The only reason why we left parts of Gorilla then was because we used ws communication (happy to restart the conversation of refactoring it). Rest was completely replaced by go-chi. Basically I'm asking the same question here: what's the benefit of testify over gomega and why do we want to have two testing frameworks in the repo

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

https://github.com/gorilla/mux/blob/main/README.md?plain=1#L303 - isn't that the same?

We didn't go for gorilla-mux, because at the time it didn't have maintainers, now they do have them. We could have just used standard http mux from golang, but then we would have to write our own route parameter parser

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor Author

@europaul europaul Nov 26, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@uncleDecart rewrote the tests with gomega - it's not much of a difference, so it makes sense to me to stick with one testing framework

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

btw

grep -r "github.com/stretchr/testify/assert" pkg/ | grep "_test.go" | wc -l
26

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you very much for change! Missed the fact that you bring testify to edge_view, not pillar, didn't know we were using testify, I think it's a good raise on LF-Edge call

@europaul
Copy link
Contributor Author

the changes to EVE API are here

@europaul
Copy link
Contributor Author

the dependencies on pillar types are here

| debug.kernel.loglevel | string | info | level of the produced kernel log messages. System default loglevel string representation should be used as described here ["https://man7.org/linux/man-pages/man3/syslog.3.html"]: emerg, alert, crit, err, warning, notice, info, debug. |
| debug.kernel.remote.loglevel | string | info | level of the kernel log messages sent to the controller. System default loglevel string representation should be used as described here ["https://man7.org/linux/man-pages/man3/syslog.3.html"]: emerg, alert, crit, err, warning, notice, info, debug. |

In addition, there can be per-agent settings to overwrite the default log level set for zedbox.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s/zedbox/EVE microservices/ (even though this only applies to some microservices like downloader and zedagent and not to others like wwan and wlan.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sorry, I meant to replace all of the occurrences and forgot that one

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

here is the full list of all "agents" that support log level settings (call agentlog.HandleGlobalConfig) right now:
newlogd
wwan
nodeagent
downloader
tpmmgr
client
vcomlink
executor
vaultmgr
baseosmgr
zedagent
verifier
wstunnelclient
zfsmanager
zedkube
ledmanager
faultinjection
zedmanager
nim
loguploader
watcher
volumemgr
zedrouter
msrv
domainmgr
diag

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added it also to CONFIG-PROPERTIES.md


Additionally all log levels can be set to "none" to disable logging for the corresponding component or to "all" to enable all log levels.

Furthermore, the "remote" log levels control, which subset of the produced logs will be sent to the controller.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FWIW I don't think the comma after "control" makes any sense. But I'm not a native english speaker.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you're right. copilot suggested it to me and I didn't dare to contradict it 🤖

Copy link
Member

@uncleDecart uncleDecart left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, FWIW you can tick "include copy of your diagram" in draw.io and then you'll have just one file my_diagram.drawio.png and if you can edit that one directly in draw.io

As opposed to the default log level, which was setting which logs are
produced by EVE's microservices, the remote log level will set which
device logs are uploaded to the cloud. So it's assumed that the remote
log level is always higher than the default log level.

There are no changes in how newlogd collects the logs, only in how it
handles the log files.
For the logs that are uploaded:
- we create a separate file with prefix dev.log.upload in collect
- that file is gzipped when it reaches a certain size or by timer - the
  same as before
- once gzipped the file is moved to devUpload - the same as before
- once the file is uploaded successfully, it's deleted instead of being
  moved to keepSentQueue

For the logs that stay on the device:
- we create a separate file with prefix dev.log.keep in collect
- that file is gzipped when it reaches a certain size - no timer
- once gzipped the file is moved to keepSentQueue - the name of the
  folder is preserved

The commit also adds log levels "all" and "none" as well as remote log
levels for kernel and syslog logs:
as the names suggest, "all" will log everything and "none" will suppress
all logs. The levels can be set in the global configuration for
- debug.default.loglevel
- debug.default.remote.loglevel
- agent.*agentname*.debug.loglevel
- agent.*agentname*.debug.remote.loglevel
- debug.syslog.loglevel
- debug.syslog.remote.loglevel
- debug.kernel.loglevel
- debug.kernel.remote.loglevel

Signed-off-by: Paul Gaiduk <[email protected]>
Add new metrics for newlogd:
- OldestSavedLog - the timestamp of the oldest log available on the
    device
- TotalSizeLogs - the total size of all logs on the device

Signed-off-by: Paul Gaiduk <[email protected]>
In case the total size of gzipped logs exceeds the limit
they should be removed in the following order by directory:
- `keepSentQueue`
- `failedUpload`
- `devUpload`
- `appUpload`

Signed-off-by: Paul Gaiduk <[email protected]>
Copy link

codecov bot commented Nov 27, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 20.93%. Comparing base (a73c78d) to head (4eeccfc).
Report is 8 commits behind head on master.

Additional details and impacted files
@@           Coverage Diff           @@
##           master    #4413   +/-   ##
=======================================
  Coverage   20.93%   20.93%           
=======================================
  Files          13       13           
  Lines        2895     2895           
=======================================
  Hits          606      606           
  Misses       2163     2163           
  Partials      126      126           

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

Since keepSentQueue contains the same or more logs than
devUpload, it's enough to only use those logs when
searching with edge-view.

Signed-off-by: Paul Gaiduk <[email protected]>
Add a chapter to LOGGING.md explaining how different log levels are
handled in the system.

Updated CONFIG-PROPERTIES.md to clarify remote log level settings and
provide full list of log level parameters

Signed-off-by: Paul Gaiduk <[email protected]>
Copy link
Contributor

@eriknordmark eriknordmark left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@europaul
Copy link
Contributor Author

the last push just fixed some spelling found by local yetus run, since the one in CI doesn't work

@europaul
Copy link
Contributor Author

thank you for your patience! in the final version I

  • reworked how we check the remote log level - instead of checking all log entries when writing to the temporary log file, we now check them when getting them from the source. This way it's easier to differentiate between the three main sources and log levels: EVE microservices, kernel and syslog
  • made sure the log levels coming from the global config are set and read atomically to prevent potential concurrency issues as they are set and read by different goroutines

@europaul
Copy link
Contributor Author

could it be that yetus is failing because it doesn't exclude the changes to vendor dirs and fails to process the amount of files changed?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants