Improves stellar-core-debug-info script and adds docs #4553

Open: wants to merge 2 commits into master

SirTyson
Contributor

Description

Resolves #4545

This PR updates the documentation for the stellar-core-debug-info script.

Additionally, while helping people debug nodes, I found the script difficult to use, and many of its default values were specific to SDF infrastructure. I've updated the script to be easier to use. Specifically, it now requires an output directory argument and creates that directory if it does not exist. The script also automatically detects the stellar-core executable path and config from the stellar-core.service file. Finally, I've added error checking around offline-info and fixed path resolution, which was previously buggy.
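For reviewers, here is a minimal sketch of how the executable and config could be pulled out of systemctl cat stellar-core.service output. It is an illustration under assumptions, not the code in this PR: parse_exec_start is a hypothetical helper, and it assumes the unit's ExecStart= line passes the config as --conf <path> or --conf=<path>.

```python
import shlex

# systemd allows these prefix characters before the executable in ExecStart=.
_EXEC_PREFIXES = "-@:+!"


def parse_exec_start(service_text):
    """Return (executable, config_path) from the last ExecStart= line.

    Hypothetical helper. Assumes the config is passed as "--conf <path>" or
    "--conf=<path>"; either value is None if it cannot be found.
    """
    result = (None, None)
    for raw in service_text.splitlines():
        line = raw.strip()
        if not line.startswith("ExecStart="):
            continue
        argv = shlex.split(line[len("ExecStart="):])
        if not argv:
            # An empty "ExecStart=" (e.g. in a drop-in) resets earlier values.
            result = (None, None)
            continue
        executable = argv[0].lstrip(_EXEC_PREFIXES)
        config = None
        for i, arg in enumerate(argv):
            if arg == "--conf" and i + 1 < len(argv):
                config = argv[i + 1]
            elif arg.startswith("--conf="):
                config = arg.split("=", 1)[1]
        result = (executable, config)
    return result
```

Taking the last ExecStart= line matters because systemctl cat also prints drop-in overrides after the main unit file.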

Checklist

  • Reviewed the contributing document
  • Rebased on top of master (no merge commits)
  • Ran clang-format v8.0.0 (via make format or the Visual Studio extension)
  • Compiles
  • Ran all tests
  • If change impacts performance, include supporting evidence per the performance document

Contributor

jacekn left a comment


Very nice improvements. I added one question about docker and one non-blocking idea.

Comment on lines 39 to 44
result = subprocess.run(
    ["systemctl", "cat", service_name],
    stdout=subprocess.PIPE,
    stderr=subprocess.PIPE,
    text=True
)
Contributor


💡 subprocess.check_output might let you reduce the code needed to handle errors.

Contributor Author


Done
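A minimal sketch of the check_output pattern suggested above, assuming the goal is simply "stdout on success, None on failure". run_command and read_service_file are hypothetical names. check_output raises CalledProcessError on a nonzero exit, and catching FileNotFoundError covers hosts where systemctl is not installed.

```python
import subprocess
import sys


def run_command(args):
    """Return stdout as text, or None if the command fails or is missing.

    check_output raises CalledProcessError on a nonzero exit status, so a
    single except clause replaces manual returncode/stderr checks.
    """
    try:
        return subprocess.check_output(args, stderr=subprocess.PIPE, text=True)
    except subprocess.CalledProcessError as exc:
        print(f"Warning: {args[0]} exited with {exc.returncode}: {exc.stderr.strip()}",
              file=sys.stderr)
    except FileNotFoundError:
        print(f"Warning: {args[0]} not found", file=sys.stderr)
    return None


def read_service_file(service_name="stellar-core.service"):
    """Hypothetical wrapper: unit file text via systemctl cat, or None."""
    return run_command(["systemctl", "cat", service_name])
```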

try:
    # Use systemctl to retrieve the service file content
    result = subprocess.run(
        ["systemctl", "cat", service_name],
Contributor


❓ It would be worth testing this inside Docker too. We use /usr/bin/stellar-core as the entrypoint, so systemd will not be running inside Docker containers.
Perhaps this needs to be handled separately?

Contributor Author


Added a function that detects whether we're running in Docker and, if so, uses the Docker default paths.
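One common heuristic for the Docker check is sketched below; it is an assumption, not necessarily what this PR does. It treats the presence of /.dockerenv, or "docker" appearing in /proc/1/cgroup, as evidence of a container. The cgroup check is unreliable under cgroup v2, where PID 1's entry is often just 0::/. DOCKER_EXECUTABLE reuses the /usr/bin/stellar-core entrypoint mentioned in review; the function names are hypothetical.

```python
import os

# Entrypoint used by the Docker image, per the review discussion.
DOCKER_EXECUTABLE = "/usr/bin/stellar-core"


def cgroup_mentions_docker(cgroup_text):
    """True if a /proc/<pid>/cgroup listing names a docker cgroup."""
    return "docker" in cgroup_text


def running_in_docker(root="/"):
    """Heuristic container check; root is overridable for testing."""
    if os.path.exists(os.path.join(root, ".dockerenv")):
        return True
    try:
        with open(os.path.join(root, "proc", "1", "cgroup")) as f:
            return cgroup_mentions_docker(f.read())
    except OSError:
        return False
```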

@anupsdf
Contributor

anupsdf commented Nov 22, 2024

Here is the console log from running this on my Mac.

  • Gathering core info hits this VM space error.
  • Core was not running when I ran this, so it's strange that offline-info complained, but it produced a log showing it had trouble getting the DB schema version.
  • gather_os_info isn't Mac-friendly, I guess.
  • I was able to get the buckets directory.
  • Gathering SQLite DB info also complained.
 ./scripts/stellar-core-debug-info -c ../pubnet_watcher.cfg -p /Users/anuppani/sdf_git/stellar-core/src/stellar-core -s . -b buckets  ../core_logs
Getting get_full_path_for_command /Users/anuppani/sdf_git/stellar-core/src/stellar-core
Gathering OS information...
Error calling function gather_os_info
Gathering stellar-core version and config...
stellar-core(40716,0x1feb14f40) malloc: nano zone abandoned due to inability to reserve vm space.
Warning: running non-release version v22.0.0rc2-85-g4b4cd3657 of stellar-core
Error calling function gather_core_info
Gathering stellar-core offline-info...
Warning: offline-info command failed. Maybe stellar-core is still running? For more information check /Users/anuppani/sdf_git/core_logs/stellar-core-debug-info-2024-11-22-08-18-09/offline-info/output
Gathering logs...
Error calling function gather_logs
Gathering buckets directory
Gathering sqlite DB
Error calling function gather_sqlite_db
Results stored in /Users/anuppani/sdf_git/core_logs/stellar-core-debug-info-2024-11-22-08-18-09.tar.gz
Encountered some errors when gathering data
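The log above shows the overall flow: each gather step is attempted in turn, a failing step is reported by name, and the results land in a timestamped directory that is then packed into a .tar.gz. A minimal sketch of that flow, assuming steps are (name, callable) pairs; run_dir_name, make_run_dir, run_steps and archive are hypothetical helpers, not the script's actual functions.

```python
import datetime
import os
import shutil


def run_dir_name(when):
    """Name matching the log, e.g. stellar-core-debug-info-2024-11-22-08-18-09."""
    return "stellar-core-debug-info-" + when.strftime("%Y-%m-%d-%H-%M-%S")


def make_run_dir(output_dir):
    """Create output_dir if needed, plus a fresh timestamped subdirectory."""
    run_dir = os.path.join(os.path.abspath(output_dir),
                           run_dir_name(datetime.datetime.now()))
    os.makedirs(run_dir, exist_ok=True)
    return run_dir


def run_steps(steps):
    """Run each (name, fn) step; report failures without aborting the rest."""
    failed = []
    for name, fn in steps:
        try:
            fn()
        except Exception:
            print(f"Error calling function {name}")
            failed.append(name)
    return failed


def archive(run_dir):
    """Pack run_dir into <run_dir>.tar.gz and return the archive path."""
    return shutil.make_archive(run_dir, "gztar",
                               root_dir=os.path.dirname(run_dir),
                               base_dir=os.path.basename(run_dir))
```

Continuing past a failed step is what lets a partial bundle (here, buckets and offline-info output) still be collected when other steps fail.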

@SirTyson
Contributor Author

Hmm, this looks like an issue with your core build. Can you run the version command normally, without the script? It looks like the malloc error is coming from within stellar-core, not the script.

Wrt os_info, that's only supported on Linux. This is fine, as the script is intended for production environments. Finally, I think gathering SQLite info failed because you passed the script a bad path via the -s . flag; it should be something like -s ./sql.db. You should be able to just run ./scripts/stellar-core-debug-info -c ../pubnet_watcher.cfg -p /Users/anuppani/sdf_git/stellar-core/src/stellar-core ../core_logs and the DB and buckets paths will be pulled automatically from the provided config.
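To illustrate the automatic lookup described above, here is a minimal sketch that reads the DATABASE and BUCKET_DIR_PATH keys from a stellar-core config. It deliberately handles only simple top-level KEY="value" lines rather than full TOML, and only sqlite3:// database URLs; db_and_bucket_paths is a hypothetical name, not the script's function.

```python
import re

# Matches simple top-level lines such as DATABASE="sqlite3://stellar.db".
_KEY_VALUE = re.compile(r'^\s*([A-Z0-9_]+)\s*=\s*"([^"]*)"')


def db_and_bucket_paths(config_text):
    """Return (sqlite_db_path, bucket_dir) from stellar-core config text.

    Simplified, not a full TOML parser: stops at the first [table] header and
    only understands double-quoted string values. Either item is None when
    absent; a non-sqlite3:// DATABASE (e.g. PostgreSQL) yields None for the DB.
    """
    values = {}
    for line in config_text.splitlines():
        if line.lstrip().startswith("["):
            break  # everything after this belongs to a TOML table
        match = _KEY_VALUE.match(line)
        if match:
            values[match.group(1)] = match.group(2)
    database = values.get("DATABASE", "")
    prefix = "sqlite3://"
    db_path = database[len(prefix):] if database.startswith(prefix) else None
    return db_path, values.get("BUCKET_DIR_PATH")
```

Relative values such as stellar.db are returned as written; resolving them against the right directory is left to the caller.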

@anupsdf
Contributor

anupsdf commented Nov 22, 2024

My config file didn't have the DB and buckets paths; I will add them. For now, the -s ./stellar.db option worked for SQLite.
My version command also prints this malloc error, but it does print the details afterwards.

@SirTyson
Contributor Author

I think the script is working as intended, then. I'm not sure why your core build is throwing this error and still reporting version info, but the invocation is definitely failing, and the correct behavior for the script is probably to stop processing output whenever the stellar-core invocation returns a nonzero exit code. In this particular instance it looks like we could still parse the output on failure, but I don't think that generalizes.

Is this build the current master? This might be a Mac-specific issue; the offline-info and version commands run fine for me on Linux.
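A minimal sketch of the policy proposed here: keep the script going, but only parse an invocation's output when it exits with status zero. run_and_parse is a hypothetical helper, and parse is any function that turns stdout into a result.

```python
import subprocess
import sys


def run_and_parse(args, parse):
    """Run args and return parse(stdout), but only if the exit status is 0.

    A failing invocation may still print plausible output (as in the run
    discussed in this thread), but that does not generalize, so nothing is
    parsed in that case.
    """
    proc = subprocess.run(args, capture_output=True, text=True)
    if proc.returncode != 0:
        print(f"Warning: {args[0]} exited with {proc.returncode}; "
              "skipping output parsing", file=sys.stderr)
        return None
    return parse(proc.stdout)
```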

@anupsdf
Contributor

anupsdf commented Nov 22, 2024

The malloc error was because I had ASan enabled. The error went away with a core build configured without --enable-asan.
Agreed, continuing to process output after an error doesn't seem like normal behavior.

Contributor

anupsdf left a comment


lgtm! Thanks for fixing.

Successfully merging this pull request may close these issues: Document stellar-core-debug-info

3 participants