Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement two step shutdown with sigint #249

Open
wants to merge 8 commits into
base: master
Choose a base branch
from

Conversation

ocaballeror
Copy link
Contributor

Fixes #60

Here's a quick rundown of the changes:

  • The signal handler needs to send a shutdown signal to the agents from a separate thread, since running it in the same one as the nameserver's socket (default), would lead to a deadlock where the nameserver is waiting for the agents to shut down and the agents are waiting for the name server to accept their 'unregister' message.
  • The nameserver process can't call terminate on itself from the sigint handler because ._popen seems to be undefined when accessed from a different process. Right now it seems the NameserverProcess is being terminated anyway (the tests would block otherwise, and there's no zombie python process afterwards), but I'm not entirely sure how or why.
  • I removed the sigint handler on the AgentProcess. I don't think it's needed anymore, since we are capturing the signal from the nameserver.
  • This took me way longer than I expected, so I hope this is the right way to do it 👍

@Peque Peque added this to the 0.7.0 milestone Feb 15, 2018
@codecov
Copy link

codecov bot commented Feb 15, 2018

Codecov Report

Merging #249 into master will decrease coverage by 0.2%.
The diff coverage is 94.73%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #249      +/-   ##
==========================================
- Coverage   99.15%   98.94%   -0.21%     
==========================================
  Files          26       26              
  Lines        3550     3619      +69     
  Branches      256      259       +3     
==========================================
+ Hits         3520     3581      +61     
- Misses         18       24       +6     
- Partials       12       14       +2
Impacted Files Coverage Δ
osbrain/agent.py 98.38% <100%> (+0.67%) ⬆️
osbrain/tests/common.py 100% <100%> (ø) ⬆️
osbrain/tests/test_agent.py 100% <100%> (ø) ⬆️
osbrain/nameserver.py 97.84% <86.66%> (-1.38%) ⬇️
osbrain/tests/test_nameserver.py 97.6% <94.73%> (-0.52%) ⬇️
osbrain/tests/test_bugs.py 83.33% <0%> (-16.67%) ⬇️
osbrain/__init__.py 91.66% <0%> (-8.34%) ⬇️
osbrain/tests/test_agent_transport.py 93.97% <0%> (-2.41%) ⬇️
osbrain/proxy.py 98.75% <0%> (-1.25%) ⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update ad5e4e2...c03435f. Read the comment docs.

@ocaballeror
Copy link
Contributor Author

More changes:

  • Re-added and fixed the sigint handler for agents.
  • Added a test for the agent's sigint handler based on the commits from this branch: https://github.com/Flood1993/osbrain/tree/sigint_tests
  • Added psutil as a testing dependency. It makes process management much easier to handle, especially if you are dealing with multiple OSs.

@Peque Let me know how much you disagree with me 😄

@ocaballeror
Copy link
Contributor Author

@Peque Is this a good moment to go on with this?

This should be rebased on top of the appveyor integration, by the way, I don't trust Windows signals whatsoever.

@Peque
Copy link
Member

Peque commented Feb 27, 2018

@ocaballeror Let us first improve the current Windows CI integration (combine coverage reports, fix tests currently failing, see if we can skip less tests...).

Trying to tackle this now would only add more noise to the currently incomplete Windows support.

@ocaballeror
Copy link
Contributor Author

@Peque Shall we take a look at this again?

@Peque
Copy link
Member

Peque commented Apr 2, 2018

@ocaballeror Sure. I will review this this week.

@Peque Peque added the feature label Apr 4, 2018
@Peque
Copy link
Member

Peque commented Apr 4, 2018

@ocaballeror Before reviewing this, would you mind rebasing against master?

Guillermo Alonso and others added 8 commits April 5, 2018 09:15
This will have to be rebased so as to clean the mess.

In this commit are the settings which I have been using to try a
clean shutdown with Ctrl + C. The idea behind it is, launch
`python test.py` from a console and then kill it with Ctrl + C.

We can see that the daemon is the first one who gets the signal,
and right now it prevents the processes from getting it. By
reverting the changes in nameserver.py:92, we can see that the
problem is that we are not able to locate the NS after Ctrl + C,
so the NameServerProcess cannot know when all agents have been
successfully shutdown.

Further investigation is required.
@ocaballeror
Copy link
Contributor Author

I didn't realize Windows compatibility was still not a thing when we started implemented this, so it will require a bit of work.

What I found from a few minutes of playing around with this is that:

  • Windows doesn't interpret a ^C event as a signal.SIGINT, but rather as a signal.CTRL_C_EVENT, which is kind of unfortunate.
  • Windows doesn't support os.wait() either, so we will probably have to work around this by using psutil.Process.wait().
  • Something weird happens on test_nameserver_sigint_kill(), since it's reporting a PermissionError on os.kill().

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants