Socket file removal tests randomly failing on Travis #287
Comments
Yeah, I just found another one. I added a table to keep track of occurrence counts. For now it seems the failure always happens when the agent is killed or crashes (if the agent is blocked, the name server will actually kill it, so it amounts to the same case).
I also checked my |
@ocaballeror When were they last modified? We used to have that problem before d76a1ea. Otherwise it might mean you can actually reproduce what we are seeing in Travis. 😄
I have a few from February 2-7, and the next ones were created on March 27th, so that could be a good clue as to where to look for a possible bug.
I just realized I can get osbrain to keep the socket files by killing the main process with That is probably the reason why I had so many leftover files in my system, and that also makes me think that the one socket file I supposedly got to keep on purpose by running the tests over and over was actually the result of a manually killed process. That's bad news, because it means I have not been able to reproduce the error locally, and that this issue will be a pain to debug 😓 |
I left the tests running overnight, with 4 simultaneous instances of I also tried with different values of The only modification I can think of is to add an extra check after closing the socket, to see if the file is still there, and remove it if it is. Something like (pseudo code):

    def remove_socket_file(socket):
        # Remove the socket file if closing the socket left it behind
        if os.path.exists("whatever the path of the socket file is"):
            os.unlink("whatever the path of the socket file is")

    class Agent:
        ...
        def close_all(self):
            ...
            linger = get_linger()
            sock.close(linger=linger)
            # Check for a leftover file once the linger period has passed
            self.after(linger, remove_socket_file, sock)
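For reference, the "check and remove after close" idea above can be tried in isolation. The following is only a minimal sketch and not osbrain code: it uses a plain Unix domain socket from the Python standard library instead of a ZeroMQ IPC socket, and the file name `agent.sock`, the temporary directory, and the helpers `close_and_cleanup` and `demo` are hypothetical names chosen for illustration.

```python
import os
import socket
import tempfile


def remove_socket_file(path):
    """Remove a leftover socket file; return True if a file was removed."""
    try:
        os.unlink(path)
    except FileNotFoundError:
        # Nothing was left behind, so there is nothing to clean up.
        return False
    return True


def close_and_cleanup(sock, path):
    """Close the socket, then make sure its file is gone."""
    sock.close()
    return remove_socket_file(path)


def demo():
    # Hypothetical location; osbrain chooses its own path for IPC files.
    directory = tempfile.mkdtemp()
    path = os.path.join(directory, "agent.sock")

    sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    sock.bind(path)  # Binding creates the socket file on disk
    existed_after_bind = os.path.exists(path)

    removed = close_and_cleanup(sock, path)
    exists_after_cleanup = os.path.exists(path)

    os.rmdir(directory)
    return existed_after_bind, removed, exists_after_cleanup
```

Catching `FileNotFoundError` instead of calling `os.path.exists` first avoids a race in which the file disappears between the check and the `os.unlink` call, which seems relevant here since another process (for example the name server) may be cleaning up at the same time.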
I was able to reproduce it: #293. Working on a fix...
https://travis-ci.org/opensistemas-hub/osbrain/jobs/362494498
https://travis-ci.org/ocaballeror/osbrain/jobs/362021572
https://travis-ci.org/ocaballeror/osbrain/jobs/362012690
Occurrences
test_agent_close_ipc_socket_agent_blocked_nameserver_shutdown
test_agent_close_ipc_socket_agent_crash_nameserver_shutdown
test_agent_close_ipc_socket_agent_kill