Nameserver crashes unexpectedly #318

ocaballeror · 2018-11-30T10:15:13Z

The nameserver crashed on shutdown and I could not restart it because it was left hanging, waiting for a rogue agent to shut down, which apparently is the expected behavior.

Surprisingly enough, the error message shown was:

TimeoutError: Chances are [] were not shutdown after 10.0 s!

So it would appear that the agent was still alive after the call to async_kill_agents, but it effectively died in the milliseconds between us checking whether it was alive and the TimeoutError being raised just after that. I find it very strange, especially considering that we set a default timeout of 10 seconds, which should be plenty for any kind of agent to shut down.
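A minimal sketch of how such a race could produce an empty list in the message, assuming the agent list is queried once for the check and again to build the error text. The names wait_until_dead, wait_until_dead_snapshot and DyingAgent are hypothetical and only illustrate the timing; this is not osbrain's actual shutdown code.

```python
import time


def wait_until_dead(list_alive, timeout, poll=0.01):
    """Poll until no agents remain or the timeout expires (hypothetical)."""
    deadline = time.monotonic() + timeout
    while list_alive():
        if time.monotonic() >= deadline:
            # Second query: an agent that died since the loop condition
            # was evaluated is missing here, so the list can be empty.
            raise TimeoutError(
                'Chances are %s were not shutdown after %s s!'
                % (list_alive(), timeout)
            )
        time.sleep(poll)


def wait_until_dead_snapshot(list_alive, timeout, poll=0.01):
    """Same loop, but the message reuses the list that failed the check."""
    deadline = time.monotonic() + timeout
    while True:
        alive = list_alive()
        if not alive:
            return
        if time.monotonic() >= deadline:
            raise TimeoutError(
                'Chances are %s were not shutdown after %s s!'
                % (alive, timeout)
            )
        time.sleep(poll)


class DyingAgent:
    """Fake registry: reports 'agent' alive for the first `calls` queries."""

    def __init__(self, calls):
        self.calls = calls

    def __call__(self):
        self.calls -= 1
        return ['agent'] if self.calls >= 0 else []
```

Calling wait_until_dead(DyingAgent(1), 0.0) raises a TimeoutError whose message lists no agents, because the fake agent disappears between the two queries. The snapshot variant reports the agent that actually failed the check. It does not remove the race, but it at least keeps the message consistent with the decision to raise.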

It probably has something to do with the agent being unresponsive and having broken the connection between it and the nameserver, but it's hard to know for sure until we can get a reproducible case.


ocaballeror · 2018-11-30T10:16:37Z

We'll have to experiment by making the agents crash in different ways until we find a situation that we can reproduce.

Peque · 2019-03-11T17:44:47Z

Maybe unrelated, but I was able to reproduce a crash like that (only the list of agents was not empty) in my pypy branch with:

tox -e pypy3 -- -xsv -k close_ipc_socket_agent_blocked

ocaballeror · 2019-03-11T21:34:35Z

To be fair, there are quite a few things that don't seem to work with pypy, so I'm not sure if this counts as "reproducing" the error.

My guess from a few minutes of running this is that pypy must handle threads in a different way than what we are used to. The ContextTerminated errors that pop up when running this test certainly look like the context is being terminated before we expected.

What is happening on pypy reminds me of this other test I wrote when I first tried to reproduce the error.
The agent ends up in a very wrong state, and the output looks kind of similar:

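import time

from osbrain import run_agent, run_nameserver
from osbrain.helper import agent_dies


def test_agent_break():
    def break_internals(agent):
        # Terminate the agent's context from inside the agent itself
        agent._context.term()

    ns = run_nameserver()
    agent = run_agent('agent')
    agent.set_method(break_internals)
    # Schedule the breaking call to run right away in the agent
    agent.after(0, 'break_internals')
    time.sleep(.1)
    ns.shutdown(3)

    assert agent_dies('agent', ns)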

I still haven't found a way to reproduce the original "Chances are [] were not shutdown" error 😞. There could be many factors involved, but what exactly happened is still beyond me.

ocaballeror self-assigned this Nov 30, 2018

Peque added this to the 0.7.0 milestone Nov 30, 2018

Peque added bug maybe labels Nov 30, 2018
