-
Notifications
You must be signed in to change notification settings - Fork 268
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Stop killing deleted connections in a replicaSet #236
Stop killing deleted connections in a replicaSet #236
Conversation
cc: @comtihon for your thoughts |
hi,
no, it was a mistake.
which other responses do you mean? |
@comtihon sorry for my delayed response - I was pulled into some urgent issues. most of the updates in my guess is that none would present themselves as circumstances to kill the connection (vs |
hi @dmsnell , thank you for the response. |
and version please? |
right, sorry @comtihon - I updated this partially to try and fix an issue we saw in our app, getting into a crash-loop on boot but only when adding both nodes in a replicaSet cluster in the config (with only the primary server, or at least, with only one node listed in the config it was fine) I will |
Resolves comtihon#226 When stopping connections to "deleted" servers use `stop(Pid, normal)` instead of `stop(Pid, kill)`, since killing crashes and will cascade up to bring down the supervisor.
In this patch I've bumped the version number as a minor update. It was unclear whether this was a major or minor update because while it changes the behavior of the topology state when server configs change it doesn't do anything but prevent crashing the app. For applications which depend on the crashing behavior (hopefully there are none) then this is a major change.
9e451e6
to
46c34c1
Compare
@comtihon I think this is as finished as it's going to get right now. I was hoping to have more thorough production testing but we've had this code running for a while and haven't seen the crashes that it fixes. I still have some conceptual questions about the topology state but maybe we can get this one in now and come back later if we need. Version bump is in there and all connections now stop normally instead of being killed. |
Thank you very much for the contribution! |
Status
master
Resolves #226
When stopping connections to "deleted" servers use
stop(Pid, normal)
instead of
stop(Pid, kill)
, since killing crashes and will cascadeup to bring down the supervisor.
This behavior is reproducible when connecting to a replica set with a connection
string that doesn't match the hostname and port that Mongo sees for itself, such
as when running in a Docker environment.
As soon as
mc_monitor:check/2
reports the set of hosts for the replica set theywill come back different than what was used to connect, e.g. "mongo.localhost:27017"
vs. "localhost:27020" (if port-mapping is used with Docker), and so the driver will
think that the "localhost:27020" server was deleted from the cluster and then kill
the connections.
By not killing the connections the driver continues to operate stably.
❓ why are these connections being killed in the first place? would we really
want to crash the entire driver, connection pool, and database supervisor just because
the topology changed? (or in this case, was reporting differently than how we connected
to it in the first place)
❓ other responses in
mc_topology:update_topology_state/2
also kill their connections.should those be updated to
stop(Pid, normal)
as well? is there a need to communicatethese shutdowns to the calling application? could that be done without a crash? (the crashes
not only take down the supervisor but also flood the system crash logs, as noted in #224 and #225)