glusterd: avoid starting the same brick twice #4088

Open · wants to merge 1 commit into devel
87 changes: 48 additions & 39 deletions xlators/mgmt/glusterd/src/glusterd-handler.c
@@ -6358,53 +6358,62 @@ __glusterd_brick_rpc_notify(struct rpc_clnt *rpc, void *mydata,
                        brickinfo->path);
                 break;
             }
-        if (glusterd_is_brick_started(brickinfo)) {
-            gf_msg(this->name, GF_LOG_INFO, 0, GD_MSG_BRICK_DISCONNECTED,
-                   "Brick %s:%s has disconnected from glusterd.",
-                   brickinfo->hostname, brickinfo->path);
-
-            ret = get_volinfo_from_brickid(brickid, &volinfo);
+        if (!glusterd_is_brick_started(brickinfo)) {
Contributor:

Do we not need to set start_triggered to false also if brick is not started?

Contributor (author):

> Do we not need to set start_triggered to false also if brick is not started?

No. Setting start_triggered to false is precisely what causes glusterd to try to start the brick twice.

If the brick is still starting here, it means that someone else is managing it, so it's better not to touch anything and let the other thread adjust the state and flags as necessary.
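The early-break logic this reply argues for can be sketched outside glusterd with simplified stand-in types (the enum, struct, and function below are illustrative only, not the real glusterd_brickinfo_t or its handlers):

```c
#include <assert.h>
#include <stdbool.h>

/* Simplified sketch of the patch's idea: a disconnect event only
 * transitions a brick to "stopped" when glusterd considers it fully
 * started. In any other state, another thread owns the brick and
 * will adjust the state and the start_triggered flag itself. */
typedef enum {
    BRICK_STOPPED,
    BRICK_STARTING,
    BRICK_STARTED,
    BRICK_STOPPING,
} brick_status_t;

typedef struct {
    brick_status_t status;
    bool start_triggered;
} brick_t;

/* Returns true if the disconnect was handled (brick marked stopped);
 * false if the event was ignored because another thread is in charge. */
bool
handle_disconnect(brick_t *brick)
{
    if (brick->status != BRICK_STARTED) {
        /* Still starting (or stopping): leave state and flags alone. */
        return false;
    }
    brick->status = BRICK_STOPPED;
    brick->start_triggered = false;
    return true;
}
```

With this shape, a spurious disconnect during startup leaves start_triggered untouched, so a second thread cannot conclude the brick needs a fresh process.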

Contributor:

The flag was introduced by the patch https://review.gluster.org/#/c/glusterfs/+/18577/, which was specific to the brick_mux environment, though it is applicable everywhere. The change would not be easy to validate. The purpose of this flag is to indicate that brick_start has been triggered, and it stays true until the brick has disconnected, so if we did not reset it the brick would not start again. We can hit the race scenario only in the brick_mux case, while continuously running brick stop/start in a loop.

Contributor (author), @xhernandez, Mar 31, 2023:

IMO the whole start/stop logic is unnecessarily complex, but it's very hard to modify now. The main problem here is that any attempt to connect to the brick while it's still starting will fail, so the current code marks the brick as down even though it's actually still starting and will most probably start successfully. So I think that marking it as stopped and clearing the start_triggered flag is incorrect (basically, this means that another attempt to start it from another thread creates a new process).

However, after looking at the code again, it seems that we can start bricks in an asynchronous mode (without actually waiting for the process to start up), and there's no callback in case of failure. This means that no one will check whether the process actually started, so the brick is never marked as stopped on error. Even worse, just after starting a brick asynchronously, a connection attempt is made, which can easily fail under some conditions (I can hit this issue almost 100% of the time by running some tests on a zram disk).

How would you solve this issue? I guess that making all brick starts synchronous is not an option, right?
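The failure mode described here (a connect racing against a brick that has already created its unix socket but is not yet accepting) can be reproduced with plain POSIX sockets. The helper below is a standalone illustration, not glusterd code; the function name, parameters, and socket path are made up for the demo:

```c
#include <errno.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <unistd.h>
#include <assert.h>

/* Minimal reproduction: the "brick" binds its unix socket (so the
 * socket name exists on disk) but has not called listen() yet, so a
 * connect() from "glusterd" at that moment is refused even though
 * the brick is healthy and about to come up. Returns 0 on success,
 * storing the result of the early and late connect attempts. */
int
demo_connect_during_startup(const char *path, int *early_rc, int *late_rc)
{
    struct sockaddr_un addr;
    memset(&addr, 0, sizeof(addr));
    addr.sun_family = AF_UNIX;
    strncpy(addr.sun_path, path, sizeof(addr.sun_path) - 1);
    unlink(path);

    /* Brick still initializing: socket file exists, nobody listens. */
    int srv = socket(AF_UNIX, SOCK_STREAM, 0);
    if (srv < 0 || bind(srv, (struct sockaddr *)&addr, sizeof(addr)) < 0)
        return -1;

    int cli = socket(AF_UNIX, SOCK_STREAM, 0);
    *early_rc = connect(cli, (struct sockaddr *)&addr, sizeof(addr));
    close(cli);

    /* Brick finishes initializing: the same connect now succeeds. */
    if (listen(srv, 1) < 0)
        return -1;
    cli = socket(AF_UNIX, SOCK_STREAM, 0);
    *late_rc = connect(cli, (struct sockaddr *)&addr, sizeof(addr));

    close(cli);
    close(srv);
    unlink(path);
    return 0;
}
```

The early connect fails with ECONNREFUSED, which is exactly the condition that fires RPC_CLNT_DISCONNECT in glusterd despite the brick process being alive.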

Contributor:

Yes, that would be a good idea. The challenge with starting a brick asynchronously is how to make sure the brick has started successfully before establishing a connection with glusterd.

+            /* If the brick is already GF_BRICK_STOPPED, there's no need to
+             * do anything else. However, if the brick is GF_BRICK_STARTING
+             * or GF_BRICK_STOPPING, it's better to not touch anything since
+             * there's another thread already managing the brick.
+             *
+             * This may happen when, during the start of a brick, another
+             * thread attempts to connect to it. Since the unix socket name
+             * is already present in brickinfo, it will attempt to connect.
+             * However the brick may still be initializing and not accepting
+             * connections, so the connection attempt will be rejected,
+             * triggering the RPC_CLNT_DISCONNECT event. */
+            break;
+        }
+
+        gf_msg(this->name, GF_LOG_INFO, 0, GD_MSG_BRICK_DISCONNECTED,
+               "Brick %s:%s has disconnected from glusterd.",
+               brickinfo->hostname, brickinfo->path);
+
+        ret = get_volinfo_from_brickid(brickid, &volinfo);
+        if (ret) {
+            gf_msg(this->name, GF_LOG_ERROR, 0, GD_MSG_VOLINFO_GET_FAIL,
+                   "Failed to get volinfo from brickid(%s)", brickid);
+            goto out;
+        }
+        gf_event(EVENT_BRICK_DISCONNECTED, "peer=%s;volume=%s;brick=%s",
+                 brickinfo->hostname, volinfo->volname, brickinfo->path);
+        /* In case of an abrupt shutdown of a brick PMAP_SIGNOUT
+         * event is not received by glusterd which can lead to a
+         * stale port entry in glusterd, so forcibly clean up
+         * the same if the process is not running sometime
+         * gf_is_service_running true so to ensure about brick instance
+         * call search_brick_path_from_proc
+         */
+        GLUSTERD_GET_BRICK_PIDFILE(pidfile, volinfo, brickinfo, conf);
+        is_service_running = gf_is_service_running(pidfile, &pid);
+        if (pid > 0)
+            brickpath = search_brick_path_from_proc(pid, brickinfo->path);
+        if (!is_service_running || !brickpath) {
+            ret = pmap_port_remove(this, brickinfo->port, brickinfo->path,
+                                   NULL, _gf_true);
+            if (ret) {
-            if (ret) {
-                gf_msg(this->name, GF_LOG_ERROR, 0, GD_MSG_VOLINFO_GET_FAIL,
-                       "Failed to get volinfo from "
-                       "brickid(%s)",
-                       brickid);
-                goto out;
-            }
-            gf_event(EVENT_BRICK_DISCONNECTED, "peer=%s;volume=%s;brick=%s",
-                     brickinfo->hostname, volinfo->volname,
-                     brickinfo->path);
-            /* In case of an abrupt shutdown of a brick PMAP_SIGNOUT
-             * event is not received by glusterd which can lead to a
-             * stale port entry in glusterd, so forcibly clean up
-             * the same if the process is not running sometime
-             * gf_is_service_running true so to ensure about brick instance
-             * call search_brick_path_from_proc
-             */
-            GLUSTERD_GET_BRICK_PIDFILE(pidfile, volinfo, brickinfo, conf);
-            is_service_running = gf_is_service_running(pidfile, &pid);
-            if (pid > 0)
-                brickpath = search_brick_path_from_proc(pid,
-                                                        brickinfo->path);
-            if (!is_service_running || !brickpath) {
-                ret = pmap_port_remove(this, brickinfo->port,
-                                       brickinfo->path, NULL, _gf_true);
-                if (ret) {
-                    gf_msg(this->name, GF_LOG_WARNING,
-                           GD_MSG_PMAP_REGISTRY_REMOVE_FAIL, 0,
-                           "Failed to remove pmap "
-                           "registry for port %d for "
-                           "brick %s",
-                           brickinfo->port, brickinfo->path);
-                    ret = 0;
-                }
-            }
-        }
+                gf_msg(this->name, GF_LOG_WARNING,
+                       GD_MSG_PMAP_REGISTRY_REMOVE_FAIL, 0,
+                       "Failed to remove pmap registry for port %d for "
+                       "brick %s",
+                       brickinfo->port, brickinfo->path);
+                ret = 0;
+            }
+        }
 
         if (brickpath)
             GF_FREE(brickpath);
 
-        if (is_brick_mx_enabled() && glusterd_is_brick_started(brickinfo)) {
+        if (is_brick_mx_enabled()) {
             brick_proc = brickinfo->brick_proc;
             if (!brick_proc)
                 break;