Merge pull request #730 from johanneskastl/20220512_Kastl_arbiter_typos
docs/Administrator-Guide/arbiter-volumes-and-quorum.md: typos, formatting, ...
rakshithakamath94 authored Jul 1, 2022
2 parents b15bceb + c26fc5e commit 78bfdf9
Showing 1 changed file with 12 additions and 13 deletions.
25 changes: 12 additions & 13 deletions docs/Administrator-Guide/arbiter-volumes-and-quorum.md
@@ -2,7 +2,7 @@

The arbiter volume is a special subset of replica volumes that is aimed at
preventing split-brains and providing the same consistency guarantees as a normal
replica 3 volume without consuming 3x space.
`replica 3` volume without consuming 3x space.

<!-- TOC depthFrom:1 depthTo:6 withLinks:1 updateOnSave:1 orderedList:0 -->

@@ -22,7 +22,7 @@ replica 3 volume without consuming 3x space.

The syntax for creating the volume is:
```
# gluster volume create <VOLNAME> replica 2 arbiter 1 <NEW-BRICK> ...
# gluster volume create <VOLNAME> replica 2 arbiter 1 <NEW-BRICK> ...
```
**Note**: The earlier syntax used to be ```replica 3 arbiter 1``` but that was
leading to confusion among users about the total number of data bricks. For the
@@ -33,7 +33,7 @@ arbiter volume.

For example:
```
# gluster volume create testvol replica 2 arbiter 1 server{1..6}:/bricks/brick
# gluster volume create testvol replica 2 arbiter 1 server{1..6}:/bricks/brick
volume create: testvol: success: please start the volume to access data
```
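The create output asks for the volume to be started before it can be used. A minimal follow-up, continuing the `testvol` example above (a sketch; the exact `volume info` output will vary per setup):
```
# gluster volume start testvol
# gluster volume info testvol
```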

@@ -66,21 +66,20 @@ performance.readdir-ahead: on `
```

The arbiter brick will store only the file/directory names (i.e. the tree structure)
and extended attributes (metadata) but not any data. i.e. the file size
and extended attributes (metadata) but not any data, i.e. the file size
(as shown by `ls -l`) will be zero bytes. It will also store other gluster
metadata like the .glusterfs folder and its contents.
metadata like the `.glusterfs` folder and its contents.
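A quick way to see this is to inspect a file directly on the arbiter brick. The path and file name below are illustrative only; `getfattr` is the usual tool for dumping the extended attributes:
```
# ls -l /bricks/brick/somefile        # size column shows 0 bytes on the arbiter brick
# getfattr -d -m . -e hex /bricks/brick/somefile   # gluster extended attributes are still present
```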

_**Note:** Enabling the arbiter feature **automatically** configures_
_client-quorum to 'auto'. This setting is **not** to be changed._

## Arbiter brick(s) sizing

Since the arbiter brick does not store file data, its disk usage will be considerably
less than the other bricks of the replica. The sizing of the brick will depend on
smaller than for the other bricks of the replica. The sizing of the brick will depend on
how many files you plan to store in the volume. A good estimate will be
4KB times the number of files in the replica. Note that the estimate also
depends on the inode space alloted by the underlying filesystem for a given
disk size.
depends on the inode space allocated by the underlying filesystem for a given disk size.

The `maxpct` value in XFS for volumes of size 1TB to 50TB is only 5%.
If you want to store say 300 million files, 4KB x 300M gives us 1.2TB.
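If the default inode percentage is too small for the planned file count, it can be raised when the brick is formatted. A sketch assuming XFS and the commonly used 512-byte inode size; the device path and the `maxpct` value are placeholders to be chosen from your own estimate:
```
# mkfs.xfs -i size=512,maxpct=25 /dev/<arbiter-brick-device>
```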
@@ -130,7 +129,7 @@ greater than 50%, so that two nodes separated from each other do not believe
they have quorum simultaneously. For a two-node plain replica volume, this would
mean both nodes need to be up and running. So there is no notion of HA/failover.
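For reference, server quorum is controlled by the following settings (a sketch with a placeholder volume name; the ratio is a cluster-wide option, hence the `all` keyword):
```
# gluster volume set <VOLNAME> cluster.server-quorum-type server
# gluster volume set all cluster.server-quorum-ratio 51%
```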

There are users who create a replica 2 volume from 2 nodes and peer-probe
There are users who create a `replica 2` volume from 2 nodes and peer-probe
a 'dummy' node without bricks and enable server quorum with a ratio of 51%.
This does not prevent files from getting into split-brain. For example, if B1
and B2 are the bricks/nodes of the replica and B3 is the dummy node, we can
@@ -176,7 +175,7 @@ The following volume set options are used to configure it:
to specify the number of bricks to be active to participate in quorum.
If the quorum-type is auto then this option has no significance.

Earlier, when quorm was not met, the replica subvolume turned read-only. But
Earlier, when quorum was not met, the replica subvolume turned read-only. But
since [glusterfs-3.13](https://docs.gluster.org/en/latest/release-notes/3.13.0/#addition-of-checks-for-allowing-lookups-in-afr-and-removal-of-clusterquorum-reads-volume-option) and upwards, the subvolume becomes unavailable, i.e. all
the file operations fail with ENOTCONN error instead of becoming EROFS.
This means the ```cluster.quorum-reads``` volume option is also not supported.
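Putting the two options together, fixed client-quorum could be configured like this (placeholder volume name; a sketch, not a recommendation for any particular setup):
```
# gluster volume set <VOLNAME> cluster.quorum-type fixed
# gluster volume set <VOLNAME> cluster.quorum-count 2
```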
@@ -185,16 +184,16 @@ This means the ```cluster.quorum-reads``` volume option is also not supported.
## Replica 2 and Replica 3 volumes

From the above descriptions, it is clear that client-quorum cannot really be applied
to a replica 2 volume:(without costing HA).
to a `replica 2` volume (without costing HA).
If the quorum-type is set to auto, then by the description
given earlier, the first brick must always be up, irrespective of the status of the
second brick. IOW, if only the second brick is up, the subvol returns ENOTCONN, i.e. no HA.
If quorum-type is set to fixed, then the quorum-count *has* to be two
to prevent split-brains (otherwise a write can succeed in brick1, another in brick2 =>split-brain).
So for all practical purposes, if you want high availability in a replica 2 volume,
So for all practical purposes, if you want high availability in a `replica 2` volume,
it is recommended not to enable client-quorum.
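If client-quorum was enabled on such a volume earlier and availability is preferred, it can be switched off again; `none` is one of the accepted values for the quorum type (sketch with a placeholder volume name):
```
# gluster volume set <VOLNAME> cluster.quorum-type none
```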

In a replica 3 volume, client-quorum is enabled by default and set to 'auto'.
In a `replica 3` volume, client-quorum is enabled by default and set to 'auto'.
This means 2 bricks need to be up for the write to succeed. Here is how this
configuration prevents files from ending up in split-brain:
