diff --git a/docs/rook/v1.15/CRDs/Block-Storage/ceph-block-pool-crd/index.html b/docs/rook/v1.15/CRDs/Block-Storage/ceph-block-pool-crd/index.html
index 0dc27a4e7..eafbabb29 100644
--- a/docs/rook/v1.15/CRDs/Block-Storage/ceph-block-pool-crd/index.html
+++ b/docs/rook/v1.15/CRDs/Block-Storage/ceph-block-pool-crd/index.html
@@ -147,7 +147,7 @@
 size: 4
 replicasPerFailureDomain: 2
 subFailureDomain: rack
-

Pool Settings

Metadata

Spec

Add specific pool properties

With poolProperties you can set any pool property:

1
+

Pool Settings

Metadata

Spec

Add specific pool properties

With poolProperties you can set any pool property:

1
 2
 3
spec:
   parameters:
diff --git a/docs/rook/v1.15/CRDs/Block-Storage/ceph-rbd-mirror-crd/index.html b/docs/rook/v1.15/CRDs/Block-Storage/ceph-rbd-mirror-crd/index.html
index 6e8a0b38d..4e40ccc61 100644
--- a/docs/rook/v1.15/CRDs/Block-Storage/ceph-rbd-mirror-crd/index.html
+++ b/docs/rook/v1.15/CRDs/Block-Storage/ceph-rbd-mirror-crd/index.html
@@ -11,4 +11,4 @@
   namespace: rook-ceph
 spec:
   count: 1
-

Prerequisites

This guide assumes you have created a Rook cluster as explained in the main Quickstart guide

Settings

If any setting is unspecified, a suitable default will be used automatically.

RBDMirror metadata

RBDMirror Settings

Configuring mirroring peers

Configure mirroring peers individually for each CephBlockPool. Refer to the CephBlockPool documentation for more detail.

\ No newline at end of file
+

Prerequisites

This guide assumes you have created a Rook cluster as explained in the main Quickstart guide

Settings

If any setting is unspecified, a suitable default will be used automatically.

RBDMirror metadata

RBDMirror Settings

Configuring mirroring peers

Configure mirroring peers individually for each CephBlockPool. Refer to the CephBlockPool documentation for more detail.

\ No newline at end of file
diff --git a/docs/rook/v1.15/CRDs/Cluster/ceph-cluster-crd/index.html b/docs/rook/v1.15/CRDs/Cluster/ceph-cluster-crd/index.html
index bf7b9b465..d523a2a93 100644
--- a/docs/rook/v1.15/CRDs/Cluster/ceph-cluster-crd/index.html
+++ b/docs/rook/v1.15/CRDs/Cluster/ceph-cluster-crd/index.html
@@ -1,11 +1,11 @@
- CephCluster CRD - Rook Ceph Documentation
Skip to content

CephCluster CRD

Rook allows creation and customization of storage clusters through the custom resource definitions (CRDs). There are primarily four different modes in which to create your cluster.

  1. Host Storage Cluster: Consume storage from host paths and raw devices
  2. PVC Storage Cluster: Dynamically provision storage underneath Rook by specifying the storage class Rook should use to consume storage (via PVCs)
  3. Stretched Storage Cluster: Distribute Ceph mons across three zones, while storage (OSDs) is only configured in two zones
  4. External Ceph Cluster: Connect your K8s applications to an external Ceph cluster

See the separate topics for a description and examples of each of these scenarios.

Settings

Settings can be specified at the global level to apply to the cluster as a whole, while other settings can be specified at more fine-grained levels. If any setting is unspecified, a suitable default will be used automatically.

Cluster metadata

  • name: The name that will be used internally for the Ceph cluster. Most commonly the name is the same as the namespace since multiple clusters are not supported in the same namespace.
  • namespace: The Kubernetes namespace that will be created for the Rook cluster. The services, pods, and other resources created by the operator will be added to this namespace. The common scenario is to create a single Rook cluster. If multiple clusters are created, they must not have conflicting devices or host paths.

Cluster Settings

  • external:
    • enable: if true, the cluster will not be managed by Rook but via an external entity. This mode is intended to connect to an existing cluster. In this case, Rook will only consume the external cluster. However, Rook will be able to deploy various daemons in Kubernetes such as object gateways, mds and nfs if an image is provided and will refuse otherwise. If this setting is enabled all the other options will be ignored except cephVersion.image and dataDirHostPath. See external cluster configuration. If cephVersion.image is left blank, Rook will refuse the creation of extra CRs like object, file and nfs.
  • cephVersion: The version information for launching the ceph daemons.
    • image: The image used for running the ceph daemons. For example, quay.io/ceph/ceph:v18.2.4. For more details read the container images section. For the latest ceph images, see the Ceph DockerHub. To ensure a consistent version of the image is running across all nodes in the cluster, it is recommended to use a very specific image version. Tags also exist that would give the latest version, but they are only recommended for test environments. For example, the tag v17 will be updated each time a new Quincy build is released. Using the v17 tag is not recommended in production because it may lead to inconsistent versions of the image running across different nodes in the cluster.
    • allowUnsupported: If true, allow an unsupported major version of the Ceph release. Currently quincy and reef are supported. Future versions such as squid (v19) would require this to be set to true. Should be set to false in production.
    • imagePullPolicy: The image pull policy for the ceph daemon pods. Possible values are Always, IfNotPresent, and Never. The default is IfNotPresent.
  • dataDirHostPath: The path on the host (hostPath) where config and data should be stored for each of the services. If there are multiple clusters, the directory must be unique for each cluster. If the directory does not exist, it will be created. Because this directory persists on the host, it will remain after pods are deleted. Following paths and any of their subpaths must not be used: /etc/ceph, /rook or /var/log/ceph.
    • WARNING: For test scenarios, if you delete a cluster and start a new cluster on the same hosts, the path used by dataDirHostPath must be deleted. Otherwise, stale keys and other config will remain from the previous cluster and the new mons will fail to start. If this value is empty, each pod will get an ephemeral directory to store their config files that is tied to the lifetime of the pod running on that node. More details can be found in the Kubernetes empty dir docs.
  • skipUpgradeChecks: if set to true Rook won't perform any upgrade checks on Ceph daemons during an upgrade. Use this at YOUR OWN RISK, only if you know what you're doing. To understand Rook's upgrade process of Ceph, read the upgrade doc.
  • continueUpgradeAfterChecksEvenIfNotHealthy: if set to true Rook will continue the OSD daemon upgrade process even if the PGs are not clean, or continue with the MDS upgrade even if the file system is not healthy.
  • upgradeOSDRequiresHealthyPGs: if set to true OSD upgrade process won't start until PGs are healthy.
  • dashboard: Settings for the Ceph dashboard. To view the dashboard in your browser see the dashboard guide.
    • enabled: Whether to enable the dashboard to view cluster status
    • urlPrefix: Allows serving the dashboard under a subpath (useful when you are accessing the dashboard via a reverse proxy)
    • port: Allows changing the default port where the dashboard is served
    • ssl: Whether to serve the dashboard via SSL, ignored on Ceph versions older than 13.2.2
  • monitoring: Settings for monitoring Ceph using Prometheus. To enable monitoring on your cluster see the monitoring guide.
    • enabled: Whether to enable the prometheus service monitor for an internal cluster. For an external cluster, whether to create an endpoint port for the metrics. Default is false.
    • metricsDisabled: Whether to disable the metrics reported by Ceph. If false, the prometheus mgr module and Ceph exporter are enabled. If true, the prometheus mgr module and Ceph exporter are both disabled. Default is false.
    • externalMgrEndpoints: external cluster manager endpoints
    • externalMgrPrometheusPort: external prometheus manager module port. See external cluster configuration for more details.
    • port: The internal prometheus manager module port where the prometheus mgr module listens. The port may need to be configured when host networking is enabled.
    • interval: The interval for the prometheus module to scrape targets.
    • exporter: Ceph exporter metrics config.
      • perfCountersPrioLimit: Specifies which performance counters are exported. Corresponds to --prio-limit Ceph exporter flag. 0 - all counters are exported, default is 5.
      • statsPeriodSeconds: Time to wait before sending requests again to exporter server (seconds). Corresponds to --stats-period Ceph exporter flag. Default is 5.
  • network: For the network settings for the cluster, refer to the network configuration settings
  • mon: contains mon related options; see mon settings. For more details on the mons and when to choose a number other than 3, see the mon health doc.
  • mgr: manager top level section
    • count: set number of ceph managers between 1 and 2. The default value is 2. If there are two managers, it is important for all mgr services to point to the active mgr and not the standby mgr. Rook automatically updates the label mgr_role on the mgr pods to be either active or standby. Therefore, services only need to add the label mgr_role=active to their selector to point to the active mgr. This applies to all services that rely on the ceph mgr such as the dashboard or the prometheus metrics collector.
    • modules: A list of Ceph manager modules to enable or disable. Note the "dashboard" and "monitoring" modules are already configured by other settings.
  • crashCollector: The settings for crash collector daemon(s).
    • disable: if set to true, the crash collector will not run on any node where a Ceph daemon runs
    • daysToRetain: specifies the number of days to keep crash entries in the Ceph cluster. By default the entries are kept indefinitely.
  • logCollector: The settings for log collector daemon.
    • enabled: if set to true, the log collector will run as a side-car next to each Ceph daemon. The Ceph configuration option log_to_file will be turned on, meaning Ceph daemons will log to files in addition to still logging to the container's stdout. These logs will be rotated. In case a daemon terminates with a segfault, the coredump files will commonly be generated in the /var/lib/systemd/coredump directory on the host, depending on the underlying OS location. (default: true)
    • periodicity: how often to rotate daemon's log. (default: 24h). Specified with a time suffix which may be h for hours or d for days. Rotating too often will slightly impact the daemon's performance since the signal briefly interrupts the program.
  • annotations: annotations configuration settings
  • labels: labels configuration settings
  • placement: placement configuration settings
  • resources: resources configuration settings
  • priorityClassNames: priority class names configuration settings
  • storage: Storage selection and configuration that will be used across the cluster. Note that these settings can be overridden for specific nodes.
    • useAllNodes: true or false, indicating if all nodes in the cluster should be used for storage according to the cluster level storage selection and configuration values. If individual nodes are specified under the nodes field, then useAllNodes must be set to false.
    • nodes: Names of individual nodes in the cluster that should have their storage included in accordance with either the cluster level configuration specified above or any node specific overrides described in the next section below. useAllNodes must be set to false to use specific nodes and their config. See node settings below.
    • config: Config settings applied to all OSDs on the node unless overridden by devices. See the config settings below.
    • allowDeviceClassUpdate: Whether to allow changing the device class of an OSD after it is created. The default is false to prevent unintentional data movement or CRUSH changes if the device class is changed accidentally.
    • allowOsdCrushWeightUpdate: Whether Rook will resize the OSD CRUSH weight when the OSD PVC size is increased. This allows cluster data to be rebalanced to make most effective use of new OSD space. The default is false since data rebalancing can cause temporary cluster slowdown.
    • storage selection settings
    • Storage Class Device Sets
    • onlyApplyOSDPlacement: Whether the placement specific for OSDs is merged with the all placement. If false, the OSD placement will be merged with the all placement. If true, the OSD placement will be applied and the all placement will be ignored. The placement for OSDs is computed from several different places depending on the type of OSD:
      • For non-PVCs: placement.all and placement.osd
      • For PVCs: placement.all and inside the storageClassDeviceSets from the placement or preparePlacement
    • flappingRestartIntervalHours: Defines the time for which an OSD pod will sleep before restarting, if it stopped due to flapping. Flapping occurs where OSDs are marked down by Ceph more than 5 times in 600 seconds. The OSDs will stay down when flapping since they likely have a bad disk or other issue that needs investigation. If the issue with the OSD is fixed manually, the OSD pod can be manually restarted. The sleep is disabled if this interval is set to 0.
    • scheduleAlways: Whether to always schedule OSD pods on nodes declared explicitly in the "nodes" section, even if they are temporarily not schedulable. If set to true, consider adding placement tolerations for unschedulable nodes.
    • fullRatio: The ratio at which Ceph should block IO if the OSDs are too full. The default is 0.95.
    • backfillFullRatio: The ratio at which Ceph should stop backfilling data if the OSDs are too full. The default is 0.90.
    • nearFullRatio: The ratio at which Ceph should raise a health warning if the cluster is almost full. The default is 0.85.
  • disruptionManagement: The section for configuring management of daemon disruptions
    • managePodBudgets: if true, the operator will create and manage PodDisruptionBudgets for OSD, Mon, RGW, and MDS daemons. OSD PDBs are managed dynamically via the strategy outlined in the design. The operator will block eviction of OSDs by default and unblock them safely when drains are detected.
    • osdMaintenanceTimeout: is a duration in minutes that determines how long an entire failureDomain like region/zone/host will be held in noout (in addition to the default DOWN/OUT interval) when it is draining. The default value is 30 minutes.
    • pgHealthCheckTimeout: A duration in minutes that the operator will wait for the placement groups to become healthy (see pgHealthyRegex) after a drain was completed and OSDs came back up. The operator will continue with the next drain if the timeout is exceeded. No value or 0 means that the operator will wait until the placement groups are healthy before unblocking the next drain.
    • pgHealthyRegex: The regular expression that is used to determine which PG states should be considered healthy. The default is ^(active\+clean|active\+clean\+scrubbing|active\+clean\+scrubbing\+deep)$.
  • removeOSDsIfOutAndSafeToRemove: If true the operator will remove the OSDs that are down and whose data has been restored to other OSDs. In Ceph terms, the OSDs are out and safe-to-destroy when they are removed.
  • cleanupPolicy: cleanup policy settings
  • security: security page for key management configuration
  • cephConfig: Set Ceph config options using the Ceph Mon config store
  • csi: Set CSI Driver options
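
As a rough illustration of how several of the cluster settings listed above fit together, the fragment below is a minimal sketch of a CephCluster spec. It only uses field names and default values quoted in the list; the combination shown is an assumption for illustration, not a recommended configuration.

```yaml
# Sketch only: values mirror the defaults described in the settings list.
spec:
  dashboard:
    enabled: true
    ssl: true
  monitoring:
    enabled: false          # default
    metricsDisabled: false  # default
  storage:
    useAllNodes: true
    fullRatio: 0.95         # Ceph blocks IO above this ratio
    backfillFullRatio: 0.90 # Ceph stops backfilling above this ratio
    nearFullRatio: 0.85     # Ceph raises a health warning above this ratio
  disruptionManagement:
    managePodBudgets: true
    osdMaintenanceTimeout: 30  # minutes
    pgHealthCheckTimeout: 0    # 0 = wait until PGs are healthy
```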

Ceph container images

Official releases of Ceph Container images are available from Docker Hub.

These are general purpose Ceph containers with all necessary daemons and dependencies installed.

TAG                     MEANING
vRELNUM                 Latest release in this series (e.g., v17 = Quincy)
vRELNUM.Y               Latest stable release in this stable series (e.g., v17.2)
vRELNUM.Y.Z             A specific release (e.g., v18.2.4)
vRELNUM.Y.Z-YYYYMMDD    A specific build (e.g., v18.2.4-20240724)

A specific build will contain a specific release of Ceph as well as security fixes from the Operating System.
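
To make the tag scheme concrete, here is a minimal sketch of a cephVersion block that pins a specific build. The image name and tag are the examples quoted in this page; pinning to the build tag rather than the release tag is shown here only as one possible choice.

```yaml
# Sketch only: pins the example build tag from the table above.
spec:
  cephVersion:
    image: quay.io/ceph/ceph:v18.2.4-20240724
    allowUnsupported: false        # keep false in production
    imagePullPolicy: IfNotPresent  # default
```

A floating tag such as v17 would pick up each new Quincy build, which is why the page recommends it only for test environments.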

Mon Settings

  • count: Set the number of mons to be started. The number must be between 1 and 9. The recommended value is most commonly 3. For highest availability, an odd number of mons should be specified. For higher durability in case of mon loss, an even number can be specified although availability may be lower. To maintain quorum a majority of mons must be up. For example, if there are three mons, two must be up. If there are four mons, three must be up. If there are two mons, both must be up. If quorum is lost, see the disaster recovery guide to restore quorum from a single mon.
  • allowMultiplePerNode: Whether to allow the placement of multiple mons on a single node. Default is false for production. Should only be set to true in test environments.
  • volumeClaimTemplate: A PersistentVolumeSpec used by Rook to create PVCs for monitor storage. This field is optional, and when not provided, HostPath volume mounts are used. The current set of fields from template that are used are storageClassName and the storage resource request and limit. The default storage size request for new PVCs is 10Gi. Ensure that associated storage class is configured to use volumeBindingMode: WaitForFirstConsumer. This setting only applies to new monitors that are created when the requested number of monitors increases, or when a monitor fails and is recreated. An example CRD configuration is provided below.
  • failureDomainLabel: The label that is expected on each node where the mons are expected to be deployed. The labels must be found in the list of well-known topology labels.
  • zones: The failure domain names where the Mons are expected to be deployed. There must be at least three zones specified in the list. Each zone can be backed by a different storage class by specifying the volumeClaimTemplate.

    • name: The name of the zone, which is the value of the domain label.
    • volumeClaimTemplate: A PersistentVolumeSpec used by Rook to create PVCs for monitor storage. This field is optional, and when not provided, HostPath volume mounts are used. The current set of fields from template that are used are storageClassName and the storage resource request and limit. The default storage size request for new PVCs is 10Gi. Ensure that associated storage class is configured to use volumeBindingMode: WaitForFirstConsumer. This setting only applies to new monitors that are created when the requested number of monitors increases, or when a monitor fails and is recreated. An example CRD configuration is provided below.
  • stretchCluster: The stretch cluster settings that define the zones (or other failure domain labels) across which to configure the cluster.

    • failureDomainLabel: The label that is expected on each node where the cluster is expected to be deployed. The labels must be found in the list of well-known topology labels.
    • subFailureDomain: Within a zone, the data replicas must be spread across OSDs in the subFailureDomain. The default is host.
    • zones: The failure domain names where the Mons and OSDs are expected to be deployed. There must be three zones specified in the list. This element is always named zone even if a non-default failureDomainLabel is specified. The elements have the following values:
      • name: The name of the zone, which is the value of the domain label.
      • arbiter: Whether the zone is expected to be the arbiter zone which only runs a single mon. Exactly one zone must be labeled true.
      • volumeClaimTemplate: A PersistentVolumeSpec used by Rook to create PVCs for monitor storage. This field is optional, and when not provided, HostPath volume mounts are used. The current set of fields from template that are used are storageClassName and the storage resource request and limit. The default storage size request for new PVCs is 10Gi. Ensure that associated storage class is configured to use volumeBindingMode: WaitForFirstConsumer. This setting only applies to new monitors that are created when the requested number of monitors increases, or when a monitor fails and is recreated. An example CRD configuration is provided below. The two zones that are not the arbiter zone are expected to have OSDs deployed.

If these settings are changed in the CRD the operator will update the number of mons during a periodic check of the mon health, which by default is every 45 seconds.

To change the defaults that the operator uses to determine the mon health and whether to failover a mon, refer to the health settings. The intervals should be small enough that you have confidence the mons will maintain quorum, while also being long enough to ignore network blips where mons are failed over too often.
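
The mon settings above could be combined roughly as follows. This is a minimal sketch: the storage class name `local-storage` is a hypothetical placeholder, and the storage request simply repeats the 10Gi default mentioned in the volumeClaimTemplate description.

```yaml
# Sketch only: three mons, one per node, each backed by a PVC.
spec:
  mon:
    count: 3                     # quorum needs a majority: 2 of 3 must be up
    allowMultiplePerNode: false  # default for production
    volumeClaimTemplate:
      spec:
        storageClassName: local-storage  # hypothetical; should use volumeBindingMode: WaitForFirstConsumer
        resources:
          requests:
            storage: 10Gi
```

If volumeClaimTemplate is omitted, HostPath volume mounts are used for monitor storage instead.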

Mgr Settings

You can use the cluster CR to enable or disable any manager module. This can be configured like so:

1
+ CephCluster CRD - Rook Ceph Documentation      

CephCluster CRD

Rook allows creation and customization of storage clusters through the custom resource definitions (CRDs). There are primarily four different modes in which to create your cluster.

  1. Host Storage Cluster: Consume storage from host paths and raw devices
  2. PVC Storage Cluster: Dynamically provision storage underneath Rook by specifying the storage class Rook should use to consume storage (via PVCs)
  3. Stretched Storage Cluster: Distribute Ceph mons across three zones, while storage (OSDs) is only configured in two zones
  4. External Ceph Cluster: Connect your K8s applications to an external Ceph cluster

See the separate topics for a description and examples of each of these scenarios.

Settings

Settings can be specified at the global level to apply to the cluster as a whole, while other settings can be specified at more fine-grained levels. If any setting is unspecified, a suitable default will be used automatically.

Cluster metadata

  • name: The name that will be used internally for the Ceph cluster. Most commonly the name is the same as the namespace since multiple clusters are not supported in the same namespace.
  • namespace: The Kubernetes namespace that will be created for the Rook cluster. The services, pods, and other resources created by the operator will be added to this namespace. The common scenario is to create a single Rook cluster. If multiple clusters are created, they must not have conflicting devices or host paths.
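
For illustration, the metadata described above might look like the following minimal sketch. The name `rook-ceph` is an assumed example (it matches the namespace used elsewhere in these docs), following the common convention that the cluster name equals the namespace.

```yaml
# Sketch only: cluster name chosen to match the namespace.
apiVersion: ceph.rook.io/v1
kind: CephCluster
metadata:
  name: rook-ceph
  namespace: rook-ceph
```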

Cluster Settings

  • external:
    • enable: if true, the cluster will not be managed by Rook but via an external entity. This mode is intended to connect to an existing cluster. In this case, Rook will only consume the external cluster. However, Rook will be able to deploy various daemons in Kubernetes such as object gateways, mds and nfs if an image is provided and will refuse otherwise. If this setting is enabled all the other options will be ignored except cephVersion.image and dataDirHostPath. See external cluster configuration. If cephVersion.image is left blank, Rook will refuse the creation of extra CRs like object, file and nfs.
  • cephVersion: The version information for launching the ceph daemons.
    • image: The image used for running the ceph daemons. For example, quay.io/ceph/ceph:v18.2.4. For more details read the container images section. For the latest ceph images, see the Ceph DockerHub. To ensure a consistent version of the image is running across all nodes in the cluster, it is recommended to use a very specific image version. Tags also exist that would give the latest version, but they are only recommended for test environments. For example, the tag v17 will be updated each time a new Quincy build is released. Using the v17 tag is not recommended in production because it may lead to inconsistent versions of the image running across different nodes in the cluster.
    • allowUnsupported: If true, allow an unsupported major version of the Ceph release. Currently quincy and reef are supported. Future versions such as squid (v19) would require this to be set to true. Should be set to false in production.
    • imagePullPolicy: The image pull policy for the ceph daemon pods. Possible values are Always, IfNotPresent, and Never. The default is IfNotPresent.
  • dataDirHostPath: The path on the host (hostPath) where config and data should be stored for each of the services. If there are multiple clusters, the directory must be unique for each cluster. If the directory does not exist, it will be created. Because this directory persists on the host, it will remain after pods are deleted. Following paths and any of their subpaths must not be used: /etc/ceph, /rook or /var/log/ceph.
    • WARNING: For test scenarios, if you delete a cluster and start a new cluster on the same hosts, the path used by dataDirHostPath must be deleted. Otherwise, stale keys and other config will remain from the previous cluster and the new mons will fail to start. If this value is empty, each pod will get an ephemeral directory to store their config files that is tied to the lifetime of the pod running on that node. More details can be found in the Kubernetes empty dir docs.
  • skipUpgradeChecks: if set to true Rook won't perform any upgrade checks on Ceph daemons during an upgrade. Use this at YOUR OWN RISK, only if you know what you're doing. To understand Rook's upgrade process of Ceph, read the upgrade doc.
  • continueUpgradeAfterChecksEvenIfNotHealthy: if set to true Rook will continue the OSD daemon upgrade process even if the PGs are not clean, or continue with the MDS upgrade even if the file system is not healthy.
  • upgradeOSDRequiresHealthyPGs: if set to true OSD upgrade process won't start until PGs are healthy.
  • dashboard: Settings for the Ceph dashboard. To view the dashboard in your browser see the dashboard guide.
    • enabled: Whether to enable the dashboard to view cluster status
    • urlPrefix: Allows serving the dashboard under a subpath (useful when you are accessing the dashboard via a reverse proxy)
    • port: Allows changing the default port where the dashboard is served
    • ssl: Whether to serve the dashboard via SSL, ignored on Ceph versions older than 13.2.2
  • monitoring: Settings for monitoring Ceph using Prometheus. To enable monitoring on your cluster see the monitoring guide.
    • enabled: Whether to enable the prometheus service monitor for an internal cluster. For an external cluster, whether to create an endpoint port for the metrics. Default is false.
    • metricsDisabled: Whether to disable the metrics reported by Ceph. If false, the prometheus mgr module and Ceph exporter are enabled. If true, the prometheus mgr module and Ceph exporter are both disabled. Default is false.
    • externalMgrEndpoints: external cluster manager endpoints
    • externalMgrPrometheusPort: external prometheus manager module port. See external cluster configuration for more details.
    • port: The internal prometheus manager module port where the prometheus mgr module listens. The port may need to be configured when host networking is enabled.
    • interval: The interval for the prometheus module to scrape targets.
    • exporter: Ceph exporter metrics config.
      • perfCountersPrioLimit: Specifies which performance counters are exported. Corresponds to --prio-limit Ceph exporter flag. 0 - all counters are exported, default is 5.
      • statsPeriodSeconds: Time to wait before sending requests again to exporter server (seconds). Corresponds to --stats-period Ceph exporter flag. Default is 5.
  • network: For the network settings for the cluster, refer to the network configuration settings
  • mon: contains mon related options; see mon settings. For more details on the mons and when to choose a number other than 3, see the mon health doc.
  • mgr: manager top level section
    • count: set number of ceph managers between 1 and 2. The default value is 2. If there are two managers, it is important for all mgr services to point to the active mgr and not the standby mgr. Rook automatically updates the label mgr_role on the mgr pods to be either active or standby. Therefore, services only need to add the label mgr_role=active to their selector to point to the active mgr. This applies to all services that rely on the ceph mgr such as the dashboard or the prometheus metrics collector.
    • modules: A list of Ceph manager modules to enable or disable. Note the "dashboard" and "monitoring" modules are already configured by other settings.
  • crashCollector: The settings for crash collector daemon(s).
    • disable: if set to true, the crash collector will not run on any node where a Ceph daemon runs
    • daysToRetain: specifies the number of days to keep crash entries in the Ceph cluster. By default the entries are kept indefinitely.
  • logCollector: The settings for log collector daemon.
    • enabled: if set to true, the log collector will run as a side-car next to each Ceph daemon. The Ceph configuration option log_to_file will be turned on, meaning Ceph daemons will log to files in addition to still logging to the container's stdout. These logs will be rotated. In case a daemon terminates with a segfault, the coredump files will commonly be generated in the /var/lib/systemd/coredump directory on the host, depending on the underlying OS location. (default: true)
    • periodicity: how often to rotate daemon's log. (default: 24h). Specified with a time suffix which may be h for hours or d for days. Rotating too often will slightly impact the daemon's performance since the signal briefly interrupts the program.
  • annotations: annotations configuration settings
  • labels: labels configuration settings
  • placement: placement configuration settings
  • resources: resources configuration settings
  • priorityClassNames: priority class names configuration settings
  • storage: Storage selection and configuration that will be used across the cluster. Note that these settings can be overridden for specific nodes.
    • useAllNodes: true or false, indicating if all nodes in the cluster should be used for storage according to the cluster level storage selection and configuration values. If individual nodes are specified under the nodes field, then useAllNodes must be set to false.
    • nodes: Names of individual nodes in the cluster that should have their storage included in accordance with either the cluster level configuration specified above or any node specific overrides described in the next section below. useAllNodes must be set to false to use specific nodes and their config. See node settings below.
    • config: Config settings applied to all OSDs on the node unless overridden by devices. See the config settings below.
    • allowDeviceClassUpdate: Whether to allow changing the device class of an OSD after it is created. The default is false to prevent unintentional data movement or CRUSH changes if the device class is changed accidentally.
    • allowOsdCrushWeightUpdate: Whether Rook will resize the OSD CRUSH weight when the OSD PVC size is increased. This allows cluster data to be rebalanced to make most effective use of new OSD space. The default is false since data rebalancing can cause temporary cluster slowdown.
    • storage selection settings
    • Storage Class Device Sets
    • onlyApplyOSDPlacement: Whether the placement specific for OSDs is merged with the all placement. If false, the OSD placement will be merged with the all placement. If true, the OSD placement will be applied and the all placement will be ignored. The placement for OSDs is computed from several different places depending on the type of OSD:
      • For non-PVCs: placement.all and placement.osd
      • For PVCs: placement.all and inside the storageClassDeviceSets from the placement or preparePlacement
    • flappingRestartIntervalHours: Defines the time for which an OSD pod will sleep before restarting, if it stopped due to flapping. Flapping occurs where OSDs are marked down by Ceph more than 5 times in 600 seconds. The OSDs will stay down when flapping since they likely have a bad disk or other issue that needs investigation. If the issue with the OSD is fixed manually, the OSD pod can be manually restarted. The sleep is disabled if this interval is set to 0.
    • scheduleAlways: Whether to always schedule OSD pods on nodes declared explicitly in the "nodes" section, even if they are temporarily not schedulable. If set to true, consider adding placement tolerations for unschedulable nodes.
    • fullRatio: The ratio at which Ceph should block IO if the OSDs are too full. The default is 0.95.
    • backfillFullRatio: The ratio at which Ceph should stop backfilling data if the OSDs are too full. The default is 0.90.
    • nearFullRatio: The ratio at which Ceph should raise a health warning if the cluster is almost full. The default is 0.85.
  • disruptionManagement: The section for configuring management of daemon disruptions
    • managePodBudgets: if true, the operator will create and manage PodDisruptionBudgets for OSD, Mon, RGW, and MDS daemons. OSD PDBs are managed dynamically via the strategy outlined in the design. The operator will block eviction of OSDs by default and unblock them safely when drains are detected.
    • osdMaintenanceTimeout: A duration in minutes that determines how long an entire failureDomain like region/zone/host will be held in noout (in addition to the default DOWN/OUT interval) when it is draining. The default value is 30 minutes.
    • pgHealthCheckTimeout: A duration in minutes that the operator will wait for the placement groups to become healthy (see pgHealthyRegex) after a drain was completed and OSDs came back up. The operator will continue with the next drain if the timeout is exceeded. No value or 0 means that the operator will wait until the placement groups are healthy before unblocking the next drain.
    • pgHealthyRegex: The regular expression that is used to determine which PG states should be considered healthy. The default is ^(active\+clean|active\+clean\+scrubbing|active\+clean\+scrubbing\+deep)$.
  • removeOSDsIfOutAndSafeToRemove: If true the operator will remove the OSDs that are down and whose data has been restored to other OSDs. In Ceph terms, the OSDs are out and safe-to-destroy when they are removed.
  • cleanupPolicy: cleanup policy settings
  • security: security page for key management configuration
  • cephConfig: Set Ceph config options using the Ceph Mon config store
  • csi: Set CSI Driver options
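As a rough illustration of how several of the settings above fit together, the following fragment of a CephCluster spec sets the OSD fullness ratios and the disruption management options. It is a sketch only: the ratios and timeouts simply restate the defaults listed above, and the flappingRestartIntervalHours value is an arbitrary example rather than a recommendation.

```yaml
# Illustrative fragment of a CephCluster resource (spec section only).
spec:
  storage:
    allowDeviceClassUpdate: false      # default: device class changes are ignored
    allowOsdCrushWeightUpdate: false   # default: no CRUSH reweight on PVC resize
    flappingRestartIntervalHours: 24   # example value; 0 disables the sleep
    fullRatio: 0.95                    # block IO at this ratio
    backfillFullRatio: 0.90            # stop backfilling at this ratio
    nearFullRatio: 0.85                # raise a health warning at this ratio
  disruptionManagement:
    managePodBudgets: true
    osdMaintenanceTimeout: 30          # minutes
    pgHealthCheckTimeout: 0            # 0 = wait until PGs are healthy
  removeOSDsIfOutAndSafeToRemove: false
```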

Ceph container images

Official releases of Ceph Container images are available from Docker Hub.

These are general-purpose Ceph containers with all necessary daemons and dependencies installed.

TAG MEANING
vRELNUM Latest release in this series (e.g., v17 = Quincy)
vRELNUM.Y Latest stable release in this stable series (e.g., v17.2)
vRELNUM.Y.Z A specific release (e.g., v18.2.4)
vRELNUM.Y.Z-YYYYMMDD A specific build (e.g., v18.2.4-20240724)

A specific build will contain a specific release of Ceph as well as security fixes from the operating system.
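For example, pinning the cluster to a specific build might look like the fragment below. The tag is the build example from the table above; the registry path is an assumption and should be replaced with the registry you actually pull Ceph images from.

```yaml
# Illustrative fragment of a CephCluster resource (spec section only).
spec:
  cephVersion:
    # Assumed registry path; the tag follows the vRELNUM.Y.Z-YYYYMMDD form.
    image: quay.io/ceph/ceph:v18.2.4-20240724
```

Pinning a full build tag rather than a floating tag such as v18 keeps every node on the same image until the tag is changed deliberately.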

Mon Settings

  • count: Set the number of mons to be started. The number must be between 1 and 9. The recommended value is most commonly 3. For highest availability, an odd number of mons should be specified. For higher durability in case of mon loss, an even number can be specified although availability may be lower. To maintain quorum a majority of mons must be up. For example, if there are three mons, two must be up. If there are four mons, three must be up. If there are two mons, both must be up. If quorum is lost, see the disaster recovery guide to restore quorum from a single mon.
  • allowMultiplePerNode: Whether to allow the placement of multiple mons on a single node. Default is false for production. Should only be set to true in test environments.
  • volumeClaimTemplate: A PersistentVolumeSpec used by Rook to create PVCs for monitor storage. This field is optional, and when not provided, HostPath volume mounts are used. The current set of fields from template that are used are storageClassName and the storage resource request and limit. The default storage size request for new PVCs is 10Gi. Ensure that associated storage class is configured to use volumeBindingMode: WaitForFirstConsumer. This setting only applies to new monitors that are created when the requested number of monitors increases, or when a monitor fails and is recreated. An example CRD configuration is provided below.
  • failureDomainLabel: The label that is expected on each node where the mons are expected to be deployed. The labels must be found in the list of well-known topology labels.
  • zones: The failure domain names where the Mons are expected to be deployed. There must be at least three zones specified in the list. Each zone can be backed by a different storage class by specifying the volumeClaimTemplate.

    • name: The name of the zone, which is the value of the domain label.
    • volumeClaimTemplate: A PersistentVolumeSpec used by Rook to create PVCs for monitor storage. This field is optional, and when not provided, HostPath volume mounts are used. The current set of fields from template that are used are storageClassName and the storage resource request and limit. The default storage size request for new PVCs is 10Gi. Ensure that associated storage class is configured to use volumeBindingMode: WaitForFirstConsumer. This setting only applies to new monitors that are created when the requested number of monitors increases, or when a monitor fails and is recreated. An example CRD configuration is provided below.
  • stretchCluster: The stretch cluster settings that define the zones (or other failure domain labels) across which to configure the cluster.

    • failureDomainLabel: The label that is expected on each node where the cluster is expected to be deployed. The labels must be found in the list of well-known topology labels.
    • subFailureDomain: With a zone, the data replicas must be spread across OSDs in the subFailureDomain. The default is host.
    • zones: The failure domain names where the Mons and OSDs are expected to be deployed. There must be three zones specified in the list. This element is always named zone even if a non-default failureDomainLabel is specified. The elements have two values:
      • name: The name of the zone, which is the value of the domain label.
      • arbiter: Whether the zone is expected to be the arbiter zone which only runs a single mon. Exactly one zone must be labeled true.
      • volumeClaimTemplate: A PersistentVolumeSpec used by Rook to create PVCs for monitor storage. This field is optional, and when not provided, HostPath volume mounts are used. The current set of fields from template that are used are storageClassName and the storage resource request and limit. The default storage size request for new PVCs is 10Gi. Ensure that associated storage class is configured to use volumeBindingMode: WaitForFirstConsumer. This setting only applies to new monitors that are created when the requested number of monitors increases, or when a monitor fails and is recreated. An example CRD configuration is provided below. The two zones that are not the arbiter zone are expected to have OSDs deployed.
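A minimal sketch of the stretchCluster settings described above, assuming three zones named a, b, and c (hypothetical names that must match the values of the zone label on your nodes), with zone a acting as the arbiter:

```yaml
# Illustrative fragment of a CephCluster resource (spec section only).
spec:
  mon:
    allowMultiplePerNode: false
    stretchCluster:
      failureDomainLabel: topology.kubernetes.io/zone
      subFailureDomain: host
      zones:
        - name: a          # hypothetical zone name
          arbiter: true    # exactly one zone is the arbiter
        - name: b          # data zone, expected to run OSDs
        - name: c          # data zone, expected to run OSDs
```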

If these settings are changed in the CRD the operator will update the number of mons during a periodic check of the mon health, which by default is every 45 seconds.

To change the defaults that the operator uses to determine the mon health and whether to failover a mon, refer to the health settings. The intervals should be small enough that you have confidence the mons will maintain quorum, while also being long enough to ignore network blips where mons are failed over too often.
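As an illustration of the basic mon settings, the fragment below requests three mons backed by PVCs. The storage class name is a placeholder, and the storage request simply restates the 10Gi default.

```yaml
# Illustrative fragment of a CephCluster resource (spec section only).
spec:
  mon:
    count: 3                      # odd number; two of three must be up for quorum
    allowMultiplePerNode: false   # keep false outside test environments
    volumeClaimTemplate:
      spec:
        # Placeholder name; the class should use
        # volumeBindingMode: WaitForFirstConsumer.
        storageClassName: my-mon-storage-class
        resources:
          requests:
            storage: 10Gi
```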

Mgr Settings

You can use the cluster CR to enable or disable any manager module. This can be configured like so:

mgr:
   modules:
   - name: <name of the module>
     enabled: true
-

Some modules will have special configuration to ensure the module is fully functional after being enabled. Specifically:

  • pg_autoscaler: Rook will configure all new pools with PG autoscaling by setting: osd_pool_default_pg_autoscale_mode = on

Network Configuration Settings

If not specified, the default SDN will be used. Configure the network that will be enabled for the cluster and services.

  • provider: Specifies the network provider that will be used to connect the network interface. You can choose between host and multus.
  • selectors: Used for multus provider only. Select NetworkAttachmentDefinitions to use for Ceph networks.
    • public: Select the NetworkAttachmentDefinition to use for the public network.
    • cluster: Select the NetworkAttachmentDefinition to use for the cluster network.
  • addressRanges: Used for host or multus providers only. Allows overriding the address ranges (CIDRs) that Ceph will listen on.
    • public: A list of individual network ranges in CIDR format to use for Ceph's public network.
    • cluster: A list of individual network ranges in CIDR format to use for Ceph's cluster network.
  • ipFamily: Specifies the network stack Ceph daemons should listen on.
  • dualStack: Specifies that Ceph daemon should listen on both IPv4 and IPv6 network stacks.
  • connections: Settings for network connections using Ceph's msgr2 protocol
    • requireMsgr2: Whether to require communication over msgr2. If true, the msgr v1 port (6789) will be disabled and clients will be required to connect to the Ceph cluster with the v2 port (3300). Requires a kernel that supports msgr2 (kernel 5.11 or CentOS 8.4 or newer). Default is false.
    • encryption: Settings for encryption on the wire to Ceph daemons
      • enabled: Whether to encrypt the data in transit across the wire to prevent eavesdropping the data on the network. The default is false. When encryption is enabled, all communication between clients and Ceph daemons, or between Ceph daemons will be encrypted. When encryption is not enabled, clients still establish a strong initial authentication and data integrity is still validated with a crc check. IMPORTANT: Encryption requires the 5.11 kernel for the latest nbd and cephfs drivers. Alternatively for testing only, set "mounter: rbd-nbd" in the rbd storage class, or "mounter: fuse" in the cephfs storage class. The nbd and fuse drivers are not recommended in production since restarting the csi driver pod will disconnect the volumes. If this setting is enabled, CephFS volumes also require setting CSI_CEPHFS_KERNEL_MOUNT_OPTIONS to "ms_mode=secure" in operator.yaml.
    • compression:
      • enabled: Whether to compress the data in transit across the wire. The default is false. See the kernel requirements above for encryption.

Caution

Changing networking configuration after a Ceph cluster has been deployed is only supported for the network encryption settings. Changing other network settings is NOT supported and will likely result in a non-functioning cluster.

Provider

Selecting a non-default network provider is an advanced topic. Read more in the Network Providers documentation.

IPFamily

Provide single-stack IPv4 or IPv6 protocol to assign corresponding addresses to pods and services. This field is optional. Possible inputs are IPv6 and IPv4. Empty value will be treated as IPv4. To enable dual stack see the network configuration section.
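The network settings above could be combined as in the following sketch, which assumes host networking. The CIDR ranges are placeholders chosen for illustration only.

```yaml
# Illustrative fragment of a CephCluster resource (spec section only).
spec:
  network:
    provider: host
    addressRanges:
      public:
        - "192.168.100.0/24"   # placeholder CIDR for the public network
      cluster:
        - "192.168.200.0/24"   # placeholder CIDR for the cluster network
    ipFamily: "IPv4"           # or "IPv6"; empty is treated as IPv4
    dualStack: false
    connections:
      requireMsgr2: false      # true disables the v1 port (6789)
      encryption:
        enabled: false         # see the kernel requirements above
      compression:
        enabled: false
```

Per the caution above, only the encryption settings are supported for change after the cluster has been deployed, so the remaining values are best decided before the first deployment.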

Node Settings

In addition to the cluster level settings specified above, each individual node can also specify configuration to override the cluster level settings and defaults. If a node does not specify any configuration then it will inherit the cluster level settings.

  • name: The name of the node, which should match its kubernetes.io/hostname label.
  • config: Config settings applied to all OSDs on the node unless overridden by devices. See the config settings below.
  • storage selection settings

When useAllNodes is set to true, Rook attempts to make Ceph cluster management as hands-off as possible while still maintaining reasonable data safety. If a usable node comes online, Rook will begin to use it automatically. To maintain a balance between hands-off usability and data safety, Nodes are removed from Ceph as OSD hosts only (1) if the node is deleted from Kubernetes itself or (2) if the node has its taints or affinities modified in such a way that the node is no longer usable by Rook. Any changes to taints or affinities, intentional or unintentional, may affect the data reliability of the Ceph cluster. In order to help protect against this somewhat, deletion of nodes by taint or affinity modifications must be "confirmed" by deleting the Rook Ceph operator pod and allowing the operator deployment to restart the pod.

For production clusters, we recommend that useAllNodes is set to false to prevent the Ceph cluster from suffering reduced data reliability unintentionally due to a user mistake. When useAllNodes is set to false, Rook relies on the user to be explicit about when nodes are added to or removed from the Ceph cluster. Nodes are only added to the Ceph cluster if the node is added to the Ceph cluster resource. Similarly, nodes are only removed if the node is removed from the Ceph cluster resource.

Node Updates

Nodes can be added and removed over time by updating the Cluster CRD, for example with kubectl -n rook-ceph edit cephcluster rook-ceph. This will bring up your default text editor and allow you to add and remove storage nodes from the cluster. This feature is only available when useAllNodes has been set to false.

Storage Selection Settings

Below are the settings for a host-based cluster. This type of cluster can specify devices for OSDs, both at the cluster and individual node level, for selecting which storage resources will be included in the cluster.

  • useAllDevices: true or false, indicating whether all devices found on nodes in the cluster should be automatically consumed by OSDs. Not recommended unless you have a very controlled environment where you will not risk formatting of devices with existing data. When true, all devices and partitions will be used. Is overridden by deviceFilter if specified. LVM logical volumes are not picked by useAllDevices.
  • deviceFilter: A regular expression for short kernel names of devices (e.g. sda) that allows selection of devices and partitions to be consumed by OSDs. LVM logical volumes are not picked by deviceFilter. If individual devices have been specified for a node then this filter will be ignored. This field uses golang regular expression syntax. For example:
    • sdb: Only selects the sdb device if found
    • ^sd.: Selects all devices starting with sd
    • ^sd[a-d]: Selects devices starting with sda, sdb, sdc, and sdd if found
    • ^s: Selects all devices that start with s
    • ^[^r]: Selects all devices that do not start with r
  • devicePathFilter: A regular expression for device paths (e.g. /dev/disk/by-path/pci-0:1:2:3-scsi-1) that allows selection of devices and partitions to be consumed by OSDs. LVM logical volumes are not picked by devicePathFilter. If individual devices or deviceFilter have been specified for a node then this filter will be ignored. This field uses golang regular expression syntax. For example:
    • ^/dev/sd.: Selects all devices starting with sd
    • ^/dev/disk/by-path/pci-.*: Selects all devices which are connected to PCI bus
  • devices: A list of individual device names belonging to this node to include in the storage cluster.
    • name: The name of the devices and partitions (e.g., sda). The full udev path can also be specified for devices, partitions, and logical volumes (e.g. /dev/disk/by-id/ata-ST4000DM004-XXXX - this will not change after reboots).
    • config: Device-specific config settings. See the config settings below

A host-based cluster supports raw devices, partitions, logical volumes, encrypted devices, and multipath devices. Be sure to see the quickstart doc prerequisites for additional considerations.
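To make the node and device selection rules concrete, here is a sketch of a host-based storage section. The node names are hypothetical and would need to match each node's kubernetes.io/hostname label; the device names and filters reuse the examples given above.

```yaml
# Illustrative fragment of a CephCluster resource (spec section only).
spec:
  storage:
    useAllNodes: false       # required when listing individual nodes
    useAllDevices: false
    nodes:
      - name: "node-a"       # hypothetical node name
        devices:             # explicit devices; any filter is ignored here
          - name: "sdb"
          - name: "/dev/disk/by-id/ata-ST4000DM004-XXXX"
      - name: "node-b"       # hypothetical node name
        deviceFilter: "^sd[a-d]"
      - name: "node-c"       # hypothetical node name
        devicePathFilter: "^/dev/disk/by-path/pci-.*"
```

Each node uses only one selection mechanism in this sketch, which avoids relying on the precedence rules between devices, deviceFilter, and devicePathFilter.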

Below are the settings for a PVC-based cluster.

Storage Class Device Sets

The following are the settings for Storage Class Device Sets which can be configured to create OSDs that are backed by block mode PVs.

  • name: A name for the set.
  • count: The number of devices in the set.
  • resources: The CPU and RAM requests/limits for the devices. (Optional)
  • placement: The placement criteria for the devices. (Optional) Default is no placement criteria.

    The syntax is the same as for other placement configuration. It supports nodeAffinity, podAffinity, podAntiAffinity and tolerations keys.

    It is recommended to configure the placement such that the OSDs will be as evenly spread across nodes as possible. At a minimum, anti-affinity should be added so at least one OSD will be placed on each available node.

    However, if there are more OSDs than nodes, this anti-affinity will not be effective. Another placement scheme to consider is to add labels to the nodes in such a way that the OSDs can be grouped on those nodes, create multiple storageClassDeviceSets, and add node affinity to each of the device sets that will place the OSDs in those sets of nodes.

    Rook will automatically add required nodeAffinity to the OSD daemons to match the topology labels that are found on the nodes where the OSD prepare jobs ran. To ensure data durability, the OSDs are required to run in the same topology that the Ceph CRUSH map expects. For example, if the nodes are labeled with rack topology labels, the OSDs will be constrained to a certain rack. Without the topology labels, Rook will not constrain the OSDs beyond what is required by the PVs, for example to run in the zone where provisioned. See the OSD Topology section for the related labels.

  • preparePlacement: The placement criteria for the preparation of the OSD devices. Creating OSDs is a two-step process and the prepare job may require different placement than the OSD daemons. If the preparePlacement is not specified, the placement will instead be applied for consistent placement for the OSD prepare jobs and OSD deployments. The preparePlacement is only useful for portable OSDs in the device sets. OSDs that are not portable will be tied to the host where the OSD prepare job initially runs.

    • For example, provisioning may require topology spread constraints across zones, but the OSD daemons may require constraints across hosts within the zones.
  • portable: If true, the OSDs will be allowed to move between nodes during failover. This requires a storage class that supports portability (e.g. aws-ebs, but not the local storage provisioner). If false, the OSDs will be assigned to a node permanently. Rook will configure Ceph's CRUSH map to support the portability.
  • tuneDeviceClass: For example, Ceph cannot detect AWS volumes as HDDs from the storage class "gp2-csi", so you can improve Ceph performance by setting this to true.
  • tuneFastDeviceClass: For example, Ceph cannot detect Azure disks as SSDs from the storage class "managed-premium", so you can improve Ceph performance by setting this to true.
  • volumeClaimTemplates: A list of PVC templates to use for provisioning the underlying storage devices.
    • metadata.name: "data", "metadata", or "wal". If a single template is provided, the name must be "data". If the name is "metadata" or "wal", the devices are used to store the Ceph metadata or WAL respectively. In both cases, the devices must be raw devices or LVM logical volumes.
      • resources.requests.storage: The desired capacity for the underlying storage devices.
      • storageClassName: The StorageClass to provision PVCs from. Default would be to use the cluster-default StorageClass.
      • volumeMode: The volume mode to be set for the PVC, which should be Block.
      • accessModes: The access mode for the PVC to be bound by OSD.
  • schedulerName: Scheduler name for OSD pod placement. (Optional)
  • encrypted: whether to encrypt all the OSDs in a given storageClassDeviceSet

See the table in OSD Configuration Settings to know the allowed configurations.
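The following sketch shows one way the Storage Class Device Set settings above might be combined for a PVC-based cluster. The set name, count, and capacity are illustrative, and the storage class is the "gp2-csi" example mentioned above; the anti-affinity rule is one possible way to spread OSDs across hosts.

```yaml
# Illustrative fragment of a CephCluster resource (spec section only).
spec:
  storage:
    storageClassDeviceSets:
      - name: set1                 # illustrative set name
        count: 3
        portable: true             # requires a storage class that supports it
        tuneDeviceClass: true
        encrypted: false
        placement:
          podAntiAffinity:
            preferredDuringSchedulingIgnoredDuringExecution:
              - weight: 100
                podAffinityTerm:
                  labelSelector:
                    matchExpressions:
                      - key: app
                        operator: In
                        values:
                          - rook-ceph-osd
                  topologyKey: kubernetes.io/hostname
        volumeClaimTemplates:
          - metadata:
              name: data           # a single template must be named "data"
            spec:
              resources:
                requests:
                  storage: 10Gi    # illustrative capacity
              storageClassName: gp2-csi
              volumeMode: Block
              accessModes:
                - ReadWriteOnce
```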

OSD Configuration Settings

The following storage selection settings are specific to Ceph and do not apply to other backends. All variables are key-value pairs represented as strings.

  • metadataDevice: Name of a device, partition or lvm to use for the metadata of OSDs on each node. Performance can be improved by using a low latency device (such as SSD or NVMe) as the metadata device, while other spinning platter (HDD) devices on a node are used to store data. Provisioning will fail if the user specifies a metadataDevice but that device is not used as a metadata device by Ceph. Notably, ceph-volume will not use a device of the same device class (HDD, SSD, NVMe) as OSD devices for metadata, resulting in this failure.
  • databaseSizeMB: The size in MB of a bluestore database. Include quotes around the size.
  • walSizeMB: The size in MB of a bluestore write ahead log (WAL). Include quotes around the size.
  • deviceClass: The CRUSH device class to use for this selection of storage devices. (By default, if a device's class has not already been set, OSDs will automatically set a device's class to either hdd, ssd, or nvme based on the hardware properties exposed by the Linux kernel.) These storage classes can then be used to select the devices backing a storage pool by specifying them as the value of the pool spec's deviceClass field. If updating the device class of an OSD after the OSD is already created, allowDeviceClassUpdate: true must be set. Otherwise updates to this deviceClass will be ignored.
  • initialWeight: The initial OSD weight in TiB units. By default, this value is derived from OSD's capacity.
  • primaryAffinity: The primary-affinity value of an OSD, within range [0, 1] (default: 1).
  • osdsPerDevice: The number of OSDs to create on each device. High performance devices such as NVMe can handle running multiple OSDs. If desired, this can be overridden for each node and each device.
  • encryptedDevice: Encrypt OSD volumes using dmcrypt ("true" or "false"). By default this option is disabled. See encryption for more information on encryption in Ceph. (Resizing is not supported for host-based clusters.)
  • crushRoot: The value of the root CRUSH map label. The default is default. Generally, you should not need to change this. However, if any of your topology labels may have the value default, you need to change crushRoot to avoid conflicts, since CRUSH map values need to be unique.
  • enableCrushUpdates: Enables Rook to update the pool CRUSH rule using the pool spec. This can cause data remapping if the CRUSH rule changes. Defaults to false.
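A short sketch of how these settings might appear in the cluster-level config section. All values are passed as quoted strings; the device name and sizes are placeholders, not recommendations.

```yaml
# Illustrative fragment of a CephCluster resource (spec section only).
spec:
  storage:
    config:
      metadataDevice: "nvme0n1"   # placeholder; a faster device than the data devices
      databaseSizeMB: "1024"      # placeholder size, quoted as a string
      walSizeMB: "1024"           # placeholder size, quoted as a string
      osdsPerDevice: "1"
      encryptedDevice: "false"
```

The same keys can also be set under an individual node or device to override the cluster-level values.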

Allowed configurations are:

block device type | host-based cluster | PVC-based cluster
disk | |
part | encryptedDevice must be false | encrypted must be false
lvm | metadataDevice must be "", osdsPerDevice must be 1, and encryptedDevice must be false | metadata.name must not be metadata or wal and encrypted must be false
crypt | |
mpath | |

Limitations of metadata device

  • If metadataDevice is specified in the global OSD configuration or in the node level OSD configuration, the metadata device will be shared between all OSDs on the same node. In other words, OSDs will be initialized by lvm batch. In this case, a partition cannot be used as the metadata device.
  • If metadataDevice is specified in the device local configuration, a partition can be used as the metadata device. In other words, OSDs are initialized by lvm prepare.

Annotations and Labels

Annotations and Labels can be specified so that the Rook components will have those annotations / labels added to them.

You can set annotations / labels as a list of key-value pairs for the following Rook components:

  • all: Set annotations / labels for all components except clusterMetadata.
  • mgr: Set annotations / labels for MGRs
  • mon: Set annotations / labels for mons
  • osd: Set annotations / labels for OSDs
  • dashboard: Set annotations / labels for the dashboard service
  • prepareosd: Set annotations / labels for OSD Prepare Jobs
  • monitoring: Set annotations / labels for service monitor
  • crashcollector: Set annotations / labels for crash collectors
  • clusterMetadata: Set annotations only to rook-ceph-mon-endpoints configmap and the rook-ceph-mon and rook-ceph-admin-keyring secrets. These annotations will not be merged with the all annotations. The common usage is for backing up these critical resources with kubed. When other keys are set, all will be merged together with the specific component.
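As an illustration, annotations and labels for a few of the components above could be set as follows. The keys and values are made-up placeholders.

```yaml
# Illustrative fragment of a CephCluster resource (spec section only).
spec:
  annotations:
    all:
      example.com/owner: storage-team      # placeholder, merged into every component
    osd:
      example.com/tier: data               # placeholder, merged with "all" for OSDs
    clusterMetadata:
      example.com/backup: "true"           # placeholder, not merged with "all"
  labels:
    mon:
      example.com/component: mon           # placeholder label
```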

Placement Configuration Settings

Placement configuration for the cluster services. It includes the following keys: mgr, mon, arbiter, osd, prepareosd, cleanup, and all. Each service will have its placement configuration generated by merging the generic configuration under all with the most specific one (which will override any attributes).

In stretch clusters, if the arbiter placement is specified, that placement will only be applied to the arbiter. The arbiter placement is also not merged with the all placement, which allows the arbiter to be fully independent of other daemon placement. The remaining mons will still use the mon and/or all sections.

Note

Placement of OSD pods is controlled using the Storage Class Device Set, not the general placement configuration.

A Placement configuration is specified (according to the kubernetes PodSpec) as:

If you use labelSelector for osd pods, you must write two rules, one for rook-ceph-osd and one for rook-ceph-osd-prepare, like the example configuration. This is because each OSD has these two pods by design. For more detail, see the osd design doc and the related issue.

The Rook Ceph operator creates a Job called rook-ceph-detect-version to detect the full Ceph version used by the given cephVersion.image. The placement from the mon section is used for the Job except for the PodAntiAffinity field.

Placement Example

To control where various services will be scheduled by kubernetes, use the placement configuration sections below. The example under all would have all services scheduled on kubernetes nodes labeled with role=storage-node. Specific node affinity and tolerations that only apply to the mon daemons in this example require the label role=storage-mon-node and also tolerate the control plane taint.
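A sketch of a placement section matching that description is shown here. It assumes the control plane taint uses the standard node-role.kubernetes.io/control-plane key with the NoSchedule effect; adjust the key and effect to the taints actually present on your nodes.

```yaml
# Illustrative fragment of a CephCluster resource (spec section only).
spec:
  placement:
    all:
      nodeAffinity:
        requiredDuringSchedulingIgnoredDuringExecution:
          nodeSelectorTerms:
            - matchExpressions:
                - key: role
                  operator: In
                  values:
                    - storage-node
    mon:
      nodeAffinity:
        requiredDuringSchedulingIgnoredDuringExecution:
          nodeSelectorTerms:
            - matchExpressions:
                - key: role
                  operator: In
                  values:
                    - storage-mon-node
      tolerations:
        # Assumed taint key and effect for control plane nodes.
        - key: node-role.kubernetes.io/control-plane
          operator: Exists
          effect: NoSchedule
```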

 1
+

Some modules will have special configuration to ensure the module is fully functional after being enabled. Specifically:

  • pg_autoscaler: Rook will configure all new pools with PG autoscaling by setting: osd_pool_default_pg_autoscale_mode = on

Network Configuration Settings

If not specified, the default SDN will be used. Configure the network that will be enabled for the cluster and services.

  • provider: Specifies the network provider that will be used to connect the network interface. You can choose between host and multus.
  • selectors: Used for multus provider only. Select NetworkAttachmentDefinitions to use for Ceph networks.
    • public: Select the NetworkAttachmentDefinition to use for the public network.
    • cluster: Select the NetworkAttachmentDefinition to use for the cluster network.
  • addressRanges: Used for host or multus providers only. Allows overriding the address ranges (CIDRs) that Ceph will listen on.
    • public: A list of individual network ranges in CIDR format to use for Ceph's public network.
    • cluster: A list of individual network ranges in CIDR format to use for Ceph's cluster network.
  • ipFamily: Specifies the network stack Ceph daemons should listen on.
  • dualStack: Specifies that Ceph daemon should listen on both IPv4 and IPv6 network stacks.
  • connections: Settings for network connections using Ceph's msgr2 protocol
    • requireMsgr2: Whether to require communication over msgr2. If true, the msgr v1 port (6789) will be disabled and clients will be required to connect to the Ceph cluster with the v2 port (3300). Requires a kernel that supports msgr2 (kernel 5.11 or CentOS 8.4 or newer). Default is false.
    • encryption: Settings for encryption on the wire to Ceph daemons
      • enabled: Whether to encrypt the data in transit across the wire to prevent eavesdropping the data on the network. The default is false. When encryption is enabled, all communication between clients and Ceph daemons, or between Ceph daemons will be encrypted. When encryption is not enabled, clients still establish a strong initial authentication and data integrity is still validated with a crc check. IMPORTANT: Encryption requires the 5.11 kernel for the latest nbd and cephfs drivers. Alternatively for testing only, set "mounter: rbd-nbd" in the rbd storage class, or "mounter: fuse" in the cephfs storage class. The nbd and fuse drivers are not recommended in production since restarting the csi driver pod will disconnect the volumes. If this setting is enabled, CephFS volumes also require setting CSI_CEPHFS_KERNEL_MOUNT_OPTIONS to "ms_mode=secure" in operator.yaml.
    • compression:
      • enabled: Whether to compress the data in transit across the wire. The default is false. See the kernel requirements above for encryption.

Caution

Changing networking configuration after a Ceph cluster has been deployed is only supported for the network encryption settings. Changing other network settings is NOT supported and will likely result in a non-functioning cluster.

Provider

Selecting a non-default network provider is an advanced topic. Read more in the Network Providers documentation.

IPFamily

Provide single-stack IPv4 or IPv6 protocol to assign corresponding addresses to pods and services. This field is optional. Possible inputs are IPv6 and IPv4. Empty value will be treated as IPv4. To enable dual stack see the network configuration section.

Node Settings

In addition to the cluster level settings specified above, each individual node can also specify configuration to override the cluster level settings and defaults. If a node does not specify any configuration then it will inherit the cluster level settings.

  • name: The name of the node, which should match its kubernetes.io/hostname label.
  • config: Config settings applied to all OSDs on the node unless overridden by devices. See the config settings below.
  • storage selection settings

When useAllNodes is set to true, Rook attempts to make Ceph cluster management as hands-off as possible while still maintaining reasonable data safety. If a usable node comes online, Rook will begin to use it automatically. To maintain a balance between hands-off usability and data safety, Nodes are removed from Ceph as OSD hosts only (1) if the node is deleted from Kubernetes itself or (2) if the node has its taints or affinities modified in such a way that the node is no longer usable by Rook. Any changes to taints or affinities, intentional or unintentional, may affect the data reliability of the Ceph cluster. In order to help protect against this somewhat, deletion of nodes by taint or affinity modifications must be "confirmed" by deleting the Rook Ceph operator pod and allowing the operator deployment to restart the pod.

For production clusters, we recommend that useAllNodes is set to false to prevent the Ceph cluster from suffering reduced data reliability unintentionally due to a user mistake. When useAllNodes is set to false, Rook relies on the user to be explicit about when nodes are added to or removed from the Ceph cluster. Nodes are only added to the Ceph cluster if the node is added to the Ceph cluster resource. Similarly, nodes are only removed if the node is removed from the Ceph cluster resource.

Node Updates

Nodes can be added and removed over time by updating the Cluster CRD, for example with kubectl -n rook-ceph edit cephcluster rook-ceph. This will bring up your default text editor and allow you to add and remove storage nodes from the cluster. This feature is only available when useAllNodes has been set to false.

Storage Selection Settings

Below are the settings for a host-based cluster. This type of cluster can specify devices for OSDs, both at the cluster and individual node level, to select which storage resources will be included in the cluster.

  • useAllDevices: true or false, indicating whether all devices found on nodes in the cluster should be automatically consumed by OSDs. Not recommended unless you have a very controlled environment where you will not risk formatting of devices with existing data. When true, all devices and partitions will be used. This setting is overridden by deviceFilter if specified. LVM logical volumes are not picked by useAllDevices.
  • deviceFilter: A regular expression for short kernel names of devices (e.g. sda) that allows selection of devices and partitions to be consumed by OSDs. LVM logical volumes are not picked by deviceFilter. If individual devices have been specified for a node then this filter will be ignored. This field uses golang regular expression syntax. For example:
    • sdb: Only selects the sdb device if found
    • ^sd.: Selects all devices starting with sd
    • ^sd[a-d]: Selects devices starting with sda, sdb, sdc, and sdd if found
    • ^s: Selects all devices that start with s
    • ^[^r]: Selects all devices that do not start with r
  • devicePathFilter: A regular expression for device paths (e.g. /dev/disk/by-path/pci-0:1:2:3-scsi-1) that allows selection of devices and partitions to be consumed by OSDs. LVM logical volumes are not picked by devicePathFilter. If individual devices or deviceFilter have been specified for a node then this filter will be ignored. This field uses golang regular expression syntax. For example:
    • ^/dev/sd.: Selects all devices starting with sd
    • ^/dev/disk/by-path/pci-.*: Selects all devices which are connected to PCI bus
  • devices: A list of individual device names belonging to this node to include in the storage cluster.
    • name: The name of the devices and partitions (e.g., sda). The full udev path can also be specified for devices, partitions, and logical volumes (e.g. /dev/disk/by-id/ata-ST4000DM004-XXXX - this will not change after reboots).
    • config: Device-specific config settings. See the config settings below
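
The selection settings above can be combined per node. The following is a sketch under assumptions: the node names are hypothetical, and the device names and filter expressions are taken from the examples in the list above.

```yaml
# Sketch only: node names are hypothetical placeholders.
spec:
  storage:
    useAllNodes: false
    useAllDevices: false
    nodes:
    - name: "node-a"
      devices:                                          # explicit device list for this node
      - name: "sdb"                                     # short kernel name
      - name: "/dev/disk/by-id/ata-ST4000DM004-XXXX"    # udev path, stable across reboots
    - name: "node-b"
      deviceFilter: "^sd[a-d]"                          # matches short kernel names
    - name: "node-c"
      devicePathFilter: "^/dev/disk/by-path/pci-.*"     # matches full device paths
```

Each node uses only one selection mechanism here, because an explicit devices list causes the filters to be ignored, and deviceFilter takes precedence over devicePathFilter.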

A host-based cluster supports raw devices, partitions, logical volumes, encrypted devices, and multipath devices. Be sure to see the quickstart doc prerequisites for additional considerations.

Below are the settings for a PVC-based cluster.

Storage Class Device Sets

The following are the settings for Storage Class Device Sets which can be configured to create OSDs that are backed by block mode PVs.

  • name: A name for the set.
  • count: The number of devices in the set.
  • resources: The CPU and RAM requests/limits for the devices. (Optional)
  • placement: The placement criteria for the devices. (Optional) Default is no placement criteria.

    The syntax is the same as for other placement configuration. It supports nodeAffinity, podAffinity, podAntiAffinity and tolerations keys.

    It is recommended to configure the placement such that the OSDs will be as evenly spread across nodes as possible. At a minimum, anti-affinity should be added so at least one OSD will be placed on each available node.

    However, if there are more OSDs than nodes, this anti-affinity will not be effective. Another placement scheme to consider is to add labels to the nodes in such a way that the OSDs can be grouped on those nodes, create multiple storageClassDeviceSets, and add node affinity to each of the device sets that will place the OSDs in those sets of nodes.

    Rook will automatically add required nodeAffinity to the OSD daemons to match the topology labels that are found on the nodes where the OSD prepare jobs ran. To ensure data durability, the OSDs are required to run in the same topology that the Ceph CRUSH map expects. For example, if the nodes are labeled with rack topology labels, the OSDs will be constrained to a certain rack. Without the topology labels, Rook will not constrain the OSDs beyond what is required by the PVs, for example to run in the zone where provisioned. See the OSD Topology section for the related labels.

  • preparePlacement: The placement criteria for the preparation of the OSD devices. Creating OSDs is a two-step process and the prepare job may require different placement than the OSD daemons. If the preparePlacement is not specified, the placement will instead be applied for consistent placement for the OSD prepare jobs and OSD deployments. The preparePlacement is only useful for portable OSDs in the device sets. OSDs that are not portable will be tied to the host where the OSD prepare job initially runs.

    • For example, provisioning may require topology spread constraints across zones, but the OSD daemons may require constraints across hosts within the zones.
  • portable: If true, the OSDs will be allowed to move between nodes during failover. This requires a storage class that supports portability (e.g. aws-ebs, but not the local storage provisioner). If false, the OSDs will be assigned to a node permanently. Rook will configure Ceph's CRUSH map to support the portability.
  • tuneDeviceClass: For example, Ceph cannot detect AWS volumes as HDDs from the storage class "gp2-csi", so you can improve Ceph performance by setting this to true.
  • tuneFastDeviceClass: For example, Ceph cannot detect Azure disks as SSDs from the storage class "managed-premium", so you can improve Ceph performance by setting this to true.
  • volumeClaimTemplates: A list of PVC templates to use for provisioning the underlying storage devices.
    • metadata.name: "data", "metadata", or "wal". If a single template is provided, the name must be "data". If the name is "metadata" or "wal", the devices are used to store the Ceph metadata or WAL respectively. In both cases, the devices must be raw devices or LVM logical volumes.
      • resources.requests.storage: The desired capacity for the underlying storage devices.
      • storageClassName: The StorageClass to provision PVCs from. Default would be to use the cluster-default StorageClass.
      • volumeMode: The volume mode to be set for the PVC, which should be Block.
      • accessModes: The access mode for the PVC to be bound by OSD.
  • schedulerName: Scheduler name for OSD pod placement. (Optional)
  • encrypted: whether to encrypt all the OSDs in a given storageClassDeviceSet
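
To show how these settings fit together, here is a minimal sketch of a single device set. The set name, count, and capacity are illustrative assumptions; the "gp2-csi" storage class is the one mentioned in the tuneDeviceClass description above.

```yaml
# Sketch only: name, count, and size are illustrative.
spec:
  storage:
    storageClassDeviceSets:
    - name: set1
      count: 3                  # number of devices (and therefore OSDs) in the set
      portable: true            # requires a storage class that supports portability
      tuneDeviceClass: true
      encrypted: false
      volumeClaimTemplates:
      - metadata:
          name: data            # a single template must be named "data"
        spec:
          resources:
            requests:
              storage: 10Gi
          storageClassName: gp2-csi
          volumeMode: Block     # OSDs are backed by block mode PVs
          accessModes:
          - ReadWriteOnce
```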

See the table in OSD Configuration Settings for the allowed configurations.

OSD Configuration Settings

The following storage selection settings are specific to Ceph and do not apply to other backends. All variables are key-value pairs represented as strings.

  • metadataDevice: Name of a device, partition or lvm to use for the metadata of OSDs on each node. Performance can be improved by using a low latency device (such as SSD or NVMe) as the metadata device, while other spinning platter (HDD) devices on a node are used to store data. Provisioning will fail if the user specifies a metadataDevice but that device is not used as a metadata device by Ceph. Notably, ceph-volume will not use a device of the same device class (HDD, SSD, NVMe) as OSD devices for metadata, resulting in this failure.
  • databaseSizeMB: The size in MB of a bluestore database. Include quotes around the size.
  • walSizeMB: The size in MB of a bluestore write ahead log (WAL). Include quotes around the size.
  • deviceClass: The CRUSH device class to use for this selection of storage devices. (By default, if a device's class has not already been set, OSDs will automatically set a device's class to either hdd, ssd, or nvme based on the hardware properties exposed by the Linux kernel.) These storage classes can then be used to select the devices backing a storage pool by specifying them as the value of the pool spec's deviceClass field. If updating the device class of an OSD after the OSD is already created, allowDeviceClassUpdate: true must be set. Otherwise updates to this deviceClass will be ignored.
  • initialWeight: The initial OSD weight in TiB units. By default, this value is derived from OSD's capacity.
  • primaryAffinity: The primary-affinity value of an OSD, within range [0, 1] (default: 1).
  • osdsPerDevice: The number of OSDs to create on each device. High performance devices such as NVMe can handle running multiple OSDs. If desired, this can be overridden for each node and each device.
  • encryptedDevice: Encrypt OSD volumes using dmcrypt ("true" or "false"). By default this option is disabled. See encryption for more information on encryption in Ceph. (Resizing is not supported for host-based clusters.)
  • crushRoot: The value of the root CRUSH map label. The default is default. Generally, you should not need to change this. However, if any of your topology labels may have the value default, you need to change crushRoot to avoid conflicts, since CRUSH map values need to be unique.
  • enableCrushUpdates: Enables Rook to update the pool CRUSH rule using the Pool Spec. Can cause data remapping if the CRUSH rule changes. Defaults to false.
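
Because all of these values are strings, numeric and boolean settings need quotes. The sketch below assumes a hypothetical node name and NVMe device name; the field names are those listed above.

```yaml
# Sketch only: device and node names are hypothetical.
spec:
  storage:
    config:
      metadataDevice: "nvme0n1"   # low latency device shared for OSD metadata
      databaseSizeMB: "1024"      # quoted, as the value must be a string
      osdsPerDevice: "1"
      encryptedDevice: "false"
    nodes:
    - name: "node-a"
      config:
        osdsPerDevice: "2"        # node-level override of the cluster-level value
```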

Allowed configurations are:

| block device type | host-based cluster | PVC-based cluster |
|---|---|---|
| disk | | |
| part | encryptedDevice must be false | encrypted must be false |
| lvm | metadataDevice must be "", osdsPerDevice must be 1, and encryptedDevice must be false | metadata.name must not be metadata or wal and encrypted must be false |
| crypt | | |
| mpath | | |

Limitations of metadata device

  • If metadataDevice is specified in the global OSD configuration or in the node level OSD configuration, the metadata device will be shared between all OSDs on the same node. In other words, OSDs will be initialized by lvm batch. In this case, a partition cannot be used as the metadata device.
  • If metadataDevice is specified in the device local configuration, a partition can be used as the metadata device. In other words, OSDs are initialized by lvm prepare.
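
The two cases above might look like the following sketch. The node and device names are hypothetical and only illustrate where metadataDevice is placed in each case.

```yaml
# Sketch only: node and device names are hypothetical.
spec:
  storage:
    nodes:
    - name: "node-a"
      config:
        metadataDevice: "nvme0n1"       # node level: shared by all OSDs on the node, so not a partition
      devices:
      - name: "sdb"
      - name: "sdc"
    - name: "node-b"
      devices:
      - name: "sdb"
        config:
          metadataDevice: "nvme0n1p1"   # device level: a partition is allowed here
```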

Annotations and Labels

Annotations and Labels can be specified so that the Rook components will have those annotations / labels added to them.

Annotations / labels for Rook components are set as lists of key-value pairs under the following keys:

  • all: Set annotations / labels for all components except clusterMetadata.
  • mgr: Set annotations / labels for MGRs
  • mon: Set annotations / labels for mons
  • osd: Set annotations / labels for OSDs
  • dashboard: Set annotations / labels for the dashboard service
  • prepareosd: Set annotations / labels for OSD Prepare Jobs
  • monitoring: Set annotations / labels for service monitor
  • crashcollector: Set annotations / labels for crash collectors
  • clusterMetadata: Set annotations only on the rook-ceph-mon-endpoints configmap and the rook-ceph-mon and rook-ceph-admin-keyring secrets. These annotations will not be merged with the all annotations. The common usage is for backing up these critical resources with kubed. When other keys are set, all will be merged together with the specific component.
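
A minimal sketch of how these keys could be laid out follows. All annotation and label keys and values shown (the example.com entries, team, tier) are hypothetical placeholders.

```yaml
# Sketch only: every key/value pair below is a hypothetical placeholder.
spec:
  annotations:
    all:
      example.com/owner: "storage-team"   # merged into every component except clusterMetadata
    mgr:
      example.com/tier: "mgr"             # merged with the "all" annotations for MGRs
    clusterMetadata:
      example.com/backup: "true"          # applied only to the mon endpoints configmap and secrets
  labels:
    all:
      team: storage
    osd:
      tier: osd
```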

Placement Configuration Settings

Placement configuration for the cluster services. It includes the following keys: mgr, mon, arbiter, osd, prepareosd, cleanup, and all. Each service will have its placement configuration generated by merging the generic configuration under all with the most specific one (which will override any attributes).

In stretch clusters, if the arbiter placement is specified, that placement will only be applied to the arbiter. The arbiter placement is also not merged with the all placement, so that the arbiter can be fully independent of other daemon placement. The remaining mons will still use the mon and/or all sections.

Note

Placement of OSD pods is controlled using the Storage Class Device Set, not the general placement configuration.

A Placement configuration is specified (according to the kubernetes PodSpec) as:

If you use labelSelector for OSD pods, you must write two rules, one for rook-ceph-osd and one for rook-ceph-osd-prepare, like the example configuration. This is because each OSD is backed by these two pods. For more detail, see the osd design doc and the related issue.
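
As a sketch of the two-rule requirement, the fragment below shows an OSD placement block with one anti-affinity term per pod type. The weights and the topology key are illustrative assumptions, and the block is shown on its own rather than at a specific path in the CephCluster spec.

```yaml
# Sketch only: weights and topologyKey are illustrative choices.
placement:
  podAntiAffinity:
    preferredDuringSchedulingIgnoredDuringExecution:
    - weight: 100
      podAffinityTerm:
        labelSelector:
          matchExpressions:
          - key: app
            operator: In
            values:
            - rook-ceph-osd             # rule for the OSD daemon pods
        topologyKey: kubernetes.io/hostname
    - weight: 100
      podAffinityTerm:
        labelSelector:
          matchExpressions:
          - key: app
            operator: In
            values:
            - rook-ceph-osd-prepare     # rule for the OSD prepare pods
        topologyKey: kubernetes.io/hostname
```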

The Rook Ceph operator creates a Job called rook-ceph-detect-version to detect the full Ceph version used by the given cephVersion.image. The placement from the mon section is used for the Job except for the PodAntiAffinity field.

Placement Example

To control where various services will be scheduled by Kubernetes, use the placement configuration sections below. The example under all would have all services scheduled on Kubernetes nodes labeled with role=storage-node. Specific node affinity and tolerations that only apply to the mon daemons in this example require the label role=storage-mon-node and also tolerate the control plane taint.
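
A placement section matching that description could be sketched as follows. The label key and values come from the paragraph above; the toleration key is an assumption based on the standard Kubernetes control plane taint.

```yaml
# Sketch only: the taint key is assumed to be the standard control plane taint.
spec:
  placement:
    all:
      nodeAffinity:
        requiredDuringSchedulingIgnoredDuringExecution:
          nodeSelectorTerms:
          - matchExpressions:
            - key: role
              operator: In
              values:
              - storage-node
    mon:
      nodeAffinity:
        requiredDuringSchedulingIgnoredDuringExecution:
          nodeSelectorTerms:
          - matchExpressions:
            - key: role
              operator: In
              values:
              - storage-mon-node
      tolerations:
      - effect: NoSchedule
        key: node-role.kubernetes.io/control-plane
        operator: Exists
```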

diff --git a/docs/rook/v1.15/CRDs/Cluster/external-cluster/advance-external/index.html b/docs/rook/v1.15/CRDs/Cluster/external-cluster/advance-external/index.html
index fb7c15f0a..b3e24b4e9 100644
--- a/docs/rook/v1.15/CRDs/Cluster/external-cluster/advance-external/index.html
+++ b/docs/rook/v1.15/CRDs/Cluster/external-cluster/advance-external/index.html
@@ -2,13 +2,13 @@
 2
toolbox=$(kubectl get pod -l app=rook-ceph-tools -n rook-ceph -o jsonpath='{.items[*].metadata.name}')
 kubectl -n rook-ceph cp deploy/examples/external/create-external-cluster-resources.py $toolbox:/etc/ceph
 
  • Exec to the toolbox pod and execute create-external-cluster-resources.py with needed options to create required users and keys.

  • Important

    For other clusters to connect to storage in this cluster, Rook must be configured with a networking configuration that is accessible from other clusters. Most commonly this is done by enabling host networking in the CephCluster CR so the Ceph daemons will be addressable by their host IPs.

    Admin privileges

    If the cluster needs the admin keyring for configuration, update the admin key in the rook-ceph-mon secret with the client.admin keyring.

    Note

    Sharing the admin key with the external cluster is not generally recommended

    1. Get the client.admin keyring from the ceph cluster

      ceph auth get client.admin
      -
    2. Update two values in the rook-ceph-mon secret:

      • ceph-username: Set to client.admin
      • ceph-secret: Set the client.admin keyring

    After restarting the rook operator (and the toolbox if in use), rook will configure ceph with admin privileges.

    Connect to an External Object Store

    Create the external object store CR to configure connection to external gateways.

    1
    +
  • Update two values in the rook-ceph-mon secret:

    • ceph-username: Set to client.admin
    • ceph-secret: Set the client.admin keyring
  • After restarting the rook operator (and the toolbox if in use), rook will configure ceph with admin privileges.

    Connect to an External Object Store

    Create the external object store CR to configure connection to external gateways.

    cd deploy/examples/external
     kubectl create -f object-external.yaml
    -

    Consume the S3 Storage, in two different ways:

    1. Create an Object store user for credentials to access the S3 endpoint.

      1
      +

      Consume the S3 Storage, in two different ways:

      1. Create an Object store user for credentials to access the S3 endpoint.

        cd deploy/examples
         kubectl create -f object-user.yaml
        -
      2. Create a bucket storage class where a client can request creating buckets and then create the Object Bucket Claim, which will create an individual bucket for reading and writing objects.

        1
        +
      3. Create a bucket storage class where a client can request creating buckets and then create the Object Bucket Claim, which will create an individual bucket for reading and writing objects.

        cd deploy/examples/external
         kubectl create -f storageclass-bucket-delete.yaml
        diff --git a/docs/rook/v1.15/CRDs/Cluster/external-cluster/consumer-import/index.html b/docs/rook/v1.15/CRDs/Cluster/external-cluster/consumer-import/index.html
        index 94f111736..e0e11c316 100644
        --- a/docs/rook/v1.15/CRDs/Cluster/external-cluster/consumer-import/index.html
        +++ b/docs/rook/v1.15/CRDs/Cluster/external-cluster/consumer-import/index.html
        @@ -11,8 +11,8 @@
         helm install --create-namespace --namespace $clusterNamespace rook-ceph rook-release/rook-ceph -f values.yaml
         helm install --create-namespace --namespace $clusterNamespace rook-ceph-cluster \
         --set operatorNamespace=$operatorNamespace rook-release/rook-ceph-cluster -f values-external.yaml
        -

        Manifest Installation

        If not installing with Helm, here are the steps to install with manifests.

        1. Deploy Rook, create common.yaml, crds.yaml and operator.yaml manifests.

        2. Create common-external.yaml and cluster-external.yaml

        Import the Provider Data

        1. Paste the above output from create-external-cluster-resources.py into your current shell to allow importing the provider data.

        2. The import script in the next step uses the current kubeconfig context by default. If you want to specify the kubernetes cluster to use without changing the current context, you can specify the cluster name by setting the KUBECONTEXT environment variable.

          export KUBECONTEXT=<cluster-name>
          -
        3. Here is the link for the import script. The script uses the rook-ceph namespace by default, and several of its parameters are derived from the namespace variable. If the external cluster uses a different namespace, change the namespace parameter in the script to match. For example, for a namespace called new-namespace, set the namespace parameter in the script as follows.

          NAMESPACE=${NAMESPACE:="new-namespace"}
          +

          Manifest Installation

          If not installing with Helm, here are the steps to install with manifests.

          1. Deploy Rook, create common.yaml, crds.yaml and operator.yaml manifests.

          2. Create common-external.yaml and cluster-external.yaml

          Import the Provider Data

          1. Paste the above output from create-external-cluster-resources.py into your current shell to allow importing the provider data.

          2. The import script in the next step uses the current kubeconfig context by default. If you want to specify the kubernetes cluster to use without changing the current context, you can specify the cluster name by setting the KUBECONTEXT environment variable.

            export KUBECONTEXT=<cluster-name>
            +
          3. Here is the link for the import script. The script uses the rook-ceph namespace by default, and several of its parameters are derived from the namespace variable. If the external cluster uses a different namespace, change the namespace parameter in the script to match. For example, for a namespace called new-namespace, set the namespace parameter in the script as follows.

            NAMESPACE=${NAMESPACE:="new-namespace"}
             
          4. Run the import script.

            Note

            If your Rook cluster nodes are running kernel 5.4 or earlier, remove fast-diff, object-map, deep-flatten, exclusive-lock from the imageFeatures line.

            . import-external-cluster.sh
             

          Cluster Verification

          1. Verify the consumer cluster is connected to the provider ceph cluster:

            1
             2
            @@ -20,4 +20,4 @@
             NAME                 DATADIRHOSTPATH   MONCOUNT   AGE    STATE       HEALTH
             rook-ceph-external   /var/lib/rook                162m   Connected   HEALTH_OK
             
          2. Verify the creation of the storage class depending on the rbd pools and filesystem provided. ceph-rbd and cephfs would be the respective names for the RBD and CephFS storage classes.

            kubectl -n rook-ceph get sc
            -
          3. Create a persistent volume based on these StorageClasses.

    \ No newline at end of file +
  • Create a persistent volume based on these StorageClasses.

  • \ No newline at end of file diff --git a/docs/rook/v1.15/CRDs/Cluster/external-cluster/provider-export/index.html b/docs/rook/v1.15/CRDs/Cluster/external-cluster/provider-export/index.html index c33c770b9..023f46971 100644 --- a/docs/rook/v1.15/CRDs/Cluster/external-cluster/provider-export/index.html +++ b/docs/rook/v1.15/CRDs/Cluster/external-cluster/provider-export/index.html @@ -1,4 +1,4 @@ - Export config from the Ceph provider cluster - Rook Ceph Documentation

    Export config from the Ceph provider cluster

    In order to configure an external Ceph cluster with Rook, we need to extract some information in order to connect to that cluster.

    1. Create all users and keys

    Run the python script create-external-cluster-resources.py for creating all users and keys.

    python3 create-external-cluster-resources.py --rbd-data-pool-name <pool_name> --cephfs-filesystem-name <filesystem-name> --rgw-endpoint  <rgw-endpoint> --namespace <namespace> --format bash
    + Export config from the Ceph provider cluster - Rook Ceph Documentation      

    Export config from the Ceph provider cluster

    In order to configure an external Ceph cluster with Rook, we need to extract some information in order to connect to that cluster.

    1. Create all users and keys

    Run the python script create-external-cluster-resources.py for creating all users and keys.

    python3 create-external-cluster-resources.py --rbd-data-pool-name <pool_name> --cephfs-filesystem-name <filesystem-name> --rgw-endpoint  <rgw-endpoint> --namespace <namespace> --format bash
     
    • --namespace: Namespace where CephCluster will run, for example rook-ceph
    • --format bash: The format of the output
    • --rbd-data-pool-name: The name of the RBD data pool
    • --alias-rbd-data-pool-name: Provides an alias for the RBD data pool name, necessary if a special character is present in the pool name such as a period or underscore
    • --rgw-endpoint: (optional) The RADOS Gateway endpoint in the format <IP>:<PORT> or <FQDN>:<PORT>.
    • --rgw-pool-prefix: (optional) The prefix of the RGW pools. If not specified, the default prefix is default
    • --rgw-tls-cert-path: (optional) RADOS Gateway endpoint TLS certificate (or intermediate signing certificate) file path
    • --rgw-skip-tls: (optional) Ignore TLS certification validation when a self-signed certificate is provided (NOT RECOMMENDED)
    • --rbd-metadata-ec-pool-name: (optional) Provides the name of erasure coded RBD metadata pool, used for creating ECRBDStorageClass.
    • --monitoring-endpoint: (optional) Ceph Manager prometheus exporter endpoints (comma separated list of IP entries of active and standby mgrs)
    • --monitoring-endpoint-port: (optional) Ceph Manager prometheus exporter port
    • --skip-monitoring-endpoint: (optional) Skip prometheus exporter endpoints, even if they are available. Useful if the prometheus module is not enabled
    • --ceph-conf: (optional) Provide a Ceph conf file
    • --keyring: (optional) Path to Ceph keyring file, to be used with --ceph-conf
    • --k8s-cluster-name: (optional) Kubernetes cluster name
    • --output: (optional) Output will be stored into the provided file
    • --dry-run: (optional) Prints the executed commands without running them
    • --run-as-user: (optional) Provides a user name to check the cluster's health status, must be prefixed by client.
    • --cephfs-metadata-pool-name: (optional) Provides the name of the cephfs metadata pool
    • --cephfs-filesystem-name: (optional) The name of the filesystem, used for creating CephFS StorageClass
    • --cephfs-data-pool-name: (optional) Provides the name of the CephFS data pool, used for creating CephFS StorageClass
    • --rados-namespace: (optional) Divides a pool into separate logical namespaces, used for creating RBD PVC in a CephBlockPoolRadosNamespace (should be lower case)
    • --subvolume-group: (optional) Provides the name of the subvolume group, used for creating CephFS PVC in a subvolumeGroup
    • --rgw-realm-name: (optional) Provides the name of the rgw-realm
    • --rgw-zone-name: (optional) Provides the name of the rgw-zone
    • --rgw-zonegroup-name: (optional) Provides the name of the rgw-zone-group
    • --upgrade: (optional) Upgrades the cephCSIKeyrings (for example, client.csi-cephfs-provisioner) and client.healthchecker Ceph users with the new permissions needed for the new cluster version. Older permissions will still be applied.
    • --restricted-auth-permission: (optional) Restrict cephCSIKeyrings auth permissions to specific pools, and cluster. Mandatory flags that need to be set are --rbd-data-pool-name, and --k8s-cluster-name. --cephfs-filesystem-name flag can also be passed in case of CephFS user restriction, so it can restrict users to particular CephFS filesystem.
    • --v2-port-enable: (optional) Enables the v2 mon port (3300) for mons.
    • --topology-pools: (optional) Comma-separated list of topology-constrained rbd pools
    • --topology-failure-domain-label: (optional) K8s cluster failure domain label (example: zone, rack, or host) for the topology-pools that match the ceph domain
    • --topology-failure-domain-values: (optional) Comma-separated list of the k8s cluster failure domain values corresponding to each of the pools in the topology-pools list
    • --config-file: (optional) Path to the configuration file. Priority: command-line-args > config.ini values > default values

    2. Copy the bash output

    Example Output:

     1
      2
      3
    diff --git a/docs/rook/v1.15/CRDs/Cluster/external-cluster/topology-for-external-mode/index.html b/docs/rook/v1.15/CRDs/Cluster/external-cluster/topology-for-external-mode/index.html
    index 2a1fe8458..e6a0fcec5 100644
    --- a/docs/rook/v1.15/CRDs/Cluster/external-cluster/topology-for-external-mode/index.html
    +++ b/docs/rook/v1.15/CRDs/Cluster/external-cluster/topology-for-external-mode/index.html
    @@ -106,4 +106,4 @@
     provisioner: rook-ceph.rbd.csi.ceph.com
     reclaimPolicy: Delete
     volumeBindingMode: WaitForFirstConsumer
    -

    Set two values in the rook-ceph-operator-config configmap:

    • CSI_ENABLE_TOPOLOGY: "true": Enable the feature
    • CSI_TOPOLOGY_DOMAIN_LABELS: "topology.kubernetes.io/zone": Set the topology domain labels that the CSI driver will analyze on the nodes during scheduling.

    Create a Topology-Based PVC

    The topology-based storage class is ready to be consumed! Create a PVC from the ceph-rbd-topology storage class above, and watch the OSD usage to see how the data is spread only among the topology-based CRUSH buckets.

    \ No newline at end of file +

    Set two values in the rook-ceph-operator-config configmap:

    • CSI_ENABLE_TOPOLOGY: "true": Enable the feature
    • CSI_TOPOLOGY_DOMAIN_LABELS: "topology.kubernetes.io/zone": Set the topology domain labels that the CSI driver will analyze on the nodes during scheduling.
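
The two settings above could be applied with a configmap like the following sketch. The rook-ceph namespace is an assumption and should match the namespace where the Rook operator runs.

```yaml
# Sketch only: the namespace is assumed to be the operator namespace.
apiVersion: v1
kind: ConfigMap
metadata:
  name: rook-ceph-operator-config
  namespace: rook-ceph
data:
  CSI_ENABLE_TOPOLOGY: "true"
  CSI_TOPOLOGY_DOMAIN_LABELS: "topology.kubernetes.io/zone"
```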

    Create a Topology-Based PVC

    The topology-based storage class is ready to be consumed! Create a PVC from the ceph-rbd-topology storage class above, and watch the OSD usage to see how the data is spread only among the topology-based CRUSH buckets.
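
A PVC for this storage class might look like the sketch below. The claim name and requested size are hypothetical; only the ceph-rbd-topology storage class name comes from the text above.

```yaml
# Sketch only: claim name and size are hypothetical.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: topology-pvc
spec:
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 1Gi
  storageClassName: ceph-rbd-topology
```

Because the storage class uses volumeBindingMode WaitForFirstConsumer, the volume is provisioned only once a pod that uses the claim is scheduled.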

    \ No newline at end of file diff --git a/docs/rook/v1.15/CRDs/Cluster/network-providers/index.html b/docs/rook/v1.15/CRDs/Cluster/network-providers/index.html index 1cec3fdd0..a883a51cb 100644 --- a/docs/rook/v1.15/CRDs/Cluster/network-providers/index.html +++ b/docs/rook/v1.15/CRDs/Cluster/network-providers/index.html @@ -67,7 +67,7 @@

    Validating Multus configuration

    We highly recommend validating your Multus configuration before you install a CephCluster. A tool exists to facilitate validating the Multus configuration. After installing the Rook operator and before installing any Custom Resources, run the tool from the operator pod.

    The tool's CLI is designed to be as helpful as possible. Get help text for the multus validation tool like so:

    1. Exec into the Rook operator pod

      kubectl --namespace rook-ceph exec -it deploy/rook-ceph-operator -- bash
       
    2. Output and read the tool's help text

      rook multus validation run --help
       
    3. Use the validation tool config file for advanced configuration.

      rook multus validation config --help
      -

      Generate a sample config, that includes commented help text, using one of the available templates.

    4. Run the tool after configuring. If the tool fails, it will suggest what things may be preventing Multus networks from working properly, and it will request the logs and outputs that will help debug issues.

    Note

    The tool requires host network access. Many Kubernetes distros have security limitations. Use the tool's serviceAccountName config option or --service-account-name CLI flag to instruct the tool to run using a particular ServiceAccount in order to allow necessary permissions. An example compatible with openshift is provided in the Rook repository at deploy/examples/multus-validation-test-openshift.yaml

    Known limitations with Multus

    Daemons leveraging Kubernetes service IPs (Monitors, Managers, Rados Gateways) do not listen on the NAD specified in the selectors. Instead, the daemon listens on the default network; however, the NAD is attached to the container, allowing the daemon to communicate with the rest of the cluster. There is work in progress to fix this issue in the multus-service repository. At the time of writing it's unclear when this will be supported.

    Multus examples

    Macvlan, Whereabouts, Node Dynamic IPs

    The network plan for this cluster will be as follows:

    Node configuration must allow nodes to route to pods on the Multus public network.

    Because pods will be connecting via Macvlan, and because Macvlan does not allow hosts and pods to route between each other, the host must also be connected via Macvlan.

    Because the host IP range is different from the pod IP range, a route must be added to include the pod range.

    Such a configuration should be equivalent to the following:

    1
    +

    Generate a sample config, that includes commented help text, using one of the available templates.

  • Run the tool after configuring. If the tool fails, it will suggest what things may be preventing Multus networks from working properly, and it will request the logs and outputs that will help debug issues.

  • Note

    The tool requires host network access. Many Kubernetes distros have security limitations. Use the tool's serviceAccountName config option or --service-account-name CLI flag to instruct the tool to run using a particular ServiceAccount in order to allow necessary permissions. An example compatible with openshift is provided in the Rook repository at deploy/examples/multus-validation-test-openshift.yaml

    Known limitations with Multus

    Daemons leveraging Kubernetes service IPs (Monitors, Managers, Rados Gateways) do not listen on the NAD specified in the selectors. Instead, the daemon listens on the default network; however, the NAD is attached to the container, allowing the daemon to communicate with the rest of the cluster. There is work in progress to fix this issue in the multus-service repository. At the time of writing it's unclear when this will be supported.

    Multus examples

    Macvlan, Whereabouts, Node Dynamic IPs

    The network plan for this cluster will be as follows:

    Node configuration must allow nodes to route to pods on the Multus public network.

    Because pods will be connecting via Macvlan, and because Macvlan does not allow hosts and pods to route between each other, the host must also be connected via Macvlan.

    Because the host IP range is different from the pod IP range, a route must be added to include the pod range.

    Such a configuration should be equivalent to the following:

    ip link add public-shim link eth0 type macvlan mode bridge
    diff --git a/docs/rook/v1.15/CRDs/Cluster/stretch-cluster/index.html b/docs/rook/v1.15/CRDs/Cluster/stretch-cluster/index.html
    index 4cc3c46df..2824feddd 100644
    --- a/docs/rook/v1.15/CRDs/Cluster/stretch-cluster/index.html
    +++ b/docs/rook/v1.15/CRDs/Cluster/stretch-cluster/index.html
    @@ -77,4 +77,4 @@
                   values:
                   - b
                   - c
    -

    For more details, see the Stretch Cluster design doc.

    \ No newline at end of file +

    For more details, see the Stretch Cluster design doc.

    \ No newline at end of file diff --git a/docs/rook/v1.15/CRDs/Shared-Filesystem/ceph-filesystem-crd/index.html b/docs/rook/v1.15/CRDs/Shared-Filesystem/ceph-filesystem-crd/index.html index 8ce08be72..7558b02fb 100644 --- a/docs/rook/v1.15/CRDs/Shared-Filesystem/ceph-filesystem-crd/index.html +++ b/docs/rook/v1.15/CRDs/Shared-Filesystem/ceph-filesystem-crd/index.html @@ -83,7 +83,7 @@ # requests: # cpu: "500m" # memory: "1024Mi" -

    (These definitions can also be found in the filesystem.yaml file)

    Erasure Coded

    Erasure coded pools require the OSDs to use bluestore for the configured storeType. Additionally, erasure coded pools can only be used with dataPools. The metadataPool must use a replicated pool.

    Note

    This sample requires at least 3 bluestore OSDs, with each OSD located on a different node.

    The OSDs must be located on different nodes, because the failureDomain will be set to host by default, and the erasureCoded chunk settings require at least 3 different OSDs (2 dataChunks + 1 codingChunks).

    +

    (These definitions can also be found in the filesystem.yaml file)

    Erasure Coded

    Erasure coded pools require the OSDs to use bluestore for the configured storeType. Additionally, erasure coded pools can only be used with dataPools. The metadataPool must use a replicated pool.

    Note

    This sample requires at least 3 bluestore OSDs, with each OSD located on a different node.

    The OSDs must be located on different nodes, because the failureDomain will be set to host by default, and the erasureCoded chunk settings require at least 3 different OSDs (2 dataChunks + 1 codingChunks).
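
The OSD count in the note above follows from simple chunk arithmetic: with the failureDomain set to host, each chunk must land on a different host, so the minimum is dataChunks + codingChunks. A minimal Python sketch of that check follows; the function name is hypothetical and is not part of Rook.

```python
def min_osds_for_erasure_coding(data_chunks: int, coding_chunks: int) -> int:
    """Return the minimum number of OSDs, one per failure domain, for an EC pool."""
    if data_chunks < 0 or coding_chunks < 0:
        raise ValueError("chunk counts must not be negative")
    # Every data chunk and every coding chunk needs its own failure domain.
    return data_chunks + coding_chunks


# The sample above uses 2 dataChunks and 1 codingChunks.
print(min_osds_for_erasure_coding(2, 1))
```

With the sample's settings this yields 3, which matches the "at least 3 bluestore OSDs" requirement stated in the note.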

    @@ -122,4 +122,4 @@
       metadataServer:
         activeCount: 1
         activeStandby: true
    -

    IMPORTANT: For erasure coded pools, we have to create a replicated pool as the default data pool and an erasure-coded pool as a secondary pool.

    (These definitions can also be found in the filesystem-ec.yaml file. Also see an example in the storageclass-ec.yaml for how to configure the volume.)

    Filesystem Settings

    Metadata

    Pools

    The pools allow all of the settings defined in the Pool CRD spec. For more details, see the Pool CRD settings. In the example above, there must be at least three hosts (size 3) and at least eight devices (6 data + 2 coding chunks) in the cluster.

    Metadata Server Settings

    The metadata server settings correspond to the MDS daemon settings.

    MDS Resources Configuration Settings

    The format of the resource requests/limits structure is the same as described in the Ceph Cluster CRD documentation.

    If the memory resource limit is declared, Rook will automatically set the MDS configuration mds_cache_memory_limit. The value is calculated so that the actual MDS memory consumption remains consistent with the MDS pods' resource declaration.

    In order to provide the best possible experience running Ceph in containers, Rook internally recommends the memory for MDS daemons to be at least 4096MB. If a user configures a limit or request value that is too low, Rook will still run the pod(s) and print a warning to the operator log.

    \ No newline at end of file +

    IMPORTANT: For erasure coded pools, we have to create a replicated pool as the default data pool and an erasure-coded pool as a secondary pool.

    (These definitions can also be found in the filesystem-ec.yaml file. Also see an example in the storageclass-ec.yaml for how to configure the volume.)

    Filesystem Settings

    Metadata

    Pools

    The pools allow all of the settings defined in the Pool CRD spec. For more details, see the Pool CRD settings. In the example above, there must be at least three hosts (size 3) and at least eight devices (6 data + 2 coding chunks) in the cluster.

    Metadata Server Settings

    The metadata server settings correspond to the MDS daemon settings.

    MDS Resources Configuration Settings

    The format of the resource requests/limits structure is the same as described in the Ceph Cluster CRD documentation.

    If the memory resource limit is declared, Rook will automatically set the MDS configuration mds_cache_memory_limit. The value is calculated so that the actual MDS memory consumption remains consistent with the MDS pods' resource declaration.

    In order to provide the best possible experience running Ceph in containers, Rook internally recommends the memory for MDS daemons to be at least 4096MB. If a user configures a limit or request value that is too low, Rook will still run the pod(s) and print a warning to the operator log.
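
The two rules above (a cache limit derived from the memory limit, and a warning below the recommended 4096MB) can be sketched in Python as follows. This is an illustration only: the text does not state how Rook derives mds_cache_memory_limit, so the 0.5 fraction is an assumption, and the function and constant names are hypothetical.

```python
from typing import Optional, Tuple

RECOMMENDED_MDS_MEMORY_MIB = 4096  # recommended minimum quoted in the text above
ASSUMED_CACHE_FRACTION = 0.5       # assumption for illustration, not stated in the text


def plan_mds_memory(limit_mib: int) -> Tuple[int, Optional[str]]:
    """Return (cache limit in bytes, warning or None) for a memory limit in MiB."""
    # Keep the cache below the pod limit so the daemon has headroom.
    cache_limit_bytes = int(limit_mib * 1024 * 1024 * ASSUMED_CACHE_FRACTION)
    warning = None
    if limit_mib < RECOMMENDED_MDS_MEMORY_MIB:
        # The pod still runs; only a warning is produced.
        warning = (
            f"memory limit {limit_mib}Mi is below the recommended "
            f"{RECOMMENDED_MDS_MEMORY_MIB}Mi"
        )
    return cache_limit_bytes, warning


print(plan_mds_memory(1024))
```

The sketch returns a warning rather than raising an error, mirroring the behavior described above where a low value is logged and the pod still starts.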

    \ No newline at end of file diff --git a/docs/rook/v1.15/CRDs/Shared-Filesystem/ceph-fs-mirror-crd/index.html b/docs/rook/v1.15/CRDs/Shared-Filesystem/ceph-fs-mirror-crd/index.html index 32170abb8..a4d259b00 100644 --- a/docs/rook/v1.15/CRDs/Shared-Filesystem/ceph-fs-mirror-crd/index.html +++ b/docs/rook/v1.15/CRDs/Shared-Filesystem/ceph-fs-mirror-crd/index.html @@ -9,4 +9,4 @@ name: my-fs-mirror namespace: rook-ceph spec: {} -

    Settings

    If any setting is unspecified, a suitable default will be used automatically.

    FilesystemMirror metadata

    FilesystemMirror Settings

    Configuring mirroring peers

    In order to configure mirroring peers, please refer to the CephFilesystem documentation.

    \ No newline at end of file +

    Settings

    If any setting is unspecified, a suitable default will be used automatically.

    FilesystemMirror metadata

    FilesystemMirror Settings

    Configuring mirroring peers

    In order to configure mirroring peers, please refer to the CephFilesystem documentation.

    \ No newline at end of file diff --git a/docs/rook/v1.15/CRDs/ceph-client-crd/index.html b/docs/rook/v1.15/CRDs/ceph-client-crd/index.html index 704363949..444d5d0f0 100644 --- a/docs/rook/v1.15/CRDs/ceph-client-crd/index.html +++ b/docs/rook/v1.15/CRDs/ceph-client-crd/index.html @@ -44,7 +44,7 @@ export CEPH_KEYRING=/libsqliteceph/ceph.keyring; export CEPH_ARGS=--id example; ceph status -

    With this config, the ceph tools (ceph CLI, in-program access, etc) can connect to and utilize the Ceph cluster.

    Use Case: SQLite

    The Ceph project contains a SQLite VFS that interacts with RADOS directly, called libcephsqlite.

    First, on your workload ensure that you have the appropriate packages installed that make libcephsqlite.so available:

    Without the appropriate package (or a from-scratch build of SQLite), you will be unable to load libcephsqlite.so.

    After creating a CephClient similar to deploy/examples/sqlitevfs-client.yaml and retrieving its credentials, you may set the following environment variables:

    +

    With this config, the ceph tools (ceph CLI, in-program access, etc) can connect to and utilize the Ceph cluster.

    Use Case: SQLite

    The Ceph project contains a SQLite VFS that interacts with RADOS directly, called libcephsqlite.

    First, on your workload ensure that you have the appropriate packages installed that make libcephsqlite.so available:

    Without the appropriate package (or a from-scratch build of SQLite), you will be unable to load libcephsqlite.so.

    After creating a CephClient similar to deploy/examples/sqlitevfs-client.yaml and retrieving its credentials, you may set the following environment variables:

    export CEPH_CONF=/libsqliteceph/ceph.conf;
     export CEPH_KEYRING=/libsqliteceph/ceph.keyring;
    diff --git a/docs/rook/v1.15/CRDs/ceph-nfs-crd/index.html b/docs/rook/v1.15/CRDs/ceph-nfs-crd/index.html
    index a93fad7e6..ec852ab70 100644
    --- a/docs/rook/v1.15/CRDs/ceph-nfs-crd/index.html
    +++ b/docs/rook/v1.15/CRDs/ceph-nfs-crd/index.html
    @@ -141,7 +141,7 @@
             debugLevel: 0
     
             resources: {}
    -

    NFS Settings

    Server

    The server spec sets configuration for Rook-created NFS-Ganesha server pods.

    Security

    The security spec sets security configuration for the NFS cluster.

    Scaling the active server count

    The cluster can be scaled up or down by modifying the spec.server.active field. Scaling up can be done at will. Once the new server comes up, clients can be assigned to it immediately.

    The CRD always eliminates the highest-index servers first, in reverse order from how they were started. Scaling down requires that clients be migrated from the servers that will be eliminated to the remaining ones. That migration is currently manual and should be performed before reducing the size of the cluster.
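
To plan a scale-down, it helps to know which servers will be eliminated so their clients can be migrated first. The Python sketch below lists them in removal order. It assumes zero-based integer indexes purely for illustration (the text does not say how servers are named), and the function name is hypothetical.

```python
from typing import List


def servers_removed_on_scale_down(current_active: int, new_active: int) -> List[int]:
    """Return the indexes of servers removed, highest index first."""
    if new_active < 0 or new_active > current_active:
        raise ValueError("new_active must be between 0 and current_active")
    # Servers are removed in reverse order from how they were started.
    return list(range(current_active - 1, new_active - 1, -1))


# Scaling from 3 active servers down to 1.
print(servers_removed_on_scale_down(3, 1))
```

Clients on the returned servers are the ones that need to be migrated before spec.server.active is lowered.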

    Warning

    See the known issue below about setting this value greater than one.

    Known issues

    server.active count greater than 1

    Ceph v17.2.1