Per cluster member resource reservations #14247

Closed
32 commits
5329730
lxd/cluster: Document/validate `limits.reserve.{cpu,memory}`
MggMuggins Sep 25, 2024
0dc3fd8
doc: Run `make update-metadata`
MggMuggins Oct 11, 2024
caf4602
lxd/project/limits: Remove unused tx param
MggMuggins Sep 27, 2024
80e8817
lxd/project/limits: Remove unused tx param
MggMuggins Sep 27, 2024
ef1d388
lxd/cluster: Split LocalSysInfo from MemberState
MggMuggins Sep 27, 2024
553877a
lxd/cluster: Remove unused memberName parameter
MggMuggins Sep 30, 2024
29a6ecf
lxd: Update MemberState usage
MggMuggins Sep 30, 2024
9148443
shared/api: Add CPUThreads to ClusterMemberSysInfo
MggMuggins Sep 27, 2024
10b514e
lxd/cluster: Populate CPUThreads
MggMuggins Sep 27, 2024
5a800ac
doc: Run `make update-api`
MggMuggins Oct 11, 2024
b416d85
lxd/db: Implement rudimentary query builder
MggMuggins Oct 7, 2024
01c6c55
lxd/db/node: Implement GetMemberConfigWithGlobalDefault
MggMuggins Oct 3, 2024
713f773
lxd/db/node: Test GetMemberConfigWithGlobalDefault
MggMuggins Oct 3, 2024
1c9037f
WIP getClusterMemberAggregateLimits
MggMuggins Oct 16, 2024
a9236ed
lxd/project/limits: Implement CheckReservationsWithInstance
MggMuggins Sep 27, 2024
455b231
lxd/project/limits: Check resource reservations during AllowInstanceC…
MggMuggins Sep 30, 2024
7dfea32
lxd/project/limits: Update AllowInstanceCreation tests
MggMuggins Oct 3, 2024
2ec28ac
lxd: Pass server name & sysinfo to AllowInstanceCreation
MggMuggins Sep 30, 2024
bcb5ff9
lxd: Run AllowInstanceCreation on target member
MggMuggins Oct 11, 2024
1b7029f
lxd/project/limits: Check reservations during AllowInstanceUpdate
MggMuggins Sep 30, 2024
bff566a
lxd: Update AllowInstanceUpdate usage
MggMuggins Sep 30, 2024
376a2d1
lxd/project/limits: Implement AllowClusterMemberUpdate
MggMuggins Oct 1, 2024
19a7b7a
lxd: Use AllowClusterMemberUpdate
MggMuggins Oct 1, 2024
51b84f6
lxd/project/limits: Implement AllowClusterUpdate
MggMuggins Oct 3, 2024
31681ff
lxd/cluster: Implement ClusterSysInfo
MggMuggins Oct 9, 2024
4f2c7aa
lxd: Remove uneeded iteration
MggMuggins Oct 3, 2024
85354f4
lxd: Use AllowClusterUpdate
MggMuggins Oct 7, 2024
4fa4bd1
lxd: Check reservations during instance placement
MggMuggins Oct 10, 2024
592d889
test/clustering: Resource reservation tests
MggMuggins Sep 30, 2024
b2d9555
lxd/db/node: Fix linter errors
MggMuggins Oct 9, 2024
a420e8b
WIP Patch instances during getClusterMemberAggregateLimits
MggMuggins Oct 17, 2024
6b68db7
WIP Use getPatchedClusterMemberAggregateLimits
MggMuggins Oct 17, 2024
44 changes: 44 additions & 0 deletions doc/metadata.txt
@@ -1,6 +1,26 @@
// Code generated by lxd-metadata; DO NOT EDIT.

<!-- config group cluster-cluster start -->
```{config:option} limits.reserve.cpu cluster-cluster
:shortdesc: "Number of CPUs to reserve for the LXD server"
:type: "integer"
Number of CPUs to reserve for the LXD server. This setting only limits
the sum of instance `limits.cpu` that can be located on a cluster member.

When this key is set, all instances on that member must have
`limits.cpu` set.
```

```{config:option} limits.reserve.memory cluster-cluster
:shortdesc: "Amount of memory to reserve for the LXD server"
:type: "string"
Amount of memory to reserve for the LXD server. This setting only limits
the sum of instance `limits.memory` that can be located on a cluster member.

When this key is set, all instances on that member must have
`limits.memory` set.
```

```{config:option} scheduler.instance cluster-cluster
:defaultdesc: "`all`"
:shortdesc: "Controls how instances are scheduled to run on this member"
@@ -4623,6 +4643,30 @@ When using custom automatic instance placement logic, this option stores the scr
See {ref}`clustering-instance-placement-scriptlet` for more information.
```

```{config:option} limits.reserve.cpu server-miscellaneous
:scope: "global"
:shortdesc: "Number of CPUs to reserve for the LXD server"
:type: "integer"
Number of CPUs to reserve for the LXD server. This setting only limits
the sum of instance `limits.cpu` that can be located on any cluster member.

This value is overridden by the corresponding cluster member configuration key.

When this key is set, all instances must have `limits.cpu` set.
```

```{config:option} limits.reserve.memory server-miscellaneous
:scope: "global"
:shortdesc: "Amount of memory to reserve for the LXD server"
:type: "string"
Amount of memory to reserve for the LXD server. This setting only limits
the sum of instance `limits.memory` that can be located on any cluster member.

This value is overridden by the corresponding cluster member configuration key.

When this key is set, all instances must have `limits.memory` set.
```
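The option descriptions above imply two rules: a member-level `limits.reserve.cpu` overrides the global one, and the sum of instance `limits.cpu` on a member must fit in what remains after the reservation. The following is a minimal sketch of that arithmetic, not the code from this pull request. The function names are hypothetical, and it assumes `limits.cpu` is a plain CPU count (not a pinning range) and that capacity is measured in CPU threads.

```go
package main

import (
	"errors"
	"fmt"
)

// effectiveReserve returns the member-level value when it is set and falls
// back to the global value otherwise. A nil pointer means "not set".
// (Hypothetical helper, for illustration only.)
func effectiveReserve(global, member *int64) *int64 {
	if member != nil {
		return member
	}

	return global
}

// checkCPUReservation returns an error if the instances' limits.cpu values
// do not fit within the member's CPU threads minus the reservation.
// instanceCPUs holds one entry per instance; nil means limits.cpu is unset.
// Assumption: limits.cpu is a plain count, not a pinning range.
func checkCPUReservation(cpuThreads int64, reserve *int64, instanceCPUs []*int64) error {
	if reserve == nil {
		return nil // No reservation configured, nothing to enforce.
	}

	var sum int64
	for _, cpu := range instanceCPUs {
		if cpu == nil {
			return errors.New("limits.cpu must be set on every instance when limits.reserve.cpu is set")
		}

		sum += *cpu
	}

	available := cpuThreads - *reserve
	if sum > available {
		return fmt.Errorf("instances request %d CPUs but only %d are available after the reservation", sum, available)
	}

	return nil
}

func int64Ptr(v int64) *int64 { return &v }

func main() {
	global := int64Ptr(2)
	member := int64Ptr(4) // Overrides the global value for this member.

	reserve := effectiveReserve(global, member)

	// 8 threads minus 4 reserved leaves 4; the instances ask for 5.
	err := checkCPUReservation(8, reserve, []*int64{int64Ptr(2), int64Ptr(3)})
	fmt.Println(err)
}
```

The same shape would apply to `limits.reserve.memory`, with byte counts in place of CPU counts.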

```{config:option} maas.api.key server-miscellaneous
:scope: "global"
:shortdesc: "API key to manage MAAS"
4 changes: 4 additions & 0 deletions doc/rest-api.yaml
@@ -583,6 +583,10 @@ definitions:
        format: uint64
        type: integer
        x-go-name: BufferRAM
      cpu_threads:
        format: uint64
        type: integer
        x-go-name: CPUThreads
      free_ram:
        format: uint64
        type: integer
54 changes: 51 additions & 3 deletions lxd/api_1.0.go
@@ -9,6 +9,7 @@ import (
"net/http"
"os"
"slices"
"strings"

"github.com/canonical/lxd/client"
"github.com/canonical/lxd/lxd/auth"
@@ -21,6 +22,7 @@ import (
"github.com/canonical/lxd/lxd/instance/instancetype"
"github.com/canonical/lxd/lxd/lifecycle"
"github.com/canonical/lxd/lxd/node"
"github.com/canonical/lxd/lxd/project/limits"
"github.com/canonical/lxd/lxd/request"
"github.com/canonical/lxd/lxd/response"
scriptletLoad "github.com/canonical/lxd/lxd/scriptlet/load"
@@ -699,7 +701,38 @@ func doAPI10Update(d *Daemon, r *http.Request, req api.ServerPut, patch bool) re
// Then deal with cluster wide configuration
var clusterChanged map[string]string
var newClusterConfig *clusterConfig.Config
oldClusterConfig := make(map[string]any)
var oldClusterConfig map[string]any

// If req.Config doesn't contain a limits.reserve.* key then we don't need to
// call AllowClusterUpdate at all; removing a global limits.reserve.*
// doesn't change the effective limit on any cluster member.
changesReservation := false
for key := range req.Config {
if strings.Contains(key, "limits.reserve") {
changesReservation = true
break
}
}

var sysinfo map[string]api.ClusterMemberSysInfo
if changesReservation {
var members []db.NodeInfo

err = s.DB.Cluster.Transaction(r.Context(), func(ctx context.Context, tx *db.ClusterTx) error {
members, err = tx.GetNodes(ctx)
return err
})
if err != nil {
return response.SmartError(err)
}

sysinfo, err = cluster.ClusterSysInfo(s, members)
if err == cluster.ErrorClusterUnavailable {
return response.Unavailable(fmt.Errorf("Cannot set limits.reserve.{cpu,memory} when cluster members are unreachable"))
} else if err != nil {
return response.SmartError(err)
}
}

err = s.DB.Cluster.Transaction(context.Background(), func(ctx context.Context, tx *db.ClusterTx) error {
var err error
@@ -709,8 +742,23 @@ func doAPI10Update(d *Daemon, r *http.Request, req api.ServerPut, patch bool) re
}

// Keep old config around in case something goes wrong. In that case the config will be reverted.
for k, v := range newClusterConfig.Dump() {
oldClusterConfig[k] = v
oldClusterConfig = newClusterConfig.Dump()

if changesReservation {
// TODO Don't do the patch logic twice :facepalm:
validationConfig := newClusterConfig.Dump()
if patch {
for key, val := range req.Config {
validationConfig[key] = val
}
} else {
validationConfig = req.Config
}

err = limits.AllowClusterUpdate(ctx, tx, sysinfo, validationConfig)
if err != nil {
return err
}
}

if patch {
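The hunk above works out the post-request configuration once for validation, and the TODO notes that the PATCH logic then runs a second time when the change is applied. One way to avoid that would be a shared helper, sketched below. This is an illustration under assumptions, not part of the pull request: `touchesReservation` and `resultingConfig` are hypothetical names, and the key check uses `strings.HasPrefix` where the diff uses `strings.Contains`.

```go
package main

import (
	"fmt"
	"strings"
)

// touchesReservation reports whether any key in the request refers to a
// limits.reserve.* setting. (Hypothetical helper.)
func touchesReservation(req map[string]any) bool {
	for key := range req {
		if strings.HasPrefix(key, "limits.reserve.") {
			return true
		}
	}

	return false
}

// resultingConfig returns the configuration that would be in effect after
// the request is applied. For a PATCH the request is overlaid on a copy of
// the current config; for a PUT the request replaces it entirely.
// Neither input map is modified. (Hypothetical helper.)
func resultingConfig(current, req map[string]any, patch bool) map[string]any {
	out := make(map[string]any, len(current)+len(req))

	if patch {
		for k, v := range current {
			out[k] = v
		}
	}

	for k, v := range req {
		out[k] = v
	}

	return out
}

func main() {
	current := map[string]any{"limits.reserve.cpu": "2", "core.https_address": ":8443"}
	req := map[string]any{"limits.reserve.memory": "4GiB"}

	fmt.Println(touchesReservation(req))                   // true
	fmt.Println(len(resultingConfig(current, req, true)))  // 3: PATCH keeps existing keys
	fmt.Println(len(resultingConfig(current, req, false))) // 1: PUT replaces the config
}
```

Returning a fresh map in both branches also avoids aliasing the request map, which the PUT branch in the hunk does by assigning `req.Config` directly.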
45 changes: 44 additions & 1 deletion lxd/api_cluster.go
@@ -35,6 +35,7 @@ import (
"github.com/canonical/lxd/lxd/lifecycle"
"github.com/canonical/lxd/lxd/node"
"github.com/canonical/lxd/lxd/operations"
"github.com/canonical/lxd/lxd/project/limits"
"github.com/canonical/lxd/lxd/request"
"github.com/canonical/lxd/lxd/response"
"github.com/canonical/lxd/lxd/scriptlet"
@@ -1621,6 +1622,11 @@ func updateClusterNode(s *state.State, gateway *cluster.Gateway, r *http.Request
return response.SmartError(err)
}

resp := forwardedResponseToNode(s, r, name)
if resp != nil {
return resp
}

leaderAddress, err := gateway.LeaderAddress()
if err != nil {
return response.InternalError(err)
@@ -1715,6 +1721,16 @@ func updateClusterNode(s *state.State, gateway *cluster.Gateway, r *http.Request
newRoles = append(newRoles, db.ClusterRole(role))
}

sysinfo, err := cluster.LocalSysInfo()
if err != nil {
return response.InternalError(err)
}

var globalConfigDump map[string]any
if s.GlobalConfig != nil {
globalConfigDump = s.GlobalConfig.Dump()
}

// Update the database
err = s.DB.Cluster.Transaction(context.TODO(), func(ctx context.Context, tx *db.ClusterTx) error {
nodeInfo, err := tx.GetNodeByName(ctx, name)
@@ -1741,6 +1757,11 @@ func updateClusterNode(s *state.State, gateway *cluster.Gateway, r *http.Request
}
}

err = limits.AllowClusterMemberUpdate(ctx, tx, globalConfigDump, name, sysinfo, req.Config)
if err != nil {
return fmt.Errorf("Permission denied: %w", err)
}

// Update node config.
err = tx.UpdateNodeConfig(ctx, nodeInfo.ID, req.Config)
if err != nil {
@@ -1829,6 +1850,28 @@ func clusterValidateConfig(config map[string]string) error {
// defaultdesc: `all`
// shortdesc: Controls how instances are scheduled to run on this member
"scheduler.instance": validate.Optional(validate.IsOneOf("all", "group", "manual")),

// lxdmeta:generate(entities=cluster; group=cluster; key=limits.reserve.cpu)
// Number of CPUs to reserve for the LXD server. This setting only limits
// the sum of instance `limits.cpu` that can be located on a cluster member.
//
// When this key is set, all instances on that member must have
// `limits.cpu` set.
// ---
// type: integer
// shortdesc: Number of CPUs to reserve for the LXD server
"limits.reserve.cpu": validate.IsAny,
Review comment (Member): Shouldn't this be an optional integer?


// lxdmeta:generate(entities=cluster; group=cluster; key=limits.reserve.memory)
// Amount of memory to reserve for the LXD server. This setting only limits
// the sum of instance `limits.memory` that can be located on a cluster member.
//
// When this key is set, all instances on that member must have
// `limits.memory` set.
// ---
// type: string
// shortdesc: Amount of memory to reserve for the LXD server
"limits.reserve.memory": validate.IsAny,
}

for k, v := range config {
@@ -2961,7 +3004,7 @@ func clusterNodeStateGet(d *Daemon, r *http.Request) response.Response {
return resp
}

memberState, err := cluster.MemberState(r.Context(), s, memberName)
memberState, err := cluster.MemberState(r.Context(), s)
if err != nil {
return response.SmartError(err)
}
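The review comment in this file asks whether `limits.reserve.cpu` should be validated as an optional integer rather than with `validate.IsAny`. A standalone sketch of what such a check could look like follows. It is an assumption-laden illustration rather than LXD's `validate` package: `optionalNonNegativeInt` is a hypothetical name, and rejecting negative values is an extra assumption not stated in the pull request.

```go
package main

import (
	"fmt"
	"strconv"
)

// optionalNonNegativeInt accepts an empty string (key unset) or a base-10
// integer that is zero or greater. (Hypothetical validator; the
// non-negative rule is an assumption.)
func optionalNonNegativeInt(value string) error {
	if value == "" {
		return nil
	}

	n, err := strconv.ParseInt(value, 10, 64)
	if err != nil {
		return fmt.Errorf("invalid integer %q: %w", value, err)
	}

	if n < 0 {
		return fmt.Errorf("value must not be negative, got %d", n)
	}

	return nil
}

func main() {
	for _, v := range []string{"", "2", "-1", "two"} {
		fmt.Printf("%q -> %v\n", v, optionalNonNegativeInt(v))
	}
}
```

With a validator of this shape, a value such as `two` would be rejected when the member config is set instead of surfacing later during the reservation check.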
26 changes: 26 additions & 0 deletions lxd/cluster/config/config.go
@@ -734,6 +734,32 @@ var ConfigSchema = config.Schema{
// defaultdesc: Content of `/etc/ovn/key_host` if present
// shortdesc: OVN SSL client key
"network.ovn.client_key": {Default: ""},

// lxdmeta:generate(entities=server; group=miscellaneous; key=limits.reserve.cpu)
// Number of CPUs to reserve for the LXD server. This setting only limits
// the sum of instance `limits.cpu` that can be located on any cluster member.
//
// This value is overridden by the corresponding cluster member configuration key.
//
// When this key is set, all instances must have `limits.cpu` set.
// ---
// type: integer
// scope: global
// shortdesc: Number of CPUs to reserve for the LXD server
"limits.reserve.cpu": {Type: config.Int64},

// lxdmeta:generate(entities=server; group=miscellaneous; key=limits.reserve.memory)
// Amount of memory to reserve for the LXD server. This setting only limits
// the sum of instance `limits.memory` that can be located on any cluster member.
//
// This value is overridden by the corresponding cluster member configuration key.
//
// When this key is set, all instances must have `limits.memory` set.
// ---
// type: string
// scope: global
// shortdesc: Amount of memory to reserve for the LXD server
"limits.reserve.memory": {Type: config.String},
}

func expiryValidator(value string) error {