Allow user to specify order of network interfaces inside application #74
Conversation
Why changes in appconfig? Ordering is for physical devices, or am I wrong? |
Ordering is for application interfaces (i.e. inside the app, not in EVE/host) - this includes the virtual (virtio) interfaces and directly assigned network devices. |
@rucoder do I remember it correctly that there're some gotchas in enforcing ordering of PCI devices, can you confirm or deny that? :D |
I'm hoping that this is as simple as setting the Addr to respect the configured interface order. |
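For illustration only, a minimal Go sketch (hypothetical types and field names, not actual EVE code) of what "setting the Addr to respect the configured interface order" could look like: sort the application's interfaces by their configured position and hand out consecutive PCI slot numbers as the QEMU "addr" hint.
package main

import (
	"fmt"
	"sort"
)

// AppInterface is a hypothetical stand-in for one application network
// interface (virtio or passthrough) with its user-configured position.
type AppInterface struct {
	Label   string
	Order   int  // desired position inside the guest
	Virtual bool // virtio adapter vs. directly assigned device
}

func main() {
	ifaces := []AppInterface{
		{Label: "passthrough-nic", Order: 2, Virtual: false},
		{Label: "virtio-a", Order: 1, Virtual: true},
		{Label: "virtio-b", Order: 3, Virtual: true},
	}
	// Sort by the configured order and hand out consecutive PCI slot numbers,
	// hoping that with a flat topology the guest enumerates them in that order.
	sort.Slice(ifaces, func(i, j int) bool { return ifaces[i].Order < ifaces[j].Order })
	for slot, nic := range ifaces {
		fmt.Printf("%s -> addr=%#x (virtual=%v)\n", nic.Label, slot+1, nic.Virtual)
	}
}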
But how do you link those? Maybe I'm lacking a bit of understanding: I have 3 NICs which are in PCI slots 1, 2, 3. I want them to be eth1, eth2 and eth3 respectively on the host, and then in the EdgeApp I want eth1 from the host to be eth2 inside my app 1, and in app 2 I want eth2 and eth3 to be eth1 and eth2 inside app 2. Is that what you're trying to achieve, in a nutshell (I'm not yet taking into consideration how we do network plumbing in EVE)? |
Yes, but the real complication is that we need to allow specifying order across both virtual and directly assigned interfaces. So for example, you may have a config where (inside the app) eth1 is a virtual interface, eth2 is directly assigned, eth3 is another virtual interface, eth4 is another directly assigned NIC, etc. So somehow we need to be able to interleave virtio with passthrough devices. |
When we do passthrough of a device we do it on the PCI level, so in the guest system I see PCI 0000:00:01 or whichever, and when I pass a virtio device I create a tap interface via qemu on the guest side. Can you specify, for a passthrough device, the address to be assigned in the guest system? |
I mean, this is what you describe, right? |
Isn't it this one https://github.com/lf-edge/eve/blob/master/pkg/pillar/hypervisor/kvm.go#L383 ? |
Okay, seems like the one, so basically you need to have a universal map which tracks those conversions and be sure that one resource is not used twice. The end point would be the qemu/xen configuration. I mean, that's the question: can we do it on xen as well. |
from https://github.com/lf-edge/eve-api/blob/main/proto/config/appconfig.proto#L126 Isn't the order already determined (at least from the point of the API)? |
but hold up, this thing is specifying a PCI address; because the guest OS does all the naming, it might or might not be based on PCI order. We can't just tell the guest OS from the HV perspective that we want this interface to be 1, 3, 17, johnDoe, because Windows and Ubuntu are different, right? Yes, it's determined from the API point of view, but we need to do checks on EVE IMO; the controller might not have checks, so it would be useful to have them |
Yes, " it might or might not be based on PCI order", it might also be the order it is in the configuration file (see lf-edge/eve#3369 (comment) ). But in the end EVE can only give a hint to the guest OS and hope for the best. Also this is not eve-api repo specific but rather for the PR into eve, is it? |
Sure, but without understanding the plan of implementation, how can you be sure of API? :) |
I would go even further and say without the implementation I cannot be sure of the API ;-) |
@milan-zededa can't we just use a convention for logical labels and enforce them to have continuous numbering? It may break current applications, but IMO this is the most straightforward and non-confusing way to enforce numbering in the application |
Regarding PCI enumeration: IN THEORY devices are enumerated in the order they appear on the PCI bus, however the driver can assign a device ID according to its internal logic, e.g. looking at the MAC address etc. I do not think it is possible to reliably enforce PCI devices to be enumerated in the required order, but it is a logical assumption in case we have a flat PCI topology, i.e. no bridges/switches. In case of bridges we cannot guarantee the order |
Let's see what we can achieve with the current approach, as I see it.
First of all, we can guarantee any order only in the case of a Linux VM using systemd (https://www.freedesktop.org/software/systemd/man/latest/systemd.net-naming-scheme.html). We don't know how other guests handle the enumeration. It should be explicitly stated.
Then, we can guarantee the order only among devices of the same type; otherwise, the interface prefixes and the interface naming scheme will differ, and we cannot talk about an order across devices of different types.
Then, let's consider the following setup:
Devices A and B: Each is directly attached and has a dedicated PCIe root port.
Devices C and D: Both are under a bridge (multifunction device) with no root port.
Now let's check if we can guarantee an order for different cases:
Order 1: A, B, C, D
# QEMU configuration for Order 1: A, B, C, D
# Device A's Root Port
[device "pcie_root_port_a"]
driver = "pcie-root-port"
bus = "pcie.0"
addr = "0x1"
multifunction = "on"
port = "1"
chassis = "1"
slot = "1"
id = "root_port_a"
# Device A
[device "device_a"]
driver = "<driver_for_A>"
bus = "root_port_a"
addr = "0x0"
# Device B's Root Port
[device "pcie_root_port_b"]
driver = "pcie-root-port"
bus = "pcie.0"
addr = "0x2"
port = "2"
chassis = "2"
slot = "2"
id = "root_port_b"
# Device B
[device "device_b"]
driver = "<driver_for_B>"
bus = "root_port_b"
addr = "0x0"
# PCIe to PCI Bridge for devices C and D
[device "pcie_pci_bridge"]
driver = "pcie-pci-bridge"
bus = "pcie.0"
addr = "0x3"
chassis = "3"
id = "bridge_cd"
# Device C
[device "device_c"]
driver = "<driver_for_C>"
bus = "bridge_cd"
addr = "0x1"
# Device D
[device "device_d"]
driver = "<driver_for_D>"
bus = "bridge_cd"
addr = "0x2"
and it will look like:
pcie.0 (Bus 0)
├── [0x1] Root Port A (root_port_a)
│ └── Bus 1
│ └── [0x0] Device A (device_a)
├── [0x2] Root Port B (root_port_b)
│ └── Bus 2
│ └── [0x0] Device B (device_b)
└── [0x3] PCIe to PCI Bridge (bridge_cd)
└── Bus 3
├── [0x1] Device C (device_c)
└── [0x2] Device D (device_d)
In this configuration, devices A and B are connected to the root bus (pcie.0) via dedicated PCIe root ports, with device numbers assigned to ensure they are enumerated first. The root port for device A is assigned addr = "0x1", and the root port for device B is assigned addr = "0x2". The PCIe to PCI bridge for devices C and D is connected to the root bus with addr = "0x3", ensuring it is enumerated after the root ports.
Order 2: B, A, C, D
# QEMU configuration for Order 2: B, A, C, D
# Device B's Root Port
[device "pcie_root_port_b"]
driver = "pcie-root-port"
bus = "pcie.0"
addr = "0x1"
multifunction = "on"
port = "1"
chassis = "1"
slot = "1"
id = "root_port_b"
# Device B
[device "device_b"]
driver = "<driver_for_B>"
bus = "root_port_b"
addr = "0x0"
# Device A's Root Port
[device "pcie_root_port_a"]
driver = "pcie-root-port"
bus = "pcie.0"
addr = "0x2"
port = "2"
chassis = "2"
slot = "2"
id = "root_port_a"
# Device A
[device "device_a"]
driver = "<driver_for_A>"
bus = "root_port_a"
addr = "0x0"
# PCIe to PCI Bridge for devices C and D
[device "pcie_pci_bridge"]
driver = "pcie-pci-bridge"
bus = "pcie.0"
addr = "0x3"
chassis = "3"
id = "bridge_cd"
# Device C
[device "device_c"]
driver = "<driver_for_C>"
bus = "bridge_cd"
addr = "0x1"
# Device D
[device "device_d"]
driver = "<driver_for_D>"
bus = "bridge_cd"
addr = "0x2"
which brings us to
pcie.0 (Bus 0)
├── [0x1] Root Port B (root_port_b)
│ └── Bus 1
│ └── [0x0] Device B (device_b)
├── [0x2] Root Port A (root_port_a)
│ └── Bus 2
│ └── [0x0] Device A (device_a)
└── [0x3] PCIe to PCI Bridge (bridge_cd)
└── Bus 3
├── [0x1] Device C (device_c)
└── [0x2] Device D (device_d)
In this configuration, we swap the device numbers of the root ports for devices A and B on the root bus. The root port for device B is assigned addr = "0x1", and the root port for device A is assigned addr = "0x2". This ensures that the root port for device B is enumerated before the root port for device A. The bridge for devices C and D remains at addr = "0x3".
Order 3: C, D, A, B
# QEMU configuration for Order 3: C, D, A, B
# PCIe to PCI Bridge for devices C and D
[device "pcie_pci_bridge"]
driver = "pcie-pci-bridge"
bus = "pcie.0"
addr = "0x1"
chassis = "1"
id = "bridge_cd"
# Device C
[device "device_c"]
driver = "<driver_for_C>"
bus = "bridge_cd"
addr = "0x1"
# Device D
[device "device_d"]
driver = "<driver_for_D>"
bus = "bridge_cd"
addr = "0x2"
# Device A's Root Port
[device "pcie_root_port_a"]
driver = "pcie-root-port"
bus = "pcie.0"
addr = "0x2"
port = "2"
chassis = "2"
slot = "2"
id = "root_port_a"
# Device A
[device "device_a"]
driver = "<driver_for_A>"
bus = "root_port_a"
addr = "0x0"
# Device B's Root Port
[device "pcie_root_port_b"]
driver = "pcie-root-port"
bus = "pcie.0"
addr = "0x3"
port = "3"
chassis = "3"
slot = "3"
id = "root_port_b"
# Device B
[device "device_b"]
driver = "<driver_for_B>"
bus = "root_port_b"
addr = "0x0"
gives us
pcie.0 (Bus 0)
├── [0x1] PCIe to PCI Bridge (bridge_cd)
│ └── Bus 1
│ ├── [0x1] Device C (device_c)
│ └── [0x2] Device D (device_d)
├── [0x2] Root Port A (root_port_a)
│ └── Bus 2
│ └── [0x0] Device A (device_a)
└── [0x3] Root Port B (root_port_b)
└── Bus 3
└── [0x0] Device B (device_b)
In this configuration, we assign the bridge for devices C and D the lowest device number on the root bus (addr = "0x1"), ensuring it is enumerated first. The root ports for devices A and B are assigned higher device numbers (addr = "0x2" and addr = "0x3" respectively).
Order 4: D, C, A, B
# QEMU configuration for Order 4: D, C, A, B
# PCIe to PCI Bridge for devices C and D
[device "pcie_pci_bridge"]
driver = "pcie-pci-bridge"
bus = "pcie.0"
addr = "0x1"
chassis = "1"
id = "bridge_cd"
# Device D
[device "device_d"]
driver = "<driver_for_D>"
bus = "bridge_cd"
addr = "0x1"
# Device C
[device "device_c"]
driver = "<driver_for_C>"
bus = "bridge_cd"
addr = "0x2"
# Device A's Root Port
[device "pcie_root_port_a"]
driver = "pcie-root-port"
bus = "pcie.0"
addr = "0x2"
port = "2"
chassis = "2"
slot = "2"
id = "root_port_a"
# Device A
[device "device_a"]
driver = "<driver_for_A>"
bus = "root_port_a"
addr = "0x0"
# Device B's Root Port
[device "pcie_root_port_b"]
driver = "pcie-root-port"
bus = "pcie.0"
addr = "0x3"
port = "3"
chassis = "3"
slot = "3"
id = "root_port_b"
# Device B
[device "device_b"]
driver = "<driver_for_B>"
bus = "root_port_b"
addr = "0x0"
results in this
pcie.0 (Bus 0)
├── [0x1] PCIe to PCI Bridge (bridge_cd)
│ └── Bus 1
│ ├── [0x1] Device D (device_d)
│ └── [0x2] Device C (device_c)
├── [0x2] Root Port A (root_port_a)
│ └── Bus 2
│ └── [0x0] Device A (device_a)
└── [0x3] Root Port B (root_port_b)
└── Bus 3
└── [0x0] Device B (device_b)
In this configuration, we aim to have device D enumerated before device C. We achieve this by swapping their device numbers on the subordinate bus of the bridge (bridge_cd). Device D is assigned addr = "0x1", and device C is assigned addr = "0x2". The bridge itself is assigned the lowest device number on the root bus (addr = "0x1"), ensuring it is encountered first during enumeration.
Order 5: C, A, B, D
This order is not achievable under the given constraints. Devices under the same bridge (C and D) are always enumerated together and cannot be interleaved with devices on different buses without moving them to separate bridges or changing the physical connections, which is not allowed in this setup. Therefore, we cannot have device C enumerated first, then devices A and B, and then device D. The enumeration process does not support interleaving devices from different buses in this manner when they are connected under the same bridge.
@milan-zededa, @christoph-zededa, @rucoder, fix me if I made a mistake somewhere.
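As a complement to the analysis above, a minimal Go sketch (hypothetical types, not EVE code) of a check that would reject orders like Order 5: devices sharing one bridge must occupy contiguous positions in the requested order, otherwise the order cannot be produced by slot numbering alone.
package main

import "fmt"

type dev struct {
	name   string
	bridge string // empty for devices behind their own root port
}

// contiguousPerBridge reports whether, in the requested order, all devices
// sharing a bridge appear as one uninterrupted run.
func contiguousPerBridge(order []dev) bool {
	lastPos := map[string]int{}
	for i, d := range order {
		if d.bridge == "" {
			continue
		}
		if p, seen := lastPos[d.bridge]; seen && i != p+1 {
			return false // the bridge's group was interrupted by another device
		}
		lastPos[d.bridge] = i
	}
	return true
}

func main() {
	// Order 5 from the analysis above: C, A, B, D with C and D on one bridge.
	order5 := []dev{{"C", "bridge_cd"}, {"A", ""}, {"B", ""}, {"D", "bridge_cd"}}
	fmt.Println("order 5 realizable:", contiguousPerBridge(order5)) // prints: false
	// Order 3 (C, D, A, B) keeps C and D together and passes the check.
	order3 := []dev{{"C", "bridge_cd"}, {"D", "bridge_cd"}, {"A", ""}, {"B", ""}}
	fmt.Println("order 3 realizable:", contiguousPerBridge(order3)) // prints: true
}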
proto/config/appconfig.proto
Outdated
// of application network interfaces. The controller can check
// ZInfoDevice.api_capability to verify if the configured device supports the
// API capability API_CAPABILITY_ENFORCED_NET_INTERFACE_ORDER.
bool enforce_network_interface_order = 28;
Isn't it a "fixed" option? Once set, we are not going to change it, I guess? I mean, it looks like a proper part of AppInstanceConfig.FixedResources. In this case, it should go to vm.proto.
@rucoder Yes, this could be done as an alternative to the order field. But from what I'm reading from you and Nikolay, we will have to abandon this ordering scheme anyway and provide a different method for the guest to map a device to its configured order. |
What we discussed @milan-zededa might be useful: how about we allow the EVE user to specify the PCI address for a guest device they're either passing through or adding as a virtual Eth bridge via a TAP iface? Or am I missing something? |
If I paraphrase the original ask: "we want to see interfaces in the guest numbered in the same order as they appear in the application manifest". Is my understanding correct? |
Yes, that is correct. |
@rucoder After looking at the controller API, it turns out that logical labels of application interfaces are user-defined and do not necessarily follow the desired interface order, which is instead defined by the order of a list with app interface configs. Since in the EVE API we have direct attachments and virtual interfaces in two separate lists, we will need additional integer indexes to know the desired order between them. |
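A minimal Go sketch of that idea (field and type names are hypothetical): with a shared integer index carried by both the directly assigned adapters and the virtual network adapters, EVE could merge the two lists into one guest-facing order.
package main

import (
	"fmt"
	"sort"
)

// guestNIC is a hypothetical, flattened view of either a directly assigned
// adapter or a virtual network adapter, carrying the shared order index.
type guestNIC struct {
	label       string
	order       uint32 // the shared "interface_order" index across both lists
	passthrough bool
}

// mergeByOrder combines both lists and sorts them by the shared index.
func mergeByOrder(adapters, virtuals []guestNIC) []guestNIC {
	all := append(append([]guestNIC{}, adapters...), virtuals...)
	sort.Slice(all, func(i, j int) bool { return all[i].order < all[j].order })
	return all
}

func main() {
	adapters := []guestNIC{{label: "nic-passthrough", order: 2, passthrough: true}}
	virtuals := []guestNIC{
		{label: "nic-virtio-a", order: 1},
		{label: "nic-virtio-b", order: 3},
	}
	for i, n := range mergeByOrder(adapters, virtuals) {
		fmt.Printf("guest position %d: %s (passthrough=%v)\n", i+1, n.label, n.passthrough)
	}
}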
proto/config/devcommon.proto
Outdated
// Optionally defines the network interface order relative to other directly assigned
// network devices and virtual network adapters (see "interface_order" in
// NetworkAdapter, file "netconfig.proto").
Somewhere you should document that the numbering is across the Adapter and NetworkAdapter for a given AppInstanceConfig, thus each must have a unique number (when enforce_network_interface_order is set).
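A minimal Go sketch of such a check (hypothetical names, not part of the API): collect the interface_order values from both the Adapter and NetworkAdapter entries of one AppInstanceConfig and reject duplicates when enforcement is enabled.
package main

import "fmt"

// validateOrders is a hypothetical check: every interface_order value taken
// from the Adapter and NetworkAdapter entries of one AppInstanceConfig must
// be unique when order enforcement is enabled.
func validateOrders(orders []uint32) error {
	seen := map[uint32]bool{}
	for _, o := range orders {
		if seen[o] {
			return fmt.Errorf("duplicate interface_order %d", o)
		}
		seen[o] = true
	}
	return nil
}

func main() {
	fmt.Println(validateOrders([]uint32{1, 2, 3}))    // <nil>
	fmt.Println(validateOrders([]uint32{1, 2, 2, 3})) // duplicate interface_order 2
}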
I think in terms of the API we can't do much better - one comment though.
I see a fair bit of discussion about the implementation with KVM - I don't know if there are different implementation considerations with Xen or with kubevirt.
@OhmSpectator Are you sure that is the case? I thought a Linux guest VM using the classical enN naming scheme will enumerate in the PCI enumeration order i.e., walk the PCI roots, buses, and devices in numerical order. The issue with that scheme is that names might change if a new PCI root, bus, or device is added (e.g., by plugging in a new PCI card) but it will have a predictable assignment of "N" as long as the enumeration of roots, buses, and devices are predictable. |
@eriknordmark, it's at least mentioned here: https://systemd.io/PREDICTABLE_INTERFACE_NAMES/ So, summarizing:
And the challenge of separating the order of interfaces within the same multifunction device remains unsolved. Nevertheless, this approach is still fine for our particular case. We should just document the limitations. |
Marking the PR as ready for review&merge. The cloud team confirmed that this is OK for the controller and for EVE I have implementation already prepared (will open PR once this is merged). |
LGTM
Currently, when an application has both virtual network interfaces configured and some network devices directly assigned, EVE will first attach virtual interfaces in the order that follows ACL IDs (a historical workaround for the missing interface order field), followed by directly assigned network adapters, in the order of the AppInstanceConfig.adapters list.
To allow the user to specify the order between all application network interfaces (across both virtual and passthrough devices), we introduce a new boolean flag enforce_network_interface_order inside the application instance config and allow the controller to pass the order requirements for all the application network adapters.
For backward compatibility reasons, by default this will be disabled and the original ordering method will remain in use.
Signed-off-by: Milan Lenco <[email protected]>