-
Notifications
You must be signed in to change notification settings - Fork 129
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Feature]: Improve NVMe controller init robustness #515
Labels
state:needs-triage
Needs to triaged to determine next steps
type:feature-request
A new feature proposal
urgency:low
Little to no impact
Comments
makubacki
added
state:needs-triage
Needs to triaged to determine next steps
type:feature-request
A new feature proposal
labels
Aug 9, 2023
5 tasks
makubacki
added a commit
that referenced
this issue
Aug 14, 2023
## Description Resolves #515 The main change the first commit: ``` MdeModulePkg/NvmExpressDxe: Improve NVMe controller init robustness It has been observed that some faulty NVMe devices may return invalid values. The code in NvmExpressDxe recognizes the controller is not responding correctly and issues an ASSERT() often in DEBUG builds or a reset in RELEASE builds. The following changes are made to NvmeControllerInit() to prevent a faulty NVMe device from halting the overall boot: 1. Check the Vendor ID and Device ID to verify they are read properly and if not, return EFI_DEVICE_ERROR 2. Replace the ASSERT() when Memory Page Size Minimum (Cap.Mpsmin) with an error message and return EFI_DEVICE_ERROR ``` The additional changes are added due to improvements found in nearby code. 2: ``` MdeModulePkg/NvmExpressDxe: Correct function parameter modifer Updates the `Cap` parameter for `ReadNvmeControllerCapabilities()` to be `OUT` since the buffer pointed to is written within the function. ``` 3: ``` MdePkg/Nvme.h: Add missing NVMe capability descriptions Adds missing structure member documentation. ``` This will also be sent to the edk2 mailing list, however, it is being checked in here first due to some devices that may be able to benefit from the change and the upcoming 202308 freeze. - [ ] Impacts functionality? - **Functionality** - Does the change ultimately impact how firmware functions? - Examples: Add a new library, publish a new PPI, update an algorithm, ... - [ ] Impacts security? - **Security** - Does the change have a direct security impact on an application, flow, or firmware? - Examples: Crypto algorithm change, buffer overflow fix, parameter validation improvement, ... - [ ] Breaking change? - **Breaking change** - Will anyone consuming this change experience a break in build or boot behavior? - Examples: Add a new library class, move a module to a different repo, call a function in a new library class in a pre-existing module, ... - [ ] Includes tests? - **Tests** - Does the change include any explicit test code? - Examples: Unit tests, integration tests, robot tests, ... - [ ] Includes documentation? - **Documentation** - Does the change contain explicit documentation additions outside direct code modifications (and comments)? - Examples: Update readme file, add feature readme file, link to documentation on an a separate Web page, ... ## How This Was Tested - Verified NVMe boot on a physical system - Verified NVMe driver binding on QEMU Q35 with an emulated NVMe device ## Integration Instructions N/A --------- Signed-off-by: Michael Kubacki <[email protected]>
kenlautner
pushed a commit
that referenced
this issue
Oct 17, 2023
## Description Resolves #515 The main change the first commit: ``` MdeModulePkg/NvmExpressDxe: Improve NVMe controller init robustness It has been observed that some faulty NVMe devices may return invalid values. The code in NvmExpressDxe recognizes the controller is not responding correctly and issues an ASSERT() often in DEBUG builds or a reset in RELEASE builds. The following changes are made to NvmeControllerInit() to prevent a faulty NVMe device from halting the overall boot: 1. Check the Vendor ID and Device ID to verify they are read properly and if not, return EFI_DEVICE_ERROR 2. Replace the ASSERT() when Memory Page Size Minimum (Cap.Mpsmin) with an error message and return EFI_DEVICE_ERROR ``` The additional changes are added due to improvements found in nearby code. 2: ``` MdeModulePkg/NvmExpressDxe: Correct function parameter modifer Updates the `Cap` parameter for `ReadNvmeControllerCapabilities()` to be `OUT` since the buffer pointed to is written within the function. ``` 3: ``` MdePkg/Nvme.h: Add missing NVMe capability descriptions Adds missing structure member documentation. ``` This will also be sent to the edk2 mailing list, however, it is being checked in here first due to some devices that may be able to benefit from the change and the upcoming 202308 freeze. - [ ] Impacts functionality? - **Functionality** - Does the change ultimately impact how firmware functions? - Examples: Add a new library, publish a new PPI, update an algorithm, ... - [ ] Impacts security? - **Security** - Does the change have a direct security impact on an application, flow, or firmware? - Examples: Crypto algorithm change, buffer overflow fix, parameter validation improvement, ... - [ ] Breaking change? - **Breaking change** - Will anyone consuming this change experience a break in build or boot behavior? - Examples: Add a new library class, move a module to a different repo, call a function in a new library class in a pre-existing module, ... - [ ] Includes tests? - **Tests** - Does the change include any explicit test code? - Examples: Unit tests, integration tests, robot tests, ... - [ ] Includes documentation? - **Documentation** - Does the change contain explicit documentation additions outside direct code modifications (and comments)? - Examples: Update readme file, add feature readme file, link to documentation on an a separate Web page, ... ## How This Was Tested - Verified NVMe boot on a physical system - Verified NVMe driver binding on QEMU Q35 with an emulated NVMe device ## Integration Instructions N/A --------- Signed-off-by: Michael Kubacki <[email protected]>
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Labels
state:needs-triage
Needs to triaged to determine next steps
type:feature-request
A new feature proposal
urgency:low
Little to no impact
Feature Overview
Some SSD devices have been observed to stop responding during boot after initially succeeding during initialization. When the failure occurs, the following message is observed from
NvmeExpressPassThru
:NvmExpressPassThru: Timeout occurs for an NVMe command.
[ref]
Consequently,
NvmeControllerInit()
is called and the following assert is encountered:ASSERT ((Private->Cap.Mpsmin + 12) <= EFI_PAGE_SHIFT);
[ref]
It is suspected that the
Cap.Mpsmin
value is invalid. In any case, this can cause an assertion to stop DEBUG builds and reset on RELEASE builds preventing recovery from UEFI shell or USB boot.Solution Overview
NvmeControllerInit()
to verify the device is accessibleEFI_DEVICE_ERROR
instead of asserting if theCap.Mpsmin
value is invalidAlternatives Considered
No response
Urgency
Low
Are you going to implement the feature request?
I will implement the feature
Do you need maintainer feedback?
No maintainer feedback needed
Anything else?
No response
The text was updated successfully, but these errors were encountered: