Merge tag 'rolling-lts/wsl/5.15.167.4' into linux-msft-wsl-5.15.y
Signed-off-by: Mitchell Levy <[email protected]>
chessturo committed Nov 5, 2024
2 parents 33cad98 + 3b1eeb4 commit 6ac7abb
Showing 2,264 changed files with 34,691 additions and 19,298 deletions.
1 change: 1 addition & 0 deletions Documentation/ABI/testing/sysfs-devices-system-cpu
@@ -517,6 +517,7 @@ What: /sys/devices/system/cpu/vulnerabilities
/sys/devices/system/cpu/vulnerabilities/mds
/sys/devices/system/cpu/vulnerabilities/meltdown
/sys/devices/system/cpu/vulnerabilities/mmio_stale_data
/sys/devices/system/cpu/vulnerabilities/reg_file_data_sampling
/sys/devices/system/cpu/vulnerabilities/retbleed
/sys/devices/system/cpu/vulnerabilities/spec_store_bypass
/sys/devices/system/cpu/vulnerabilities/spectre_v1
78 changes: 78 additions & 0 deletions Documentation/admin-guide/filesystem-monitoring.rst
@@ -0,0 +1,78 @@
.. SPDX-License-Identifier: GPL-2.0
====================================
File system Monitoring with fanotify
====================================

File system Error Reporting
===========================

Fanotify supports the FAN_FS_ERROR event type for file system-wide error
reporting. It is meant to be used by file system health monitoring
daemons, which listen for these events and take actions (notify
sysadmin, start recovery) when a file system problem is detected.

By design, a FAN_FS_ERROR notification exposes sufficient information
for a monitoring tool to know a problem in the file system has happened.
It doesn't necessarily provide a user space application with semantics
to verify an IO operation was successfully executed. That is out of
scope for this feature. Instead, it is meant only as a framework for
early detection of file system problems and for reporting them to
recovery tools.

When a file system operation fails, it is common for dozens of kernel
errors to cascade after the initial failure, hiding the original failure
log, which is usually the most useful debug data to troubleshoot the
problem. For this reason, FAN_FS_ERROR tries to report only the first
error that occurred for a file system since the last notification, and
it simply counts additional errors. This ensures that the most
important pieces of information are never lost.

FAN_FS_ERROR requires the fanotify group to be set up with the
FAN_REPORT_FID flag.

At the time of this writing, the only file system that emits FAN_FS_ERROR
notifications is Ext4.

A FAN_FS_ERROR Notification has the following format::

  [ Notification Metadata (Mandatory) ]
  [ Generic Error Record  (Mandatory) ]
  [ FID record            (Mandatory) ]

The order of records is not guaranteed, and new records might be added
in the future. Therefore, applications must not rely on the order and
must be prepared to skip over unknown records. Please refer to
``samples/fanotify/fs-monitor.c`` for an example parser.
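
The "skip over unknown records" rule can be illustrated with a minimal Python
sketch. It is not the sample parser named above. The struct layouts and the
numeric constants are assumptions taken from the fanotify UAPI header, and the
helper names ``iter_info_records`` and ``known_records`` are hypothetical.

```python
import struct

# Assumed layouts (linux/fanotify.h), native byte order, no implicit padding.
METADATA = struct.Struct("=IBBHQii")  # struct fanotify_event_metadata, 24 bytes
INFO_HDR = struct.Struct("=BBH")      # struct fanotify_event_info_header, 4 bytes

FAN_FS_ERROR = 0x00008000             # assumed event mask bit
KNOWN_TYPES = {1: "FID", 5: "ERROR"}  # assumed FAN_EVENT_INFO_TYPE_* values


def iter_info_records(event: bytes):
    """Yield (info_type, payload) for each info record in one event."""
    event_len, _vers, _rsvd, metadata_len, _mask, _fd, _pid = \
        METADATA.unpack_from(event, 0)
    offset = metadata_len
    while offset + INFO_HDR.size <= event_len:
        info_type, _pad, rec_len = INFO_HDR.unpack_from(event, offset)
        if rec_len < INFO_HDR.size or offset + rec_len > event_len:
            break  # malformed length: stop instead of reading past the event
        yield info_type, event[offset + INFO_HDR.size:offset + rec_len]
        offset += rec_len  # advance by the record's own length, known or not


def known_records(event: bytes) -> dict:
    """Collect known records by name, in any order, ignoring unknown types."""
    return {KNOWN_TYPES[t]: payload
            for t, payload in iter_info_records(event) if t in KNOWN_TYPES}
```

Because the walker advances by each record's ``len`` field rather than by a
fixed size, a record type added in the future is stepped over without
breaking the records that follow it.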

Generic error record
--------------------

The generic error record provides enough information for a file system
agnostic tool to learn about a problem in the file system, without
providing any additional details about the problem. This record is
identified by ``struct fanotify_event_info_header.info_type`` being set
to FAN_EVENT_INFO_TYPE_ERROR.

::

  struct fanotify_event_info_error {
      struct fanotify_event_info_header hdr;
      __s32 error;
      __u32 error_count;
  };

The ``error`` field identifies the type of error using errno values.
``error_count`` counts the errors that have occurred since the last
notification and were suppressed to preserve the original error
information.
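
As an illustration, the record could be decoded from raw bytes as sketched
below. This is a minimal Python sketch, assuming the record arrives in native
byte order with a 4-byte header (``info_type``, ``pad``, ``len``) and that
FAN_EVENT_INFO_TYPE_ERROR has the value 5. The function name
``decode_error_record`` is hypothetical.

```python
import errno
import struct

# hdr (info_type, pad, len) followed by __s32 error and __u32 error_count.
ERROR_RECORD = struct.Struct("=BBHiI")
FAN_EVENT_INFO_TYPE_ERROR = 5  # assumed value from linux/fanotify.h


def decode_error_record(record: bytes):
    """Return (errno name, error_count) for one generic error record."""
    info_type, _pad, length, error, error_count = \
        ERROR_RECORD.unpack_from(record)
    if info_type != FAN_EVENT_INFO_TYPE_ERROR or length < ERROR_RECORD.size:
        raise ValueError("not a generic error record")
    # Fall back to the raw number for errno values Python has no name for.
    name = errno.errorcode.get(error, f"errno {error}")
    return name, error_count
```

A monitoring daemon would typically log the errno name and use
``error_count`` to judge how many further errors were folded into the
notification.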

FID record
----------

The FID record can be used to uniquely identify the inode that triggered
the error through the combination of fsid and file handle. A file system
specific application can use that information to attempt a recovery
procedure. Errors that are not related to an inode are reported with an
empty file handle of type FILEID_INVALID.
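
A minimal Python sketch of reading the FID record follows. The layout (info
header, two 32-bit fsid words, then a ``struct file_handle`` with
``handle_bytes`` and ``handle_type``) and the values FAN_EVENT_INFO_TYPE_FID = 1
and FILEID_INVALID = 0xff are assumptions taken from the kernel headers. The
name ``parse_fid_record`` is hypothetical.

```python
import struct

# hdr (info_type, pad, len), fsid.val[0], fsid.val[1],
# then file_handle.handle_bytes and file_handle.handle_type.
FID_FIXED = struct.Struct("=BBHIIIi")
FAN_EVENT_INFO_TYPE_FID = 1  # assumed value
FILEID_INVALID = 0xFF        # assumed value


def parse_fid_record(record: bytes) -> dict:
    """Split one FID record into fsid, handle type and opaque handle bytes."""
    info_type, _pad, _length, fsid0, fsid1, handle_bytes, handle_type = \
        FID_FIXED.unpack_from(record)
    if info_type != FAN_EVENT_INFO_TYPE_FID:
        raise ValueError("not a FID record")
    handle = bytes(record[FID_FIXED.size:FID_FIXED.size + handle_bytes])
    return {
        "fsid": (fsid0, fsid1),
        "handle_type": handle_type,
        "handle": handle,
        # An empty FILEID_INVALID handle means the error has no inode.
        "inode_related": handle_type != FILEID_INVALID,
    }
```

The handle bytes are opaque to the caller. They are only meaningful to the
file system that produced them.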
4 changes: 2 additions & 2 deletions Documentation/admin-guide/hw-vuln/core-scheduling.rst
@@ -66,8 +66,8 @@ arg4:
will be performed for all tasks in the task group of ``pid``.

arg5:
-userspace pointer to an unsigned long for storing the cookie returned by
-``PR_SCHED_CORE_GET`` command. Should be 0 for all other commands.
+userspace pointer to an unsigned long long for storing the cookie returned
+by ``PR_SCHED_CORE_GET`` command. Should be 0 for all other commands.

In order for a process to push a cookie to, or pull a cookie from a process, it
is required to have the ptrace access mode: `PTRACE_MODE_READ_REALCREDS` to the
1 change: 1 addition & 0 deletions Documentation/admin-guide/hw-vuln/index.rst
@@ -21,3 +21,4 @@ are configurable at compile, boot or run time.
cross-thread-rsb.rst
gather_data_sampling.rst
srso
reg-file-data-sampling
104 changes: 104 additions & 0 deletions Documentation/admin-guide/hw-vuln/reg-file-data-sampling.rst
@@ -0,0 +1,104 @@
==================================
Register File Data Sampling (RFDS)
==================================

Register File Data Sampling (RFDS) is a microarchitectural vulnerability that
only affects Intel Atom parts (also branded as E-cores). RFDS may allow
a malicious actor to infer data values previously used in floating point
registers, vector registers, or integer registers. RFDS does not provide the
ability to choose which data is inferred. CVE-2023-28746 is assigned to RFDS.

Affected Processors
===================
Below is the list of affected Intel processors [#f1]_:

===================  ============
Common name          Family_Model
===================  ============
ATOM_GOLDMONT        06_5CH
ATOM_GOLDMONT_D      06_5FH
ATOM_GOLDMONT_PLUS   06_7AH
ATOM_TREMONT_D       06_86H
ATOM_TREMONT         06_96H
ALDERLAKE            06_97H
ALDERLAKE_L          06_9AH
ATOM_TREMONT_L       06_9CH
RAPTORLAKE           06_B7H
RAPTORLAKE_P         06_BAH
ALDERLAKE_N          06_BEH
RAPTORLAKE_S         06_BFH
===================  ============

As an exception to this table, Intel Xeon E family parts ALDERLAKE (06_97H) and
RAPTORLAKE (06_B7H), codenamed Catlow, are not affected. They are reported as
vulnerable in Linux because they share the same family/model with an affected
part. Unlike their affected counterparts, they do not enumerate RFDS_CLEAR or
CPUID.HYBRID. This information could be used to distinguish between the
affected and unaffected parts, but it is deemed not worth the added complexity,
as the reporting is fixed automatically once these parts enumerate RFDS_NO.

Mitigation
==========
Intel released a microcode update that enables software to clear sensitive
information using the VERW instruction. RFDS uses the same mitigation strategy
as MDS: force the CPU to clear the affected buffers before an attacker can
extract the secrets. This is achieved by using the otherwise unused and
obsolete VERW instruction in combination with a microcode update. The microcode
clears the affected CPU buffers when the VERW instruction is executed.

Mitigation points
-----------------
VERW is executed by the kernel before returning to user space, and by KVM
before VMentry. None of the affected cores support SMT, so VERW is not required
at C-state transitions.

New bits in IA32_ARCH_CAPABILITIES
----------------------------------
Newer processors, and microcode updates for existing affected processors, add
new bits to the IA32_ARCH_CAPABILITIES MSR. These bits can be used to enumerate
vulnerability and mitigation capability:

- Bit 27 - RFDS_NO - When set, processor is not affected by RFDS.
- Bit 28 - RFDS_CLEAR - When set, processor is affected by RFDS, and has the
  microcode that clears the affected buffers on VERW execution.
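
The two bits can be checked from user space as sketched below. This is a
minimal Python sketch, not how the kernel performs the check. It assumes the
MSR is readable at index 0x10a through ``/dev/cpu/<n>/msr``, which requires
root and the ``msr`` driver. The names ``rfds_state`` and
``read_arch_capabilities`` are hypothetical.

```python
import os
import struct

IA32_ARCH_CAPABILITIES = 0x10A  # MSR index
RFDS_NO = 1 << 27               # bit 27, from the list above
RFDS_CLEAR = 1 << 28            # bit 28, from the list above


def rfds_state(arch_caps: int) -> str:
    """Interpret the RFDS bits of an IA32_ARCH_CAPABILITIES value."""
    if arch_caps & RFDS_NO:
        return "not affected"
    if arch_caps & RFDS_CLEAR:
        return "affected, VERW clears buffers"
    # Neither bit set: these two bits alone do not decide the question.
    return "not enumerated"


def read_arch_capabilities(cpu: int = 0) -> int:
    """Read the MSR via the msr driver (assumed to be loaded, needs root)."""
    fd = os.open(f"/dev/cpu/{cpu}/msr", os.O_RDONLY)
    try:
        return struct.unpack("=Q", os.pread(fd, 8, IA32_ARCH_CAPABILITIES))[0]
    finally:
        os.close(fd)
```

Keeping the bit test separate from the MSR read lets the interpretation be
exercised on captured values without privileged access.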

Mitigation control on the kernel command line
---------------------------------------------
RFDS mitigation can be controlled at boot time on the kernel command line with
the parameter "reg_file_data_sampling=". The valid arguments are:

==========  =================================================================
on          If the CPU is vulnerable, enable mitigation; CPU buffer clearing
            on exit to userspace and before entering a VM.
off         Disables mitigation.
==========  =================================================================

Mitigation default is selected by CONFIG_MITIGATION_RFDS.
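
To check what was requested at boot, the running command line can be
inspected. The following is a minimal Python sketch. It assumes that the last
occurrence of the parameter takes effect and that the command line is read
from ``/proc/cmdline``. The function name ``rfds_cmdline_setting`` is
hypothetical.

```python
def rfds_cmdline_setting(cmdline: str):
    """Return the reg_file_data_sampling= value, or None if it is absent."""
    setting = None
    for token in cmdline.split():
        key, sep, value = token.partition("=")
        if key == "reg_file_data_sampling" and sep:
            setting = value  # assumption: a later occurrence overrides
    return setting


def running_setting(path: str = "/proc/cmdline"):
    """Apply the parser to the command line of the running kernel."""
    with open(path) as f:
        return rfds_cmdline_setting(f.read())
```

A result of ``None`` means the parameter was not given, in which case the
default selected by CONFIG_MITIGATION_RFDS applies.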

Mitigation status information
-----------------------------
The Linux kernel provides a sysfs interface to enumerate the current
vulnerability status of the system: whether the system is vulnerable, and
which mitigations are active. The relevant sysfs file is:

  /sys/devices/system/cpu/vulnerabilities/reg_file_data_sampling

The possible values in this file are:

.. list-table::

   * - 'Not affected'
     - The processor is not vulnerable.
   * - 'Vulnerable'
     - The processor is vulnerable, but no mitigation is enabled.
   * - 'Vulnerable: No microcode'
     - The processor is vulnerable, but the microcode is not updated.
   * - 'Mitigation: Clear Register File'
     - The processor is vulnerable and the CPU buffer clearing mitigation is
       enabled.
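
A monitoring script could map the file content onto these meanings as in the
minimal Python sketch below. The status strings are the ones listed above. The
names ``explain_status`` and ``read_status`` are hypothetical, and the file is
absent on kernels without RFDS reporting.

```python
from pathlib import Path

RFDS_SYSFS = Path(
    "/sys/devices/system/cpu/vulnerabilities/reg_file_data_sampling")

MEANINGS = {
    "Not affected": "processor is not vulnerable",
    "Vulnerable": "vulnerable, no mitigation enabled",
    "Vulnerable: No microcode": "vulnerable, microcode not updated",
    "Mitigation: Clear Register File": "vulnerable, buffer clearing enabled",
}


def explain_status(status: str) -> str:
    """Translate one sysfs status line into a short description."""
    return MEANINGS.get(status.strip(), "unrecognized status")


def read_status(path: Path = RFDS_SYSFS):
    """Return the raw status line, or None if the file does not exist."""
    try:
        return path.read_text().strip()
    except OSError:
        return None
```

Unknown strings are reported as unrecognized rather than guessed at, since
other kernel versions may add values.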

References
----------
.. [#f1] Affected Processors
   https://www.intel.com/content/www/us/en/developer/topic-technology/software-security-guidance/processors-affected-consolidated-product-cpu-model.html
62 changes: 50 additions & 12 deletions Documentation/admin-guide/hw-vuln/spectre.rst
@@ -138,11 +138,10 @@ associated with the source address of the indirect branch. Specifically,
the BHB might be shared across privilege levels even in the presence of
Enhanced IBRS.

-Currently the only known real-world BHB attack vector is via
-unprivileged eBPF. Therefore, it's highly recommended to not enable
-unprivileged eBPF, especially when eIBRS is used (without retpolines).
-For a full mitigation against BHB attacks, it's recommended to use
-retpolines (or eIBRS combined with retpolines).
+Previously the only known real-world BHB attack vector was via unprivileged
+eBPF. Further research has found attacks that don't require unprivileged eBPF.
+For a full mitigation against BHB attacks it is recommended to set BHI_DIS_S or
+use the BHB clearing sequence.

Attack scenarios
----------------
@@ -430,6 +429,23 @@ The possible values in this file are:
'PBRSB-eIBRS: Not affected' CPU is not affected by PBRSB
=========================== =======================================================

- Branch History Injection (BHI) protection status:

  .. list-table::

     * - BHI: Not affected
       - System is not affected
     * - BHI: Retpoline
       - System is protected by retpoline
     * - BHI: BHI_DIS_S
       - System is protected by BHI_DIS_S
     * - BHI: SW loop, KVM SW loop
       - System is protected by software clearing sequence
     * - BHI: Vulnerable
       - System is vulnerable to BHI
     * - BHI: Vulnerable, KVM: SW loop
       - System is vulnerable; KVM is protected by software clearing sequence
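
The BHI status can be picked out of the reported string as sketched below.
This minimal Python sketch assumes the status line joins its fields with
``"; "`` and that the BHI field starts with ``BHI: ``. The sample line in the
usage note is illustrative, not captured output, and the name ``bhi_status``
is hypothetical.

```python
def bhi_status(status_line: str):
    """Return the text after 'BHI: ' in a status line, or None if absent."""
    prefix = "BHI: "
    # Split on "; " only: a BHI value may itself contain a comma.
    for field in status_line.strip().split("; "):
        if field.startswith(prefix):
            return field[len(prefix):]
    return None
```

For a line such as ``Mitigation: Retpolines; BHI: Vulnerable, KVM: SW loop``
the helper returns ``Vulnerable, KVM: SW loop``, which can then be compared
against the values in the table above.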

Full mitigation might require a microcode update from the CPU
vendor. When the necessary microcode is not available, the kernel will
report vulnerability.
@@ -484,11 +500,18 @@ Spectre variant 2

Systems which support enhanced IBRS (eIBRS) enable IBRS protection once at
boot, by setting the IBRS bit, and they're automatically protected against
-Spectre v2 variant attacks, including cross-thread branch target injections
-on SMT systems (STIBP). In other words, eIBRS enables STIBP too.
+some Spectre v2 variant attacks. The BHB can still influence the choice of
+indirect branch predictor entry, and although branch predictor entries are
+isolated between modes when eIBRS is enabled, the BHB itself is not isolated
+between modes. Systems which support BHI_DIS_S will set it to protect against
+BHI attacks.

-Legacy IBRS systems clear the IBRS bit on exit to userspace and
-therefore explicitly enable STIBP for that
+On Intel's enhanced IBRS systems, this includes cross-thread branch target
+injections on SMT systems (STIBP). In other words, Intel eIBRS enables
+STIBP, too.
+
+AMD Automatic IBRS does not protect userspace, and Legacy IBRS systems clear
+the IBRS bit on exit to userspace, therefore both explicitly enable STIBP.

The retpoline mitigation is turned on by default on vulnerable
CPUs. It can be forced on or off by the administrator
@@ -622,9 +645,10 @@ kernel command line.
retpoline,generic Retpolines
retpoline,lfence LFENCE; indirect branch
retpoline,amd alias for retpoline,lfence
-eibrs enhanced IBRS
-eibrs,retpoline enhanced IBRS + Retpolines
-eibrs,lfence enhanced IBRS + LFENCE
+eibrs Enhanced/Auto IBRS
+eibrs,retpoline Enhanced/Auto IBRS + Retpolines
+eibrs,lfence Enhanced/Auto IBRS + LFENCE
+ibrs use IBRS to protect kernel

Not specifying this option is equivalent to
spectre_v2=auto.
@@ -684,6 +708,20 @@ For user space mitigation:
spectre_v2=off. Spectre variant 1 mitigations
cannot be disabled.

    spectre_bhi=

        [X86] Control mitigation of Branch History Injection
        (BHI) vulnerability. This setting affects the deployment
        of the HW BHI control and the SW BHB clearing sequence.

        on
            (default) Enable the HW or SW mitigation as
            needed.
        off
            Disable the mitigation.

For spectre_v2_user see Documentation/admin-guide/kernel-parameters.txt

Mitigation selection guide
--------------------------

1 change: 1 addition & 0 deletions Documentation/admin-guide/index.rst
@@ -82,6 +82,7 @@ configure specific aspects of kernel behavior to your liking.
edid
efi-stub
ext4
filesystem-monitoring
nfs/index
gpio/index
highuid