[AutoBump] Merge with 9edd998e (Aug 29) (14) #367

Open
wants to merge 315 commits into
base: bump_to_d4f97da1
from

Conversation

mgehre-amd
Collaborator

No description provided.

adrian-prantl and others added 30 commits August 27, 2024 10:59
This patch removes all of the Set.* methods from Status.

This cleanup is part of a series of patches that make it harder to use the
anti-pattern of keeping a long-lived Status object around and updating
it while dropping any errors it contains on the floor.

This patch is largely NFC; the more interesting next steps it enables
are to:
1. remove Status.Clear()
2. assert that Status::operator=() never overwrites an error
3. remove Status::operator=()

Note that step (2) will bring 90% of the benefits for users, and step
(3) will dramatically clean up the error handling code in various
places. In the end my goal is to convert all APIs that are of the form

`    ResultTy DoFoo(Status& error)
`
to

`    llvm::Expected<ResultTy> DoFoo()
`
How to read this patch?

The interesting changes are in Status.h and Status.cpp, all other
changes are mostly

` perl -pi -e 's/\.SetErrorString/ = Status::FromErrorString/g' $(git grep -l SetErrorString lldb/source)
`
plus the occasional manual cleanup.
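
As a rough illustration of the API shape this series is heading towards, here is a minimal self-contained sketch. It does not use lldb's `Status` or `llvm::Expected`; `SimpleStatus`, `PortOrError`, `ParsePort`, and `ParsePortLegacy` are hypothetical names, and `std::variant` stands in for `llvm::Expected<ResultTy>`.

```cpp
#include <cassert>
#include <string>
#include <variant>

// Hypothetical stand-in for a status object; this is not lldb's Status class.
struct SimpleStatus {
  std::string message; // empty means success
  bool Fail() const { return !message.empty(); }
};

// Stand-in for llvm::Expected<int>: holds either the value or an error message.
using PortOrError = std::variant<int, std::string>;

// New shape: the error cannot be separated from the result.
PortOrError ParsePort(const std::string &text) {
  if (text.empty() || text.size() > 5)
    return std::string("invalid port: '" + text + "'");
  int value = 0;
  for (char c : text) {
    if (c < '0' || c > '9')
      return std::string("invalid port: '" + text + "'");
    value = value * 10 + (c - '0');
  }
  if (value > 65535)
    return std::string("port out of range: '" + text + "'");
  return value;
}

// Old shape, kept as a thin wrapper: the error travels through an
// out-parameter that a caller may forget to inspect.
int ParsePortLegacy(const std::string &text, SimpleStatus &error) {
  // Mirrors step (2) above: never overwrite an error that is still pending.
  assert(!error.Fail() && "status already holds an unhandled error");
  PortOrError result = ParsePort(text);
  if (const std::string *message = std::get_if<std::string>(&result)) {
    error.message = *message;
    return 0;
  }
  return std::get<int>(result);
}
```

With the second shape, a caller has to look at the variant before it can reach the value, so dropping the error on the floor takes deliberate effort.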
…#102860)

This patch switches most of the uses of intptr_t to uintptr_t within
llvm-exegesis for the subprocess memory support. In the vast majority of
cases we do not want a signed component of the address, hence making
intptr_t undesirable. intptr_t is left for error handling, for example
when making syscalls and we need to see if the syscall returned -1.
We were using a `_LIBCPP_ASSERT_FOO` macro without including `<__assert>`.

rdar://134425695
…lvm#102940)

Problem:
On AIX, functions registered by atexit in a shared library are not run
when the library is dlclosed, but instead run (and fail because the
function pointer is no longer valid) during main program exit.

The profile-rt registers some functions with atexit:

 1. writeFileWithoutReturn that writes out the profile file
 2. llvm_delete_reset_function_list that does some cleanup in the gcov 
    instrumentation library (not sure)

And so right now, we get an "Illegal instruction (core dumped)" when an
instrumented shared object is dlopen'ed and dlclosed.

Solution:
  When a shared library is dlclose'd, destructors from the library are
  called. So create a destructor function that iterates over all known
  functions that profile-rt registers with atexit, and unregister the ones
  that have been registered and execute them.

Scenarios tested:
(0) gcov dlopen/dlclose                                       (AIX/gcov-dlopen-dlclose.test)
(1) multiple dlopen/dlclose of the same lib and multiple libs (instrprof-dlopen-dlclose.test)
(2) dlopen but no dlclose                                     (exists: Posix/instrprof-dlopen.test)
(3) a simple fork testcase with dlopen/dlclose                (instrprof-dlopen-dlclose.test)
(4) dlopen/dlclose by multiple threads.                       (instrprof-dlopen-dlclose.test)
(5) regular dynamic-linking of instrumented shared libs       (exists: AIX/shared-bexpall-pgo.c)
(6) a simple fork testcase produces correct profile           (instrprof-fork.c)


---------

Co-authored-by: Hubert Tong <[email protected]>
Move handling of all internal calls into the designated pass. Preserve
NOPs and mark functions as non-simple on non-X86 platforms.
This patch implements sandboxir::VAArgInst mirroring llvm::VAArgInst.
…use-count

Added folds:
    - `(add (sub X, Y), (sub Z, X))` -> `(sub Z, Y)`
    - `(sub (add X, Y), (add X, Z))` -> `(sub Y, Z)`

The fold is typically handled in the `Reassociate` pass, but it fails
if the inner `sub`/`add` are multi-use. Less importantly, Reassociate
doesn't propagate flags correctly.

This patch adds the fold explicitly to InstCombine.

Proofs: https://alive2.llvm.org/ce/z/p6JyRP

Closes llvm#105866
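
For intuition, the two identities can be spot-checked with wrapping 32-bit arithmetic, which behaves like LLVM's `add`/`sub` without `nsw`/`nuw` flags. This is only a sanity check on a handful of sample values, not a substitute for the Alive2 proofs linked above; the function names are made up for this sketch.

```cpp
#include <cassert>
#include <cstdint>

// Left-hand sides of the two folds, computed modulo 2^32.
constexpr uint32_t AddOfSubs(uint32_t X, uint32_t Y, uint32_t Z) {
  return (X - Y) + (Z - X);
}
constexpr uint32_t SubOfAdds(uint32_t X, uint32_t Y, uint32_t Z) {
  return (X + Y) - (X + Z);
}

// Compares both left-hand sides against the folded forms on every triple
// drawn from a small set of edge-case values. Returns the number of triples
// checked.
int CheckFoldsOnSamples() {
  const uint32_t samples[] = {0u,          1u,          7u,
                              0x7FFFFFFFu, 0x80000000u, 0xFFFFFFFFu};
  int checked = 0;
  for (uint32_t X : samples)
    for (uint32_t Y : samples)
      for (uint32_t Z : samples) {
        assert(AddOfSubs(X, Y, Z) == static_cast<uint32_t>(Z - Y));
        assert(SubOfAdds(X, Y, Z) == static_cast<uint32_t>(Y - Z));
        ++checked;
      }
  return checked;
}
```

In both cases `X` cancels, so the result does not depend on whether the inner operations have other users.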
…parable with function count for each candidate (llvm#106260)

The current cost-benefit analysis between vtable comparison and function
comparison requires the indirect fallback branch to be cold. This is too
conservative.

This change allows vtable-comparison as long as vtable count is
comparable with function count for each function candidate and removes
the cold indirect fallback requirement.

Tested:
1. Testing this on benchmarks uplifts the measurable performance wins.
Counting the (possibly duplicated) remarks (because of linkonce_odr
functions and cross-module import of functions) shows the number of vtable
remarks increases from roughly 30k to roughly 50k.
2. https://gcc.godbolt.org/z/sbGK7Pacn shows vtable-comparison doesn't
happen today (using the same IR input)
…profiles for given functions (llvm#104654)

Currently, in the extended binary format, the sample reader only reads the
profiles of functions that are in the current module at initialization
time. This extends the support to read arbitrary profiles for given
input functions at a later stage. It's used for
llvm#101053.
We recently added various CPU_SUBTYPE_ARM64E values, notably including
CPU_SUBTYPE_ARM64E_VERSIONED_PTRAUTH_ABI_MASK, which is 0x80000000U.
The enum is better off as a uint32_t to accommodate that.

This also hopefully helps silence GCC warnings reported on a ternary in
CPU_SUBTYPE_ARM64E_WITH_PTRAUTH_VERSION.

The subtype is already generally treated as a uint32_t elsewhere, so
while there, change the new helpers to explicitly pass/return the
subtype as uint32_t, and the individual narrower components as either
bool or unsigned.
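
A small sketch of the idea, assuming made-up enumerator names and an illustrative field layout. Only the 0x80000000U mask value is taken from the message above; the helpers are hypothetical and are not the ones in the LLVM headers.

```cpp
#include <cstdint>
#include <type_traits>

// Hypothetical names and field layout, for illustration only. Only the
// 0x80000000U mask value is taken from the commit message.
enum ExampleSubtype : uint32_t {
  EXAMPLE_SUBTYPE_BASE = 2,
  EXAMPLE_VERSION_FIELD_MASK = 0x0f000000U,
  EXAMPLE_KERNEL_ABI_MASK = 0x40000000U,
  EXAMPLE_VERSIONED_ABI_MASK = 0x80000000U,
};

// With a fixed underlying type, the top-bit mask fits without relying on the
// compiler's choice of representation.
static_assert(
    std::is_same<std::underlying_type<ExampleSubtype>::type, uint32_t>::value,
    "enum is stored as uint32_t");

// The subtype travels as uint32_t; narrower components are bool or unsigned.
constexpr uint32_t MakeVersionedSubtype(unsigned Version, bool Kernel) {
  // Both arms of the ternary are uint32_t, so no enumerator and integer types
  // are mixed inside one conditional expression.
  const uint32_t KernelBit =
      Kernel ? static_cast<uint32_t>(EXAMPLE_KERNEL_ABI_MASK) : uint32_t{0};
  return static_cast<uint32_t>(EXAMPLE_SUBTYPE_BASE) |
         static_cast<uint32_t>(EXAMPLE_VERSIONED_ABI_MASK) | KernelBit |
         ((static_cast<uint32_t>(Version) << 24) &
          static_cast<uint32_t>(EXAMPLE_VERSION_FIELD_MASK));
}

constexpr bool IsVersioned(uint32_t Subtype) {
  return (Subtype & EXAMPLE_VERSIONED_ABI_MASK) != 0;
}

constexpr unsigned VersionOf(uint32_t Subtype) {
  return (Subtype & EXAMPLE_VERSION_FIELD_MASK) >> 24;
}
```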
…#106035)

In the clobbered FP/BP range, we can't use it as a normal FP/BP to access
the stack. So if there are stack accesses due to register spills, scheduling,
or other back-end optimizations, we should report an error instead of
silently generating wrong code.

Also try to minimize the save/restore range of the clobbered FP/BP if
the FrameSetup doesn't change stack size.
Build on the -slp-vectorize-non-power-of-2 experimental option, and
support vectorizing reductions with 2^N-1 sized vectors.

Specifically, two related changes:
1) When searching for a profitable VL, start with the 2^N-1 reduction
   width. If the cost model does not select that VL, return to power-of-two
   boundaries when halving the search VL. The latter is mostly for simplicity.
2) Reduce the minimum reduction width from 4 to 3 when supporting
   non-power-of-two vectors. This is required to support <3 x Ty> cases.

One thing that isn't directly related to this change, but that I want to
note for clarity, is that non-power-of-two vectorization appears to
be sensitive to the operand order of the reduction. I haven't yet fully figured
out why, but I suspect this is non-power-of-two specific.
This patch fixes:

  llvm/lib/Transforms/Instrumentation/IndirectCallPromotion.cpp:845:12:
  error: variable 'RemainingVTableCount' set but not used
  [-Werror,-Wunused-but-set-variable]

  llvm/lib/Transforms/Instrumentation/IndirectCallPromotion.cpp:306:23:
  error: private field 'PSI' is not used
  [-Werror,-Wunused-private-field]

Here are a couple of domino effects:

- Once I remove PSI, I need to update the constructor and its caller.

- Once I remove RemainingVTableCount, I don't need TotalCount, so I am
  updating the caller as well.
…NFC) (llvm#106251)

This patch forward ports the heterogeneous std::map::operator[]() from
C++26 so that we can look up the map without allocating an instance of
std::string when the key-value pair exists in the map.

The background is as follows.  I'm planning to reduce the memory
footprint of ThinLTO indexing by changing ImportMapTy, the data
structure used for an import list.  The new list will be a hash set of
tuples (SourceModule, GUID, ImportType) represented in a space
efficient manner.  That means that as we iterate over the hash set, we
encounter SourceModule as many times as GUID.  We don't want to create
a temporary instance of std::string every time we look up
ModuleToSummariesForIndex like:

auto &SummariesForIndex =
ModuleToSummariesForIndex[std::string(ILI.first)];

This patch removes the need to create the temporaries by enabling
heterogeneous lookup with std::map<K, V, std::less<>> and forward
porting std::map::operator[]() from C++26.
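
A minimal sketch of the technique, assuming a plain `std::map` with the transparent comparator `std::less<>`. `SummaryCountMap` and `LookupOrInsert` are hypothetical names for this illustration, not the helper added by the patch; the point is only that a `std::string` is built on the insertion path and never on the lookup path.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <string>
#include <string_view>

// std::less<> is transparent, so lookups accept any type comparable with the
// key, such as std::string_view.
using SummaryCountMap = std::map<std::string, int, std::less<>>;

// Approximates a heterogeneous operator[]: returns the mapped value for Key,
// inserting a value-initialized entry only when the key is absent.
int &LookupOrInsert(SummaryCountMap &Map, std::string_view Key) {
  auto It = Map.lower_bound(Key); // no std::string temporary here
  if (It == Map.end() || It->first != Key)
    It = Map.emplace_hint(It, std::string(Key), 0); // allocate only on insert
  return It->second;
}

void DemoHeterogeneousLookup() {
  SummaryCountMap Map;
  LookupOrInsert(Map, "module_a.o") = 3;
  ++LookupOrInsert(Map, std::string_view("module_a.o"));
  assert(Map.size() == 1);
  assert(Map.find(std::string_view("module_a.o"))->second == 4);
}
```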
llvm#105478)

Currently, `getStackAlignment` asserts if the stack alignment wasn't
specified. This makes it inconvenient to use and complicates testing.

This change also makes `exceedsNaturalStackAlignment` method redundant.
Make some minor tweaks to AMDGPU tests to ensure they still work as
intended after llvm#97762. These
tests can be radically simplified after bitcast aware fpclass deduction.
…6238)

This code has been unchanged for two years; let's simplify the code
and remove configurability which makes the code harder to follow.
…m#105832)

llvm#78086 provided the trait we want to use for this: `__libcpp_integer`.

In some `libcxx/containers/views/mdspan` tests, improper uses of `char` 
are replaced with `signed char`. 

Fixes llvm#73715
Works towards P0619R4/llvm#99985.

- std::uncaught_exception was not previously deprecated. This patch
  deprecates it since C++17 as per N4259. std::uncaught_exceptions is
  used instead as libc++ unconditionally provides this function.

- _LIBCPP_ENABLE_CXX20_REMOVED_UNCAUGHT_EXCEPTION restores
  std::uncaught_exception.

- As a drive-by, this patch updates the C++20 status page to 
  explain that D.11 is already done, since it was done in 
  578d09c.
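
For readers migrating their own code, a short sketch of the usual replacement pattern: compare the `std::uncaught_exceptions()` count at construction and destruction instead of asking `std::uncaught_exception()` for a single bool. `UnwindDetector` and `ScopeExitedByException` are hypothetical names for this example.

```cpp
#include <cassert>
#include <exception>

// Records whether its scope was left because an exception is propagating.
class UnwindDetector {
  int CountOnEntry = std::uncaught_exceptions();
  bool *Unwinding;

public:
  explicit UnwindDetector(bool *Out) : Unwinding(Out) {}
  ~UnwindDetector() {
    // More in-flight exceptions than at construction means this destructor
    // is running as part of stack unwinding.
    *Unwinding = std::uncaught_exceptions() > CountOnEntry;
  }
};

bool ScopeExitedByException(bool ShouldThrow) {
  bool Unwinding = false;
  try {
    UnwindDetector Detector(&Unwinding);
    if (ShouldThrow)
      throw 1;
  } catch (int) {
    // Swallow the test exception; the detector has already run.
  }
  return Unwinding;
}

void DemoUnwindDetector() {
  assert(ScopeExitedByException(true));
  assert(!ScopeExitedByException(false));
}
```

Unlike the bool-returning function, the count-based check still gives the right answer when the object is created inside a destructor that itself runs during unwinding.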
Certain intrinsics map to builtins that require an immediate (literal)
argument; make sure we report non-literal arguments.

This has been kicking around downstream for a while, and the recent
removal of the MMX builtins caused me to notice it again.
alexey-bataev and others added 29 commits August 29, 2024 03:53
…pes.

Need to use the original cmp type i1 when estimating the cost for the
buildvector node, not its operand types, to prevent a compiler crash upon
TTI cost estimation.
Fixes failure on the llvm-clang-aarch64-darwin buildbot:
https://lab.llvm.org/buildbot/#/builders/190/builds/4660/

The test mentioned does not rely on any unique property of X86, but does
rely on the layout of the basic blocks produced by llc, which varies
between targets. Although the test could be duplicated for other targets,
it seems unnecessary since the behaviour being tested is not
target-specific.
Improve operand analysis using SCEV for cost purposes. This fixes a
divergence between legacy and VPlan-based cost-modeling after
533e6bb.

Fixes llvm#106248.
…n in BB (llvm#105524)"

Reverted (along with the NFC followup fix) due to buildbot failure:
https://lab.llvm.org/buildbot/#/builders/160/builds/4142

This reverts commit 3ef37e2, and commit
616f7d3.
The underlying issue was discovered by an assert added in
a800533 by a test case provided by @mstorsjo.
If the global variable is constant (but not constexpr), we need to
diagnose, but keep evaluating.
… values

VPERMILPS lower bits0-3 (to index per-lane i32/f32 0-3)
VPERMILPD uses bit1  (to index per-lane i64/f64 0-1)

Use SimplifyDemandedBits to ignore anything touching the remaining bits.

Part of llvm#106413
)

When including all targets, some files become too large for the NSIS
installer to handle.

Fixes llvm#101994
Add Windows include equivalents for includes and shell command.
Currently when `LIBC_COPT_MEMCPY_X86_USE_SOFTWARE_PREFETCHING` is set we
prefetch memory for read on the source buffer. This patch adds prefetch
for write on the destination buffer.
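
To make the read/write distinction concrete, here is a rough sketch of a block copy that prefetches both buffers. It is not the libc implementation: `CopyWithPrefetch`, the 64-byte block size, and the prefetch distance are assumptions for illustration. `__builtin_prefetch(addr, rw, locality)` is the GCC/Clang builtin, where `rw` is 0 for a read prefetch and 1 for a write prefetch.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

void *CopyWithPrefetch(void *Dst, const void *Src, std::size_t Size) {
  assert((Dst != nullptr && Src != nullptr) || Size == 0);
  constexpr std::size_t kBlock = 64;            // assumed cache-line size
  constexpr std::size_t kDistance = 4 * kBlock; // assumed prefetch distance
  auto *D = static_cast<unsigned char *>(Dst);
  auto *S = static_cast<const unsigned char *>(Src);
  std::size_t Offset = 0;
  for (; Offset + kBlock <= Size; Offset += kBlock) {
    if (Offset + kDistance < Size) {
      __builtin_prefetch(S + Offset + kDistance, /*rw=*/0, /*locality=*/3);
      __builtin_prefetch(D + Offset + kDistance, /*rw=*/1, /*locality=*/3);
    }
    std::memcpy(D + Offset, S + Offset, kBlock);
  }
  if (Offset < Size) // copy the tail that is shorter than one block
    std::memcpy(D + Offset, S + Offset, Size - Offset);
  return Dst;
}

// Copies a patterned buffer of the given size and reports whether the
// destination matches the source afterwards.
bool CopyRoundTrips(std::size_t Size) {
  std::vector<unsigned char> Src(Size), Dst(Size, 0xAA);
  for (std::size_t I = 0; I < Size; ++I)
    Src[I] = static_cast<unsigned char>(I * 7 + 1);
  CopyWithPrefetch(Dst.data(), Src.data(), Size);
  return Src == Dst;
}
```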
…it (llvm#106430)

We were reporting ambiguous references from using declarations, as the user
can be depending on different overloads of a function just because they
are visible in the TU.
This doesn't apply to records or primary templates, as the declaration being
referenced in such cases is unambiguous; the ambiguity applies to
specializations though.

Hence this patch returns an explicit reference to record decls and
primary templates of those.
This follows Solaris behavior of allowing both mnemonics all the time.

Fixes llvm#105639.
Fix llvm#105571 which demonstrates an end() iterator dereference when
performing a non-empty splice to end() from a region that ends at
Src::end().

Rather than calling Instruction::adoptDbgRecords from Dest, create a marker
(which takes an iterator) and absorbDebugValues onto that. The "absorb" variant
doesn't clean up the source marker, which in this case we know is a trailing
marker, so we have to do that manually.
…106382)

Many tests were easy to update, but these are quite big and I think it's
better to autogenerate them to see the difference well.
This requires a bit of restructuring of ctor calls when checking for a
potential constant expression.
…/16 vector widths

This cleans up the existing tests and shows the gaps in the test checks (for instance we're often testing VF4 + VF16 but not VF8 even though amdlibm supports it).
…r widths test checks

This should cover most amdlibm functions, but not every VF combo has been added yet (e.g. 2f32/16f64 often vectorises to the llvm intrinsic for that vector type)
These few worked without changes.
LLVM has a CMake variable to control whether to consider logf128
constant folding which libAnalysis ignores. This patch changes the
logf128 check to rely on the global LLVM_HAS_LOGF128 setting made in
config-ix.cmake.
…on in BB (llvm#105524)"

Fixes the previous buildbot error by adding an explicit triple to the test,
ensuring that llc can produce a valid object file.

This reverts commit 926f097.
Reverts llvm#102147

It seems some systems which should support F128 are wrongly detected as
not supporting it.

This might be due to checking `LDBL_MANT_DIG` instead of
`__LDBL_MANT_DIG__`. I will investigate.
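
For context on the two macros, and without claiming this is the actual cause: `LDBL_MANT_DIG` is only defined once `<cfloat>` (or `<float.h>`) has been included, whereas `__LDBL_MANT_DIG__` is predefined by GCC and Clang. An identifier that is not defined evaluates to 0 inside `#if`, so a check written against `LDBL_MANT_DIG` without the header would quietly take the "unsupported" branch. The helper names below are hypothetical.

```cpp
#include <cfloat>

// With the header included, both spellings should agree on GCC and Clang.
#if defined(__LDBL_MANT_DIG__)
static_assert(LDBL_MANT_DIG == __LDBL_MANT_DIG__,
              "<cfloat> and the predefined macro should agree");
#endif

// 113 mantissa bits corresponds to IEEE binary128 long double.
constexpr bool LongDoubleIsBinary128() { return LDBL_MANT_DIG == 113; }

// 64 mantissa bits corresponds to the x87 80-bit extended format.
constexpr bool LongDoubleIsX87Extended() { return LDBL_MANT_DIG == 64; }
```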