forked from llvm/llvm-project
[AutoBump] Merge with 9edd998e (Aug 29) (14) #367
Open
mgehre-amd wants to merge 315 commits into bump_to_d4f97da1 from bump_to_9edd998e
This patch removes all of the Set.* methods from Status. This cleanup is part of a series of patches that make it harder to use the anti-pattern of keeping a long-lived Status object around and updating it while dropping any errors it contains on the floor. This patch is largely NFC; the more interesting next steps it enables are to: (1) remove Status.Clear(); (2) assert that Status::operator=() never overwrites an error; (3) remove Status::operator=(). Note that step (2) will bring 90% of the benefits for users, and step (3) will dramatically clean up the error handling code in various places. In the end my goal is to convert all APIs of the form `ResultTy DoFoo(Status& error)` to `llvm::Expected<ResultTy> DoFoo()`. How to read this patch? The interesting changes are in Status.h and Status.cpp; all other changes are mostly `perl -pi -e 's/\.SetErrorString/ = Status::FromErrorString/g' $(git grep -l SetErrorString lldb/source)` plus the occasional manual cleanup.
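The API shape this series aims for can be sketched without LLDB or LLVM headers. In the sketch below, `Error` and `ExpectedLike` are hypothetical stand-ins for `Status` and `llvm::Expected`, and `ParseDigitsOld`/`ParseDigits` are made-up functions; the point is only the contrast between an out-parameter error that a long-lived object can silently overwrite and an error returned together with the value.

```cpp
#include <cctype>
#include <string>
#include <variant>

// Hypothetical stand-in for an error object; not lldb_private::Status.
struct Error {
  std::string Message;
};

// Hypothetical stand-in for llvm::Expected<T>: either a value or an Error.
template <typename T> using ExpectedLike = std::variant<T, Error>;

// Old shape: "ResultTy DoFoo(Status &error)". If the caller reuses one
// long-lived Err across calls, this assignment overwrites any earlier,
// unchecked error stored in it.
int ParseDigitsOld(const std::string &S, Error &Err) {
  if (S.empty() || S.size() > 9) { // at most 9 digits, so int cannot overflow
    Err = Error{"expected 1-9 decimal digits"};
    return 0;
  }
  int Value = 0;
  for (char C : S) {
    if (!std::isdigit(static_cast<unsigned char>(C))) {
      Err = Error{"non-digit character"};
      return 0;
    }
    Value = Value * 10 + (C - '0');
  }
  return Value;
}

// New shape: "llvm::Expected<ResultTy> DoFoo()". The error travels with the
// result, so every call site has to inspect it.
ExpectedLike<int> ParseDigits(const std::string &S) {
  Error Err; // fresh per call, never shared
  int Value = ParseDigitsOld(S, Err);
  if (!Err.Message.empty())
    return Err;
  return Value;
}
```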
…#102860) This patch switches most of the uses of intptr_t to uintptr_t within llvm-exegesis for the subprocess memory support. In the vast majority of cases we do not want a signed component of the address, hence making intptr_t undesirable. intptr_t is left for error handling, for example when making syscalls and we need to see if the syscall returned -1.
We were using a `_LIBCPP_ASSERT_FOO` macro without including `<__assert>`. rdar://134425695
…lvm#102940) Problem: On AIX, functions registered by atexit in a shared library are not run when the library is dlclosed, but instead run (and fail because the function pointer is no longer valid) during main program exit. The profile-rt registers some functions with atexit: 1. writeFileWithoutReturn that writes out the profile file 2. llvm_delete_reset_function_list that does some cleanup in the gcov instrumentation library (not sure) And so right now, we get an "Illegal instruction (core dumped)" when an instrumented shared object is dlopen'ed and dlclosed. Solution: When a shared library is dlclose'd, destructors from the library are called. So create a destructor function that iterates over all known functions that profile-rt registers with atexit, and unregister the ones that have been registered and execute them. Scenarios tested: (0) gcov dlopen/dlclose (AIX/gcov-dlopen-dlclose.test) (1) multiple dlopen/dlclose of the same lib and multiple libs (instrprof-dlopen-dlclose.test) (2) dlopen but no dlclose (exists: Posix/instrprof-dlopen.test) (3) a simple fork testcase with dlopen/dlclose (instrprof-dlopen-dlclose.test) (4) dlopen/dlclose by multiple threads. (instrprof-dlopen-dlclose.test) (5) regular dynamic-linking of instrumented shared libs (exists: AIX/shared-bexpall-pgo.c) (6) a simple fork testcase produces correct profile (instrprof-fork.c) --------- Co-authored-by: Hubert Tong <[email protected]>
Move handling of all internal calls into the designated pass. Preserve NOPs and mark functions as non-simple on non-X86 platforms.
This patch implements sandboxir::VAArgInst mirroring llvm::VAArgInst.
…use-count Added folds: - `(add (sub X, Y), (sub Z, X))` -> `(sub Z, Y)` - `(sub (add X, Y), (add X, Z))` -> `(sub Y, Z)` The fold is typically handled in the `Reassociate` pass, but it fails if the inner `sub`/`add` are multi-use. Less importantly, Reassociate doesn't propagate flags correctly. This patch adds the fold explicitly to InstCombine. Proofs: https://alive2.llvm.org/ce/z/p6JyRP Closes llvm#105866
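As a quick sanity check complementing the Alive2 proof, the two identities behind the new folds can be verified exhaustively for 8-bit wrapping arithmetic. This sketch only compares wrapped results; it does not model nsw/nuw flag propagation, and the function name is made up.

```cpp
#include <cstdint>

// Exhaustively checks, for 8-bit wrapping (two's complement) arithmetic:
//   (X - Y) + (Z - X) == Z - Y
//   (X + Y) - (X + Z) == Y - Z
bool foldsHoldFor8Bits() {
  // Truncate an intermediate result to 8 bits (arithmetic modulo 256).
  auto Wrap = [](unsigned V) { return static_cast<std::uint8_t>(V); };
  for (unsigned X = 0; X < 256; ++X)
    for (unsigned Y = 0; Y < 256; ++Y)
      for (unsigned Z = 0; Z < 256; ++Z) {
        if (Wrap(Wrap(X - Y) + Wrap(Z - X)) != Wrap(Z - Y))
          return false;
        if (Wrap(Wrap(X + Y) - Wrap(X + Z)) != Wrap(Y - Z))
          return false;
      }
  return true;
}
```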
…parable with function count for each candidate (llvm#106260) The current cost-benefit analysis between vtable comparison and function comparison requires the indirect fallback branch to be cold. This is too conservative. This change allows vtable comparison as long as the vtable count is comparable with the function count for each function candidate, and removes the cold indirect fallback requirement. Tested: 1. Testing this on benchmarks increases the measurable performance wins. Counting the (possibly duplicated) remarks (duplicated because of linkonce_odr functions and cross-module import of functions) shows that the number of vtable remarks increases from roughly 30k to roughly 50k. 2. https://gcc.godbolt.org/z/sbGK7Pacn shows that vtable comparison doesn't happen today (using the same IR input).
…profiles for given functions (llvm#104654) Currently, in the extended binary format, the sample reader only reads profiles for functions that are in the current module at initialization time. This extends the reader to support reading profiles for arbitrary given functions at a later stage. It's used for llvm#101053.
We recently added various CPU_SUBTYPE_ARM64E values, notably including CPU_SUBTYPE_ARM64E_VERSIONED_PTRAUTH_ABI_MASK, which is 0x80000000U. The enum is better off as a uint32_t to accommodate that. This also hopefully helps silence GCC warnings reported on a ternary in CPU_SUBTYPE_ARM64E_WITH_PTRAUTH_VERSION. The subtype is already generally treated as a uint32_t elsewhere, so while there, change the new helpers to explicitly pass/return the subtype as uint32_t, and the individual narrower components as either bool or unsigned.
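A minimal sketch of the idea, assuming C++11 or later: giving the enum a fixed `std::uint32_t` underlying type guarantees that 0x80000000U is representable, and the helpers take and return the subtype as `std::uint32_t` with the narrower component as `bool`. The enumerator names, the base value 2, and the helper names are hypothetical; only the 0x80000000U mask value comes from the message above.

```cpp
#include <cstdint>

// Hypothetical enum with a fixed 32-bit unsigned underlying type.
enum ExampleCPUSubTypeARM64 : std::uint32_t {
  EXAMPLE_SUBTYPE_ARM64E = 2,
  EXAMPLE_VERSIONED_PTRAUTH_ABI_MASK = 0x80000000U,
};

static_assert(sizeof(ExampleCPUSubTypeARM64) == 4, "enum is exactly 32 bits");

// Subtype in and out as uint32_t; the single-bit component as bool.
constexpr std::uint32_t withVersionedABI(std::uint32_t Subtype, bool Versioned) {
  return Versioned ? (Subtype | EXAMPLE_VERSIONED_PTRAUTH_ABI_MASK) : Subtype;
}

constexpr bool isVersionedABI(std::uint32_t Subtype) {
  return (Subtype & EXAMPLE_VERSIONED_PTRAUTH_ABI_MASK) != 0;
}
```

Both arms of the ternary are `std::uint32_t`, so no signed/unsigned mixing occurs there.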
…#106035) In the clobbered FP/BP range, the register can't be used as a normal FP/BP to access the stack. So if there are stack accesses due to register spills, scheduling, or other back-end optimizations, we should report an error instead of silently generating wrong code. Also try to minimize the save/restore range of the clobbered FP/BP if the FrameSetup doesn't change the stack size.
Build on the -slp-vectorize-non-power-of-2 experimental option, and support vectorizing reductions with 2^N-1 sized vectors. Specifically, two related changes: 1) When searching for a profitable VL, start with the 2^N-1 reduction width. If the cost model does not select that VL, return to power-of-two boundaries when halving the search VL. The latter is mostly for simplicity. 2) Reduce the minimum reduction width from 4 to 3 when supporting non-power-of-two vectors. This is required to support <3 x Ty> cases. One thing which isn't directly related to this change, but which I want to note for clarity, is that non-power-of-two vectorization appears to be sensitive to the operand order of the reduction. I haven't yet fully figured out why, but I suspect this is non-power-of-two specific.
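One possible reading of the search order described above, as a simplified model: for a 2^N-1 sized reduction the full width is tried first, and the search then walks down power-of-two widths, with the minimum width lowered from 4 to 3 when non-power-of-two vectors are allowed. The function names are hypothetical and this is not the SLPVectorizer code; the real search also depends on the cost model and the target's register width.

```cpp
#include <vector>

// Largest power of two <= V (assumes V > 0).
unsigned powerOf2Floor(unsigned V) {
  unsigned P = 1;
  while (P <= V / 2)
    P *= 2;
  return P;
}

// Hypothetical model: candidate widths in the order they would be tried.
std::vector<unsigned> candidateReductionWidths(unsigned NumReduced,
                                               bool AllowNonPow2) {
  const unsigned MinWidth = AllowNonPow2 ? 3 : 4;
  std::vector<unsigned> Widths;
  if (NumReduced < MinWidth)
    return Widths;
  // NumReduced == 2^N-1 exactly when NumReduced & (NumReduced + 1) == 0.
  if (AllowNonPow2 && (NumReduced & (NumReduced + 1)) == 0)
    Widths.push_back(NumReduced);
  // If that width is not chosen, fall back to halving along powers of two.
  for (unsigned VL = powerOf2Floor(NumReduced); VL >= MinWidth; VL /= 2)
    Widths.push_back(VL);
  return Widths;
}
```

Under this model a 7-element reduction tries widths 7 then 4, and a 3-element reduction becomes eligible only when non-power-of-two vectors are enabled.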
This patch fixes: llvm/lib/Transforms/Instrumentation/IndirectCallPromotion.cpp:845:12: error: variable 'RemainingVTableCount' set but not used [-Werror,-Wunused-but-set-variable] llvm/lib/Transforms/Instrumentation/IndirectCallPromotion.cpp:306:23: error: private field 'PSI' is not used [-Werror,-Wunused-private-field] Here are a couple of domino effects: - Once I remove PSI, I need to update the constructor and its caller. - Once I remove RemainingVTableCount, I don't need TotalCount, so I am updating the caller as well.
…NFC) (llvm#106251) This patch forward-ports the heterogeneous std::map::operator[]() from C++26 so that we can look up the map without allocating an instance of std::string when the key-value pair exists in the map. The background is as follows. I'm planning to reduce the memory footprint of ThinLTO indexing by changing ImportMapTy, the data structure used for an import list. The new list will be a hash set of tuples (SourceModule, GUID, ImportType) represented in a space-efficient manner. That means that as we iterate over the hash set, we encounter SourceModule as many times as GUID. We don't want to create a temporary instance of std::string every time we look up ModuleToSummariesForIndex like: auto &SummariesForIndex = ModuleToSummariesForIndex[std::string(ILI.first)]; This patch removes the need to create the temporaries by enabling heterogeneous lookup with std::map<K, V, std::less<>> and forward-porting std::map::operator[]() from C++26.
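A sketch of the lookup-without-temporary technique in C++17, assuming a map declared with the transparent comparator `std::less<>` (for example `std::map<std::string, V, std::less<>>`). The helper name `lookupOrInsert` is hypothetical; it approximates the C++26 heterogeneous `operator[]` by using the heterogeneous `lower_bound` and only constructing a `std::string` key when an insertion is actually needed.

```cpp
#include <map>
#include <string>
#include <string_view>
#include <tuple>
#include <utility>

// Hypothetical helper; MapT must use a transparent comparator such as std::less<>.
template <typename MapT>
typename MapT::mapped_type &lookupOrInsert(MapT &Map, std::string_view Key) {
  // Heterogeneous lookup: compares string_view against std::string keys
  // without building a temporary std::string.
  auto It = Map.lower_bound(Key);
  if (It != Map.end() && !Map.key_comp()(Key, It->first))
    return It->second; // key already present: no allocation
  // Key absent: construct the std::string key in place, value-initialize
  // the mapped value, and use It as the insertion hint.
  return Map
      .emplace_hint(It, std::piecewise_construct, std::forward_as_tuple(Key),
                    std::forward_as_tuple())
      ->second;
}
```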
llvm#105478) Currently, `getStackAlignment` asserts if the stack alignment wasn't specified. This makes it inconvenient to use and complicates testing. This change also makes `exceedsNaturalStackAlignment` method redundant.
Make some minor tweaks to AMDGPU tests to ensure they still work as intended after llvm#97762. These tests can be radically simplified after bitcast-aware fpclass deduction.
…6238) This code has been unchanged for two years; let's simplify the code and remove configurability which makes the code harder to follow.
…m#105832) llvm#78086 provided the trait we want to use for this: `__libcpp_integer`. In some `libcxx/containers/views/mdspan` tests, improper uses of `char` are replaced with `signed char`. Fixes llvm#73715
Works towards P0619R4/llvm#99985. - std::uncaught_exception was not previously deprecated. This patch deprecates it since C++17 as per N4259. std::uncaught_exceptions is used instead as libc++ unconditionally provides this function. - _LIBCPP_ENABLE_CXX20_REMOVED_UNCAUGHT_EXCEPTION restores std::uncaught_exception. - As a drive-by, this patch updates the C++20 status page to explain that D.11 is already done, since it was done in 578d09c.
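The replacement function, `std::uncaught_exceptions`, returns a count rather than a bool, which is what makes it usable inside destructors that may themselves run during unwinding. Below is a small sketch of the usual idiom (the class name is hypothetical): record the count on entry and compare on exit to tell whether the scope is being left because of a new exception.

```cpp
#include <exception>

// Detects whether the enclosing scope is exited by a newly thrown exception.
// Comparing counts stays correct even when this object lives inside a
// destructor that runs during unwinding, a case where the bool-returning
// std::uncaught_exception() reports true although this scope exits normally.
class UnwindDetector {
public:
  explicit UnwindDetector(bool &Flag)
      : Unwinding(Flag), CountAtEntry(std::uncaught_exceptions()) {}
  ~UnwindDetector() { Unwinding = std::uncaught_exceptions() > CountAtEntry; }

private:
  bool &Unwinding;
  int CountAtEntry;
};
```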
Certain intrinsics map to builtins that require an immediate (literal) argument; make sure we report non-literal arguments. This has been kicking around downstream for a while, and the recent removal of the MMX builtins caused me to notice it again.
…pes. The original cmp type (i1), not its operand types, must be used when estimating the cost of the buildvector node; otherwise the compiler crashes during TTI cost estimation.
Fixes failure on the llvm-clang-aarch64-darwin buildbot: https://lab.llvm.org/buildbot/#/builders/190/builds/4660/ The test mentioned does not rely on any unique property of X86, but does rely on the layout of the basic blocks produced by llc, which varies between targets. Although the test could be duplicated for other targets, it seems unnecessary since the behaviour being tested is not target-specific.
Improve operand analysis using SCEV for cost purposes. This fixes a divergence between legacy and VPlan-based cost-modeling after 533e6bb. Fixes llvm#106248.
…n in BB (llvm#105524)" Reverted (along with the NFC followup fix) due to buildbot failure: https://lab.llvm.org/buildbot/#/builders/160/builds/4142 This reverts commit 3ef37e2, and commit 616f7d3.
If the global variable is constant (but not constexpr), we need to diagnose, but keep evaluating.
… values VPERMILPS uses the lower two bits (bits 0-1) of each mask element to index per-lane i32/f32 elements 0-3; VPERMILPD uses bit 1 to index per-lane i64/f64 elements 0-1. Use SimplifyDemandedBits to ignore anything touching the remaining bits. Part of llvm#106413
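To illustrate why the remaining mask bits can be ignored, here is a scalar model of one 128-bit lane of the variable-control VPERMILPS and VPERMILPD instructions. The function names are hypothetical; the model only shows which control bits select an element, not how SimplifyDemandedBits is implemented.

```cpp
#include <array>
#include <cstdint>

// One 128-bit lane of VPERMILPS (variable control): 4 x f32.
std::array<float, 4> permilps(const std::array<float, 4> &Src,
                              const std::array<std::uint32_t, 4> &Ctrl) {
  std::array<float, 4> Dst{};
  for (int I = 0; I < 4; ++I)
    Dst[I] = Src[Ctrl[I] & 0x3]; // only bits 0-1 select the element
  return Dst;
}

// One 128-bit lane of VPERMILPD (variable control): 2 x f64.
std::array<double, 2> permilpd(const std::array<double, 2> &Src,
                               const std::array<std::uint64_t, 2> &Ctrl) {
  std::array<double, 2> Dst{};
  for (int I = 0; I < 2; ++I)
    Dst[I] = Src[(Ctrl[I] >> 1) & 0x1]; // only bit 1 selects the element
  return Dst;
}
```

Any two masks that agree on those bits produce identical results, which is why the other bits are not demanded.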
) When including all targets, some files become too large for the NSIS installer to handle. Fixes llvm#101994
Add Windows include equivalents for includes and shell command.
Link restored from the original policy outlined here https://discourse.llvm.org/t/code-of-conduct-changes-related-to-llvm-project-policy-changes/64197
Currently when `LIBC_COPT_MEMCPY_X86_USE_SOFTWARE_PREFETCHING` is set we prefetch memory for read on the source buffer. This patch adds prefetch for write on the destination buffer.
…it (llvm#106430) We were reporting ambiguous references from using-declarations, as a user can depend on different overloads of a function just because they are visible in the TU. This doesn't apply to records or primary templates, as the declaration being referenced in such cases is unambiguous; the ambiguity does apply to specializations, though. Hence this patch returns an explicit reference to record decls and primary templates of those.
This follows Solaris behavior of allowing both mnemonics all the time. Fixes llvm#105639.
Fix llvm#105571 which demonstrates an end() iterator dereference when performing a non-empty splice to end() from a region that ends at Src::end(). Rather than calling Instruction::adoptDbgRecords from Dest, create a marker (which takes an iterator) and absorbDebugValues onto that. The "absorb" variant doesn't clean up the source marker, which in this case we know is a trailing marker, so we have to do that manually.
…106382) Many tests were easy to update, but these are quite big and I think it's better to autogenerate them to see the difference well.
This requires a bit of restructuring of ctor calls when checking for a potential constant expression.
…/16 vector widths This cleans up the existing tests and shows the gaps in the test checks (for instance we're often testing VF4 + VF16 but not VF8 even though amdlibm supports it).
…r widths test checks This should cover most amdlibm functions, but does not yet add every VF combination (e.g. 2f32/16f64 often vectorises to the LLVM intrinsic for that vector type).
These few worked without changes.
LLVM has a CMake variable that controls whether to consider logf128 constant folding, but libAnalysis ignores it. This patch changes the logf128 check to rely on the global LLVM_HAS_LOGF128 setting made in config-ix.cmake.
…on in BB (llvm#105524)" Fixes the previous buildbot error by adding an explicit triple to the test, ensuring that llc can produce a valid object file. This reverts commit 926f097.
Reverts llvm#102147 It seems some systems which should support F128 are wrongly detected as not supporting. This might be due to checking `LDBL_MANT_DIG` instead of `__LDBL_MANT_DIG__`. I will investigate.
cferry-AMD approved these changes on Sep 30, 2024
No description provided.