[GEN] Update GENX branch to LLVM `765206e` #14025

whitneywhtsang · 2024-06-04T04:29:53Z

No description provided.

… (#85837) Introduced in #78041, originally reported as #79957 and fixed partially in #80050. `OpaqueValueExpr` used with `TemplateArgument::StructuralValue` has no corresponding source expression. A test case with subobject-referring NTTP added.

Use UTC. Add test coverage for AIX.

Even as the NPM has been in use by Polly for a while now, the majority of the tests continue using the LPM passes. This patch ports the tests to use the NPM passes (for example, by replacing a flag such as -polly-detect with -passes=polly-detect following the NPM syntax for specifying passes) with some exceptions for some missing features in the new passes. Relanding #90632.

Remove this test since it is marked as XFAIL and has some non-deterministic behaviour which causes it to spuriously pass on out-of-tree builds. Capturing this in llvm/llvm-project#93342 to make a proper fix and a test later.

We compute BF hashes in `YAMLProfileReader::readProfile` when first matching profile functions with binary functions, and a second time in `YAMLProfileReader::parseFunctionProfile` during the profile assignment (we need to do that to account for LTO private functions with mismatching suffixes). Avoid recomputing the hash if it has already been set.
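A minimal sketch of the "compute once, reuse afterwards" idea described above. This is not BOLT code: the `BinaryFunction` class, its fields, and the SHA-256 based hash are hypothetical stand-ins chosen only to make the caching visible.

```python
import hashlib
from dataclasses import dataclass
from typing import Optional


@dataclass
class BinaryFunction:
    """Hypothetical stand-in for a binary function; not the real BOLT class."""
    name: str
    body: bytes
    cached_hash: Optional[int] = None
    hash_computations: int = 0  # counts how often the expensive path runs

    def compute_hash(self) -> int:
        # Reuse the stored hash when it has already been set.
        if self.cached_hash is not None:
            return self.cached_hash
        self.hash_computations += 1
        digest = hashlib.sha256(self.body).digest()  # placeholder hash function
        self.cached_hash = int.from_bytes(digest[:8], "little")
        return self.cached_hash


bf = BinaryFunction("foo", b"\x90\xc3")
first = bf.compute_hash()   # e.g. while matching profile functions
second = bf.compute_hash()  # e.g. during profile assignment
```

Both calls return the same value, and the counter shows that the expensive path ran only once.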

These are useful for finer-grain debugging and complement the already exposed global debug flag.

…(#89950) Previously, since response (.rsp) files weren't expanded at the very beginning of clang-scan-deps, we only parsed the command-line as provided in the Clang .cdb file. Unfortunately, when using Unreal Engine, arguments are always generated in a .rsp file (i.e., `/path/to/clang-cl.exe @/path/to/filename_args.rsp`). After this patch, `/Fo` can be parsed and added to the final command-line. Without this option, the make targets that are emitted are made up from the input file name alone. We have some cases where the same input in the project generates several output files, so we end up with duplicate make targets in the scan-deps emitted dependency file.
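To illustrate why response files have to be expanded before a flag such as `/Fo` becomes visible, here is a rough sketch. It is not how clang-scan-deps implements expansion: `expand_response_files` is a hypothetical helper, it expands only one level, and it tokenizes with POSIX quoting rules via `shlex`, which is a simplification of Windows command-line rules.

```python
import os
import shlex
import tempfile


def expand_response_files(argv):
    """Replace each @file argument with the arguments read from that file."""
    expanded = []
    for arg in argv:
        if arg.startswith("@") and os.path.isfile(arg[1:]):
            with open(arg[1:], encoding="utf-8") as rsp:
                # POSIX-style tokenization; real tools follow platform rules.
                expanded.extend(shlex.split(rsp.read()))
        else:
            expanded.append(arg)
    return expanded


with tempfile.TemporaryDirectory() as tmp:
    rsp_path = os.path.join(tmp, "filename_args.rsp")
    with open(rsp_path, "w", encoding="utf-8") as f:
        f.write("/c main.cpp /Foout/main.obj\n")
    args = expand_response_files(["clang-cl.exe", "@" + rsp_path])
    # Only after expansion can the output-file flag be found.
    output_flags = [a for a in args if a.startswith("/Fo")]
```

Before expansion the command line contains only the `@...rsp` argument, so a scan for `/Fo` finds nothing; after expansion the output path is available to name the make target.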

llvm/llvm-project#93106 introduced some necessary fixes to module file generation, but has also caused a regression. The module file output can include bogus attempts to USE-associate symbols local to derived type scopes, like components and bindings. Fix, and extend a test.

We only know it expands to a 2 instruction sequence, not necessarily a sign extended sequence. Happened to notice while I was looking at naming for the proposed rematerializable LUI+ADDI for addresses.

Move out common X86MemOperand checks into helper lambdas. To be reused in #91667. Test Plan: NFC

As we have debuginfod as symbol locator available in lldb now, we want to make full use of it. In case of post mortem debugging, we don't always have the main executable available. However, the .note.gnu.build-id of the main executable (and of some other modules too) should be available in the core file, as those binaries are loaded in memory and dumped in the core file. We try to iterate through the NT_FILE entries, and read and store the GNU build ID if possible. This will be very useful, as this ID is the unique key needed for querying the debuginfod server.

Test: Build and run lldb, with a breakpoint set at https://github.com/llvm/llvm-project/blob/main/lldb/source/Plugins/SymbolLocator/Debuginfod/SymbolLocatorDebuginfod.cpp#L147. Verified that after this commit, module_uuid is the correct GNU build ID of the main executable which caused the crash (first in the NT_FILE entry).

Previous PR: llvm/llvm-project#92078 was mistakenly merged. This PR is re-opening the commit.

Simplify mutually exclusive sanity checks in analyzeIndirectBranch, where an UNKNOWN IndirectBranchType is to be returned. Reduces confusion and code duplication when adding a new IndirectBranchType (to be added in #91667). Test Plan: NFC

Inline traits used by `arith.select` only into `ArithOps.td`. Trim trailing whitespace in op description.

Move out common code extracting the address of a MCExpr. To be reused in #91667. Test Plan: NFC

Enables writing patterns where one has op creation with variadic in result pattern more easily. Signed-off-by: Jacques Pienaar <[email protected]>

…onment is MSVC (#91689) From looking at the rest of the code and from my own understanding, the driver mode is supposed to be independent of MSVC compatibility when the target triple is `*-windows-msvc`. Therefore strict aliasing should be disabled by default when the target triple is `*-windows-msvc`, so that code assuming MSVC behaves as expected when compiled with Clang.

This reverts commit 098c6df. This reverts commit 8c718a3. This reverts commit 4fb02de.

The transform updates all users of inductions to work based on EVL, instead of the VF directly. At the moment, widened inductions cannot be updated, so bail out if the plan contains any. This patch introduces a check before applying the EVL transform. If any recipes in the loop rely on RuntimeVF, the plan is discarded.

…fault with that case (#76669)" When the default branch is the last case, we can transform that branch into a concrete branch with an unreachable default branch.

```llvm
target datalayout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128"
target triple = "x86_64-unknown-linux-gnu"

define i64 @src(i64 %0) {
  %2 = urem i64 %0, 4
  switch i64 %2, label %5 [
    i64 1, label %3
    i64 2, label %3
    i64 3, label %4
  ]

3:                                                ; preds = %1, %1
  br label %5

4:                                                ; preds = %1
  br label %5

5:                                                ; preds = %1, %4, %3
  %.0 = phi i64 [ 2, %4 ], [ 1, %3 ], [ 0, %1 ]
  ret i64 %.0
}

define i64 @tgt(i64 %0) {
  %2 = urem i64 %0, 4
  switch i64 %2, label %unreachable [
    i64 0, label %5
    i64 1, label %3
    i64 2, label %3
    i64 3, label %4
  ]

unreachable:                                      ; preds = %1
  unreachable

3:                                                ; preds = %1, %1
  br label %5

4:                                                ; preds = %1
  br label %5

5:                                                ; preds = %1, %4, %3
  %.0 = phi i64 [ 2, %4 ], [ 1, %3 ], [ 0, %1 ]
  ret i64 %.0
}
```

Alive2: https://alive2.llvm.org/ce/z/Y-PGXv

After transform to a lookup table, I believe `tgt` is better code. The final instructions are as follows:

```asm
src:                                    # @src
        and     edi, 3
        lea     rax, [rdi - 1]
        cmp     rax, 2
        ja      .LBB0_1
        mov     rax, qword ptr [8*rdi + .Lswitch.table.src-8]
        ret
.LBB0_1:
        xor     eax, eax
        ret
tgt:                                    # @tgt
        and     edi, 3
        mov     rax, qword ptr [8*rdi + .Lswitch.table.tgt]
        ret
.Lswitch.table.src:
        .quad   1                               # 0x1
        .quad   1                               # 0x1
        .quad   2                               # 0x2
.Lswitch.table.tgt:
        .quad   0                               # 0x0
        .quad   1                               # 0x1
        .quad   1                               # 0x1
        .quad   2                               # 0x2
```

Godbolt: https://llvm.godbolt.org/z/borME8znd

Closes #73446. (cherry picked from commit 7d81e07)
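As a sanity check of the reasoning above, the two functions can be modeled in a few lines of Python. This is only an illustrative model of `@src` and `@tgt`, not compiler code: `src` keeps an explicit default path, while `tgt` treats every value of `x % 4` as a case and so reduces to a plain table lookup with the same contents as `.Lswitch.table.tgt`.

```python
def src(x: int) -> int:
    """Model of @src: cases 1, 2, 3 are explicit; anything else is the default."""
    r = x % 4
    if r in (1, 2):
        return 1
    if r == 3:
        return 2
    return 0  # default branch, reachable only when r == 0


TGT_TABLE = [0, 1, 1, 2]  # same values as .Lswitch.table.tgt


def tgt(x: int) -> int:
    """Model of @tgt: all four remainders are cases, so no range check is needed."""
    return TGT_TABLE[x % 4]


# Compare both models over a range of unsigned inputs.
mismatches = [x for x in range(1024) if src(x) != tgt(x)]
```

Because `urem` by 4 can only produce 0 through 3, the default in `src` covers exactly one value, which is why it can become a concrete case and the range check (`cmp`/`ja`) disappears.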

…the Condition is Too Wide (#77831)" llvm/llvm-project#76669 taught SimplifyCFG to handle switches when `default` has only one case. When the `switch`'s condition is wider than 64 bit, the current implementation can calculate the wrong default value. This PR skips cases where the condition is too wide. (cherry picked from commit 39bb790)

This commit adds the `/Zc:__STDC__` argument from MSVC, which defines `__STDC__`. This means, alongside stronger feature parity with MSVC, that things that rely on `__STDC__`, such as autoconf, can work. Link to MSVC documentation of this flag: https://learn.microsoft.com/en-us/cpp/build/reference/zc-stdc?view=msvc-170

This reduces the effort of adding MVT strings every time.

…NFC. (#93363)

* allow configuration for the target specific compiler flags.
* allow lld linker for all linker outputs: shared, module and exe.
* allow configuration of libc++ ABI version.
* set MSVC runtime library to MultiThreadedDLL/MultiThreadedDebugDLL on Windows build hosts.
* install UCRT libraries on Windows build hosts

The Range argument is not used by createWidenInductionRecipe; induction classification applies across the whole range of VFs. Remove the argument.

Update the folder titles for targets in the monorepository that have not been taken care of for some time. These are the folders in which targets are organized in Visual Studio and XCode (`set_property(TARGET <target> PROPERTY FOLDER "<title>")`) when using the respective CMake IDE generator.

* Ensure that every target is in a folder
* Use a folder hierarchy with each LLVM subproject as a top-level folder
* Use consistent folder names between subprojects
* When using target-creating functions from AddLLVM.cmake, automatically deduce the folder. This reduces the number of `set_property`/`set_target_property` calls, but they are still necessary when `add_custom_target`, `add_executable`, `add_library`, etc. are used. A LLVM_SUBPROJECT_TITLE definition is used for that in each subproject's root CMakeLists.txt.

…P arithmetic. (#92799) This adds VPSExtPromotedInteger and VPZExtPromotedInteger and uses them to promote many arithmetic operations. VPSExtPromotedInteger uses a shift pair because we don't have VP_SIGN_EXTEND_INREG yet.
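The "shift pair" mentioned above is the usual way to sign-extend a narrow value that lives in a wider register: shift left so the narrow sign bit lands in the top bit, then shift right arithmetically. The sketch below models this on plain Python integers. The function names are hypothetical, and the mask-based zero extension is an assumption about the general technique, not a statement about what VPZExtPromotedInteger emits.

```python
def sext_inreg(value: int, from_bits: int, to_bits: int) -> int:
    """Sign-extend the low from_bits of value inside a to_bits-wide register."""
    mask = (1 << to_bits) - 1
    shift = to_bits - from_bits
    shifted = (value << shift) & mask      # shl: narrow sign bit moves to the top
    if shifted & (1 << (to_bits - 1)):     # reinterpret the register as signed
        shifted -= 1 << to_bits
    return (shifted >> shift) & mask       # ashr, returned as an unsigned bit pattern


def zext_inreg(value: int, from_bits: int) -> int:
    """Zero-extend by clearing everything above the low from_bits (assumed AND mask)."""
    return value & ((1 << from_bits) - 1)


negative = sext_inreg(0x80, 8, 32)       # i8 -128 widened to i32
positive = sext_inreg(0x7F, 8, 32)       # i8 127 widened to i32
garbage = sext_inreg(0xABCD80, 8, 32)    # high bits of a promoted value are ignored
zeroed = zext_inreg(0xABCD80, 8)
```

The third call shows why the extension is needed at all: after promotion the upper bits of the register are not guaranteed to be meaningful, so they have to be rewritten before an operation that reads them.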

…erPC) (#93117) The original pull request (llvm/llvm-project#92838) was reverted due to a PowerPC buildbot breakage (llvm/llvm-project@df626dd). This reland limits the scope of the change to non-PowerPC platforms. I am unaware of any PowerPC use cases that would benefit from a larger kNumStackOriginDescrs constant. Original CL description: This increases the constant size of kNumStackOriginDescrs to 4M (64GB of BSS across two arrays), which ought to be enough for anybody. This is the easier alternative suggested by eugenis@ in llvm/llvm-project#92826.

This is an experiment to see if we can prevent some of the compiler OOMs happening without unduly impacting the Windows build latency.

- Reduce disk IO usage by adding a cache to a realpath call introduced by #81985

Swapped code blocks of parameter and variable, which have been confused (in a clang-tidy doc file)

I think test files for the legacy and the new EH (exnref) are better kept separate, and I'd like to use the current test file names for the new EH, rather than keeping the current files and naming the new ones as `-new` or something.

This patch adds Version 3 for development purposes. For now, this patch adds V3 as a copy of V2. For the most part, this patch adds "case Version3:" wherever "case Version2:" appears. One exception is writeMemProfV3, which is copied from writeMemProfV2 but updated to write out memprof::Version3 to the MemProf header. We'll incrementally modify writeMemProfV3 in subsequent patches.

I discovered while working on something else that we were using the location of the directive name as the 'beginloc' which caused some problems in a few places. This patch makes it so our beginloc is the '#' as we originally designed, and then adds a DirectiveLoc concept to a construct for use diagnosing the name.

…n with a sample profile (#93286) Currently, if a callsite is hot as determined by the sample profile, it is unconditionally inlined barring invalid cases (such as recursion). The inline cost check should still apply, because a function's hotness and its inline cost are two different things. For example, if a function calls another very large function multiple times (on different code paths), the large function should not be inlined even if it's hot.
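The decision described above can be sketched as a small predicate. This is a hedged illustration only: the `CallSite` fields, the `should_inline_hot` helper, and the numeric threshold are all hypothetical and do not correspond to the actual inliner interfaces or cost model.

```python
from dataclasses import dataclass


@dataclass
class CallSite:
    callee: str
    is_hot: bool            # hotness as determined by the sample profile
    inline_cost: int        # hypothetical cost estimate for inlining the callee
    is_recursive: bool = False


def should_inline_hot(cs: CallSite, threshold: int) -> bool:
    """Hot call sites are inlined only if they also pass the cost check."""
    if cs.is_recursive:
        return False        # invalid case, never inlined
    if not cs.is_hot:
        return False        # cold call sites are out of scope for this sketch
    return cs.inline_cost <= threshold


small_hot = CallSite("small_helper", is_hot=True, inline_cost=40)
large_hot = CallSite("very_large_function", is_hot=True, inline_cost=5000)
decisions = [should_inline_hot(cs, threshold=300) for cs in (small_hot, large_hot)]
```

With the cost check in place, the small hot callee is still inlined, while the very large hot callee is rejected instead of being duplicated at every call site.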

…st JIT support (#84758)

Avoids regression in future commit which starts producing illegal instances.

…st check host JIT support (#84758) fea7399 had removed the unused function that was still there when I tested.

Add extra tests for llvm/llvm-project#93498.

For some reason I was using writeStmtRef when I meant writeStmt, so this corrects that.

…#93574) I plan to add other combines on TRUNCATE_VECTOR_VL.

This patch adds bind c names to functions and subroutines in cudadevice so they can be lowered and not hit the intrinsic procedure TODOs.

…obal addresses. (#93352) This allows register allocation to rematerialize these instead of spilling and reloading. We need to make it a single instruction due to limitations in rematerialization. This pseudo is expanded to an LUI+ADDI pair between regalloc and post RA scheduling. This improves the dynamic instruction count on 531.deepsjeng_r from spec2017 by 3.2% for the train dataset. 500.perlbench and 502.gcc see a 1% improvement. There are a couple of regressions, but they are 0.1% or smaller. AArch64 has similar pseudo instructions, like MOVaddr.

This patch adds hidden visibility to the variable that is used by the single byte counters mode in source-based code coverage.

Signed-off-by: Whitney Tsang <[email protected]>

bolshakov-a and others added 30 commits May 24, 2024 13:04

[test][EntryExitInstrumenter] Update/add tests

3ec57a7

Use UTC. Add test coverage for AIX.

[Flang][OpenMP] Remove the orphan section test (#93343)

57be0d2

Remove this test since it is marked as XFAIL and has some non-deterministic behaviour which causes it to spuriously pass on out-of-tree builds. Capturing this in llvm/llvm-project#93342 to make a proper fix and a test later.

[mlir] expose -debug-only equivalent to C and Python (#93175)

8f21909

These are useful for finer-grain debugging and complement the already exposed global debug flag.

[nfc][InstCombine]Find PHI incoming block by operand number (#93249)

56c5ca8

[RISCV] PseudoMovImm is not a IsSignExtendingOpW instruction.

9a038fc

We only know it expands to a 2 instruction sequence, not necessarily a sign extended sequence. Happened to notice while I was looking at naming for the proposed rematerializable LUI+ADDI for addresses.

[BOLT][NFC] Add isRIPRel and isIndexed helpers (#91661)

4658803

Move out common X86MemOperand checks into helper lambdas. To be reused in #91667. Test Plan: NFC

[mlir][arith] Clean up select op implementation (#93351)

8e3be5c

Inline traits used by `arith.select` only into `ArithOps.td`. Trim trailing whitespace in op description.

[BOLT][NFC] Define getExprValue helper (#91663)

f239490

Move out common code extracting the address of a MCExpr. To be reused in #91667. Test Plan: NFC

[mlir][drr] Allow variadic in rewrite side (#93340)

c26847d

Enables writing patterns where one has op creation with variadic in result pattern more easily. Signed-off-by: Jacques Pienaar <[email protected]>

[BOLT][NFCI] Fix return type of BC::getSignedValueAtAddress (#91664)

c460e45

[RISCV] Fix spelling error in test names. NFC

b13f799

Revert "[OpenMP][OMPX] Add shfl_down_sync (#93311)"

9b31cc7

This reverts commit 098c6df. This reverts commit 8c718a3. This reverts commit 4fb02de.

[analyzer] Allow recursive functions to be trivial. (#91876)

1c90de5

[llvm] Include the GenVT.inc to getEnumName (#93198)

85cf2e5

This reduces the effort of adding MVT strings every time.

[flang] [lldb] [llvm] Fix 'destory' comment typos [NFC] (#93260)

25f4ead

[VPlan] Remove unused Range arg from createWidenInductionRecipe (NFC).

8364659

The Range argument is not used by createWidenInductionRecipe; induction classification applies across the whole range of VFs. Remove the argument.

topperc and others added 21 commits May 28, 2024 12:49

[ci] limit parallel windows compile jobs to 24 (#93329)

d9dec10

This is an experiment to see if we can prevent some of the compiler OOMs happening without unduly impacting the Windows build latency.

[clang-tidy] Optimize realpath in readability-identifier-naming (#92659)

c96860a

- Reduce disk IO usage by adding a cache to a realpath call introduced by #81985

[clang-tidy][NFC] Update identifier-length.rst (#93467)

0aacef3

Swapped code blocks of parameter and variable, which have been confused (in a clang-tidy doc file)

[WebAssembly] Rename old EH tests to *-legacy (#93585)

c108c1e

I think test files for the legacy and the new EH (exnref) are better kept separate, and I'd like to use the current test file names for the new EH, rather than keeping the current files and naming the new ones as `-new` or something.

[clang-repl] Even more tests create the Interpreter and must check ho…

6a47315

…st JIT support (#84758)

DAG: Handle vector splitting for fminnum_ieee/fmaxnum_ieee

98fa0f6

Avoids regression in future commit which starts producing illegal instances.

[Clang][NFC] remove CHAR_PUNCT duplication introduced by #93216 (#93605)

bbca20f

Fix build: [clang-repl] Even more tests create the Interpreter and mu…

df542e1

…st check host JIT support (#84758) fea7399 had removed the unused function that was still there when I tested.

[SCEV] Add tests for symbolic max BTC requiring predicates.

ed4227a

Add extra tests for llvm/llvm-project#93498.

[OpenACC] Correct serialization of certain clause sub-expressions

e3f74d4

For some reason I was using writeStmtRef when I meant writeStmt, so this corrects that.

[RISCV] Move TRUNCATE_VECTOR_VL combine into a helper function. NFC (…

060b302

…#93574) I plan to add other combines on TRUNCATE_VECTOR_VL.

[flang][cuda] Add bind c to cudadevice procedures (#92822)

00bd2fa

This patch adds bind c names to functions and subroutines in cudadevice so they can be lowered and not hit the intrinsic procedure TODOs.

[CodeGen] Hidden visibility for prof version var (#93496)

765206e

This patch adds hidden visibility to the variable that is used by the single byte counters mode in source-based code coverage.

Merge commit '765206e050453018e861637a08a4520f29238074'

84acabc

[GEN] Update libGenISAIntrinsics

58f1505

Signed-off-by: Whitney Tsang <[email protected]>

whitneywhtsang added the genx Pull requests or issues for genx branch label Jun 4, 2024

whitneywhtsang requested a review from a team June 4, 2024 04:29

whitneywhtsang self-assigned this Jun 4, 2024

whitneywhtsang requested a review from rengolin as a code owner June 4, 2024 04:29

sommerlukas approved these changes Jun 4, 2024

whitneywhtsang merged commit 58f1505 into intel:genx Jun 4, 2024
5 checks passed

whitneywhtsang deleted the merge branch June 4, 2024 14:47

whitneywhtsang mentioned this pull request Jun 8, 2024

Merge OpenAI Triton till June 7th intel/intel-xpu-backend-for-triton#1198

Closed

whitneywhtsang restored the merge branch June 16, 2024 21:43
