Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[AutoBump] Merge with fixes of 0a17bdfc (Oct 15) (14) (Needs ONNX Bump)(Needs downstream changes)(Needs Torch Bump) #451

Open
wants to merge 863 commits into
base: feature/fused-ops
Choose a base branch
from

Conversation

jorickert
Copy link

@jorickert jorickert commented Jan 14, 2025

andfau-amd and others added 30 commits October 21, 2024 06:55
…vm#111575)

Adds a new mlir-opt test-only pass, -test-spirv-cpu-runner-pipeline,
which runs the set of MLIR passes needed for the mlir-spirv-cpu-runner,
and removes them from the runner. The tests are changed to invoke
mlir-opt with this flag before running the runner. The eventual goal is
to move all host/device code generation steps out of the runner, like
with some of the other runners.
6bac414 added this opcode with the wrong
number of operands. It didn't fail on check-llvm for me or on pre-commit CI,
but once committed we got buildbot failures. This patch fixes the definition
of the instruction and fixes the failing test.
With the truncssat nodes these are relatively simple tablegen patterns to add.
The existing intrinsics are converted to shift+truncsat to they can lower using
the new patterns.

Fixes llvm#112925.
This patch adds assembly/disassembly for the following instructions:
   ldfadd{a,al,l,}, ldbfadd{a,al,l,}
   ldfmax{a,al,l,}, ldbfmax{a,al,l,}
   ldfmaxnm{a,al,l,}, ldbfmaxnm{a,al,l,}
   ldfmin{a,al,l,}, ldbfmin{a,al,l,}
   ldfminnm{a,al,l,} ldbfminnm{a,al,l,}
   stfadd{l,}, stbfadd{l,}
   stfmax{l,}, stbfmax{l,}
   stfmaxnm{l,}, stbfmaxnm{l,}
   stfmin{l,}, stbfmin{l,}
   stfminnm{l,}, stbfminnm{l,}

According to [1]

[1]https://developer.arm.com/documentation/ddi0602

Co-authored-by: Spencer Abson
[[email protected]](mailto:[email protected])
Co-authored-by: Caroline Concatto
[[email protected]](mailto:[email protected])
Add new register classes/operands and their encoder/decoder behaviour
required for the new Armv9.6 instructions (see
https://developer.arm.com/documentation/109697/2024_09/Feature-descriptions/The-Armv9-6-architecture-extension).

This work is the basis ofthe 2024 Armv9.6 architecture update effort for
SME.

Co-authored-by: Caroline Concatto [email protected]
Co-authored-by: Marian Lukac [email protected]
Co-authored-by: Momchil Velikov [email protected]
Improve the codegen for uaddo node for i64 in 64-bit mode and i32 in
32-bit mode by custom lowering.
Test added by commit 47a6da2 fails on the AIX bot. So XFAIL for now to investigate further.
Previously we were attempting to remove the memprof-related metadata
when iterating through instructions in the LTO backend. However, we
missed some as there are a number of cases where we skip instructions,
or even entire functions. Simplify the cleanup and ensure all is removed
by doing a full sweep over all instructions after completing cloning.

This is largely NFC except with -memprof-report-hinted-sizes enabled,
because we were propagating and simplifying the metadata after inlining
in the LTO backend, which caused some stray messages as metadata was
re-converted to attributes.
  movzx  r11d,BYTE PTR [rdx]

is four bytes long.

Follow-up to llvm#111638
Remove the unused functions and register classes from the change below
llvm@4679583
Root gather/buildvector node should be ignored when SLP vectorizer tries
to find matching gather nodes, vectorized earlier. This node is
definitely the last one in the pipeline and it does not have users. It
may cause the compiler crash

Fixes llvm#113143
This patch adds a LegalityResultWithReason class for describing the
reason why legality decided not to vectorize the code.
…llvm#109850)

Special case small int constant in the PPC custom lowering of
scalar_to_vector.
This is one of the many PRs to fix errors with LLVM_ENABLE_WERROR=on.
Built by GCC 11.

Fix warning

In destructor ‘llvm::APInt::~APInt()’,
inlined from ‘llvm::APInt::~APInt()’ at
llvm-project/llvm/include/llvm/ADT/APInt.h:190:3,
inlined from ‘llvm::APSInt::~APSInt()’ at
llvm-project/llvm/include/llvm/ADT/APSInt.h:23:21,
inlined from ‘bool
checkOMPArraySectionConstantForReduction(clang::ASTContext&, const
clang::ArraySectionExpr*, bool&, llvm::SmallVectorImpl<llvm::APSInt>&)’
at llvm-project/clang/lib/Sema/SemaOpenMP.cpp:18357:45,
inlined from ‘bool actOnOMPReductionKindClause(clang::Sema&,
{anonymous}::DSAStackTy*, clang::OpenMPClauseKind,
llvm::ArrayRef<clang::Expr*>, clang::SourceLocation,
clang::SourceLocation, clang::SourceLocation, clang::SourceLocation,
clang::CXXScopeSpec&, const clang::DeclarationNameInfo&,
llvm::ArrayRef<clang::Expr*>, {anonymous}::ReductionData&)’ at
llvm-project/clang/lib/Sema/SemaOpenMP.cpp:18715:68:
llvm-project/llvm/include/llvm/ADT/APInt.h:192:18: error: ‘void operator
delete [](void*)’ called on a pointer to an unallocated object ‘1’
[-Werror=free-nonheap-object]
  192 |       delete[] U.pVal;
      |                  ^~~~
This patch fixes:

  llvm/include/llvm/Transforms/Vectorize/SandboxVectorizer/Legality.h:85:16:
  error: private field 'Reason' is not used
  [-Werror,-Wunused-private-field]
…2765)

MSG_DEALLOC_VGPRS slows down very small waveslot limited kernels. It's
been identified this message is only really needed for VGPR limited
kernels. A kernel becomes VGPR limited if a total number of VGPRs per
SIMD / number of used VGPRs is more than a number of wave slots.
…#112990)

Renames LegalizeData to LegalizeDataValues since this pass fixes up SSA
values. LegalizeData suggested that it fixed data mapping.

This change also adds support to fix up ssa values for data clause
operations. Effectively, compute regions within a data region use the
ssa values from data operations also. The ssa values within data regions
but not within compute regions are not updated.

This change is to support the requirement in the OpenACC spec which
notes that a visible data clause is not just one on the current compute
construct but on the lexically containing data construct or visible
declare directive.
Based on this RFC:
https://discourse.llvm.org/t/rfc-allow-the-scalarizer-pass-to-scalarize-vectors-returned-in-structs/82306

LLVM intrinsics do not support out params. To get around this limitation
implementers will make intrinsics return structs to capture a return
type and an out param. This implementation detail should not impact
scalarization since these cases should be elementwise operations.

## Three changes are needed. 
- The CallInst visitor needs to be updated to handle Structs
- A new visitor is needed for `ExtractValue` instructions
- finsh needs to be update to handle structs so that insert elements are
properly propogated.

## Testing changes
- Add support for `llvm.frexp`
- Add support for `llvm.dx.splitdouble`

fixes llvm#111437
…vm#112997)

The x86-fold-tables.td has been failing for me and [in
CI](https://buildkite.com/llvm-project/github-pull-requests/builds/111277#0192a122-c5c9-4e4e-bc5b-7532fec99ae4)
if Git happens to decide to check out the baseline file with Windows
line endings.

This fix for this is to add the `--strip-trailing-cr` option to diff to
normalize the line endings before comparing them.
…lvm#112995)

llvm#98060 introduced a warning for unterminated string constants, however
it was only checking for `\n` which means that it produced strange
results on Windows (always blaming column 1) including having the
[associated test
fail](https://buildkite.com/llvm-project/github-pull-requests/builds/111277#0192a122-c5c9-4e4e-bc5b-7532fec99ae4)
if Git happened to use Windows newlines when creating the file.

This fix for this is to detect both `\r` and `\n`, but don't double-warn
for Windows newlines.
weliveindetail and others added 11 commits October 23, 2024 14:32
llvm#111531)

Bot maintainers should be aware and it became too much of a burden
for developers. In particular on Windows, where make.exe won't be
found in Path typically.
…2867)

The Intel C++ Compiler (ICX) passes linker flags through the driver
unlike MSVC and clang-cl, and therefore needs them to be prefixed with
`/Qoption,link` (the equivalent of -Wl, for gcc on *nix).

Use the `LINKER:` prefix for the `/EXPORT:` options in clang-repl, this
expands to the correct flag for ICX and nothing for MSVC / clang-cl.

RFC:
https://discourse.llvm.org/t/rfc-cmake-linker-flags-need-wl-equivalent-for-intel-c-icx-on-windows/82446
These two veclibs are only available for AArch64 targets, and as
mentioned in https://discourse.llvm.org/t/rfc-should-fveclib-imply-fno-math-errno-for-all-targets/81384,
we (Arm) think that `-fveclib` should imply `-fno-math-errno`. By
setting `-fveclib` the user shows they intend to use the vector math
functions, which implies they don't care about errno. However,
currently, the vector mappings won't be used in many cases without
setting `-fno-math-errno` separately.

Making this change would also help resolve some inconsistencies in how
vector mappings are applied (see llvm#108980 (comment)).

Note: Both SLEEF and ArmPL state that they do not set `errno`:

- https://developer.arm.com/documentation/101004/2410/General-information/Arm-Performance-Libraries-math-functions
  * "The vector functions in libamath which are available on Linux may not set errno nor raise exceptions"
- https://sleef.org/2-references/libm/
  *  "These functions do not set errno nor raise an exception."
…llvm#113167)

Define `OmpIteratorSpecifier` and `OmpIteratorModifier` parser classes,
and add parsing for them. Those are reusable between any clauses that
use iterator modifiers.

Add support for iterator modifiers to the MAP clause up to lowering,
where a TODO message is emitted.
…at container-inserter does (llvm#113103)

This patch implements LWG4016: container-insertable checks do not match
what container-inserter does.
When compiling for an SVE target we can use INDEX to generate constant
fixed-length step vectors, e.g.:
```
uint32x4_t foo() {
  return (uint32x4_t){0, 1, 2, 3};
}
```
Currently:
```
foo():
        adrp    x8, .LCPI1_0
        ldr     q0, [x8, :lo12:.LCPI1_0]
        ret
```
With INDEX:
```
foo():
        index   z0.s, #0, #1
        ret
```

The logic for this was already in `LowerBUILD_VECTOR`, though it was
hidden under a check for `!Subtarget->isNeonAvailable()`. This patch
refactors this to enable the corresponding code path unconditionally for
constant step vectors (as long as we can use SVE for them).
…sult type (llvm#113031)

This commit simplifies the result type of materialization functions.

Previously: `std::optional<Value>`
Now: `Value`

The previous implementation allowed 3 possible return values:
- Non-null value: The materialization function produced a valid
materialization.
- `std::nullopt`: The materialization function failed, but another
materialization can be attempted.
- `Value()`: The materialization failed and so should the dialect
conversion. (Previously: Dialect conversion can roll back.)

This commit removes the last variant. It is not particularly useful
because the dialect conversion will fail anyway if all other
materialization functions produced `std::nullopt`.

Furthermore, in contrast to type conversions, at least one
materialization callback is expected to succeed. In case of a failing
type conversion, the current dialect conversion can roll back and try a
different pattern. This also used to be the case for materializations,
but that functionality was removed with llvm#107109: failed materializations
can no longer trigger a rollback. (They can just make the entire dialect
conversion fail without rollback.) With this in mind, it is even less
useful to have an additional error state for materialization functions.

This commit is in preparation of merging the 1:1 and 1:N type
converters. Target materializations will have to return multiple values
instead of a single one. With this commit, we can keep the API simple:
`SmallVector<Value>` instead of `std::optional<SmallVector<Value>>`.

Note for LLVM integration: All 1:1 materializations should return
`Value` instead of `std::optional<Value>`. Instead of `std::nullopt`
return `Value()`.
@jorickert jorickert changed the title [AutoBump] Merge with fixes of 0a17bdfc (Oct 15) (14) [AutoBump] Merge with fixes of 0a17bdfc (Oct 15) (14) (Needs ONNX Bump) Jan 14, 2025
Base automatically changed from bump_to_f035d9f0 to feature/fused-ops January 23, 2025 14:10
@mgehre-amd
Copy link
Collaborator

Does a corresponding PR exist on Xilinx/onnx-mlir? If yes, could you please link them?

@jorickert
Copy link
Author

Xilinx/onnx-mlir#272 I will update it today

@jorickert
Copy link
Author

I update the onnx-bump, but we will need to include #455 in the same bump too

[AutoBump] Merge with a8d506b (Oct 22) (15)
[AutoBump] Merge with 8a9921f (Oct 23) (17)
[AutoBump] Merge with fixes of 519eef3 (Oct 22) (16) (Needs downstream changes)
@jorickert jorickert changed the title [AutoBump] Merge with fixes of 0a17bdfc (Oct 15) (14) (Needs ONNX Bump) [AutoBump] Merge with fixes of 0a17bdfc (Oct 15) (14) (Needs ONNX Bump)(Needs downstream changes) Jan 24, 2025
[AutoBump] Merge with fixes of f18c3e4 (Oct 23) (18) (Needs ONNX Bump)(Needs Torch Bump)
@jorickert jorickert changed the title [AutoBump] Merge with fixes of 0a17bdfc (Oct 15) (14) (Needs ONNX Bump)(Needs downstream changes) [AutoBump] Merge with fixes of 0a17bdfc (Oct 15) (14) (Needs ONNX Bump)(Needs downstream changes)(Needs Torch Bump) Jan 27, 2025
@mgehre-amd mgehre-amd self-requested a review January 28, 2025 07:12
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment