Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[CIR][CodeGen] Support for _mm_prefetch . #1522

Open
wants to merge 5 commits into
base: main
Choose a base branch
from

Conversation

shrikardongre
Copy link
Contributor

This draft PR needs review
I have to confirm if the implementation in the clang/lib/CIR/CodeGen/CIRGenBuiltinX86.cpp is correct.
also to add a proper FileCheck instruction in the clang/test/CIR/CodeGen/X86/builtins-x86.c .

shrikardongre and others added 2 commits March 25, 2025 01:26

Verified

This commit was created on GitHub.com and signed with GitHub’s verified signature.
@@ -95,7 +95,16 @@ mlir::Value CIRGenFunction::emitX86BuiltinExpr(unsigned BuiltinID,
default:
return nullptr;
case X86::BI_mm_prefetch: {
llvm_unreachable("_mm_prefetch NYI");
mlir::Location loc = mlir ::UnknownLoc::get(&getMLIRContext());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The location is known here: you can use getLoc(E->getExprLoc()) to retrieve it.

auto C = dyn_cast<cir::IntAttr>(
cast<cir::ConstantOp>(Ops[1].getDefiningOp()).getValue());
auto Loc = mlir::IntegerAttr::get(
mlir::IntegerType::get(builder.getContext(), 64), C.getUInt());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You probably forgot an & 0x3 after C.getUInt(), and the integer type should be 32-bit rather than 64.
See OG:

Value *Locality = ConstantInt::get(Int32Ty, C->getZExtValue() & 0x3);

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh yes , I was just trying something and didn't remove it from the commit. Thanks.

mlir::UnitAttr Write =
ReadWrite ? mlir::UnitAttr::get(&getMLIRContext()) : nullptr;
return builder.create<cir::PrefetchOp>(loc, Ops[0], Loc, Write)
->getResult(0);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cir::PrefetchOp expects a void*, but Ops[0] may be some other pointer type or an array type. You need to add builder.createBitcast(...) to convert between them.

Moreover, the PrefetchOp doesn't have a result, so ->getResult(0) will raise an error at runtime. But we can't return a nullptr or {} either, because the emitBuiltinExpr function will think these return values mean "unimplemented". Not sure what to return.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We have a cir::CastOp will implement using that 👍🏻

Copy link
Contributor Author

@shrikardongre shrikardongre Mar 26, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@AdUhTkJm @bcardosolopes I am finding it a little hard to implement the CastOp by just looking at the tablegen definition and the implementation given at https://llvm.github.io/clangir/Dialect/ops.html#circast-circastop

def CastOp : CIR_Op<"cast",
             [Pure,
              DeclareOpInterfaceMethods<PromotableOpInterface>]> {
operation ::= `cir.cast` `(` $kind `,` $src `:` type($src) `)`
              `,` type($result) attr-dict

I am not able to find any implementation can you suggest some source ??

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also I had a doubt , the Ops[0] doesnt seem to be a pointer type

llvm::SmallVector<mlir::Value, 4U> Ops

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can look at CIRGenBuilder.h and CIRGenBaseBuilder.h to find some useful functions. For example, to create a bitcast, obtain a builder and write builder.createBitcast(...).

The website doesn't give a full implementation of ops. Try looking at CIROps.td. To see what the fields mean, Operation Definition Specification on MLIR website will be helpful. You might also refer to TableGen programmer's reference for TableGen syntax.

Ops[0] is an mlir::Value; itself isn't a type, but it has some type (you can get it with getType()). In PrefetchOp the value is required to have a void pointer type.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks ,will look into it 👍

Verified

This commit was created on GitHub.com and signed with GitHub’s verified signature.
@shrikardongre
Copy link
Contributor Author

 auto CasTtoVoid = builder.create<cir::CastOp>(
        loc, voidPtrType, cir::CastKind::bitcast, Ops[0]);

this is what I was able to write by taking the reference of the

 mlir::Value createFloatingCast(mlir::Value v, mlir::Type destType) {
   if (getIsFPConstrained())
     llvm_unreachable("constrainedfp NYI");

   return create<cir::CastOp>(v.getLoc(), destType, cir::CastKind::floating,
                              v);
 }

in clang/lib/CIR/CodeGen/CIRGenBuilder.h
Now the only problem to solve is what should we return right ?

@shrikardongre
Copy link
Contributor Author

shrikardongre commented Mar 27, 2025

the Current implementation is

case X86::BI_mm_prefetch: {
    mlir::Location loc = getLoc(E->getExprLoc());
    auto C = dyn_cast<cir::IntAttr>(
        cast<cir::ConstantOp>(Ops[1].getDefiningOp()).getValue());
    auto Locality = mlir::IntegerAttr::get(
        mlir::IntegerType::get(builder.getContext(), 32), C.getUInt() & 0x3);
    bool ReadWrite = ((C.getUInt() >> 2) & 0x1) == 1;
    mlir::UnitAttr isWrite =
        ReadWrite ? mlir::UnitAttr::get(&getMLIRContext()) : nullptr;
    mlir::Type voidPtrType =
        cir::PointerType::get(cir::VoidType::get(&getMLIRContext()));
    auto CastToVoid = builder.create<cir::CastOp>(
        loc, voidPtrType, cir::CastKind::bitcast, Ops[0]);
    builder.create<cir::PrefetchOp>(loc, CastToVoid, Locality,
                                    isWrite) ;
  }

this is without a return type using a PrefetchOp now , and finally this was the first time i did not get a stack dump :)
I was able to generate the CIR that is :

!s32i = !cir.int<s, 32>
!s8i = !cir.int<s, 8>
!void = !cir.void
#fn_attr = #cir<extra({inline = #cir.inline<no>, nothrow = #cir.nothrow, optnone = #cir.optnone})>
module @"/home/shrikardongre/clangir/clang/test/CIR/CodeGen/X86/new.cc" attributes {cir.lang = #cir.lang<cxx>, cir.sob = #cir.signed_overflow_behavior<undefined>, cir.triple = "x86_64-unknown-linux-gnu", cir.type_size_info = #cir.type_size_info<char = 8, int = 32, size_t = 64>, dlti.dl_spec = #dlti.dl_spec<!llvm.ptr = dense<64> : vector<4xi64>, i1 = dense<8> : vector<2xi64>, i8 = dense<8> : vector<2xi64>, !llvm.ptr<271> = dense<32> : vector<4xi64>, !llvm.ptr<270> = dense<32> : vector<4xi64>, f128 = dense<128> : vector<2xi64>, f80 = dense<128> : vector<2xi64>, i64 = dense<64> : vector<2xi64>, i32 = dense<32> : vector<2xi64>, !llvm.ptr<272> = dense<64> : vector<4xi64>, i16 = dense<16> : vector<2xi64>, i128 = dense<128> : vector<2xi64>, f16 = dense<16> : vector<2xi64>, f64 = dense<64> : vector<2xi64>, "dlti.endianness" = "little", "dlti.stack_alignment" = 128 : i64>} {
  cir.func @main() -> !s32i extra(#fn_attr) {
    %0 = cir.alloca !s32i, !cir.ptr<!s32i>, ["__retval"] {alignment = 4 : i64} loc(#loc2)
    %1 = cir.alloca !cir.ptr<!s8i>, !cir.ptr<!cir.ptr<!s8i>>, ["p"] {alignment = 8 : i64} loc(#loc9)
    %2 = cir.load %1 : !cir.ptr<!cir.ptr<!s8i>>, !cir.ptr<!s8i> loc(#loc5)
    %3 = cir.const #cir.int<0> : !s32i loc(#loc6)
    %4 = cir.cast(bitcast, %2 : !cir.ptr<!s8i>), !cir.ptr<!void> loc(#loc7)
    cir.prefetch(%4 : !cir.ptr<!void>) locality(0) read loc(#loc7)
    %5 = cir.llvm.intrinsic "x86.sse2.clflush" %2 : (!cir.ptr<!s8i>) -> !void loc(#loc7)
    %6 = cir.load %0 : !cir.ptr<!s32i>, !s32i loc(#loc2)
    cir.return %6 : !s32i loc(#loc2)
  } loc(#loc8)
} loc(#loc)
#loc = loc("/home/shrikardongre/clangir/clang/test/CIR/CodeGen/X86/new.cc":0:0)
#loc1 = loc("new.cc":1:1)
#loc2 = loc("new.cc":4:1)
#loc3 = loc("new.cc":2:5)
#loc4 = loc("new.cc":2:17)
#loc5 = loc("new.cc":3:18)
#loc6 = loc("new.cc":3:20)
#loc7 = loc("new.cc":3:5)
#loc8 = loc(fused[#loc1, #loc2])
#loc9 = loc(fused[#loc3, #loc4])

the test file is :

int main(){
    char const *p;
    _mm_prefetch(p,0);
}

But the generated CIR seems to be incorrect :( (if i am not wrong).

@AdUhTkJm
Copy link
Contributor

AdUhTkJm commented Mar 27, 2025

auto CasTtoVoid = builder.createcir::CastOp(
loc, voidPtrType, cir::CastKind::bitcast, Ops[0]);

I believe there is builder.createBitcast that you can directly use. This is also correct, but it might be more readable if we reuse existing facilities.

But the generated CIR seems to be incorrect

Without an return, the switch-case falls through to the next branch for _mm_clflush, and that's why you'll get extra instructions.

We still need to return a mlir::Value here, but the PrefetchOp doesn't have any results so we can't return it; moreover, an empty return will make emitBuiltinExpr think this intrinsic isn't implemented, and output an error diagnostics. Not sure what to do, need to seek advice.

@shrikardongre
Copy link
Contributor Author

shrikardongre commented Mar 27, 2025

Yes there are several occurances of builder.createBitcast in the OG file . I was just more curious into using the CIROps as @bcardosolopes mentioned me to use the PrefetchOp instead of the LLVMIntrinsicCallOp .
my doubt was can we cast a mlir value using it ? Will have to look its implementation once .

Anyways just making the file PR open for review to seek for advice .
Learning a lot from you @AdUhTkJm , Thanks for the help !!

@shrikardongre shrikardongre marked this pull request as ready for review March 27, 2025 11:15
@bcardosolopes
Copy link
Member

Thanks for the review @AdUhTkJm

I was just more curious into using the CIROps as ... mentioned me to use the PrefetchOp instead of the LLVMIntrinsicCallOp

This is orthogonal to the cast feedback

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants