Open
Labels: good first issue (Good for newcomers), optimization (Optimizations that can be implemented at the ClangIR level)
Description
CodeGen replaces trivial copy constructors and assignment operators with `memcpy`. ClangIR intentionally doesn't do so when generating CIR, so that all those function calls are available for analysis: https://godbolt.org/z/xcKdzKaWM. Ideally, though, we'd switch to the `memcpy` at some point before generating LLVM IR, to generate better code. One potential idea would be to tag either the functions themselves or the call sites during CIR generation, and then have a later pass (e.g. LoweringPrepare) do the `memcpy` conversion. Search for `isMemcpyEquivalentSpecialMember` under clang/lib/CodeGen for examples of how it handles this.
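To make the equivalence concrete, here is a minimal C++ sketch (not taken from Clang's sources). The `Trivial` struct and the helper functions are hypothetical names chosen for illustration. It only shows that, for a type whose copy constructor is trivial, copy construction and `std::memcpy` leave the destination in the same state, which is the property the rewrite relies on.

```cpp
#include <cassert>
#include <cstring>
#include <type_traits>

// Hypothetical example type: every special member is implicit and trivial.
struct Trivial {
  int id;
  double weight;
  char tag[8];
};

static_assert(std::is_trivially_copy_constructible<Trivial>::value,
              "copy constructor must be trivial for the memcpy rewrite");
static_assert(std::is_trivially_copy_assignable<Trivial>::value,
              "copy assignment must be trivial for the memcpy rewrite");

// What the source program says: an ordinary copy-constructor call.
Trivial copyWithConstructor(const Trivial &a) {
  Trivial b(a);
  return b;
}

// What a lowering may emit instead: a bytewise copy of the whole object.
Trivial copyWithMemcpy(const Trivial &a) {
  Trivial b;
  std::memcpy(&b, &a, sizeof(Trivial));
  return b;
}

// Field-by-field comparison, so padding bytes are never inspected.
bool sameValue(const Trivial &x, const Trivial &y) {
  return x.id == y.id && x.weight == y.weight &&
         std::memcmp(x.tag, y.tag, sizeof(x.tag)) == 0;
}
```

Calling `copyWithConstructor(a)` and `copyWithMemcpy(a)` on the same `a` should give results for which `sameValue` returns true. If a user-provided copy constructor were added to `Trivial`, the `static_assert`s would fail, signalling that the rewrite no longer applies.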
Activity
bcardosolopes commented on Dec 6, 2024
Btw, we have a missing feature for `isMemcpyEquivalentSpecialMember` as well, not sure if it's in all places it should be though.
shubhe25p commented on Dec 12, 2024
Hey Bruno and Shoaib, I'm new to the MLIR/ClangIR space and would love to contribute. I'm not sure I can solve this, but I'd like to try. Can someone give more details on what exactly I need to do, or point me to resources I can look into? Thank you!
smeenai commented on Dec 13, 2024
Welcome aboard!
The ClangIR website has good resources around getting started, building and testing, etc. For this task, we're concerned with two parts of ClangIR:
Clang's CodeGen directly emits `memcpy` instead of calls to trivial copy constructors and assignment operators. CIRGen intentionally diverges from this behavior and emits actual function calls, but we'd also like to replace those calls with a `memcpy` at some later stage in the pipeline.
You can search for `isMemcpyEquivalentSpecialMember` under clang/lib/CodeGen to see the places where it produces `memcpy` instead of function calls. You'll also be able to find the corresponding CIRGen files and functions under `clang/lib/CIR/CodeGen` and see how its behavior diverges.
For introducing the actual `memcpy` operations, there are multiple possible approaches. One idea would be to tag the function calls generated by CIRGen with some attribute like `memcpy_equivalent`, and then have a later pass like LoweringPrepare replace those calls with `memcpy`. You'll get a better sense of the potential designs and their trade-offs as you start working on this, and of course we're available to answer questions and give suggestions.
The ClangIR Discord channel is another good place to get advice or just say hi (you'll need to join the LLVM Discord server first). Keep in mind though that winter holidays are approaching, so Bruno and I will both be only sporadically available from now till early January.
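As a rough illustration of the tag-then-rewrite idea, here is a toy C++ model. Nothing in it is ClangIR API: `Op`, `OpKind`, `emitSpecialMemberCall`, and `lowerTaggedCalls` are hypothetical stand-ins, and the `memcpyEquivalent` flag plays the role of the `memcpy_equivalent` attribute mentioned above. A real implementation would work on MLIR operations rather than a plain vector.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Toy stand-ins for IR operations; these are NOT real ClangIR classes.
enum class OpKind { Call, Copy };

struct Op {
  OpKind kind;
  std::string callee;     // only meaningful while kind == OpKind::Call
  bool memcpyEquivalent;  // hypothetical tag attached during IR generation
};

// Phase 1 (IR generation): always emit a real call so analyses can see it,
// but record whether the callee is a trivial copy special member.
Op emitSpecialMemberCall(const std::string &callee, bool isTrivialCopy) {
  return Op{OpKind::Call, callee, isTrivialCopy};
}

// Phase 2 (a later pass): rewrite every tagged call into a copy operation.
// Returns how many calls were rewritten.
int lowerTaggedCalls(std::vector<Op> &ops) {
  int rewritten = 0;
  for (Op &op : ops) {
    if (op.kind == OpKind::Call && op.memcpyEquivalent) {
      op.kind = OpKind::Copy;
      op.callee.clear();
      op.memcpyEquivalent = false;
      ++rewritten;
    }
  }
  return rewritten;
}
```

The point of splitting the work this way is that anything running between the two phases still sees ordinary calls, while untagged calls (for example to non-trivial copy constructors) pass through the second phase unchanged.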
shubhe25p commented on Dec 13, 2024
Hi Shoaib, thank you so much for the detailed response. I'm setting up my local environment and will continue to explore the code. I'm already on the LLVM Discord. Please enjoy your holidays; I'll try to understand the task better on my own before asking for help.
smeenai commented on Dec 13, 2024
Sounds good! A few other tips which might be useful:
Pass `-Xclang -ast-dump` to clang to dump the AST. That will be the input to both CodeGen and CIRGen, so it's a useful reference. https://godbolt.org/z/G1Mr13366 is an example; note how the implicitly-generated constructors and assignment operator are marked `trivial`.
Each AST node has a corresponding C++ class, e.g. https://clang.llvm.org/doxygen/classclang_1_1CXXMethodDecl.html.
shubhe25p commented on Dec 24, 2024
Thank you! I will take this incrementally and work on it over the holidays. Apologies for the delay in my response, as I am currently wrapping up my internship. I have ClangIR installed on my remote machines and was able to generate the AST, CIR, and other components. I'm currently reviewing the CodeGen code to understand how the AST is passed to the lower layers.
Additionally, I was wondering how frequently ClangIR is merged with the main LLVM project. I tried to build it first and encountered some errors.
bcardosolopes commented on Jan 7, 2025
ClangIR is being incrementally upstreamed to llvm-project; all development should be done in the incubator until we reach the point to move over (which is probably at least 6 months away).
Arthur-Chang016 commented on May 15, 2025
Hi @smeenai @bcardosolopes, nice to meet you!
This issue looks interesting and seems to be unimplemented so far. I've drafted an initial version in #1616.
It can currently shrink the copy construction `Trivial b(a);`, avoiding the big generated copy function.
Instead of emitting `memcpy`, I chose to emit `cir.copy`, for a few reasons:
`cir.copy` looks like a higher-level version of memcpy, and it lets the later LoweringPrepare pass decide which version of memcpy to generate, or even apply other optimizations like memset (just a guess).
`Builder.createCopy()` is implemented. No `cir.memcpy_inline` nor `cir.libc.memcpy` (though I'd be happy to implement them if needed!)
If there's a strong reason to prefer generating memcpy, please let me know; I'm happy to change the approach!
Also, assuming `cir.copy` is acceptable, I had a question: should I call `Builder.createCopy()` directly, or go through `AggExprEmitter::emitCopy()` instead? The latter seems to perform more checks and may be safer.
bcardosolopes commented on May 16, 2025
@Arthur-Chang016 thanks for your interest. You are right, we prefer using `cir.copy` and only later lowering to `memcpy`.
Our rule of thumb is to follow classic CodeGen code as much as possible: if it calls into `emitAggregateCopy`, you should do the same (note that it also ultimately calls into `createCopy()`).