Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

goalc: Some more macOS ARM64 work #3290

Draft
wants to merge 13 commits into
base: master
Choose a base branch
from

Conversation

xTVaser
Copy link
Member

@xTVaser xTVaser commented Jan 3, 2024

Not a whole lot in this PR, but I want to get this off my laptop where it's been sitting for a few months.

Started to split apart the implementation for different architectures and get CodeTester tests passing. At one point I proved to myself that I got it working (it being a single simple instruction) but then things fell apart when i began the grueling refactor.

@xTVaser xTVaser changed the title goalc: Some more MacOS ARM64 work goalc: Some more macOS ARM64 work Jan 4, 2024
@xTVaser xTVaser marked this pull request as ready for review January 6, 2024 22:19
@xTVaser xTVaser requested a review from water111 January 9, 2024 21:51
Copy link
Collaborator

@water111 water111 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice, I think this is a good start.

My main concern at this point is that ARM is actually quite a bit different from x86, so it will be hard to fit the ARM instructions into the existing pattern. I also think that using ifdefs to change between arm/x86 code generation will make it harder to maintain.

I'm looking through the compiler code now, and I'm wondering if we should just do the division at the codegen_object_file level - just totally new code for x86 and ARM at that point. We'll need a new linking format, different logic for picking instructions, different function prologues/epilogues, etc. With this approach, we'd probably want each IR to support an do_codegen_x86 and do_codegen_arm version, which would give a x86Instruction on x86 or ArmInstruction on ARM. I haven't thought this through fully yet.

if (saved_reg.is_xmm()) {
if (saved_reg.is_128bit_simd()) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(unrelated to your changes here) I think that we will probably want an entirely separate do_goal_function for x86 vs. ARM. I think the way that the stack pointer is manipulated will be different, and this currently has some x86-specific things like sub_gpr64_imm8s which is using an x86-only smaller size instruction because an immediate is only 8-bits.

if (saved_reg.is_xmm()) {
if (saved_reg.is_128bit_simd()) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It might be better to keep these as is_xmm. I think having both an is_xmm (true only for x86 XMM) and an is_128bit_simd (true for both arm/x86 vector regs) is good, rather than trying to have a abstraction over "general purpose" and "128bit" registers at this level.

ARM will probably have different code for loading/storing register to the stack. It has instructions for storing multiple registers at once, rather than a sequence of push instructions like we used on x86.

@@ -862,7 +862,8 @@ RegAllocInstr IR_ConditionalBranch::to_rai() {
void IR_ConditionalBranch::do_codegen(emitter::ObjectGenerator* gen,
const AllocationResult& allocs,
emitter::IR_Record irec) {
Instruction jump_instr(0);
#ifndef __aarch64__
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd rather keep the ability for the compiler to generate either ARM or x86 when built on either ARM or x86.

In C++, the code inside the #else here won't be checked by the compiler at all on x86, which will make it harder for developers on x86 to know if their changes broke the build on ARM.

if (cc.return_reg && cc.return_reg->is_xmm()) {
if (cc.return_reg && cc.return_reg->is_128bit_simd()) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 I think this is a good place to use is_128bit_simd over is_xmm

InstructionInfo(const emitter::Instruction& _instruction, Kind _kind, int _ir_idx)
InstructionInfo(const emitter::Instruction _instruction, Kind _kind, int _ir_idx)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Based on what I've read above, I think that you have classes which inherit from Instruction, and you'll pass them here. This will cause the copy constructor of Instruction (the base class) to be called, which will cause slicing (see https://stackoverflow.com/questions/274626/what-is-object-slicing#274636). This ends up being undefined behavior and probably means that any virtual methods of Instruction that you call will crash or silently do the wrong thing.

The way around this is usually to pass a std::unique_ptr<Instruction> or std::shared_ptr<Instruction>... but it might be worth checking what we actually need an instruction for here. As far as I see, we just need the length, rather than the whole instruction.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(oops - I think I'm wrong since Instruction is just InstructionX86 or InstructionARM)
In that case, can we keep this as a const ref?

@@ -67,23 +67,37 @@ void CodeTester::emit_return() {
* Pops RSP always, which is weird, but doesn't cause issues.
*/
void CodeTester::emit_pop_all_gprs(bool exclude_rax) {
#ifndef __aarch64__
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it would be better not to have ifdef ARM and instead have some runtime variable. Otherwise it'll be really hard to make sure goalc actually builds on x86 and ARM.

Comment on lines +136 to +141
#if defined(__APPLE__) && defined(__aarch64__)
mprotect(code_buffer, code_buffer_capacity, PROT_EXEC | PROT_READ);
auto ret = ((u64(*)())code_buffer)();
mprotect(code_buffer, code_buffer_capacity, PROT_WRITE | PROT_READ);
return ret;
#else
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 this is a good place to use the ifdef - it's relatively simple, and the mprotect doesn't exist on windows.

/*!
* Move data from src to dst. Moves all 64-bits of the GPR.
*/
extern Instruction mov_gpr64_gpr64(Register dst, Register src);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you don't need extern here.

ASSERT(src.is_gpr());
InstructionX86 instr(0x89);
instr.set_modrm_and_rex(src.hw_id(), dst.hw_id(), 3, true);
return instr;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Returning an InstructionX86 by value like this will also cause slicing


Instruction movd_gpr32_xmm32(Register dst, Register src) {
ASSERT(dst.is_gpr());
ASSERT(src.is_128bit_simd());
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a case when we should keep is_xmm I think. In this case, we're only moving the lower 32 bits between an XMM and GPR, so the xmm is being used as a 32-bit floating point register. I tried to keep the concept of "register type" and "register size" separate, since there's a lot of cases where you only use part of a register (x86 GPRs are 64-bits, but for some goal ops we treat them as 32-bits too).

@xTVaser xTVaser marked this pull request as draft February 17, 2024 18:38
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Status: No status
Development

Successfully merging this pull request may close these issues.

2 participants