Reland "[ARM] Stop gluing FP comparisons to FMSTAT" #117248
[ARM] Stop gluing FP comparisons to FMSTAT (llvm#116676)

Following llvm#116547, this changes the result of `ARMISD::CMPFP*` and the operand of `ARMISD::FMSTAT` from a special `Glue` type to a normal type.

This change allows comparisons to be CSEd and scheduled around, as can be seen in the test changes.

Note that `ARMISD::FMSTAT` is still glued to its consumer nodes; this is going to be changed in a separate patch.

This patch also sets `CopyCost` of the `cl_FPSCR_NZCV` register class to a negative value. The reason is the same as for the CCR register class: it makes the DAG scheduler and InstrEmitter try to avoid copies of the `FPSCR_NZCV` register to / from virtual registers. Previously, this was not necessary, since no attempt was made to create copies in the first place.

`TRI::getCrossCopyRegClass` is modified in a way that prevents the DAG scheduler from copying FPSCR into a virtual register. The register allocator might need to spill the virtual register, but that only seems to work in Thumb mode.
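To illustrate why dropping `Glue` lets comparisons be CSEd, here is a minimal toy model. It is not LLVM code: `ToyDAG`, `ValueKind`, `NodeId` and `nodesAfterTwoIdenticalCompares` are hypothetical names invented for this sketch. It assumes, as SelectionDAG does to my understanding, that nodes producing a glue result are never uniqued, while nodes with ordinary result types are looked up by opcode, result type and operands before a new node is created.

```cpp
// Toy model only; none of these types exist in LLVM.
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <tuple>
#include <utility>
#include <vector>

// Hypothetical stand-ins for the result types discussed in the description.
enum class ValueKind { Glue, Flags, Other };

using NodeId = int;

struct Node {
  std::string Opcode;
  ValueKind Result;
  std::vector<NodeId> Operands;
};

class ToyDAG {
  using Key = std::tuple<std::string, ValueKind, std::vector<NodeId>>;

  std::vector<Node> Nodes;
  std::map<Key, NodeId> Unique; // CSE table for non-glue nodes

  NodeId create(const std::string &Opcode, ValueKind Result,
                const std::vector<NodeId> &Operands) {
    Nodes.push_back(Node{Opcode, Result, Operands});
    return static_cast<NodeId>(Nodes.size()) - 1;
  }

public:
  NodeId getNode(const std::string &Opcode, ValueKind Result,
                 const std::vector<NodeId> &Operands = {}) {
    // Assumption of this model: a glue-producing node is never shared,
    // so every request creates a fresh node.
    if (Result == ValueKind::Glue)
      return create(Opcode, Result, Operands);

    // Any other node is reused when an identical one already exists.
    Key K(Opcode, Result, Operands);
    auto It = Unique.find(K);
    if (It != Unique.end())
      return It->second;
    NodeId Id = create(Opcode, Result, Operands);
    Unique.emplace(std::move(K), Id);
    return Id;
  }

  std::size_t size() const { return Nodes.size(); }
};

// Requests the same comparison twice and returns the total node count.
std::size_t nodesAfterTwoIdenticalCompares(ValueKind CmpResult) {
  ToyDAG DAG;
  NodeId LHS = DAG.getNode("lhs", ValueKind::Other);
  NodeId RHS = DAG.getNode("rhs", ValueKind::Other);
  DAG.getNode("CMPFP", CmpResult, {LHS, RHS});
  DAG.getNode("CMPFP", CmpResult, {LHS, RHS});
  return DAG.size();
}
```

In this model, `nodesAfterTwoIdenticalCompares(ValueKind::Glue)` yields four nodes (two operands plus two separate comparisons), whereas `nodesAfterTwoIdenticalCompares(ValueKind::Flags)` yields three because the second request reuses the first comparison. This is also why the rewritten `duplicateCmp` in the diff can pass the existing flags value to a new `FMSTAT` instead of rebuilding the comparison node.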
@llvm/pr-subscribers-backend-arm
Author: Sergei Barannikov (s-barannikov)
Patch is 365.32 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/117248.diff 19 Files Affected:
diff --git a/llvm/lib/Target/ARM/ARMBaseRegisterInfo.cpp b/llvm/lib/Target/ARM/ARMBaseRegisterInfo.cpp
index aad305cce03961..a1f068f0e049bd 100644
--- a/llvm/lib/Target/ARM/ARMBaseRegisterInfo.cpp
+++ b/llvm/lib/Target/ARM/ARMBaseRegisterInfo.cpp
@@ -299,6 +299,8 @@ const TargetRegisterClass *
ARMBaseRegisterInfo::getCrossCopyRegClass(const TargetRegisterClass *RC) const {
if (RC == &ARM::CCRRegClass)
return &ARM::rGPRRegClass; // Can't copy CCR registers.
+ if (RC == &ARM::cl_FPSCR_NZCVRegClass)
+ return &ARM::rGPRRegClass;
return RC;
}
diff --git a/llvm/lib/Target/ARM/ARMISelLowering.cpp b/llvm/lib/Target/ARM/ARMISelLowering.cpp
index 84b37ae6833aed..6b290135c5bcba 100644
--- a/llvm/lib/Target/ARM/ARMISelLowering.cpp
+++ b/llvm/lib/Target/ARM/ARMISelLowering.cpp
@@ -4971,14 +4971,14 @@ SDValue ARMTargetLowering::getVFPCmp(SDValue LHS, SDValue RHS,
SelectionDAG &DAG, const SDLoc &dl,
bool Signaling) const {
assert(Subtarget->hasFP64() || RHS.getValueType() != MVT::f64);
- SDValue Cmp;
+ SDValue Flags;
if (!isFloatingPointZero(RHS))
- Cmp = DAG.getNode(Signaling ? ARMISD::CMPFPE : ARMISD::CMPFP,
- dl, MVT::Glue, LHS, RHS);
+ Flags = DAG.getNode(Signaling ? ARMISD::CMPFPE : ARMISD::CMPFP, dl, FlagsVT,
+ LHS, RHS);
else
- Cmp = DAG.getNode(Signaling ? ARMISD::CMPFPEw0 : ARMISD::CMPFPw0,
- dl, MVT::Glue, LHS);
- return DAG.getNode(ARMISD::FMSTAT, dl, MVT::Glue, Cmp);
+ Flags = DAG.getNode(Signaling ? ARMISD::CMPFPEw0 : ARMISD::CMPFPw0, dl,
+ FlagsVT, LHS);
+ return DAG.getNode(ARMISD::FMSTAT, dl, MVT::Glue, Flags);
}
/// duplicateCmp - Glue values can have only one use, so this function
@@ -4991,15 +4991,11 @@ ARMTargetLowering::duplicateCmp(SDValue Cmp, SelectionDAG &DAG) const {
return DAG.getNode(Opc, DL, MVT::Glue, Cmp.getOperand(0),Cmp.getOperand(1));
assert(Opc == ARMISD::FMSTAT && "unexpected comparison operation");
- Cmp = Cmp.getOperand(0);
- Opc = Cmp.getOpcode();
- if (Opc == ARMISD::CMPFP)
- Cmp = DAG.getNode(Opc, DL, MVT::Glue, Cmp.getOperand(0),Cmp.getOperand(1));
- else {
- assert(Opc == ARMISD::CMPFPw0 && "unexpected operand of FMSTAT");
- Cmp = DAG.getNode(Opc, DL, MVT::Glue, Cmp.getOperand(0));
- }
- return DAG.getNode(ARMISD::FMSTAT, DL, MVT::Glue, Cmp);
+ SDValue Flags = Cmp.getOperand(0);
+ assert((Flags.getOpcode() == ARMISD::CMPFP ||
+ Flags.getOpcode() == ARMISD::CMPFPw0) &&
+ "unexpected operand of FMSTAT");
+ return DAG.getNode(ARMISD::FMSTAT, DL, MVT::Glue, Flags);
}
// This function returns three things: the arithmetic computation itself
diff --git a/llvm/lib/Target/ARM/ARMInstrVFP.td b/llvm/lib/Target/ARM/ARMInstrVFP.td
index 5b49f728ebb8d8..a29753909ea992 100644
--- a/llvm/lib/Target/ARM/ARMInstrVFP.td
+++ b/llvm/lib/Target/ARM/ARMInstrVFP.td
@@ -10,7 +10,17 @@
//
//===----------------------------------------------------------------------===//
-def SDT_CMPFP0 : SDTypeProfile<0, 1, [SDTCisFP<0>]>;
+def SDT_CMPFP : SDTypeProfile<1, 2, [
+ SDTCisVT<0, FlagsVT>, // out flags
+ SDTCisFP<1>, // lhs
+ SDTCisSameAs<2, 1> // rhs
+]>;
+
+def SDT_CMPFP0 : SDTypeProfile<1, 1, [
+ SDTCisVT<0, FlagsVT>, // out flags
+ SDTCisFP<1> // operand
+]>;
+
def SDT_VMOVDRR : SDTypeProfile<1, 2, [SDTCisVT<0, f64>, SDTCisVT<1, i32>,
SDTCisSameAs<1, 2>]>;
def SDT_VMOVRRD : SDTypeProfile<2, 1, [SDTCisVT<0, i32>, SDTCisSameAs<0, 1>,
@@ -18,11 +28,18 @@ def SDT_VMOVRRD : SDTypeProfile<2, 1, [SDTCisVT<0, i32>, SDTCisSameAs<0, 1>,
def SDT_VMOVSR : SDTypeProfile<1, 1, [SDTCisVT<0, f32>, SDTCisVT<1, i32>]>;
-def arm_fmstat : SDNode<"ARMISD::FMSTAT", SDTNone, [SDNPInGlue, SDNPOutGlue]>;
-def arm_cmpfp : SDNode<"ARMISD::CMPFP", SDT_ARMCmp, [SDNPOutGlue]>;
-def arm_cmpfp0 : SDNode<"ARMISD::CMPFPw0", SDT_CMPFP0, [SDNPOutGlue]>;
-def arm_cmpfpe : SDNode<"ARMISD::CMPFPE", SDT_ARMCmp, [SDNPOutGlue]>;
-def arm_cmpfpe0: SDNode<"ARMISD::CMPFPEw0",SDT_CMPFP0, [SDNPOutGlue]>;
+def arm_cmpfp : SDNode<"ARMISD::CMPFP", SDT_CMPFP>;
+def arm_cmpfp0 : SDNode<"ARMISD::CMPFPw0", SDT_CMPFP0>;
+def arm_cmpfpe : SDNode<"ARMISD::CMPFPE", SDT_CMPFP>;
+def arm_cmpfpe0 : SDNode<"ARMISD::CMPFPEw0", SDT_CMPFP0>;
+
+def arm_fmstat : SDNode<"ARMISD::FMSTAT",
+ SDTypeProfile<0, 1, [
+ SDTCisVT<0, FlagsVT> // in flags
+ ]>,
+ [SDNPOutGlue] // TODO: Change Glue to a normal result.
+>;
+
def arm_fmdrr : SDNode<"ARMISD::VMOVDRR", SDT_VMOVDRR>;
def arm_fmrrd : SDNode<"ARMISD::VMOVRRD", SDT_VMOVRRD>;
def arm_vmovsr : SDNode<"ARMISD::VMOVSR", SDT_VMOVSR>;
@@ -606,12 +623,12 @@ let Defs = [FPSCR_NZCV] in {
def VCMPED : ADuI<0b11101, 0b11, 0b0100, 0b11, 0,
(outs), (ins DPR:$Dd, DPR:$Dm),
IIC_fpCMP64, "vcmpe", ".f64\t$Dd, $Dm", "",
- [(arm_cmpfpe DPR:$Dd, (f64 DPR:$Dm))]>;
+ [(set FPSCR_NZCV, (arm_cmpfpe DPR:$Dd, (f64 DPR:$Dm)))]>;
def VCMPES : ASuI<0b11101, 0b11, 0b0100, 0b11, 0,
(outs), (ins SPR:$Sd, SPR:$Sm),
IIC_fpCMP32, "vcmpe", ".f32\t$Sd, $Sm", "",
- [(arm_cmpfpe SPR:$Sd, SPR:$Sm)]> {
+ [(set FPSCR_NZCV, (arm_cmpfpe SPR:$Sd, SPR:$Sm))]> {
// Some single precision VFP instructions may be executed on both NEON and
// VFP pipelines on A8.
let D = VFPNeonA8Domain;
@@ -620,17 +637,17 @@ def VCMPES : ASuI<0b11101, 0b11, 0b0100, 0b11, 0,
def VCMPEH : AHuI<0b11101, 0b11, 0b0100, 0b11, 0,
(outs), (ins HPR:$Sd, HPR:$Sm),
IIC_fpCMP16, "vcmpe", ".f16\t$Sd, $Sm",
- [(arm_cmpfpe (f16 HPR:$Sd), (f16 HPR:$Sm))]>;
+ [(set FPSCR_NZCV, (arm_cmpfpe (f16 HPR:$Sd), (f16 HPR:$Sm)))]>;
def VCMPD : ADuI<0b11101, 0b11, 0b0100, 0b01, 0,
(outs), (ins DPR:$Dd, DPR:$Dm),
IIC_fpCMP64, "vcmp", ".f64\t$Dd, $Dm", "",
- [(arm_cmpfp DPR:$Dd, (f64 DPR:$Dm))]>;
+ [(set FPSCR_NZCV, (arm_cmpfp DPR:$Dd, (f64 DPR:$Dm)))]>;
def VCMPS : ASuI<0b11101, 0b11, 0b0100, 0b01, 0,
(outs), (ins SPR:$Sd, SPR:$Sm),
IIC_fpCMP32, "vcmp", ".f32\t$Sd, $Sm", "",
- [(arm_cmpfp SPR:$Sd, SPR:$Sm)]> {
+ [(set FPSCR_NZCV, (arm_cmpfp SPR:$Sd, SPR:$Sm))]> {
// Some single precision VFP instructions may be executed on both NEON and
// VFP pipelines on A8.
let D = VFPNeonA8Domain;
@@ -639,7 +656,7 @@ def VCMPS : ASuI<0b11101, 0b11, 0b0100, 0b01, 0,
def VCMPH : AHuI<0b11101, 0b11, 0b0100, 0b01, 0,
(outs), (ins HPR:$Sd, HPR:$Sm),
IIC_fpCMP16, "vcmp", ".f16\t$Sd, $Sm",
- [(arm_cmpfp (f16 HPR:$Sd), (f16 HPR:$Sm))]>;
+ [(set FPSCR_NZCV, (arm_cmpfp (f16 HPR:$Sd), (f16 HPR:$Sm)))]>;
} // Defs = [FPSCR_NZCV]
//===----------------------------------------------------------------------===//
@@ -669,7 +686,7 @@ let Defs = [FPSCR_NZCV] in {
def VCMPEZD : ADuI<0b11101, 0b11, 0b0101, 0b11, 0,
(outs), (ins DPR:$Dd),
IIC_fpCMP64, "vcmpe", ".f64\t$Dd, #0", "",
- [(arm_cmpfpe0 (f64 DPR:$Dd))]> {
+ [(set FPSCR_NZCV, (arm_cmpfpe0 (f64 DPR:$Dd)))]> {
let Inst{3-0} = 0b0000;
let Inst{5} = 0;
}
@@ -677,7 +694,7 @@ def VCMPEZD : ADuI<0b11101, 0b11, 0b0101, 0b11, 0,
def VCMPEZS : ASuI<0b11101, 0b11, 0b0101, 0b11, 0,
(outs), (ins SPR:$Sd),
IIC_fpCMP32, "vcmpe", ".f32\t$Sd, #0", "",
- [(arm_cmpfpe0 SPR:$Sd)]> {
+ [(set FPSCR_NZCV, (arm_cmpfpe0 SPR:$Sd))]> {
let Inst{3-0} = 0b0000;
let Inst{5} = 0;
@@ -689,7 +706,7 @@ def VCMPEZS : ASuI<0b11101, 0b11, 0b0101, 0b11, 0,
def VCMPEZH : AHuI<0b11101, 0b11, 0b0101, 0b11, 0,
(outs), (ins HPR:$Sd),
IIC_fpCMP16, "vcmpe", ".f16\t$Sd, #0",
- [(arm_cmpfpe0 (f16 HPR:$Sd))]> {
+ [(set FPSCR_NZCV, (arm_cmpfpe0 (f16 HPR:$Sd)))]> {
let Inst{3-0} = 0b0000;
let Inst{5} = 0;
}
@@ -697,7 +714,7 @@ def VCMPEZH : AHuI<0b11101, 0b11, 0b0101, 0b11, 0,
def VCMPZD : ADuI<0b11101, 0b11, 0b0101, 0b01, 0,
(outs), (ins DPR:$Dd),
IIC_fpCMP64, "vcmp", ".f64\t$Dd, #0", "",
- [(arm_cmpfp0 (f64 DPR:$Dd))]> {
+ [(set FPSCR_NZCV, (arm_cmpfp0 (f64 DPR:$Dd)))]> {
let Inst{3-0} = 0b0000;
let Inst{5} = 0;
}
@@ -705,7 +722,7 @@ def VCMPZD : ADuI<0b11101, 0b11, 0b0101, 0b01, 0,
def VCMPZS : ASuI<0b11101, 0b11, 0b0101, 0b01, 0,
(outs), (ins SPR:$Sd),
IIC_fpCMP32, "vcmp", ".f32\t$Sd, #0", "",
- [(arm_cmpfp0 SPR:$Sd)]> {
+ [(set FPSCR_NZCV, (arm_cmpfp0 SPR:$Sd))]> {
let Inst{3-0} = 0b0000;
let Inst{5} = 0;
@@ -717,7 +734,7 @@ def VCMPZS : ASuI<0b11101, 0b11, 0b0101, 0b01, 0,
def VCMPZH : AHuI<0b11101, 0b11, 0b0101, 0b01, 0,
(outs), (ins HPR:$Sd),
IIC_fpCMP16, "vcmp", ".f16\t$Sd, #0",
- [(arm_cmpfp0 (f16 HPR:$Sd))]> {
+ [(set FPSCR_NZCV, (arm_cmpfp0 (f16 HPR:$Sd)))]> {
let Inst{3-0} = 0b0000;
let Inst{5} = 0;
}
@@ -2492,7 +2509,8 @@ let DecoderMethod = "DecodeForVMRSandVMSR" in {
let Defs = [CPSR], Uses = [FPSCR_NZCV], Predicates = [HasFPRegs],
Rt = 0b1111 /* apsr_nzcv */ in
def FMSTAT : MovFromVFP<0b0001 /* fpscr */, (outs), (ins),
- "vmrs", "\tAPSR_nzcv, fpscr", [(arm_fmstat)]>;
+ "vmrs", "\tAPSR_nzcv, fpscr",
+ [(arm_fmstat FPSCR_NZCV)]>;
// Application level FPSCR -> GPR
let hasSideEffects = 1, Uses = [FPSCR], Predicates = [HasFPRegs] in
diff --git a/llvm/lib/Target/ARM/ARMRegisterInfo.td b/llvm/lib/Target/ARM/ARMRegisterInfo.td
index f37d0fe542b4f7..f5a675e2976bb7 100644
--- a/llvm/lib/Target/ARM/ARMRegisterInfo.td
+++ b/llvm/lib/Target/ARM/ARMRegisterInfo.td
@@ -413,7 +413,9 @@ def VCCR : RegisterClass<"ARM", [i32, v16i1, v8i1, v4i1, v2i1], 32, (add VPR)> {
// FPSCR, when the flags at the top of it are used as the input or
// output to an instruction such as MVE VADC.
-def cl_FPSCR_NZCV : RegisterClass<"ARM", [i32], 32, (add FPSCR_NZCV)>;
+def cl_FPSCR_NZCV : RegisterClass<"ARM", [i32], 32, (add FPSCR_NZCV)> {
+ let CopyCost = -1;
+}
// Scalar single precision floating point register class..
// FIXME: Allocation order changed to s0, s2, ... or s0, s4, ... as a quick hack
diff --git a/llvm/test/CodeGen/ARM/fcmp-xo.ll b/llvm/test/CodeGen/ARM/fcmp-xo.ll
index 3d5972f065859f..908dbd7a11a6b6 100644
--- a/llvm/test/CodeGen/ARM/fcmp-xo.ll
+++ b/llvm/test/CodeGen/ARM/fcmp-xo.ll
@@ -54,12 +54,12 @@ define arm_aapcs_vfpcc float @float128(float %a0) local_unnamed_addr {
; NEON-LABEL: float128:
; NEON: @ %bb.0:
; NEON-NEXT: mov.w r0, #1124073472
-; NEON-NEXT: vmov.f32 s2, #5.000000e-01
-; NEON-NEXT: vmov d3, r0, r0
-; NEON-NEXT: vmov.f32 s4, #-5.000000e-01
-; NEON-NEXT: vcmp.f32 s6, s0
+; NEON-NEXT: vmov.f32 s4, #5.000000e-01
+; NEON-NEXT: vmov d1, r0, r0
+; NEON-NEXT: vmov.f32 s6, #-5.000000e-01
+; NEON-NEXT: vcmp.f32 s2, s0
; NEON-NEXT: vmrs APSR_nzcv, fpscr
-; NEON-NEXT: vselgt.f32 s0, s4, s2
+; NEON-NEXT: vselgt.f32 s0, s6, s4
; NEON-NEXT: bx lr
%1 = fcmp nsz olt float %a0, 128.000000e+00
%2 = select i1 %1, float -5.000000e-01, float 5.000000e-01
diff --git a/llvm/test/CodeGen/ARM/fp16-instructions.ll b/llvm/test/CodeGen/ARM/fp16-instructions.ll
index 1988cb1d2f9039..7a1d5ddfa301b6 100644
--- a/llvm/test/CodeGen/ARM/fp16-instructions.ll
+++ b/llvm/test/CodeGen/ARM/fp16-instructions.ll
@@ -700,9 +700,9 @@ define half @select_cc1(ptr %a0) {
; CHECK-LABEL: select_cc1:
-; CHECK-HARDFP-FULLFP16: vcmp.f16
-; CHECK-HARDFP-FULLFP16-NEXT: vmrs APSR_nzcv, fpscr
-; CHECK-HARDFP-FULLFP16-NEXT: vseleq.f16 s0,
+; CHECK-HARDFP-FULLFP16: vcmp.f16
+; CHECK-HARDFP-FULLFP16: vmrs APSR_nzcv, fpscr
+; CHECK-HARDFP-FULLFP16: vseleq.f16 s0,
; CHECK-SOFTFP-FP16-A32: vcmp.f32
; CHECK-SOFTFP-FP16-A32-NEXT: vmrs APSR_nzcv, fpscr
@@ -728,9 +728,9 @@ define half @select_cc_ge1(ptr %a0) {
; CHECK-LABEL: select_cc_ge1:
-; CHECK-HARDFP-FULLFP16: vcmp.f16
-; CHECK-HARDFP-FULLFP16-NEXT: vmrs APSR_nzcv, fpscr
-; CHECK-HARDFP-FULLFP16-NEXT: vselge.f16 s0,
+; CHECK-HARDFP-FULLFP16: vcmp.f16
+; CHECK-HARDFP-FULLFP16: vmrs APSR_nzcv, fpscr
+; CHECK-HARDFP-FULLFP16: vselge.f16 s0,
; CHECK-SOFTFP-FP16-A32: vcmp.f32
; CHECK-SOFTFP-FP16-A32-NEXT: vmrs APSR_nzcv, fpscr
@@ -751,9 +751,9 @@ define half @select_cc_ge2(ptr %a0) {
; CHECK-LABEL: select_cc_ge2:
-; CHECK-HARDFP-FULLFP16: vcmp.f16
-; CHECK-HARDFP-FULLFP16-NEXT: vmrs APSR_nzcv, fpscr
-; CHECK-HARDFP-FULLFP16-NEXT: vselge.f16 s0,
+; CHECK-HARDFP-FULLFP16: vcmp.f16
+; CHECK-HARDFP-FULLFP16: vmrs APSR_nzcv, fpscr
+; CHECK-HARDFP-FULLFP16: vselge.f16 s0,
; CHECK-SOFTFP-FP16-A32: vcmp.f32
; CHECK-SOFTFP-FP16-A32-NEXT: vmrs APSR_nzcv, fpscr
@@ -774,9 +774,9 @@ define half @select_cc_ge3(ptr %a0) {
; CHECK-LABEL: select_cc_ge3:
-; CHECK-HARDFP-FULLFP16: vcmp.f16
-; CHECK-HARDFP-FULLFP16-NEXT: vmrs APSR_nzcv, fpscr
-; CHECK-HARDFP-FULLFP16-NEXT: vselge.f16 s0,
+; CHECK-HARDFP-FULLFP16: vcmp.f16
+; CHECK-HARDFP-FULLFP16: vmrs APSR_nzcv, fpscr
+; CHECK-HARDFP-FULLFP16: vselge.f16 s0,
; CHECK-SOFTFP-FP16-A32: vcmp.f32
; CHECK-SOFTFP-FP16-A32-NEXT: vmrs APSR_nzcv, fpscr
@@ -797,9 +797,9 @@ define half @select_cc_ge4(ptr %a0) {
; CHECK-LABEL: select_cc_ge4:
-; CHECK-HARDFP-FULLFP16: vcmp.f16
-; CHECK-HARDFP-FULLFP16-NEXT: vmrs APSR_nzcv, fpscr
-; CHECK-HARDFP-FULLFP16-NEXT: vselge.f16 s0, s{{.}}, s{{.}}
+; CHECK-HARDFP-FULLFP16: vcmp.f16
+; CHECK-HARDFP-FULLFP16: vmrs APSR_nzcv, fpscr
+; CHECK-HARDFP-FULLFP16: vselge.f16 s0, s{{.}}, s{{.}}
; CHECK-SOFTFP-FP16-A32: vcmp.f32
; CHECK-SOFTFP-FP16-A32-NEXT: vmrs APSR_nzcv, fpscr
@@ -821,9 +821,9 @@ define half @select_cc_gt1(ptr %a0) {
; CHECK-LABEL: select_cc_gt1:
-; CHECK-HARDFP-FULLFP16: vcmp.f16
-; CHECK-HARDFP-FULLFP16-NEXT: vmrs APSR_nzcv, fpscr
-; CHECK-HARDFP-FULLFP16-NEXT: vselgt.f16 s0, s{{.}}, s{{.}}
+; CHECK-HARDFP-FULLFP16: vcmp.f16
+; CHECK-HARDFP-FULLFP16: vmrs APSR_nzcv, fpscr
+; CHECK-HARDFP-FULLFP16: vselgt.f16 s0, s{{.}}, s{{.}}
; CHECK-SOFTFP-FP16-A32: vcmp.f32
; CHECK-SOFTFP-FP16-A32-NEXT: vmrs APSR_nzcv, fpscr
@@ -844,9 +844,9 @@ define half @select_cc_gt2(ptr %a0) {
; CHECK-LABEL: select_cc_gt2:
-; CHECK-HARDFP-FULLFP16: vcmp.f16
-; CHECK-HARDFP-FULLFP16-NEXT: vmrs APSR_nzcv, fpscr
-; CHECK-HARDFP-FULLFP16-NEXT: vselgt.f16 s0, s{{.}}, s{{.}}
+; CHECK-HARDFP-FULLFP16: vcmp.f16
+; CHECK-HARDFP-FULLFP16: vmrs APSR_nzcv, fpscr
+; CHECK-HARDFP-FULLFP16: vselgt.f16 s0, s{{.}}, s{{.}}
; CHECK-SOFTFP-FP16-A32: vcmp.f32
; CHECK-SOFTFP-FP16-A32-NEXT: vmrs APSR_nzcv, fpscr
@@ -867,9 +867,9 @@ define half @select_cc_gt3(ptr %a0) {
; CHECK-LABEL: select_cc_gt3:
-; CHECK-HARDFP-FULLFP16: vcmp.f16
-; CHECK-HARDFP-FULLFP16-NEXT: vmrs APSR_nzcv, fpscr
-; CHECK-HARDFP-FULLFP16-NEXT: vselgt.f16 s0, s{{.}}, s{{.}}
+; CHECK-HARDFP-FULLFP16: vcmp.f16
+; CHECK-HARDFP-FULLFP16: vmrs APSR_nzcv, fpscr
+; CHECK-HARDFP-FULLFP16: vselgt.f16 s0, s{{.}}, s{{.}}
; CHECK-SOFTFP-FP16-A32: vcmp.f32
; CHECK-SOFTFP-FP16-A32-NEXT: vmrs APSR_nzcv, fpscr
@@ -890,9 +890,9 @@ define half @select_cc_gt4(ptr %a0) {
; CHECK-LABEL: select_cc_gt4:
-; CHECK-HARDFP-FULLFP16: vcmp.f16
-; CHECK-HARDFP-FULLFP16-NEXT: vmrs APSR_nzcv, fpscr
-; CHECK-HARDFP-FULLFP16-NEXT: vselgt.f16 s0, s{{.}}, s{{.}}
+; CHECK-HARDFP-FULLFP16: vcmp.f16
+; CHECK-HARDFP-FULLFP16: vmrs APSR_nzcv, fpscr
+; CHECK-HARDFP-FULLFP16: vselgt.f16 s0, s{{.}}, s{{.}}
; CHECK-SOFTFP-FP16-A32: vcmp.f32
; CHECK-SOFTFP-FP16-A32-NEXT: vmrs APSR_nzcv, fpscr
@@ -923,10 +923,10 @@ entry:
; CHECK-LABEL: select_cc4:
; CHECK-HARDFP-FULLFP16: vldr.16 [[S2:s[0-9]]], .LCPI{{.*}}
+; CHECK-HARDFP-FULLFP16: vcmp.f16 s0, [[S2]]
; CHECK-HARDFP-FULLFP16: vldr.16 [[S4:s[0-9]]], .LCPI{{.*}}
+; CHECK-HARDFP-FULLFP16: vmrs APSR_nzcv, fpscr
; CHECK-HARDFP-FULLFP16: vmov.f16 [[S6:s[0-9]]], #-2.000000e+00
-; CHECK-HARDFP-FULLFP16: vcmp.f16 s0, [[S2]]
-; CHECK-HARDFP-FULLFP16-NEXT: vmrs APSR_nzcv, fpscr
; CHECK-HARDFP-FULLFP16-NEXT: vseleq.f16 [[S0:s[0-9]]], [[S6]], [[S4]]
; CHECK-HARDFP-FULLFP16-NEXT: vselvs.f16 s0, [[S6]], [[S0]]
diff --git a/llvm/test/CodeGen/ARM/fp16-vminmaxnm-safe.ll b/llvm/test/CodeGen/ARM/fp16-vminmaxnm-safe.ll
index 56e734c4404336..996b46c51ab361 100644
--- a/llvm/test/CodeGen/ARM/fp16-vminmaxnm-safe.ll
+++ b/llvm/test/CodeGen/ARM/fp16-vminmaxnm-safe.ll
@@ -5,11 +5,11 @@
define half @fp16_vminnm_o(half %a, half %b) {
; CHECK-LABEL: fp16_vminnm_o:
; CHECK: @ %bb.0: @ %entry
-; CHECK-NEXT: vmov.f16 s0, r1
-; CHECK-NEXT: vmov.f16 s2, r0
-; CHECK-NEXT: vcmp.f16 s0, s2
+; CHECK-NEXT: vmov.f16 s0, r0
+; CHECK-NEXT: vmov.f16 s2, r1
+; CHECK-NEXT: vcmp.f16 s2, s0
; CHECK-NEXT: vmrs APSR_nzcv, fpscr
-; CHECK-NEXT: vselgt.f16 s0, s2, s0
+; CHECK-NEXT: vselgt.f16 s0, s0, s2
; CHECK-NEXT: vmov r0, s0
; CHECK-NEXT: bx lr
entry:
@@ -37,11 +37,11 @@ entry:
define half @fp16_vminnm_u(half %a, half %b) {
; CHECK-LABEL: fp16_vminnm_u:
; CHECK: @ %bb.0: @ %entry
-; CHECK-NEXT: vmov.f16 s0, r0
-; CHECK-NEXT: vmov.f16 s2, r1
-; CHECK-NEXT: vcmp.f16 s0, s2
+; CHECK-NEXT: vmov.f16 s0, r1
+; CHECK-NEXT: vmov.f16 s2, r0
+; CHECK-NEXT: vcmp.f16 s2, s0
; CHECK-NEXT: vmrs APSR_nzcv, fpscr
-; CHECK-NEXT: vselge.f16 s0, s2, s0
+; CHECK-NEXT: vselge.f16 s0, s0, s2
; CHECK-NEXT: vmov r0, s0
; CHECK-NEXT: bx lr
entry:
@@ -53,11 +53,11 @@ entry:
define half @fp16_vminnm_ule(half %a, half %b) {
; CHECK-LABEL: fp16_vminnm_ule:
; CHECK: @ %bb.0: @ %entry
-; CHECK-NEXT: vmov.f16 s0, r0
-; CHECK-NEXT: vmov.f16 s2, r1
-; CHECK-NEXT: vcmp.f16 s0, s2
+; CHECK-NEXT: vmov.f16 s0, r1
+; CHECK-NEXT: vmov.f16 s2, r0
+; CHECK-NEXT: vcmp.f16 s2, s0
; CHECK-NEXT: vmrs APSR_nzcv, fpscr
-; CHECK-NEXT: vselgt.f16 s0, s2, s0
+; CHECK-NEXT: vselgt.f16 s0, s0, s2
; CHECK-NEXT: vmov r0, s0
; CHECK-NEXT: bx lr
entry:
@@ -69,11 +69,11 @@ entry:
define half @fp16_vminnm_u_rev(half %a, half %b) {
; CHECK-LABEL: fp16_vminnm_u_rev:
; CHECK: @ %bb.0: @ %entry
-; CHECK-NEXT: vmov.f16 s0, r1
-; CHECK-NEXT: vmov.f16 s2, r0
-; CHECK-NEXT: vcmp.f16 s0, s2
+; CHECK-NEXT: vmov.f16 s0, r0
+; CHECK-NEXT: vmov.f16 s2, r1
+; CHECK-NEXT: vcmp.f16 s2, s0
; CHECK-NEXT: vmrs APSR_nzcv, fpscr
-; CHECK-NEXT: vselge.f16 s0, s2, s0
+; CHECK-NEXT: vselge.f16 s0, s0, s2
; CHECK-NEXT: vmov r0, s0
; CHECK-NEXT: bx lr
entry:
diff --git a/llvm/test/CodeGen/ARM/fpscr-multi-use.ll b/llvm/test/CodeGen/ARM/fpscr-multi-use.ll
new file mode 100644
index 00000000000000..3e77ad65df9927
--- /dev/null
+++ b/llvm/test/CodeGen/ARM/fpscr-multi-use.ll
@@ -0,0 +1,40 @@
+; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py UTC_ARGS: --version 5
+; RUN: llc -mtriple=armv7 %s -o - | FileCheck %s
+
+declare double @fn()
+
+define void @test(ptr %p, ptr %res) nounwind {
+; CHECK-LABEL: test:
+; CHECK: @ %bb.0: @ %entry
+; CHECK-NEXT: push {r4, lr}
+; CHECK-NEXT: vpush {d8}
+; CHECK-NEXT: vldr d8, [r0]
+; CHECK-NEXT: mov r4, r1
+; CHECK-NEXT: vcmp.f64 d8, #0
+; CHECK-NEXT: vmrs APSR_nzcv, fpscr
+; CHECK-NEXT: vneg.f64 d16, d8
+; CHECK-NEXT: vmov.f64 d17, d8
+; CHECK-NEXT: vmovne.f64 d17, d16
+; CHECK-NEXT: vstr d17, [r1]
+; CHECK-NEXT: bl fn
+; CHECK-NEXT: vcmp.f64 d8, #0
+; CHECK-NEXT: vmrs APSR_nzcv, fpscr
+; CHECK-NEXT: vmov d16, r0, r1
+; CHECK-NEXT: eor r1, r1, #-2147483648
+; CHECK-NEXT: vmov d17, r0, r1
+; CHECK-NEXT: vmovne.f...
[truncated]
@mstorsjo Would it be possible for you to check if this version resolves the issue you're observing pre-commit?
The first commit adds a test that was crashing with the previous version of the PR.
Yes - this version does seem to run correctly for both of the cases that previously broke. Thanks!
LGTM, this no longer breaks my case.