Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Releases/gcc 12 #65

Open
wants to merge 2,585 commits into
base: master
Choose a base branch
from
Open

Releases/gcc 12 #65

wants to merge 2,585 commits into from

Conversation

jacopobrusini
Copy link

Support for Apple Silicon!!!

@jwakely
Copy link
Contributor

jwakely commented Feb 21, 2024

This is an unofficial mirror that has nothing to do with the GCC project, so submitting pull requests here is a waste of time.

Also, I have no idea what this pull request is trying to do but it would never be accepted even if it was submitted to the right place.

jianghc724 and others added 28 commits July 29, 2024 10:16
There are several typo in AVX512 intrins macro define. Correct them to solve
errors when compiled with -O0.

gcc/ChangeLog:

	* config/i386/avx512dqintrin.h
	(_mm_mask_fpclass_ss_mask): Correct operand order.
	(_mm_mask_fpclass_sd_mask): Ditto.
	(_mm256_maskz_reduce_round_ss): Use __builtin_ia32_reducess_mask_round
	instead of __builtin_ia32_reducesd_mask_round.
	(_mm_reduce_round_sd): Use -1 as mask since it is non-mask.
	(_mm_reduce_round_ss): Ditto.
	* config/i386/avx512vlbwintrin.h
	(_mm256_mask_alignr_epi8): Correct operand usage.
	(_mm_mask_alignr_epi8): Ditto.
	* config/i386/avx512vlintrin.h (_mm_mask_alignr_epi64): Ditto.

gcc/testsuite/ChangeLog:

	* gcc.target/i386/avx512bw-vpalignr-1b.c: New test.
	* gcc.target/i386/avx512dq-vfpclasssd-1b.c: Ditto.
	* gcc.target/i386/avx512dq-vfpclassss-1b.c: Ditto.
	* gcc.target/i386/avx512dq-vreducesd-1b.c: Ditto.
	* gcc.target/i386/avx512dq-vreducess-1b.c: Ditto.
	* gcc.target/i386/avx512vl-valignq-1b.c: Ditto.
…13/12

In GCC13/12, there is no _mm_avx512_setzero_ps/d since it is introduced
in GCC14.

gcc/ChangeLog:

	* config/i386/avx512dqintrin.h (_mm_reduce_round_sd): Use
	_mm_setzero_pd instead of _mm_avx512_setzero_pd.
	(_mm_reduce_round_ss): Use _mm_setzero_ps instead of
	_mm_avx512_setzero_ps.
2024-07-18  Paul Thomas  <[email protected]>

gcc/fortran
	PR fortran/108889
	* gfortran.h: Add bit field 'allocated_in_scope' to gfc_symbol.
	* trans-array.cc (gfc_array_allocate): Set 'allocated_in_scope'
	after allocation if not a component reference.
	(gfc_alloc_allocatable_for_assignment): If 'allocated_in_scope'
	not set, not a component ref and not allocated, set the array
	bounds and offset to give zero length in all dimensions. Then
	set allocated_in_scope.

gcc/testsuite/
	PR fortran/108889
	* gfortran.dg/pr108889.f90: New test.

(cherry picked from commit c3aa339)
2024-07-19  Paul Thomas  <[email protected]>

libgomp/ChangeLog

	* testsuite/libgomp.oacc-fortran/privatized-ref-2.f90: Cut
	dg-note about 'a' and remove bogus warnings about its array
	descriptor components being used uninitialized.

(cherry picked from commit 8d6994f)
This was an interesting compare debug failure to debug. The first symptom
was in gcse which would produce different order of creating psedu-registers. This
was caused by a different order of a hashtable walk, due to the hash table having different
number of entries. Which in turn was due to the number of max insn being different between
the 2 runs. The place max insn uid comes from was in sh_recog_treg_set_expr which is called
via rtx_costs and fwprop would cause rtx_costs in some cases for debug insn related stuff.

Build and tested for sh4-linux-gnu.

	PR target/116189

gcc/ChangeLog:

	* config/sh/sh.cc (sh_recog_treg_set_expr): Don't call make_insn_raw,
	make the insn with a fake uid.

gcc/testsuite/ChangeLog:

	* c-c++-common/torture/pr116189-1.c: New test.

Signed-off-by: Andrew Pinski <[email protected]>
(cherry picked from commit 0355c94)
…ization

The constant C must be an integral multiple of the shift value in
the above optimization.  Non integral values can occur evaluating
IMAGPART_EXPR when the shadd constant is 8 and we have SFmode.

2024-08-06  John David Anglin  <[email protected]>

gcc/ChangeLog:

	PR target/113384
	* config/pa/pa.cc (hppa_legitimize_address): Add check to
	ensure constant is an integral multiple of shift the value.
For below pattern, RA may still allocate r162 as v/k register, try to
reload for address with leaq __libc_tsd_CTYPE_B@gottpoff(%rip), %rsi
which result a linker error.

(set (reg:DI 162)
     (mem/u/c:DI
       (const:DI (unspec:DI
		 [(symbol_ref:DI ("a") [flags 0x60]  <var_decl 0x7f621f6e1c60 a>)]
		 UNSPEC_GOTNTPOFF))

Quote from H.J for why linker issue an error.
>What do these do:
>
>        leaq    __libc_tsd_CTYPE_B@gottpoff(%rip), %rax
>        vmovq   (%rax), %xmm0
>
>From x86-64 TLS psABI:
>
>The assembler generates for the x@gottpoff(%rip) expressions a R X86
>64 GOTTPOFF relocation for the symbol x which requests the linker to
>generate a GOT entry with a R X86 64 TPOFF64 relocation. The offset of
>the GOT entry relative to the end of the instruction is then used in
>the instruction. The R X86 64 TPOFF64 relocation is pro- cessed at
>program startup time by the dynamic linker by looking up the symbol x
>in the modules loaded at that point. The offset is written in the GOT
>entry and later loaded by the addq instruction.
>
>The above code sequence looks wrong to me.

gcc/ChangeLog:

	PR target/116043
	* config/i386/constraints.md (Bk): Refine to
	define_special_memory_constraint.

gcc/testsuite/ChangeLog:

	* gcc.target/i386/pr116043.c: New test.

(cherry picked from commit bc1fda0)
Not sure how this happend, but: svsudot is supposed to be expanded
as USDOT with the operands swapped.  However, a thinko in the
expansion of svsudot meant that the arguments weren't in fact
swapped; the attempted swap was just a no-op.  And the testcases
blithely accepted that.

gcc/
	PR target/114607
	* config/aarch64/aarch64-sve-builtins-base.cc
	(svusdot_impl::expand): Fix botched attempt to swap the operands
	for svsudot.

gcc/testsuite/
	PR target/114607
	* gcc.target/aarch64/sve/acle/asm/sudot_s32.c: New test.

(cherry picked from commit 2c1c248)
aarch64-sve.md had a pattern that combined:

	cmpeq	pb.T, pa/z, zc.T, #0
	mov	zd.T, pb/z, #1

into:

	cnot	zd.T, pa/m, zc.T

But this is only valid if pa.T is a ptrue.  In other cases, the
original would set inactive elements of zd.T to 0, whereas the
combined form would copy elements from zc.T.

gcc/
	PR target/114603
	* config/aarch64/aarch64-sve.md (@aarch64_pred_cnot<mode>): Replace
	with...
	(@aarch64_ptrue_cnot<mode>): ...this, requiring operand 1 to be
	a ptrue.
	(*cnot<mode>): Require operand 1 to be a ptrue.
	* config/aarch64/aarch64-sve-builtins-base.cc (svcnot_impl::expand):
	Use aarch64_ptrue_cnot<mode> for _x operations that are predicated
	with a ptrue.  Represent other _x operations as fully-defined _m
	operations.

gcc/testsuite/
	PR target/114603
	* gcc.target/aarch64/sve/acle/general/cnot_1.c: New test.

(cherry picked from commit 67cbb1c)
GCC Administrator and others added 30 commits December 24, 2024 00:22
Initializing a vector using

 Vec : V.Vector := [Some_Type'(Some_Abstract_Type with F => 0)];

may crash the compiler. The expander marks the N_Extension_Aggregate for
delayed expansion which never happens and incorrectly ends up in gigi.

The delayed expansion is needed for nested aggregates, which the
original code is testing for, but container aggregates are handled
differently.

Such assignments to container aggregates are later transformed into
procedure calls to the procedures named in the Aggregate aspect
definition, for which the delayed expansion is not required/expected.

gcc/ada/
	PR ada/118234
	* exp_aggr.adb (Convert_To_Assignments): Do not mark node for
	delayed expansion if parent type has the Aggregate aspect.
	* sem_util.adb (Is_Container_Aggregate): Move...
	* sem_util.ads (Is_Container_Aggregate): ... here and make it
	public.
This just applies the same fix to Expand_Array_Aggregate as the one that was
recently applied to Convert_To_Assignments.

gcc/ada/
	PR ada/118234
	* exp_aggr.adb (Convert_To_Assignments): Tweak comment.
	(Expand_Array_Aggregate): Do not delay the expansion if the parent
	node is a container aggregate.
This handles the case where a component association is present.

gcc/ada/
	PR ada/118234
	* exp_aggr.adb (Convert_To_Assignments): In the case of a
	component association, call Is_Container_Aggregate on the parent's
	parent.
	(Expand_Array_Aggregate): Likewise.
gcc/ada
	* libgnarl/s-taprop__dummy.adb: Remove use clause for
	System.Parameters.
	(Unlock): Remove Global_Lock formal parameter.
	(Write_Lock): Likewise.
this patch adds support for new fussion in znver5 documented in the
optimization manual:

   The Zen5 microarchitecture adds support to fuse reg-reg MOV Instructions
   with certain ALU instructions. The following conditions need to be met for
   fusion to happen:
     - The MOV should be reg-reg mov with Opcode 0x89 or 0x8B
     - The MOV is followed by an ALU instruction where the MOV and ALU destination register match.
     - The ALU instruction may source only registers or immediate data. There cannot be any memory source.
     - The ALU instruction sources either the source or dest of MOV instruction.
     - If ALU instruction has 2 reg sources, they should be different.
     - The following ALU instructions can fuse with an older qualified MOV instruction:
       ADD ADC AND XOR OP SUB SBB INC DEC NOT SAL / SHL SHR SAR
       (I assume OP is OR)

I also increased issue rate from 4 to 6.  Theoretically znver5 can do more, but
with our model we can't realy use it.
Increasing issue rate to 8 leads to infinite loop in scheduler.

Finally, I also enabled fuse_alu_and_branch since it is supported by
znver5 (I think by earlier zens too).

New fussion pattern moves quite few instructions around in common code:
@@ -2210,13 +2210,13 @@
        .cfi_offset 3, -32
        leaq    63(%rsi), %rbx
        movq    %rbx, %rbp
+       shrq    $6, %rbp
+       salq    $3, %rbp
        subq    $16, %rsp
        .cfi_def_cfa_offset 48
        movq    %rdi, %r12
-       shrq    $6, %rbp
-       movq    %rsi, 8(%rsp)
-       salq    $3, %rbp
        movq    %rbp, %rdi
+       movq    %rsi, 8(%rsp)
        call    _Znwm
        movq    8(%rsp), %rsi
        movl    $0, 8(%r12)
@@ -2224,8 +2224,8 @@
        movq    %rax, (%r12)
        movq    %rbp, 32(%r12)
        testq   %rsi, %rsi
-       movq    %rsi, %rdx
        cmovns  %rsi, %rbx
+       movq    %rsi, %rdx
        sarq    $63, %rdx
        shrq    $58, %rdx
        sarq    $6, %rbx
which should help decoder bandwidth and perhaps also cache, though I was not
able to measure off-noise effect on SPEC.

gcc/ChangeLog:

	* config/i386/i386.h (TARGET_FUSE_MOV_AND_ALU): New tune.
	* config/i386/x86-tune-sched.cc (ix86_issue_rate): Updat for znver5.
	(ix86_adjust_cost): Add TODO about znver5 memory latency.
	(ix86_fuse_mov_alu_p): New.
	(ix86_macro_fusion_pair_p): Use it.
	* config/i386/x86-tune.def (X86_TUNE_FUSE_ALU_AND_BRANCH): Add ZNVER5.
	(X86_TUNE_FUSE_MOV_AND_ALU): New tune;

(cherry picked from commit e2125a6)
Zen5 has 6 instead of 4 ALUs and the integer multiplication can now execute in
3 of them.  FP units can do 2 additions and 2 multiplications with latency 2
and 3.  This patch updates reassociation width accordingly.  This has potential
of increasing register pressure but unlike while benchmarking znver1 tuning
I did not noticed this actually causing problem on spec, so this patch bumps
up reassociation width to 6 for everything except for integer vectors, where
there are 4 units with typical latency of 1.

Bootstrapped/regtested x86_64-linux, comitted.

gcc/ChangeLog:

	* config/i386/i386.cc (ix86_reassociation_width): Update for Znver5.
	* config/i386/x86-tune-costs.h (znver5_costs): Update reassociation
	widths.

(cherry picked from commit f0ab3de)
gcc/ChangeLog:

	* doc/cpp.texi (Common Predefined Macros): Fix syntax.
The following makes analysis and transform agree on constraints.

	PR tree-optimization/115646
	* tree-call-cdce.cc (check_pow): Check for bit_sz values
	as allowed by transform.

	* gcc.dg/pr115646.c: New testcase.

(cherry picked from commit 453b1d2)
The following avoids associating a reduction path as that might
get STMT_VINFO_REDUC_IDX out-of-sync with the SLP operand order.
This is a latent issue with SLP reductions but now easily exposed
as we're doing single-lane SLP reductions.

When we achieved SLP only we can move and update this meta-data.

	PR tree-optimization/115669
	* tree-vect-slp.cc (vect_build_slp_tree_2): Do not reassociate
	chains that participate in a reduction.

	* gcc.dg/vect/pr115669.c: New testcase.

(cherry picked from commit 7886830)
The following fixes an issue with CCPs likely_value when faced with
a vector CTOR containing undef SSA names and constants.  This should
be classified as CONSTANT and not UNDEFINED.

	PR tree-optimization/116057
	* tree-ssa-ccp.cc (likely_value): Also walk CTORs in stmt
	operands to look for constants.

	* gcc.dg/torture/pr116057.c: New testcase.

(cherry picked from commit 1ea5515)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.