Description
#113 reminded me about this... The target instruction set or architecture isn't specified at the moment, so AVX intrinsics don't work in C++ and C:
error: always_inline function '_mm256_add_pd' requires target feature 'avx', but would be inlined into function 'add' that is compiled without support for 'avx'
I think it would be reasonable to use -march=native, which enables the intrinsics available on the host and also optimizes for the current CPU architecture.
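For illustration, here is a minimal sketch of the kind of function the error refers to. Only the name add comes from the error message; the signature and the scalar fallback are assumptions. The AVX loop is compiled only when the compiler predefines __AVX__, which GCC and Clang do when AVX is enabled for the target (for example with -mavx, or with -march=native on a host where AVX is detected). Without such a flag only the plain C loop is built, so the file compiles either way instead of failing with the error above.

```c
#include <stddef.h>
#ifdef __AVX__
#include <immintrin.h>
#endif

/* out[i] = a[i] + b[i] for i in [0, n). Hypothetical signature. */
static void add(const double *a, const double *b, double *out, size_t n)
{
    size_t i = 0;
#ifdef __AVX__
    /* Only built when AVX is enabled for the whole translation unit,
       e.g. by -mavx or by -march=native on an AVX-capable host. */
    for (; i + 4 <= n; i += 4) {
        __m256d va = _mm256_loadu_pd(a + i); /* unaligned load of 4 doubles */
        __m256d vb = _mm256_loadu_pd(b + i);
        _mm256_storeu_pd(out + i, _mm256_add_pd(va, vb));
    }
#endif
    /* Scalar tail, or the whole array when AVX is not enabled. */
    for (; i < n; ++i)
        out[i] = a[i] + b[i];
}
```

Building with and without -mavx should give the same results; only the code path that gets compiled changes.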
kazk commented on Apr 29, 2021
I can add -march=x86-64 -mtune=generic. -march=native will be too specific to the host VM, and I don't want that. See https://stackoverflow.com/a/54163496 and https://stackoverflow.com/a/10134204
error256 commented on Apr 29, 2021
-march=x86-64 -mtune=generic doesn't really change anything here; I think it's the default.
OK, I've solved my original problem with intrinsics with __attribute__((__target__("avx"))), so it doesn't matter that much now, but anyway...
Solutions are compiled to be executed just once on the same type of system. So what exactly is the issue here? Is it that CPUs can differ so much that the performance difference with native would be greater than without it? Technically, JIT compilers produce native code (at least by default, I think), so if this is a problem, I think it already exists with certain other languages, and that's why C# or Java code can sometimes, at least in theory, be faster than C. Or am I missing something else here?
kazk commented on Apr 29, 2021
If -march=x86-64 doesn't do anything, forget it. I thought it still enabled what you wanted.
No, I wasn't worried about performance much. I'll try explaining my thoughts based on my limited knowledge of this area. Please bear with me and let me know if I'm misunderstanding.
Assumptions:
-march=native adds compiler options very specific to the CPU
-march=x86-64 covers what you want
I guess my third assumption was incorrect?
Concern: having code that depends on the current host VM's CPU type. I don't change VM types often, but it's another variable that needs to be kept in mind if it were depended on, so I wanted to avoid it if possible (remember, I thought -march=x86-64 covered what you wanted). I also thought this might make testing the change against published kata more difficult.
Just for an example, on Google Cloud there are VM machine types that trade control for lower cost, for example E2, which doesn't guarantee the processor type. If we used this machine type with -march=native, submissions could be executed on different processor types (any of Intel Skylake, Broadwell, Haswell, and AMD EPYC Rome processors). Can this cause the same code to fail to compile depending on the VM it happened to be compiled on? Don't worry about performance differences. (I'm not planning to do this.)
I guess my point is that -march=native expands to many compiler options that are difficult to keep track of. -march=native expanded to:
Instead of -march=native, is it possible to add some flags explicitly to get what you want?
kazk commented on Apr 29, 2021
To be clear, if you think -march=native is a safe default for our use case, I'm not against it. I trust you more on this.
error256 commented on Apr 30, 2021
I've never used it for anything serious, I've never even used C++ for anything serious, so don't just trust me.
Yes, code compiled with -march=native is supposed to be executed on CPUs with the same architecture as the one where it's compiled. I didn't know that it added the cache options, so it looks like it's supposed to be the same system, but I don't see how they can affect anything apart from optimization.
Partially... Now that I've found the __target__ function attribute, it's not too important. For the whole program, instruction sets can be enabled separately, but all questions of compatibility will be mostly the same as with march; if there's a known lower bound on the CPUs, instruction sets can be enabled with flags like -mavx2 or -mavx.
Yes, if it uses functions that use instruction sets that may or may not be available, but...
Things that work or don't work depending on the CPU model are already possible: even without the __target__ attribute there's inline asm, it's just much less convenient, so there isn't much that would change here.
uniapi commented on Jul 12, 2021
There should be no need to worry about adding the flags -mavx -mavx2 if you use a machine no older than Q3 2015, for both AMD and Intel.
Because -march=native may fail to enable AVX and AVX2 support, and in that case compilation stops with errors like these:
Just add: -mavx -mavx2
error256 commented on Jul 12, 2021
@uniapi I don't see how any other -m* option can be safer than native. native should always match the current platform, so it should be the safest of all the -m options as long as the program is compiled for one-time use on the same machine.
What circumstances are you talking about? It will only fail when there really is no AVX/AVX2 support.
Am I misunderstanding something?
uniapi commented on Jul 12, 2021
@error256
Ok! One of my machines is a Skylake, and gcc is 11.1.0. Surely you know Skylake has AVX support!
So when I compile with the flag -march=native, I get the following errors:
But everything is ok when I compile with: -mavx -mavx2
Yes, -march=native is completely safe, but it may fail to enable AVX, as I've just explained above.
But it seems that the Intel Pentium Dual-Core E6500 CPU does support AVX.
error256 commented on Jul 12, 2021
@uniapi First, it looks like you're using gcc. But the behavior should be the same: https://gcc.gnu.org/onlinedocs/gcc/x86-Options.html
So I don't know how you got that error. Are you using a VM? Does lscpu report avx in the flags? What march does gcc -v -E - -march=native </dev/null 2>&1 | grep march show?
uniapi commented on Jul 12, 2021
@error256
This is not true!
Here are the options for -march=native on x86-64 Skylake:
And these are the options for -march=skylake on the same platform:
So native is 10 lines shorter because not all instruction sets are enabled! Notice that avx2 is not enabled with native but is turned on with skylake.
That is, the quoted statement is not true!
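A compact way to compare what two sets of flags actually enable is to check the feature macros the compiler predefines, instead of reading the full option dump. This is a minimal sketch assuming GCC or Clang on x86; __SSE4_2__, __AVX__, __AVX2__, __FMA__ and __AVX512F__ are the macros both compilers define when those features are enabled. Build it once with -march=native and once with -march=skylake and compare the printed lists. The complete macro list is also available from gcc -march=native -dM -E - </dev/null.

```c
#include <stddef.h>
#include <stdio.h>

/* ISA feature macros predefined for the current -m / -march flags. */
static const char *const enabled_isa[] = {
#ifdef __SSE4_2__
    "SSE4.2",
#endif
#ifdef __AVX__
    "AVX",
#endif
#ifdef __AVX2__
    "AVX2",
#endif
#ifdef __FMA__
    "FMA",
#endif
#ifdef __AVX512F__
    "AVX-512F",
#endif
    NULL /* sentinel, so the array is never empty */
};

/* Print one enabled feature per line. */
static void print_enabled_isa(void)
{
    for (size_t i = 0; enabled_isa[i] != NULL; ++i)
        printf("%s\n", enabled_isa[i]);
}
```

If native really leaves AVX2 off on a given system, the -march=native build would print AVX2 nowhere while the -march=skylake build would.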
uniapi commented on Jul 12, 2021
The OS should have support for AVX enabled.
And that is the same reason why gdb does not show ymm registers while debugging, even though you may successfully run AVX code, like me!
error256 commented on Jul 12, 2021
Interesting observation, but it doesn't change the fact that, according to the documentation, native should enable all supported instruction sets, so it's probably a bug. I don't know your OS, but I've found this bug for gcc on macOS.
But that's gcc again. What about clang?
uniapi commented on Jul 12, 2021
It's not a bug!
The same thing with clang: 'native' throws an error and -mavx2 does not!
And the gdb version is 10.2, but it does not recognize ymm registers to print.
It's because the OS support is not enabled for AVX2.
My OS is SunOS 11.3, or x86_64-pc-solaris2.11.
And here is the output you asked for from gcc -v -E - -march=native </dev/null 2>&1 | grep march:
error256 commented on Jul 12, 2021
It means the behavior of -march=native is actually correct in this case, and the actual problem is that "the OS support is not enabled for AVX2", which would need to be dealt with first in any serious scenario.
uniapi commented on Jul 12, 2021
Exactly!
So the first option is to check the OS support and add -march=native if it is enabled.
The second option, if the OS support is disabled, is to check the CPU info (probably hardcoded) for AVX2 and add -mavx -mavx2 if it is present.
uniapi commented on Jul 12, 2021
Here was the hack for old Ubuntu, and I'm trying to find a similar hack for my OS.
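The two options described above can be sketched with the <cpuid.h> helpers that GCC and Clang ship for x86. This is an x86-only sketch, not a tested build script; the function names and returned flag strings are hypothetical. The "OS support" discussed here corresponds to the OSXSAVE bit from CPUID leaf 1 plus the XCR0 register (read with XGETBV): AVX state is usable only when the OS has set both the SSE and the AVX (YMM) bits of XCR0. As far as I can tell, GCC's -march=native detection applies a similar XCR0 check before enabling AVX and AVX2, which would fit what was observed in this thread.

```c
#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>

/* CPU reports AVX2 (CPUID leaf 7, subleaf 0, EBX), ignoring the OS. */
static int cpu_reports_avx2(void)
{
    unsigned eax, ebx, ecx, edx;
    if (!__get_cpuid_count(7, 0, &eax, &ebx, &ecx, &edx))
        return 0;
    return (ebx & bit_AVX2) != 0;
}

/* CPU reports AVX *and* the OS has enabled XMM+YMM state in XCR0. */
static int os_enables_avx(void)
{
    unsigned eax, ebx, ecx, edx, xcr0_lo, xcr0_hi;
    if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx))
        return 0;
    /* XGETBV may only be executed when CPUID.1:ECX.OSXSAVE is set. */
    if (!(ecx & bit_OSXSAVE) || !(ecx & bit_AVX))
        return 0;
    __asm__ volatile ("xgetbv" : "=a"(xcr0_lo), "=d"(xcr0_hi) : "c"(0));
    (void)xcr0_hi;
    return (xcr0_lo & 0x6) == 0x6; /* bit 1: SSE state, bit 2: AVX state */
}
#else
/* Not x86: neither check applies in this sketch. */
static int cpu_reports_avx2(void) { return 0; }
static int os_enables_avx(void) { return 0; }
#endif

/* Hypothetical flag choice following the two options above. */
static const char *suggested_flags(void)
{
    if (os_enables_avx())
        return "-march=native"; /* option 1: OS support is there */
    if (cpu_reports_avx2())
        return "-mavx -mavx2";  /* option 2: CPU has AVX2, OS check failed */
    return "";                  /* keep the compiler defaults */
}
```

One caution about option 2: on x86 the processor raises an invalid-opcode exception for AVX instructions when XCR0 does not have the SSE and YMM bits set, so forcing -mavx -mavx2 only helps if the OS does in fact save that state and it is the detection that is failing.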
Urfoex commented on Jul 13, 2021
Interesting topic and interesting find.
But isn't this for a coding challenge website?
I wouldn't make it all science and hacky to get things running in weird scenarios.
It should work in the general case without a huge support and maintenance cost.
And pointing out what @kazk said at:
#118 (comment)
This thing runs in the cloud without a fixed CPU. If you go specific, special and hacky, your code may work on one run and not on the next.
Phoronix sometimes does benchmarks on compiler optimizations:
https://www.phoronix.com/scan.php?page=article&item=gcc-10900k-compiler&num=1
https://www.phoronix.com/scan.php?page=article&item=clang-12-opt&num=1
https://www.phoronix.com/scan.php?page=article&item=amd-znver3-gcc11&num=1
https://www.phoronix.com/scan.php?page=article&item=gcc10-gcc11-5950x&num=1
https://www.phoronix.com/scan.php?page=article&item=clang-12-5950x&num=1
C++ is a beast (not just) when it comes to optimizations.
Without changing the code, just with the right compiler/flag/CPU combination, it can run way faster, or way slower. (Not to mention Profile-Guided Optimization.)
It is already annoying when challenge code passes or fails because of time constraint and being on a faster or slower machine.
But having it pass or fail because it does or doesn't support some CPU feature - even more annoying, I'd say.
uniapi commented on Jul 13, 2021
@Urfoex I just shared the link for those who may have similar problems...
But the Codewars solution is much easier, as I've described in the two scenarios.
error256 commented on Jul 13, 2021
My suggestion is about enabling whatever the compiler thinks is available on the current system, not specifically about AVX/AVX2, which was supposed to be enabled automatically that way on any modern CPU. But it turns out that's not necessarily the case.
Does it even happen on Linux? (=> Is this observation even important in this context?) What exactly is the OS support for AVX2, and why can it even be turned off? Surely there must be a valid reason...
uniapi commented on Jul 13, 2021
This scenario is also observed when running on virtual machines, even if you have a cool CPU.
But I share the opinion that all instruction sets should be enabled, or at least AVX/AVX2.
So then (according to you) the best scenario is to parse the target architecture and inject it into the compiler's -march flag.
If on skylake, then -march=skylake.
If on skylake-avx512, then -march=skylake-avx512.
If on pentium4m, then -march=pentium4m, and so on...
And it seems this is the best that we could have from the running CPU!
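The mapping described above could be approximated at run time with GCC's __builtin_cpu_is, which Clang also provides. This is a hypothetical helper covering only a few of the model names; it checks the more specific skylake-avx512 before skylake and falls back to x86-64. Two caveats: the accepted CPU names depend on the compiler version, and a model name describes what the CPU supports, not what the OS has enabled, so -march=skylake would still turn on AVX2 on a system where native deliberately leaves it off.

```c
#if defined(__x86_64__) || defined(__i386__)
/* Hypothetical: choose an -march value from the host CPU model. */
static const char *pick_march(void)
{
    __builtin_cpu_init(); /* harmless here; required only before constructors run */
    if (__builtin_cpu_is("skylake-avx512"))
        return "skylake-avx512";
    if (__builtin_cpu_is("skylake"))
        return "skylake";
    if (__builtin_cpu_is("broadwell"))
        return "broadwell";
    if (__builtin_cpu_is("haswell"))
        return "haswell";
    return "x86-64"; /* generic baseline for models not listed above */
}
#else
/* Outside x86 this sketch has no mapping. */
static const char *pick_march(void) { return ""; }
#endif
```

The result would then be passed to the compiler as -march=<value>.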
error256 commented on Jul 13, 2021
OS, VM, whatever, the question still holds. Why doesn't the virtual machine report AVX2? Is there a reason? Is it an option?
uniapi commented on Jul 16, 2021
@error256 I'm not 100% sure about that. I do know that you won't be able to use XSAVE under Hyper-V, and I also know that it's possible (at least it seems possible) to turn on AVX2 support with VBoxManage.
So I still don't have the answer! Probably the developers of those operating systems would know))
And yet! How did you manage to use AVX2 intrinsics on Codewars? What constructs or attributes did you use?
error256 commented on Jul 16, 2021
#118 (comment)
https://www.codewars.com/kumite/608aea3fa35c7c003251bb42?sel=608b0afb4865f700290a610f
(AVX, AVX2 - no difference here.)
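For reference, the per-function approach mentioned earlier in the thread looks roughly like this. It is a minimal sketch, not the code from the linked kumite, and the function names are hypothetical. __attribute__((__target__("avx"))) lets one function use AVX intrinsics while the rest of the file is compiled for the default target, so no -mavx flag is needed. __builtin_cpu_supports("avx") (GCC and Clang, x86 only) then picks the AVX version at run time; as far as I know, both compilers' runtime checks also verify that the OS has enabled AVX state.

```c
#include <stddef.h>

/* Portable scalar version: out[i] = a[i] + b[i]. */
static void add_scalar(const double *a, const double *b, double *out, size_t n)
{
    for (size_t i = 0; i < n; ++i)
        out[i] = a[i] + b[i];
}

#if defined(__x86_64__) || defined(__i386__)
#include <immintrin.h>

/* Only this function is compiled with AVX enabled. */
static __attribute__((__target__("avx")))
void add_avx(const double *a, const double *b, double *out, size_t n)
{
    size_t i = 0;
    for (; i + 4 <= n; i += 4)
        _mm256_storeu_pd(out + i,
                         _mm256_add_pd(_mm256_loadu_pd(a + i), _mm256_loadu_pd(b + i)));
    for (; i < n; ++i) /* scalar tail */
        out[i] = a[i] + b[i];
}
#endif

/* Run-time dispatch, so the same binary still works on CPUs without AVX. */
static void add_dispatch(const double *a, const double *b, double *out, size_t n)
{
#if defined(__x86_64__) || defined(__i386__)
    if (__builtin_cpu_supports("avx")) {
        add_avx(a, b, out, n);
        return;
    }
#endif
    add_scalar(a, b, out, n);
}
```

This keeps the CPU-dependent part confined to one function, which also sidesteps the concern about the same submission behaving differently on different VM types: without AVX, the scalar path is used.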
uniapi commented on Sep 13, 2021
@error256 thanks for your link!
And yeah! You are right! It's not appropriate to use -mavx2, because it's C and it should be portable! So if running on ARM, we could #include <arm_neon.h>.
So I vote for adding -march=native
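The portability point can be sketched with compile-time selection: the compiler predefines feature macros for whatever target it builds for, so the same source can use AVX on x86, NEON on ARM via <arm_neon.h>, or plain C elsewhere. This is a minimal sketch using float, since vaddq_f32 exists on both 32-bit ARM NEON and AArch64; the function name is hypothetical.

```c
#include <stddef.h>

#if defined(__AVX__)
#include <immintrin.h>
#elif defined(__ARM_NEON)
#include <arm_neon.h>
#endif

/* out[i] = a[i] + b[i]; the SIMD path is picked by the target's feature macros. */
static void add_f32(const float *a, const float *b, float *out, size_t n)
{
    size_t i = 0;
#if defined(__AVX__)
    for (; i + 8 <= n; i += 8) /* 8 floats per 256-bit register */
        _mm256_storeu_ps(out + i, _mm256_add_ps(_mm256_loadu_ps(a + i), _mm256_loadu_ps(b + i)));
#elif defined(__ARM_NEON)
    for (; i + 4 <= n; i += 4) /* 4 floats per 128-bit register */
        vst1q_f32(out + i, vaddq_f32(vld1q_f32(a + i), vld1q_f32(b + i)));
#endif
    for (; i < n; ++i) /* scalar tail, or the only path on other targets */
        out[i] = a[i] + b[i];
}
```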