auto-configure fails on POWER9 #501

Closed
boegel opened this issue May 23, 2021 · 1 comment · Fixed by #647 · May be fixed by amd/blis#6

Comments

@boegel

boegel commented May 23, 2021

When trying to build BLIS 0.8.1 on a POWER9 system (hosted by https://osuosl.org) with GCC 10.3 and binutils 2.36.1, I'm seeing the following problem with the configure script:

$ ./configure --prefix=/tmp/$USER/BLIS-0.8.1 --enable-cblas --enable-threading=openmp --enable-shared auto
configure: detected Linux kernel version 4.18.0-240.1.1.el8_3.ppc64le.
configure: python interpeter search list is: python python3 python2.
configure: using 'python' python interpreter.
configure: found python version 3.9.5 (maj: 3, min: 9, rev: 5).
configure: python 3.9.5 appears to be supported.
configure: C compiler search list is: gcc clang cc.
configure: using 'gcc' C compiler.
configure: C++ compiler search list is: g++ clang++ c++.
configure: using 'g++' C++ compiler (for sandbox only).
configure: found gcc version 10.3.0 (maj: 10, min: 3, rev: 0).
configure: checking for blacklisted configurations due to gcc 10.3.0.
configure: checking gcc 10.3.0 against known consequential version ranges.
configure: found assembler ('as') version 2.36.1 (maj: 2, min: 36, rev: 1).
configure: checking for blacklisted configurations due to as 2.36.1.
configure: warning: assembler ('as' 2.36.1) does not support 'bulldozer'; adding to blacklist.
configure: warning: assembler ('as' 2.36.1) does not support 'sandybridge'; adding to blacklist.
configure: warning: assembler ('as' 2.36.1) does not support 'haswell'; adding to blacklist.
configure: warning: assembler ('as' 2.36.1) does not support 'piledriver'; adding to blacklist.
configure: warning: assembler ('as' 2.36.1) does not support 'steamroller'; adding to blacklist.
configure: warning: assembler ('as' 2.36.1) does not support 'excavator'; adding to blacklist.
configure: warning: assembler ('as' 2.36.1) does not support 'skx'; adding to blacklist.
configure: warning: assembler ('as' 2.36.1) does not support 'knl'; adding to blacklist.
configure: configuration blacklist:
configure:   bulldozer sandybridge haswell piledriver steamroller excavator skx knl
configure: reading configuration registry...done.
configure: determining default version string.
configure: could not find '.git' directory; using unmodified version file.
configure: starting configuration of BLIS 0.8.1.
configure: configuring with official version string.
configure: found shared library .so version '3.0.0'.
configure:   .so major version: 3
configure:   .so minor.build version: 0.0
configure: automatic configuration requested.
/tmp/eb-al3haik0/ccXdk4Z9.o:config_detect.c:function main: error: undefined reference to 'bli_cpuid_query_id'
collect2: error: ld returned 1 exit status
./configure: line 1142: ./auto-detect.x: No such file or directory
configure: hardware detection driver returned ''.
configure: checking configuration against contents of 'config_registry'.
configure: 'auto-detected configuration '' is NOT registered!
configure:
configure: *** Cannot continue with unregistered configuration ''. ***
configure:

Is auto-configure expected to work on POWER9 systems?
Maybe something else is going wrong here, perhaps something as simple as a missing #include statement?

@devinamatthews
Member

devinamatthews commented May 23, 2021

POWER autoconfiguration is addressed by #345, which needs some work before it can be merged. Certainly `configure power9` is an easy workaround for now.

Flamefire added a commit to Flamefire/blis that referenced this issue Jul 7, 2022
Read from `/proc/cpuinfo` as done for ARM.
Fixes flame#501
Flamefire added a commit to Flamefire/blis that referenced this issue Jul 21, 2022
Read from `/proc/cpuinfo` as done for ARM.
Fixes flame#501
devinamatthews pushed a commit that referenced this issue Jul 21, 2022
Read from `/proc/cpuinfo` as done for ARM.
Fixes #501
fgvanzee added a commit that referenced this issue Oct 26, 2023
Details:
- Fixed a harmless bug that would have allowed C++ headers into the list
  of header suffixes specifically reserved for C99 headers. In practice,
  this would have had no substantive effect on anything since the core
  BLIS framework does not use C++ headers.
- (cherry picked from commit bbaf29a)

CREDITS file update.

Details:
- Thanks to Kihiro Bando for assisting with issue #644.
- (cherry picked from commit a48e29d)

Removed buggy cruft from power10 subconfig.

Details:
- Removed #defines for BLIS_BBN_s and BLIS_BBN_d from
  bli_kernel_defs_power10.h. These were inadvertently set in ae10d94
  because the power10 subconfig was registering bb packm ukernels, but
  only for 6xk (power10 uses s8x16 and d8x8 ukernels) and only because
  the original author (probably) copy-pasted from power9 when getting
  started. That 6xk packm registration was effectively "dead code"
  prior to ae10d94, but was then mistaken as not-dead code during the
  ae10d94 refactor. These improper bb factors may have been causing
  bugs in power10 builds. Thanks to Nicholai Tukanov for helping remind
  me what the power10 subconfig was supposed to look like.
- Removed extraneous microkernel preference registrations from power10
  subconfig. Preferences for single and double complex gemm were being
  registered despite there being no complex gemm ukernels registered to
  go with them. Similarly, there were trsm preferences registered
  without any trsm ukernels registered (and BLIS doesn't actually use a
  preference for the trsm ukernel anyway). These extraneous
  registrations were almost surely not hurting anything, even if they
  were quite misleading.
- (cherry picked from commit 5b29893)

Disable modification of KC in the gemmsup kernels. (#648)

This led to a ~50% performance reduction for certain gemm operations (but not others?). See #644 for example.
- (cherry picked from commit 56de31b)

Fixed out-of-bounds bug in sup s6x16m haswell kernel.

Details:
- Fixed another out-of-bounds read access bug in the haswell sup
  assembly kernels. This bug is similar to the one fixed in 17b0caa
  and affects bli_sgemmsup_rv_haswell_asm_6x2m(). Thanks to Madeesh
  Kannan for reporting this bug (and a suitable fix) in #635.
- CREDITS file update.
- (cherry picked from commit 4dde947)

Add `#line` directives to flattened `blis.h`. (#643)

Details:
- Modified flatten-headers.py so that #line directives are inserted into
  the flattened blis.h file. This facilitates easier debugging when
  something is amiss in the flattened blis.h because the compiler will
  be able to refer to the line number within the original constituent
  header file (which is where the fix would go) rather than the line
  number within the flattened header (which is not as helpful).
- (cherry picked from commit 6826c1c)
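
To illustrate the idea of `#line` directives in a flattened header, here is a minimal sketch. It is not the code in flatten-headers.py: the `flatten` function, its dictionary-based input, and the file names in the usage note are all assumptions made for illustration.

```python
import re

# Matches local includes such as: #include "foo.h"
INCLUDE_RE = re.compile(r'^\s*#include\s+"([^"]+)"')

def flatten(name, headers, seen=None):
    """Inline local #include lines, marking origins with #line directives.

    `headers` is a hypothetical dict mapping file names to file contents;
    the real script works on files on disk.
    """
    seen = set() if seen is None else seen
    seen.add(name)
    out = [f'#line 1 "{name}"']
    for lineno, line in enumerate(headers[name].splitlines(), start=1):
        m = INCLUDE_RE.match(line)
        if m and m.group(1) in headers:
            if m.group(1) not in seen:
                out.extend(flatten(m.group(1), headers, seen))
            # Resume numbering at the line that follows the #include.
            out.append(f'#line {lineno + 1} "{name}"')
        else:
            out.append(line)
    return out
```

With `headers = {"blis.h": '#include "a.h"\nint x;\n', "a.h": "int a;\n"}`, calling `flatten("blis.h", headers)` yields a list in which `int a;` is preceded by `#line 1 "a.h"` and `int x;` by `#line 2 "blis.h"`, so a compiler diagnostic would point at the constituent header rather than at the flattened file.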

Add autodetection for POWER7, POWER9 & POWER10 (#647)

Read from `/proc/cpuinfo` as done for ARM.
Fixes #501
- (cherry picked from commit af3a41e)
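
As a rough illustration of the `/proc/cpuinfo` approach, the sketch below maps the `cpu` line that ppc64le kernels report to a BLIS configuration name. It is written in Python for brevity and is not the code added in #647; the function name, the mapping table, and the `generic` fallback are assumptions.

```python
import re

# Hypothetical mapping from the POWER generation reported by the kernel to a
# BLIS configuration name; only the generations named in #647 are listed.
POWER_CONFIGS = {"POWER7": "power7", "POWER9": "power9", "POWER10": "power10"}

def detect_power_config(cpuinfo_text, default="generic"):
    """Guess a BLIS config name from the contents of /proc/cpuinfo."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        # ppc64le kernels report e.g. "cpu : POWER9, altivec supported".
        if sep and key.strip() == "cpu":
            match = re.match(r"\s*(POWER\d+)", value)
            if match:
                return POWER_CONFIGS.get(match.group(1), default)
    return default
```

On a Linux system this could be called as `detect_power_config(open("/proc/cpuinfo").read())`. The returned name could then be passed to configure in place of `auto`, which is the workaround suggested earlier in this thread.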

Fixed out-of-bounds read in haswell gemmsup kernels.

Details:
- Fixed memory access bugs in the bli_sgemmsup_rv_haswell_asm_Mx2()
  kernels, where M = {1,2,3,4,5,6}. The bugs were caused by loading four
  single-precision elements of C, via instructions such as:

        vfmadd231ps(mem(rcx, 0*32), xmm3, xmm4)

  in situations where only two elements are guaranteed to exist. (These
  bugs may not have manifested in earlier tests due to the leading
  dimension alignment that BLIS employs by default.) The issue was fixed
  by replacing lines like the one above with:

        vmovsd(mem(rcx), xmm0)
        vfmadd231ps(xmm0, xmm3, xmm4)

  Thus, we use vmovsd to explicitly load only two elements of C into
  registers, and then operate on those values using register addressing.
  Thanks to Daniël de Kok for reporting these bugs in #635, and to
  Bhaskar Nallani for proposing the fix.
- CREDITS file update.
- (cherry picked from commit 17b0caa)

Allow uniform max problem sizes in test/3/runme.sh.

Details:
- Tweaked test/3/runme.sh so that the test driver binaries for single-
  threaded (st), single-socket (1s), and dual-socket (2s) execution can
  be built using identical problem size ranges. Previously, this was not
  possible because runme.sh used the maximum problem size, which was
  embedded into the binary filename, to tell the three classes of
  binaries apart from one another. Now, runme.sh uses the binary suffix
  ("st", "1s", or "2s") to tell them apart. This required only a few
  changes to the logic, but it also required a change in format to the
  threading config strings themselves (replacing the max problem size
  with "st", "1s", or "2s"). Thanks to Jeff Diamond for inspiring this
  improvement.
- Comment updates.
- (cherry picked from commit cc260fd)

Use BLIS_ENABLE_COMPLEX_RETURN_INTEL in blastest files (#636)

Details:
- Fixed a crash that occurs when either cblat1 or zblat1 is linked
  with a build of BLIS that was compiled with '--complex-return=intel'.
  This fix involved inserting preprocessor macro guards based on
  BLIS_ENABLE_COMPLEX_RETURN_INTEL into blastest/src/cblat1.c and
  blastest/src/zblat1.c to correctly handle situations where BLIS is
  compiled with Intel/f2c-style calling conventions for complex numbers.
- Updated blastest/src/fortran/run-f2c.sh so that future executions
  will insert the aforementioned cpp macro conditional where
  appropriate.
- (cherry picked from commit 9b1beec)

Change complex_return='intel' for ifx. (#637)

Details:
- When checking the version string of the Fortran compiler for the
  purposes of determining a default return convention for complex
  domain values, grep for "IFORT" instead of "ifort" since that string
  is common to both the 'ifx' and 'ifort' binaries provided by Intel:

    $ ifx --version
    ifx (IFORT) 2022.1.0 20220316
    Copyright (C) 1985-2022 Intel Corporation. All rights reserved.

    $ ifort --version
    ifort (IFORT) 2021.6.0 20220226
    Copyright (C) 1985-2022 Intel Corporation. All rights reserved.

- (cherry picked from commit 98d4678)
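
The effect of matching "IFORT" rather than "ifort" can be checked against the two banners quoted above. The sketch below is an illustration in Python, not the shell logic in configure; the function name is hypothetical, and the fallback to 'gnu' for every other compiler is an assumption.

```python
def default_complex_return(version_output):
    """Guess a complex return convention from a Fortran compiler banner."""
    # Upper-case "IFORT" appears in both the ifx and ifort banners, whereas
    # lower-case "ifort" is absent from the ifx banner. The match is
    # deliberately case-sensitive.
    return "intel" if "IFORT" in version_output else "gnu"
```

A case-sensitive search for "ifort" would miss the `ifx (IFORT) 2022.1.0 20220316` banner, which is why the upper-case string is the better key.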

Minor changes to .gitignore and LICENSE files. (#642)

Details:
- Macs create .DS_Store files in every directory visited. Updated
  .gitignore file so these files won't be reported as untracked by
  'git status'.
- Added Oracle Corporation to the LICENSE file.
- Updated UT copyright on behalf of SHPC.
- (cherry picked from commit ffde54c)

Minor cleanups, comment updates to bli_gks.c.

Details:
- Removed a redundant registration of 'a64fx' subconfig in
  bli_gks_init().
- Reordered registration of 'armsve', 'a64fx', and 'firestorm'
  subconfigs. Thanks to Jeff Diamond for his input on this reordering.
- Comment updates to bli_gks.c and arch_t enum in bli_type_defs.h.
- (cherry picked from commit 7cba7ce)