Martin Ueding edited this page Aug 23, 2017 · 2 revisions

Compiling Chroma

The whole suite of USQCD software consists of a handful of building modules. Before compiling, it is important to choose which modules to use. Not every module can be compiled on every architecture, and not every module works well with the other modules.

Generic

GNU Autotools

All modules use the GNU Autotools for configuration and compilation. In principle, all modules should be compilable with ./configure, make, and make install. However, there are many configure flags to set correctly. Also, the version of GNU Autotools installed on the computer can differ from the one that was used to generate the files shipped with the module. In that case, one needs to regenerate those files. It has proven helpful to run the following commands (Bash shell) before running configure:

for configure_ac in $(find . -name configure.ac | sort | tac); do
    pushd "${configure_ac%/*}"
    autoreconf -fiv
    popd
done

The tac makes the loop go bottom-up. This is needed because configure goes into the subdirectories automatically and would otherwise stumble across things that have not been cleaned up yet.
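The bottom-up order can also be made explicit by sorting on directory depth instead of on the path string, because lexical order does not always coincide with depth. The following is a minimal sketch, assuming GNU find (for -printf); list_configure_dirs is a hypothetical helper name and the directory tree is a throw-away stand-in for a real checkout.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print every directory that contains a configure.ac,
# deepest directories first. Assumes GNU find, which provides -printf.
list_configure_dirs() {
    # %d is the depth below the starting point, %h the containing directory.
    find "${1:-.}" -name configure.ac -printf '%d %h\n' \
        | sort -k1,1nr -k2 \
        | cut -d' ' -f2-
}

# Demonstration on a temporary tree instead of a real source checkout.
demo=$(mktemp -d)
mkdir -p "$demo/a/b"
touch "$demo/configure.ac" "$demo/a/configure.ac" "$demo/a/b/configure.ac"

order=$(cd "$demo" && list_configure_dirs .)
echo "$order"

rm -rf "$demo"
```

The output of such a helper could then drive the loop, running autoreconf -fiv in each listed directory in turn.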

Static Linkage

While dynamic linkage is preferred for system-wide installations of software, a supercomputer benefits from static linkage because it eases the loading of the program onto every node. Even with options to force static linkage, some dynamic linkage might slip through. Therefore the shared object files can be deleted after each installation step like this:

pushd $prefix/lib
rm -f *.so *.so.*
popd
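The same cleanup can be wrapped in a function that refuses to run without a prefix, so that an unset $prefix does not turn the command into a deletion inside /lib. This is only a sketch: remove_shared_objects is a hypothetical name, and the temporary directory with empty files stands in for a real installation prefix.

```shell
#!/usr/bin/env bash
# Hypothetical helper: delete shared objects from PREFIX/lib, keep static archives.
remove_shared_objects() {
    # Abort with a message if no prefix is given.
    local libdir="${1:?usage: remove_shared_objects PREFIX}/lib"
    [ -d "$libdir" ] || return 0
    find "$libdir" -maxdepth 1 \( -name '*.so' -o -name '*.so.*' \) -delete
}

# Demonstration with a temporary stand-in for the installation prefix.
demo_prefix=$(mktemp -d)
mkdir -p "$demo_prefix/lib"
touch "$demo_prefix/lib/libqmp.a" \
      "$demo_prefix/lib/libxml2.so" \
      "$demo_prefix/lib/libxml2.so.2"

remove_shared_objects "$demo_prefix"
remaining=$(ls "$demo_prefix/lib")
echo "$remaining"

rm -rf "$demo_prefix"
```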

Git Submodules

Modules like Chroma and QDP++ use a lot of git submodules. It is very important to clone recursively with git clone --recursive. Some of the repositories use SSH remote URLs for GitHub. If you do not have an SSH key on the JURECA frontend that is registered with GitHub, the clone will be denied even though the repository is public, with a message like the following:

Cloning into 'other_libs/filedb'...
Warning: Permanently added the RSA host key for IP address '192.30.253.113' to the list of known hosts.
Permission denied (publickey).
fatal: Could not read from remote repository.

Please make sure you have the correct access rights
and the repository exists.
Clone of 'git@github.com:usqcd-software/filedb.git' into submodule path 'other_libs/filedb' failed

There are a couple of options:

  1. Copy the SSH key that you have registered with GitHub to the frontend. This way, git can log into GitHub via SSH. This is not recommended because somebody gaining access to that key can access all your GitHub repositories.

  2. Create a new SSH key pair on the frontend and register it with GitHub. This has the same disadvantage as above, but at least one can remove the key from GitHub later on.

  3. Change all the remote URLs from SSH to HTTPS. This needs to be done on every level of the submodule tree and is a bit cumbersome.

  4. The best way is to create an SSH key pair on the frontend. Then register that SSH key as a deploy key for an arbitrary repository on your GitHub account. Currently, this is done by navigating to the Settings of that repository and then to the Deploy Keys. By default, a deploy key only grants pull permissions, which gives no additional access if the repository is public. Then you can clone repositories over SSH because the SSH key that is used is registered somewhere in GitHub.
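For the third option, git itself can rewrite URLs through its url.<base>.insteadOf configuration setting, which avoids editing each remote by hand. The following sketch shows the rewrite on a temporary repository without touching the network; the repository URL is the one from the error message above, and everything else is a stand-in.

```shell
#!/usr/bin/env bash
# Demonstrate url.<base>.insteadOf on a temporary repository (no network access).
demo=$(mktemp -d)
git init -q "$demo"

# Rewrite GitHub SSH URLs to HTTPS. Here the setting is local to the demo repository.
git -C "$demo" config url."https://github.com/".insteadOf "git@github.com:"

# Add a remote with an SSH URL and ask git which URL it would actually use.
git -C "$demo" remote add origin git@github.com:usqcd-software/filedb.git
rewritten=$(git -C "$demo" ls-remote --get-url origin)
echo "$rewritten"

rm -rf "$demo"
```

For a recursive clone, the same setting would go into the user-level configuration with git config --global, so that it also applies to every submodule.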

JURECA without GPUs

The JURECA system consists of E5-2680 v3 Haswell CPUs. They support the AVX2 instruction set. For an Intel Xeon, the selection that has served well is libxml2, QMP, QDP++, QPhiX, and Chroma. The Remez algorithm for the rational approximation also needs the GMP (GNU Multi Precision) library.

In this section, all the steps needed to get Chroma running on JURECA are presented. The whole script can be downloaded and executed; it should bootstrap the whole software stack automatically.

The recommended compiler for JURECA without GPUs is the Intel C++ compiler. It can take best advantage of the features of the Intel CPUs. QPhiX also uses some non-standard Intel extensions which can only be used with the Intel compiler. It is recommended to specify the version to load explicitly. The default version might change at any time, and then the compilation might not work anymore. If a newer version of the compiler is installed, it is worth checking whether it gives more performance than the old one.

MPI versions of the compiler are called mpiicc and mpiicpc for the C and C++ compilers, respectively. The full paths need to be passed to configure using the CC and CXX variables.
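Looking up the full paths could be sketched as follows. resolve_tool is a hypothetical helper, and the script only prints the configure call instead of running it, so it can be tried on a machine where the Intel compiler is not installed.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the full path of a program, or fail with a message.
resolve_tool() {
    local path
    path=$(command -v "$1") || { echo "not found in PATH: $1" >&2; return 1; }
    printf '%s\n' "$path"
}

# mpiicc and mpiicpc are only available after the compiler has been loaded.
if CC=$(resolve_tool mpiicc) && CXX=$(resolve_tool mpiicpc); then
    # Only print the command; further configure flags are omitted here.
    echo ./configure CC="$CC" CXX="$CXX"
else
    echo "Intel MPI compiler wrappers not found; load the compiler first." >&2
fi
```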

Troubleshooting

During the compilation of Chroma, one might encounter a couple of errors. Some typical errors are listed here, each with a way to work around it.

Problem:

fatal: unable to connect to git.gnome.org:

Solution:

There seems to be some firewall rule on JURECA that prevents this. Just clone from JUDAC.

Configure

Problem:

configure: error: Cannot compile/link a program with libxml2.

Solution:

The binary xml2-config of the local installation must be somewhere in the $PATH, before any other installation. If the system-wide installation of libxml2 is found first, it will be used. That version has been compiled with the standard system compiler (usually an older GCC), which causes trouble.
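A quick way to check which xml2-config would be picked up is to put the local bin directory at the front of $PATH and ask the shell. In this sketch the prefix is a temporary directory and the xml2-config inside it is a fake stand-in, so that the lookup can be shown without a real libxml2 build.

```shell
#!/usr/bin/env bash
# Temporary stand-in for the real installation prefix.
demo_prefix=$(mktemp -d)
mkdir -p "$demo_prefix/bin"

# Fake xml2-config, for demonstration only.
printf '#!/bin/sh\necho "stand-in"\n' > "$demo_prefix/bin/xml2-config"
chmod +x "$demo_prefix/bin/xml2-config"

# Put the local installation first, then forget previously cached lookups.
PATH="$demo_prefix/bin:$PATH"
hash -r

found=$(command -v xml2-config)
echo "$found"

rm -rf "$demo_prefix"
```

If the printed path points into the system directories instead of the local prefix, configure will pick up the wrong libxml2.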

Problem:

./configure: line 13042: syntax error near unexpected token `Z,zlib,'
./configure: line 13042: `    PKG_CHECK_MODULES(Z,zlib,'

Solution:

PKG_CHECK_MODULES is a GNU M4 macro that has not been properly resolved during the generation of the configure script. The needed macros are defined in /usr/share/aclocal/pkg.m4. In the source of libxml2, one needs to create a directory called m4 and symlink that file into it. Another run of autoreconf -f will pick up the changes and create a configure file without unresolved macros.
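The symlink step can be sketched as a small function. link_pkg_m4 is a hypothetical name; the macro file defaults to the path given above, and the demonstration uses empty stand-in files in a temporary directory instead of a real libxml2 checkout.

```shell
#!/usr/bin/env bash
# Hypothetical helper: create SRC_DIR/m4 and symlink the pkg-config macros into it.
link_pkg_m4() {
    local src_dir="$1" macro_file="${2:-/usr/share/aclocal/pkg.m4}"
    [ -f "$macro_file" ] || { echo "missing macro file: $macro_file" >&2; return 1; }
    mkdir -p "$src_dir/m4"
    ln -sf "$macro_file" "$src_dir/m4/"
}

# Demonstration with stand-ins for the macro file and the source directory.
demo=$(mktemp -d)
touch "$demo/pkg.m4"
mkdir "$demo/libxml2"
link_pkg_m4 "$demo/libxml2" "$demo/pkg.m4"

if [ -L "$demo/libxml2/m4/pkg.m4" ]; then linked=yes; else linked=no; fi
echo "symlink created: $linked"

rm -rf "$demo"
```

In a real checkout, the call would be link_pkg_m4 with the libxml2 source directory as the only argument, followed by autoreconf -f in that directory.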

Problem:

error: cannot open source file "qio_config_internal.h"

Solution:

The source checkout of QIO that is a submodule of QDP++ might ship with a Makefile that is not up to date. Delete other_libs/qio/Makefile and run ./autoreconf.sh inside other_libs/qio.

Problem:

configure.ac:6: error: version mismatch.  This is Automake 1.15,

Solution:

The git repositories sometimes contain GNU Autotools files from a version different than the one installed on the frontend. In that case one needs to run aclocal in each submodule to update those files.

Problem:

CDPATH="${ZSH_VERSION+.}:" && cd /homec/hbn28/hbn28e/Sources/qphix && aclocal-1.13 
/bin/sh: aclocal-1.13: command not found

Solution:

This is also a version mismatch. Run the Autotools reset commands given in the section "GNU Autotools" above.

Compilation

Problem:

QMP_mem.c(345): error: a value of type "void *" cannot be assigned to an entity of type "char *"
      mh->base = mm->mem;
               ^

Solution:

In C, this implicit conversion from void * to char * is allowed; in C++ it is not. This can happen if the C and C++ compilers are mixed up and C code is compiled with a C++ compiler.

Problem:

error: 'asm' undeclared (first use in this function)

Solution:

GCC does not recognize the asm keyword in strict ISO C mode. Use --std=gnu99 instead.

Problem:

Error: unrecognized opcode: `qvlfdx'

Solution:

This and similar instructions are QPX intrinsics that only work on BG/Q. GCC does not seem to support this on JUQUEEN, use LLVM instead.

Problem:

error: inlining failed in call to always_inline '__m256d _mm256_set1_pd(double)': target specific option mismatch

Solution:

GCC needs the option -mavx2 for this to compile.
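Code compiled with -mavx2 only runs on CPUs that support AVX2, so it can be worth checking the CPU flags first. The following sketch assumes a Linux x86 system where /proc/cpuinfo contains a flags line; cpu_has_flag is a hypothetical helper, and the demonstration reads a made-up sample file rather than the real one.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed if the CPU flags line contains the given flag.
# Returns 2 if the cpuinfo file cannot be read.
cpu_has_flag() {
    local flag="$1" cpuinfo="${2:-/proc/cpuinfo}"
    [ -r "$cpuinfo" ] || return 2
    # Look at the first flags line only and match the flag as a whole word.
    grep -m1 '^flags' "$cpuinfo" | grep -qw "$flag"
}

# Demonstration with a made-up sample file.
sample=$(mktemp)
printf 'model name\t: sample CPU\nflags\t\t: fpu sse2 avx avx2 fma\n' > "$sample"

if cpu_has_flag avx2 "$sample"; then has_avx2=yes; else has_avx2=no; fi
if cpu_has_flag avx512f "$sample"; then has_avx512f=yes; else has_avx512f=no; fi
echo "avx2: $has_avx2, avx512f: $has_avx512f"

rm -f "$sample"
```

Called with only the flag name, the helper reads /proc/cpuinfo of the machine it runs on, which on a cluster may differ between the frontend and the compute nodes.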

Problem:

../include/qphix/dslash_body.h:584:17: error: expected ']' before ':' token

Solution:

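CEAN (C/C++ Extensions for Array Notation) is an Intel C++ extension that other compilers do not understand. Drop --enable-cean from the configure flags.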