Roland's homepage

My random knot in the Web

Building CalculiX with the PaStiX solver without CUDA

The CalculiX solver package on FreeBSD is compiled with the SPOOLES (SParse Object Oriented Linear Equations Solver) library by default. This is also the case in many Linux distributions because SPOOLES is free, while faster solvers like PARDISO are proprietary.

SPOOLES is relatively fast compared to the built-in iterative solver. Its most fundamental limitation is that the data must fit in RAM.

However, a patched version of the PaStiX solver has been integrated with CalculiX. It is often faster than SPOOLES and can use a GPU through the CUDA library.

So I wanted to try CalculiX built with PaStiX. There is just one slight issue: CUDA is not supported on FreeBSD. And I use Intel or AMD graphics because those are supported by open-source drivers.

Therefore I needed to build PaStiX without CUDA support. This turned out to be slightly more complicated than expected.

Currently, CalculiX uses a modified version of PaStiX that you can find at https://github.com/Dhondtguido/PaStiX4CalculiX. This is based on an older version of PaStiX. And even though it has a configuration option to build without CUDA, that doesn’t work. Basically, even if you switch CUDA off, there is still some CUDA-specific stuff left. I was halfway through patching all that when I found that GitHub user Kabbone had already done all of that: https://github.com/Kabbone/PaStiX4CalculiX/tree/cudaless. So that is the version I’m using.

Support for CalculiX in mainline PaStiX is being worked on, but will certainly not arrive before PaStiX 6.4.

Prerequisites

These build instructions were written for FreeBSD. With some changes they should be usable on other UNIX-like systems like Linux and macOS.

Building software on ms-windows is such an exercise in self-flagellation that I gladly avoid it. The contents of the patches and scripts might still be useful in this case, the scripts mainly as a general guideline.

The following software packages or ports are required for a successful build:

  • GNU make (called gmake on FreeBSD, usually make on Linux)
  • A Fortran compiler. Here GNU Fortran in the form of gfortran13 is used. (On Linux it’s probably just called gfortran.)
  • A C compiler. Here gcc13 is used. (or gcc on Linux)
  • GNU autotools/automake/libtool
  • cmake
  • bison
  • pkg-config
  • Python 2.7 (for PaStiX code expansion)
  • Python 3

For the build I set up the following directory tree:

.
├── bin
├── distfiles
├── examples
├── include
├── lib
├── libexec
├── logfiles
├── patches
├── share
├── source
└── unused

All the distribution files of known working versions are stored in distfiles. The needed patches are stored under patches. Log files of the builds are saved under logfiles. The builds are done under the source directory. Scripts and patches that are not in use anymore are stored under unused. The other directories (examples, include, lib, libexec, share) are locations for the libraries to be installed in.
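
The tree can be created in one go. The following is a minimal sketch and not part of the repository; the function name make_tree is hypothetical.

```shell
# Hypothetical helper (not part of the repository): create the
# directory tree described above under the directory given as $1.
make_tree() {
    root="$1"
    for d in bin distfiles examples include lib libexec \
             logfiles patches share source unused; do
        mkdir -p "$root/$d" || return 1
    done
}
```

It would be called as make_tree "$(pwd)" from the directory that is to become the root of the build.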

The root directory contains the scripts used for the build. This whole set of distfiles, patches and shell scripts can be found on GitHub.

License

The files in distfiles are under their respective licences. The materials that I wrote are hereby placed in the public domain.

Build

A whole stack of libraries needs to be built in order to be able to build CalculiX with PaStiX. Building CalculiX itself is the last step. In order:

  1. SPOOLES 2.2
  2. OpenBLAS 0.3.26
  3. arpack-ng 3.9.1
  4. hwloc 2.10.0
  5. mfaverge-parsec-b580d208094e
  6. scotch 6.0.8
  7. PaStiX4CalculiX (cudaless branch from https://github.com/Kabbone/PaStiX4CalculiX)
  8. CalculiX 2.21

Before the build proper is started, the clean.sh script is run from the root directory of the repository. This ensures a clean build.
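
The whole sequence could be driven by a small wrapper. This is a minimal sketch, assuming that clean.sh and the build scripts named in the sections below sit in the current directory. The function name build_all and the log file names derived from the script names are hypothetical; they are not what the repository uses.

```shell
# Hypothetical driver (not part of the repository): run clean.sh and
# then every build script in order, stopping at the first failure.
# Must be run from the repository root, like the scripts themselves.
build_all() {
    sh clean.sh || return 1
    mkdir -p logfiles
    for s in 01_build_spooles.sh 02_build_openblas.sh \
             03_build_arpack.sh 04_build_hwloc.sh \
             05_build_parsec.sh 06_build_scotch.sh \
             07_build_pastix_kabbone.sh 08_build_calculix.sh; do
        echo "running $s"
        # Redirect instead of piping through tee, so that the exit
        # status seen here is that of the build script itself.
        sh "$s" > "logfiles/${s%.sh}.log" 2>&1 || {
            echo "$s failed, see logfiles/${s%.sh}.log" >&2
            return 1
        }
    done
}
```

Stopping at the first failure matters here because every later library is configured against the ones installed before it.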

Configuration

Except for parsec, all of these libraries are available in the FreeBSD ports tree. However, their standard configurations generally differ from those required for this build. Therefore all the libraries are built as static libraries only, so that their code is linked into CalculiX and we don’t have multiple configurations of shared libraries around; that way lies madness. Depending on the library, this is done with an environment variable like NO_SHARED=1, with the configure options --disable-shared and --enable-static, or with the cmake option -DBUILD_SHARED_LIBS=OFF.

In several of the build scripts the variable PREFIX is defined as the output of the pwd command. This is used as the location where the built libraries are to be installed, and where other libraries can be found. So it is important that all the build scripts are called from the directory in which they are located.
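
A script can guard against being started from the wrong place. The following is a minimal sketch under the assumption that the presence of the distfiles, patches and source directories identifies the root of the build tree; the function name set_prefix is hypothetical and the actual scripts may do this differently.

```shell
# Hypothetical guard (not part of the repository): refuse to continue
# unless the current directory looks like the root of the build tree,
# then set PREFIX from pwd as the build scripts do.
set_prefix() {
    for d in distfiles patches source; do
        if [ ! -d "$d" ]; then
            echo "error: no '$d' directory here; run from the repository root" >&2
            return 1
        fi
    done
    PREFIX=$(pwd)
}
```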

The compilers to use are often specified as environment variables:

  • CC=gcc13 for the C compiler,
  • CXX=g++13 for the C++ compiler,
  • FC=gfortran13 for the Fortran compiler,
  • AR=gcc-ar13 for the archiver.
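
As an illustration, the variables above can be handed to any command with env. This is a sketch only; the wrapper name with_compilers is hypothetical and the build scripts may pass the variables in another way.

```shell
# Hypothetical wrapper: run a command with the compiler selection
# from the list above in its environment. The gcc13 names are the
# FreeBSD ones; adjust them on other systems.
with_compilers() {
    env CC=gcc13 CXX=g++13 FC=gfortran13 AR=gcc-ar13 "$@"
}
```

A configure run would then look like with_compilers ./configure --prefix="$PREFIX".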

For both hwloc and PaRSEC I explicitly added libexecinfo and libpciaccess to the configuration.

SPOOLES

It is good to have SPOOLES available next to PaStiX as a back-up. Also, SPOOLES is generally faster for small calculations or eigenfrequency calculations.

For building SPOOLES, I basically applied the same patches as in the FreeBSD ports tree. Patches that I added were to ETree/src/transform.c and Utilities/src/iohb.c to fix compiler warnings. And of course Make.inc was patched to select the compiler and build options.

The script 01_build_spooles.sh is used to build SPOOLES. It is called as follows, to also redirect the output to a log file:

sh 01_build_spooles.sh |& tee logfiles/spooles.log
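
The |& shorthand is understood by tcsh, bash and zsh, but not by a plain POSIX sh. A minimal sketch of a portable equivalent follows; the function name run_logged is hypothetical.

```shell
# Hypothetical helper: run a build script with sh, sending both
# stdout and stderr to the terminal and to logfiles/<name>.log.
# Note that the exit status is that of tee, not of the script.
run_logged() {
    script="$1"
    name="$2"
    mkdir -p logfiles
    sh "$script" 2>&1 | tee "logfiles/$name.log"
}
```

The SPOOLES build would then be started with run_logged 01_build_spooles.sh spooles.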

OpenBLAS

The routines in OpenBLAS, which not only includes BLAS but also LAPACK, are the core routines used in the solver.

The script 02_build_openblas.sh is used to build OpenBLAS:

sh 02_build_openblas.sh |& tee logfiles/openblas.log

This library does not require patches. It just needs to be configured for building and installing. The env program is used to communicate the configurations as environment variables to gmake.

The most important configurations are:

  • NO_SHARED=1 for a static library.
  • INTERFACE64=1 for 8-byte integers.
  • USE_THREAD=0 because CalculiX wants a single-threaded library.
  • USE_LOCKING=1 since CalculiX itself is threaded. Leaving this setting out results in a non-working CalculiX!

Below is what an example result looks like when OpenBLAS is configured without locking:

[Image: failed analysis result because of misconfigured OpenBLAS]

This is what it is supposed to look like:

[Image: correct analysis result]

There are two other settings that one might change:

  • BUFFERSIZE=25 increases the internal buffer from 32 MiB to 1 GiB.
  • DYNAMIC_ARCH=1 will build OpenBLAS for all relevant processor types and choose the right one at runtime. Disabling this will make the build a lot faster but also restricts CalculiX to running on the same processor generation as it is built on (or a later one).
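
Put together, the settings above could be passed to gmake roughly as follows. This is a sketch and not the contents of 02_build_openblas.sh; the function name and the MAKE override (there so that make can be used on Linux) are assumptions.

```shell
# Sketch of passing the OpenBLAS settings to make through env.
# MAKE defaults to gmake (FreeBSD); the function name and the MAKE
# override are assumptions for illustration, not the actual script.
openblas_make() {
    env NO_SHARED=1 INTERFACE64=1 USE_THREAD=0 USE_LOCKING=1 \
        BUFFERSIZE=25 DYNAMIC_ARCH=1 \
        CC=gcc13 FC=gfortran13 \
        "${MAKE:-gmake}" "$@"
}
```

Run inside the OpenBLAS source directory, openblas_make would build the library and openblas_make PREFIX="$PREFIX" install would install it.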

ARPACK

The ARPACK library is used to solve eigenvalue problems. The original library is no longer maintained, so we will use the fork arpack-ng.

The build script is 03_build_arpack.sh:

sh 03_build_arpack.sh |& tee logfiles/arpack.log

It is configured with the environment variable INTERFACE64=1 to use 8-byte integers. The compilers to use are configured like this as well. Important options given to the configure script are:

  • --with-blas=-lopenblas and --with-lapack=-lopenblas to tell it to use OpenBLAS.
  • --enable-static and --disable-shared for obvious reasons.
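
The configure call then has roughly the following shape. This is a sketch and not the contents of 03_build_arpack.sh: the function name, the CONFIGURE override, the F77 variable and the LDFLAGS setting (so that the linker can find the freshly built OpenBLAS) are assumptions.

```shell
# Sketch of the configure call described above. CONFIGURE defaults to
# ./configure in the arpack-ng source directory. The function name,
# the CONFIGURE override, F77 and the LDFLAGS setting are assumptions.
arpack_configure() {
    env INTERFACE64=1 CC=gcc13 FC=gfortran13 F77=gfortran13 \
        LDFLAGS="-L${PREFIX}/lib" \
        "${CONFIGURE:-./configure}" \
        --prefix="${PREFIX}" \
        --with-blas=-lopenblas --with-lapack=-lopenblas \
        --enable-static --disable-shared "$@"
}
```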

hwloc

The hwloc library is used as a portable abstraction for the topology of modern hardware.

Most of the possible extras were disabled in the build script 04_build_hwloc.sh, since we are only interested in the library:

sh 04_build_hwloc.sh |& tee logfiles/hwloc.log

PaRSEC

PaRSEC is used as a scheduling framework for multitasking.

The build script is 05_build_parsec.sh:

sh 05_build_parsec.sh |& tee logfiles/parsec.log

In this build I had to fix two header file names. This was done using sed. And I added a small patch to the parsec_bindthread that I found in one of the forum threads, IIRC.

Since CUDA is not used, it is possible to skip this library, and let CalculiX statically configure tasks. From what I’ve read, this might be suboptimal and/or require patching of CalculiX. For that reason I just included it.

scotch

Either scotch or METIS can be used by PaStiX for ordering of sparse matrices. However, I could not get PaStiX to detect METIS properly. And scotch is supposed to be faster, so scotch it is. If your CPU has more or fewer than 4 cores, you might want to change -DSCOTCH_PTHREAD_NUMBER=4 in patches/scotch/Makefile.inc.RFS accordingly before calling the build script 06_build_scotch.sh:

sh 06_build_scotch.sh |& tee logfiles/scotch.log
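
Changing the thread count before running the build script can be done by hand in an editor, or with sed. A minimal sketch; the function name set_scotch_threads is hypothetical.

```shell
# Hypothetical helper: rewrite the thread count in a Makefile.inc.
# $1 is the file, $2 the number of threads. A backup is kept in
# <file>.bak; the -i.bak form works with both BSD and GNU sed.
set_scotch_threads() {
    sed -i.bak \
        "s/-DSCOTCH_PTHREAD_NUMBER=[0-9][0-9]*/-DSCOTCH_PTHREAD_NUMBER=$2/" \
        "$1"
}
```

On FreeBSD the number of cores is reported by sysctl -n hw.ncpu (nproc on Linux), so the call could be set_scotch_threads patches/scotch/Makefile.inc.RFS "$(sysctl -n hw.ncpu)".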

PaStiX4CalculiX

As mentioned before, this is a custom version of PaStiX modified to work with CalculiX and without CUDA.

This library contains one version of the code (for complex numbers) which is then converted into other versions (float and double for example) at compile time. This older version of PaStiX requires Python 2.7 to do this. Python 3 will not work!

Note that you will have to adapt the build script 07_build_pastix_kabbone.sh if Python 2.7 is not in /usr/local/bin/python2.7:

sh 07_build_pastix_kabbone.sh |& tee logfiles/pastix.log
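
To find out where Python 2.7 lives before adapting the script, a check along these lines can help. It is a sketch only; the function name find_python27 is hypothetical.

```shell
# Hypothetical pre-flight check: print the full path of the Python 2.7
# interpreter, or fail with a message if it is not in PATH.
# The name to look for can be overridden with $1.
find_python27() {
    command -v "${1:-python2.7}" || {
        echo "error: ${1:-python2.7} not found in PATH" >&2
        return 1
    }
}
```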

CalculiX

Finally it is time to build CalculiX using 08_build_calculix.sh:

sh 08_build_calculix.sh |& tee logfiles/calculix.log

A couple of patches are used to silence warnings. The Makefile is customized to configure the build and link to the correct libraries.

The build script installs the stripped binary in both ${PREFIX}/bin and ~/.local/bin as ccx_i8, to distinguish it from the regular FreeBSD package which uses ccx.


For comments, please send me an e-mail.

