build fails with mpich: ngappinit.cpp:11 mpicxx.h:160: error: multiple definitions of Status type #193

@drew-parsons

Debian now uses mpich as the default MPI on 32-bit architectures such as armel, armhf, i386.

This reveals a bug when building netgen against mpich. The error (from the armel build log) is:

[100%] Building CXX object ng/CMakeFiles/netgen.dir/ngappinit.cpp.o
cd /<<PKGBUILDDIR>>/obj-arm-linux-gnueabi/ng && /usr/bin/c++ -DFFMPEG -DHAVE_DLFCN_H -DHAVE_FREEIMAGE -DHAVE_FREETYPE -DHAVE_OPENGL_EXT -DHAVE_RAPIDJSON -DHAVE_TBB -DHAVE_TK -DHAVE_XLIB -DIGNORE_NO_ATOMICS -DINTERNAL_TCL_DEFAULT=1 -DJPEGLIB -DNETGEN_PYTHON -DNG_PYTHON -DOCCGEOMETRY -DOCC_CONVERT_SIGNALS -DOPENGL -DPARALLEL -DPYBIND11_SIMPLE_GIL_MANAGEMENT -DTCL -DTOGL_X11 -DUSE_TCL_STUBS -DUSE_TK_STUBS -DUSE_TOGL_2 -D_GLIBCXX_USE_CXX11_ABI=1 -D__STDC_CONSTANT_MACROS -I/<<PKGBUILDDIR>>/obj-arm-linux-gnueabi/ng -I/<<PKGBUILDDIR>>/ng -I/<<PKGBUILDDIR>>/obj-arm-linux-gnueabi -I/<<PKGBUILDDIR>>/include -I/<<PKGBUILDDIR>>/libsrc -I/<<PKGBUILDDIR>>/libsrc/include -I/usr/include/opencascade -I/usr/lib/arm-linux-gnueabi/mpich/include -I/usr/include/python3.12 -I/usr/include/tcl -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64 -D_TIME_BITS=64 -Wdate-time -D_FORTIFY_SOURCE=2 -g -O2 -ffile-prefix-map=/<<PKGBUILDDIR>>=. -fstack-protector-strong -fstack-clash-protection -Wformat -Werror=format-security -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64 -D_TIME_BITS=64 -Wdate-time -D_FORTIFY_SOURCE=2 -O2 -g -DNDEBUG -std=gnu++17 -fvisibility=hidden -fabi-version=19 -MD -MT ng/CMakeFiles/netgen.dir/ngappinit.cpp.o -MF CMakeFiles/netgen.dir/ngappinit.cpp.o.d -o CMakeFiles/netgen.dir/ngappinit.cpp.o -c /<<PKGBUILDDIR>>/ng/ngappinit.cpp
In file included from /usr/include/tcl/tk.h:99,
                 from /<<PKGBUILDDIR>>/libsrc/include/inctcl.hpp:7,
                 from /<<PKGBUILDDIR>>/ng/ngappinit.cpp:10:
/usr/lib/arm-linux-gnueabi/mpich/include/mpicxx.h:160:18: error: expected identifier before ‘int’
  160 |     friend class Status;
      |                  ^~~~~~
In file included from /usr/lib/arm-linux-gnueabi/mpich/include/mpi.h:977,
                 from /<<PKGBUILDDIR>>/libsrc/core/ng_mpi.hpp:14,
                 from /<<PKGBUILDDIR>>/libsrc/core/mpi_wrapper.hpp:13,
                 from /<<PKGBUILDDIR>>/libsrc/include/../meshing/meshtype.hpp:13,
                 from /<<PKGBUILDDIR>>/libsrc/include/../meshing/meshing.hpp:23,
                 from /<<PKGBUILDDIR>>/libsrc/include/meshing.hpp:1,
                 from /<<PKGBUILDDIR>>/ng/ngappinit.cpp:11:
/usr/lib/arm-linux-gnueabi/mpich/include/mpicxx.h:160:12: error: multiple types in one declaration
  160 |     friend class Status;
      |            ^~~~~
/usr/lib/arm-linux-gnueabi/mpich/include/mpicxx.h:160:5: error: friend declaration does not name a class or function
  160 |     friend class Status;
      |     ^~~~~~
/usr/lib/arm-linux-gnueabi/mpich/include/mpicxx.h:493:7: error: expected identifier before ‘int’
  493 | class Status  {
      |       ^~~~~~
/usr/lib/arm-linux-gnueabi/mpich/include/mpicxx.h:493:15: error: expected unqualified-id before ‘{’ token
  493 | class Status  {
      |               ^

The error occurs only when ngappinit.cpp is compiled with -DPARALLEL. If I understand correctly, that definition is added by libsrc/core/CMakeLists.txt when USE_MPI is set.

ng/ngappinit.cpp includes both inctcl.hpp (l.10) and mpi_wrapper.hpp (l.12). mpi_wrapper.hpp is also included indirectly, and earlier, via meshing.hpp (l.11), which is the include chain shown in the log above.

inctcl.hpp includes tcl/tk.h, which in turn includes (l.99) X11/Xlib.h. Xlib.h defines (l.83)

#define Status int

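At the same time core/mpi_wrapper.hpp includes ng_mpi.hpp, which includes mpi.h (if PARALLEL is set).
mpi.h from mpich includes mpicxx.h, which declares Status as a friend class (l.160) and defines it as a class (l.493). Because the Xlib macro is still active at that point, the preprocessor rewrites "class Status" to "class int", which produces the "expected identifier before ‘int’" errors. The compilation error therefore arises from the conflicting definitions of Status.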
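The clash can be reproduced without X11 or MPI. The following is a minimal sketch, not code from netgen or mpich: `xlib_style_call`, the `fake_mpi` namespace and `demo` are hypothetical stand-ins, and the `#undef` is only there to show where the macro stops mattering. It is not the fix proposed in this issue, and real code that still needs Xlib's Status after the `#undef` would have to spell it as int.

```cpp
#include <cassert>

// Stand-in for X11/Xlib.h: the real header defines the same macro.
#define Status int

// Xlib-style declaration written while the macro is active; it returns int.
Status xlib_style_call() { return 0; }

// Drop the macro before any header that uses Status as an identifier.
// Without this line, "class Status" below is preprocessed to "class int"
// and fails with the same kind of errors as in the build log.
#undef Status

// Hypothetical, heavily simplified stand-in for the C++ bindings in mpicxx.h.
namespace fake_mpi {
class Status;  // forward declaration

class Comm {
    friend class Status;  // same pattern as the friend declaration at l.160
    int rank_ = 3;        // private, readable by Status through friendship
};

class Status {
public:
    int source_of(const Comm& c) const { return c.rank_; }
};
}  // namespace fake_mpi

// Hypothetical helper that uses both the Xlib-style and the MPI-style names.
int demo() {
    fake_mpi::Comm comm;
    fake_mpi::Status status;
    return status.source_of(comm) + xlib_style_call();
}
```

Removing the `#undef Status` line should make the compiler reject `friend class Status;` and `class Status {`, which matches the two locations reported in the log.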

Commit c2af423 already removed the use of MPI from the netgen gui executable and therefore from ngappinit.cpp, but it did not stop PARALLEL from being defined when that file is compiled.

A successful mpich build can be obtained simply by unsetting PARALLEL for ngappinit.cpp, e.g.

Index: netgen/ng/CMakeLists.txt
===================================================================
--- netgen.orig/ng/CMakeLists.txt	2024-09-16 15:05:13.163140919 +0200
+++ netgen/ng/CMakeLists.txt	2024-09-16 15:16:17.355536503 +0200
@@ -20,6 +20,7 @@
 
     if(NOT BUILD_FOR_CONDA)
       add_executable(netgen ngappinit.cpp)
+      set_source_files_properties(ngappinit.cpp PROPERTIES COMPILE_FLAGS "-UPARALLEL")
     if(WIN32)
       target_sources(netgen PRIVATE ../windows/netgen.rc)
     endif(WIN32)

This is sufficient for a successful build.

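Arguably it might be more consistent to instead unset PARALLEL for the entire netgen target. This needs target_compile_options rather than target_compile_definitions, since the latter would turn the argument into a -D definition:

target_compile_options(netgen PRIVATE "-UPARALLEL")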

I didn't do this in the patch above since I was not sure if the gui might be used to launch mpi processes (given the ParallelRun() function in ng/parallelfunc.cpp).

Setting -UPARALLEL means the compile line may contain both -DPARALLEL (the original flag for general parallel support in the netgen build) and -UPARALLEL. This looks a little strange, but is safe: GCC processes -D and -U options in the order they appear on the command line, so the later -UPARALLEL takes priority. In principle it would be possible to make cmake parse the compile options for the netgen target and remove -DPARALLEL, but that would be a little more complex than the one-line patch suggested here.
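The effect of the flag order can be sketched in source form. This is only an illustration under the assumption that `#define` followed by `#undef` behaves like `-DPARALLEL` followed by `-UPARALLEL` on the command line; `built_with_mpi` and `parallel_enabled` are hypothetical names, not netgen symbols.

```cpp
#include <cassert>

// Equivalent of -DPARALLEL on the command line (defines the macro as 1).
#define PARALLEL 1
// Equivalent of a later -UPARALLEL: the macro is no longer defined.
#undef PARALLEL

#ifdef PARALLEL
constexpr bool built_with_mpi = true;   // branch that would pull in mpi.h
#else
constexpr bool built_with_mpi = false;  // branch taken after the undefine
#endif

// Compile-time check that the undefine took priority.
static_assert(!built_with_mpi, "PARALLEL should be undefined at this point");

// Hypothetical helper exposing the result at run time.
bool parallel_enabled() { return built_with_mpi; }
```

Any `#ifdef PARALLEL` guard in the translation unit then takes the non-MPI branch, so mpi.h and mpicxx.h are never included alongside Xlib.h.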
