I am doing some tests on the LUMI AMD GPUs for PR #801.
The standard gcheck.exe test seems OK.
However, fgcheck.exe segfaults:
./fgcheck.exe 2 64 2
Program received signal SIGSEGV: Segmentation fault - invalid memory reference.
Backtrace for this error:
#0 0x152d03be3640 in ???
#1 0x152d03be2873 in ???
#2 0x152d02670dbf in ???
#3 0x152d03d56300 in ???
#4 0x152d03d56b64 in ???
#5 0x152d03d548bf in ???
#6 0x20c597 in ???
#7 0x20cd28 in ???
#8 0x152d0265b24c in ???
#9 0x20c3e9 in _start
at ../sysdeps/x86_64/start.S:120
#10 0xffffffffffffffff in ???
Segmentation fault
gdb does not help either:
(gdb) where
#0 0x0000155554f17300 in ?? () from /usr/lib64/libgfortran.so.4
#1 0x0000155554f17b65 in ?? () from /usr/lib64/libgfortran.so.4
#2 0x0000155554f158c0 in ?? () from /usr/lib64/libgfortran.so.4
#3 0x000000000020c598 in MAIN__ ()
#4 0x000000000020cd29 in main ()
I have done some poor man's debugging by disabling parts of fcheck_sa.f. It turns out that the error is in very simple code: already the READ statements fail.
The above happens when I use gfortran as the Fortran compiler (FC) together with the default cudacpp.mk, where the link step (of HIP, Fortran and C++ code) is done with hipcc. (For comparison, the equivalent setup with nvcc works fine for CUDA in my environments.)
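The gdb backtrace above places the crash inside /usr/lib64/libgfortran.so.4. As a quick cross-check of which Fortran, C++ and HIP runtimes each build actually loads at run time (for instance a hipcc-linked versus a flang-linked fgcheck.exe), a small shell sketch like the following could help. It only wraps the standard `ldd` tool; the function name `runtime_libs` is hypothetical, and the grep patterns are an assumption about which library names are relevant.

```shell
#!/bin/sh
# Sketch: list the Fortran, C++ and HIP runtime libraries that an executable
# resolves at run time. runtime_libs is a hypothetical helper around ldd.
set -eu

runtime_libs() {
  # ldd prints one "libname => /path (address)" line per shared library;
  # keep only gfortran, flang, libstdc++ and amdhip64 entries (assumed patterns).
  # "|| true" keeps the exit status 0 when nothing matches or ldd fails.
  ldd "$1" 2>/dev/null | grep -E 'gfortran|flang|stdc\+\+|amdhip64' || true
}
```

For example, running `runtime_libs ./fgcheck.exe` on both builds would show whether the hipcc-linked one picks up the system libgfortran.so.4.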
The only thing that I was able to get working in this LUMI environment involves two changes: first, use flang (hidden inside the ROCm installation) instead of gfortran as FC; second, use that same flang instead of hipcc to link fgcheck.exe, adding however -lstdc++ -L /opt/rocm-5.2.3/lib/ -lamdhip64 to the link command.
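A minimal sketch of the link step described above, written as a shell function that only prints the command so it can be inspected before running it. The extra flags `-lstdc++ -L /opt/rocm-5.2.3/lib/ -lamdhip64` are the ones from this report; the flang location `$ROCM_PATH/llvm/bin/flang`, the function name `fgcheck_link_cmd` and the object names in the usage example are assumptions, not what cudacpp.mk actually does.

```shell
#!/bin/sh
# Sketch of the fgcheck.exe link step that worked on LUMI: flang is the
# linker driver instead of hipcc, with libstdc++ and the HIP runtime added.
set -eu

# ROCm 5.2.3 under /opt/rocm-5.2.3, as in the report.
ROCM_PATH="${ROCM_PATH:-/opt/rocm-5.2.3}"
# Assumption: flang lives in the llvm/bin subdirectory of the ROCm tree.
FLANG="${FLANG:-$ROCM_PATH/llvm/bin/flang}"

# Hypothetical helper: print the link command for the given object files.
fgcheck_link_cmd() {
  printf '%s -o fgcheck.exe' "$FLANG"
  for obj in "$@"; do
    printf ' %s' "$obj"
  done
  # Extra libraries that were needed when linking with flang.
  printf ' -lstdc++ -L %s/lib/ -lamdhip64\n' "$ROCM_PATH"
}
```

For example, `fgcheck_link_cmd fcheck_sa.o` followed by the C++/HIP objects prints the full command, which can then be run by hand.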
For now I have observed this problem only for fgcheck.exe, but I guess that I would get the same for madevent? Maybe not, because it seems that we are already linking madevent with the Fortran compiler (which is what would work here with flang). So perhaps we should always link Fortran, C++ and GPU code with the Fortran compiler? I will do more checks tomorrow.
By the way, the issue above does seem to need flang anyway, so using gfortran for linking would not be OK, if I tested this correctly. Maybe this would be easier with a nicer compiler combination. I am not sure whether @Jooorgen has observed anything like this?
I keep the details here for reference.