Hello uClibc-ng hackers, Damien, Rich,
I am sending this email to discuss the possibility to revert
https://cgit.uclibc-ng.org/cgi/cgit/uclibc-ng.git/commit/?id=08d46f1ce21e4e…
I think this change is wrong, I have been tracking an issue for a few
days and I am pretty sure the issue comes from this commit.
Basically what happens is that with this commit, tst-cancel18 test is
failing.
The test is available over there:
https://github.com/wbx-github/uclibc-ng-test/blob/master/test/nptl/tst-canc…
The reason for this fail is that clock_nanosleep code from libc is badly
compiled:
https://elixir.bootlin.com/uclibc-ng/latest/source/librt/clock_nanosleep.c#…
Because SINGLE_THREAD_P is actually defined, in case your libc build
supports NPTL threads (which is, I think, the majority of configs,
except very old systems using LINUXTHREADS or NOMMU maybe).
In my case (kvx arch):
#define SINGLE_THREAD_P __builtin_expect (THREAD_GETMEM (THREAD_SELF,
header.multiple_threads) == 0, 1)
See:
https://elixir.bootlin.com/uclibc-ng/latest/source/libpthread/nptl/sysdeps/…
This results to clock_nanosleep code being compiled to a very simple
version that directly calls the syscall and does not change at all the
thread cancellability to asynchronous before calling the syscall.
Therefore, when the test main thread calls pthread_cancel(), no signal
is sent (see
https://elixir.bootlin.com/uclibc-ng/latest/source/libpthread/nptl/pthread_…)
and therefore the nanosleep syscall is never interrupted and the test fails.
clock_nanosleep is defined to be a mandatory cancellation point by
POSIX, it must be interruptible in all cases by pthread_cancel, whether
in synchronous or asynchronous mode.
I think the fix for the NOMMU case should be different, maybe checking
for if LIBC_CANCEL_ASYNC is defined ?
Damien do you have time to work on a new fix for the NOMMU case? Are you
OK with the revert?
Not OK with a revert as that will bring back the compilation errors for RISC-V
NOMMU case.
Instead, please work on a fix for your case on top of the current code. If I
recall correctly, the code around this is rather messy with lots of ifdefs,
which may be why this regression was introduced. Cleaning up the code first to
avoid that may help avoiding further issues in the future.
Note that to test patches/regressions for risc-v NOMMU builds, you can simply
use buildroot and run:
make sipeed_maix_bit_defconfig
make
To check that it builds.