Hi,
I'm hitting an issue with the assembler implementation of pthread_cond_wait() that is used on x86_64 with nptl.
If a thread is canceled while waiting in pthread_cond_wait(), the condvar seems to be left in an inconsistent state. A different thread that then calls pthread_cond_signal() gets blocked in the subsequent pthread_cond_destroy(). It seems that the function wrongly assumes that there are still other waiters on the condvar.
The test program below reproduces the issue, blocking in pthread_cond_destroy(). Switching to the C implementation on x86_64 fixes the problem here.
A trivial fix would be to just switch back to the C implementation, basically reverting e928e223fd. That commit, however, states that the generic implementations are broken on x86_64. I couldn't reproduce that, though.
Has anybody else seen pthread_cond_wait() issues on x86_64?
Kind regards Martin
--
#include <pthread.h>

static pthread_mutex_t m;
static pthread_cond_t c;
static pthread_t t;
static volatile int ready;

static void cancelcb(void *arg)
{
    pthread_mutex_unlock(&m);
}

static void* threadcb(void *arg)
{
    pthread_mutex_lock(&m);
    pthread_cleanup_push(cancelcb, NULL);
    ready = 1;
    while (1)
        pthread_cond_wait(&c, &m);
    pthread_cleanup_pop(1);
}

int main(int argc, const char *argv[])
{
    pthread_mutex_init(&m, NULL);
    pthread_cond_init(&c, NULL);
    pthread_create(&t, NULL, threadcb, NULL);
    while (!ready);
    pthread_cancel(t);
    pthread_join(t, NULL);
    pthread_cond_signal(&c);
    pthread_cond_destroy(&c);
    pthread_mutex_destroy(&m);
    return 0;
}
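For automated test runs it may help to turn the hang into a reportable failure instead of a blocked process. The following is only a sketch of one way to do that, not part of the original test case: it runs the same cancel/signal/destroy sequence in a forked child guarded by alarm(). The function names (cond_destroy_after_cancel, scenario, waiter_is_waiting) are made up for illustration, and the fork()/alarm() watchdog and the mutex-protected handshake are my own assumptions. The handshake differs from the original busy loop on purpose: once the main thread sees ready set while holding the mutex, the waiter must already have released the mutex inside pthread_cond_wait().

```c
#include <pthread.h>
#include <sched.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

static pthread_mutex_t m;
static pthread_cond_t c;
static int ready;   /* protected by m */

/* Cleanup handler: a canceled pthread_cond_wait() re-acquires the mutex
 * before the handler runs, so it has to be released here. */
static void unlock_cb(void *arg)
{
    (void)arg;
    pthread_mutex_unlock(&m);
}

static void *waiter(void *arg)
{
    (void)arg;
    pthread_mutex_lock(&m);
    pthread_cleanup_push(unlock_cb, NULL);
    ready = 1;
    for (;;)
        pthread_cond_wait(&c, &m);   /* cancellation point */
    pthread_cleanup_pop(1);
    return NULL;
}

/* Returns nonzero only once the waiter has set ready and released the
 * mutex, which it does exclusively inside pthread_cond_wait(). */
static int waiter_is_waiting(void)
{
    int r;

    pthread_mutex_lock(&m);
    r = ready;
    pthread_mutex_unlock(&m);
    return r;
}

/* Same sequence as the original test case; runs in the child process. */
static void scenario(void)
{
    pthread_t t;

    pthread_mutex_init(&m, NULL);
    pthread_cond_init(&c, NULL);
    if (pthread_create(&t, NULL, waiter, NULL) != 0)
        _exit(2);
    while (!waiter_is_waiting())
        sched_yield();
    pthread_cancel(t);
    pthread_join(t, NULL);
    pthread_cond_signal(&c);
    pthread_cond_destroy(&c);   /* the call reported to block */
    pthread_mutex_destroy(&m);
}

/* Hypothetical helper: 0 if the scenario completed, 1 if the watchdog
 * killed a hung child, -1 on any other error. */
int cond_destroy_after_cancel(unsigned timeout_sec)
{
    int status;
    pid_t pid = fork();

    if (pid < 0)
        return -1;
    if (pid == 0) {
        alarm(timeout_sec);   /* default SIGALRM action terminates the child */
        scenario();
        _exit(0);
    }
    if (waitpid(pid, &status, 0) != pid)
        return -1;
    if (WIFSIGNALED(status) && WTERMSIG(status) == SIGALRM)
        return 1;
    return (WIFEXITED(status) && WEXITSTATUS(status) == 0) ? 0 : -1;
}
```

A caller would invoke cond_destroy_after_cancel(5) and treat a return value of 1 as the hang described above. The helper is meant to be called from a single-threaded parent and needs the usual pthread link flags.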
Hi Martin, Martin Willi wrote,
Has anybody else seen pthread_cond_wait() issues on x86_64?
The test failures for x86_64 are still high: http://tests.embedded-test.org/uClibc-ng/1.0.15/REPORT.x86_64.libc.uClibc-ng...
Can you compare with GNU libc pthread_cond_wait()? Do they use an assembler version or just C?
Thanks, Waldemar
Hi Martin, Martin Willi wrote,
Has anybody else seen pthread_cond_wait() issues on x86_64?
Your testcase works with GNU libc. Would you like to find out what is wrong in the assembly in our files? A diff shows some differences...
I would rather sync with GNU libc, as the whole NPTL/TLS stuff is mostly from there.
best regards Waldemar
Hi Waldemar,
Thanks for your answer.
Has anybody else seen pthread_cond_wait() issues on x86_64?
The test failures for x86_64 are still high: http://tests.embedded-test.org/uClibc-ng/1.0.15/REPORT.x86_64.libc.uClibc-ng-1.0.15
I don't see some of these test failures locally, but I see others fail instead. Obviously there are more issues on x86_64 than the one I've stumbled upon.
Can you compare with GNU libc pthread_cond_wait()? Do they use an assembler version or just C?
It seems that glibc uses the assembler variants on x86_64.
Your testcase works with GNU libc. Would you like to find out what is wrong in the assembly in our files? A diff shows some differences...
The differences I could spot were mostly related to two fixes for the PI (priority inheritance) futex code, namely glibc commits c30e8edf and 0e3b5d6a. I'm not sure, but these are likely related to the issues we see.
I would rather sync with GNU libc, as the whole NPTL/TLS stuff is mostly from there.
That certainly makes sense. The changes glibc has seen are not trivial, though, and unfortunately I'm not sure I can find the time to work on a proper patch set right now :-/.
Best regards Martin
Hi Martin, Martin Willi wrote,
That certainly makes sense. The changes glibc has seen are not trivial, though, and unfortunately I'm not sure I can find the time to work on a proper patch set right now :-/.
You are right, the code has diverged a lot. Better a working C implementation than a fast but broken assembly implementation. See commit: http://cgit.uclibc-ng.org/cgi/cgit/uclibc-ng.git/commit/?id=084e597e9f8e630e...
best regards Waldemar