On Tue, Dec 29, 2015 at 9:18 PM, Alex Potapenko opotapenko@gmail.com wrote:
I'm unable to get core dump due to system limitations (it's a router): I run `ulimit -c unlimited`, yet core dump is never to be found on the system after crash :(
Looks like your kernel config simply disables core dumps, just check that CONFIG_ELF_CORE parameter is set to yes.
Using gdb, I got this:
#0 0x4004c05c in _dl_debug_state () from /opt/lib/ld-uClibc.so.1
No symbol table info available.
#1 0x40050394 in _dl_get_ready_to_run () from /opt/lib/ld-uClibc.so.1
No symbol table info available.
#2 0x40050864 in ?? () from /opt/lib/ld-uClibc.so.1
No symbol table info available.
Backtrace stopped: previous frame identical to this frame (corrupt stack?)
Just gdb notification, non informative.
I then installed non-stripped uclibc-ng, libffi and openjdk8 libraries, and got a more informative output (see gdb.log attached).
Yes, it is much better (gdb.log):
...
Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x40ac24d0 (LWP 15709)]
0x4010c804 in initLocalIfs () from /opt/lib/openjdk8/arm/libnet.so
(gdb) bt full
#0 0x4010c804 in initLocalIfs () from /opt/lib/openjdk8/arm/libnet.so
No symbol table info available.
#1 0x4010b3f4 in JNI_OnLoad () from /opt/lib/openjdk8/arm/libnet.so
No symbol table info available.
#2 0x40135de0 in Java_java_lang_ClassLoader_00024NativeLibrary_load () from /opt/lib/jvm/openjdk8/jre/lib/arm/libjava.so
No symbol table info available.
#3 0x401a433c in ffi_call_SYSV () from /opt/lib/libffi.so.6
No symbol table info available.
#4 0x401a5a54 in ffi_call () from /opt/lib/libffi.so.6
No symbol table info available.
#5 0x404c1640 in CppInterpreter::native_entry(Method*, int, Thread*) () from /opt/lib/jvm/openjdk8/jre/lib/arm/server/libjvm.so
...
but points inside the OpenJDK native code. I found the initLocalIfs() function in jdk/src/solaris/native/java/net/net_util_md.c, which looks very strange to me, since you are running on Linux. Probably that is because I'm not familiar with OpenJDK internals.
Anyway, are you able to:
1) update OpenJDK to latest jdk8u76-b01 & uClibc-ng to 1.0.10 ?
2) disable use of IPv6 at all, to exclude any chance of read of "/proc/net/if_inet6" in initLocalIfs() ?
3) compile only libnet.so with -g3 switch to get a much more of debug info ?
P.S. You forgot to select "Reply all" to allow anyone see your mail in mailing list
Leonid
Hi,
Sorry for my late reply.
2015-12-30 10:24 GMT+02:00 Leonid Lisovskiy lly.dev@gmail.com:
On Tue, Dec 29, 2015 at 9:18 PM, Alex Potapenko opotapenko@gmail.com wrote:
Anyway, are you able to:
update OpenJDK to latest jdk8u76-b01 & uClibc-ng to 1.0.10 ?
disable use of IPv6 at all, to exclude any chance of read of
"/proc/net/if_inet6" in initLocalIfs() ?
- compile only libnet.so with -g3 switch to get a much more of debug info ?
I'll try to do it all, just a bit later, since I won't probably have time for this until approx. the middle of January. Just to let you (and everyone else) know that I'm still willing to carry this on.
P.S. You forgot to select "Reply all" to allow anyone see your mail in mailing list
Indeed, sorry to everyone for this mistake.
Thank you, Alex Potapenko
P.S. Merry Christmas and happy New Year!
Hi Alex, Alex Potapenko wrote,
Hi,
Sorry for my late reply.
2015-12-30 10:24 GMT+02:00 Leonid Lisovskiy lly.dev@gmail.com:
On Tue, Dec 29, 2015 at 9:18 PM, Alex Potapenko opotapenko@gmail.com wrote:
Anyway, are you able to:
update OpenJDK to latest jdk8u76-b01 & uClibc-ng to 1.0.10 ?
disable use of IPv6 at all, to exclude any chance of read of
"/proc/net/if_inet6" in initLocalIfs() ?
- compile only libnet.so with -g3 switch to get a much more of debug info ?
I'll try to do it all, just a bit later, since I won't probably have time for this until approx. the middle of January. Just to let you (and everyone else) know that I'm still willing to carry this on.
I have tested with OpenJDK7 with x86/arm. On X86 it starts up fine. On ARM I have some issues with illegal instructions. I tried hard-float with NEON optimization on a Raspberry PI2. Did you use hard-float on ARM?
Next I try to port OpenJDK8 to OpenADK and will check on X86/MIPS and then ARM. I am 70% done with the OpenJDK8 port.
best regards Waldemar
Hi Leonid, Waldemar,
2015-12-30 10:24 GMT+02:00 Leonid Lisovskiy lly.dev@gmail.com:
Anyway, are you able to:
update OpenJDK to latest jdk8u76-b01 & uClibc-ng to 1.0.10 ?
disable use of IPv6 at all, to exclude any chance of read of
"/proc/net/if_inet6" in initLocalIfs() ?
- compile only libnet.so with -g3 switch to get a much more of debug info ?
I've upgraded OpenJDK and uClibc-ng, compiled libnet.so (and all OpenJDK8) with -g3 switch, but gdb output hasn't changed much (see log attached).
2016-01-06 10:56 GMT+02:00 Waldemar Brodkorb wbx@uclibc-ng.org:
I have tested with OpenJDK7 with x86/arm. On X86 it starts up fine. On ARM I have some issues with illegal instructions. I tried hard-float with NEON optimization on a Raspberry PI2. Did you use hard-float on ARM?
I use soft-float on ARM, the armv7 uclibc-ng Optware-ng feed targets FPU-less devices (like routers).
Next I try to port OpenJDK8 to OpenADK and will check on X86/MIPS and then ARM. I am 70% done with the OpenJDK8 port.
That's great! I really hope you can get OpenJDK8 to work properly.
On Sat, Jan 9, 2016 at 11:14 PM, Alex Potapenko opotapenko@gmail.com wrote:
Hi Leonid, Waldemar,
2015-12-30 10:24 GMT+02:00 Leonid Lisovskiy lly.dev@gmail.com:
Anyway, are you able to:
update OpenJDK to latest jdk8u76-b01 & uClibc-ng to 1.0.10 ?
disable use of IPv6 at all, to exclude any chance of read of
"/proc/net/if_inet6" in initLocalIfs() ?
- compile only libnet.so with -g3 switch to get a much more of debug info ?
I've upgraded OpenJDK and uClibc-ng, compiled libnet.so (and all OpenJDK8) with -g3 switch, but gdb output hasn't changed much (see log attached).
Well, at least it is not a well-known bug.
For unknown reasons, gdb is unable to read the full debug info from libnet.so:
#0 0x53be1804 in initLocalIfs () from /opt/lib/openjdk8/arm/libnet.so
No symbol table info available.
Was strip perhaps run during the install/deploy phase? A second option, as I see it, is to use the addr2line utility against the non-stripped library.
First of all, since initLocalIfs() is in a shared library, we have to know its load address:
(gdb) info shared
From        To          Syms Read   Shared Object Library
0xf7ff7b20  0xf7ffaf9f  Yes (*)     /lib/ld-uClibc.so.1
0xf7faa8d0  0xf7fab475  Yes (*)     /lib/libdl.so.1
0xf7f74860  0xf7fa2b88  Yes         /lib/libc.so.1
...
Find the line corresponding to libnet.so and subtract its "From" address from the target address 0x53be1804. That gives you the offset inside libnet.so, and then it is time to run addr2line on the cross-compile host:
$ addr2line -j .text -f -e libnet.so -a <offset>
You should get the line in source src/solaris/native/java/net/net_util_md.c.
P.S. If you use address space layout randomization(ASLR) on target, you must use addresses from the single gdb session.
Regards, Leonid
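The subtraction described above can be sketched in C. This is only an illustration: 0x53be1804 is the faulting address from the backtrace, but the "From" value used in the usage note is hypothetical, because the `info shared` line for libnet.so is not shown in this thread. As far as I know, gdb's "From" column is the start of the library's .text section in memory, so the result is an offset relative to .text, which matches the `-j .text` switch passed to addr2line. The function name is mine.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Compute the offset of a runtime address relative to the "From" address
 * that gdb prints for a shared library.
 * Returns 1 and stores the offset on success; returns 0 (and leaves
 * *offset untouched) if the address lies below "From", which would mean
 * the wrong library line was picked. */
static int section_offset(uint32_t addr, uint32_t from, uint32_t *offset)
{
    assert(offset != NULL);
    if (addr < from)
        return 0;
    *offset = addr - from;
    return 1;
}
```

With a hypothetical "From" of 0x53bd4000, `section_offset(0x53be1804, 0x53bd4000, &off)` stores 0xd804 in `off`, and that value is what would replace `<offset>` on the addr2line command line.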
Hi Alex, Alex Potapenko wrote,
Hi Leonid, Waldemar,
2015-12-30 10:24 GMT+02:00 Leonid Lisovskiy lly.dev@gmail.com:
Anyway, are you able to:
update OpenJDK to latest jdk8u76-b01 & uClibc-ng to 1.0.10 ?
disable use of IPv6 at all, to exclude any chance of read of
"/proc/net/if_inet6" in initLocalIfs() ?
- compile only libnet.so with -g3 switch to get a much more of debug info ?
I've upgraded OpenJDK and uClibc-ng, compiled libnet.so (and all OpenJDK8) with -g3 switch, but gdb output hasn't changed much (see log attached).
2016-01-06 10:56 GMT+02:00 Waldemar Brodkorb wbx@uclibc-ng.org:
I have tested with OpenJDK7 with x86/arm. On X86 it starts up fine. On ARM I have some issues with illegal instructions. I tried hard-float with NEON optimization on a Raspberry PI2. Did you use hard-float on ARM?
I use soft-float on ARM, the armv7 uclibc-ng Optware-ng feed targets FPU-less devices (like routers).
Next I try to port OpenJDK8 to OpenADK and will check on X86/MIPS and then ARM. I am 70% done with the OpenJDK8 port.
That's great! I really hope you can get OpenJDK8 to work properly.
So the OpenJDK8 port is ready. It starts up fine on my IBM X40 (pentium-m) laptop and I can access the web interface. On my mikrotik rb532 it starts up, but then I get an OOM (the board has only 32 MB RAM, IIRC). Next I will try soft-float on the Rpi2.
I use this combined patch: http://cgit.openadk.org/cgi/cgit/openadk.git/tree/package/openjdk8/files/ope...
When does the crash happen? On startup?
best regards Waldemar
Hi, Waldemar Brodkorb wrote,
Hi Alex, Alex Potapenko wrote,
Hi Leonid, Waldemar,
2015-12-30 10:24 GMT+02:00 Leonid Lisovskiy lly.dev@gmail.com:
Anyway, are you able to:
update OpenJDK to latest jdk8u76-b01 & uClibc-ng to 1.0.10 ?
disable use of IPv6 at all, to exclude any chance of read of
"/proc/net/if_inet6" in initLocalIfs() ?
- compile only libnet.so with -g3 switch to get a much more of debug info ?
I've upgraded OpenJDK and uClibc-ng, compiled libnet.so (and all OpenJDK8) with -g3 switch, but gdb output hasn't changed much (see log attached).
2016-01-06 10:56 GMT+02:00 Waldemar Brodkorb wbx@uclibc-ng.org:
I have tested with OpenJDK7 with x86/arm. On X86 it starts up fine. On ARM I have some issues with illegal instructions. I tried hard-float with NEON optimization on a Raspberry PI2. Did you use hard-float on ARM?
I use soft-float on ARM, the armv7 uclibc-ng Optware-ng feed targets FPU-less devices (like routers).
Next I try to port OpenJDK8 to OpenADK and will check on X86/MIPS and then ARM. I am 70% done with the OpenJDK8 port.
That's great! I really hope you can get OpenJDK8 to work properly.
So OpenJDK8 port is ready. It startup fine on my IBM X40 (pentium-m) laptop and I can access the webinterface. On my mikrotik rb532 it startsup, but then I get OOM. (the board have only 32 MB RAM IIRC). Next I try soft-float for Rpi2.
I use this combined patch: http://cgit.openadk.org/cgi/cgit/openadk.git/tree/package/openjdk8/files/ope...
When does the crash happens? On startup?
It now starts on my rpi2 with soft-float fine. With uClibc-ng 1.0.11.
best regards Waldemar
According to gdb.log, I suspect a platform-related problem. Standalone initLocalIfs() code, extracted from src/solaris/native/java/net/net_util_md.c, works fine for me on MIPS32.
Alex, could you provide the output of "cat /proc/net/if_inet6" from the buggy host? It would help to pin down the problem in case the fault occurs during parsing.
Regards, Leonid
On Wed, Jan 13, 2016 at 12:08 AM, Waldemar Brodkorb wbx@uclibc-ng.org wrote:
Hi Alex,
It now starts on my rpi2 with soft-float fine. With uClibc-ng 1.0.11.
best regards Waldemar
Hi Waldemar, Leonid,
On Wed, Jan 13, 2016 at 12:08 AM, Waldemar Brodkorb wbx@uclibc-ng.org wrote:
Hi Alex,
It now starts on my rpi2 with soft-float fine. With uClibc-ng 1.0.11.
From what I can gather, your combined patch isn't too different from the patches that I use in Optware-ng, correct? Or are there some differences which I should try? Also, I'm now using uClibc-ng 1.0.10, will upgrade to 1.0.11 asap.
2016-01-13 14:32 GMT+02:00 Leonid Lisovskiy lly.dev@gmail.com:
According to gdb.log, I suspect platform related problem. Dedicated initLocalIfs() code, extracted from src/solaris/native/java/net/net_util_md.c, works fine for me on MIPS32.
Alex, could you provide output of "cat /proc/net/if_inet6" from buggy host? It helps problem detection in case of fault occur at parse.
Here's the output on ARMv7:
[root@unknown root]$ cat /proc/net/if_inet6
00000000000000000000000000000001 01 80 10 80 lo
on MIPS32:
root@unknown:/tmp/home/root# cat /proc/net/if_inet6
00000000000000000000000000000001 01 80 10 80 lo
It's basically the same on both hosts, and it makes sense, since both routers run tomatousb firmwares.
I don't know why gdb output doesn't have symbol table info available: I installed not stripped versions of libs and binaries:
[root@unknown root]$ file /opt/lib/openjdk8/arm/libnet.so
/opt/lib/openjdk8/arm/libnet.so: ELF 32-bit LSB shared object, ARM, EABI5 version 1 (SYSV), dynamically linked, not stripped
Best regards, Alex
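For reference, the sample line above can be fed through the same conversion string that initLocalIfs() passes to fscanf(). The sketch below is an illustration, not OpenJDK code: it uses sscanf() on a string instead of reading /proc/net/if_inet6, and the struct and function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical container for one parsed /proc/net/if_inet6 entry. */
struct if6_entry {
    unsigned char addr[16];   /* IPv6 address, one byte per "%2x" */
    int index;                /* parsed with %d, as in the quoted source */
    unsigned int x1, x2, x3;  /* remaining hex columns */
    char ifname[33];          /* interface name, at most 32 characters */
};

/* Parse one line; returns the number of converted fields.
 * 21 means the whole line matched (16 address bytes + 4 numbers + name). */
static int parse_if_inet6_line(const char *line, struct if6_entry *e)
{
    unsigned int u[16];
    int n;

    assert(line != NULL && e != NULL);
    n = sscanf(line,
               "%2x%2x%2x%2x%2x%2x%2x%2x%2x%2x%2x%2x%2x%2x%2x%2x "
               "%d %x %x %x %32s",
               &u[0], &u[1], &u[2], &u[3], &u[4], &u[5], &u[6], &u[7],
               &u[8], &u[9], &u[10], &u[11], &u[12], &u[13], &u[14], &u[15],
               &e->index, &e->x1, &e->x2, &e->x3, e->ifname);
    if (n == 21) {
        /* Narrow each parsed value to a byte, as the original loop does. */
        for (int i = 0; i < 16; i++)
            e->addr[i] = (unsigned char)u[i];
    }
    return n;
}
```

Called on the `lo` line shown above, this converts all 21 fields, which is consistent with the parse step itself not being where the fault occurs.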
On Fri, Jan 15, 2016 at 4:20 PM, Alex Potapenko opotapenko@gmail.com wrote:
Alex, could you provide output of "cat /proc/net/if_inet6" from buggy host? It helps problem detection in case of fault occur at parse.
Here's the output on ARMv7:
[root@unknown root]$ cat /proc/net/if_inet6
00000000000000000000000000000001 01 80 10 80 lo
on MIPS32:
root@unknown:/tmp/home/root# cat /proc/net/if_inet6
00000000000000000000000000000001 01 80 10 80 lo
It's basically the same on both hosts, and it makes sense, since both routers run tomatousb firmwares.
It's OK, thanks.
I don't know why gdb output doesn't have symbol table info available: I installed not stripped versions of libs and binaries:
[root@unknown root]$ file /opt/lib/openjdk8/arm/libnet.so
/opt/lib/openjdk8/arm/libnet.so: ELF 32-bit LSB shared object, ARM, EABI5 version 1 (SYSV), dynamically linked, not stripped
Please check presence of full line information in debug sections:
readelf -WS /opt/lib/openjdk8/arm/libnet.so | grep .debug_macinfo
Have you tried addr2line method?
Which gcc version are you using? Could it be an optimization issue? Does anything change if you build OpenJDK without compiler optimizations, i.e. with the gcc flag "-O0" (instead of -Os/-O2)?
regards, Leonid
Hi Leonid,
2016-01-16 11:13 GMT+02:00 Leonid Lisovskiy lly.dev@gmail.com:
Please check presence of full line information in debug sections:
readelf -WS /opt/lib/openjdk8/arm/libnet.so | grep .debug_macinfo
It's not there, indeed:
[root@unknown root]$ readelf -WS /opt/lib/openjdk8/arm/libnet.so | grep .debug_macinfo
[root@unknown root]$
Have you tried addr2line method?
No, but since the full line information isn't there, there seems to be no point in trying.
Which gcc version you are using? Could it be optimization issues? Will something changed if build OpenJDK without compiler optimizations, gcc flag "-O0" (disable -Os/-O2) ?
I'm using gcc-5.2.0. I haven't tried "-O0" flag, and I remember I used to debug some applications with "-O2" built with the same toolchain and '-g' flag, and they had line information. I've now noticed I have "--disable-debug-symbols" and "--disable-zip-debug-info" in OpenJDK configure options, maybe it's filtering out the '-g3' option I added manually? I should try to disable those options.
Best regards, Alex
Hi Alex, Alex Potapenko wrote,
Hi Leonid,
2016-01-16 11:13 GMT+02:00 Leonid Lisovskiy lly.dev@gmail.com:
Please check presence of full line information in debug sections:
readelf -WS /opt/lib/openjdk8/arm/libnet.so | grep .debug_macinfo
It's not there, indeed:
[root@unknown root]$ readelf -WS /opt/lib/openjdk8/arm/libnet.so | grep .debug_macinfo
[root@unknown root]$
Have you tried addr2line method?
No, but seeing full line information isn't there, it seems there's no sense too.
Which gcc version you are using? Could it be optimization issues? Will something changed if build OpenJDK without compiler optimizations, gcc flag "-O0" (disable -Os/-O2) ?
I'm using gcc-5.2.0. I haven't tried "-O0" flag, and I remember I used to debug some applications with "-O2" built with the same toolchain and '-g' flag, and they had line information. I've now noticed I have "--disable-debug-symbols" and "--disable-zip-debug-info" in OpenJDK configure options, maybe it's filtering out the '-g3' option I added manually? I should try to disable those options.
I am still missing one piece of information: when does the failure occur? On startup? Can you also put up a complete strace log from the ARM soft-float system, so I can compare?
And maybe you could try with just my OpenJDK8 patch, to verify that it is not caused by another change in the patches you use...
best regards Waldemar
On Sun, Jan 17, 2016 at 4:38 PM, Alex Potapenko opotapenko@gmail.com wrote:
Which gcc version you are using? Could it be optimization issues? Will something changed if build OpenJDK without compiler optimizations, gcc flag "-O0" (disable -Os/-O2) ?
I'm using gcc-5.2.0. I haven't tried "-O0" flag, and I remember I used to debug some applications with "-O2" built with the same toolchain and '-g' flag, and they had line information. I've now noticed I have "--disable-debug-symbols" and "--disable-zip-debug-info" in OpenJDK configure options, maybe it's filtering out the '-g3' option I added manually? I should try to disable those options.
You are "running on a razor's edge". GCC 5.2 is not as stable as most people would like. Anyway, please try to disable gcc optimizations.
Sorry, I can't provide suggestion about OpenJDK configure options.
Regards, Leonid
Hi Leonid, Waldemar,
2016-01-17 18:41 GMT+02:00 Leonid Lisovskiy lly.dev@gmail.com:
You are "running on a razor's edge". 5.2 is not so stable, as most people want to have.
Yes, I'm more or less aware of that, however up till now I haven't encountered any gcc-specific bugs. If I do have serious issues, I can safely roll back to 4.9.x, since I used the '--with-default-libstdcxx-abi=gcc4-compatible' gcc configure option, so there is no need to recompile entire feeds if I do this.
Anyway, please try to disable gcc optimizations.
I enabled debug symbols, switched to 'fastdebug' debug level, and now I have 'debuginfo' files. I also upgraded to 8u76-b02 and used Waldemar's combined patch. For some reason, gdb backtrace now hangs at certain stage (see gdb2.log), but at least we have the culprit line (openjdk/jdk/src/solaris/native/java/net/net_util_md.c:717):
672:static void initLocalIfs () {
673:    FILE *f;
674:    unsigned char staddr [16];
675:    char ifname [33];
676:    struct localinterface *lif=0;
677:    int index, x1, x2, x3;
678:    unsigned int u0,u1,u2,u3,u4,u5,u6,u7,u8,u9,ua,ub,uc,ud,ue,uf;
679:
680:    if ((f = fopen("/proc/net/if_inet6", "r")) == NULL) {
681:        return ;
682:    }
683:    while (fscanf (f, "%2x%2x%2x%2x%2x%2x%2x%2x%2x%2x%2x%2x%2x%2x%2x%2x "
684:                "%d %x %x %x %32s",&u0,&u1,&u2,&u3,&u4,&u5,&u6,&u7,
685:                &u8,&u9,&ua,&ub,&uc,&ud,&ue,&uf,
686:                &index, &x1, &x2, &x3, ifname) == 21) {
687:        staddr[0] = (unsigned char)u0;
688:        staddr[1] = (unsigned char)u1;
689:        staddr[2] = (unsigned char)u2;
690:        staddr[3] = (unsigned char)u3;
691:        staddr[4] = (unsigned char)u4;
692:        staddr[5] = (unsigned char)u5;
693:        staddr[6] = (unsigned char)u6;
694:        staddr[7] = (unsigned char)u7;
695:        staddr[8] = (unsigned char)u8;
696:        staddr[9] = (unsigned char)u9;
697:        staddr[10] = (unsigned char)ua;
698:        staddr[11] = (unsigned char)ub;
699:        staddr[12] = (unsigned char)uc;
700:        staddr[13] = (unsigned char)ud;
701:        staddr[14] = (unsigned char)ue;
702:        staddr[15] = (unsigned char)uf;
703:        nifs ++;
704:        if (nifs > localifsSize) {
705:            localifs = (struct localinterface *) realloc (
706:                        localifs, sizeof (struct localinterface)* (localifsSize+5));
707:            if (localifs == 0) {
708:                nifs = 0;
709:                fclose (f);
710:                return;
711:            }
712:            lif = localifs + localifsSize;
713:            localifsSize += 5;
714:        } else {
715:            lif ++;
716:        }
717:        memcpy (lif->localaddr, staddr, 16);
718:        lif->index = index;
719:    }
720:    fclose (f);
721:}
2016-01-17 15:55 GMT+02:00 Waldemar Brodkorb wbx@uclibc-ng.org:
I am still missing your info, when does the failure occurs. On startup? Can you also put up a complete strace log from the ARM soft-float system. So I can compare.
And may be you try with just my OpenJDK8 patch and verify, that it is not another change in your used patches....
The failure occurs some while after startup. I tried your patch, it looks to make no difference compared to 'my' patches. See complete strace log attached.
Best regards, Alex
On Wed, Jan 20, 2016 at 3:51 PM, Alex Potapenko opotapenko@gmail.com wrote:
I enabled debug symbols, switched to 'fastdebug' debug level, and now I have 'debuginfo' files. I also upgraded to 8u76-b02 and used Waldemar's combined patch. For some reason, gdb backtrace now hangs at certain stage (see gdb2.log),
If you enabled full debug info in all jdk libraries, there might not be enough memory to load it.
but at least we have the culprit line (openjdk/jdk/src/solaris/native/java/net/net_util_md.c:717):
Good catch! So the problem is in the code that allocates memory for the localinterface structures, since the lif variable points to 0x14 (a broken/dummy address). Parsing of /proc/net/if_inet6 completed successfully.
Could you:
1. put a breakpoint at line 704
2. run until the break
3. display the variables nifs, localifs, localifsSize
4. next, next - until line 707
5. display the variable localifs
?
Looks like this piece of code misinterprets the realloc() return value.
703:        nifs ++;
704:        if (nifs > localifsSize) {
705:            localifs = (struct localinterface *) realloc (
706:                        localifs, sizeof (struct localinterface)* (localifsSize+5));
707:            if (localifs == 0) {
708:                nifs = 0;
709:                fclose (f);
710:                return;
711:            }
712:            lif = localifs + localifsSize;
713:            localifsSize += 5;
714:        } else {
715:            lif ++;
716:        }
717:        memcpy (lif->localaddr, staddr, 16);
718:        lif->index = index;
719:    }
720:    fclose (f);
721:}
Additionally, please provide information about malloc config settings of your uClibc-ng. For example,
$ grep MALLOC .config
Regards, Leonid
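One possible reading of the 0x14 value: if lif is still null when line 715 executes `lif ++`, the pointer advances by exactly one element, i.e. by sizeof(struct localinterface). The sketch below is an assumption-laden illustration, not OpenJDK code. It assumes the struct consists of an int index followed by a 16-byte localaddr (the two members used at lines 717-718; the actual definition is not quoted in this thread) and a 4-byte int. The arithmetic is done on integers so that no invalid pointer is ever formed; the function names are mine.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed layout: only the members referenced at lines 717-718. */
struct localinterface {
    int  index;
    char localaddr[16];
};

/* Address a `struct localinterface *` would hold after `steps` increments
 * starting from `base`, computed on plain integers. */
static uintptr_t address_after_increments(uintptr_t base, size_t steps)
{
    /* Guard against overflow in the multiplication below. */
    assert(steps <= SIZE_MAX / sizeof(struct localinterface));
    return base + steps * sizeof(struct localinterface);
}

/* Address that memcpy(lif->localaddr, ...) would write to for a given lif. */
static uintptr_t localaddr_target(uintptr_t lif)
{
    return lif + offsetof(struct localinterface, localaddr);
}
```

Under these assumptions the struct is 20 bytes, so one increment from a null base gives 0x14, and the memcpy at line 717 would then target 0x18, an unmapped address near zero.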
2016-01-20 15:42 GMT+02:00 Leonid Lisovskiy lly.dev@gmail.com:
If you enabled full debug info in all jdk libraries, it might be not enough memory to load it.
That'd be very strange, since I have 1 GB swap partition
Good catch! So, the problem is in code performing memory allocations for localinterface structure, due to lif variable points to 0x14 (broken/dummy address). Parse of /proc/net/if_inet6 completed successfully.
Could you:
- put breakpoint at line 704
- run until break
- display variables nifs , localifs , localifsSize
- next, next - until line 707
- display variable localifs
?
This should be it (if needed, see entire gdb3.log):
Line 704:
1: nifs = 0
2: localifs = (struct localinterface *) 0x0
3: localifsSize = 0

Line 707:
1: nifs = 1
2: localifs = (struct localinterface *) 0x0
3: localifsSize = 0
Looks like this piece of code misinterpret realloc() return value.
...
Additionally, please provide information about malloc config settings of your uClibc-ng. For example,
$ grep MALLOC .config
Here it is:
$ grep MALLOC .config
# MALLOC is not set
# MALLOC_SIMPLE is not set
MALLOC_STANDARD=y
MALLOC_GLIBC_COMPAT=y
# UCLIBC_MALLOC_DEBUGGING is not set
Best regards, Alex
2016-01-20 16:27 GMT+02:00 Alex Potapenko opotapenko@gmail.com:
This should be it (if needed, see entire gdb3.log):
Line 704:
1: nifs = 0
2: localifs = (struct localinterface *) 0x0
3: localifsSize = 0

Line 707:
1: nifs = 1
2: localifs = (struct localinterface *) 0x0
3: localifsSize = 0
I realized this was probably not what you wanted, so I made breakpoints on lines 703-717 and went on until segfault: see gdb4.log.
Best regards, Alex
On Wed, Jan 20, 2016 at 6:00 PM, Alex Potapenko opotapenko@gmail.com wrote:
I realized this was probably not what you wanted, so I made breakpoints on lines 703-717 and went on until segfault: see gdb4.log.
Excellent, more information is much better than less! My preliminary view is that this is a compiler problem. But to be sure, please repeat the steps you did in gdb4.log and add the lif variable to the display list.
Regards, Leonid
2016-01-20 17:23 GMT+02:00 Leonid Lisovskiy lly.dev@gmail.com:
It's excellent - a bit more information much better than fewer! Preliminary, as for my point of view, it is compiler problem. But, for confidence, please repeat actions you done in gdb4.log and add to display list lif variable.
Done. If this is indeed a compiler bug, then I should note it manifests itself only in combination with uClibc-ng: it works OK with GNU Lib C (at least, in my experience). I wonder if there's a possible workaround here, and whether I should post it on the GCC mailing list (after all, there's no such problem with glibc)?
Best regards, Alex
Hi, Alex Potapenko wrote,
2016-01-20 17:23 GMT+02:00 Leonid Lisovskiy lly.dev@gmail.com:
It's excellent - a bit more information much better than fewer! Preliminary, as for my point of view, it is compiler problem. But, for confidence, please repeat actions you done in gdb4.log and add to display list lif variable.
Done. If this is indeed a compiler bug, than I should note it manifests itself only in combination with uClibc-ng: it works OK with GNU Lib C (at least, in my experience). I wonder if there's a possible workaround here, and whether I should post it on GCC mailing list (after all, there's no such problem with glibc)?
Can you please try with MALLOC instead of MALLOC_STANDARD. I fixed in 1.0.11 a bug for Xorg usage, where a call to realloc produced some problems.
I would not report any gcc bug, please first verify with gcc 5.3, after you used MALLOC.
best regards Waldemar
2016-01-20 20:40 GMT+02:00 Waldemar Brodkorb wbx@uclibc-ng.org:
Can you please try with MALLOC instead of MALLOC_STANDARD. I fixed in 1.0.11 a bug for Xorg usage, where a call to realloc produced some problems.
Switched to MALLOC in 1.0.11, still the same issue. See gdb6.log
MALLOC=y
# MALLOC_SIMPLE is not set
# MALLOC_STANDARD is not set
MALLOC_GLIBC_COMPAT=y
Best regards, Alex
Hi Alex, Alex Potapenko wrote,
2016-01-20 20:40 GMT+02:00 Waldemar Brodkorb wbx@uclibc-ng.org:
Can you please try with MALLOC instead of MALLOC_STANDARD. I fixed in 1.0.11 a bug for Xorg usage, where a call to realloc produced some problems.
Switched to MALLOC in 1.0.11, still the same issue. See gdb6.log
MALLOC=y
# MALLOC_SIMPLE is not set
# MALLOC_STANDARD is not set
MALLOC_GLIBC_COMPAT=y
Okay, please repeat the strace log with strace -f, as I can't see anything after the clone(). Please also report the output of the following:
lsmod
ifconfig -a
ip addr show
uname -a
Why do you sometimes get SIGILL and sometimes SIGSEGV? I have ipv6 disabled on my system; I'll repeat my tests on the rpi2 with ipv6 enabled next.
best regards Waldemar
Hi Again, Waldemar Brodkorb wrote,
Hi Alex, Alex Potapenko wrote,
2016-01-20 20:40 GMT+02:00 Waldemar Brodkorb wbx@uclibc-ng.org:
Can you please try with MALLOC instead of MALLOC_STANDARD. I fixed in 1.0.11 a bug for Xorg usage, where a call to realloc produced some problems.
Switched to MALLOC in 1.0.11, still the same issue. See gdb6.log
MALLOC=y
# MALLOC_SIMPLE is not set
# MALLOC_STANDARD is not set
MALLOC_GLIBC_COMPAT=y
Okay, please repeat the strace log with strace -f as I can't see anything after the clone(). Please also report following
lsmod
ifconfig -a
ip addr show
uname -a
Why you get sometimes SIGILL and sometimes SIGSEGV? I'll have ipv6 disabled on my system, I'll repeat my tests on rpi2 with ipv6 enabled next.
Have you seen that Alpine Linux is carrying a patch which modifies exactly our problematic function: https://github.com/alpinelinux/aports/blob/master/community/openjdk8/icedtea...
Maybe you could ask Timo or Natanael which problem they had. They have been using it since OpenJDK6: http://git.alpinelinux.org/cgit/aports/commit/?id=a733d5ca3c5b38a12b6d7a1853...
Does this ring a bell? ;)
main/openjdk6: fix ipv6 related startup crash
best regards Waldemar
On Thu, Jan 21, 2016 at 4:20 AM, Waldemar Brodkorb wbx@uclibc-ng.org wrote:
Have you seen that Alpine Linux is carrying a patch with modifies exactly our problematic function: https://github.com/alpinelinux/aports/blob/master/community/openjdk8/icedtea...
Bingo! The current initLocalIfs() implementation will leave ruins if it is called twice!
Alex, please try this patch. If it does not help (unlikely), turn off compiler optimizations for net_util_md.c (i.e. force gcc flags to -O0).
Does a bell rings here? ;) main/openjdk6: fix ipv6 related startup crash
Yes. It should be easily uncovered by Valgrind, but it is hidden with glibc due to zero initialization of all data by default.
Regards Leonid
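The double-call failure can be reproduced in a reduced model. The sketch below is neither the Alpine patch nor OpenJDK code: it keeps the same three file-scope variables and the same grow-by-5 logic, but takes the number of parsed interface lines as a parameter instead of reading /proc/net/if_inet6, and it reports the point where the original would execute `lif ++` on a null pointer instead of actually doing so. The struct layout is assumed, and `model_guarded()` is a hypothetical guard showing one way to make repeated calls safe.

```c
#include <assert.h>
#include <stdlib.h>

/* Assumed layout; the real definition is not quoted in this thread. */
struct localinterface {
    int  index;
    char localaddr[16];
};

/* File-scope state that survives between calls, as in the quoted source. */
static struct localinterface *localifs = NULL;
static int localifsSize = 0;   /* allocated slots */
static int nifs = 0;           /* slots in use */

/* Grow the table by 5 slots; returns 0 on allocation failure. */
static int grow_table(void)
{
    struct localinterface *p =
        realloc(localifs, sizeof *p * (size_t)(localifsSize + 5));
    if (p == NULL)
        return 0;
    localifs = p;
    localifsSize += 5;
    return 1;
}

/* Same control flow as lines 703-718 for `lines` parsed entries.
 * Returns 1 on success, -1 on allocation failure, and 0 at the point
 * where the original would increment a null `lif`. */
static int model_original(int lines)
{
    struct localinterface *lif = NULL;   /* reset on every call (line 676) */
    for (int i = 0; i < lines; i++) {
        nifs++;                          /* never reset between calls */
        if (nifs > localifsSize) {
            int old = localifsSize;
            if (!grow_table()) { nifs = 0; return -1; }
            lif = localifs + old;
        } else if (lif == NULL) {
            return 0;                    /* crash path: null + one element */
        } else {
            lif++;
        }
        lif->index = i;
    }
    return 1;
}

/* Hypothetical guard: restart the count on every call and derive the
 * slot from nifs instead of carrying a pointer across iterations. */
static int model_guarded(int lines)
{
    nifs = 0;
    for (int i = 0; i < lines; i++) {
        nifs++;
        if (nifs > localifsSize && !grow_table()) { nifs = 0; return -1; }
        assert(nifs <= localifsSize);
        localifs[nifs - 1].index = i;
    }
    return 1;
}

/* Helper for experiments: drop all state. */
static void reset_state(void)
{
    free(localifs);
    localifs = NULL;
    localifsSize = 0;
    nifs = 0;
}
```

With a single `lo` entry, the first `model_original(1)` call succeeds and the second returns 0: nifs becomes 2, which is not greater than localifsSize (5), so the else branch runs with lif still null. `model_guarded(1)` succeeds on both calls.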
2016-01-21 9:52 GMT+02:00 Leonid Lisovskiy lly.dev@gmail.com:
On Thu, Jan 21, 2016 at 4:20 AM, Waldemar Brodkorb wbx@uclibc-ng.org wrote:
Have you seen that Alpine Linux is carrying a patch with modifies exactly our problematic function: https://github.com/alpinelinux/aports/blob/master/community/openjdk8/icedtea...
Bingo! Current initLocalIfs() implementation will leave ruins if it called twice!
Alex, please try this patch. If it not help, unlikely, turn off compiler optimizations for net_util_md.c (i.e. force gcc flags to -O0).
Does a bell rings here? ;) main/openjdk6: fix ipv6 related startup crash
Yes. It should be easy uncovered by Valgrind, but hidden in glibc due to zero initialization of all data, by default.
Great! Lots of thanks, it's working now! This patch is must-do, strange it hasn't been accepted upstream.
Best regards, Alex
On Thu, Jan 21, 2016 at 8:32 PM, Alex Potapenko opotapenko@gmail.com wrote:
On Thu, Jan 21, 2016 at 4:20 AM, Waldemar Brodkorb wbx@uclibc-ng.org wrote: Have you seen that Alpine Linux is carrying a patch with modifies exactly our problematic function: https://github.com/alpinelinux/aports/blob/master/community/openjdk8/icedtea...
Great! Lots of thanks, it's working now! This patch is must-do, strange it hasn't been accepted upstream.
I'm absolutely sure that initLocalIfs() must have at least minimal fool-proofing.
If you want to try to push this patch into upstream, you should (IMHO):
* Ask Timo or Natanael (Alpine Linux maintainers) whether there was any response from the OpenJDK maintainers.
* Try to collect additional information about the source of the condition in which initLocalIfs() is called twice. To do that I suggest:
  - put a breakpoint at initLocalIfs
  - run the program
  - at each break take a backtrace, then continue
For now I can't see how uClibc might be involved in the problem. JNI_OnLoad() should be called from the ClassLoader, not by an ld constructor...
regards, Leonid