Hello,
I think I've run into a rather odd bug on a big-endian MIPS platform while trying to hand-assemble a MIPS-II ISA netboot image built from a uclibc-ng chroot. My netboot needs xfsprogs, which depends on the valloc() function. So I enabled CONFIG_UCLIBC_SUSV2_LEGACY in uclibc-ng to provide that function and rebuilt uclibc-ng. This fixes the xfsprogs build, but it very subtly breaks busybox's ash shell.
After rebuilding uclibc-ng and then rebuilding busybox as a static multicall binary, running /bin/ash with a malformed argument, or giving it a script to execute that doesn't have the execute bit set, produces a SIGSEGV:
Fudging up the argument syntax to /bin/ash:

    octane / # /bin/ash "-c"
    /bin/ash: -c requires an argument
    Segmentation fault

For the non-executable script case, start with this sample script, "x.sh":

    octane / # cat ./x.sh
    #!/bin/ash
    echo "foo!"

If "x.sh" has the executable bit set, we're all good:

    octane / # ls -l ./x.sh
    -rwxr-xr-x 1 root root 24 Oct 12 01:57 ./x.sh
    octane / # /bin/ash -c ./x.sh
    foo!

But if we turn off the executable bit...

    octane / # chmod -x ./x.sh
    octane / # ls -l ./x.sh
    -rw-r--r-- 1 root root 24 Oct 12 01:57 ./x.sh
    octane / # /bin/ash -c ./x.sh
    /bin/ash: ./x.sh: Permission denied
    Segmentation fault
The only backtrace I can get out of it after rebuilding uclibc-ng and busybox with debugging is this (generated via the fudged argument example):
    Program received signal SIGSEGV, Segmentation fault.
    0x00000000 in ?? ()
    (gdb) bt
    #0  0x00000000 in ?? ()
    #1  0x00452278 in __GI__longjmp_unwind (env=0x7ffeed58, val=1) at libpthread/nptl/sysdeps/unix/sysv/linux/jmp-unwind.c:30
    #2  0x004061e4 in __libc_longjmp (env=0x7ffeed58, val=1) at libc/sysdeps/linux/common/longjmp.c:29
    #3  0x0050185c in raise_exception (e=1) at shell/ash.c:448
    #4  0x00501f00 in ash_vmsg_and_raise (cond=1, msg=0x60294d <bb_msg_requires_arg> "%s requires an argument", ap=0x7ffeece4) at shell/ash.c:1232
    #5  0x00501f4c in ash_msg_and_raise_error (msg=0x60294d <bb_msg_requires_arg> "%s requires an argument") at shell/ash.c:1243
    #6  0x0051a918 in procargs (argv=0x7ffeeff4) at shell/ash.c:13009
    #7  0x0051afe4 in ash_main (argc=2, argv=0x7ffeeff4) at shell/ash.c:13158
    #8  0x0047e320 in run_applet_no_and_exit (applet_no=9, argv=0x7ffeeff4) at libbb/appletlib.c:774
    #9  0x0047e370 in run_applet_and_exit (name=0x7ffef130 "ash", argv=0x7ffeeff4) at libbb/appletlib.c:781
    #10 0x0047e484 in main (argc=2, argv=0x7ffeeff4) at libbb/appletlib.c:838
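Frames #2 and #3 come from ash's error path. As far as I can tell from shell/ash.c, raise_exception() records the exception type and longjmp()s to the innermost handler frame previously installed with setjmp(). The sketch below is a stripped-down version of that pattern; the names are modeled on ash but simplified, and fail_missing_arg() and run_with_handler() are hypothetical, not busybox code. With a working libc, the longjmp() lands back in the setjmp() branch; in the trace above, the crash happens inside the libc's longjmp (in _longjmp_unwind) before control ever gets back to ash.

```c
#include <setjmp.h>
#include <stdio.h>

/* Handler frame, loosely modeled on ash's struct jmploc. */
struct jmploc {
    jmp_buf loc;
};

static struct jmploc *exception_handler;  /* innermost active handler */
static volatile int exception_type;       /* e.g. 1 for an error exception */

/* Record the exception type and unwind to the innermost handler. */
static void raise_exception(int e)
{
    exception_type = e;
    longjmp(exception_handler->loc, 1);   /* val=1, as in frame #2 */
}

/* Hypothetical stand-in for procargs() rejecting a bare "-c". */
static void fail_missing_arg(void)
{
    fprintf(stderr, "sketch: -c requires an argument\n");
    raise_exception(1);
}

/* Install a handler, run code that raises, and return what was caught. */
static int run_with_handler(void)
{
    struct jmploc jmploc;

    if (setjmp(jmploc.loc) != 0)
        return exception_type;            /* reached via longjmp() */
    exception_handler = &jmploc;
    fail_missing_arg();
    return 0;                             /* not reached */
}
```

Calling run_with_handler() on a healthy libc prints the message and returns 1.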
Line 30 of jmp-unwind.c leads me to a really old uclibc bug, #3919: https://bugs.busybox.net/show_bug.cgi?id=3919
But further investigation reveals that the null_not_ptr() check introduced by the patch in that bug is already present in uclibc-ng at the patched spots, as well as in a few new locations. So either I've run into a new area of the code that needs a similar change, or I'm chasing the wrong rabbit down the wrong hole and the bug lies elsewhere, e.g., in busybox (as hinted at by the SIGSEGV appearing only after running chmod -x on the script).
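For context, frame #0 at 0x00000000 means something called through a NULL function pointer. A generic sketch of the kind of guard involved is below, assuming the relevant call is to a weak, possibly undefined hook in a static binary; hypothetical_cleanup_hook, hook_fn and not_null_fn are made-up names, not uclibc-ng's. The empty asm statement hides the pointer's origin from the optimizer, so the NULL test cannot be folded away on the basis of what the compiler knows about the symbol.

```c
#include <stddef.h>

typedef void (*hook_fn)(void);

/* Hypothetical hook: declared weak and never defined, so the linker
 * resolves its address to 0 when nothing provides it. */
extern void hypothetical_cleanup_hook(void) __attribute__((weak));

/* Pass the pointer through an empty asm so the compiler must treat
 * the value as unknown and keep the NULL comparison. */
static inline int not_null_fn(hook_fn p)
{
    hook_fn q;

    __asm__ ("" : "=r" (q) : "0" (p));
    return q != NULL;
}

/* Call the hook only if it was actually linked in. */
static int call_hook_if_present(void)
{
    if (not_null_fn(hypothetical_cleanup_hook)) {
        hypothetical_cleanup_hook();
        return 1;
    }
    return 0;                 /* skipped instead of jumping to address 0 */
}
```

Without such a guard (or if the guard is optimized out), the call would jump to address 0, which matches the shape of the backtrace.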
I am aware that CONFIG_UCLIBC_SUSV2_LEGACY introduces an ABI incompatibility, but the remainder of the chroot userland used to build the netboot appears to operate normally. Only /bin/ash seems to have an issue.
I don't know whether I should instead move xfsprogs off valloc (given XFS's age as a filesystem, it might need that anyway). I can pursue that avenue if needed, but I think I've stumbled onto a really obscure bug here that may still need looking into.
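If moving xfsprogs off valloc() turns out to be the way to go, the usual replacement is posix_memalign() with the system page size as the alignment. A minimal sketch, assuming the libc provides posix_memalign() (POSIX.1-2001); page_aligned_alloc() is a hypothetical helper name, not something from xfsprogs:

```c
#define _POSIX_C_SOURCE 200112L
#include <stdlib.h>
#include <unistd.h>

/* Hypothetical valloc() replacement: page-aligned allocation. */
static void *page_aligned_alloc(size_t size)
{
    long pagesz = sysconf(_SC_PAGESIZE);
    void *p = NULL;

    if (pagesz <= 0)
        return NULL;
    /* posix_memalign() returns an error number on failure, not -1. */
    if (posix_memalign(&p, (size_t)pagesz, size) != 0)
        return NULL;
    return p;                 /* release with free() */
}
```

Call sites would then swap valloc(n) for page_aligned_alloc(n) without needing CONFIG_UCLIBC_SUSV2_LEGACY.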
Anything else I can provide to help chase this down?