Hello,
I think I've run into a rather odd bug on a big-endian MIPS platform while
trying to hand-assemble a MIPS-II ISA netboot image built from a uclibc-ng
chroot. In my netboot, I need to include xfsprogs, but this has a dependency
on the 'valloc' function call. So in uclibc-ng, I enabled
CONFIG_UCLIBC_SUSV2_LEGACY to enable that function, and rebuilt uclibc-ng.
This fixes the xfsprogs build, but it very subtly breaks busybox's ash shell.
After rebuilding uclibc-ng, then rebuilding busybox statically/multicall, if
you run /bin/ash with a malformed argument or give it a script to execute that
doesn't have the execute bit set, you get a SIGSEGV:
Fudging up the argument syntax to /bin/ash:
octane / # /bin/ash "-c"
/bin/ash: -c requires an argument
Segmentation fault
For the non-executable script case, we start with this sample "x.sh":
octane / # cat ./x.sh
#!/bin/ash
echo "foo!"
If "x.sh" has the executable bit set, we're all good:
octane / # ls -l ./x.sh
-rwxr-xr-x 1 root root 24 Oct 12 01:57 ./x.sh
octane / # /bin/ash -c ./x.sh
foo!
But if we turn off the executable bit...
octane / # chmod -x ./x.sh
octane / # ls -l ./x.sh
-rw-r--r-- 1 root root 24 Oct 12 01:57 ./x.sh
octane / # /bin/ash -c ./x.sh
/bin/ash: ./x.sh: Permission denied
Segmentation fault
The only backtrace I can get out of it after rebuilding uclibc-ng and busybox
with debugging is this (generated via the fudged argument example):
Program received signal SIGSEGV, Segmentation fault.
0x00000000 in ?? ()
(gdb) bt
#0 0x00000000 in ?? ()
#1 0x00452278 in __GI__longjmp_unwind (env=0x7ffeed58, val=1) at libpthread/nptl/sysdeps/unix/sysv/linux/jmp-unwind.c:30
#2 0x004061e4 in __libc_longjmp (env=0x7ffeed58, val=1) at libc/sysdeps/linux/common/longjmp.c:29
#3 0x0050185c in raise_exception (e=1) at shell/ash.c:448
#4 0x00501f00 in ash_vmsg_and_raise (cond=1, msg=0x60294d <bb_msg_requires_arg> "%s requires an argument", ap=0x7ffeece4) at shell/ash.c:1232
#5 0x00501f4c in ash_msg_and_raise_error (msg=0x60294d <bb_msg_requires_arg> "%s requires an argument") at shell/ash.c:1243
#6 0x0051a918 in procargs (argv=0x7ffeeff4) at shell/ash.c:13009
#7 0x0051afe4 in ash_main (argc=2, argv=0x7ffeeff4) at shell/ash.c:13158
#8 0x0047e320 in run_applet_no_and_exit (applet_no=9, argv=0x7ffeeff4) at libbb/appletlib.c:774
#9 0x0047e370 in run_applet_and_exit (name=0x7ffef130 "ash", argv=0x7ffeeff4) at libbb/appletlib.c:781
#10 0x0047e484 in main (argc=2, argv=0x7ffeeff4) at libbb/appletlib.c:838
Line #30 in jmp-unwind.c leads me to a really old uclibc bug, #3919:
https://bugs.busybox.net/show_bug.cgi?id=3919
But further investigation reveals that the null_not_ptr() check introduced by
the patch in that bug is already present in uclibc-ng in the patched spots,
plus a few new locations. So either I've run into a new area of the code that
needs a similar change, or I'm chasing the wrong rabbit down the wrong hole and
the bug lies elsewhere, e.g., in busybox (hinted at by the script case only
segfaulting once the script has been chmod -x'ed).
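For readers unfamiliar with the pattern, frame #0 at 0x00000000 is what a call
through an unresolved weak reference tends to look like: a weak symbol that
nothing defines resolves to address 0, so an unguarded call jumps to 0. The
sketch below only illustrates that guard pattern under this assumption; it is
not the uclibc-ng code, and hypothetical_cleanup_upto and
call_cleanup_if_linked are made-up names.

```c
#include <stddef.h>

/* Hypothetical symbol: declared weak and deliberately never defined, so the
 * linker resolves its address to 0 instead of failing the link. */
extern void hypothetical_cleanup_upto(void *env) __attribute__((weak));

/* Call the weak function only if something actually provided it.
 * Returns 1 if the call was made, 0 if the symbol was absent. */
int call_cleanup_if_linked(void *env)
{
    if (hypothetical_cleanup_upto != NULL) {
        hypothetical_cleanup_upto(env);
        return 1;
    }
    /* Without the check above, this path would jump to address 0. */
    return 0;
}
```

With no definition linked in, call_cleanup_if_linked(NULL) should return 0
rather than crash. If a compiler were to drop the NULL comparison, the same
code would fault at address 0.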
I am aware that CONFIG_UCLIBC_SUSV2_LEGACY introduces an ABI incompatibility, but
the remainder of the chroot userland used to build the netboot appears to
operate normally. Only /bin/ash seems to have an issue.
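One way to take busybox out of the picture would be a reduced test that goes
through the same setjmp/longjmp path as frames #1 to #3, linked statically
against the same uclibc-ng build. A minimal sketch, assuming nothing about
ash's internals; longjmp_probe and raise_error are hypothetical names:

```c
#include <setjmp.h>

/* Jump buffer shared between the probe and the function that "raises". */
static jmp_buf probe_env;

/* Stand-in for an error path that unwinds via longjmp. */
void raise_error(int code)
{
    longjmp(probe_env, code);
}

/* Returns 1 if longjmp brought control back to the setjmp point,
 * -1 if raise_error() unexpectedly returned. */
int longjmp_probe(void)
{
    if (setjmp(probe_env) == 0) {
        raise_error(1);
        return -1; /* not reached when longjmp works */
    }
    return 1;
}
```

Calling longjmp_probe() from main() in a static binary should return 1. If it
also dies inside __longjmp_unwind, that would point at the libc side rather
than at busybox.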
I don't know whether I should instead get xfsprogs off of valloc (given XFS's
age as a filesystem, it might need this anyway). I can pursue that avenue if
needed, but I think I've stumbled onto a really obscure bug here that may
still need looking into.
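If moving xfsprogs off valloc turns out to be the way to go, the usual
replacement is posix_memalign() with the page size from sysconf(). A minimal
sketch of such a shim; page_aligned_alloc is a hypothetical name and this is
not an actual xfsprogs patch:

```c
#include <stdlib.h>
#include <unistd.h>

/* Hypothetical valloc() stand-in: allocate `size` bytes aligned to the
 * system page size. Returns NULL on failure; release with free(). */
void *page_aligned_alloc(size_t size)
{
    long page = sysconf(_SC_PAGESIZE);
    void *ptr = NULL;

    if (page <= 0)
        return NULL;
    if (posix_memalign(&ptr, (size_t)page, size) != 0)
        return NULL;
    return ptr;
}
```

This would avoid the need for CONFIG_UCLIBC_SUSV2_LEGACY on the xfsprogs side,
though it would not explain the ash crash itself.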
Anything else I can provide to help chase this down?
--
Joshua Kinard
Gentoo/MIPS
kumba(a)gentoo.org
6144R/F5C6C943 2015-04-27
177C 1972 1FB8 F254 BAD0 3E72 5C63 F4E3 F5C6 C943
"The past tempts us, the present confuses us, the future frightens us. And our
lives slip away, moment by moment, lost in that vast, terrible in-between."
--Emperor Turhan, Centauri Republic