Hello,
Recently in Buildroot the option BR2_PIC_PIE has been enabled by default along with other hardening features [1]. Since then some ppc defconfig such qemu_ppc_e500mc_defconfig are failing to boot due to a segfault in init program.
The segfault appear very early in __uClibc_main while starting any binaries, an issue located in crt1.S (powerpc)[2].
After some trial and error, removing HAVE_ASM_PPC_REL16 from CFLAGS [3] allow to generate a working system again. But this is actually wrong since instead we should consider HAVE_ASM_PPC_REL16 always true nowadays.
Indeed, Glibc removed HAVE_ASM_PPC_REL16 since version 2.22 [4] since "the minimum binutils supports rel16 relocs". Binutils 2.22 supports R_PPC_REL16 as default.
uClibc-ng should remove HAVE_ASM_PPC_REL16 but keep the code as it was defined. But this doesn't fix the initial issue.
Any idea ?
[1] https://git.buildroot.net/buildroot/commit/?id=810ba387bec3c5b6904e8893fb4cb... [2] https://cgit.uclibc-ng.org/cgi/cgit/uclibc-ng.git/tree/libc/sysdeps/linux/po... [3] https://cgit.uclibc-ng.org/cgi/cgit/uclibc-ng.git/tree/Rules.mak?id=2bf4991c... [4] https://sourceware.org/git/?p=glibc.git;a=commit;h=59261ad3eb345e0d7b9f5c73e...
Best regards, Romain
On 2021/05/18 6:21, Romain Naour wrote:
Hello,
Recently in Buildroot the option BR2_PIC_PIE has been enabled by default along with other hardening features [1]. Since then some ppc defconfig such qemu_ppc_e500mc_defconfig are failing to boot due to a segfault in init program.
Note that I am seeing similar problems on RISC-V NOMMU builds with the latest buildroot/busybox 1.33. I had no time to explore the reason for it yet though. It sounds like it may be a similar problem as yours.
The segfault appear very early in __uClibc_main while starting any binaries, an issue located in crt1.S (powerpc)[2].
Same early segfault I am seeing on init shell startup (used as init process in my case).
After some trial and error, removing HAVE_ASM_PPC_REL16 from CFLAGS [3] allow to generate a working system again. But this is actually wrong since instead we should consider HAVE_ASM_PPC_REL16 always true nowadays.
Indeed, Glibc removed HAVE_ASM_PPC_REL16 since version 2.22 [4] since "the minimum binutils supports rel16 relocs". Binutils 2.22 supports R_PPC_REL16 as default.
uClibc-ng should remove HAVE_ASM_PPC_REL16 but keep the code as it was defined. But this doesn't fix the initial issue.
Any idea ?
[1] https://git.buildroot.net/buildroot/commit/?id=810ba387bec3c5b6904e8893fb4cb... [2] https://cgit.uclibc-ng.org/cgi/cgit/uclibc-ng.git/tree/libc/sysdeps/linux/po... [3] https://cgit.uclibc-ng.org/cgi/cgit/uclibc-ng.git/tree/Rules.mak?id=2bf4991c... [4] https://sourceware.org/git/?p=glibc.git;a=commit;h=59261ad3eb345e0d7b9f5c73e...
Best regards, Romain _______________________________________________ devel mailing list devel@uclibc-ng.org https://mailman.uclibc-ng.org/cgi-bin/mailman/listinfo/devel
Hello Damien,
Le 18/05/2021 à 00:03, Damien Le Moal a écrit :
On 2021/05/18 6:21, Romain Naour wrote:
Hello,
Recently in Buildroot the option BR2_PIC_PIE has been enabled by default along with other hardening features [1]. Since then some ppc defconfig such qemu_ppc_e500mc_defconfig are failing to boot due to a segfault in init program.
Note that I am seeing similar problems on RISC-V NOMMU builds with the latest buildroot/busybox 1.33. I had no time to explore the reason for it yet though. It sounds like it may be a similar problem as yours.
The upstream Buildroot project only support RISC-V with MMU (BR2_ARCH_HAS_MMU_MANDATORY).
The segfault appear very early in __uClibc_main while starting any binaries, an issue located in crt1.S (powerpc)[2].
Same early segfault I am seeing on init shell startup (used as init process in my case).
When a binary segfault like this, it can be anythings related to the compiler, linker, assembler or the libc.
The issue I reported seems only related to powerpc32 achitecture.
Best regards, Romain
After some trial and error, removing HAVE_ASM_PPC_REL16 from CFLAGS [3] allow to generate a working system again. But this is actually wrong since instead we should consider HAVE_ASM_PPC_REL16 always true nowadays.
Indeed, Glibc removed HAVE_ASM_PPC_REL16 since version 2.22 [4] since "the minimum binutils supports rel16 relocs". Binutils 2.22 supports R_PPC_REL16 as default.
uClibc-ng should remove HAVE_ASM_PPC_REL16 but keep the code as it was defined. But this doesn't fix the initial issue.
Any idea ?
[1] https://git.buildroot.net/buildroot/commit/?id=810ba387bec3c5b6904e8893fb4cb... [2] https://cgit.uclibc-ng.org/cgi/cgit/uclibc-ng.git/tree/libc/sysdeps/linux/po... [3] https://cgit.uclibc-ng.org/cgi/cgit/uclibc-ng.git/tree/Rules.mak?id=2bf4991c... [4] https://sourceware.org/git/?p=glibc.git;a=commit;h=59261ad3eb345e0d7b9f5c73e...
Best regards, Romain _______________________________________________ devel mailing list devel@uclibc-ng.org https://mailman.uclibc-ng.org/cgi-bin/mailman/listinfo/devel
On 2021/05/20 19:30, Romain Naour wrote:
Hello Damien,
Le 18/05/2021 à 00:03, Damien Le Moal a écrit :
On 2021/05/18 6:21, Romain Naour wrote:
Hello,
Recently in Buildroot the option BR2_PIC_PIE has been enabled by default along with other hardening features [1]. Since then some ppc defconfig such qemu_ppc_e500mc_defconfig are failing to boot due to a segfault in init program.
Note that I am seeing similar problems on RISC-V NOMMU builds with the latest buildroot/busybox 1.33. I had no time to explore the reason for it yet though. It sounds like it may be a similar problem as yours.
The upstream Buildroot project only support RISC-V with MMU (BR2_ARCH_HAS_MMU_MANDATORY).
I have a series of patches for building bootable images for boards based on the Canaan Kendryte K210 SoC. These boards use NOMMU builds that the patch series enables. I also have elf2flt fixed and the kernel flatbin loader is already fixed in 5.13-rc. The crash on startup I am seeing with the latest buildroot/busybox is blocking me from posting the patch series though.
The segfault appear very early in __uClibc_main while starting any binaries, an issue located in crt1.S (powerpc)[2].
Same early segfault I am seeing on init shell startup (used as init process in my case).
When a binary segfault like this, it can be anythings related to the compiler, linker, assembler or the libc.
Agreed. I have not had time to debug yet. Busy on the kernel side. Your bug report did however give me one hint that it may be good to check that this relocation issue does not exist on riscv, since I tweaked elf2flt relocation code...
Will try to find some time to debug next week.
Cheers.
"Damien" == Damien Le Moal Damien.LeMoal@wdc.com writes:
Hi,
The upstream Buildroot project only support RISC-V with MMU (BR2_ARCH_HAS_MMU_MANDATORY).
I have a series of patches for building bootable images for boards based on the Canaan Kendryte K210 SoC. These boards use NOMMU builds that the patch series enables. I also have elf2flt fixed and the kernel flatbin loader is already fixed in 5.13-rc. The crash on startup I am seeing with the latest buildroot/busybox is blocking me from posting the patch series though.
Yes, I remember. What is the status of the elf2flt patch? It would be good if you could send an updated patch series to the Buildroot list.
On 2021/05/20 20:33, Peter Korsgaard wrote:
"Damien" == Damien Le Moal Damien.LeMoal@wdc.com writes:
Hi,
The upstream Buildroot project only support RISC-V with MMU (BR2_ARCH_HAS_MMU_MANDATORY).
I have a series of patches for building bootable images for boards based on the Canaan Kendryte K210 SoC. These boards use NOMMU builds that the patch series enables. I also have elf2flt fixed and the kernel flatbin loader is already fixed in 5.13-rc. The crash on startup I am seeing with the latest buildroot/busybox is blocking me from posting the patch series though.
Yes, I remember. What is the status of the elf2flt patch? It would be good if you could send an updated patch series to the Buildroot list.
The elf2flt patch is ready. All seems good on this front now that the kernel flatbin loader is fixed to not break riscv global pointer.
Also, with all the K210 improvments that went into kernel 5.12, the buildroot patch series was expanded to add SD card image build and 4 different K210 boards.
However, when I rebased my series on the latest upstream tree, I ran into this crash on startup and have not had time to debug yet. So I have not sent anything yet, nor did I send the elf2flt patch as I want to check that the problem cause is not that one. Really not sure what is going on. I only had a quick look after my last rebase and added "need debug" to my to-do list.
Will try to get to it next week. I may just wait for 5.13 to come out though as the needed flatbin loader patch will be included. Since we are at rc2, still have some time :)
Cheers.
"Damien" == Damien Le Moal Damien.LeMoal@wdc.com writes:
Hi,
Yes, I remember. What is the status of the elf2flt patch? It would be good if you could send an updated patch series to the Buildroot list.
The elf2flt patch is ready. All seems good on this front now that the kernel flatbin loader is fixed to not break riscv global pointer.
Also, with all the K210 improvments that went into kernel 5.12, the buildroot patch series was expanded to add SD card image build and 4 different K210 boards.
However, when I rebased my series on the latest upstream tree, I ran into this crash on startup and have not had time to debug yet. So I have not sent anything yet, nor did I send the elf2flt patch as I want to check that the problem cause is not that one. Really not sure what is going on. I only had a quick look after my last rebase and added "need debug" to my to-do list.
Will try to get to it next week. I may just wait for 5.13 to come out though as the needed flatbin loader patch will be included. Since we are at rc2, still have some time :)
OK, thanks. It would be nice to get it merged, my k210 board dusted off and put to use ;)
On 2021/05/20 20:46, Peter Korsgaard wrote:
"Damien" == Damien Le Moal Damien.LeMoal@wdc.com writes:
Hi,
Yes, I remember. What is the status of the elf2flt patch? It would be good if you could send an updated patch series to the Buildroot list.
The elf2flt patch is ready. All seems good on this front now that the kernel flatbin loader is fixed to not break riscv global pointer.
Also, with all the K210 improvments that went into kernel 5.12, the buildroot patch series was expanded to add SD card image build and 4 different K210 boards.
However, when I rebased my series on the latest upstream tree, I ran into this crash on startup and have not had time to debug yet. So I have not sent anything yet, nor did I send the elf2flt patch as I want to check that the problem cause is not that one. Really not sure what is going on. I only had a quick look after my last rebase and added "need debug" to my to-do list.
Will try to get to it next week. I may just wait for 5.13 to come out though as the needed flatbin loader patch will be included. Since we are at rc2, still have some time :)
OK, thanks. It would be nice to get it merged, my k210 board dusted off and put to use ;)
My tree is here:
https://github.com/damien-lemoal/buildroot/commits/master
But I have not pushed the latest working version to it yet. Will do that ASAP so that there is at least one tree working. Then will debug after rebasing on upstream.
Hi Romain, Romain Naour wrote,
Hello,
Recently in Buildroot the option BR2_PIC_PIE has been enabled by default along with other hardening features [1]. Since then some ppc defconfig such qemu_ppc_e500mc_defconfig are failing to boot due to a segfault in init program.
The segfault appear very early in __uClibc_main while starting any binaries, an issue located in crt1.S (powerpc)[2].
After some trial and error, removing HAVE_ASM_PPC_REL16 from CFLAGS [3] allow to generate a working system again. But this is actually wrong since instead we should consider HAVE_ASM_PPC_REL16 always true nowadays.
What if the assembly inside HAVE_ASM_PPC_REL16 isn't pie safe?
Indeed, Glibc removed HAVE_ASM_PPC_REL16 since version 2.22 [4] since "the minimum binutils supports rel16 relocs". Binutils 2.22 supports R_PPC_REL16 as default.
uClibc-ng should remove HAVE_ASM_PPC_REL16 but keep the code as it was defined. But this doesn't fix the initial issue.
Any idea ?
[1] https://git.buildroot.net/buildroot/commit/?id=810ba387bec3c5b6904e8893fb4cb... [2] https://cgit.uclibc-ng.org/cgi/cgit/uclibc-ng.git/tree/libc/sysdeps/linux/po... [3] https://cgit.uclibc-ng.org/cgi/cgit/uclibc-ng.git/tree/Rules.mak?id=2bf4991c... [4] https://sourceware.org/git/?p=glibc.git;a=commit;h=59261ad3eb345e0d7b9f5c73e...
best regards Waldemar
Hi Waldemar,
Le 19/05/2021 à 00:09, Waldemar Brodkorb a écrit :
Hi Romain, Romain Naour wrote,
Hello,
Recently in Buildroot the option BR2_PIC_PIE has been enabled by default along with other hardening features [1]. Since then some ppc defconfig such qemu_ppc_e500mc_defconfig are failing to boot due to a segfault in init program.
The segfault appear very early in __uClibc_main while starting any binaries, an issue located in crt1.S (powerpc)[2].
After some trial and error, removing HAVE_ASM_PPC_REL16 from CFLAGS [3] allow to generate a working system again. But this is actually wrong since instead we should consider HAVE_ASM_PPC_REL16 always true nowadays.
What if the assembly inside HAVE_ASM_PPC_REL16 isn't pie safe?
Good question.
I guess it should work with pie (see PIEFLAG_NAME:=-fpie) https://cgit.uclibc-ng.org/cgi/cgit/uclibc-ng.git/tree/Rules.mak?h=v1.0.38#n...
I did a try with Glibc without any problem with BR2_PIC_PIE enabled.
Best regards, Romain
Indeed, Glibc removed HAVE_ASM_PPC_REL16 since version 2.22 [4] since "the minimum binutils supports rel16 relocs". Binutils 2.22 supports R_PPC_REL16 as default.
uClibc-ng should remove HAVE_ASM_PPC_REL16 but keep the code as it was defined. But this doesn't fix the initial issue.
Any idea ?
[1] https://git.buildroot.net/buildroot/commit/?id=810ba387bec3c5b6904e8893fb4cb... [2] https://cgit.uclibc-ng.org/cgi/cgit/uclibc-ng.git/tree/libc/sysdeps/linux/po... [3] https://cgit.uclibc-ng.org/cgi/cgit/uclibc-ng.git/tree/Rules.mak?id=2bf4991c... [4] https://sourceware.org/git/?p=glibc.git;a=commit;h=59261ad3eb345e0d7b9f5c73e...
best regards Waldemar
Hi Romain,
can you confirm that attached patch works for you? I tested with non-PIE and PIE and both seems to work.
best regards Waldemar
Romain Naour wrote,
Hi Waldemar,
Le 19/05/2021 à 00:09, Waldemar Brodkorb a écrit :
Hi Romain, Romain Naour wrote,
Hello,
Recently in Buildroot the option BR2_PIC_PIE has been enabled by default along with other hardening features [1]. Since then some ppc defconfig such qemu_ppc_e500mc_defconfig are failing to boot due to a segfault in init program.
The segfault appear very early in __uClibc_main while starting any binaries, an issue located in crt1.S (powerpc)[2].
After some trial and error, removing HAVE_ASM_PPC_REL16 from CFLAGS [3] allow to generate a working system again. But this is actually wrong since instead we should consider HAVE_ASM_PPC_REL16 always true nowadays.
What if the assembly inside HAVE_ASM_PPC_REL16 isn't pie safe?
Good question.
I guess it should work with pie (see PIEFLAG_NAME:=-fpie) https://cgit.uclibc-ng.org/cgi/cgit/uclibc-ng.git/tree/Rules.mak?h=v1.0.38#n...
I did a try with Glibc without any problem with BR2_PIC_PIE enabled.
Best regards, Romain
Indeed, Glibc removed HAVE_ASM_PPC_REL16 since version 2.22 [4] since "the minimum binutils supports rel16 relocs". Binutils 2.22 supports R_PPC_REL16 as default.
uClibc-ng should remove HAVE_ASM_PPC_REL16 but keep the code as it was defined. But this doesn't fix the initial issue.
Any idea ?
[1] https://git.buildroot.net/buildroot/commit/?id=810ba387bec3c5b6904e8893fb4cb... [2] https://cgit.uclibc-ng.org/cgi/cgit/uclibc-ng.git/tree/libc/sysdeps/linux/po... [3] https://cgit.uclibc-ng.org/cgi/cgit/uclibc-ng.git/tree/Rules.mak?id=2bf4991c... [4] https://sourceware.org/git/?p=glibc.git;a=commit;h=59261ad3eb345e0d7b9f5c73e...
best regards Waldemar
Hello,
I've stayed silent so far but I've done some investigation on this one.
It's not yet clear to me why there is this code path:
|#ifdef __PIC__ # ifdef HAVE_ASM_PPC_REL16 bcl 20,31,1f 1: mflr r31 addis r31,r31,_GLOBAL_OFFSET_TABLE_-1b@ha addi r31,r31,_GLOBAL_OFFSET_TABLE_-1b@l # else bl _GLOBAL_OFFSET_TABLE_-4@local mflr r31 # endif #endif |||
If I understood correctly, the "new" (so I guess "better ?") way of doing is by using the PPC_REL16 reloc.
Code seems bigger, and uses 2 relocs, compared to the other code path which uses just 1 reloc.
So, why is the 1st better ?
I am betting here that : it's because by using 2 16-bit relocs, you can indeed address 32 bit and reach the .got even if it is very far away from the current code.
If you are stuck with only 16 bits ... it can fail.
So, I am guessing (please if someone can confirm) that your fix will work in most cases ... but can fail when got is too far away to fit in the 16 bit reloc targeting the "bl" insn immediate.
I think it would be better to understand why the first code path does not work well.
Here is the current state of my investigations:
I compiled both versions (with and without ppc_rel16) to compare what's happening in both cases.
busybox_unstripped in the no_rel16 (#else) case: https://salamandar.fr/lufi/r/4TrBm3U7q5#z3J3Z3G24BTPJnQKD/EzGhAk/n3YHZSZF3hN...
busybox_unstripped in the ppc_rel16 (#if) case: https://salamandar.fr/lufi/r/pQ79KBb5wU#HFlcBZIVngZFVzvObe/afqeNgMEbMtwDk9aj...
In the working case (#else):
#################
the _start symbol contains:
b022c: 48 03 fd f9 bl f0024 <__TMC_END__+0x10> b0230: 7f e8 02 a6 mflr r31 b0234: 81 bf ff f0 lwz r13,-16(r31)
to load the r13 reg with SDA (Small Data Area) pointer.
0xf0024 is in the got which contains:
000f0014 <.got>: ... f0024: 4e 80 00 21 blrl
000f0028 <_GLOBAL_OFFSET_TABLE_>: f0028: 00 0e ff 34 .long 0xeff34 ... so basically we jump to the got, we execute "blrl" which IIUC jumps back to our _start but with LR (link register) == 0xf0028 (nextpc)
then mflr (move from link register) makes it so that r31 == 0xf0028
Then we lwz r13 = * (0xf0028 - 16) == * 0xf0018
0xf0018 is in the .got too and seems to be the target of a R_PPC_RELATIVE relocation which I *think* will end up with the 0xf0028 value which is the address of "_GLOBAL_OFFSET_TABLE_" symbol.
In http://refspecs.linux-foundation.org/elf/elfspec_ppc.pdf we can read p.76:
"In a shared object,_SDA_BASE_is defined to have the same value as_GLOBAL_OFFSET_TABLE_"
So it seems all is indeed OK.
|##############################################|
|Now, in the crashing case (HAVE_ASM_PPC_REL16 = 1):|
||||
the _start symbol contains:
b022c: 42 9f 00 05 bcl 20,4*cr7+so,b0230 <_start+0xc> b0230: 7f e8 02 a6 mflr r31 b0234: 3f ff 00 05 addis r31,r31,5 b0238: 3b ff f8 6c addi r31,r31,-1940 b023c: 81 bf ff f4 lwz r13,-12(r31)
So we branch to 0xb0230 (nextpc) to move link register to r31 which ends up with the value 0xb0230.
we do r31 = r31 + 0x10000 * 5 = 0x100230
we do r31 = r31 - 1940 = 0xffa9c
Then we load r13 = *(0xffa9c - 12) = *(0xffa90)
0xffa90 seems to be the target of a R_PPC_RELATIVE relocation that targets 0xffa9c which is the address of the _GLOBAL_OFFSET_TABLE_ symbol.
The code seems correct, I am guessing the relocation handling might be the cause of the problem?
It might be interesting to compare glibc and uclibc-ng for PowerPC for the handling of R_PPC_RELATIVE reloc.
Sorry I don't have any quick fix, I'm just sharing my thoughts on the issue.
Cheers!
Yann
On 21/05/2021 10:34, Waldemar Brodkorb wrote:
Hi Romain,
can you confirm that attached patch works for you? I tested with non-PIE and PIE and both seems to work.
best regards Waldemar
Romain Naour wrote,
Hi Waldemar,
Le 19/05/2021 à 00:09, Waldemar Brodkorb a écrit :
Hi Romain, Romain Naour wrote,
Hello,
Recently in Buildroot the option BR2_PIC_PIE has been enabled by default along with other hardening features [1]. Since then some ppc defconfig such qemu_ppc_e500mc_defconfig are failing to boot due to a segfault in init program.
The segfault appear very early in __uClibc_main while starting any binaries, an issue located in crt1.S (powerpc)[2].
After some trial and error, removing HAVE_ASM_PPC_REL16 from CFLAGS [3] allow to generate a working system again. But this is actually wrong since instead we should consider HAVE_ASM_PPC_REL16 always true nowadays.
What if the assembly inside HAVE_ASM_PPC_REL16 isn't pie safe?
Good question.
I guess it should work with pie (see PIEFLAG_NAME:=-fpie) https://cgit.uclibc-ng.org/cgi/cgit/uclibc-ng.git/tree/Rules.mak?h=v1.0.38#n...
I did a try with Glibc without any problem with BR2_PIC_PIE enabled.
Best regards, Romain
Indeed, Glibc removed HAVE_ASM_PPC_REL16 since version 2.22 [4] since "the minimum binutils supports rel16 relocs". Binutils 2.22 supports R_PPC_REL16 as default.
uClibc-ng should remove HAVE_ASM_PPC_REL16 but keep the code as it was defined. But this doesn't fix the initial issue.
Any idea ?
[1] https://git.buildroot.net/buildroot/commit/?id=810ba387bec3c5b6904e8893fb4cb... [2] https://cgit.uclibc-ng.org/cgi/cgit/uclibc-ng.git/tree/libc/sysdeps/linux/po... [3] https://cgit.uclibc-ng.org/cgi/cgit/uclibc-ng.git/tree/Rules.mak?id=2bf4991c... [4] https://sourceware.org/git/?p=glibc.git;a=commit;h=59261ad3eb345e0d7b9f5c73e...
best regards Waldemar
devel mailing list devel@uclibc-ng.org https://mailman.uclibc-ng.org/cgi-bin/mailman/listinfo/devel
Hello,
I think my previous analysis showing the r31 and r13 computations are OK still hold but my conclusion that the issue is with relocation is false.
I attach a small patch that seems to greatly improve the situation (but I think it's not the final patch yet, some other things might need changes)
I think the issue is due to the fact that now (since https://github.com/buildroot/buildroot/commit/f9b539bf4054d55da69280b19f4b99...) gcc is calling gas using -msecureplt, enable the secureplt.
Previous PLT was code written on-the-fly from the uClibc relocation handler (which has the drawback of needing writable PLT): https://cgit.uclibc-ng.org/cgi/cgit/uclibc-ng.git/tree/ldso/ldso/powerpc/elf...
New PLT, instead of just having a relocation pointing to it contains real code like this:
000b56a0 <00000000.plt_pic32.__uClibc_main>: b56a0: 81 7e 03 7c lwz r11,892(r30) b56a4: 7d 69 03 a6 mtctr r11 b56a8: 4e 80 04 20 bctr b56ac: 60 00 00 00 nop
This is where the crash is, the lwz triggers a segfault because r30 is not initialized.
The new PLT uses r30 as got pointer.
by just adding a "r30 = r31" in crt1.S I get buildroot to boot. (see attached patch)
We can see that in glibc start.S r30 is indeed set: https://sourceware.org/git/?p=glibc.git;a=blob;f=sysdeps/powerpc/powerpc32/s...
Also see the comment above.
So I think the right direction is keeping PPC_REL16 code and adding the missing r30 got pointer.
I must still be missing something because for instance ldd crashes.
What do you think?
Le 21/05/2021 à 16:48, Yann Sionneau a écrit :
Hello,
I've stayed silent so far but I've done some investigation on this one.
It's not yet clear to me why there is this code path:
|#ifdef __PIC__ # ifdef HAVE_ASM_PPC_REL16 bcl 20,31,1f 1: mflr r31 addis r31,r31,_GLOBAL_OFFSET_TABLE_-1b@ha addi r31,r31,_GLOBAL_OFFSET_TABLE_-1b@l # else bl _GLOBAL_OFFSET_TABLE_-4@local mflr r31 # endif #endif |||
If I understood correctly, the "new" (so I guess "better ?") way of doing is by using the PPC_REL16 reloc.
Code seems bigger, and uses 2 relocs, compared to the other code path which uses just 1 reloc.
So, why is the 1st better ?
I am betting here that : it's because by using 2 16-bit relocs, you can indeed address 32 bit and reach the .got even if it is very far away from the current code.
If you are stuck with only 16 bits ... it can fail.
So, I am guessing (please if someone can confirm) that your fix will work in most cases ... but can fail when got is too far away to fit in the 16 bit reloc targeting the "bl" insn immediate.
I think it would be better to understand why the first code path does not work well.
Here is the current state of my investigations:
I compiled both versions (with and without ppc_rel16) to compare what's happening in both cases.
busybox_unstripped in the no_rel16 (#else) case: https://salamandar.fr/lufi/r/4TrBm3U7q5#z3J3Z3G24BTPJnQKD/EzGhAk/n3YHZSZF3hN...
busybox_unstripped in the ppc_rel16 (#if) case: https://salamandar.fr/lufi/r/pQ79KBb5wU#HFlcBZIVngZFVzvObe/afqeNgMEbMtwDk9aj...
In the working case (#else):
#################
the _start symbol contains:
b022c: 48 03 fd f9 bl f0024 <__TMC_END__+0x10> b0230: 7f e8 02 a6 mflr r31 b0234: 81 bf ff f0 lwz r13,-16(r31)
to load the r13 reg with SDA (Small Data Area) pointer.
0xf0024 is in the got which contains:
000f0014 <.got>: ... f0024: 4e 80 00 21 blrl
000f0028 <_GLOBAL_OFFSET_TABLE_>: f0028: 00 0e ff 34 .long 0xeff34 ... so basically we jump to the got, we execute "blrl" which IIUC jumps back to our _start but with LR (link register) == 0xf0028 (nextpc)
then mflr (move from link register) makes it so that r31 == 0xf0028
Then we lwz r13 = * (0xf0028 - 16) == * 0xf0018
0xf0018 is in the .got too and seems to be the target of a R_PPC_RELATIVE relocation which I *think* will end up with the 0xf0028 value which is the address of "_GLOBAL_OFFSET_TABLE_" symbol.
In http://refspecs.linux-foundation.org/elf/elfspec_ppc.pdf we can read p.76:
"In a shared object,_SDA_BASE_is defined to have the same value as_GLOBAL_OFFSET_TABLE_"
So it seems all is indeed OK.
|##############################################|
|Now, in the crashing case (HAVE_ASM_PPC_REL16 = 1):|
||||
the _start symbol contains:
b022c: 42 9f 00 05 bcl 20,4*cr7+so,b0230 <_start+0xc> b0230: 7f e8 02 a6 mflr r31 b0234: 3f ff 00 05 addis r31,r31,5 b0238: 3b ff f8 6c addi r31,r31,-1940 b023c: 81 bf ff f4 lwz r13,-12(r31)
So we branch to 0xb0230 (nextpc) to move link register to r31 which ends up with the value 0xb0230.
we do r31 = r31 + 0x10000 * 5 = 0x100230
we do r31 = r31 - 1940 = 0xffa9c
Then we load r13 = *(0xffa9c - 12) = *(0xffa90)
0xffa90 seems to be the target of a R_PPC_RELATIVE relocation that targets 0xffa9c which is the address of the _GLOBAL_OFFSET_TABLE_ symbol.
The code seems correct, I am guessing the relocation handling might be the cause of the problem?
It might be interesting to compare glibc and uclibc-ng for PowerPC for the handling of R_PPC_RELATIVE reloc.
Sorry I don't have any quick fix, I'm just sharing my thoughts on the issue.
Cheers!
Yann
On 21/05/2021 10:34, Waldemar Brodkorb wrote:
Hi Romain,
can you confirm that attached patch works for you? I tested with non-PIE and PIE and both seems to work.
best regards Waldemar
Romain Naour wrote,
Hi Waldemar,
Le 19/05/2021 à 00:09, Waldemar Brodkorb a écrit :
Hi Romain, Romain Naour wrote,
Hello,
Recently in Buildroot the option BR2_PIC_PIE has been enabled by default along with other hardening features [1]. Since then some ppc defconfig such qemu_ppc_e500mc_defconfig are failing to boot due to a segfault in init program.
The segfault appear very early in __uClibc_main while starting any binaries, an issue located in crt1.S (powerpc)[2].
After some trial and error, removing HAVE_ASM_PPC_REL16 from CFLAGS [3] allow to generate a working system again. But this is actually wrong since instead we should consider HAVE_ASM_PPC_REL16 always true nowadays.
What if the assembly inside HAVE_ASM_PPC_REL16 isn't pie safe?
Good question.
I guess it should work with pie (see PIEFLAG_NAME:=-fpie) https://cgit.uclibc-ng.org/cgi/cgit/uclibc-ng.git/tree/Rules.mak?h=v1.0.38#n...
I did a try with Glibc without any problem with BR2_PIC_PIE enabled.
Best regards, Romain
Indeed, Glibc removed HAVE_ASM_PPC_REL16 since version 2.22 [4] since "the minimum binutils supports rel16 relocs". Binutils 2.22 supports R_PPC_REL16 as default.
uClibc-ng should remove HAVE_ASM_PPC_REL16 but keep the code as it was defined. But this doesn't fix the initial issue.
Any idea ?
[1] https://git.buildroot.net/buildroot/commit/?id=810ba387bec3c5b6904e8893fb4cb... [2] https://cgit.uclibc-ng.org/cgi/cgit/uclibc-ng.git/tree/libc/sysdeps/linux/po... [3] https://cgit.uclibc-ng.org/cgi/cgit/uclibc-ng.git/tree/Rules.mak?id=2bf4991c... [4] https://sourceware.org/git/?p=glibc.git;a=commit;h=59261ad3eb345e0d7b9f5c73e...
best regards Waldemar
devel mailing list devel@uclibc-ng.org https://mailman.uclibc-ng.org/cgi-bin/mailman/listinfo/devel
devel mailing list devel@uclibc-ng.org https://mailman.uclibc-ng.org/cgi-bin/mailman/listinfo/devel
Hello Yann, All,
Thanks for the detailed analysis!
Le 22/05/2021 à 03:08, Yann Sionneau a écrit :
Hello,
I think my previous analysis showing the r31 and r13 computations are OK still hold but my conclusion that the issue is with relocation is false.
By enabling some debug option in uClibc-ng I tried to understand the working vs crashing case:
I noticed a similar result regarding the "_dl_elf_main" address.
https://cgit.uclibc-ng.org/cgi/cgit/uclibc-ng.git/tree/ldso/ldso/dl-startup....
In the crashing case (#if): transfering control to application @ 0xb0224
In the working case (#else): transfering control to application @ 0x1009a4b4
I attach a small patch that seems to greatly improve the situation (but I think it's not the final patch yet, some other things might need changes)
I think the issue is due to the fact that now (since https://github.com/buildroot/buildroot/commit/f9b539bf4054d55da69280b19f4b99...) gcc is calling gas using -msecureplt, enable the secureplt.
I didn't tried to revert this patch and enable BR2_PIC_PIE in Buildroot at the same time.
Previous PLT was code written on-the-fly from the uClibc relocation handler (which has the drawback of needing writable PLT): https://cgit.uclibc-ng.org/cgi/cgit/uclibc-ng.git/tree/ldso/ldso/powerpc/elf...
New PLT, instead of just having a relocation pointing to it contains real code like this:
000b56a0 <00000000.plt_pic32.__uClibc_main>: b56a0: 81 7e 03 7c lwz r11,892(r30) b56a4: 7d 69 03 a6 mtctr r11 b56a8: 4e 80 04 20 bctr b56ac: 60 00 00 00 nop
This is where the crash is, the lwz triggers a segfault because r30 is not initialized.
The new PLT uses r30 as got pointer.
by just adding a "r30 = r31" in crt1.S I get buildroot to boot. (see attached patch)
We can see that in glibc start.S r30 is indeed set: https://sourceware.org/git/?p=glibc.git;a=blob;f=sysdeps/powerpc/powerpc32/s...
Indeed, I noticed r30 being used by Glibc.
Also see the comment above.
So I think the right direction is keeping PPC_REL16 code and adding the missing r30 got pointer.
I must still be missing something because for instance ldd crashes.
What do you think?
I guess we can follow Glibc developers and only support the HAVE_ASM_PPC_REL16 case. I don't think anyone is building a toolchain with an old binutils missing ppc REL16 support.
Best regards, Romain
Le 21/05/2021 à 16:48, Yann Sionneau a écrit :
Hello,
I've stayed silent so far but I've done some investigation on this one.
It's not yet clear to me why there is this code path:
|#ifdef __PIC__ # ifdef HAVE_ASM_PPC_REL16 bcl 20,31,1f 1: mflr r31 addis r31,r31,_GLOBAL_OFFSET_TABLE_-1b@ha addi r31,r31,_GLOBAL_OFFSET_TABLE_-1b@l # else bl _GLOBAL_OFFSET_TABLE_-4@local mflr r31 # endif #endif |||
If I understood correctly, the "new" (so I guess "better ?") way of doing is by using the PPC_REL16 reloc.
Code seems bigger, and uses 2 relocs, compared to the other code path which uses just 1 reloc.
So, why is the 1st better ?
I am betting here that : it's because by using 2 16-bit relocs, you can indeed address 32 bit and reach the .got even if it is very far away from the current code.
If you are stuck with only 16 bits ... it can fail.
So, I am guessing (please if someone can confirm) that your fix will work in most cases ... but can fail when got is too far away to fit in the 16 bit reloc targeting the "bl" insn immediate.
I think it would be better to understand why the first code path does not work well.
Here is the current state of my investigations:
I compiled both versions (with and without ppc_rel16) to compare what's happening in both cases.
busybox_unstripped in the no_rel16 (#else) case: https://salamandar.fr/lufi/r/4TrBm3U7q5#z3J3Z3G24BTPJnQKD/EzGhAk/n3YHZSZF3hN...
busybox_unstripped in the ppc_rel16 (#if) case: https://salamandar.fr/lufi/r/pQ79KBb5wU#HFlcBZIVngZFVzvObe/afqeNgMEbMtwDk9aj...
In the working case (#else):
#################
the _start symbol contains:
b022c: 48 03 fd f9 bl f0024 <__TMC_END__+0x10> b0230: 7f e8 02 a6 mflr r31 b0234: 81 bf ff f0 lwz r13,-16(r31)
to load the r13 reg with SDA (Small Data Area) pointer.
0xf0024 is in the got which contains:
000f0014 <.got>: ... f0024: 4e 80 00 21 blrl
000f0028 <_GLOBAL_OFFSET_TABLE_>: f0028: 00 0e ff 34 .long 0xeff34 ... so basically we jump to the got, we execute "blrl" which IIUC jumps back to our _start but with LR (link register) == 0xf0028 (nextpc)
then mflr (move from link register) makes it so that r31 == 0xf0028
Then we lwz r13 = * (0xf0028 - 16) == * 0xf0018
0xf0018 is in the .got too and seems to be the target of a R_PPC_RELATIVE relocation which I *think* will end up with the 0xf0028 value which is the address of "_GLOBAL_OFFSET_TABLE_" symbol.
In http://refspecs.linux-foundation.org/elf/elfspec_ppc.pdf we can read p.76:
"In a shared object,_SDA_BASE_is defined to have the same value as_GLOBAL_OFFSET_TABLE_"
So it seems all is indeed OK.
|##############################################|
|Now, in the crashing case (HAVE_ASM_PPC_REL16 = 1):|
||||
the _start symbol contains:
b022c: 42 9f 00 05 bcl 20,4*cr7+so,b0230 <_start+0xc> b0230: 7f e8 02 a6 mflr r31 b0234: 3f ff 00 05 addis r31,r31,5 b0238: 3b ff f8 6c addi r31,r31,-1940 b023c: 81 bf ff f4 lwz r13,-12(r31)
So we branch to 0xb0230 (nextpc) to move link register to r31 which ends up with the value 0xb0230.
we do r31 = r31 + 0x10000 * 5 = 0x100230
we do r31 = r31 - 1940 = 0xffa9c
Then we load r13 = *(0xffa9c - 12) = *(0xffa90)
0xffa90 seems to be the target of a R_PPC_RELATIVE relocation that targets 0xffa9c which is the address of the _GLOBAL_OFFSET_TABLE_ symbol.
The code seems correct, I am guessing the relocation handling might be the cause of the problem?
It might be interesting to compare glibc and uclibc-ng for PowerPC for the handling of R_PPC_RELATIVE reloc.
Sorry I don't have any quick fix, I'm just sharing my thoughts on the issue.
Cheers!
Yann
On 21/05/2021 10:34, Waldemar Brodkorb wrote:
Hi Romain,
can you confirm that attached patch works for you? I tested with non-PIE and PIE and both seems to work.
best regards Waldemar
Romain Naour wrote,
Hi Waldemar,
Le 19/05/2021 à 00:09, Waldemar Brodkorb a écrit :
Hi Romain, Romain Naour wrote,
Hello,
Recently in Buildroot the option BR2_PIC_PIE has been enabled by default along with other hardening features [1]. Since then some ppc defconfig such qemu_ppc_e500mc_defconfig are failing to boot due to a segfault in init program.
The segfault appear very early in __uClibc_main while starting any binaries, an issue located in crt1.S (powerpc)[2].
After some trial and error, removing HAVE_ASM_PPC_REL16 from CFLAGS [3] allow to generate a working system again. But this is actually wrong since instead we should consider HAVE_ASM_PPC_REL16 always true nowadays.
What if the assembly inside HAVE_ASM_PPC_REL16 isn't pie safe?
Good question.
I guess it should work with pie (see PIEFLAG_NAME:=-fpie) https://cgit.uclibc-ng.org/cgi/cgit/uclibc-ng.git/tree/Rules.mak?h=v1.0.38#n...
I did a try with Glibc without any problem with BR2_PIC_PIE enabled.
Best regards, Romain
Indeed, Glibc removed HAVE_ASM_PPC_REL16 since version 2.22 [4] since "the minimum binutils supports rel16 relocs". Binutils 2.22 supports R_PPC_REL16 as default.
uClibc-ng should remove HAVE_ASM_PPC_REL16 but keep the code as it was defined. But this doesn't fix the initial issue.
Any idea ?
[1] https://git.buildroot.net/buildroot/commit/?id=810ba387bec3c5b6904e8893fb4cb... [2] https://cgit.uclibc-ng.org/cgi/cgit/uclibc-ng.git/tree/libc/sysdeps/linux/po... [3] https://cgit.uclibc-ng.org/cgi/cgit/uclibc-ng.git/tree/Rules.mak?id=2bf4991c... [4] https://sourceware.org/git/?p=glibc.git;a=commit;h=59261ad3eb345e0d7b9f5c73e...
best regards Waldemar
devel mailing list devel@uclibc-ng.org https://mailman.uclibc-ng.org/cgi-bin/mailman/listinfo/devel
devel mailing list devel@uclibc-ng.org https://mailman.uclibc-ng.org/cgi-bin/mailman/listinfo/devel
devel mailing list devel@uclibc-ng.org https://mailman.uclibc-ng.org/cgi-bin/mailman/listinfo/devel
Hello Waldemar,
Le 21/05/2021 à 10:34, Waldemar Brodkorb a écrit :
Hi Romain,
can you confirm that attached patch works for you? I tested with non-PIE and PIE and both seems to work.
Thanks for your feedback.
Indeed, removing the code enabled by HAVE_ASM_PPC_REL16 allow to boot the system (We could also remove entirely the PPC_HAS_REL16 test from Rules.mak) but I would keep this code instead (based on Glibc code history analysis).
Yann Sionneau seems to have an interesting findings.
Best regards, Romain
best regards Waldemar
Romain Naour wrote,
Hi Waldemar,
Le 19/05/2021 à 00:09, Waldemar Brodkorb a écrit :
Hi Romain, Romain Naour wrote,
Hello,
Recently in Buildroot the option BR2_PIC_PIE has been enabled by default along with other hardening features [1]. Since then some ppc defconfig such qemu_ppc_e500mc_defconfig are failing to boot due to a segfault in init program.
The segfault appear very early in __uClibc_main while starting any binaries, an issue located in crt1.S (powerpc)[2].
After some trial and error, removing HAVE_ASM_PPC_REL16 from CFLAGS [3] allow to generate a working system again. But this is actually wrong since instead we should consider HAVE_ASM_PPC_REL16 always true nowadays.
What if the assembly inside HAVE_ASM_PPC_REL16 isn't pie safe?
Good question.
I guess it should work with pie (see PIEFLAG_NAME:=-fpie) https://cgit.uclibc-ng.org/cgi/cgit/uclibc-ng.git/tree/Rules.mak?h=v1.0.38#n...
I did a try with Glibc without any problem with BR2_PIC_PIE enabled.
Best regards, Romain
Indeed, Glibc removed HAVE_ASM_PPC_REL16 since version 2.22 [4] since "the minimum binutils supports rel16 relocs". Binutils 2.22 supports R_PPC_REL16 as default.
uClibc-ng should remove HAVE_ASM_PPC_REL16 but keep the code as it was defined. But this doesn't fix the initial issue.
Any idea ?
[1] https://git.buildroot.net/buildroot/commit/?id=810ba387bec3c5b6904e8893fb4cb... [2] https://cgit.uclibc-ng.org/cgi/cgit/uclibc-ng.git/tree/libc/sysdeps/linux/po... [3] https://cgit.uclibc-ng.org/cgi/cgit/uclibc-ng.git/tree/Rules.mak?id=2bf4991c... [4] https://sourceware.org/git/?p=glibc.git;a=commit;h=59261ad3eb345e0d7b9f5c73e...
best regards Waldemar