On 10/12/2016 04:27, Waldemar Brodkorb wrote:
hi,
I am visiting ELCE, be back on Monday. Can you try 1.0.17 please? 1.0.18 introduced some interesting regressions I have not covered in my tests. I have solved most of them, but not everything is pushed yet.
looks related to the patch i sent out to lance last week on the list. best regards Waldemar
Sent from my iPhone
On 12.10.2016 at 09:59, Joshua Kinard kumba@gentoo.org wrote:
On 10/12/2016 03:53, Joshua Kinard wrote: Hello,
I think I've run into a rather odd bug on a big-endian MIPS platform while trying to hand-assemble a MIPS-II ISA netboot image built from a uclibc-ng chroot.
[snip]
PS, I forgot to add, this is using uclibc-ng-1.0.18 and busybox-1.24.2.
(Resending to the actual list, sorry Waldemar!)
Unfortunately, my base root for building the netboot is built on 1.0.18 at the moment. It'd take about two days to do a full rebuild.
That said, I think I might have an idea. The bit of code cited in Bug #3919 for old uclibc only defines and uses null_not_ptr in __uClibc_main.c, but it looks like the code in jmp-unwind.c does not. So I am going to try moving the null_not_ptr definition to a header somewhere, mark it non-static (maybe inline?), then try using it on the __pthread_cleanup_upto test and see if that might resolve the issue.
Sound sane?
Hi Joshua, Joshua Kinard wrote,
Unfortunately, my base root for building the netboot is built on 1.0.18 at the moment. It'd take about two days to do a full rebuild.
So you are natively compiling the netboot system? Are you using statically linked binaries? Otherwise you could maybe just replace the shared library and ld.so.
That said, I think I might have an idea. The bit of code cited in Bug #3919 for old uclibc only defines and uses null_not_ptr in __uClibc_main.c, but it looks like the code in jmp-unwind.c does not. So I am going to try moving the null_not_ptr definition to a header somewhere, mark it non-static (maybe inline?), then try using it on the __pthread_cleanup_upto test and see if that might resolve the issue.
Sound sane?
I pushed the other open regression fixes. Maybe you could try with the latest git master. On what hardware could I reproduce the issue? (I have some old SGI MIPS devices in my lab.)
best regards Waldemar
On 10/13/2016 15:23, Waldemar Brodkorb wrote:
Hi Joshua, Joshua Kinard wrote,
Unfortunately, my base root for building the netboot is built on 1.0.18 at the moment. It'd take about two days to do a full rebuild.
So you are natively compiling the netboot system? Are you using statically linked binaries? Otherwise you could maybe just replace the shared library and ld.so.
I am doing native compiles on an SGI Octane (which I currently maintain the patchset for out-of-tree). I was only using static linking with Busybox, which is why ash was producing the flaw. I tried implementing the fix described in old uClibc Bug #3919, but that had no effect and the SIGSEGV is still reproducible.
For now, I've simply switched Busybox to use shared linking to resolve the problem, which should be fine with the netboot, since all of its utilities are built from the same chroot. Just trying to work up a fix for compiling rpcbind now, since a dependent library, libtirpc, requires a nonexistent header "rpcsvc/yp_prot.h", but there's a patch on the OpenWRT ML that might fix this.
Does uclibc-ng have a working Bugzilla yet? Might seem prudent to copy the details of Bug #3919 from old uClibc since it might be the same bug or related.
That said, I think I might have an idea. The bit of code cited in Bug #3919 for old uclibc only defines and uses null_not_ptr in __uClibc_main.c, but it looks like the code in jmp-unwind.c does not. So I am going to try moving the null_not_ptr definition to a header somewhere, mark it non-static (maybe inline?), then try using it on the __pthread_cleanup_upto test and see if that might resolve the issue.
Sound sane?
I pushed the other open regression fixes. Maybe you could try with the latest git master. On what hardware could I reproduce the issue? (I have some old SGI MIPS devices in my lab.)
I am running Gentoo for my builds, so testing master isn't easy for me at the moment, since to be sure of things, I'd have to run a full rebuild and that would take a day or two due to gcc's compile time (~16 hours on a dual 600MHz R14000 CPU).
What kind of SGI gear do you have available and what CPUs are in them? I can vouch that SGI O2 (IP32) with R5K and RM7K CPUs work (not R10K/R12K), SGI Octane (IP30), and marginally, SGI Origin 2000/Onyx2 (IP27) should all at least work with current Linux, although IP27 and IP30 will require an external set of patches I have (and IP27 may lock up at random).
The older SGI Indy and Indigo2 series w/ R4K/R5K CPUs should also still work, but I have not tested those recently due to a bad RTC chip in my Indy. Other Indigo2 variants may or may not work depending on CPU.
Hi Joshua, Joshua Kinard wrote,
On 10/13/2016 15:23, Waldemar Brodkorb wrote:
Hi Joshua, Joshua Kinard wrote,
Unfortunately, my base root for building the netboot is built on 1.0.18 at the moment. It'd take about two days to do a full rebuild.
So you are natively compiling the netboot system? Are you using statically linked binaries? Otherwise you could maybe just replace the shared library and ld.so.
I am doing native compiles on an SGI Octane (which I currently maintain the patchset for out-of-tree). I was only using static linking with Busybox, which is why ash was producing the flaw. I tried implementing the fix described in old uClibc Bug #3919, but that had no effect and the SIGSEGV is still reproducible.
For now, I've simply switched Busybox to use shared linking to resolve the problem, which should be fine with the netboot, since all of its utilities are built from the same chroot. Just trying to work up a fix for compiling rpcbind now, since a dependent library, libtirpc, requires a nonexistent header "rpcsvc/yp_prot.h", but there's a patch on the OpenWRT ML that might fix this.
You might find patches for such issues in Buildroot or OpenADK, too.
Does uclibc-ng have a working Bugzilla yet? Might seem prudent to copy the details of Bug #3919 from old uClibc since it might be the same bug or related.
I want to get rid of Trac soon, but I haven't decided what to do with the bug tracking.
That said, I think I might have an idea. The bit of code cited in Bug #3919 for old uclibc only defines and uses null_not_ptr in __uClibc_main.c, but it looks like the code in jmp-unwind.c does not. So I am going to try moving the null_not_ptr definition to a header somewhere, mark it non-static (maybe inline?), then try using it on the __pthread_cleanup_upto test and see if that might resolve the issue.
Sound sane?
I pushed the other open regression fixes. Maybe you could try with the latest git master. On what hardware could I reproduce the issue? (I have some old SGI MIPS devices in my lab.)
I am running Gentoo for my builds, so testing master isn't easy for me at the moment, since to be sure of things, I'd have to run a full rebuild and that would take a day or two due to gcc's compile time (~16 hours on a dual 600MHz R14000 CPU).
What kind of SGI gear do you have available and what CPUs are in them? I can vouch that SGI O2 (IP32) with R5K and RM7K CPUs work (not R10K/R12K), SGI Octane (IP30), and marginally, SGI Origin 2000/Onyx2 (IP27) should all at least work with current Linux, although IP27 and IP30 will require an external set of patches I have (and IP27 may lock up at random).
The older SGI Indy and Indigo2 series w/ R4K/R5K CPUs should also still work, but I have not tested those recently due to a bad RTC chip in my Indy. Other Indigo2 variants may or may not work depending on CPU.
I have 2xO2 and 2xIndy.
The modern O2:
hinv
System: IP32
Processor: 300 Mhz R5000, with FPU
Primary I-cache size: 32 Kbytes
Primary D-cache size: 32 Kbytes
Secondary cache size: 1024 Kbytes
Memory size: 128 Mbytes
Graphics: CRM, Rev C
Audio: A3 version 1
SCSI Disk: scsi(0)disk(1)
SCSI CDROM: scsi(0)cdrom(4)
The classic O2:
hinv
System: IP32
Processor: 175 Mhz R10000, with FPU
Primary I-cache size: 32 Kbytes
Primary D-cache size: 32 Kbytes
Secondary cache size: 1024 Kbytes
Memory size: 128 Mbytes
Graphics: CRM, Rev C
Audio: A3 version 1
SCSI Disk: scsi(0)disk(2)
SCSI CDROM: scsi(0)cdrom(4)
So I should boot up a system on the modern O2? OpenBSD runs a 64-bit kernel and userland on the O2, and I think I remember they fixed the R10K issues somehow.
What are you running on the Octane? Linux 64 Bit or 32 Bit? (n32,o32,n64 in case of 64Bit)
Best regards Waldemar
On 10/16/2016 17:45, Waldemar Brodkorb wrote: [snip]
I have 2xO2 and 2xIndy.
The modern O2:
hinv
System: IP32
Processor: 300 Mhz R5000, with FPU
Primary I-cache size: 32 Kbytes
Primary D-cache size: 32 Kbytes
Secondary cache size: 1024 Kbytes
Memory size: 128 Mbytes
Graphics: CRM, Rev C
Audio: A3 version 1
SCSI Disk: scsi(0)disk(1)
SCSI CDROM: scsi(0)cdrom(4)
This is the one you'll want to use with Linux. The 300MHz R5000's are actually RM5261's by PMC Sierra (who bought them via their QED acquisition many years ago). Linux refers to them as "Nevada" (CONFIG_CPU_NEVADA). You can add up to 1GB RAM, but when configuring the framebuffer (GBEFB) in the kernel, don't set its RAM higher than 4MB (else an Oops due to a *really* old bug no one's chased down yet).
I'm actually trying to get this netboot working so that I can re-install my O2 w/ a uClibc-ng-based rootfs. It's currently on an n32 glibc rootfs, but with gcc's increasingly-larger compile times and glibc's bloat, that machine, despite its 350MHz RM7000 CPU, just takes too long to do stuff (gcc is about a ~24-hour job). 64-bit PCI-X is also a little off, but 32-bit PCI should work fine as long as the driver isn't braindead and assumes little-endian.
The classic O2:
hinv
System: IP32
Processor: 175 Mhz R10000, with FPU
Primary I-cache size: 32 Kbytes
Primary D-cache size: 32 Kbytes
Secondary cache size: 1024 Kbytes
Memory size: 128 Mbytes
Graphics: CRM, Rev C
Audio: A3 version 1
SCSI Disk: scsi(0)disk(2)
SCSI CDROM: scsi(0)cdrom(4)
So I should boot up a system on the modern O2? OpenBSD runs a 64-bit kernel and userland on the O2, and I think I remember they fixed the R10K issues somehow.
Yeah, OpenBSD solved the R10K issues, I *think*, by padding the affected instructions out with tons of 'cache' instructions, which I think is one of the stated solutions for dealing w/ R10K's speculative execution feature on the non-coherent platforms (I think at the cost of a significant performance hit). I was told once that Linux's memory design and TLB handling is too complicated for a similar approach, but I haven't tried looking into the issue at all lately. R12K CPUs have a hardware bit in their config register called "Delay Speculative Dirty" (DSD) that's supposed to help mitigate the problem, but you apparently still have to add some cache barriers before loads or stores (or both?). I recently picked one of those up, but haven't had a chance to try it out yet.
What are you running on the Octane? Linux 64 Bit or 32 Bit? (n32,o32,n64 in case of 64Bit)
Octanes can only boot a 64-bit kernel (same as their IP27 cousins), due to their firmware only supporting 64-bit ELF format. All of my SGI's run N32/glibc-based userlands, though when I format the O2, it'll be back to O32 until I figure out how good uclibc's N32 support is. I've also got a multilib chroot based on glibc that mixes O32, N32, and N64, but wasn't able to complete a fresh stage build with it due to some really weird glibc bug that popped up during the stage3 build cycle. I have to get glibc-2.24 into play and give it another go at some point.
Best regards Waldemar
_______________________________________________
devel mailing list
devel@uclibc-ng.org
http://mailman.uclibc-ng.org/cgi-bin/mailman/listinfo/devel
Hi Joshua, Joshua Kinard wrote,
On 10/16/2016 17:45, Waldemar Brodkorb wrote: [snip]
I have 2xO2 and 2xIndy.
The modern O2:
hinv
System: IP32
Processor: 300 Mhz R5000, with FPU
Primary I-cache size: 32 Kbytes
Primary D-cache size: 32 Kbytes
Secondary cache size: 1024 Kbytes
Memory size: 128 Mbytes
Graphics: CRM, Rev C
Audio: A3 version 1
SCSI Disk: scsi(0)disk(1)
SCSI CDROM: scsi(0)cdrom(4)
This is the one you'll want to use with Linux. The 300MHz R5000's are actually RM5261's by PMC Sierra (who bought them via their QED acquisition many years ago). Linux refers to them as "Nevada" (CONFIG_CPU_NEVADA). You can add up to 1GB RAM, but when configuring the framebuffer (GBEFB) in the kernel, don't set its RAM higher than 4MB (else an Oops due to a *really* old bug no one's chased down yet).
Thanks for the hints.
I'm actually trying to get this netboot working so that I can re-install my O2 w/ a uClibc-ng-based rootfs. It's currently on an n32 glibc rootfs, but with gcc's increasingly-larger compile times and glibc's bloat, that machine, despite its 350MHz RM7000 CPU, just takes too long to do stuff (gcc is about a ~24-hour job). 64-bit PCI-X is also a little off, but 32-bit PCI should work fine as long as the driver isn't braindead and assumes little-endian.
You don't like cross-compiling, right? uClibc-ng seems to be interesting not only for embedded devices, but also for classic old Unix hardware :)
The classic O2:
hinv
System: IP32
Processor: 175 Mhz R10000, with FPU
Primary I-cache size: 32 Kbytes
Primary D-cache size: 32 Kbytes
Secondary cache size: 1024 Kbytes
Memory size: 128 Mbytes
Graphics: CRM, Rev C
Audio: A3 version 1
SCSI Disk: scsi(0)disk(2)
SCSI CDROM: scsi(0)cdrom(4)
So I should boot up a system on the modern O2? OpenBSD runs a 64-bit kernel and userland on the O2, and I think I remember they fixed the R10K issues somehow.
Yeah, OpenBSD solved the R10K issues, I *think*, by padding the affected instructions out with tons of 'cache' instructions, which I think is one of the stated solutions for dealing w/ R10K's speculative execution feature on the non-coherent platforms (I think at the cost of a significant performance hit). I was told once that Linux's memory design and TLB handling is too complicated for a similar approach, but I haven't tried looking into the issue at all lately. R12K CPUs have a hardware bit in their config register called "Delay Speculative Dirty" (DSD) that's supposed to help mitigate the problem, but you apparently still have to add some cache barriers before loads or stores (or both?). I recently picked one of those up, but haven't had a chance to try it out yet.
Thanks for your very detailed answer.
What are you running on the Octane? Linux 64 Bit or 32 Bit? (n32,o32,n64 in case of 64Bit)
Octanes can only boot a 64-bit kernel (same as their IP27 cousins), due to their firmware only supporting 64-bit ELF format. All of my SGI's run N32/glibc-based userlands, though when I format the O2, it'll be back to O32 until I figure out how good uclibc's N32 support is. I've also got a multilib chroot based on glibc that mixes O32, N32, and N64, but wasn't able to complete a fresh stage build with it due to some really weird glibc bug that popped up during the stage3 build cycle. I have to get glibc-2.24 into play and give it another go at some point.
I tested n32 support on my Lemote Yeeloong, so you should at least be able to boot. A long time ago I had Xorg and Firefox running on that netbook. Firefox only supports n32. The Lemote has a page size of 16k, so it showed some interesting bugs in uClibc/uClibc-ng in the past.
best regards Waldemar
On 10/17/2016 14:54, Waldemar Brodkorb wrote:
Hi Joshua,
[snip]
You don't like cross-compiling, right? uClibc-ng seems to be interesting not only for embedded devices, but also for classic old Unix hardware :)
Oh, I cross-compile all the time. All of my SGI kernels are built with cross-compilers. Userland cross-compiling has just generally been trickier than kernel cross-compiles, so I don't use it as often. Plus, testing on the hardware itself can turn up some rather interesting bugs at times, and in other instances, you sometimes need the hardware to investigate if a particular CPU quirk will come back to haunt you (the R10K's speculative execution one being an all-time classic example).
And yeah, classic hardware still has its place! There's a port of the Linux kernel to the IP35-class of SGI hardware (Origin 300, Fuel, Tezro), that's been started, but hasn't had any recent updates. But it is functional enough to boot a netboot with, supposedly, so that's on my infinitely-long TODO list to try out at some point. These platforms are still quite plentiful on eBay, oddly enough, although parts for IP27 machines are becoming pretty scarce, especially the larger Origin 2000/Onyx2 rack setups.
[snip]
What are you running on the Octane? Linux 64 Bit or 32 Bit? (n32,o32,n64 in case of 64Bit)
Octanes can only boot a 64-bit kernel (same as their IP27 cousins), due to their firmware only supporting 64-bit ELF format. All of my SGI's run N32/glibc-based userlands, though when I format the O2, it'll be back to O32 until I figure out how good uclibc's N32 support is. I've also got a multilib chroot based on glibc that mixes O32, N32, and N64, but wasn't able to complete a fresh stage build with it due to some really weird glibc bug that popped up during the stage3 build cycle. I have to get glibc-2.24 into play and give it another go at some point.
I tested n32 support on my Lemote Yeeloong, so you should at least be able to boot. A long time ago I had Xorg and Firefox running on that netbook. Firefox only supports n32. The Lemote has a page size of 16k, so it showed some interesting bugs in uClibc/uClibc-ng in the past.
I haven't tried running X11 stuff in a long time on these systems. Octane has an X11 driver for its Impact graphics board, but my last attempt to fix it up to compile, plus integrate other fixes for it, resulted in a SIGSEGV that I lacked knowledge on debugging correctly. I mostly stick to command-line right now on these platforms. And the direction that the Linux ecosystem itself is moving towards, e.g., Wayland, may throw GUI support into doubt in the future (have never actually tried Wayland yet). Way too many things with the hardware that still need fixing before one can take a look at the shiny bits.
Have you tried 64K PAGE_SIZE by chance? I use that setting on all of my SGI systems except for the IP27, which has a peculiar Oops crop up under 64K, so that machine boots 16K PAGE_SIZE at the moment. You actually get a nice performance bump on 16K or 64K versus the standard 4K. The testing netboot image I ran on my IP27 w/ 16K showed no ill effects, but I still need to do 64K on the O2 and Octane.
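When comparing 4K, 16K and 64K kernels, it helps to confirm which page size the booted system actually reports to userland. A minimal sketch using the POSIX getconf utility; the function name page_size_kib is made up for this example and is not something from the thread:

```shell
#!/bin/sh
# Print the page size seen by userland, in KiB.
# For the kernels discussed here this would be 4, 16 or 64.
page_size_kib() {
    echo $(( $(getconf PAGESIZE) / 1024 ))
}

page_size_kib
```

Running it once on each netboot image is a quick way to make sure the kernel under test was really built with the intended PAGE_SIZE before chasing libc bugs.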
Hi Joshua, Joshua Kinard wrote,
Have you tried 64K PAGE_SIZE by chance? I use that setting on all of my SGI systems except for the IP27, which has a peculiar Oops crop up under 64K, so that machine boots 16K PAGE_SIZE at the moment. You actually get a nice performance bump on 16K or 64K versus the standard 4K. The testing netboot image I ran on my IP27 w/ 16K showed no ill effects, but I still need to do 64K on the O2 and Octane.
I have not tested 64k, but I hope it will work fine. I will check as soon as I get my O2 netbooting.
It seems I am too stupid to get the machine netbooted. I tried dnsmasq (DHCP and TFTP included) and a dhcpd/atftpd combination. No success so far. I run a small Linux on my SolidRun CuBox-i as the boot server for my other machines, which normally just works fine.
Could you share your bootserver details and the command you are using to boot a system?
For better experience I just want to boot OpenBSD bsd.rd.IP32 file to see that my bootserver works. Afterwards I want to try my cross-compiled kernel.
I have netbooted so many machines, even old classic unix hardware, (with mopd, rarpd, bootparamd, ...) feeling stupid right now.
Any hints?
Waldemar
On 10/19/2016 23:46, Waldemar Brodkorb wrote:
Hi Joshua, Joshua Kinard wrote,
Have you tried 64K PAGE_SIZE by chance? I use that setting on all of my SGI systems except for the IP27, which has a peculiar Oops crop up under 64K, so that machine boots 16K PAGE_SIZE at the moment. You actually get a nice performance bump on 16K or 64K versus the standard 4K. The testing netboot image I ran on my IP27 w/ 16K showed no ill effects, but I still need to do 64K on the O2 and Octane.
I have not tested 64k, but I hope it will work fine. I will check as soon as I get my O2 netbooting.
It seems I am too stupid to get the machine netbooted. I tried dnsmasq (DHCP and TFTP included) and a dhcpd/atftpd combination. No success so far. I run a small Linux on my SolidRun CuBox-i as the boot server for my other machines, which normally just works fine.
Could you share your bootserver details and the command you are using to boot a system?
For better experience I just want to boot OpenBSD bsd.rd.IP32 file to see that my bootserver works. Afterwards I want to try my cross-compiled kernel.
I have netbooted so many machines, even old classic unix hardware, (with mopd, rarpd, bootparamd, ...) feeling stupid right now.
Any hints?
My netbooting setup is standard dhcpd with old-school netkit-tftpd (too lazy to set up the more maintained tftp servers). The bit probably hanging you up is a simple /proc tuning directive needed for most SGI systems, so try executing this line on the netboot server:
echo 1 > /proc/sys/net/ipv4/ip_no_pmtu_disc
That's needed for almost all of the SGI systems to netboot properly. IP22 (Indy/Indigo2) systems need this additional /proc tuning to limit the ephemeral port range:
echo "2048 32767" > /proc/sys/net/ipv4/ip_local_port_range
To actually do the netboot, make sure "netaddr" is unset in the ARCS environment:
unsetenv netaddr
Then try:
bootp(): <kernel args>
E.g., for my Octane, I typically use:
bootp(): console=tty0 root=/dev/md0 consoleblank=0
And my IP27:
bootp(): console=ttyS0,9600 root=/dev/md0
If you have no kernel args to pass, then just use "bootp():"
Just make sure your dhcpd is set up to do BOOTP requests properly and that you can tftp get the kernel image from the server using a tftp client, and it should JustWork(). If you have issues with the kernel itself booting, let me know and I can roll a kernel for you. I haven't tested O2's in a while, so I don't know if there's any surprises waiting in the 4.7 or 4.8 code.
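The two /proc tunings from this message can be wrapped in a small function so they are easy to reapply after a reboot of the boot server. This is only a sketch: the names set_tunable and sgi_netboot_tune, and the PROC_ROOT and DRY_RUN variables, are invented for illustration. By default it only prints what it would do; set DRY_RUN=0 and run it as root to actually write the values.

```shell
#!/bin/sh
# Sketch: netboot-server tunings for SGI clients, as described in this message.
# PROC_ROOT and DRY_RUN are hypothetical knobs added for illustration.
PROC_ROOT="${PROC_ROOT:-/proc/sys/net/ipv4}"

# Write value $2 to tunable $1, or just print the command when DRY_RUN=1 (default).
set_tunable() {
    if [ "${DRY_RUN:-1}" = 1 ]; then
        printf 'echo "%s" > %s/%s\n' "$2" "$PROC_ROOT" "$1"
    else
        printf '%s\n' "$2" > "$PROC_ROOT/$1"
    fi
}

# Pass "ip22" as $1 to add the port-range limit needed by Indy/Indigo2.
sgi_netboot_tune() {
    set_tunable ip_no_pmtu_disc 1
    if [ "${1:-}" = ip22 ]; then
        set_tunable ip_local_port_range "2048 32767"
    fi
}
```

For example, `sgi_netboot_tune ip22` prints both echo lines, while `DRY_RUN=0 sgi_netboot_tune` applies only the path-MTU setting that O2, Octane and IP27 need.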
Hi Joshua, Joshua Kinard wrote,
My netbooting setup is standard dhcpd with old-school netkit-tftpd (too lazy to set up the more maintained tftp servers). The bit probably hanging you up is a simple /proc tuning directive needed for most SGI systems, so try executing this line on the netboot server:
echo 1 > /proc/sys/net/ipv4/ip_no_pmtu_disc
Yeah, you are right, this is critical. I found that out yesterday evening and managed to boot my self-built, cross-compiled Linux kernel via TFTP. Dnsmasq doesn't work well in this case; dhcpd+atftpd worked fine.
E.g., for my Octane, I typically use:
bootp(): console=tty0 root=/dev/md0 consoleblank=0
Why /dev/md0? Is nfsroot not possible?
I try:
printenv
AutoLoad=Yes
diskless=0
dbaud=9600
sgilogo=y
monitor=h
TimeZone=PST8PDT
crt_option=1
console=d1
SystemPartition=bootp():
OSLoader=vmlinux
netaddr=10.0.0.10
volume=20
OSLoadOptions=ip=dhcp console=ttyS0 root=/dev/nfs
ConsoleOut=serial(0)
ConsoleIn=serial(0)
cpufreq=300
eaddr=08:00:69:0e:a2:22
videostatus=illegal_env_var
OSLoadPartition=/dev/nfs
OSLoadFilename=vmlinux
bootp():
Setting $netaddr to 10.0.0.10 (from server )
Obtaining from server 3445180 - ... 0.156448] Serial: 8250/16550 driver, 4 ports, IRQ sharing disabled
[ 0.161755] console [ttyS0] disabled
[ 0.182236] serial8250.0: ttyS0 at MMIO 0x1f390000 (irq = 60, base_baud = 115200) is a 16550A
[ 4.308574] console [ttyS0] enabled
[ 4.372163] serial8250.0: ttyS1 at MMIO 0x1f398000 (irq = 66, base_baud = 115200) is a 16550A
[ 4.478209] eth0: SGI MACE Ethernet rev. 1
[ 4.528602] NET: Registered protocol family 17
[ 4.606099] Sending DHCP requests ., OK
[ 4.659408] IP-Config: Got DHCP answer from 10.0.0.1, my address is 10.0.0.10
[ 4.746258] IP-Config: Complete:
[ 4.785208] device=eth0, hwaddr=08:00:69:0e:a2:22, ipaddr=10.0.0.10, mask=255.255.255.0, gw=10.0.0.1
[ 4.900606] host=10.0.0.10, domain=foo.bar, nis-domain=(none)
[ 4.975158] bootserver=0.0.0.0, rootserver=10.0.0.1, rootpath=/nfsroot/sgi nameserver0=10.0.0.1
[ 5.090765] Starting Linux (built with OpenADK).
[ 100.166745] VFS: Unable to mount root fs via NFS, trying floppy.
[ 100.239877] VFS: Cannot open root device "nfs" or unknown-block(2,0): error -6
[ 100.326955] Please append a correct "root=" boot option; here are the available partitions:
[ 100.427692] Kernel panic - not syncing: VFS: Unable to mount root fs on unknown-block(2,0)
[ 100.527381] ---[ end Kernel panic - not syncing: VFS: Unable to mount root fs on unknown-block(2,0)
Looking into tcpdump output from the bootserver, it looks like no NFS/RPC is even tried.
Any idea?
best regards Waldemar
cat /etc/dhcpd.conf
# basic example configuration
ddns-update-style none;
option domain-name "foo.bar";
option domain-name-servers 10.0.0.1;
option routers 10.0.0.1;
default-lease-time 600;
max-lease-time 7200;
authoritative;

subnet 10.0.0.0 netmask 255.255.255.0 {
    range 10.0.0.15 10.0.0.20;
}

host sgi {
    hardware ethernet 08:00:69:0e:a2:22;
    fixed-address 10.0.0.10;
    filename "vmlinux";
    server-name "10.0.0.1";
    option root-path "10.0.0.1:/nfsroot/sgi";
}
exportfs -av|grep sgi
exporting 10.0.0.10:/nfsroot/sgi
root@fluor:/nfsroot # ps |grep rpc
398 root 0 SW< [rpciod]
960 root 2144 S /usr/bin/rpcbind
2437 root 2060 S grep rpc
root@fluor:/nfsroot # ps |grep mount
1358 root 2520 S /usr/sbin/mountd
2439 root 2060 S grep mount
root@fluor:/nfsroot # ps |grep nfs
721 root 0 SW< [nfsiod]
976 root 0 SW [nfsd]
977 root 0 SW [nfsd]
978 root 0 SW [nfsd]
979 root 0 SW [nfsd]
2441 root 2060 S grep nfs
root@fluor:/nfsroot #
Hi, Joshua Kinard wrote,
On 10/19/2016 23:46, Waldemar Brodkorb wrote:
Hi Joshua, Joshua Kinard wrote,
Any hints?
I switched to initramfs+piggyback. But all I get is:
[ 4.813896] Failed to execute /sbin/init (error -8)
[ 4.873654] Starting init: /sbin/init exists but couldn't execute it (error -8)
[ 4.962713] Starting init: /bin/sh exists but couldn't execute it (error -8)
Tried different C libraries: musl, glibc, uClibc-ng. Which gcc -march= value is safe to use?
Tried mips4/mips3 with o32/n32, no bootup. Tried static and dynamic linking.
Any ideas are appreciated. best regards Waldemar
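For reference, error -8 is ENOEXEC (exec format error): the kernel found the file, but no binary loader accepted it. One thing worth ruling out is a mismatch between the binary's ELF class or byte order and what the kernel was built to run. Whether that is the cause here is an assumption; the thread does not confirm it. A minimal shell sketch that reads the ELF identification bytes directly (elf_ident is a name made up for this example):

```shell
#!/bin/sh
# Print ELF class and byte order of a file by reading the start of e_ident.
# Bytes 0-3: magic 0x7f 'E' 'L' 'F'; byte 4: EI_CLASS; byte 5: EI_DATA.
elf_ident() {
    # od prints the first six bytes as unsigned decimals, e.g. "127 69 76 70 1 2".
    set -- $(od -An -tu1 -N6 "$1")
    if [ "$#" -ne 6 ] || [ "$1 $2 $3 $4" != "127 69 76 70" ]; then
        echo "not ELF"
        return 1
    fi
    case "$5" in
        1) class=ELF32 ;;
        2) class=ELF64 ;;
        *) class=unknown-class ;;
    esac
    case "$6" in
        1) order=little-endian ;;
        2) order=big-endian ;;
        *) order=unknown-order ;;
    esac
    echo "$class $order"
}
```

Run it as `elf_ident sbin/init` inside the unpacked initramfs tree. Since o32 and n32 binaries are both 32-bit ELF, the Flags line of `readelf -h` is the better tool for telling those two apart. If the kernel is 64-bit, it may also be worth checking that CONFIG_MIPS32_O32 or CONFIG_MIPS32_N32 is enabled for the ABI in use.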