Hi Thomas,
On 04/03/2017 09:57 PM, Thomas Petazzoni wrote:
>> So there's a small downside to this - which I reckon will likely not
>> stand out in the grand scheme of things - but I'm tabling it here so
>> people are aware of it. It can potentially degrade some
>> micro-benchmarks such as LMBench lat_proc shell, since there's now an
>> additional shared library (libtirpc) to load. We spotted this back in
>> 2015 when comparing buildroot builds (using libtirpc) against our
>> homegrown builds (using the native toolchain's RPC).
> How big was the performance hit?
Processor, Processes - times in microseconds - smaller is better
------------------------------------------------------------------------------
Host OS Mhz null null open slct sig sig fork exec sh
call I/O stat clos TCP inst hndl proc proc proc
--------- ------------- ---- ---- ---- ---- ---- ---- ---- ---- ---- ---- ----
A7-720mhz Linux 3.4.61 720 0.50 0.97 4.37 9.38 26.0 1.00 5.59 1103 5061 8432
A7-720mhz Linux 3.4.61 720 0.50 0.96 4.24 9.31 26.0 1.00 5.59 1093 5029 10.K
So the "sh proc" column goes from 8432 to ~10K microseconds - roughly a 19% hit.
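For context, lat_proc shell essentially times fork+exec of the shell running a tiny program. A crude approximation of that loop (illustrative only, not a substitute for LMBench) looks like this:

```shell
#!/bin/sh
# Crude sketch of what LMBench lat_proc shell exercises: repeatedly
# fork+exec the shell to run a trivial command. Every iteration pays
# the full ld.so cost of mapping and relocating each DT_NEEDED library,
# so one extra .so (e.g. libtirpc) shows up directly in this loop.
N=50
i=0
while [ "$i" -lt "$N" ]; do
    /bin/sh -c : || exit 1   # exec a fresh shell that does nothing
    i=$((i + 1))
done
echo "ran $N fork+exec rounds"
```

Wrapping the loop in `time` on two otherwise identical rootfs images (libc-only vs. libc + libtirpc) gives a rough feel for the delta.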
> Is it just due to loading an additional library, or to internal RPC
> implementation aspects that differ between the uClibc built-in
> implementation and the libtirpc implementation?
The thing is that the busybox binary now ends up linked against two
libraries (libtirpc plus libc.so) instead of just libc.so, so ld.so
loading takes more time. lat_proc shell does an exec() of the (busybox)
shell, which in turn runs a hello-world program - hence the increased
time.
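To see the extra dependency concretely, one can list the binary's DT_NEEDED entries. The path below is a hypothetical stand-in; point BIN at the busybox from your target rootfs:

```shell
#!/bin/sh
# List the shared libraries a binary declares as DT_NEEDED. With
# uClibc's built-in RPC, busybox needs only libc; with libtirpc an
# extra entry appears, which ld.so must map and relocate on every exec.
BIN=${BIN:-/bin/sh}   # substitute the busybox binary from your target
if command -v readelf >/dev/null 2>&1; then
    needed=$(readelf -d "$BIN" | grep NEEDED)
else
    needed=$(ldd "$BIN" 2>/dev/null)   # fallback, less precise
fi
echo "$needed"
```

On a libtirpc build you'd expect to see a `libtirpc.so.*` entry alongside libc.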
> If it's just due to loading an additional library, is this micro
> benchmark really representative of a real-world workload? Is it really
> a real-life situation to have very short-lived processes that need RPC
> support?
Hard to judge really - for a desktop/server system it probably doesn't matter -
but for a typical embedded system ...
-Vineet
> Best regards,
> Thomas