Hi Thomas,
On 04/03/2017 09:57 PM, Thomas Petazzoni wrote:
>> So there's a small downside to this - which I reckon will likely not stand out in the grand scheme of things - but I'm noting it here so people are aware of it. It can potentially degrade some micro-benchmarks such as LMBench lat_proc shell, since there's now an additional shared library (libtirpc) to load. We spotted this back in 2015 when comparing Buildroot builds (using libtirpc) vs. our homegrown builds (using the native toolchain's RPC).
> How big was the performance hit?
Processor, Processes - times in microseconds - smaller is better
------------------------------------------------------------------------------
Host                 OS  Mhz null null      open slct sig  sig  fork exec sh
                             call  I/O stat clos TCP  inst hndl proc proc proc
--------- ------------- ---- ---- ---- ---- ---- ---- ---- ---- ---- ---- ----
A7-720mhz  Linux 3.4.61  720 0.50 0.97 4.37 9.38 26.0 1.00 5.59 1103 5061 8432
A7-720mhz  Linux 3.4.61  720 0.50 0.96 4.24 9.31 26.0 1.00 5.59 1093 5029 10.K
So ~19% slower on sh proc (10.K vs. 8432 µs).
> Is it just due to loading an additional library, or to internal RPC implementation aspects that differ between the uClibc built-in implementation and the libtirpc one?
So the thing is, the BusyBox binary ends up being linked against 2 libs (instead of just libc.so), so ldso takes more time to load it. lat_proc shell does a fork()+exec() of the (BusyBox) shell, which in turn runs a hello-world binary - hence the increased time.
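To make it concrete, here is a rough, untested sketch of the kind of measurement lat_proc shell makes - this is NOT lmbench's actual code, and /bin/true stands in for the hello-world binary lmbench runs - just to show where the per-exec library-loading cost bites:

#include <stdio.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

#define ITERS 100  /* arbitrary; lmbench picks its own iteration count */

int main(void)
{
    struct timespec t0, t1;

    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (int i = 0; i < ITERS; i++) {
        pid_t pid = fork();
        if (pid == 0) {
            /* Child: exec the shell. On a BusyBox system /bin/sh is
             * BusyBox, so every iteration pays the dynamic-loading
             * cost for all of its DT_NEEDED libraries. */
            execl("/bin/sh", "sh", "-c", "/bin/true", (char *)NULL);
            _exit(127);
        }
        waitpid(pid, NULL, 0);
    }
    clock_gettime(CLOCK_MONOTONIC, &t1);

    long us = (t1.tv_sec - t0.tv_sec) * 1000000L +
              (t1.tv_nsec - t0.tv_nsec) / 1000L;
    printf("avg fork+exec sh: %ld us\n", us / ITERS);
    return 0;
}

Running the same binary against a busybox linked with and without the extra library would show the delta directly.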
> If it's just due to loading an additional library, is this micro-benchmark really representative of a real-world workload? Is it really a real-life situation to have very short-lived processes that need RPC support?
Hard to judge really - for a desktop/server system it probably doesn't matter - but for a typical embedded system ...
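FWIW, if someone wants to estimate just the cost of pulling one extra shared library into a process, timing a dlopen() of it gives a rough approximation of what ldso pays at exec time (RTLD_NOW forces eager binding, so treat it as a ballpark, not an exact equivalent). Untested sketch - the libtirpc soname is a guess and depends on the version your build produces:

#include <dlfcn.h>
#include <stdio.h>
#include <time.h>

int main(void)
{
    struct timespec t0, t1;

    clock_gettime(CLOCK_MONOTONIC, &t0);
    /* Soname is an assumption - check what your libtirpc build installs. */
    void *h = dlopen("libtirpc.so.3", RTLD_NOW);
    clock_gettime(CLOCK_MONOTONIC, &t1);

    if (!h) {
        fprintf(stderr, "dlopen failed: %s\n", dlerror());
        return 1;
    }

    long us = (t1.tv_sec - t0.tv_sec) * 1000000L +
              (t1.tv_nsec - t0.tv_nsec) / 1000L;
    printf("dlopen(libtirpc) took %ld us\n", us);

    dlclose(h);
    return 0;
}

Build with something like: gcc -O2 time_dlopen.c -ldl (file name is just a placeholder).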
-Vineet