L4Re IPC and Syscall Performance

With L4Re being a microkernel-based system, some of you are interested in the IPC and syscall performance of L4Re. IPC is a base-level communication mechanisms that allows to exchange a limited amount of payload data between two threads. The fastest IPC is between two threads running in the same address space (task) on the same CPU core. A syscall is also IPC but only communicates with the kernel.

The following table provides IPC performance numbers for a single IPC on various popular platforms. To perform the measurement, the L4Re microkernel has been configured in its performance configuration CONFIG_PERFORMANCE=y, i.e., without assertions.

The source code of the benchmark program can be found here. The images used to measure those are linked in the table below.

Numbers are measured with the performance counters. On Arm, the cycle counter is used. On x86, the fixed-function counters are used.

PlatformProcessorIPC SyscallImage
IntraInter
amd64 / x86_64Intel N100 173/622/5431 392/1395/5871 64/190/1481 3
amd64 / x86_64Intel Xeon Platinum 8352S 511/649/5431 934/1128/5871 222/160/1481
Raspberry Pi 5 64bit - EL1Arm Cortex-A76 247 384 138 2
Raspberry Pi 5 64bit - EL2Arm Cortex-A76 300 401 202 2
NXP S32G2 64bit - EL1Arm Cortex-A53 562 691 230
NXP S32G2 64bit - EL2Arm Cortex-A53 661 770 228

1Values reflect the PMC's fixed-function counters 2 (TSC without halt) / 1 (clocks unhalted) / 0 (instructions retired)
2Convert to raw image for rpi5 firmware boot: dd if=l4re_ipcbench_rpi5-elX.uimage of=l4re.raw bs=64 skip=1
3You can boot the image directly in GRUB2: multiboot2 (http,l4re.org)/download/ipcbench/amd64/l4re_ipcbench.elf32