Performance Overview

IPC, Context-Switch and Syscall Performance

Since L4Re is a microkernel-based system and hypervisor, you may be interested in its IPC and syscall performance as well as the performance of context switches. IPC is a basic communication mechanism that allows two threads to exchange a limited amount of payload data. A context switch is the switch from one executing thread to another, which is exactly what sending a message does. The fastest IPC is between two threads running in the same address space (task) on the same CPU core (Intra). Inter denotes IPC between threads in two different address spaces. A syscall is also an IPC, but one that communicates only with the kernel.

The following table lists the cost of a single IPC on various popular platforms, averaged over several tens of thousands of calls. For the measurement, the L4Re microkernel was built in its performance configuration (CONFIG_PERFORMANCE=y), i.e., without assertions.
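To illustrate what "averaged over many calls" means in practice, here is a minimal sketch of such a measurement loop. It is not the L4Re benchmark: it times an arbitrary Python callable with the wall clock (`time.perf_counter_ns`) instead of hardware cycle counters, and the names `average_cost_ns` and `noop` are hypothetical, chosen only for this example.

```python
import time


def average_cost_ns(operation, calls=50_000, warmup=1_000):
    """Return the mean wall-clock cost of one call to `operation`, in ns.

    The result includes the loop overhead, so it is an upper bound on the
    cost of the operation itself.
    """
    # Warm-up runs are executed but not timed.
    for _ in range(warmup):
        operation()

    start = time.perf_counter_ns()
    for _ in range(calls):
        operation()
    elapsed = time.perf_counter_ns() - start

    return elapsed / calls


def noop():
    """Stand-in for the operation under test (e.g. one IPC round trip)."""
    pass


if __name__ == "__main__":
    print(f"{average_cost_ns(noop):.1f} ns per call")
```

Timing the whole loop once and dividing by the number of calls keeps the cost of reading the clock out of each individual sample, which matters when the measured operation takes only a few hundred cycles.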

The source code of the benchmark program can be found here. The images used for the measurements are linked in the table below.

The numbers are measured with the hardware performance counters: on Arm, the cycle counter is used; on x86, the fixed-function counters are used.

| Platform | Processor | IPC Intra (CPU cycles) | IPC Inter (CPU cycles) | Syscall | Image |
|---|---|---|---|---|---|
| Raspberry Pi 5 64bit - EL1 | Arm Cortex-A76 | 247 | 384 | 138 | ▶️ |
| Raspberry Pi 5 64bit - EL2 | Arm Cortex-A76 | 300 | 401 | 202 | ▶️ |
| NXP S32G2 64bit - EL1 | Arm Cortex-A53 | 562 | 691 | 230 | ▶️ |
| NXP S32G2 64bit - EL2 | Arm Cortex-A53 | 661 | 770 | 228 | ▶️ |
| Ampere Altra (32/80 Cores) 64bit - EL1 | Arm Neoverse-N1 | 257 | 399 | 142 | ▶️ |
| Ampere Altra (32/80 Cores) 64bit - EL2 | Arm Neoverse-N1 | 299 | 440 | 148 | ▶️ |
| amd64 / x86_64 | Intel N100 | 173/622/557 [2] | 390/1388/613 [2] | 52/188/147 [2] | ▶️ |
| amd64 / x86_64 | Intel Xeon Platinum 8352S | 511/649/543 [2] | 934/1128/587 [2] | 222/160/148 [2] | ▶️ |
| amd64 / x86_64 | Intel Xeon Gold 6248R | 664/663/557 [2] | 1172/1172/613 [2] | 146/146/147 [2] | ▶️ |
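The EL1 and EL2 rows for the Arm platforms can be compared directly. The following sketch computes the relative difference of the EL2 numbers over the EL1 numbers for each column. The cycle counts are copied from the table above; the helper names (`el2_overhead_percent`, `report`) are hypothetical and exist only for this illustration.

```python
# (Intra, Inter, Syscall) cycle counts copied from the table above.
MEASUREMENTS = {
    "Raspberry Pi 5 (Cortex-A76)": {"EL1": (247, 384, 138), "EL2": (300, 401, 202)},
    "NXP S32G2 (Cortex-A53)": {"EL1": (562, 691, 230), "EL2": (661, 770, 228)},
    "Ampere Altra (Neoverse-N1)": {"EL1": (257, 399, 142), "EL2": (299, 440, 148)},
}
COLUMNS = ("Intra", "Inter", "Syscall")


def el2_overhead_percent(el1, el2):
    """Relative difference of EL2 over EL1 per column, in percent."""
    return [round(100.0 * (b - a) / a, 1) for a, b in zip(el1, el2)]


def report():
    """Build one summary line per platform."""
    lines = []
    for platform, modes in MEASUREMENTS.items():
        overhead = el2_overhead_percent(modes["EL1"], modes["EL2"])
        cells = ", ".join(f"{col} {pct:+.1f}%" for col, pct in zip(COLUMNS, overhead))
        lines.append(f"{platform}: {cells}")
    return lines


if __name__ == "__main__":
    print("\n".join(report()))
```

A negative value means the EL2 number is lower than the EL1 number for that column, as is the case for the syscall column on the NXP S32G2.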

For x86: You can boot the image directly in GRUB2, e.g. multiboot2 (http,l4re.org)/download/ipcbench/amd64/l4re_ipcbench-20250602.elf

For the Raspberry Pi, the uimage can be converted to a raw image for firmware boot like this: dd if=l4re_ipcbench_rpi5-elX.uimage of=l4re.raw bs=64 skip=1
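With bs=64 and skip=1, dd skips exactly one 64-byte block at the start of the input and copies the rest. That size matches the legacy U-Boot image header, which is presumably what is being stripped here. On a system without dd, the same conversion can be sketched in Python; the function names below are hypothetical and the file names are placeholders.

```python
UIMAGE_HEADER_SIZE = 64  # one block of bs=64, skipped via skip=1


def strip_uimage_header(uimage: bytes, header_size: int = UIMAGE_HEADER_SIZE) -> bytes:
    """Return the image contents without the leading header bytes."""
    if len(uimage) < header_size:
        # dd would silently produce an empty file; failing loudly is safer here.
        raise ValueError("input is shorter than the header")
    return uimage[header_size:]


def convert(src_path: str, dst_path: str) -> None:
    """Read a uimage file and write the raw image next to it."""
    with open(src_path, "rb") as src:
        data = src.read()
    with open(dst_path, "wb") as dst:
        dst.write(strip_uimage_header(data))


if __name__ == "__main__":
    import sys

    # Usage: python strip_uimage.py <input.uimage> <output.raw>
    convert(sys.argv[1], sys.argv[2])
```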

Plots of Parallel Execution of the Benchmark

Intra-address-space, core-local IPC and system calls, executed in parallel on multiple cores:

Processor: Ampere Altra / Arm Neoverse N1 80-core (EL1), Cores: 80
Processor: Ampere Altra / Arm Neoverse N1 80-core (EL2), Cores: 80
Processor: Xeon Gold 6248R (Without SMT/HT, efficient mode), Cores: 48
Processor: Xeon Gold 6248R (With SMT/HT, efficient mode), Cores: 96
Processor: Intel(R) N100 at 806MHz, Cores: 4