Performance Overview
********************
IPC, Context-Switch and Syscall Performance
===========================================

Because L4Re is a microkernel-based system and hypervisor, readers are often
interested in its IPC and syscall performance as well as the performance of
context switches. IPC is the base-level communication mechanism that
exchanges a limited amount of payload data between two threads. A context
switch transfers execution from one thread to another, which is exactly what
sending a message does.

The fastest IPC is between two threads running in the same
address space (task) on the same CPU core (`Intra`). `Inter` is IPC between
two address spaces. A syscall is also an IPC, but one that communicates only
with the kernel.

The following table lists the cost of a single IPC on various popular
platforms, averaged over several tens of thousands of calls. For the
measurements, the L4Re microkernel was built in its performance
configuration (``CONFIG_PERFORMANCE=y``), i.e., without assertions.

The source code of the benchmark program can be found `here
`_. The images used for the measurements are
linked in the table below.

All numbers are measured with hardware performance counters. On Arm, the
cycle counter is used. On x86, the fixed-function counters are used.

+-----------------+----------------+--------------------+---------------------+--------------------+---------------------------------------------------------------------------------------------+
| Platform        | Processor      | IPC (in CPU cycles)                      | Syscall            | Image                                                                                       |
|                 |                +--------------------+---------------------+                    |                                                                                             |
|                 |                | Intra              | Inter               |                    |                                                                                             |
+=================+================+====================+=====================+====================+=============================================================================================+
| Raspberry Pi 5  | Arm Cortex-A76 | 247                | 384                 | 138                | `▶️ `__                                                                                     |
| 64bit - EL1     |                |                    |                     |                    |                                                                                             |
+-----------------+----------------+--------------------+---------------------+--------------------+---------------------------------------------------------------------------------------------+
| Raspberry Pi 5  | Arm Cortex-A76 | 300                | 401                 | 202                | `▶️ `__                                                                                     |
| 64bit - EL2     |                |                    |                     |                    |                                                                                             |
+-----------------+----------------+--------------------+---------------------+--------------------+---------------------------------------------------------------------------------------------+
| Raspberry Pi 4  | Arm Cortex-A72 | 505                | 702                 | 305                | `▶️ `__                                                                                     |
| 64bit - EL1     |                |                    |                     |                    |                                                                                             |
+-----------------+----------------+--------------------+---------------------+--------------------+---------------------------------------------------------------------------------------------+
| Raspberry Pi 4  | Arm Cortex-A72 | 1388 [#2]_         | 1600 [#2]_          | 567 [#2]_          | `▶️ `__                                                                                     |
| 64bit - EL2     |                |                    |                     |                    |                                                                                             |
+-----------------+----------------+--------------------+---------------------+--------------------+---------------------------------------------------------------------------------------------+
| NXP S32G2 64bit | Arm Cortex-A53 | 562                | 691                 | 230                | `▶️ `__                                                                                     |
| - EL1           |                |                    |                     |                    |                                                                                             |
+-----------------+----------------+--------------------+---------------------+--------------------+---------------------------------------------------------------------------------------------+
| NXP S32G2 64bit | Arm Cortex-A53 | 661                | 770                 | 228                | `▶️ `__                                                                                     |
| - EL2           |                |                    |                     |                    |                                                                                             |
+-----------------+----------------+--------------------+---------------------+--------------------+---------------------------------------------------------------------------------------------+
| Ampere Altra (32| Arm Neoverse-N1| 257                | 399                 | 142                | `▶️ `__                                                                                     |
| /80 Cores) 64bit|                |                    |                     |                    |                                                                                             |
| - EL1           |                |                    |                     |                    |                                                                                             |
+-----------------+----------------+--------------------+---------------------+--------------------+---------------------------------------------------------------------------------------------+
| Ampere Altra (32| Arm Neoverse-N1| 299                | 440                 | 148                | `▶️ `__                                                                                     |
| /80 Cores) 64bit|                |                    |                     |                    |                                                                                             |
| - EL2           |                |                    |                     |                    |                                                                                             |
+-----------------+----------------+--------------------+---------------------+--------------------+---------------------------------------------------------------------------------------------+
| amd64 / x86_64  | Intel N100     | 173/622/557 [#1]_  | 390/1388/613 [#1]_  | 52/188/147 [#1]_   | `▶️ `__                                                                                     |
+-----------------+----------------+--------------------+---------------------+--------------------+---------------------------------------------------------------------------------------------+
| amd64 / x86_64  | Intel Xeon     | 511/649/543 [#1]_  | 934/1128/587 [#1]_  | 222/160/148 [#1]_  | `▶️ `__                                                                                     |
|                 | Platinum 8352S |                    |                     |                    |                                                                                             |
+-----------------+----------------+--------------------+---------------------+--------------------+---------------------------------------------------------------------------------------------+
| amd64 / x86_64  | Intel Xeon     | 664/663/557 [#1]_  | 1172/1172/613 [#1]_ | 146/146/147 [#1]_  | `▶️ `__                                                                                     |
|                 | Gold 6248R     |                    |                     |                    |                                                                                             |
+-----------------+----------------+--------------------+---------------------+--------------------+---------------------------------------------------------------------------------------------+

.. [#1] Values reflect the PMC's fixed-function counters, in the order
   2 (TSC without halt) / 1 (clocks unhalted) / 0 (instructions retired).
.. [#2] The Cortex-A72 performs considerably slower in EL2 than in EL1,
   and slower than other Arm cores do in EL2. This has been analyzed down
   to the microarchitectural level and cannot be influenced by software.

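
As a reading aid for the table, the ratios behind these numbers can be worked
out with a short script. The sketch below is illustrative only: the dictionary
layout and the function names ``el2_overhead`` and ``instructions_per_clock``
are assumptions made for this example and are not part of the benchmark
program. All values are copied from the table above.

.. code-block:: python

   # (intra, inter, syscall) in CPU cycles, copied from the table above.
   ARM = {
       "Raspberry Pi 5": {"EL1": (247, 384, 138), "EL2": (300, 401, 202)},
       "Raspberry Pi 4": {"EL1": (505, 702, 305), "EL2": (1388, 1600, 567)},
       "NXP S32G2": {"EL1": (562, 691, 230), "EL2": (661, 770, 228)},
       "Ampere Altra": {"EL1": (257, 399, 142), "EL2": (299, 440, 148)},
   }


   def el2_overhead(platform):
       """Return the EL2/EL1 cycle ratios for (intra, inter, syscall)."""
       el1 = ARM[platform]["EL1"]
       el2 = ARM[platform]["EL2"]
       return tuple(round(b / a, 2) for a, b in zip(el1, el2))


   def instructions_per_clock(clocks_unhalted, instructions_retired):
       """x86 only: fixed counter 0 divided by fixed counter 1."""
       return round(instructions_retired / clocks_unhalted, 2)


   for name in ARM:
       print(name, el2_overhead(name))

   # Intel N100, Intra column: 622 unhalted clocks, 557 instructions retired.
   print("N100 intra:", instructions_per_clock(622, 557))

With the table values, the Intra ratio comes out at about 1.21 on the
Raspberry Pi 5 and about 2.75 on the Raspberry Pi 4, which is consistent with
the Cortex-A72 behaviour described in the second footnote.
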

For x86, you can boot the image directly from GRUB2, e.g.
``multiboot2 (http,l4re.org)/download/ipcbench/amd64/l4re_ipcbench-20250602.elf``.

For the Raspberry Pi boards, convert the uimage to a raw image for firmware
boot like this: ``dd if=l4re_ipcbench_rpi5-elX.uimage of=l4re.raw bs=64 skip=1``.

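
The ``dd`` call above drops the first 64-byte block of the file. The same step
can be sketched in Python. This is a minimal sketch under one assumption: that
the image carries a legacy U-Boot header, which is 64 bytes long and starts
with the magic number ``0x27051956``. The function names
``strip_uimage_header`` and ``convert`` are hypothetical helpers for this
example, and the magic check is an addition that ``dd`` does not perform.

.. code-block:: python

   import struct

   UIMAGE_HEADER_SIZE = 64      # matches dd's bs=64 skip=1
   UIMAGE_MAGIC = 0x27051956    # legacy U-Boot image magic (assumed format)


   def strip_uimage_header(data: bytes) -> bytes:
       """Return the payload that follows the 64-byte header."""
       if len(data) < UIMAGE_HEADER_SIZE:
           raise ValueError("file is shorter than a uImage header")
       # The magic number is stored big-endian in the first four bytes.
       (magic,) = struct.unpack(">I", data[:4])
       if magic != UIMAGE_MAGIC:
           raise ValueError("unexpected magic, not a legacy uImage")
       return data[UIMAGE_HEADER_SIZE:]


   def convert(src: str, dst: str) -> None:
       """Read a uimage file and write its payload as a raw image."""
       with open(src, "rb") as f:
           payload = strip_uimage_header(f.read())
       with open(dst, "wb") as f:
           f.write(payload)

Calling ``convert("l4re_ipcbench_rpi5-elX.uimage", "l4re.raw")`` would then
produce the same output file as the ``dd`` command, provided the assumption
about the header format holds.
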
Plots of Parallel Execution of the Benchmark
============================================

Intra-space, core-local IPC and system calls, executed in parallel on
multiple cores:

.. raw:: html
   :file: ../_build/perf/arm_altra80_el1_ipc.html

.. raw:: html
   :file: ../_build/perf/arm_altra80_el2_ipc.html

.. raw:: html
   :file: ../_build/perf/x86_xeon_gold_6248R_2socket_noht.html

.. raw:: html
   :file: ../_build/perf/x86_xeon_gold_6248R_2socket_all.html

.. raw:: html
   :file: ../_build/perf/x86_n100.html