em(4) TX interrupt mitigation
Hi.
TL;DR: if you use em(4), particularly on a low-power device such as a
pcengines APU2, please try this diff.
The em(4) driver has 5 interrupt mitigation timers[0].
In each direction there's a "Packet Timer" that is reset each time a
packet is processed, and an "Absolute Timer" that is reset each time
an interrupt happens. The Packet Timer lets it wait a little while
for another packet, but the Absolute Timer makes sure it doesn't wait
too long.
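The interaction of the two timers can be sketched in C. This is a simplified model, not the hardware's exact behaviour: it assumes the Absolute Timer starts at the first packet after the previous interrupt, the Packet Timer restarts on every packet, and an interrupt fires when either expires. The function name simulate_holdoff and its parameters are made up for illustration.

```c
#include <stddef.h>

/*
 * Simplified model (an assumption, not a description of the silicon):
 * - the absolute timer starts at the first packet after an interrupt,
 * - the packet timer restarts at every packet,
 * - an interrupt fires when either timer expires.
 *
 * pkt_usec holds packet times in ascending order. The interrupt times are
 * written to irq_usec, which must have room for npkts entries. Returns the
 * number of interrupts generated.
 */
size_t
simulate_holdoff(const double *pkt_usec, size_t npkts,
    double packet_timer, double absolute_timer, double *irq_usec)
{
	size_t nirq = 0, i = 0;

	while (i < npkts) {
		double abs_deadline = pkt_usec[i] + absolute_timer;
		double pkt_deadline = pkt_usec[i] + packet_timer;

		/* Absorb every packet that lands before either timer fires. */
		while (++i < npkts) {
			double fire = pkt_deadline < abs_deadline ?
			    pkt_deadline : abs_deadline;

			if (pkt_usec[i] >= fire)
				break;
			pkt_deadline = pkt_usec[i] + packet_timer;
		}
		irq_usec[nirq++] = pkt_deadline < abs_deadline ?
		    pkt_deadline : abs_deadline;
	}
	return nirq;
}
```

With packets at 0, 10, 20, 30 and 200 usec, a Packet Timer of 20 and an Absolute Timer of 25, this model yields three interrupts: the Absolute Timer cuts the first burst short, and the Packet Timer ends the other two. With a Packet Timer of 0 every packet gets its own interrupt.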
In OpenBSD's em(4), these values are (in units of roughly one microsecond):
Transmit Packet Timer (EM_TIDV) = 64
Transmit Absolute Timer (EM_TADV) = 64
Receive Packet Timer (EM_RDTR) = 0
Receive Absolute Timer (EM_RADV) = 64
You will note that the Receive Packet Timer is set to zero, so it
will generate an interrupt for each packet. This means that the
corresponding Absolute Timer is effectively disabled too. There's a
comment that says "CAUTION: When setting EM_RDTR to a value other than 0,
adapters may hang (stop transmitting) under certain network conditions."
We'll examine that one later.
There's also an "Interrupt Throttle Timer" (ITR), which is set
(DEFAULT_ITR) to only allow a maximum of ~8000 interrupts per second,
which is consistent with what you see in "systat vm 1" on a fully loaded
interface. Since it's at that limit, it would seem that interrupt rates
are a limiting factor. The interrupt handler processes both TX and RX
regardless of the source of the interrupt.
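To put the throttle limit in the same units as the other timers, here is a small C sketch of the arithmetic. The 256 ns register unit is an assumption taken from Intel's description of the ITR register and should be checked against the datasheet for the controller in question. The function names are made up for illustration and are not part of the driver.

```c
/*
 * Minimum spacing between interrupts, in microseconds, for a given cap.
 * max_ints_per_sec must be non-zero.
 */
double
min_irq_interval_usec(unsigned int max_ints_per_sec)
{
	return 1e6 / max_ints_per_sec;
}

/*
 * Throttle interval in register units. ASSUMPTION: one unit is 256 ns,
 * as Intel documents for the ITR register; verify for your controller.
 * max_ints_per_sec must be non-zero.
 */
unsigned int
itr_units(unsigned int max_ints_per_sec)
{
	return 1000000000U / (max_ints_per_sec * 256U);
}
```

For a cap of 8000 interrupts per second this gives a minimum spacing of 125 usec between interrupts, or 488 register units under the 256 ns assumption.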
Looking at the TX interrupt mitigation, the value of 64 seems to have
come from the FreeBSD driver in 2002[1] where TIDV was reduced from
128 to 64 and TADV was added. How many packets can happen in 64 usec?
At 1Gb, a 1500 byte packet plus its overhead takes (1538*8)/1e9 seconds
= 12.3 usec, so about 5. But wait, em(4) supports jumbo packets, which
would take (9254*8)/1e9 = 74 usec! Since this is more than the maximum
holdoff timer, it means we're taking a TX completion interrupt for every
jumbo frame sent. The TX ring holds 256 or 512 packets depending on
NIC model, so we're not making very effective use of it.
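The wire-time arithmetic above can be written as a short C sketch. It assumes the 38 bytes of per-frame overhead implied by the 1538 figure, and a 9216 byte jumbo payload inferred from 9254 - 38. The function names are made up for illustration.

```c
/* Per-frame overhead implied by the text: 1538 - 1500 = 38 bytes. */
#define WIRE_OVERHEAD_BYTES	38

/* Time in microseconds to put one frame on the wire at link_bps. */
double
wire_time_usec(unsigned int payload_bytes, double link_bps)
{
	return (payload_bytes + WIRE_OVERHEAD_BYTES) * 8.0 / link_bps * 1e6;
}

/* Whole frames that finish transmitting within one holdoff period. */
unsigned int
frames_per_holdoff(double holdoff_usec, unsigned int payload_bytes,
    double link_bps)
{
	return (unsigned int)(holdoff_usec /
	    wire_time_usec(payload_bytes, link_bps));
}
```

At 1e9 bits per second, wire_time_usec(1500, 1e9) comes to about 12.3 and wire_time_usec(9216, 1e9) to about 74, so a 64 usec holdoff covers 5 standard frames but not even one jumbo frame.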
What can we increase this to? Well the worst case would seem to be
back-to-back transmission of minimum size (64 byte) packets at 1Gb/s while
also receiving nothing. Each packet takes about 0.8 usec, so if we want
to make sure the interface never runs out of packets to transmit we
need to refill the ring before it's completely empty. 220 should just fit
3 jumbo packets while still leaving a little headroom. Note that actually
sending traffic while receiving absolutely nothing is difficult to achieve
in practice, since there will likely be replies and various other traffic.
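The worst-case bound can be sketched the same way. This assumes one frame per TX descriptor, a full ring at the start, and the ~0.8 usec per minimum-size frame quoted above. The function names are made up for illustration.

```c
/*
 * Time in microseconds for the NIC to drain a full TX ring when every
 * slot holds one back-to-back frame (an assumption: real traffic may
 * use several descriptors per packet).
 */
double
ring_drain_usec(unsigned int ring_slots, double frame_usec)
{
	return ring_slots * frame_usec;
}

/*
 * Nonzero if the holdoff expires before a full ring has drained, i.e.
 * the driver gets a chance to refill it before the NIC goes idle.
 */
int
holdoff_keeps_ring_fed(double holdoff_usec, unsigned int ring_slots,
    double frame_usec)
{
	return holdoff_usec < ring_drain_usec(ring_slots, frame_usec);
}
```

Under these assumptions a 256-entry ring drains in about 205 usec and a 512-entry ring in about 410 usec, so the value of 150 used in the diff below (about 154 usec in 1.024 usec units) stays under both.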
In my testing with iperf on an APU2 with TSO disabled and hw.setperf=0, I see
RX go up ~10% (334Mb/s -> 362Mb/s), TX go up ~25% (600Mb/s -> 750Mb/s),
and CPU usage go down by ~60% (nearly 100% of 1 core down to ~40%).
With hw.setperf=100 the speed doesn't change much, but the CPU goes down
by about the same amount.
Comments and test reports welcome.
[0] https://www.intel.com/content/dam/doc/application-note/gbe-controllers-interrupt-moderation-appl-note.pdf
[1] https://github.com/freebsd/freebsd-src/commit/a58e485d
Index: if_em.h
===================================================================
RCS file: /cvs/src/sys/dev/pci/if_em.h,v
diff -u -p -r1.83 if_em.h
--- if_em.h 16 Feb 2024 22:30:54 -0000 1.83
+++ if_em.h 19 May 2025 08:48:07 -0000
@@ -124,20 +124,25 @@ typedef int boolean_t;
/*
* EM_TIDV - Transmit Interrupt Delay Value
* Valid Range: 0-65535 (0=off)
- * Default Value: 64
+ * Default Value: 150
* This value delays the generation of transmit interrupts in units of
* 1.024 microseconds. Transmit interrupt reduction can improve CPU
* efficiency if properly tuned for specific network traffic. If the
* system is reporting dropped transmits, this value may be set too high
* causing the driver to run out of available transmit descriptors.
*/
-#define EM_TIDV 64
+/* At 1Gb/s a jumbo frame takes ~74 usec, while a minimum size one takes 0.8.
+ * This allows 2 additional jumbo frames per interrupt while still refilling
+ * the ring before it empties in the worst case (transmitting back to back
+ * minimum size frames at 1Gb/s).
+ */
+#define EM_TIDV 150
/*
* EM_TADV - Transmit Absolute Interrupt Delay Value
* (Not valid for 82542/82543/82544)
* Valid Range: 0-65535 (0=off)
- * Default Value: 64
+ * Default Value: 150
* This value, in units of 1.024 microseconds, limits the delay in which a
* transmit interrupt is generated. Useful only if EM_TIDV is non-zero,
* this value ensures that an interrupt is generated after the initial
@@ -145,7 +150,7 @@ typedef int boolean_t;
* along with EM_TIDV, may improve traffic throughput in specific
* network conditions.
*/
-#define EM_TADV 64
+#define EM_TADV 150
/*
* EM_RDTR - Receive Interrupt Delay Timer (Packet Timer)
--
Darren Tucker (dtucker at dtucker.net)
GPG key 11EAA6FA / A86E 3E07 5B19 5880 E860 37F4 9357 ECEF 11EA A6FA
Good judgement comes with experience. Unfortunately, the experience
usually comes from bad judgement.