
From:
Andrew Lemin <andrew.lemin@gmail.com>
Subject:
PF Queue bandwidth now 64bit for >4Gbps queues
To:
tech@openbsd.org
Date:
Wed, 18 Mar 2026 00:57:47 +1100


Hi tech@,

The HFSC queue scheduler stores bandwidth values in struct hfsc_sc
using u_int for the m1 and m2 fields (bits/sec).  This caps the
maximum representable PF queue bandwidth at UINT_MAX ~= 4.29 Gbps,
which is insufficient for 10G+ interfaces.

Most of the code is already 64-bit: node_queue_bw.bw_absolute
(u_int64_t), pf_queue_bwspec.absolute (uint64_t), and the internal
HFSC scaled values (u_int64_t).  The bottleneck is hfsc_sc and
the three conversion functions that interface with it.

This patch widens hfsc_sc.m1 and m2 from u_int to u_int64_t, updates
the m2sm()/m2ism() parameter types and the sm2m() return type to match,
and removes the now-unneeded truncating casts.  Bandwidth can now be
configured up to ~1 Tbps (m2sm() overflows once m reaches 2^40).

Also fixes a pre-existing truncation bug in pftop.c where the rate
variable was u_int but assigned from the u_int64_t linkshare value,
causing wrong bandwidth display for >4Gbps queues.

struct hfsc_sc grows from 12 to 24 bytes, with a 4-byte padding hole
between d and m2 due to alignment (should we reorder?).
This changes struct hfsc_class_stats, which is copied out via
DIOCGETQSTATS, so this patch is an ABI change.

Note: struct hfsc_opts in pfvar.h still has u_int bandwidth fields
but is not in the active queue data path (the current ioctl pathway
uses pf_queuespec -> pf_queue_scspec -> pf_queue_bwspec, all uint64_t).
hfsc_opts appears to be dead code; should we remove it?

Summary of changes:

  - sys/net/hfsc.h: widen hfsc_sc m1/m2 to u_int64_t
  - sys/net/hfsc.c: widen m2sm/m2ism params, sm2m return type,
    update forward declarations and remove truncating casts
  - usr.bin/systat/pftop.c: fix rate/rtmp truncation, fix format
    string %u -> %llu
  - share/man/man5/pf.conf.5: document >4G bandwidth support
  - regress/sbin/pfctl: add test 115 for 10G/8G bandwidth parsing

Example (this works now):

# pf.conf
    queue rootq on vio0 bandwidth 10G
    queue defq parent rootq bandwidth 8G default

# systat queue
    QUEUE                          BW/FL SCH      PKTS    BYTES   DROP_P DROP_B QLEN BORROW SUSPEN     P/S     B/S
    rootq on vio0                    10G fifo        0        0        0      0    0                     0       0
     defq                          8000M fifo     1866   194229        0      0    0                   0.8      78

# pfctl -vsq
    queue rootq on vio0 bandwidth 10G
      [ pkts:          0  bytes:          0  dropped pkts:      0 bytes:      0 ]
      [ qlength:   0/ 50 ]
    queue defq parent rootq bandwidth 8G default
      [ pkts:       1449  bytes:     150799  dropped pkts:      0 bytes:      0 ]
      [ qlength:   0/ 50 ]

Tested on arm64 with the pfctl regression suite (pf115, selfpf all pass)
and manual testing with 10G NIC.

Andy Lemin

---

Possible Blog Notes (for undeadly.org) :)

# PF queues break the 4 Gbps barrier

OpenBSD's PF packet filter has long supported HFSC traffic shaping
with the `queue` rules in `pf.conf(5)`.  However, an internal 32-bit
limitation in the HFSC service curve structure (`struct hfsc_sc`)
meant that bandwidth values were silently capped at approximately
4.29 Gbps — the maximum value of a `u_int`.

With 10G, 25G, and 100G network interfaces now commonplace, OpenBSD
developers making huge progress unlocking the kernel for SMP, and
drivers landing for cards that run at some of these speeds, this
limitation started to get in the way.  Configuring `bandwidth 10G` on a
queue would silently wrap around, producing incorrect and unpredictable
scheduling behaviour.

A new patch widens the bandwidth fields in the kernel's HFSC
scheduler from 32-bit to 64-bit integers, removing this bottleneck
entirely. The diff also fixes a pre-existing display bug in `pftop(1)`
where bandwidth values above 4 Gbps would be shown incorrectly.

For end users, the practical impact is: PF queue bandwidth
configuration now works correctly for modern high-speed interfaces.
The familiar syntax just does what you'd expect:

```
queue rootq on em0 bandwidth 10G
queue defq parent rootq bandwidth 8G default
```

Values up to 999G are supported, more than enough for the interfaces
of today and the foreseeable future.  Existing configurations using
values below 4G continue to work - no changes are needed.

As always, testing of `-current` snapshots and
[donations](https://www.openbsdfoundation.org/donations.html) to
the OpenBSD Foundation are encouraged.


===================================================================
```
diff --git regress/sbin/pfctl/Makefile regress/sbin/pfctl/Makefile
index 5dd2c948..48b14a2b 100644
--- regress/sbin/pfctl/Makefile
+++ regress/sbin/pfctl/Makefile
@@ -18,7 +18,7 @@ PFTESTS=1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27
 PFTESTS+=28 29 30 31 32 34 35 36 38 39 40 41 44 46 47 48 49 50
 PFTESTS+=52 53 54 55 56 57 60 61 65 66 67 68 69 70 71 72 73
 PFTESTS+=74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96
-PFTESTS+=97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 114
+PFTESTS+=97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 114 115
 PFFAIL=1 2 3 4 5 6 7 8 11 12 13 14 15 16 17 19 20 23 25 27
 PFFAIL+=30 37 38 39 40 41 42 43 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62
 PFFAIL+=63 64 65 66 67
diff --git regress/sbin/pfctl/pf115.in regress/sbin/pfctl/pf115.in
new file mode 100644
index 00000000..886541e1
--- /dev/null
+++ regress/sbin/pfctl/pf115.in
@@ -0,0 +1,2 @@
+queue rootq on lo1000000 bandwidth 10G
+queue defq parent rootq bandwidth 8G default
diff --git regress/sbin/pfctl/pf115.ok regress/sbin/pfctl/pf115.ok
new file mode 100644
index 00000000..886541e1
--- /dev/null
+++ regress/sbin/pfctl/pf115.ok
@@ -0,0 +1,2 @@
+queue rootq on lo1000000 bandwidth 10G
+queue defq parent rootq bandwidth 8G default
diff --git share/man/man5/pf.conf.5 share/man/man5/pf.conf.5
index 3a383b23..f49fe959 100644
--- share/man/man5/pf.conf.5
+++ share/man/man5/pf.conf.5
@@ -1645,8 +1645,16 @@ values are specified as bits per second or using the suffixes
 and
 .Cm G
 to represent kilobits, megabits, and gigabits per second, respectively.
+Values up to 999G are supported, allowing configuration of PF queues on
+10G, 25G, 40G, and 100G interfaces.
 The value must not exceed the interface bandwidth.
 .Pp
+For example, a 10 Gigabit interface:
+.Bd -literal -offset indent
+queue rootq on em0 bandwidth 10G
+queue defq parent rootq bandwidth 8G default
+.Ed
+.Pp
 If multiple connections are assigned the same queue, they're not guaranteed
 to share the queue bandwidth fairly.
 An alternative flow queue manager can be used to achieve fair sharing by
diff --git sys/net/hfsc.c sys/net/hfsc.c
index 37e9b6bf..93b89104 100644
--- sys/net/hfsc.c
+++ sys/net/hfsc.c
@@ -229,10 +229,10 @@ struct hfsc_class *hfsc_actlist_firstfit(struct hfsc_class *,

 static __inline u_int64_t seg_x2y(u_int64_t, u_int64_t);
 static __inline u_int64_t seg_y2x(u_int64_t, u_int64_t);
-static __inline u_int64_t m2sm(u_int);
-static __inline u_int64_t m2ism(u_int);
+static __inline u_int64_t m2sm(u_int64_t);
+static __inline u_int64_t m2ism(u_int64_t);
 static __inline u_int64_t d2dx(u_int);
-static __inline u_int sm2m(u_int64_t);
+static __inline u_int64_t sm2m(u_int64_t);
 static __inline u_int dx2d(u_int64_t);

 void hfsc_sc2isc(struct hfsc_sc *, struct hfsc_internal_sc *);
@@ -1451,16 +1451,16 @@ seg_y2x(u_int64_t y, u_int64_t ism)
 }

 static __inline u_int64_t
-m2sm(u_int m)
+m2sm(u_int64_t m)
 {
  u_int64_t sm;

- sm = ((u_int64_t)m << SM_SHIFT) / 8 / HFSC_FREQ;
+ sm = (m << SM_SHIFT) / 8 / HFSC_FREQ;
  return (sm);
 }

 static __inline u_int64_t
-m2ism(u_int m)
+m2ism(u_int64_t m)
 {
  u_int64_t ism;

@@ -1480,13 +1480,13 @@ d2dx(u_int d)
  return (dx);
 }

-static __inline u_int
+static __inline u_int64_t
 sm2m(u_int64_t sm)
 {
  u_int64_t m;

  m = (sm * 8 * HFSC_FREQ) >> SM_SHIFT;
- return ((u_int)m);
+ return (m);
 }

 static __inline u_int
diff --git sys/net/hfsc.h sys/net/hfsc.h
index c8061dd3..ac965ff0 100644
--- sys/net/hfsc.h
+++ sys/net/hfsc.h
@@ -45,10 +45,11 @@ struct hfsc_pktcntr {
  do { (cntr)->packets++; (cntr)->bytes += len; } while (0)

 struct hfsc_sc {
- u_int m1; /* slope of the first segment in bits/sec */
- u_int d; /* the x-projection of the first segment in msec */
- u_int m2; /* slope of the second segment in bits/sec */
+ u_int64_t m1; /* slope of the first segment in bits/sec */
+ u_int d; /* the x-projection of the first segment in msec */
+ u_int64_t m2; /* slope of the second segment in bits/sec */
 };
+/* Note: 4-byte hole in the above hfsc_sc */

 /* special class handles */
 #define HFSC_ROOT_CLASS 0x10000
diff --git usr.bin/systat/pftop.c usr.bin/systat/pftop.c
index 8668bd28..a036f5f5 100644
--- usr.bin/systat/pftop.c
+++ usr.bin/systat/pftop.c
@@ -1611,7 +1611,7 @@ calc_pps(u_int64_t new_pkts, u_int64_t last_pkts, double interval)
 void
 print_queue_node(struct pfctl_queue_node *node)
 {
- u_int rate, rtmp;
+ u_int64_t rate, rtmp;
  int i;
  double interval, pps, bps;
  static const char unit[] = " KMG";
@@ -1641,7 +1641,7 @@ print_queue_node(struct pfctl_queue_node *node)
  */
  tbprintf("%u", node->qstats.data.period);
  } else
- tbprintf("%u%c", rate, unit[i]);
+ tbprintf("%llu%c", (unsigned long long)rate, unit[i]);
  print_fld_tb(FLD_BANDW);

  print_fld_str(FLD_SCHED, node->qs.flags & PFQS_FLOWQUEUE ?
```

(also attached as it is impossible to get gmail to not strip leading spaces)