vmd: add checksum offload for guests
Jan Klemkow <j.klemkow@wemelug.de> writes:

> On Wed, Jan 28, 2026 at 11:00:30AM -0500, Dave Voutila wrote:
>> David Gwynne <david@gwynne.id.au> writes:
>>
>> > On Sat, Jan 17, 2026 at 10:56:36PM +0100, Jan Klemkow wrote:
>> >> On Sat, Jan 17, 2026 at 11:38:50AM -0500, Dave Voutila wrote:
>> >> > Mike Larkin <mlarkin@nested.page> writes:
>> >> > > On Fri, Jan 16, 2026 at 07:38:16PM +0100, Jan Klemkow wrote:
>> >> > >> On Thu, Jan 15, 2026 at 02:08:43PM -0800, Mike Larkin wrote:
>> >> > >> > Does this "just work" no matter what guests I run? That's
>> >> > >> > really all I care about.
>> >> > >>
>> >> > >> Here is my current diff for checksum offloading in vmd(8).
>> >> > >>
>> >> > >> I tested the following combinations of features:
>> >> > >>
>> >> > >> - Debian/Linux and OpenBSD-current guests
>> >> > >> - OpenBSD-current vio(4) without all offloading features
>> >> > >> - Linux, OpenBSD and the host system via veb(4) and vlan(4)
>> >> > >> - IPv4 and IPv6 with tcpbench(1)
>> >> > >> - local interface with locked lladdr
>> >> > >> - local interface with dhcp
>> >> > >>
>> >> > >> Further tests are welcome!
>> >> > >
>> >> > > Not sure about dv@, but I can't really review this. It's
>> >> > > hundreds of lines of changes in vmd's vionet that require a
>> >> > > level of understanding of tap(4) and of virtio/vionet (and the
>> >> > > network stack in general) that I don't have. When I did the
>> >> > > original vionet in vmd years ago it was pretty straightforward,
>> >> > > since the spec (for *all* of virtio) was only like 20 pages; I
>> >> > > was able to write that code in a weekend. Now that we have
>> >> > > bolted on all this other stuff, I don't feel comfortable giving
>> >> > > oks in this area anymore, since there is no way I can look at
>> >> > > this and know whether it's right or not. I think you need a
>> >> > > network stack person to ok this, *and* to explain what the
>> >> > > ramifications are for vmd in general. It looks like vmd is
>> >> > > doing inspection of every packet now? I don't think we want
>> >> > > that.
>> >> >
>> >> > I've spent time digging into this and understand it better now.
>> >> > I'm also happy now that the current diff isn't expanding pledges
>> >> > for vionet.
>> >> >
>> >> > It feels like overkill to have to poke at every packet,
>> >
>> > if vmd is going to provide virtio net to the guest, and you want to
>> > provide the offload features to the guest, then something on the
>> > host side has to implement the virtio net header and fiddle with
>> > the packets this way.
>> >
>> >> It doesn't have to be this way. My first version of this diff was
>> >> without all this packet parsing stuff in vmd(8)[1]. I'll try to
>> >> reproduce the old version by c2k25 to show you the difference.
>> >>
>> >> [1]: https://marc.info/?l=openbsd-tech&m=172381275602917
>> >
>> > that's not the whole story though. if vmd doesn't do it, then the
>> > kernel has to. either way, the work has to be done, but i strongly
>> > believe that it's better to handle the virtio net header in vmd
>> > from a whole-system perspective.
>> >
>> > i get that there's a desire to make vmd as thin and simple as
>> > possible, but that doesn't make sense to me if the kernel has to
>> > bear the cost of increased complexity and reduced flexibility
>> > instead.
>>
>> I'm ok with the design of putting the logic in vmd and agree with
>> keeping that muck out of the kernel.
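For anyone following along, the header dlg means here is the one that
prefixes every packet on a virtio net queue. This is a paraphrase of
the layout from the virtio 1.x spec, not a copy of OpenBSD's or vmd's
actual declaration; multi-byte fields are little-endian on the wire:

#include <stdint.h>

struct virtio_net_hdr {
#define VIRTIO_NET_HDR_F_NEEDS_CSUM	1	/* tx: driver left csum to device */
#define VIRTIO_NET_HDR_F_DATA_VALID	2	/* rx: csum already verified */
	uint8_t		flags;
#define VIRTIO_NET_HDR_GSO_NONE		0
#define VIRTIO_NET_HDR_GSO_TCPV4	1	/* TSO, TCP over IPv4 */
#define VIRTIO_NET_HDR_GSO_UDP		3
#define VIRTIO_NET_HDR_GSO_TCPV6	4	/* TSO, TCP over IPv6 */
	uint8_t		gso_type;
	uint16_t	hdr_len;	/* ether + ip + transport header bytes */
	uint16_t	gso_size;	/* payload bytes per segment for TSO */
	uint16_t	csum_start;	/* where checksumming begins */
	uint16_t	csum_offset;	/* where the result is stored */
	uint16_t	num_buffers;	/* rx, with VIRTIO_NET_F_MRG_RXBUF */
};

DATA_VALID on rx is the flag behind the "attesting that every
packet's checksum is good" point below; NEEDS_CSUM on tx is what
forces the host side to touch the packet at all.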
> In the kernel we already have ether_extract_headers(), which handles
> all the packet parsing edge cases. It's just a question of whether
> we use virtio_net_hdr as an interface or our own tun_hdr.
>
> For me it's not about shuffling complexity from the kernel to
> userland, or the other way around. I'm interested in keeping the
> overall complexity as small as possible. Give me some time to show
> you the other version of the diff.
>
>> It seems like we need the TSO features to get performance gains. If
>> that's the case, I'd rather we look at TSO and consider this part
>> of the implementation. On its own, the checksum offload looks
>> underwhelming.
>>
>> If TSO is going to bring another layer of complexity, it's best
>> that it surface now. :)
>
> The additional complexity for VIRTIO_NET_F_HOST_TSO is low. The
> whole diff is mostly about the basic infrastructure to bring the
> information in virtio_net_hdr into the kernel and back.
>
> I'll try to put VIRTIO_NET_F_HOST_TSO into the diff as well, to show
> you the result.

That would be great! Thanks.

>> >> > but I do manage to see a small improvement in the one test I
>> >> > did using iperf3, sending from host to guest. It's only about a
>> >> > 1-2% gain in throughput on my Intel x1c gen10 and less than 1%
>> >> > on my newer Ryzen AI 350 machine. (This was using a -current
>> >> > snapshot for the guest.)
>> >> >
>> >> > I did this both with the "local interface" (where we already
>> >> > inspect each packet to intercept DHCP packets) and with one
>> >> > added to a veb(4) device with an accompanying host-side
>> >> > vport(4).
>> >> >
>> >> > My hypothesis is that the gain is mostly due to offloading work
>> >> > from the single-vcpu guest to the host vionet tx or rx threads.
>> >> >
>> >> > Is it worth it? Especially knowing we're technically
>> >> > shortcutting the spec as written by attesting that every
>> >> > packet's checksum is good? /shrug
>> >> >
>> >> > Does someone have a better benchmark showing this moves the
>> >> > needle?
>> >>
>> >> It's not worth benchmarking the checksum offloading here. I'm not
>> >> doing this diff for checksum offloading. It is just a dependency
>> >> for LRO and TSO in vmd(8), which are the real performance
>> >> kickers.
>> >>
>> >> I already showed mlarkin@ at h2k24 that TSO 10x'd the network
>> >> performance, with a PoC diff for TSO in vmd(8). It pushed
>> >> guest-to-host network performance from ~1 Gbit/s to ~10 Gbit/s
>> >> back then.
>> >>
>> >> Thanks,
>> >> Jan
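To make dlg's "fiddle with the packets" point concrete: when a guest
sets VIRTIO_NET_HDR_F_NEEDS_CSUM, it hands over a packet whose
checksum field holds only the pseudo-header sum, and the device side
has to finish the job. A minimal sketch of that tx-path work,
assuming a flat packet buffer (names are hypothetical, not vmd's
actual code):

#include <stddef.h>
#include <stdint.h>

/* Fold carries and complement a 16-bit one's complement sum. */
static uint16_t
csum_fold(uint32_t sum)
{
	while (sum > 0xffff)
		sum = (sum & 0xffff) + (sum >> 16);
	return ((uint16_t)~sum);
}

/*
 * Finish a guest-requested checksum: sum the packet from csum_start
 * to the end (the checksum field itself still holds the guest's
 * pseudo-header sum, so it is included) and store the folded result
 * at csum_start + csum_offset, as virtio checksum offload requires.
 */
static int
offload_finish_csum(uint8_t *pkt, size_t len, uint16_t csum_start,
    uint16_t csum_offset)
{
	size_t i, field = (size_t)csum_start + csum_offset;
	uint32_t sum = 0;
	uint16_t csum;

	if (csum_start >= len || field + 2 > len)
		return (-1);	/* header points outside the packet */

	/* 16-bit big-endian one's complement sum over [csum_start, len) */
	for (i = csum_start; i + 1 < len; i += 2)
		sum += (uint32_t)(pkt[i] << 8 | pkt[i + 1]);
	if (i < len)
		sum += (uint32_t)pkt[i] << 8;	/* odd trailing byte */

	csum = csum_fold(sum);
	pkt[field] = csum >> 8;
	pkt[field + 1] = csum & 0xff;
	return (0);
}

TSO then rides on the same header: the guest submits one oversized
TCP packet with gso_type and gso_size set and lets the host do the
segmentation, which is where the ~10x Jan describes comes from.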