From: Mike Larkin
Subject: Re: vmd: add checksum offload for guests
To: Jan Klemkow
Cc: David Gwynne, Dave Voutila, Klemens Nanni, Alexander Bluhm,
    tech@openbsd.org
Date: Thu, 15 Jan 2026 14:08:43 -0800

On Thu, Jan 15, 2026 at 11:04:45PM +0100, Jan Klemkow wrote:
> On Fri, Jan 16, 2026 at 07:35:35AM +1000, David Gwynne wrote:
> > On Thu, Jan 15, 2026 at 12:32:13PM -0800, Mike Larkin wrote:
> > > On Thu, Jan 15, 2026 at 09:00:18PM +0100, Jan Klemkow wrote:
> > > > On Thu, Jan 15, 2026 at 01:46:59PM -0500, Dave Voutila wrote:
> > > > > Jan Klemkow writes:
> > > > >
> > > > > > On Wed, Jan 14, 2026 at 05:27:07PM -0500, Dave Voutila wrote:
> > > > > >> Jan Klemkow writes:
> > > > > >>
> > > > > >> > On Wed, Jan 14, 2026 at 02:21:18PM -0500, Dave Voutila wrote:
> > > > > >> >> Jan Klemkow writes:
> > > > > >> >>
> > > > > >> >> > On Sat, May 24, 2025 at 06:14:38AM +0000, Klemens Nanni wrote:
> > > > > >> >> >> 24.05.2025 06:33, Jan Klemkow writes:
> > > > > >> >> >> Still breaks:
> > > > > >> >> >>
> > > > > >> >> >> May 24 09:12:25 atar vmd[44493]: vionet_tx: bad source address 22:8d:47:b5:88:f6
> > > > > >> >> >> May 24 09:12:56 atar last message repeated 25 time
> > > > > >> >> >>
> > > > > >> >> >> Linux VM is completely offline.
> > > > > >> >> >
> > > > > >> >> > There was a bug in the csum_start and csum_offset calculation, which is fixed in
> > > > > >> >> > the following diff. I tested it successfully with Debian/Linux and OpenBSD
> > > > > >> >> > guests.
> > > > > >> >> >
> > > > > >> >> > This diff introduces optional checksum offloading for VMM guests.
> > > > > >> >> >
> > > > > >> >> > Tests are welcome.
> > > > > >> >> >
> > > > > >> >> > ok?
> > > > > >> >> >
> > > > > >> >>
> > > > > >> >> Questions in line about pledge changes.
> > > > > >> >>
> > > > > >> >> Other question is broader: why the need for memory copying with this
> > > > > >> >> offload feature?
> > > > > >> >
> > > > > >> >> I don't know how this offload works,
> > > > > >> >
> > > > > >> > When packets are just transmitted between host and guests, no one is
> > > > > >> > calculating the checksum at all. Thus, we save two checksum calculations per
> > > > > >> > packet: one on the sender side and another on the receiver side. Only when a
> > > > > >> > packet is transmitted out of the machine via a physical network does the
> > > > > >> > checksum need to be calculated.
> > > > > >> >
> > > > >
> > > > > So in reviewing the virtio spec I'm quite confused where checksumming is
> > > > > occurring with this proposed change.
> > > > >
> > > > > From Virtio 1.2, 5.1.6.4.1 Device Requirements: Processing of Incoming
> > > > > Packets:
> > > > >
> > > > >   If the VIRTIO_NET_F_GUEST_CSUM feature has been negotiated, the device
> > > > >   MAY set the VIRTIO_NET_HDR_F_NEEDS_CSUM bit in flags, if so:
> > > > >
> > > > >   1. the device MUST validate the packet checksum at offset csum_offset
> > > > >      from csum_start as well as all preceding offsets;
> > > > >
> > > > >   2. the device MUST set the packet checksum stored in the receive buffer
> > > > >      to the TCP/UDP pseudo header;
> > > > >
> > > > >   3. the device MUST set csum_start and csum_offset such that
> > > > >      calculating a ones' complement checksum from csum_start up until
> > > > >      the end of the packet and storing the result at offset csum_offset
> > > > >      from csum_start will result in a fully checksummed packet;
> > > > >
> > > > > This reads like vionet needs to be computing checksums in some cases
> > > > > when we set VIRTIO_NET_HDR_F_NEEDS_CSUM...which appears to happen in
> > > > > thdr2vhdr() on the rx side.
> > > > >
> > > > The other sections make the idea clearer:
> > > >
> > > > From Virtio 1.2, 5.1.5 Device Initialization:
> > > >
> > > >   Note: For example, a network packet transported between two guests on
> > > >   the same system might not need checksumming at all, nor segmentation,
> > > >   if both guests are amenable.
> > > >
> > > > This means we just have to keep the information that the packet has an
> > > > incomplete checksum, in case the packet leaves our machine.
> > > >
> > > > Any guest just accepts a packet with VIRTIO_NET_HDR_F_NEEDS_CSUM without
> > > > computing or verifying the checksum. Our host system also accepts this kind
> > > > of packet. We store this information with the M_TCP_CSUM_OUT flag in our mbuf
> > > > pkthdr. So a device like em(4) with checksum offloading computes the checksum
> > > > for us before sending it on the wire. Or our stack computes the checksum
> > > > if the device doesn't have this feature.
> > > >
> > > > thdr2vhdr() and vhdr2thdr() just translate the feature bits between the
> > > > tun_hdr and virtio_net_hdr structs. Nothing is computed.
> > > >
> > > > > (It also blindly sets that flag and assumes the guest/driver actually
> > > > > negotiated VIRTIO_NET_F_GUEST_CSUM, which is not a safe assumption.)
> > > >
> > > > We only activate the tun(4) offloading feature if we successfully negotiated
> > > > the VIRTIO_NET_F_GUEST_CSUM feature in vionet_update_offload().
> > > >
> > > > The guest could set the VIRTIO_NET_HDR_F_NEEDS_CSUM flag regardless of a
> > > > negotiated VIRTIO_NET_F_GUEST_CSUM feature. What should we do in this case?
> > > >
> > > > At the moment, we translate the bits anyway and keep this information.
> > > > We could also just ignore the bit? Or we could drop the packet?
> > > >
> > > > What's your favorite behavior? I'm fine with any of them.
> > > >
> > > >
> > > As I read that I nodded that packets between host and guest might not need the
> > > checksumming.
> > > And you mentioned that you tested OpenBSD and Linux guests.
> > >
> > > But what about two guests talking to each other via a veb or something like that?
> > >
> > > Who does the checksumming then?
> >
> > no one.
>
> Yes, when both guests support checksum offloading.
>
> If the receiving guest has no checksumming capability, the same applies as
> for a physical interface on the veb(4): veb(4) computes the checksum before
> delivering the packet to an interface without this capability.
>
> veb(4) itself provides checksum offloading and computes the checksum if
> tap(4) does not provide the IFCAP_CSUM_TCPv4 feature. And tap(4) only
> provides this feature if it was activated via the TUNSCAP ioctl(2).
>
> > there's an unstated assumption that the hardware (eg, cpu and ram) a
> > hypervisor and the guests run on is ok, and the software is correct.
> >
> > if you take a step back, the idea behind the checksums in packets
> > is to help identify if a packet has been damaged as it travels
> > across a medium like a network cable in an ethernet network. the
> > important information is that the packet is ok, not that checksum
> > is actually calculated. it just happens that you use the actual checksum
> > for packets coming off the wire to make that claim about the packet
> > being ok.
> >
> > if the path for a packet is inside a hypervisor between two guests,
> > does it matter if the checksums are actually calculated if the platform
> > is already trusted? we know the packet is good.
>
> Good summary of the reasoning behind this.
> Thanks David.
>

Does this "just work" no matter what guests I run? That's really all I care about.

-ml

> > > > > >> >> but adding in more memcpy never seems like an improvement for efficiency.
> > > > > >> >
> > > > > >> > If we used the virtio_net_hdr header between tun(4) and vmd(8), as Linux
> > > > > >> > and FreeBSD do, we could remove all the memcpy(3)s and packet parsing in this
> > > > > >> > diff.
> > > > > >> > I would prefer that and also suggested it in the past, but claudio and dlg
> > > > > >> > want to have an OpenBSD-specific tun_hdr for this [1].
> > > > > >> >
> > > > > >> > Thus, we have to swap tun_hdr for virtio_net_hdr and vice versa while
> > > > > >> > forwarding packets between tun(4) and the guest. I'm fine with this solution
> > > > > >> > now. At least I can say it doesn't decrease performance significantly.
> > > > > >> >
> > > > > >> > [1]: https://marc.info/?l=openbsd-tech&m=173076189216230
> > > > > >> >
> > > > > >> >> Does the tun header stuff not have some requirement to not be fragmented
> > > > > >> >> across buffers?
> > > > > >> >
> > > > > >> > The tun_hdr itself is not fragmented, but the other packet headers
> > > > > >> > (ether_header, ip and tcphdr/udphdr) might be fragmented.
> > > > > >> >
> > > > > >> >> > Index: sys/kern/kern_pledge.c
> > > > > >> >> > ===================================================================
> > > > > >> >> > RCS file: /cvs/src/sys/kern/kern_pledge.c,v
> > > > > >> >> > diff -u -p -r1.335 kern_pledge.c
> > > > > >> >> > --- sys/kern/kern_pledge.c 13 Nov 2025 20:59:14 -0000 1.335
> > > > > >> >> > +++ sys/kern/kern_pledge.c 14 Jan 2026 17:25:57 -0000
> > > > > >> >> > @@ -46,6 +46,7 @@
> > > > > >> >> >  #include
> > > > > >> >> >  #include
> > > > > >> >> >  #include
> > > > > >> >> > +#include
> > > > > >> >> >  #include
> > > > > >> >> >  #include
> > > > > >> >> >  #include
> > > > > >> >> > @@ -1337,6 +1338,12 @@ pledge_ioctl(struct proc *p, long com, s
> > > > > >> >> >  		    cdevsw[major(vp->v_rdev)].d_open == vmmopen) {
> > > > > >> >> >  			error = pledge_ioctl_vmm(p, com);
> > > > > >> >> >  			if (error == 0)
> > > > > >> >> > +				return 0;
> > > > > >> >> > +		}
> > > > > >> >> > +		if ((fp->f_type == DTYPE_VNODE) &&
> > > > > >> >> > +		    (vp->v_type == VCHR) &&
> > > > > >> >> > +		    (cdevsw[major(vp->v_rdev)].d_open == tapopen)) {
> > > > > >> >> > +			if (com == TUNSCAP)
> > > > > >> >> >  				return 0;
> > > > > >> >> >  		}
> > > > > >> >>
> > > > > >> >> The diff scope here isn't showing the actual logic change: this adds
> > > > > >> >> capabilities to the "vmm" pledge. Can this be something specific to
> > > > > >> >> tap/tun devices or something related? See below for more commentary.
> > > > > >> >
> > > > > >> > Is the condition "d_open == tapopen" not specific enough to tap/tun?
> > > > > >> >
> > > > > >> > I already discussed this with hshoexer, mlarkin and deraadt [2]. But if you
> > > > > >> > have an idea how to make this more specific and tighter, I'm open to better
> > > > > >> > solutions.
> > > > > >> >
> > > > > >> > [2]: https://marc.info/?l=openbsd-tech&m=174787581327675&w=2
> > > > > >> >
> > > > > >>
> > > > > >> I think the right way to do this is to have an already privileged
> > > > > >> process make the change to the tap device. Expanding the privileges of
> > > > > >> vionet to add all vmm ioctl capabilities for a single tun/tap ioctl is
> > > > > >> overkill.
> > > > > >>
> > > > > >> The "vmd" process already opens the tap device on behalf of the vm, so
> > > > > >> it can do the initial TUNSCAP ioctl with all the bits 0 if that's a
> > > > > >> necessity? (I'm not sure why we need to call this with no bits set, but
> > > > > >> if there's no harm even if the guest doesn't enable the offload, let's
> > > > > >> do it at open(2) time.)
> > > > > >>
> > > > > >> I think if we want to add "can do ioctl's on a tap(4)" to the "vmm"
> > > > > >> pledge, it's best to have vionet ask its parent process (which already
> > > > > >> has "vmm") instead of expanding the capabilities of the vionet process.
> > > > > >>
> > > > > >> I have a design in my head for this. If you can give me 2-3 days I can
> > > > > >> put together how the IPC will work.
> > > > > >>
> > > > > >> It will require keeping a copy of the tap descriptor open in the vm
> > > > > >> process, but tying it to that lifetime as well shouldn't be an issue.
> > > > > >
> > > > > > Here is a version of my diff with the ioctl(2) for TUNSCAP in virtio.c. I use a
> > > > > > new imsg type VIODEV_MSG_TUNSCAP to transfer the bits. Thus, vionet.c does
> > > > > > not need to call ioctl(2) and doesn't need pledge "vmm".
> > > > > >
> > > > > > Is this the way you want to have it?
> > > > >
> > > > > Yes, in general that's along the lines of what I was thinking. In talking
> > > > > to dlg@ it should be fine for this change to be asynchronous as long as the
> > > > > TUNSCAP mode is already enabled and we're just changing feature bits.
> > > > >
> > > > > >
> > > > > > ps: I saw your mail about local interfaces a few minutes ago. I'll check it.
> > > > > >
> > > > > > Thanks,
> > > > > > Jan
> > > > > >
> > > > > > Index: sys/kern/kern_pledge.c
> > > > > > ===================================================================
> > > > > > RCS file: /cvs/src/sys/kern/kern_pledge.c,v
> > > > > > diff -u -p -r1.335 kern_pledge.c
> > > > > > --- sys/kern/kern_pledge.c 13 Nov 2025 20:59:14 -0000 1.335
> > > > > > +++ sys/kern/kern_pledge.c 14 Jan 2026 17:25:57 -0000
> > > > > > @@ -46,6 +46,7 @@
> > > > > >  #include
> > > > > >  #include
> > > > > >  #include
> > > > > > +#include
> > > > > >  #include
> > > > > >  #include
> > > > > >  #include
> > > > > > @@ -1337,6 +1338,12 @@ pledge_ioctl(struct proc *p, long com, s
> > > > > >  		    cdevsw[major(vp->v_rdev)].d_open == vmmopen) {
> > > > > >  			error = pledge_ioctl_vmm(p, com);
> > > > > >  			if (error == 0)
> > > > > > +				return 0;
> > > > > > +		}
> > > > > > +		if ((fp->f_type == DTYPE_VNODE) &&
> > > > > > +		    (vp->v_type == VCHR) &&
> > > > > > +		    (cdevsw[major(vp->v_rdev)].d_open == tapopen)) {
> > > > > > +			if (com == TUNSCAP)
> > > > > >  				return 0;
> > > > > >  		}
> > > > > >  	}
> > > > > > Index: usr.sbin/vmd/vionet.c
> > > > > > ===================================================================
> > > > > > RCS file: /cvs/src/usr.sbin/vmd/vionet.c,v
> > > > > > diff -u -p -r1.29 vionet.c
> > > > > > --- usr.sbin/vmd/vionet.c 14 Jan 2026 03:09:05 -0000 1.29
> > > > > > +++ usr.sbin/vmd/vionet.c 15 Jan 2026 15:55:28 -0000
> > > > > > @@ -22,7 +22,12 @@
> > > > > >  #include
> > > > > >
> > > > > >  #include
> > > > > > +#include
> > > > > >  #include
> > > > > > +#include
> > > > > > +#include
> > > > > > +#include
> > > > > > +#include
> > > > > >  #include
> > > > > >
> > > > > >  #include
> > > > > > @@ -50,6 +55,7 @@
> > > > > >
> > > > > >  #define VIRTIO_NET_CONFIG_MAC	0 /* 8 bit x 6 byte */
> > > > > >
> > > > > > +#define VIRTIO_NET_F_GUEST_CSUM	(1 << 1)
> > > > > >  #define VIRTIO_NET_F_MAC	(1 << 5)
> > > > > >  #define RXQ	0
> > > > > >  #define TXQ	1
> > > > > > @@ -65,7 +71,7 @@ static void *rx_run_loop(void *);
> > > > > >  static void *tx_run_loop(void *);
> > > > > >  static int vionet_rx(struct virtio_dev *, int);
> > > > > >  static ssize_t vionet_rx_copy(struct vionet_dev *, int, const struct iovec *,
> > > > > > -    int, size_t);
> > > > > > +    int, size_t, struct tun_hdr *th);
> > > > > >  static ssize_t vionet_rx_zerocopy(struct vionet_dev *, int,
> > > > > >      const struct iovec *, int);
> > > > > >  static void vionet_rx_event(int, short, void *);
> > > > > > @@ -84,6 +90,10 @@ static void read_pipe_rx(int, short, voi
> > > > > >  static void read_pipe_tx(int, short, void *);
> > > > > >  static void vionet_assert_pic_irq(struct virtio_dev *);
> > > > > >  static void vionet_deassert_pic_irq(struct virtio_dev *);
> > > > > > +static void vhdr2thdr(struct virtio_net_hdr *, struct tun_hdr *,
> > > > > > +    const struct iovec *, int);
> > > > > > +static void thdr2vhdr(struct tun_hdr *, struct virtio_net_hdr *,
> > > > > > +    const struct iovec *, int);
> > > > > >
> > > > > >  /* Device Globals */
> > > > > >  struct event ev_tap;
> > > > > > @@ -300,6 +310,30 @@ fail:
> > > > > >  }
> > > > > >
> > > > > >  /*
> > > > > > + * Update and sync offload features with tap(4).
> > > > > > + */
> > > > > > +static void
> > > > > > +vionet_update_offload(struct virtio_dev *dev)
> > > > > > +{
> > > > > > +	struct viodev_msg msg;
> > > > > > +	int ret;
> > > > > > +
> > > > > > +	memset(&msg, 0, sizeof(msg));
> > > > > > +	msg.irq = dev->irq;
> > > > > > +	msg.type = VIODEV_MSG_TUNSCAP;
> > > > > > +
> > > > > > +	if (dev->driver_feature & VIRTIO_NET_F_GUEST_CSUM) {
> > > > > > +		msg.data |= IFCAP_CSUM_TCPv4 | IFCAP_CSUM_UDPv4;
> > > > > > +		msg.data |= IFCAP_CSUM_TCPv6 | IFCAP_CSUM_UDPv6;
> > > > > > +	}
> > > > > > +
> > > > > > +	ret = imsg_compose_event2(&dev->async_iev, IMSG_DEVOP_MSG, 0, 0, -1,
> > > > > > +	    &msg, sizeof(msg), ev_base_main);
> > > > > > +	if (ret == -1)
> > > > > > +		log_warnx("%s: failed to assert irq %d", __func__, dev->irq);
> > > > > > +}
> > > > > > +
> > > > > > +/*
> > > > > >   * vionet_rx
> > > > > >   *
> > > > > >   * Pull packet from the provided fd and fill the receive-side virtqueue. We
> > > > > > @@ -321,6 +355,7 @@ vionet_rx(struct virtio_dev *dev, int fd
> > > > > >  	struct virtio_net_hdr *hdr = NULL;
> > > > > >  	struct virtio_vq_info *vq_info;
> > > > > >  	struct iovec *iov;
> > > > > > +	struct tun_hdr th;
> > > > > >  	int notify = 0;
> > > > > >  	ssize_t sz;
> > > > > >  	uint8_t status = 0;
> > > > > > @@ -351,8 +386,8 @@ vionet_rx(struct virtio_dev *dev, int fd
> > > > > >  			goto reset;
> > > > > >  		}
> > > > > >
> > > > > > -		iov = &iov_rx[0];
> > > > > > -		iov_cnt = 1;
> > > > > > +		iov = &iov_rx[1];
> > > > > > +		iov_cnt = 2;
> > > > > >
> > > > > >  		/*
> > > > > >  		 * First descriptor should be at least as large as the
> > > > > > @@ -373,7 +408,6 @@ vionet_rx(struct virtio_dev *dev, int fd
> > > > > >  		if (iov->iov_base == NULL)
> > > > > >  			goto reset;
> > > > > >  		hdr = iov->iov_base;
> > > > > > -		memset(hdr, 0, sizeof(struct virtio_net_hdr));
> > > > > >
> > > > > >  		/* Tweak the iovec to account for the virtio_net_hdr.
> > > > > >  		 */
> > > > > >  		iov->iov_len -= sizeof(struct virtio_net_hdr);
> > > > > > @@ -418,22 +452,26 @@ vionet_rx(struct virtio_dev *dev, int fd
> > > > > >  			goto reset;
> > > > > >  		}
> > > > > >
> > > > > > -		hdr->num_buffers = iov_cnt;
> > > > > > -
> > > > > >  		/*
> > > > > >  		 * If we're enforcing hardware address or handling an injected
> > > > > >  		 * packet, we need to use a copy-based approach.
> > > > > >  		 */
> > > > > >  		if (vionet->lockedmac || fd != vionet->data_fd)
> > > > > >  			sz = vionet_rx_copy(vionet, fd, iov_rx, iov_cnt,
> > > > > > -			    chain_len);
> > > > > > -		else
> > > > > > +			    chain_len, &th);
> > > > > > +		else {
> > > > > > +			iov_rx[0].iov_base = &th;
> > > > > > +			iov_rx[0].iov_len = sizeof(th);
> > > > > >  			sz = vionet_rx_zerocopy(vionet, fd, iov_rx, iov_cnt);
> > > > > > +		}
> > > > > >  		if (sz == -1)
> > > > > >  			goto reset;
> > > > > >  		if (sz == 0)	/* No packets, so bail out for now. */
> > > > > >  			break;
> > > > > >
> > > > > > +		thdr2vhdr(&th, hdr, iov_rx + 1, iov_cnt - 1);
> > > > > > +		hdr->num_buffers = iov_cnt - 1;
> > > > > > +
> > > > > >  		/*
> > > > > >  		 * Account for the prefixed header since it wasn't included
> > > > > >  		 * in the copy or zerocopy operations.
> > > > > > @@ -473,9 +511,9 @@ reset:
> > > > > >   */
> > > > > >  ssize_t
> > > > > >  vionet_rx_copy(struct vionet_dev *dev, int fd, const struct iovec *iov,
> > > > > > -    int iov_cnt, size_t chain_len)
> > > > > > +    int iov_cnt, size_t chain_len, struct tun_hdr *th)
> > > > > >  {
> > > > > > -	static uint8_t buf[VIONET_HARD_MTU];
> > > > > > +	static uint8_t buf[sizeof(struct tun_hdr) + VIONET_HARD_MTU];
> > > > > >  	struct packet *pkt = NULL;
> > > > > >  	struct ether_header *eh = NULL;
> > > > > >  	uint8_t *payload = buf;
> > > > > > @@ -483,9 +521,10 @@ vionet_rx_copy(struct vionet_dev *dev, i
> > > > > >  	ssize_t sz;
> > > > > >
> > > > > >  	/* If reading from the tap(4), try to right-size the read.
> > > > > >  	 */
> > > > > > -	if (fd == dev->data_fd)
> > > > > > -		nbytes = MIN(chain_len, VIONET_HARD_MTU);
> > > > > > -	else if (fd == pipe_inject[READ])
> > > > > > +	if (fd == dev->data_fd) {
> > > > > > +		nbytes = sizeof(struct tun_hdr) +
> > > > > > +		    MIN(chain_len, VIONET_HARD_MTU);
> > > > > > +	} else if (fd == pipe_inject[READ])
> > > > > >  		nbytes = sizeof(struct packet);
> > > > > >  	else {
> > > > > >  		log_warnx("%s: invalid fd: %d", __func__, fd);
> > > > > > @@ -504,10 +543,20 @@ vionet_rx_copy(struct vionet_dev *dev, i
> > > > > >  			return (-1);
> > > > > >  		}
> > > > > >  		return (0);
> > > > > > -	} else if (fd == dev->data_fd && sz < VIONET_MIN_TXLEN) {
> > > > > > +	} else if (fd == dev->data_fd) {
> > > > > > +		if ((size_t)sz < sizeof(struct tun_hdr)) {
> > > > > > +			log_warnx("%s: short tun_hdr", __func__);
> > > > > > +			return (0);
> > > > > > +		}
> > > > > > +		memcpy(th, payload, sizeof *th);
> > > > > > +		payload += sizeof(struct tun_hdr);
> > > > > > +		sz -= sizeof(struct tun_hdr);
> > > > > > +
> > > > > >  		/* If reading the tap(4), we should get valid ethernet.
> > > > > >  		 */
> > > > > > -		log_warnx("%s: invalid packet size", __func__);
> > > > > > -		return (0);
> > > > > > +		if (sz < VIONET_MIN_TXLEN) {
> > > > > > +			log_warnx("%s: invalid packet size", __func__);
> > > > > > +			return (0);
> > > > > > +		}
> > > > > >  	} else if (fd == pipe_inject[READ] && sz != sizeof(struct packet)) {
> > > > > >  		log_warnx("%s: invalid injected packet object (sz=%ld)",
> > > > > >  		    __func__, sz);
> > > > > > @@ -585,6 +634,12 @@ vionet_rx_zerocopy(struct vionet_dev *de
> > > > > >  	sz = readv(fd, iov, iov_cnt);
> > > > > >  	if (sz == -1 && errno == EAGAIN)
> > > > > >  		return (0);
> > > > > > +
> > > > > > +	if ((size_t)sz < sizeof(struct tun_hdr))
> > > > > > +		return (0);
> > > > > > +
> > > > > > +	sz -= sizeof(struct tun_hdr);
> > > > > > +
> > > > > >  	return (sz);
> > > > > >  }
> > > > > >
> > > > > > @@ -666,6 +721,8 @@ vionet_tx(struct virtio_dev *dev)
> > > > > >  	struct iovec *iov;
> > > > > >  	struct packet pkt;
> > > > > >  	uint8_t status = 0;
> > > > > > +	struct virtio_net_hdr *vhp;
> > > > > > +	struct tun_hdr th;
> > > > > >
> > > > > >  	status = dev->status & VIRTIO_CONFIG_DEVICE_STATUS_DRIVER_OK;
> > > > > >  	if (status != VIRTIO_CONFIG_DEVICE_STATUS_DRIVER_OK) {
> > > > > > @@ -692,8 +749,10 @@ vionet_tx(struct virtio_dev *dev)
> > > > > >  			goto reset;
> > > > > >  		}
> > > > > >
> > > > > > -		iov = &iov_tx[0];
> > > > > > -		iov_cnt = 0;
> > > > > > +		/* the 0th slot will by used by the tun_hdr */
> > > > > > +
> > > > > > +		iov = &iov_tx[1];
> > > > > > +		iov_cnt = 1;
> > > > > >  		chain_len = 0;
> > > > > >
> > > > > >  		/*
> > > > > > @@ -704,13 +763,16 @@ vionet_tx(struct virtio_dev *dev)
> > > > > >  			log_warnx("%s: invalid descriptor length", __func__);
> > > > > >  			goto reset;
> > > > > >  		}
> > > > > > -		iov->iov_len = desc->len;
> > > > > >
> > > > > > -		if (iov->iov_len > sizeof(struct virtio_net_hdr)) {
> > > > > > -			/* Chop off the virtio header, leaving packet data.
> > > > > > -			 */
> > > > > > -			iov->iov_len -= sizeof(struct virtio_net_hdr);
> > > > > > -			iov->iov_base = hvaddr_mem(desc->addr +
> > > > > > -			    sizeof(struct virtio_net_hdr), iov->iov_len);
> > > > > > +		/* Chop the virtio net header off */
> > > > > > +		vhp = hvaddr_mem(desc->addr, sizeof(*vhp));
> > > > > > +		if (vhp == NULL)
> > > > > > +			goto reset;
> > > > > > +
> > > > > > +		iov->iov_len = desc->len - sizeof(*vhp);
> > > > > > +		if (iov->iov_len > 0) {
> > > > > > +			iov->iov_base = hvaddr_mem(desc->addr + sizeof(*vhp),
> > > > > > +			    iov->iov_len);
> > > > > >  			if (iov->iov_base == NULL)
> > > > > >  				goto reset;
> > > > > >
> > > > > > @@ -758,7 +820,7 @@ vionet_tx(struct virtio_dev *dev)
> > > > > >  		 * descriptor with packet data contains a large enough buffer
> > > > > >  		 * for this inspection.
> > > > > >  		 */
> > > > > > -		iov = &iov_tx[0];
> > > > > > +		iov = &iov_tx[1];
> > > > > >  		if (vionet->lockedmac) {
> > > > > >  			if (iov->iov_len < ETHER_HDR_LEN) {
> > > > > >  				log_warnx("%s: insufficient header data",
> > > > > > @@ -784,6 +846,15 @@ vionet_tx(struct virtio_dev *dev)
> > > > > >  			}
> > > > > >  		}
> > > > > >
> > > > > > +		/*
> > > > > > +		 * if we look at more of vhp we might need to copy
> > > > > > +		 * it so it's aligned properly
> > > > > > +		 */
> > > > > > +		vhdr2thdr(vhp, &th, iov_tx + 1, iov_cnt - 1);
> > > > > > +
> > > > > > +		iov_tx[0].iov_base = &th;
> > > > > > +		iov_tx[0].iov_len = sizeof(th);
> > > > > > +
> > > > > >  		/* Write our packet to the tap(4). */
> > > > > >  		sz = writev(vionet->data_fd, iov_tx, iov_cnt);
> > > > > >  		if (sz == -1 && errno != ENOBUFS) {
> > > > > > @@ -1114,6 +1185,7 @@ vionet_cfg_write(struct virtio_dev *dev,
> > > > > >  		dev->driver_feature &= dev->device_feature;
> > > > > >  		DPRINTF("%s: driver features 0x%llx", __func__,
> > > > > >  		    dev->driver_feature);
> > > > > > +		vionet_update_offload(dev);
> > > > > >  		break;
> > > > > >  	case VIO1_PCI_CONFIG_MSIX_VECTOR:
> > > > > >  		/* Ignore until we support MSIX.
> > > > > >  		 */
> > > > > > @@ -1555,6 +1627,155 @@ vionet_assert_pic_irq(struct virtio_dev
> > > > > >  	    &msg, sizeof(msg), ev_base_main);
> > > > > >  	if (ret == -1)
> > > > > >  		log_warnx("%s: failed to assert irq %d", __func__, dev->irq);
> > > > > > +}
> > > > > > +
> > > > > > +static int
> > > > > > +memcpyv(void *buf, size_t len, size_t off, const struct iovec *iov, int iovcnt)
> > > > > > +{
> > > > > > +	uint8_t *dst = buf;
> > > > > > +	size_t l;
> > > > > > +
> > > > > > +	for (;;) {
> > > > > > +		if (iovcnt == 0)
> > > > > > +			return (-1);
> > > > > > +
> > > > > > +		if (off < iov->iov_len)
> > > > > > +			break;
> > > > > > +
> > > > > > +		off -= iov->iov_len;
> > > > > > +		iov++;
> > > > > > +		iovcnt--;
> > > > > > +	}
> > > > > > +
> > > > > > +	l = off + len;
> > > > > > +	if (l > iov->iov_len)
> > > > > > +		l = iov->iov_len;
> > > > > > +	l -= off;
> > > > > > +
> > > > > > +	memcpy(dst, (const uint8_t *)iov->iov_base + off, l);
> > > > > > +	dst += l;
> > > > > > +	len -= l;
> > > > > > +
> > > > > > +	if (len == 0)
> > > > > > +		return (0);
> > > > > > +
> > > > > > +	for (;;) {
> > > > > > +		if (iovcnt == 0)
> > > > > > +			return (-1);
> > > > > > +
> > > > > > +		l = len;
> > > > > > +		if (l > iov->iov_len)
> > > > > > +			l = iov->iov_len;
> > > > > > +
> > > > > > +		memcpy(dst, (const uint8_t *)iov->iov_base, l);
> > > > > > +		dst += l;
> > > > > > +		len -= l;
> > > > > > +
> > > > > > +		if (len == 0)
> > > > > > +			break;
> > > > > > +
> > > > > > +		iov++;
> > > > > > +		iovcnt--;
> > > > > > +	}
> > > > > > +
> > > > > > +	return (0);
> > > > > > +}
> > > > > > +
> > > > > > +static void
> > > > > > +hdr_extract(const struct iovec *iov, int iovcnt, size_t *off, uint8_t *proto)
> > > > > > +{
> > > > > > +	size_t offs;
> > > > > > +	uint16_t etype;
> > > > > > +
> > > > > > +	if (memcpyv(&etype, sizeof(etype),
> > > > > > +	    offsetof(struct ether_header, ether_type),
> > > > > > +	    iov, iovcnt) == -1)
> > > > > > +		return;
> > > > > > +
> > > > > > +	*off =
> > > > > > +	    sizeof(struct ether_header);
> > > > > > +
> > > > > > +	if (etype == htons(ETHERTYPE_VLAN)) {
> > > > > > +		if (memcpyv(&etype, sizeof(etype),
> > > > > > +		    offsetof(struct ether_vlan_header, evl_proto),
> > > > > > +		    iov, iovcnt) == -1)
> > > > > > +			return;
> > > > > > +
> > > > > > +		*off = sizeof(struct ether_vlan_header);
> > > > > > +	}
> > > > > > +
> > > > > > +	if (etype == htons(ETHERTYPE_IP)) {
> > > > > > +		uint8_t hl;
> > > > > > +
> > > > > > +		/* Get ipproto field from IP header. */
> > > > > > +		offs = *off + offsetof(struct ip, ip_p);
> > > > > > +		if (memcpyv(proto, sizeof(*proto), offs, iov, iovcnt) == -1)
> > > > > > +			return;
> > > > > > +
> > > > > > +		/* Get IP header length field from IP header. */
> > > > > > +		offs = *off;
> > > > > > +		if (memcpyv(&hl, sizeof(hl), offs, iov, iovcnt) == -1)
> > > > > > +			return;
> > > > > > +
> > > > > > +		*off += (hl & 0x0f) << 2;
> > > > > > +	} else if (etype == htons(ETHERTYPE_IPV6)) {
> > > > > > +		/* Get next header field from IP header.
> > > > > > +		 */
> > > > > > +		offs = *off + offsetof(struct ip6_hdr, ip6_nxt);
> > > > > > +		if (memcpyv(proto, sizeof(*proto), offs, iov, iovcnt) == -1)
> > > > > > +			return;
> > > > > > +
> > > > > > +		*off += sizeof(struct ip6_hdr);
> > > > > > +	}
> > > > > > +}
> > > > > > +
> > > > > > +static void
> > > > > > +vhdr2thdr(struct virtio_net_hdr *vh, struct tun_hdr *th,
> > > > > > +    const struct iovec *iov, int iovcnt)
> > > > > > +{
> > > > > > +	memset(th, 0, sizeof(*th));
> > > > > > +
> > > > > > +	if (vh->flags & VIRTIO_NET_HDR_F_NEEDS_CSUM) {
> > > > > > +		size_t off;
> > > > > > +		uint8_t proto;
> > > > > > +
> > > > > > +		hdr_extract(iov, iovcnt, &off, &proto);
> > > > > > +
> > > > > > +		switch (proto) {
> > > > > > +		case IPPROTO_TCP:
> > > > > > +			th->th_flags |= TUN_H_TCP_CSUM;
> > > > > > +			break;
> > > > > > +
> > > > > > +		case IPPROTO_UDP:
> > > > > > +			th->th_flags |= TUN_H_UDP_CSUM;
> > > > > > +			break;
> > > > > > +		}
> > > > > > +	}
> > > > > > +}
> > > > > > +
> > > > > > +static void
> > > > > > +thdr2vhdr(struct tun_hdr *th, struct virtio_net_hdr *vh,
> > > > > > +    const struct iovec *iov, int iovcnt)
> > > > > > +{
> > > > > > +	size_t off;
> > > > > > +	uint8_t proto;
> > > > > > +
> > > > > > +	memset(vh, 0, sizeof(*vh));
> > > > > > +
> > > > > > +	if (th->th_flags & (TUN_H_TCP_CSUM | TUN_H_UDP_CSUM)) {
> > > > > > +		hdr_extract(iov, iovcnt, &off, &proto);
> > > > > > +
> > > > > > +		vh->flags |= VIRTIO_NET_HDR_F_NEEDS_CSUM;
> > > > > > +		vh->csum_start = off;
> > > > > > +
> > > > > > +		switch (proto) {
> > > > > > +		case IPPROTO_TCP:
> > > > > > +			vh->csum_offset = offsetof(struct tcphdr, th_sum);
> > > > > > +			break;
> > > > > > +
> > > > > > +		case IPPROTO_UDP:
> > > > > > +			vh->csum_offset = offsetof(struct udphdr, uh_sum);
> > > > > > +			break;
> > > > > > +		}
> > > > > > +	}
> > > > > >  }
> > > > > >
> > > > > >  /*
> > > > > > Index: usr.sbin/vmd/virtio.c
> > > > > > ===================================================================
> > > > > > RCS file: /cvs/src/usr.sbin/vmd/virtio.c,v
> > > > > > diff -u -p -r1.134 virtio.c
> > > > > > --- usr.sbin/vmd/virtio.c 14 Jan 2026 03:09:05 -0000 1.134
> > > > > > +++ usr.sbin/vmd/virtio.c 15 Jan 2026 15:55:36 -0000
> > > > > > @@ -19,6 +19,7 @@
> > > > > >  #include /* PAGE_SIZE */
> > > > > >  #include
> > > > > >  #include
> > > > > > +#include
> > > > > >
> > > > > >  #include
> > > > > >  #include
> > > > > > @@ -28,6 +29,7 @@
> > > > > >  #include
> > > > > >
> > > > > >  #include
> > > > > > +#include
> > > > > >  #include
> > > > > >  #include
> > > > > >
> > > > > > @@ -64,6 +66,8 @@ SLIST_HEAD(virtio_dev_head, virtio_dev)
> > > > > >
> > > > > >  #define MAXPHYS	(64 * 1024)	/* max raw I/O transfer size */
> > > > > >
> > > > > > +#define VIRTIO_NET_F_CSUM	(1<<0)
> > > > > > +#define VIRTIO_NET_F_GUEST_CSUM	(1<<1)
> > > > > >  #define VIRTIO_NET_F_MAC	(1<<5)
> > > > > >
> > > > > >  #define VMMCI_F_TIMESYNC	(1<<0)
> > > > > > @@ -1020,6 +1024,8 @@ virtio_init(struct vmd_vm *vm, int child
> > > > > >  	/* Virtio 1.x Network Devices */
> > > > > >  	if (vmc->vmc_nnics > 0) {
> > > > > >  		for (i = 0; i < vmc->vmc_nnics; i++) {
> > > > > > +			struct tun_capabilities tcap;
> > > > > > +
> > > > > >  			dev = malloc(sizeof(struct virtio_dev));
> > > > > >  			if (dev == NULL) {
> > > > > >  				log_warn("calloc failure allocating vionet");
> > > > > > @@ -1034,7 +1040,8 @@ virtio_init(struct vmd_vm *vm, int child
> > > > > >  			}
> > > > > >  			virtio_dev_init(vm, dev, id, VIONET_QUEUE_SIZE_DEFAULT,
> > > > > >  			    VIRTIO_NET_QUEUES,
> > > > > > -			    (VIRTIO_NET_F_MAC | VIRTIO_F_VERSION_1));
> > > > > > +			    (VIRTIO_NET_F_MAC | VIRTIO_NET_F_CSUM |
> > > > > > +			    VIRTIO_NET_F_GUEST_CSUM | VIRTIO_F_VERSION_1));
> > > > > >
> > > > > >  			if (pci_add_bar(id, PCI_MAPREG_TYPE_IO, virtio_pci_io,
> > > > > >  			    dev) == -1) {
> > > > > > @@ -1056,6 +1063,14 @@ virtio_init(struct vmd_vm *vm, int child
> > > > > >  			dev->vmm_id = vm->vm_vmmid;
> > > > > >  			dev->vionet.data_fd = child_taps[i];
> > > > > >
> > > > > > +			/*
> > > > > > +			 * IFCAPs are tweaked after feature negotiation with
> > > > > > +			 * the guest later.
> > > > > > +			 */
> > > > > > +			memset(&tcap, 0, sizeof(tcap));
> > > > > > +			if (ioctl(dev->vionet.data_fd, TUNSCAP, &tcap) == -1)
> > > > > > +				fatal("tap(4) TUNSCAP");
> > > > > > +
> > > > > >  			/* MAC address has been assigned by the parent */
> > > > > >  			memcpy(&dev->vionet.mac, &vmc->vmc_macs[i], 6);
> > > > > >  			dev->vionet.lockedmac =
> > > > > > @@ -1532,10 +1547,12 @@ virtio_dev_launch(struct vmd_vm *vm, str
> > > > > >  	}
> > > > > >
> > > > > >  	/* Close data fds. Only the child device needs them now. */
> > > > > > -	if (virtio_dev_closefds(dev) == -1) {
> > > > > > -		log_warnx("%s: failed to close device data fds",
> > > > > > -		    __func__);
> > > > > > -		goto err;
> > > > > > +	if (dev->dev_type != VMD_DEVTYPE_NET) {
> > > > > > +		if (virtio_dev_closefds(dev) == -1) {
> > > > > > +			log_warnx("%s: failed to close device data fds",
> > > > > > +			    __func__);
> > > > > > +			goto err;
> > > > > > +		}
> > > > > >  	}
> > > > > >
> > > > > >  	/* 2. Send over details on the VM (including memory fds).
> > > > > >  	 */
> > > > > > @@ -1758,6 +1775,18 @@ handle_dev_msg(struct viodev_msg *msg, s
> > > > > >  	case VIODEV_MSG_ERROR:
> > > > > >  		log_warnx("%s: device reported error", __func__);
> > > > > >  		break;
> > > > > > +	case VIODEV_MSG_TUNSCAP:
> > > > > > +	{
> > > > > > +		struct tun_capabilities tcap;
> > > > > > +
> > > > > > +		memset(&tcap, 0, sizeof(tcap));
> > > > > > +		tcap.tun_if_capabilities = msg->data;
> > > > > > +
> > > > > > +		if (ioctl(gdev->vionet.data_fd, TUNSCAP, &tcap) == -1)
> > > > > > +			fatal("%s: tap(4) TUNSCAP", __func__);
> > > > > > +
> > > > > > +		break;
> > > > > > +	}
> > > > > >  	case VIODEV_MSG_INVALID:
> > > > > >  	case VIODEV_MSG_IO_READ:
> > > > > >  	case VIODEV_MSG_IO_WRITE:
> > > > > > Index: usr.sbin/vmd/virtio.h
> > > > > > ===================================================================
> > > > > > RCS file: /cvs/src/usr.sbin/vmd/virtio.h,v
> > > > > > diff -u -p -r1.60 virtio.h
> > > > > > --- usr.sbin/vmd/virtio.h 14 Jan 2026 03:09:05 -0000 1.60
> > > > > > +++ usr.sbin/vmd/virtio.h 15 Jan 2026 15:55:48 -0000
> > > > > > @@ -134,6 +134,7 @@ struct viodev_msg {
> > > > > >  #define VIODEV_MSG_IO_WRITE	5
> > > > > >  #define VIODEV_MSG_DUMP	6
> > > > > >  #define VIODEV_MSG_SHUTDOWN	7
> > > > > > +#define VIODEV_MSG_TUNSCAP	8
> > > > > >
> > > > > >  	uint16_t reg;		/* VirtIO register */
> > > > > >  	uint8_t io_sz;		/* IO instruction size */
> > > > > > @@ -309,6 +310,9 @@ struct virtio_net_hdr {
> > > > > >  	uint16_t padding_reserved;
> > > > > >  	*/
> > > > > >  };
> > > > > > +
> > > > > > +#define VIRTIO_NET_HDR_F_NEEDS_CSUM	1 /* flags */
> > > > > > +#define VIRTIO_NET_HDR_F_DATA_VALID	2 /* flags */
> > > > > >
> > > > > >  enum vmmci_cmd {
> > > > > >  	VMMCI_NONE = 0,
> > > > > >
> > > > >
> > > > >
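
For reference, the csum_start/csum_offset rule quoted from Virtio 1.2 in
the thread can be sketched in C roughly as follows. This is an
illustration only and is not part of the diff: csum_fold() and
finish_csum() are hypothetical names, the packet is assumed to sit in one
contiguous buffer of at most 65535 bytes, and the checksum field is
assumed to already hold the pseudo-header sum, as NEEDS_CSUM implies.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Ones' complement sum over buf[0..len), folded to 16 bits and inverted
 * (the usual Internet checksum). Hypothetical helper, not from the diff.
 */
static uint16_t
csum_fold(const uint8_t *buf, size_t len)
{
	uint32_t sum = 0;
	size_t i;

	/* Assumed limit: keeps the 32-bit accumulator from overflowing. */
	assert(len <= 0xffff);

	for (i = 0; i + 1 < len; i += 2)
		sum += ((uint32_t)buf[i] << 8) | buf[i + 1];
	if (i < len)
		sum += (uint32_t)buf[i] << 8;	/* odd byte, zero-padded */
	while (sum >> 16)
		sum = (sum & 0xffff) + (sum >> 16);

	return ((uint16_t)~sum);
}

/*
 * Finish a packet that was flagged "needs checksum": sum from csum_start
 * to the end of the packet and store the result at csum_start +
 * csum_offset. Returns 0 on success, -1 if the offsets do not fit.
 * Hypothetical helper, not from the diff.
 */
static int
finish_csum(uint8_t *pkt, size_t len, size_t csum_start, size_t csum_offset)
{
	uint16_t csum;
	size_t pos;

	if (len > 0xffff || csum_start > len ||
	    csum_offset + 2 > len - csum_start)
		return (-1);

	pos = csum_start + csum_offset;
	csum = csum_fold(pkt + csum_start, len - csum_start);
	pkt[pos] = csum >> 8;
	pkt[pos + 1] = csum & 0xff;

	return (0);
}
```

A caller that has to put such a packet on a wire without hardware offload
would call finish_csum(pkt, len, csum_start, csum_offset) once; summing
the same range again with csum_fold() afterwards yields 0 for a correctly
checksummed packet. Between two guests that both negotiated the offload,
this step is simply skipped, which is the saving discussed in the thread.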