Index | Thread | Search

From:
Stefan Fritsch <sf@openbsd.org>
Subject:
Re: vmd(8): TCP/UDP checksum offloading for guests
To:
Claudio Jeker <claudio@openbsd.org>
Cc:
Jan Klemkow <j.klemkow@wemelug.de>, tech@openbsd.org, David Gwynne <david@gwynne.id.au>
Date:
Tue, 5 Nov 2024 15:50:26 +0100

Download raw body.

Thread
On Tue, 5 Nov 2024, Claudio Jeker wrote:

> On Mon, Nov 04, 2024 at 05:25:17PM +0100, Jan Klemkow wrote:
> > On Fri, Aug 23, 2024 at 02:20:37PM GMT, Claudio Jeker wrote:
> > > On Fri, Aug 23, 2024 at 12:29:10PM +0200, Jan Klemkow wrote:
> > > > On Fri, Aug 23, 2024 at 09:48:58AM +0200, Claudio Jeker wrote:
> > > > > On Fri, Aug 23, 2024 at 11:05:43AM +1000, David Gwynne wrote:
> > > > > > On Fri, Aug 16, 2024 at 02:51:06PM +0200, Jan Klemkow wrote:
> > > > > > > This diff introduces the transmit side of TCP/UDP checksum offloading
> > > > > > > for vmd(8) guests.  It uses a similar implementation as FreeBSD and
> > > > > > > Linux.  vmd(8) tell tap(4) via a new ioctl to expect virtio net header
> > > > > > > structs.  These are used to transport offloading feature bits between
> > > > > > > guest and tap(4) device.  The tap(4) translates the feature bits between
> > > > > > > virtio net header and our mbufs.
> > > > > > > 
> > > > > > > This diff just contains the sending part, to keep it short.
> > > > > > > 
> > > > > > > When both parts are implemented, we never calculate a checksum of
> > > > > > > packets between our guests or the host system.  As its described in the
> > > > > > > virtio-network specification:
> > > > > > > 
> > > > > > > 	A network packet transported between two guests on the same
> > > > > > > 	system might not need checksumming at all, nor segmentation,
> > > > > > > 	if both guests are amenable.
> > > > > > > 
> > > > > > > This behavior is similar to our loopback traffic, as we also don't
> > > > > > > calculate checksums of packets between two processes here.
> > > > > > > 
> > > > > > > Tests and any kind of feedback is welcome!
> > > > > > 
> > > > > > i love the idea, but i really dislike how tun and virtio get tied
> > > > > > together by this diff.
> > > > > > 
> > > > > > ideally tun and only tun would get extended to be able to communicate
> > > > > > this info between userland and the kernel in a way that makes sense for
> > > > > > our stack. separately, vmd would be responsible for mapping the virtio
> > > > > > semantics to our tun driver semantics.
> > > > > > 
> > > > > > i understand that tun/tap is kind of a psuedo standard interface
> > > > > > between a bunch of different systems, and we're catching up to some of
> > > > > > those systems in this area, but making struct virtio_net_hdr part
> > > > > > of the tun api seems a bit much. virtio can tweak this struct when
> > > > > > it makes sense for a guest talking to a hv via a memory mapped ring
> > > > > > and visa versa, but automatically inheriting the change for userland
> > > > > > talking to the kernel on an fd is annoying.
> > > > > 
> > > > > I'm very much on the same page with that. Linux did a disservice to
> > > > > everyone (once again). I would prefer if tun/tap had their own extension.
> > > > > The virtio header is a pain especially since they altered the format and
> > > > > you now have to handle two different forms.
> > > > > So this dependency of virtio and vio structs in tun/tap is just terrible.
> > > > 
> > > > Linux uses this interface for KVM and FreeBSD also uses it for bhyve.
> > > > Why don't we use it for VMM, too?
> > > 
> > > Because is it sub-standard and a layer violation of major proportions.
> > > We are not just followers adding bad code to our kernel. I think OpenBSD
> > > was always about doing the right thing even if is hard.
> > >  
> > > > At least there is an specification [1] for this interface.  Which is
> > > > quiet stable with backward compatibility.  Thus, it don't seems to break
> > > > in the future.
> > > 
> > > Until they break it again with more backwards compat and whatever. Why tie
> > > tun/tap together with virtio? tun and tap are used for other things as
> > > well. Some time ago I looked at what linux did for tun / tap and walked
> > > away exactly because of this. Their solution is a hack on a hack with a
> > > hack and a layer violation on top.
> > >  
> > > > I get your point, that you don't want to deal with two different
> > > > versions of the virtio net header structure in our kernel.  To get rid
> > > > of the legacy format we have to upgrade vmd(8) from virtio 0.9 to 1.0.
> > > > 
> > > > [1]: https://docs.oasis-open.org/virtio/virtio/v1.2/virtio-v1.2.html
> > > > 
> > > > What would an OpenBSD only extension to the tap(4) interface looks like?
> > > > Do we define our own structure tap_net_hdr?  And vmd(8) has to translate
> > > > between virtio_net_hdr and tap_net_hdr?
> > > 
> > > Yes. Exactly. We could create one sensible header that would allow for all
> > > the things we care. Not only TX/RX csum offload, e.g. multi packet send
> > > and recv and multiqueue support.
> > > 
> > > In linux land virtio and tap is now so interlinked that it is hard to
> > > improve either of them.
> > >  
> > > > vmd(8) don't need the information in virtio_net_hdr.  tap(4) does.
> > > > tap(4) is the only one here, who can transfer information between mbuf
> > > > and virtio_net_hdr.  Like, our drivers in dev/if_*.c translate between
> > > > their receive descriptors and our mbuf structure.
> > > 
> > > No, our drivers do not create an ethernet extension header that is sent to
> > > the switch to do the checksum offload there... This is what people created
> > > here. They introduce signaling over two device abstractions. That is just
> > > bad design. virtio is already a mess and I would prefer if that mess
> > > remains in vmd / qemu.
> > >  
> > > > > > will any ports make use of this? would this be used outside vmd? i
> > > > > > have vague memories of tailscale omgoptimising a wireguard
> > > > > > implementation to do offloads which probably uses this, but nothing
> > > > > > else springs to mind.
> > > > > 
> > > > > From my understanding on qemu and maybe tailscale/openvpn would care.
> > > > > IMO this is easy enough to handle in ports.
> > > > 
> > > > Yes, qemu uses this interface.  tailscale and openvpn don't as far as I
> > > > see it in their sources.
> > 
> > After talking to claudio at h2k24 I don't see any other version of this
> > at the moment.  We only have two consumers of this API (vmd and qemu).
> > Every other API would make it much more complicated without any benefit
> > for other userland tools.
> > 
> > At least its an optional extension to tap(4) and don't change the
> > default behavior.
> > 
> > ok?
> 
> Looking at the diff it still disgusts me very much. The virtio header
> includes a lot more than required and it is unclear how tap should handle
> stuff that has bits set that it does not really no about.
> 
> Just the bits that have to be moved into a public header shows that this
> is something that was not thought through by the people pushing for this.

I think there is a bit of a misunderstanding here. The feature bits do not 
have to be moved to a public header for the tun(4) change, but moving all 
the "hardware" definitions to a if_vioreg.h header is a good idea. Then 
vmd can use the bits from there instead of bringing its own definitions. 
This is completely independent of what tun(4) does.

> 
> We only need CSUM, TSO, LRO flags and maybe VLAN MTU ones. A lot of the
> other other feature bits are just screaming for bad behaviour.
> So this should really be done less terrible.

The complete definition of the virtio_net_hdr in virtio 1.2 is this:

struct virtio_net_hdr { 
#define VIRTIO_NET_HDR_F_NEEDS_CSUM    1 
#define VIRTIO_NET_HDR_F_DATA_VALID    2 
#define VIRTIO_NET_HDR_F_RSC_INFO      4 
        u8 flags; 
#define VIRTIO_NET_HDR_GSO_NONE        0 
#define VIRTIO_NET_HDR_GSO_TCPV4       1 
#define VIRTIO_NET_HDR_GSO_UDP         3 
#define VIRTIO_NET_HDR_GSO_TCPV6       4 
#define VIRTIO_NET_HDR_GSO_UDP_L4      5 
#define VIRTIO_NET_HDR_GSO_ECN      0x80 
        u8 gso_type; 
        le16 hdr_len; 
        le16 gso_size; 
        le16 csum_start; 
        le16 csum_offset; 
        le16 num_buffers; 
        le32 hash_value;
#define VIRTIO_NET_HASH_REPORT_NONE            0 
#define VIRTIO_NET_HASH_REPORT_IPv4            1 
#define VIRTIO_NET_HASH_REPORT_TCPv4           2 
#define VIRTIO_NET_HASH_REPORT_UDPv4           3 
#define VIRTIO_NET_HASH_REPORT_IPv6            4 
#define VIRTIO_NET_HASH_REPORT_TCPv6           5 
#define VIRTIO_NET_HASH_REPORT_UDPv6           6 
#define VIRTIO_NET_HASH_REPORT_IPv6_EX         7 
#define VIRTIO_NET_HASH_REPORT_TCPv6_EX        8 
#define VIRTIO_NET_HASH_REPORT_UDPv6_EX        9 
        le16 hash_report;
        le16 padding_reserved; 
};

But I don't really know vmd and tun and cannot comment on Jan's approach 
to offloading. Some more comments are inline below.

Cheers,
Stefan

>  
> > Thanks,
> > Jan
> > 
> > Index: sys/dev/pv/if_vio.c
> > ===================================================================
> > RCS file: /cvs/src/sys/dev/pv/if_vio.c,v
> > diff -u -p -r1.45 if_vio.c
> > --- sys/dev/pv/if_vio.c	1 Aug 2024 11:13:19 -0000	1.45
> > +++ sys/dev/pv/if_vio.c	16 Aug 2024 08:47:15 -0000
> > @@ -59,154 +59,7 @@
> >  #define DPRINTF(x...)
> >  #endif
> >  
> > -/*
> > - * if_vioreg.h:
> > - */
> > -/* Configuration registers */
> > -#define VIRTIO_NET_CONFIG_MAC		0 /* 8bit x 6byte */
> > -#define VIRTIO_NET_CONFIG_STATUS	6 /* 16bit */
> > -
> > -/* Feature bits */
> > -#define VIRTIO_NET_F_CSUM			(1ULL<<0)
> > -#define VIRTIO_NET_F_GUEST_CSUM			(1ULL<<1)
> > -#define VIRTIO_NET_F_CTRL_GUEST_OFFLOADS        (1ULL<<2)
> > -#define VIRTIO_NET_F_MTU                        (1ULL<<3)
> > -#define VIRTIO_NET_F_MAC			(1ULL<<5)
> > -#define VIRTIO_NET_F_GSO			(1ULL<<6)
> > -#define VIRTIO_NET_F_GUEST_TSO4			(1ULL<<7)
> > -#define VIRTIO_NET_F_GUEST_TSO6			(1ULL<<8)
> > -#define VIRTIO_NET_F_GUEST_ECN			(1ULL<<9)
> > -#define VIRTIO_NET_F_GUEST_UFO			(1ULL<<10)
> > -#define VIRTIO_NET_F_HOST_TSO4			(1ULL<<11)
> > -#define VIRTIO_NET_F_HOST_TSO6			(1ULL<<12)
> > -#define VIRTIO_NET_F_HOST_ECN			(1ULL<<13)
> > -#define VIRTIO_NET_F_HOST_UFO			(1ULL<<14)
> > -#define VIRTIO_NET_F_MRG_RXBUF			(1ULL<<15)
> > -#define VIRTIO_NET_F_STATUS			(1ULL<<16)
> > -#define VIRTIO_NET_F_CTRL_VQ			(1ULL<<17)
> > -#define VIRTIO_NET_F_CTRL_RX			(1ULL<<18)
> > -#define VIRTIO_NET_F_CTRL_VLAN			(1ULL<<19)
> > -#define VIRTIO_NET_F_CTRL_RX_EXTRA		(1ULL<<20)
> > -#define VIRTIO_NET_F_GUEST_ANNOUNCE		(1ULL<<21)
> > -#define VIRTIO_NET_F_MQ				(1ULL<<22)
> > -#define VIRTIO_NET_F_CTRL_MAC_ADDR		(1ULL<<23)
> > -#define VIRTIO_NET_F_HOST_USO			(1ULL<<56)
> > -#define VIRTIO_NET_F_HASH_REPORT		(1ULL<<57)
> > -#define VIRTIO_NET_F_GUEST_HDRLEN		(1ULL<<59)
> > -#define VIRTIO_NET_F_RSS			(1ULL<<60)
> > -#define VIRTIO_NET_F_RSC_EXT			(1ULL<<61)
> > -#define VIRTIO_NET_F_STANDBY			(1ULL<<62)
> > -#define VIRTIO_NET_F_SPEED_DUPLEX		(1ULL<<63)


Everything from here ...

> > -/*
> > - * Config(8) flags. The lowest byte is reserved for generic virtio stuff.
> > - */
> > -
> > -/* Workaround for vlan related bug in qemu < version 2.0 */
> > -#define CONFFLAG_QEMU_VLAN_BUG		(1<<8)
> > -
> > -static const struct virtio_feature_name virtio_net_feature_names[] = {
> > -#if VIRTIO_DEBUG
> > -	{ VIRTIO_NET_F_CSUM,			"CSum" },
> > -	{ VIRTIO_NET_F_GUEST_CSUM,		"GuestCSum" },
> > -	{ VIRTIO_NET_F_CTRL_GUEST_OFFLOADS,	"CtrlGuestOffl" },
> > -	{ VIRTIO_NET_F_MTU,			"MTU", },
> > -	{ VIRTIO_NET_F_MAC,			"MAC" },
> > -	{ VIRTIO_NET_F_GSO,			"GSO" },
> > -	{ VIRTIO_NET_F_GUEST_TSO4,		"GuestTSO4" },
> > -	{ VIRTIO_NET_F_GUEST_TSO6,		"GuestTSO6" },
> > -	{ VIRTIO_NET_F_GUEST_ECN,		"GuestECN" },
> > -	{ VIRTIO_NET_F_GUEST_UFO,		"GuestUFO" },
> > -	{ VIRTIO_NET_F_HOST_TSO4,		"HostTSO4" },
> > -	{ VIRTIO_NET_F_HOST_TSO6,		"HostTSO6" },
> > -	{ VIRTIO_NET_F_HOST_ECN,		"HostECN" },
> > -	{ VIRTIO_NET_F_HOST_UFO,		"HostUFO" },
> > -	{ VIRTIO_NET_F_MRG_RXBUF,		"MrgRXBuf" },
> > -	{ VIRTIO_NET_F_STATUS,			"Status" },
> > -	{ VIRTIO_NET_F_CTRL_VQ,			"CtrlVQ" },
> > -	{ VIRTIO_NET_F_CTRL_RX,			"CtrlRX" },
> > -	{ VIRTIO_NET_F_CTRL_VLAN,		"CtrlVLAN" },
> > -	{ VIRTIO_NET_F_CTRL_RX_EXTRA,		"CtrlRXExtra" },
> > -	{ VIRTIO_NET_F_GUEST_ANNOUNCE,		"GuestAnnounce" },
> > -	{ VIRTIO_NET_F_MQ,			"MQ" },
> > -	{ VIRTIO_NET_F_CTRL_MAC_ADDR,		"CtrlMAC" },
> > -	{ VIRTIO_NET_F_HOST_USO,		"HostUso" },
> > -	{ VIRTIO_NET_F_HASH_REPORT,		"HashRpt" },
> > -	{ VIRTIO_NET_F_GUEST_HDRLEN,		"GuestHdrlen" },
> > -	{ VIRTIO_NET_F_RSS,			"RSS" },
> > -	{ VIRTIO_NET_F_RSC_EXT,			"RSSExt" },
> > -	{ VIRTIO_NET_F_STANDBY,			"Stdby" },
> > -	{ VIRTIO_NET_F_SPEED_DUPLEX,		"SpdDplx" },
> > -#endif
> > -	{ 0,					NULL }
> > -};

... to here should stay in if_vio.c .


> > -
> > -/* Status */
> > -#define VIRTIO_NET_S_LINK_UP	1
> > -
> > -/* Packet header structure */
> > -struct virtio_net_hdr {
> > -	uint8_t		flags;
> > -	uint8_t		gso_type;
> > -	uint16_t	hdr_len;
> > -	uint16_t	gso_size;
> > -	uint16_t	csum_start;
> > -	uint16_t	csum_offset;
> > -
> > -	/* only present if VIRTIO_NET_F_MRG_RXBUF is negotiated */
> > -	uint16_t	num_buffers;
> > -} __packed;
> > -
> > -#define VIRTIO_NET_HDR_F_NEEDS_CSUM	1 /* flags */
> > -#define VIRTIO_NET_HDR_F_DATA_VALID	2 /* flags */
> > -#define VIRTIO_NET_HDR_GSO_NONE		0 /* gso_type */
> > -#define VIRTIO_NET_HDR_GSO_TCPV4	1 /* gso_type */
> > -#define VIRTIO_NET_HDR_GSO_UDP		3 /* gso_type */
> > -#define VIRTIO_NET_HDR_GSO_TCPV6	4 /* gso_type */
> > -#define VIRTIO_NET_HDR_GSO_ECN		0x80 /* gso_type, |'ed */
> > -
> > -#define VIRTIO_NET_MAX_GSO_LEN		(65536+ETHER_HDR_LEN)
> > -
> > -/* Control virtqueue */
> > -struct virtio_net_ctrl_cmd {
> > -	uint8_t	class;
> > -	uint8_t	command;
> > -} __packed;
> > -#define VIRTIO_NET_CTRL_RX		0
> > -# define VIRTIO_NET_CTRL_RX_PROMISC	0
> > -# define VIRTIO_NET_CTRL_RX_ALLMULTI	1
> > -
> > -#define VIRTIO_NET_CTRL_MAC		1
> > -# define VIRTIO_NET_CTRL_MAC_TABLE_SET	0
> > -
> > -#define VIRTIO_NET_CTRL_VLAN		2
> > -# define VIRTIO_NET_CTRL_VLAN_ADD	0
> > -# define VIRTIO_NET_CTRL_VLAN_DEL	1
> > -
> > -#define VIRTIO_NET_CTRL_GUEST_OFFLOADS	5
> > -# define VIRTIO_NET_CTRL_GUEST_OFFLOADS_SET	0
> > -
> > -struct virtio_net_ctrl_status {
> > -	uint8_t	ack;
> > -} __packed;
> > -#define VIRTIO_NET_OK			0
> > -#define VIRTIO_NET_ERR			1
> > -
> > -struct virtio_net_ctrl_rx {
> > -	uint8_t	onoff;
> > -} __packed;
> > -
> > -struct virtio_net_ctrl_guest_offloads {
> > -	uint64_t offloads;
> > -} __packed;
> > -
> > -struct virtio_net_ctrl_mac_tbl {
> > -	uint32_t nentries;
> > -	uint8_t macs[][ETHER_ADDR_LEN];
> > -} __packed;
> > -
> > -struct virtio_net_ctrl_vlan {
> > -	uint16_t id;
> > -} __packed;
> > +#include "if_vioreg.h"
> >  
> >  /*
> >   * if_viovar.h:
> > Index: sys/dev/pv/if_vioreg.h
> > ===================================================================
> > RCS file: sys/dev/pv/if_vioreg.h
> > diff -N sys/dev/pv/if_vioreg.h
> > --- /dev/null	1 Jan 1970 00:00:00 -0000
> > +++ sys/dev/pv/if_vioreg.h	16 Aug 2024 08:47:15 -0000
> > @@ -0,0 +1,182 @@
> > +/*	$OpenBSD$	*/
> > +
> > +/*
> > + * Copyright (c) 2012 Stefan Fritsch, Alexander Fiveg.
> > + * Copyright (c) 2010 Minoura Makoto.
> > + * All rights reserved.
> > + *
> > + * Redistribution and use in source and binary forms, with or without
> > + * modification, are permitted provided that the following conditions
> > + * are met:
> > + * 1. Redistributions of source code must retain the above copyright
> > + *    notice, this list of conditions and the following disclaimer.
> > + * 2. Redistributions in binary form must reproduce the above copyright
> > + *    notice, this list of conditions and the following disclaimer in the
> > + *    documentation and/or other materials provided with the distribution.
> > + *
> > + * THIS SOFTWARE IS PROVIDED BY THE AUTHOR ``AS IS'' AND ANY EXPRESS OR
> > + * IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES
> > + * OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED.
> > + * IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR ANY DIRECT, INDIRECT,
> > + * INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT
> > + * NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
> > + * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
> > + * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
> > + * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF
> > + * THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
> > + */
> > +
> > +/* Configuration registers */
> > +#define VIRTIO_NET_CONFIG_MAC		0 /* 8bit x 6byte */
> > +#define VIRTIO_NET_CONFIG_STATUS	6 /* 16bit */
> > +
> > +/* Feature bits */
> > +#define VIRTIO_NET_F_CSUM			(1ULL<<0)
> > +#define VIRTIO_NET_F_GUEST_CSUM			(1ULL<<1)
> > +#define VIRTIO_NET_F_CTRL_GUEST_OFFLOADS        (1ULL<<2)
> > +#define VIRTIO_NET_F_MTU                        (1ULL<<3)
> > +#define VIRTIO_NET_F_MAC			(1ULL<<5)
> > +#define VIRTIO_NET_F_GSO			(1ULL<<6)
> > +#define VIRTIO_NET_F_GUEST_TSO4			(1ULL<<7)
> > +#define VIRTIO_NET_F_GUEST_TSO6			(1ULL<<8)
> > +#define VIRTIO_NET_F_GUEST_ECN			(1ULL<<9)
> > +#define VIRTIO_NET_F_GUEST_UFO			(1ULL<<10)
> > +#define VIRTIO_NET_F_HOST_TSO4			(1ULL<<11)
> > +#define VIRTIO_NET_F_HOST_TSO6			(1ULL<<12)
> > +#define VIRTIO_NET_F_HOST_ECN			(1ULL<<13)
> > +#define VIRTIO_NET_F_HOST_UFO			(1ULL<<14)
> > +#define VIRTIO_NET_F_MRG_RXBUF			(1ULL<<15)
> > +#define VIRTIO_NET_F_STATUS			(1ULL<<16)
> > +#define VIRTIO_NET_F_CTRL_VQ			(1ULL<<17)
> > +#define VIRTIO_NET_F_CTRL_RX			(1ULL<<18)
> > +#define VIRTIO_NET_F_CTRL_VLAN			(1ULL<<19)
> > +#define VIRTIO_NET_F_CTRL_RX_EXTRA		(1ULL<<20)
> > +#define VIRTIO_NET_F_GUEST_ANNOUNCE		(1ULL<<21)
> > +#define VIRTIO_NET_F_MQ				(1ULL<<22)
> > +#define VIRTIO_NET_F_CTRL_MAC_ADDR		(1ULL<<23)
> > +#define VIRTIO_NET_F_HOST_USO			(1ULL<<56)
> > +#define VIRTIO_NET_F_HASH_REPORT		(1ULL<<57)
> > +#define VIRTIO_NET_F_GUEST_HDRLEN		(1ULL<<59)
> > +#define VIRTIO_NET_F_RSS			(1ULL<<60)
> > +#define VIRTIO_NET_F_RSC_EXT			(1ULL<<61)
> > +#define VIRTIO_NET_F_STANDBY			(1ULL<<62)
> > +#define VIRTIO_NET_F_SPEED_DUPLEX		(1ULL<<63)
> > +/*
> > + * Config(8) flags. The lowest byte is reserved for generic virtio stuff.
> > + */
> > +
> > +/* Workaround for vlan related bug in qemu < version 2.0 */
> > +#define CONFFLAG_QEMU_VLAN_BUG		(1<<8)
> > +
> > +static const struct virtio_feature_name virtio_net_feature_names[] = {
> > +#if VIRTIO_DEBUG
> > +	{ VIRTIO_NET_F_CSUM,			"CSum" },
> > +	{ VIRTIO_NET_F_GUEST_CSUM,		"GuestCSum" },
> > +	{ VIRTIO_NET_F_CTRL_GUEST_OFFLOADS,	"CtrlGuestOffl" },
> > +	{ VIRTIO_NET_F_MTU,			"MTU", },
> > +	{ VIRTIO_NET_F_MAC,			"MAC" },
> > +	{ VIRTIO_NET_F_GSO,			"GSO" },
> > +	{ VIRTIO_NET_F_GUEST_TSO4,		"GuestTSO4" },
> > +	{ VIRTIO_NET_F_GUEST_TSO6,		"GuestTSO6" },
> > +	{ VIRTIO_NET_F_GUEST_ECN,		"GuestECN" },
> > +	{ VIRTIO_NET_F_GUEST_UFO,		"GuestUFO" },
> > +	{ VIRTIO_NET_F_HOST_TSO4,		"HostTSO4" },
> > +	{ VIRTIO_NET_F_HOST_TSO6,		"HostTSO6" },
> > +	{ VIRTIO_NET_F_HOST_ECN,		"HostECN" },
> > +	{ VIRTIO_NET_F_HOST_UFO,		"HostUFO" },
> > +	{ VIRTIO_NET_F_MRG_RXBUF,		"MrgRXBuf" },
> > +	{ VIRTIO_NET_F_STATUS,			"Status" },
> > +	{ VIRTIO_NET_F_CTRL_VQ,			"CtrlVQ" },
> > +	{ VIRTIO_NET_F_CTRL_RX,			"CtrlRX" },
> > +	{ VIRTIO_NET_F_CTRL_VLAN,		"CtrlVLAN" },
> > +	{ VIRTIO_NET_F_CTRL_RX_EXTRA,		"CtrlRXExtra" },
> > +	{ VIRTIO_NET_F_GUEST_ANNOUNCE,		"GuestAnnounce" },
> > +	{ VIRTIO_NET_F_MQ,			"MQ" },
> > +	{ VIRTIO_NET_F_CTRL_MAC_ADDR,		"CtrlMAC" },
> > +	{ VIRTIO_NET_F_HOST_USO,		"HostUso" },
> > +	{ VIRTIO_NET_F_HASH_REPORT,		"HashRpt" },
> > +	{ VIRTIO_NET_F_GUEST_HDRLEN,		"GuestHdrlen" },
> > +	{ VIRTIO_NET_F_RSS,			"RSS" },
> > +	{ VIRTIO_NET_F_RSC_EXT,			"RSSExt" },
> > +	{ VIRTIO_NET_F_STANDBY,			"Stdby" },
> > +	{ VIRTIO_NET_F_SPEED_DUPLEX,		"SpdDplx" },
> > +#endif
> > +	{ 0,					NULL }
> > +};
> > +
> > +/* Status */
> > +#define VIRTIO_NET_S_LINK_UP	1
> > +
> > +/* Packet header structure */
> > +struct virtio_net_hdr {
> > +	uint8_t		flags;
> > +	uint8_t		gso_type;
> > +	uint16_t	hdr_len;
> > +	uint16_t	gso_size;
> > +	uint16_t	csum_start;
> > +	uint16_t	csum_offset;
> > +
> > +	/* only present if VIRTIO_NET_F_MRG_RXBUF is negotiated */
> > +	uint16_t	num_buffers;
> > +} __packed;
> > +
> > +struct virtio_net_hdr_legacy {
> > +	uint8_t		flags;
> > +	uint8_t		gso_type;
> > +	uint16_t	hdr_len;
> > +	uint16_t	gso_size;
> > +	uint16_t	csum_start;
> > +	uint16_t	csum_offset;
> > +} __packed;
> > +
> > +#define VIRTIO_NET_HDR_F_NEEDS_CSUM	1 /* flags */
> > +#define VIRTIO_NET_HDR_F_DATA_VALID	2 /* flags */
> > +#define VIRTIO_NET_HDR_GSO_NONE		0 /* gso_type */
> > +#define VIRTIO_NET_HDR_GSO_TCPV4	1 /* gso_type */
> > +#define VIRTIO_NET_HDR_GSO_UDP		3 /* gso_type */
> > +#define VIRTIO_NET_HDR_GSO_TCPV6	4 /* gso_type */
> > +#define VIRTIO_NET_HDR_GSO_ECN		0x80 /* gso_type, |'ed */
> > +
> > +#define VIRTIO_NET_MAX_GSO_LEN		(65536+ETHER_HDR_LEN)
> > +
> > +/* Control virtqueue */
> > +struct virtio_net_ctrl_cmd {
> > +	uint8_t	class;
> > +	uint8_t	command;
> > +} __packed;
> > +#define VIRTIO_NET_CTRL_RX		0
> > +# define VIRTIO_NET_CTRL_RX_PROMISC	0
> > +# define VIRTIO_NET_CTRL_RX_ALLMULTI	1
> > +
> > +#define VIRTIO_NET_CTRL_MAC		1
> > +# define VIRTIO_NET_CTRL_MAC_TABLE_SET	0
> > +
> > +#define VIRTIO_NET_CTRL_VLAN		2
> > +# define VIRTIO_NET_CTRL_VLAN_ADD	0
> > +# define VIRTIO_NET_CTRL_VLAN_DEL	1
> > +
> > +#define VIRTIO_NET_CTRL_GUEST_OFFLOADS	5
> > +# define VIRTIO_NET_CTRL_GUEST_OFFLOADS_SET	0
> > +
> > +struct virtio_net_ctrl_status {
> > +	uint8_t	ack;
> > +} __packed;
> > +#define VIRTIO_NET_OK			0
> > +#define VIRTIO_NET_ERR			1
> > +
> > +struct virtio_net_ctrl_rx {
> > +	uint8_t	onoff;
> > +} __packed;
> > +
> > +struct virtio_net_ctrl_guest_offloads {
> > +	uint64_t offloads;
> > +} __packed;
> > +
> > +struct virtio_net_ctrl_mac_tbl {
> > +	uint32_t nentries;
> > +	uint8_t macs[][ETHER_ADDR_LEN];
> > +} __packed;
> > +
> > +struct virtio_net_ctrl_vlan {
> > +	uint16_t id;
> > +} __packed;
> > Index: sys/net/if_tun.c
> > ===================================================================
> > RCS file: /cvs/src/sys/net/if_tun.c,v
> > diff -u -p -r1.240 if_tun.c
> > --- sys/net/if_tun.c	23 Dec 2023 10:52:54 -0000	1.240
> > +++ sys/net/if_tun.c	16 Aug 2024 12:00:46 -0000
> > @@ -63,8 +63,15 @@
> >  #include <net/rtable.h>
> >  
> >  #include <netinet/in.h>
> > +#include <netinet/ip.h>
> > +#include <netinet/ip6.h>
> > +#include <netinet/tcp.h>
> > +#include <netinet/udp.h>
> >  #include <netinet/if_ether.h>
> >  
> > +#include <dev/pv/virtiovar.h>
> > +#include <dev/pv/if_vioreg.h>
> > +
> >  #include "bpfilter.h"
> >  #if NBPFILTER > 0
> >  #include <net/bpf.h>
> > @@ -92,6 +99,7 @@ struct tun_softc {
> >  	dev_t			sc_dev;
> >  	struct refcnt		sc_refs;
> >  	unsigned int		sc_reading;
> > +	int			sc_vhdrlen;
> >  };
> >  
> >  #ifdef	TUN_DEBUG
> > @@ -779,6 +787,17 @@ tun_dev_ioctl(dev_t dev, u_long cmd, voi
> >  		bcopy(data, sc->sc_ac.ac_enaddr,
> >  		    sizeof(sc->sc_ac.ac_enaddr));
> >  		break;
> > +	case TAPSVNETHDR:
> > +		if (*(int *)data != 0 &&
> > +		    *(int *)data != sizeof(struct virtio_net_hdr) &&
> > +		    *(int *)data != sizeof(struct virtio_net_hdr_legacy)) {
> > +			error = EINVAL;
> > +			break;
> > +		}
> > +
> > +		sc->sc_vhdrlen = *(int *)data;
> > +
> > +		break;
> >  	default:
> >  		error = ENOTTY;
> >  		break;
> > @@ -828,6 +847,17 @@ tun_dev_read(dev_t dev, struct uio *uio,
> >  		bpf_mtap(ifp->if_bpf, m0, BPF_DIRECTION_OUT);
> >  #endif
> >  
> > +	size_t len = ulmin(uio->uio_resid, sc->sc_vhdrlen);
> > +	if (len > 0) {
> > +		struct virtio_net_hdr vhdr;
> > +
> > +		bzero(&vhdr, sizeof(vhdr));
> > +
> > +		error = uiomove(&vhdr, len, uio);
> > +		if (error != 0)
> > +			goto free;
> > +	}
> > +
> >  	m = m0;
> >  	while (uio->uio_resid > 0) {
> >  		size_t len = ulmin(uio->uio_resid, m->m_len);
> > @@ -841,7 +871,7 @@ tun_dev_read(dev_t dev, struct uio *uio,
> >  		if (m == NULL)
> >  			break;
> >  	}
> > -
> > +free:
> >  	m_freem(m0);
> >  
> >  put:
> > @@ -870,6 +900,7 @@ tun_dev_write(dev_t dev, struct uio *uio
> >  	struct tun_softc	*sc;
> >  	struct ifnet		*ifp;
> >  	struct mbuf		*m0;
> > +	struct virtio_net_hdr	vhdr;
> >  	int			error = 0;
> >  	size_t			mlen;
> >  
> > @@ -905,9 +936,31 @@ tun_dev_write(dev_t dev, struct uio *uio
> >  	m0->m_pkthdr.len = m0->m_len = mlen;
> >  	m_adj(m0, align);
> >  
> > +	if (sc->sc_vhdrlen > 0) {
> > +		error = uiomove(&vhdr, sc->sc_vhdrlen, uio);
> > +		if (error != 0)
> > +			goto drop;
> > +	}
> > +
> >  	error = uiomove(mtod(m0, void *), m0->m_len, uio);
> >  	if (error != 0)
> >  		goto drop;
> > +
> > +	if (sc->sc_vhdrlen > 0) {
> > +		if (ISSET(vhdr.flags, VIRTIO_NET_HDR_F_NEEDS_CSUM)) {
> > +			struct ether_extracted ext;
> > +
> > +			ether_extract_headers(m0, &ext);
> > +
> > +			if (ext.tcp) {
> > +				SET(m0->m_pkthdr.csum_flags, M_TCP_CSUM_IN_OK);
> > +				SET(m0->m_pkthdr.csum_flags, M_TCP_CSUM_OUT);
> > +			} else if (ext.udp) {
> > +				SET(m0->m_pkthdr.csum_flags, M_UDP_CSUM_IN_OK);
> > +				SET(m0->m_pkthdr.csum_flags, M_UDP_CSUM_OUT);
> > +			}
> > +		}
> > +	}
> >  
> >  	NET_LOCK();
> >  	if_vinput(ifp, m0);
> > Index: sys/net/if_tun.h
> > ===================================================================
> > RCS file: /cvs/src/sys/net/if_tun.h,v
> > diff -u -p -r1.15 if_tun.h
> > --- sys/net/if_tun.h	6 Feb 2007 10:49:40 -0000	1.15
> > +++ sys/net/if_tun.h	16 Aug 2024 12:00:43 -0000
> > @@ -68,4 +68,7 @@ struct tuninfo {
> >  #define	TUNSDEBUG	_IOW('t', 94, int)
> >  #define	TUNGDEBUG	_IOR('t', 95, int)
> >  
> > +/* ioctl to set the virtio-net header size */
> > +#define TAPSVNETHDR	_IOW('t', 96, int)
> > +
> >  #endif /* _NET_IF_TUN_H_ */
> > Index: usr.sbin/vmd/vionet.c
> > ===================================================================
> > RCS file: /cvs/src/usr.sbin/vmd/vionet.c,v
> > diff -u -p -r1.16 vionet.c
> > --- usr.sbin/vmd/vionet.c	12 Jul 2024 14:34:08 -0000	1.16
> > +++ usr.sbin/vmd/vionet.c	16 Aug 2024 12:00:43 -0000
> > @@ -18,6 +18,7 @@
> >   */
> >  #include <sys/socket.h>
> >  #include <sys/types.h>
> > +#include <sys/ioctl.h>
> >  
> >  #include <dev/pci/virtio_pcireg.h>
> >  #include <dev/pv/virtioreg.h>
> > @@ -46,11 +47,6 @@
> >  extern char *__progname;
> >  extern struct vmd_vm *current_vm;
> >  
> > -struct packet {
> > -	uint8_t	*buf;
> > -	size_t	 len;
> > -};
> > -
> >  static void *rx_run_loop(void *);
> >  static void *tx_run_loop(void *);
> >  static int vionet_rx(struct vionet_dev *, int);
> > @@ -102,13 +98,7 @@ vionet_main(int fd, int fd_vmm)
> >  	struct vm_create_params	*vcp;
> >  	ssize_t			 sz;
> >  	int			 ret;
> > -
> > -	/*
> > -	 * stdio - needed for read/write to disk fds and channels to the vm.
> > -	 * vmm + proc - needed to create shared vm mappings.
> > -	 */
> > -	if (pledge("stdio vmm proc", NULL) == -1)
> > -		fatal("pledge");
> > +	int			 vhdrlen = sizeof(struct virtio_net_hdr);
> >  
> >  	/* Initialize iovec arrays. */
> >  	memset(iov_rx, 0, sizeof(iov_rx));
> > @@ -133,6 +123,17 @@ vionet_main(int fd, int fd_vmm)
> >  	    ", vmm fd = %d", __func__, vionet->data_fd, dev.sync_fd,
> >  	    dev.async_fd, fd_vmm);
> >  
> > +	/* Activate virtio_net_hdr for tap(4) device. */
> > +	if (ioctl(vionet->data_fd, TAPSVNETHDR, &vhdrlen) == -1)
> > +		fatal("%s: TAPSVNETHDR option on tap(4) failed", __func__);
> > +
> > +	/*
> > +	 * stdio - needed for read/write to disk fds and channels to the vm.
> > +	 * vmm + proc - needed to create shared vm mappings.
> > +	 */
> > +	if (pledge("stdio vmm proc", NULL) == -1)
> > +		fatal("pledge");
> > +
> >  	/* Receive our vm information from the vm process. */
> >  	memset(&vm, 0, sizeof(vm));
> >  	sz = atomicio(read, dev.sync_fd, &vm, sizeof(vm));
> > @@ -413,6 +414,7 @@ vionet_rx(struct vionet_dev *dev, int fd
> >  
> >  		iov = &iov_rx[0];
> >  		iov_cnt = 1;
> > +		chain_len = 0;
> >  
> >  		/*
> >  		 * First descriptor should be at least as large as the
> > @@ -425,22 +427,9 @@ vionet_rx(struct vionet_dev *dev, int fd
> >  			goto reset;
> >  		}
> >  
> > -		/*
> > -		 * Insert the virtio_net_hdr and adjust len/base. We do the
> > -		 * pointer math here before it's a void*.
> > -		 */
> >  		iov->iov_base = hvaddr_mem(desc->addr, iov->iov_len);
> >  		if (iov->iov_base == NULL)
> >  			goto reset;
> > -		memset(iov->iov_base, 0, sizeof(struct virtio_net_hdr));
> > -
> > -		/* Tweak the iovec to account for the virtio_net_hdr. */
> > -		iov->iov_len -= sizeof(struct virtio_net_hdr);
> > -		iov->iov_base = hvaddr_mem(desc->addr +
> > -		    sizeof(struct virtio_net_hdr), iov->iov_len);
> > -		if (iov->iov_base == NULL)
> > -			goto reset;
> > -		chain_len = iov->iov_len;
> >  
> >  		/*
> >  		 * Walk the remaining chain and collect remaining addresses
> > @@ -491,12 +480,6 @@ vionet_rx(struct vionet_dev *dev, int fd
> >  		if (sz == 0)	/* No packets, so bail out for now. */
> >  			break;
> >  
> > -		/*
> > -		 * Account for the prefixed header since it wasn't included
> > -		 * in the copy or zerocopy operations.
> > -		 */
> > -		sz += sizeof(struct virtio_net_hdr);
> > -
> >  		/* Mark our buffers as used. */
> >  		used->ring[used->idx & VIONET_QUEUE_MASK].id = hdr_idx;
> >  		used->ring[used->idx & VIONET_QUEUE_MASK].len = sz;
> > @@ -533,22 +516,19 @@ vionet_rx_copy(struct vionet_dev *dev, i
> >      int iov_cnt, size_t chain_len)
> >  {
> >  	static uint8_t		 buf[VIONET_HARD_MTU];
> > -	struct packet		*pkt = NULL;
> >  	struct ether_header	*eh = NULL;
> >  	uint8_t			*payload = buf;
> >  	size_t			 i, chunk, nbytes, copied = 0;
> >  	ssize_t			 sz;
> >  
> > -	/* If reading from the tap(4), try to right-size the read. */
> > -	if (fd == dev->data_fd)
> > -		nbytes = MIN(chain_len, VIONET_HARD_MTU);
> > -	else if (fd == pipe_inject[READ])
> > -		nbytes = sizeof(struct packet);
> > -	else {
> > +	if (fd != dev->data_fd && fd != pipe_inject[READ]) {
> >  		log_warnx("%s: invalid fd: %d", __func__, fd);
> >  		return (-1);
> >  	}
> >  
> > +	/* If reading from the tap(4), try to right-size the read. */
> > +	nbytes = MIN(chain_len, VIONET_HARD_MTU);
> > +
> >  	/*
> >  	 * Try to pull a packet. The fd should be non-blocking and we don't
> >  	 * care if we under-read (i.e. sz != nbytes) as we may not have a
> > @@ -560,36 +540,18 @@ vionet_rx_copy(struct vionet_dev *dev, i
> >  			log_warn("%s: error reading packet", __func__);
> >  			return (-1);
> >  		}
> > +		log_warn("%s: EAGAIN", __func__);
> >  		return (0);
> > -	} else if (fd == dev->data_fd && sz < VIONET_MIN_TXLEN) {
> > +	} else if (fd == dev->data_fd && (size_t)sz < VIONET_MIN_TXLEN) {
> >  		/* If reading the tap(4), we should get valid ethernet. */
> >  		log_warnx("%s: invalid packet size", __func__);
> >  		return (0);
> > -	} else if (fd == pipe_inject[READ] && sz != sizeof(struct packet)) {
> > -		log_warnx("%s: invalid injected packet object (sz=%ld)",
> > -		    __func__, sz);
> > -		return (0);
> > -	}
> > -
> > -	/* Decompose an injected packet, if that's what we're working with. */
> > -	if (fd == pipe_inject[READ]) {
> > -		pkt = (struct packet *)buf;
> > -		if (pkt->buf == NULL) {
> > -			log_warnx("%s: invalid injected packet, no buffer",
> > -			    __func__);
> > -			return (0);
> > -		}
> > -		if (sz < VIONET_MIN_TXLEN || sz > VIONET_MAX_TXLEN) {
> > -			log_warnx("%s: invalid injected packet size", __func__);
> > -			goto drop;
> > -		}
> > -		payload = pkt->buf;
> > -		sz = (ssize_t)pkt->len;
> >  	}
> >  
> >  	/* Validate the ethernet header, if required. */
> >  	if (dev->lockedmac) {
> > -		eh = (struct ether_header *)(payload);
> > +		eh = (struct ether_header *)(payload
> > +		    + sizeof(struct virtio_net_hdr));
> >  		if (!ETHER_IS_MULTICAST(eh->ether_dhost) &&
> >  		    memcmp(eh->ether_dhost, dev->mac,
> >  		    sizeof(eh->ether_dhost)) != 0)
> > @@ -610,10 +572,6 @@ vionet_rx_copy(struct vionet_dev *dev, i
> >  	}
> >  
> >  drop:
> > -	/* Free any injected packet buffer. */
> > -	if (pkt != NULL)
> > -		free(pkt->buf);
> > -
> >  	return (copied);
> >  }
> >  
> > @@ -724,7 +682,6 @@ vionet_tx(struct virtio_dev *dev)
> >  	struct virtio_vq_info *vq_info;
> >  	struct ether_header *eh;
> >  	struct iovec *iov;
> > -	struct packet pkt;
> >  	uint8_t status = 0;
> >  
> >  	status = vionet->cfg.device_status
> > @@ -766,17 +723,15 @@ vionet_tx(struct virtio_dev *dev)
> >  		if (iov->iov_len < sizeof(struct virtio_net_hdr)) {
> >  			log_warnx("%s: invalid descriptor length", __func__);
> >  			goto reset;
> > -		} else if (iov->iov_len > sizeof(struct virtio_net_hdr)) {
> > -			/* Chop off the virtio header, leaving packet data. */
> > -			iov->iov_len -= sizeof(struct virtio_net_hdr);
> > -			chain_len += iov->iov_len;
> > -			iov->iov_base = hvaddr_mem(desc->addr +
> > -			    sizeof(struct virtio_net_hdr), iov->iov_len);
> > -			if (iov->iov_base == NULL)
> > -				goto reset;
> > -			iov_cnt++;
> >  		}
> >  
> > +		/* Handel the first descriptor. */
> > +		chain_len += iov->iov_len;
> > +		iov->iov_base = hvaddr_mem(desc->addr, iov->iov_len);
> > +		if (iov->iov_base == NULL)
> > +			goto reset;
> > +		iov_cnt++;
> > +
> >  		/*
> >  		 * Walk the chain and collect remaining addresses and lengths.
> >  		 */
> > @@ -818,6 +773,8 @@ vionet_tx(struct virtio_dev *dev)
> >  		 * for this inspection.
> >  		 */
> >  		iov = &iov_tx[0];
> > +		if (iov->iov_len == sizeof(struct virtio_net_hdr))
> > +			iov = &iov_tx[1];
> >  		if (vionet->lockedmac) {
> >  			if (iov->iov_len < ETHER_HDR_LEN) {
> >  				log_warnx("%s: insufficient header data",
> > @@ -839,6 +796,7 @@ vionet_tx(struct virtio_dev *dev)
> >  			if (dhcpsz > 0) {
> >  				log_debug("%s: detected dhcp request of %zu bytes",
> >  				    __func__, dhcpsz);
> > +				sz = chain_len;
> >  				goto drop;
> >  			}
> >  		}
> > @@ -859,20 +817,23 @@ drop:
> >  
> >  		/* Facilitate DHCP reply injection, if needed. */
> >  		if (dhcpsz > 0) {
> > -			pkt.buf = dhcppkt;
> > -			pkt.len = dhcpsz;
> > -			sz = write(pipe_inject[WRITE], &pkt, sizeof(pkt));
> > -			if (sz == -1 && errno != EAGAIN) {
> > +			struct virtio_net_hdr vhdr = { 0 };
> > +			struct iovec iovec[2] = {
> > +				{ .iov_base = &vhdr, .iov_len = sizeof(vhdr) },
> > +				{ .iov_base = dhcppkt, .iov_len = dhcpsz }
> > +			};
> > +
> > +			sz = writev(pipe_inject[WRITE], iovec, 2);
> > +
> > +			if (sz == -1 && errno != EAGAIN)
> >  				log_warn("%s: packet injection", __func__);
> > -				free(pkt.buf);
> > -			} else if (sz == -1 && errno == EAGAIN) {
> > +			else if (sz == -1 && errno == EAGAIN)
> >  				log_debug("%s: dropping dhcp reply", __func__);
> > -				free(pkt.buf);
> > -			} else if (sz != sizeof(pkt)) {
> > +			else if (sz != (ssize_t)(sizeof(vhdr) + dhcpsz))
> >  				log_warnx("%s: failed packet injection",
> >  				    __func__);
> > -				free(pkt.buf);
> > -			}
> > +
> > +			free(dhcppkt);
> >  			log_debug("%s: injected dhcp reply with %ld bytes",
> >  			    __func__, sz);
> >  		}
> > Index: usr.sbin/vmd/virtio.c
> > ===================================================================
> > RCS file: /cvs/src/usr.sbin/vmd/virtio.c,v
> > diff -u -p -r1.115 virtio.c
> > --- usr.sbin/vmd/virtio.c	10 Jul 2024 09:27:33 -0000	1.115
> > +++ usr.sbin/vmd/virtio.c	16 Aug 2024 12:00:46 -0000
> > @@ -60,6 +60,7 @@ SLIST_HEAD(virtio_dev_head, virtio_dev) 
> >  
> >  #define MAXPHYS	(64 * 1024)	/* max raw I/O transfer size */
> >  
> > +#define VIRTIO_NET_F_CSUM	(1<<0)
> >  #define VIRTIO_NET_F_MAC	(1<<5)
> >  
> >  #define VMMCI_F_TIMESYNC	(1<<0)
> > @@ -584,6 +585,9 @@ virtio_init(struct vmd_vm *vm, int child
> >  			/* MAC address has been assigned by the parent */
> >  			memcpy(&dev->vionet.mac, &vmc->vmc_macs[i], 6);
> >  			dev->vionet.cfg.device_feature = VIRTIO_NET_F_MAC;
> > +
> > +			/* enable checksum offloading */
> > +			dev->vionet.cfg.device_feature |= VIRTIO_NET_F_CSUM;
> >  
> >  			dev->vionet.lockedmac =
> >  			    vmc->vmc_ifflags[i] & VMIFF_LOCKED ? 1 : 0;
> > Index: usr.sbin/vmd/virtio.h
> > ===================================================================
> > RCS file: /cvs/src/usr.sbin/vmd/virtio.h,v
> > diff -u -p -r1.52 virtio.h
> > --- usr.sbin/vmd/virtio.h	10 Jul 2024 09:27:33 -0000	1.52
> > +++ usr.sbin/vmd/virtio.h	16 Aug 2024 12:00:43 -0000
> > @@ -49,7 +49,7 @@
> >  
> >  /* Virtio network device is backed by tap(4), so inherit limits */
> >  #define VIONET_HARD_MTU		TUNMRU
> > -#define VIONET_MIN_TXLEN	ETHER_HDR_LEN
> > +#define VIONET_MIN_TXLEN	(sizeof(struct virtio_net_hdr) + ETHER_HDR_LEN)
> >  #define VIONET_MAX_TXLEN	VIONET_HARD_MTU + ETHER_HDR_LEN
> >  
> >  /* VMM Control Interface shutdown timeout (in seconds) */
> 
>