From: Alexander Bluhm <bluhm@openbsd.org>
Subject: Re: ip6 forward mcopy
To: tech@openbsd.org
Date: Fri, 12 Jul 2024 11:56:31 +0200

On Mon, Jun 24, 2024 at 11:33:44AM +0200, Alexander Bluhm wrote:
> Forwarding IPv6 packets is slower than IPv4.  Reason is the m_copym()
> that is done for every packet.  Just in case we may have to send
> an ICMP6 packet, ip6_forward() creates a mbuf copy.  After that
> mbuf cluster is read only, so for the ethernet header another mbuf
> is allocated.  pf NAT and RDR ignore readonly clusters, so they also
> modify the potential ICMP6 packet.
> 
> IPv4 ip_forward() avoids all these problems by copying the leading
> 68 bytes of the original packet onto the stack.  More is not needed
> for ICMP.  IPv6 RFC requires up to 1232 bytes in the ICMP6 packet.
> This cannot be copied to the stack.
> 
> The reason for the difference is that the ICMP6 packet has to contain
> the full header chain.  I think if we have a simple UDP or TCP
> packet without chain, we could do a shortcut and just copy the
> header to ICMP6 packet.
> 
> Do we want to violate the RFC and gain 30% performance for the
> common TCP/UDP case?

Part of the diff has been committed.  For small packets we use stack
memory, large packets need an extra mbuf allocation.

The following diff truncates ICMP6 packets to a reasonable length if
the original packet has a final protocol header directly after the
IPv6 header.  My list contains TCP, UDP, and ESP as they cover the
common cases, and anything behind these headers should not be needed
for path MTU discovery.
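
To make the resulting lengths concrete, here is a minimal userland
sketch of the same selection logic.  It is not the kernel code: the
SK_* names and sk_icmp_len() are hypothetical, and the constants are
the usual fixed header sizes standing in for the sizeof() expressions
and MAX_TCPOPTLEN used in the diff.

```c
#include <stddef.h>

/*
 * Hypothetical stand-ins for the kernel's sizeof() expressions.
 * These are the fixed on-wire header sizes, not kernel definitions.
 */
enum {
	SK_IP6_HDRLEN = 40,		/* sizeof(struct ip6_hdr) */
	SK_TCP_HDRLEN = 20,		/* sizeof(struct tcphdr) */
	SK_TCP_OPTMAX = 40,		/* MAX_TCPOPTLEN */
	SK_UDP_HDRLEN = 8,		/* sizeof(struct udphdr) */
	SK_ESP_HDRLEN = 8,		/* SPI + sequence number */
	SK_ICMP6_PLD_MAX = 1232		/* 1280 - 40 (IPv6) - 8 (ICMP6) */
};

/* IANA protocol numbers for the three shortcut cases. */
enum { SK_PROTO_TCP = 6, SK_PROTO_UDP = 17, SK_PROTO_ESP = 50 };

/*
 * Return how many bytes of the original packet would be kept for a
 * possible ICMP6 error, given the IPv6 next header value and the
 * total packet length.
 */
size_t
sk_icmp_len(unsigned int nxt, size_t pktlen)
{
	size_t len;

	switch (nxt) {
	case SK_PROTO_TCP:
		len = SK_IP6_HDRLEN + SK_TCP_HDRLEN + SK_TCP_OPTMAX;
		break;
	case SK_PROTO_UDP:
		len = SK_IP6_HDRLEN + SK_UDP_HDRLEN;
		break;
	case SK_PROTO_ESP:
		len = SK_IP6_HDRLEN + SK_ESP_HDRLEN;
		break;
	default:
		/* Extension headers or other protocols: keep RFC maximum. */
		len = SK_ICMP6_PLD_MAX;
		break;
	}
	/* Never claim more than the packet actually contains. */
	if (len > pktlen)
		len = pktlen;
	return len;
}
```

Under these assumptions a full-size TCP packet keeps 100 bytes, UDP
and ESP keep 48 bytes each, and everything else still keeps up to
1232 bytes.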

OK to violate RFC for performance and simplicity?

bluhm

Index: netinet6/ip6_forward.c
===================================================================
RCS file: /data/mirror/openbsd/cvs/src/sys/netinet6/ip6_forward.c,v
diff -u -p -r1.121 ip6_forward.c
--- netinet6/ip6_forward.c	9 Jul 2024 09:33:13 -0000	1.121
+++ netinet6/ip6_forward.c	12 Jul 2024 09:27:23 -0000
@@ -145,10 +145,33 @@ ip6_forward(struct mbuf *m, struct route
 	 * Thanks to M_EXT, in most cases copy will not occur.
 	 * For small packets copy original onto stack instead of mbuf.
 	 *
+	 * For final protocol header like TCP or UDP, full header chain in
+	 * ICMP6 packet is not necessary.  In this case only copy small
+	 * part of original packet and save it on stack instead of mbuf.
+	 * Although this violates RFC 4443 2.4. (c), it avoids additional
+	 * mbuf allocations.  Also pf nat and rdr do not affect the shared
+	 * mbuf cluster.
+	 *
 	 * It is important to save it before IPsec processing as IPsec
 	 * processing may modify the mbuf.
 	 */
-	icmp_len = min(m->m_pkthdr.len, ICMPV6_PLD_MAXLEN);
+	switch (ip6->ip6_nxt) {
+	case IPPROTO_TCP:
+		icmp_len = sizeof(struct ip6_hdr) + sizeof(struct tcphdr) +
+		    MAX_TCPOPTLEN;
+		break;
+	case IPPROTO_UDP:
+		icmp_len = sizeof(struct ip6_hdr) + sizeof(struct udphdr);
+		break;
+	case IPPROTO_ESP:
+		icmp_len = sizeof(struct ip6_hdr) + 2 * sizeof(u_int32_t);
+		break;
+	default:
+		icmp_len = ICMPV6_PLD_MAXLEN;
+		break;
+	}
+	if (icmp_len > m->m_pkthdr.len)
+		icmp_len = m->m_pkthdr.len;
 	if (icmp_len <= sizeof(icmp_buf)) {
 		mflags = m->m_flags;
 		pfflags = m->m_pkthdr.pf.flags;