Index | Thread | Search

From:
David Gwynne <david@gwynne.id.au>
Subject:
Re: ipv4 icmp_reflect() source address selection optimisation
To:
tech@openbsd.org
Date:
Mon, 29 Jul 2024 09:21:25 +1000

Download raw body.

Thread
On Thu, Jul 25, 2024 at 02:55:39PM +1000, David Gwynne wrote:
> this changes icmp_reflect() so it doesn't do work that ip_output is
> going to do for it anyway.
> 
> if the original packet we need to send some icmp about was for us,
> we can send the icmp packet back to the sender from our local address.
> however, if we're a router we need to pick one of our own ips to send
> that reply from. currently icmp_reflect fills in the icmp packets
> src ip by doing a route lookup for the new dst address and using
> the local ip from the route entry.
> 
> however, picking a local source address is not an icmp specific
> problem. other parts of the stack solve this problem by leaving the
> src ip unset (0.0.0.0) and ip_forward fixes it up using the address
> on the route entry made against the dst ip.
> 
> ip_output has to do the route lookup anyway, so save a bit of effort by
> not doing it in icmp_reflect.
> 
> this also has the benefit that there's one less place implementing
> this semantic, so there's less to fix up if we ever want to tweak
> source address selection based on route lookups.
> 
> ok?
>

after talking to claudio, i think the actual source address selection in
icmp_reflect can be improved.

this diff still delegates setting the source address based on a
destination route lookup to ip_output(), but it adds some extra attempts
in icmp_reflect() itself to pick a good source address.

if original packet is for a local address on the host, then we use the
local address as the source for the icmp reflected packet, as we do now.

however, if we're routing the packet, the way we pick a source address
now has some extra steps. firstly, we try and use a "route sourceaddr"
as we do for socket addresses. this means if we have a link local or
host route for the icmp packet destination we prefer to use that, and
then fall through to using the address configured with "route
sourceaddr" next. if that doesn't exist then we try and use an ip
address from the interface the icmp packet was received on. if all that
fails, we let ip_output set it.

why do we care about this when the status quo has been fine for so long
you ask? 

our stack is a lot more complicated now, and things like pf and ipsec
mean that it is possible to reroute packets that would otherwise be
dropped. the pf_forward regress test demonstrates some of them, but the
most interesting one is when we generate an icmp reply that's supposed
to head toward a blackholed prefix, but pf reply-to tries to fix it up.
blackhole routes are only really supported on loopback interfaces, which
means pretty much every icmp reflected packet ends up using the local ip
on lo0. without pf, these replies would be dropped, but with reply-to we
suddnely want to send an icmp packet with the local ip from lo0, which
is 127.0.0.1, which also should never appear on the wire.

these changes mean we try a lot harder to use a usable ip as the source
address on icmp reflected packets, and will hopefully let us clean up a
fixup in pf_route for handling packets that were generated with loopback
ips on them.

ok?

Index: ip_icmp.c
===================================================================
RCS file: /cvs/src/sys/netinet/ip_icmp.c,v
diff -u -p -r1.196 ip_icmp.c
--- ip_icmp.c	14 Jul 2024 18:53:39 -0000	1.196
+++ ip_icmp.c	28 Jul 2024 22:59:21 -0000
@@ -684,7 +684,8 @@ icmp_reflect(struct mbuf *m, struct mbuf
 	struct ip *ip = mtod(m, struct ip *);
 	struct mbuf *opts = NULL;
 	struct sockaddr_in sin;
-	struct rtentry *rt = NULL;
+	struct rtentry *rt;
+	struct in_addr ip_src = { INADDR_ANY };
 	int optlen = (ip->ip_hl << 2) - sizeof(struct ip);
 	u_int rtableid;
 	u_int8_t pfflags;
@@ -701,10 +702,6 @@ icmp_reflect(struct mbuf *m, struct mbuf
 		return (ELOOP);
 	}
 	rtableid = m->m_pkthdr.ph_rtableid;
-	pfflags = m->m_pkthdr.pf.flags;
-	m_resethdr(m);
-	m->m_pkthdr.ph_rtableid = rtableid;
-	m->m_pkthdr.pf.flags = pfflags & PF_TAG_GENERATED;
 
 	/*
 	 * If the incoming packet was addressed directly to us,
@@ -718,19 +715,24 @@ icmp_reflect(struct mbuf *m, struct mbuf
 		sin.sin_addr = ip->ip_dst;
 
 		rt = rtalloc(sintosa(&sin), 0, rtableid);
-		if (rtisvalid(rt) &&
-		    ISSET(rt->rt_flags, RTF_LOCAL|RTF_BROADCAST))
-			ia = ifatoia(rt->rt_ifa);
-	}
+		if (rtisvalid(rt)) {
+			if (ISSET(rt->rt_flags, RTF_LOCAL))
+				ip_src = ip->ip_dst;
+			else if (ISSET(rt->rt_flags, RTF_BROADCAST)) {
+				ia = ifatoia(rt->rt_ifa);
+				ip_src = ia->ia_addr.sin_addr;
+			}
+		}
+		rtfree(rt);
+	} else
+		ip_src = ia->ia_addr.sin_addr;
 
 	/*
 	 * The following happens if the packet was not addressed to us.
-	 * Use the new source address and do a route lookup. If it fails
-	 * drop the packet as there is no path to the host.
+	 * If we're directly connected use the closest address, otherwise
+	 * try to use the sourceaddr from the routing table.
 	 */
-	if (ia == NULL) {
-		rtfree(rt);
-
+	if (ip_src.s_addr == INADDR_ANY) {
 		memset(&sin, 0, sizeof(sin));
 		sin.sin_len = sizeof(sin);
 		sin.sin_family = AF_INET;
@@ -738,21 +740,56 @@ icmp_reflect(struct mbuf *m, struct mbuf
 
 		/* keep packet in the original virtual instance */
 		rt = rtalloc(sintosa(&sin), RT_RESOLVE, rtableid);
-		if (rt == NULL) {
-			ipstat_inc(ips_noroute);
-			m_freem(m);
-			return (EHOSTUNREACH);
+		if (rtisvalid(rt) &&
+		    ISSET(rt->rt_flags, RTF_LLINFO|RTF_HOST)) {
+			ia = ifatoia(rt->rt_ifa);
+			ip_src = ia->ia_addr.sin_addr;
+		} else {
+			struct sockaddr *sourceaddr;
+			struct ifaddr *ifa;
+
+			sourceaddr = rtable_getsource(rtableid, AF_INET);
+			if (sourceaddr != NULL) {
+				ifa = ifa_ifwithaddr(sourceaddr, rtableid);
+				if (ifa != NULL &&
+				    ISSET(ifa->ifa_ifp->if_flags, IFF_UP))
+					ip_src = satosin(sourceaddr)->sin_addr;
+			}
 		}
+		rtfree(rt);
+	}
 
-		ia = ifatoia(rt->rt_ifa);
+	/*
+	 * If the above didn't find an ip_src, get the IP of the
+	 * interface the original packet was received on. If all this
+	 * comes up with nothing, ip_output() will try and fill it
+	 * in for us.
+	 */
+	if (ip_src.s_addr == INADDR_ANY) {
+		struct ifnet *ifp;
+		struct ifaddr *ifa;
+
+		ifp = if_get(m->m_pkthdr.ph_ifidx);
+		if (ifp != NULL) {
+			TAILQ_FOREACH(ifa, &ifp->if_addrlist, ifa_list) {
+				if (ifa->ifa_addr->sa_family != AF_INET)
+					continue;
+
+				ip_src = satosin(ifa->ifa_addr)->sin_addr;
+				break;
+			}
+		}
+		if_put(ifp);
 	}
 
+	pfflags = m->m_pkthdr.pf.flags;
+
+	m_resethdr(m);
+	m->m_pkthdr.ph_rtableid = rtableid;
+	m->m_pkthdr.pf.flags = pfflags & PF_TAG_GENERATED;
 	ip->ip_dst = ip->ip_src;
+	ip->ip_src = ip_src;
 	ip->ip_ttl = MAXTTL;
-
-	/* It is safe to dereference ``ia'' iff ``rt'' is valid. */
-	ip->ip_src = ia->ia_addr.sin_addr;
-	rtfree(rt);
 
 	if (optlen > 0) {
 		u_char *cp;