Download raw body.
ipv4 icmp_reflect() source address selection optimisation
On Tue, Aug 06, 2024 at 04:57:33PM +0200, Alexander Bluhm wrote:
> On Thu, Jul 25, 2024 at 02:55:39PM +1000, David Gwynne wrote:
> > this changes icmp_reflect() so it doesn't do work that ip_output is
> > going to do for it anyway.
> >
> > if the original packet we need to send some icmp about was for us,
> > we can send the icmp packet back to the sender from our local address.
> > however, if we're a router we need to pick one of our own ips to send
> > that reply from. currently icmp_reflect fills in the icmp packets
> > src ip by doing a route lookup for the new dst address and using
> > the local ip from the route entry.
> >
> > however, picking a local source address is not an icmp specific
> > problem. other parts of the stack solve this problem by leaving the
> > src ip unset (0.0.0.0) and ip_forward fixes it up using the address
> > on the route entry made against the dst ip.
> >
> > ip_output has to do the route lookup anyway, so save a bit of effort by
> > not doing it in icmp_reflect.
> >
> > this also has the benefit that there's one less place implementing
> > this semantic, so there's less to fix up if we ever want to tweak
> > source address selection based on route lookups.
> >
> > ok?
>
> This change breaks regress/sys/netinet/ipsec.
>
> But not IPsec itself is broken, just the behavior changes. I had
> to add an explicit flow for ICMP to test path MTU. Now the source
> address of the ICMP packet is different and the packet does not
> match anymore.
>
> Of course it would be easy to adapt the test. But is the new
> behavior better? You code selects the first address of the interface
> where the original packet arrived. Before the address associated
> with the route was used. In IPv4 we do source address selection
> based on routing table. Using the local IP address of the route
> the ICMP packet goes to or the original packet came from makes more
> sense to me than to select the first address of the interface.
(in my opinion) the way IPsec is integrated into the network stack, and
the SPD in particular, was a mistake. anyone using it for anything
except the absolute simplest setups has to be prepared for "interesting
times".
this means in terms of stack features it's toward the bottom of my
list of priorities. im not saying i don't care about it, just that
if i have to choose between ipsec and the other parts of the stack
like routing, pf, tunnel interfaces, etc, i'll prefer to make the
other things work better cos ipsec is annoying and complicated by
design. i might be^W^Wam biased, but id argue that complicated ipsec
setups would benefit from migrating to sec(4) and using routes and
pf for policy.
tl;dr: i believe this change would let us simplify pf_route (the
loopback ip handling specifically), so it's worth it.
> bluhm
>
> > Index: ip_icmp.c
> > ===================================================================
> > RCS file: /cvs/src/sys/netinet/ip_icmp.c,v
> > diff -u -p -r1.196 ip_icmp.c
> > --- ip_icmp.c 14 Jul 2024 18:53:39 -0000 1.196
> > +++ ip_icmp.c 25 Jul 2024 04:53:21 -0000
> > @@ -684,7 +684,7 @@ icmp_reflect(struct mbuf *m, struct mbuf
> > struct ip *ip = mtod(m, struct ip *);
> > struct mbuf *opts = NULL;
> > struct sockaddr_in sin;
> > - struct rtentry *rt = NULL;
> > + struct in_addr ip_src = { INADDR_ANY };
> > int optlen = (ip->ip_hl << 2) - sizeof(struct ip);
> > u_int rtableid;
> > u_int8_t pfflags;
> > @@ -710,49 +710,33 @@ icmp_reflect(struct mbuf *m, struct mbuf
> > * If the incoming packet was addressed directly to us,
> > * use dst as the src for the reply. For broadcast, use
> > * the address which corresponds to the incoming interface.
> > + * ip_output() will fix up ip_src if it's still 0.0.0.0
> > + * after this.
> > */
> > if (ia == NULL) {
> > + struct rtentry *rt;
> > +
> > memset(&sin, 0, sizeof(sin));
> > sin.sin_len = sizeof(sin);
> > sin.sin_family = AF_INET;
> > sin.sin_addr = ip->ip_dst;
> >
> > rt = rtalloc(sintosa(&sin), 0, rtableid);
> > - if (rtisvalid(rt) &&
> > - ISSET(rt->rt_flags, RTF_LOCAL|RTF_BROADCAST))
> > - ia = ifatoia(rt->rt_ifa);
> > - }
> > -
> > - /*
> > - * The following happens if the packet was not addressed to us.
> > - * Use the new source address and do a route lookup. If it fails
> > - * drop the packet as there is no path to the host.
> > - */
> > - if (ia == NULL) {
> > - rtfree(rt);
> > -
> > - memset(&sin, 0, sizeof(sin));
> > - sin.sin_len = sizeof(sin);
> > - sin.sin_family = AF_INET;
> > - sin.sin_addr = ip->ip_src;
> > -
> > - /* keep packet in the original virtual instance */
> > - rt = rtalloc(sintosa(&sin), RT_RESOLVE, rtableid);
> > - if (rt == NULL) {
> > - ipstat_inc(ips_noroute);
> > - m_freem(m);
> > - return (EHOSTUNREACH);
> > + if (rtisvalid(rt)) {
> > + if (ISSET(rt->rt_flags, RTF_LOCAL))
> > + ip_src = ip->ip_dst;
> > + else if (ISSET(rt->rt_flags, RTF_BROADCAST)) {
> > + ia = ifatoia(rt->rt_ifa);
> > + ip_src = ia->ia_addr.sin_addr;
> > + }
> > }
> > -
> > - ia = ifatoia(rt->rt_ifa);
> > - }
> > + rtfree(rt);
> > + } else
> > + ip_src = ia->ia_addr.sin_addr;
> >
> > ip->ip_dst = ip->ip_src;
> > + ip->ip_src = ip_src;
> > ip->ip_ttl = MAXTTL;
> > -
> > - /* It is safe to dereference ``ia'' iff ``rt'' is valid. */
> > - ip->ip_src = ia->ia_addr.sin_addr;
> > - rtfree(rt);
> >
> > if (optlen > 0) {
> > u_char *cp;
ipv4 icmp_reflect() source address selection optimisation