From: Alexandr Nedvedicky Subject: Re: ipv4 icmp_reflect() source address selection optimisation To: David Gwynne Cc: tech@openbsd.org Date: Mon, 5 Aug 2024 10:39:24 +0200 Hello, On Mon, Jul 29, 2024 at 09:21:25AM +1000, David Gwynne wrote: > > after talking to claudio, i think the actual source address selection in > icmp_reflect can be improved. > > this diff still delegates setting the source address based on a > destination route lookup to ip_output(), but it adds some extra attempts > in icmp_reflect() itself to pick a good source address. > > if original packet is for a local address on the host, then we use the > local address as the source for the icmp reflected packet, as we do now. > > however, if we're routing the packet, the way we pick a source address > now has some extra steps. firstly, we try and use a "route sourceaddr" > as we do for socket addresses. this means if we have a link local or > host route for the icmp packet destination we prefer to use that, and > then fall through to using the address configured with "route > sourceaddr" next. if that doesn't exist then we try and use an ip > address from the interface the icmp packet was received on. if all that > fails, we let ip_output set it. > > why do we care about this when the status quo has been fine for so long > you ask? > > our stack is a lot more complicated now, and things like pf and ipsec > mean that it is possible to reroute packets that would otherwise be > dropped. the pf_forward regress test demonstrates some of them, but the > most interesting one is when we generate an icmp reply that's supposed > to head toward a blackholed prefix, but pf reply-to tries to fix it up. > blackhole routes are only really supported on loopback interfaces, which > means pretty much every icmp reflected packet ends up using the local ip > on lo0. without pf, these replies would be dropped, but with reply-to we > suddnely want to send an icmp packet with the local ip from lo0, which > is 127.0.0.1, which also should never appear on the wire. > > these changes mean we try a lot harder to use a usable ip as the source > address on icmp reflected packets, and will hopefully let us clean up a > fixup in pf_route for handling packets that were generated with loopback > ips on them. > it all make sense. It's definitely an improvement to what we have. I have one comment/question to your diff. see in-line. > > Index: ip_icmp.c > =================================================================== > RCS file: /cvs/src/sys/netinet/ip_icmp.c,v > diff -u -p -r1.196 ip_icmp.c > --- ip_icmp.c 14 Jul 2024 18:53:39 -0000 1.196 > +++ ip_icmp.c 28 Jul 2024 22:59:21 -0000 > @@ -684,7 +684,8 @@ icmp_reflect(struct mbuf *m, struct mbuf > struct ip *ip = mtod(m, struct ip *); > struct mbuf *opts = NULL; > struct sockaddr_in sin; > - struct rtentry *rt = NULL; > + struct rtentry *rt; > + struct in_addr ip_src = { INADDR_ANY }; > int optlen = (ip->ip_hl << 2) - sizeof(struct ip); > u_int rtableid; > u_int8_t pfflags; > @@ -701,10 +702,6 @@ icmp_reflect(struct mbuf *m, struct mbuf > return (ELOOP); > } > rtableid = m->m_pkthdr.ph_rtableid; > - pfflags = m->m_pkthdr.pf.flags; > - m_resethdr(m); > - m->m_pkthdr.ph_rtableid = rtableid; > - m->m_pkthdr.pf.flags = pfflags & PF_TAG_GENERATED; > > /* > * If the incoming packet was addressed directly to us, > @@ -718,19 +715,24 @@ icmp_reflect(struct mbuf *m, struct mbuf > sin.sin_addr = ip->ip_dst; > > rt = rtalloc(sintosa(&sin), 0, rtableid); > - if (rtisvalid(rt) && > - ISSET(rt->rt_flags, RTF_LOCAL|RTF_BROADCAST)) > - ia = ifatoia(rt->rt_ifa); > - } > + if (rtisvalid(rt)) { > + if (ISSET(rt->rt_flags, RTF_LOCAL)) > + ip_src = ip->ip_dst; > + else if (ISSET(rt->rt_flags, RTF_BROADCAST)) { > + ia = ifatoia(rt->rt_ifa); > + ip_src = ia->ia_addr.sin_addr; > + } > + } > + rtfree(rt); > + } else > + ip_src = ia->ia_addr.sin_addr; > I would expect to find ip_src address in ->ia_list. Are we sure ia_addr holds IP address attached to interface? From seeing the places where we call icmp_reflect() this else branch seems to handle some MPLS case. So there is a chance you code is still right and I miss something here. thanks and regards sashan