Index | Thread | Search

From:
Claudio Jeker <cjeker@diehard.n-r-g.com>
Subject:
Re: route reject/blackhole semantics
To:
Alexander Bluhm <bluhm@openbsd.org>
Cc:
David Gwynne <david@gwynne.id.au>, tech@openbsd.org
Date:
Tue, 30 Jul 2024 08:36:11 +0200

Download raw body.

Thread
On Mon, Jul 29, 2024 at 05:15:56PM +0200, Alexander Bluhm wrote:
> On Mon, Jul 29, 2024 at 09:59:22AM +1000, David Gwynne wrote:
> > every few years i am surprised to learn that the RTF_REJECT and
> > RTF_BLACKHOLE flags on routes are only really checked when going
> > "out" on loopback interfaces. my expectation is that these route
> > checks happen when the route lookup happens. it's especially
> > surprising that routes with reject/blackhole flags can make it into
> > pf.
> 
> The lookup happens way before as an optmization.  As we need to
> know whether an address is local, we do route lookup in IP input.
> But the reject info is needed when we decide whether we should use
> the route for output.

It is still crazy that the reject or blackhole happens inside lo(4) that
is later than necessary.
 
> > this diff moves the reject/blackhole route checks into ip_output before
> > pf_test is called.
> 
> But this prevents that pf route-to can override a route.  Does my
> pf_forward regress test still pass with this change?  My understanding
> of route-to is that it can work independently of IP output routing

I think there should be a limit in what pf can screw around with. As an
example if you have no default route and no route to 10.0.0.1. Then
ip_forward will not pass the packet to pf_test in ip_output.
If you have a 10/8 reject or blackhole route suddenly 10.0.0.1 will show
up in pf_test in ip_output. Which is in my opinion very surprising and
wrong.

If you route to on an in rule it still works and it does in both cases.
 
> > i am not alone in my confusion. claudio@ seems keen for this change, but
> > also because he thinks it can help simplify handling of blackhole/reject
> > routing in things like bgpd, but mostly that it is confusing.
> 
> BSD4.4 seems to use RTF_REJECT only for routes without valid ARP
> entry.  Who know what he semantics of reject routes should be today.

If you run a router in the default free zone you need either some reject
or blackhole routes. I normaly use blackhole sind I see no point in
sending back ICMP errors to not existing net blocks.
 
> > because the stack checks those flags, lo(4) output doesnt have to
> > anymore. however, if you do want to "null route" a prefix so it usually
> > wont end up on the wire, but give pf a chance to fiddle with it, you can
> > still add a route via lo(4) for it. which leads to a bug fix.
> 
> Is it still possible to divert-to an address that was set with
> SO_BINDANY?  They do not have a RTF_LOCAL route.  There are a lot
> of corner cases in this area that should be considered.

Guess we need to look into those.
 
> > you can add routes like this (even without the reject/blackhole diffs):
> > 
> > # route add 100.64.0.0/24 127.0.0.1
> > 
> > if you then ping 100.64.0.1, you'll end up producing a traffic loop
> > inside the kernel. each ping packet will loop through lo0 until the ttl
> > expires. i think lo ouput should check that the address it's delivering
> > to has RTF_LOCAL set, which is what the if_loop.c chunk of the diff
> > does.
> > 
> > thoughts? tests? ok?
> 
> Why did you not change ip6_forward()?  Now it behaves differently
> than ip_forward() and ip6_output().  It is a typical pitfall that
> IPv4 forward calls IP output, but IPv6 forward calls interface
> output.
> 
> To me this looks very risky.  I assume that I will see fallout when
> our customers will install our product based on this diff in more
> than a year.  Usually I write regress tests to be aware of changes
> and keep behavior.
> 
> What are the benefits of the change?  Apart that the behavior is
> differnt and that it confuses you less?

We do not fully process a packet to drop it on the floor. Which is a big
deal when you blackhole a route to sink an attack.
I also think the behaviour is a lot better and will allow to make pf
route-to and reply-to to behave less crazy. There is no more needs to
fixup ICMP packets from 127.0.0.1. I think there are numerous cases that
improve the situation and thing that break currently work mostly by luck.
 
> bluhm
> 
> > Index: net/if_loop.c
> > ===================================================================
> > RCS file: /cvs/src/sys/net/if_loop.c,v
> > diff -u -p -r1.98 if_loop.c
> > --- net/if_loop.c	29 Dec 2023 11:43:04 -0000	1.98
> > +++ net/if_loop.c	28 Jul 2024 11:58:10 -0000
> > @@ -261,10 +261,10 @@ looutput(struct ifnet *ifp, struct mbuf 
> >  	if ((m->m_flags & M_PKTHDR) == 0)
> >  		panic("%s: no header mbuf", __func__);
> >  
> > -	if (rt && rt->rt_flags & (RTF_REJECT|RTF_BLACKHOLE)) {
> > +	if (rt != NULL && !ISSET(rt->rt_flags, RTF_LOCAL)) {
> >  		m_freem(m);
> > -		return (rt->rt_flags & RTF_BLACKHOLE ? 0 :
> > -			rt->rt_flags & RTF_HOST ? EHOSTUNREACH : ENETUNREACH);
> > +		return (ISSET(rt->rt_flags, RTF_HOST) ?
> > +		    EHOSTUNREACH : ENETUNREACH);
> >  	}
> >  
> >  	/*
> > Index: netinet/ip_output.c
> > ===================================================================
> > RCS file: /cvs/src/sys/netinet/ip_output.c,v
> > diff -u -p -r1.401 ip_output.c
> > --- netinet/ip_output.c	2 Jul 2024 18:33:47 -0000	1.401
> > +++ netinet/ip_output.c	28 Jul 2024 11:58:10 -0000
> > @@ -398,6 +398,20 @@ sendit:
> >  	}
> >  #endif /* IPSEC */
> >  
> > +	if (ro && ro->ro_rt) {
> > +		struct rtentry *rt = ro->ro_rt;
> > +
> > +		if (ISSET(rt->rt_flags, RTF_REJECT)) {
> > +			error = ISSET(rt->rt_flags, RTF_HOST) ?
> > +			    EHOSTUNREACH : ENETUNREACH;
> > +			goto bad;
> > +		}
> > +		if (ISSET(rt->rt_flags, RTF_BLACKHOLE)) {
> > +			error = 0;
> > +			goto bad;
> > +		}
> > +	}
> > +
> >  	/*
> >  	 * Packet filter
> >  	 */
> > Index: netinet6/ip6_output.c
> > ===================================================================
> > RCS file: /cvs/src/sys/netinet6/ip6_output.c,v
> > diff -u -p -r1.292 ip6_output.c
> > --- netinet6/ip6_output.c	4 Jul 2024 12:50:08 -0000	1.292
> > +++ netinet6/ip6_output.c	28 Jul 2024 11:58:10 -0000
> > @@ -483,9 +483,20 @@ reroute:
> >  		route6_cache(ro, &ip6->ip6_dst, NULL, m->m_pkthdr.ph_rtableid);
> >  	}
> >  
> > -	if (rt && (rt->rt_flags & RTF_GATEWAY) &&
> > -	    !IN6_IS_ADDR_MULTICAST(&ip6->ip6_dst))
> > -		dst = satosin6(rt->rt_gateway);
> > +	if (rt != NULL) {
> > +		if (ISSET(rt->rt_flags, RTF_REJECT)) {
> > +			error = ISSET(rt->rt_flags, RTF_HOST) ?
> > +			    EHOSTUNREACH : ENETUNREACH;
> > +			goto bad;
> > +		}
> > +		if (ISSET(rt->rt_flags, RTF_BLACKHOLE)) {
> > +			error = 0;
> > +			goto bad;
> > +		}
> > +		if (ISSET(rt->rt_flags, RTF_GATEWAY) &&
> > +		    !IN6_IS_ADDR_MULTICAST(&ip6->ip6_dst))
> > +			dst = satosin6(rt->rt_gateway);
> > +	}
> >  
> >  	if (!IN6_IS_ADDR_MULTICAST(&ip6->ip6_dst)) {
> >  		/* Unicast */
> 

-- 
:wq Claudio