Index | Thread | Search

From:
Claudio Jeker <cjeker@diehard.n-r-g.com>
Subject:
Re: route cache mpath
To:
Alexander Bluhm <alexander.bluhm@gmx.net>
Cc:
tech@openbsd.org
Date:
Tue, 27 Feb 2024 11:07:41 +0100

Download raw body.

Thread
On Mon, Feb 26, 2024 at 08:32:31PM +0100, Alexander Bluhm wrote:
> On Mon, Feb 26, 2024 at 05:22:50PM +0100, Claudio Jeker wrote:
> > On Mon, Feb 26, 2024 at 03:51:37PM +0100, Alexander Bluhm wrote:
> > > On Mon, Feb 26, 2024 at 11:56:08AM +0100, Claudio Jeker wrote:
> > > > On Mon, Feb 26, 2024 at 11:44:57AM +0100, Alexander Bluhm wrote:
> > > > > Hi,
> > > > > 
> > > > > We can combine route_cache() and rtalloc_mpath() in a new route_mpath()
> > > > > function.  Then the caller does not have to care about the uint32_t
> > > > > *src pointer and just pass struct in_addr.  All the conversions are
> > > > > done inside the functions.  ro->ro_rt is either valid or NULL.
> > > > > 
> > > > > Also IP input should pass a struct route to IP forward.  This is
> > > > > the same logic that is done when passing a route from IP forward
> > > > > to IP output.  As a result the numbers of route cache lookups in
> > > > > netstat -s should be correct now.
> > > > > 
> > > > > Finally I removed some inconsistencies between IPv4 and IPv4 and
> > > > > IP forward and IP output.
> > > > > 
> > > > > ok?
> > > > 
> > > > I would prefer if rtalloc_mpath() is fixed and just takes a struct route *
> > > > as argument. This uint32_t * in rtalloc_mpath() and rtable_match() needs
> > > > to be changed. Passing the source as a uint32_t * is bad in many regards.
> > > 
> > > Do you have a diff to explain what you want?
> > > 
> > > route6_mpath() takes struct in_addr and writes it into struct route.
> > > What is bad about this design?
> > 
> > Nothing. I wonder if we need yet another wrapper around a wrapper so that
> > another wrapper can wrap it into a wrapper before passing it the function
> > doing the lookup.
> > 
> > If rtalloc_mpath() would be as simple as
> >    struct rtentry *rtalloc_mpath(const struct route *);
> > then a lot of this would just go away since it is not needed.
> > Actually most of the code of route_mpath() and route6_mpath() could be
> > collapsed into rtalloc_mpath() (or vice versa).
> 
> We can get rid of rtalloc_mpath() later.  There are only 3 callers
> left.  pf route-to does not use struct route yet.  IP output uses
> route cache without route lookup for multicast.  Looks wrong, has
> to be investigated.
> 
> After that fixed, we can remove rtalloc_mpath() and call rt_match()
> from route_mpath().  This will remove the wrapper.

That is an option. Maybe it is good to move away from rtalloc_xyz() to
route_xyz() to improve the semantics.
 
> > Btw. I think the check to set 's' in route6_mpath() is reversed.
> 
> Good find, the ! was missing.  As multipath with IPv6 is broken
> anyway, test could not find it.

Is it still borked with your diff in? Why is IPv6 always so stupidly
special? No wonder adoption rate after 30years is nowhere.
 
> > > We might get rid of rtalloc_mpath().  But then we have to use route
> > > cache everywhere.  Rewriting the uint32_t * mess in the routing
> > > code is not on my agenda.  I try to improve things in small steps.
> > > Moving part of rtalloc_mpath() calls to route6_mpath() makes it
> > > better.  Are route6_mpath() parameter the arguments you wish?
> > 
> > > > I like the changes to ip_forward and ip6_forward and the general trend to
> > > > make code more consistent.
> > > 
> > > Splitting the diff to change ip_forward() first is hard.  The API
> > > of route6_mpath() or rtalloc_mpath() decides the way how consistent
> > > code should look like.
> > > 
> > > How do we proceed?  I don't understand how the code you wish looks.
> > 
> > Not against moving on with this but ideally there will be a few more steps
> > after this that slowly make rtalloc() and rtalloc_mpath() better functions.
> > As I said I like the trajectory of this work since it does a lot of good
> > cleanup at higher levels.
> >  
> > > > > Or should I split the diff in smaller pieces?
> 
> To make review easier I can split introduction of route_mpath()
> into smaller diff.  This contains easy callers in in_pcb.
> 
> Note that I removed useless rt != NULL check in in6_selectif().
> 
> ok?

One unrelated comment below. There is a slight change in behaviour because
of the new rtisvalid() check in route_mpath(). IMO it is best to put this
in so it gets widespread testing ASAP. OK claudio@
 
> bluhm
> 
> Index: net/route.c
> ===================================================================
> RCS file: /data/mirror/openbsd/cvs/src/sys/net/route.c,v
> diff -u -p -r1.433 route.c
> --- net/route.c	22 Feb 2024 14:25:58 -0000	1.433
> +++ net/route.c	26 Feb 2024 18:19:42 -0000
> @@ -239,6 +239,28 @@ route_cache(struct route *ro, const stru
>  	return (ESRCH);
>  }
>  
> +/*
> + * Check cache for route, else allocate a new one, potentially using multipath
> + * to select the peer.  Update cache and return valid route or NULL.
> + */
> +struct rtentry *
> +route_mpath(struct route *ro, const struct in_addr *dst,
> +    const struct in_addr *src, u_int rtableid)
> +{
> +	if (route_cache(ro, dst, src, rtableid)) {
> +		uint32_t *s = NULL;
> +
> +		if (ro->ro_srcin.s_addr != INADDR_ANY)
> +			s = &ro->ro_srcin.s_addr;
> +		ro->ro_rt = rtalloc_mpath(&ro->ro_dstsa, s, ro->ro_tableid);
> +		if (!rtisvalid(ro->ro_rt)) {
> +			rtfree(ro->ro_rt);
> +			ro->ro_rt = NULL;
> +		}
> +	}
> +	return (ro->ro_rt);
> +}
> +
>  #ifdef INET6
>  int
>  route6_cache(struct route *ro, const struct in6_addr *dst,
> @@ -276,6 +298,24 @@ route6_cache(struct route *ro, const str
>  		ro->ro_srcin6 = *src;
>  
>  	return (ESRCH);
> +}
> +
> +struct rtentry *
> +route6_mpath(struct route *ro, const struct in6_addr *dst,
> +    const struct in6_addr *src, u_int rtableid)
> +{
> +	if (route6_cache(ro, dst, src, rtableid)) {
> +		uint32_t *s = NULL;
> +
> +		if (!IN6_IS_ADDR_UNSPECIFIED(&ro->ro_srcin6))
> +			s = &ro->ro_srcin6.s6_addr32[0];
> +		ro->ro_rt = rtalloc_mpath(&ro->ro_dstsa, s, ro->ro_tableid);
> +		if (!rtisvalid(ro->ro_rt)) {
> +			rtfree(ro->ro_rt);
> +			ro->ro_rt = NULL;
> +		}
> +	}
> +	return (ro->ro_rt);
>  }
>  #endif
>  
> Index: net/route.h
> ===================================================================
> RCS file: /data/mirror/openbsd/cvs/src/sys/net/route.h,v
> diff -u -p -r1.207 route.h
> --- net/route.h	22 Feb 2024 14:25:58 -0000	1.207
> +++ net/route.h	26 Feb 2024 18:19:42 -0000
> @@ -465,7 +465,11 @@ struct bfd_config;
>  void	 route_init(void);
>  int	 route_cache(struct route *, const struct in_addr *,
>  	    const struct in_addr *, u_int);
> +struct rtentry *route_mpath(struct route *, const struct in_addr *,
> +	    const struct in_addr *, u_int);
>  int	 route6_cache(struct route *, const struct in6_addr *,
> +	    const struct in6_addr *, u_int);
> +struct rtentry *route6_mpath(struct route *, const struct in6_addr *,
>  	    const struct in6_addr *, u_int);
>  void	 rtm_ifchg(struct ifnet *);
>  void	 rtm_ifannounce(struct ifnet *, int);
> Index: netinet/in_pcb.c
> ===================================================================
> RCS file: /data/mirror/openbsd/cvs/src/sys/netinet/in_pcb.c,v
> diff -u -p -r1.294 in_pcb.c
> --- netinet/in_pcb.c	22 Feb 2024 14:25:58 -0000	1.294
> +++ netinet/in_pcb.c	26 Feb 2024 18:19:42 -0000
> @@ -908,23 +908,15 @@ in_pcblookup_local_lock(struct inpcbtabl
>  struct rtentry *
>  in_pcbrtentry(struct inpcb *inp)
>  {
> -	struct route *ro;
> -
>  #ifdef INET6
>  	if (ISSET(inp->inp_flags, INP_IPV6))
>  		return in6_pcbrtentry(inp);
>  #endif
>  
> -	ro = &inp->inp_route;
> -
>  	if (inp->inp_faddr.s_addr == INADDR_ANY)
>  		return (NULL);
> -	if (route_cache(ro, &inp->inp_faddr, &inp->inp_laddr,
> -	    inp->inp_rtableid)) {
> -		ro->ro_rt = rtalloc_mpath(&ro->ro_dstsa,
> -		    &inp->inp_laddr.s_addr, ro->ro_tableid);
> -	}
> -	return (ro->ro_rt);
> +	return (route_mpath(&inp->inp_route, &inp->inp_faddr, &inp->inp_laddr,
> +	    inp->inp_rtableid));
>  }
>  
>  /*
> @@ -938,7 +930,7 @@ in_pcbselsrc(struct in_addr *insrc, stru
>      struct inpcb *inp)
>  {
>  	struct ip_moptions *mopts = inp->inp_moptions;
> -	struct route *ro = &inp->inp_route;
> +	struct rtentry *rt;
>  	const struct in_addr *laddr = &inp->inp_laddr;
>  	u_int rtableid = inp->inp_rtableid;
>  	struct sockaddr	*ip4_source = NULL;
> @@ -983,17 +975,14 @@ in_pcbselsrc(struct in_addr *insrc, stru
>  	 * If route is known or can be allocated now,
>  	 * our src addr is taken from the i/f, else punt.
>  	 */
> -	if (route_cache(ro, &sin->sin_addr, NULL, rtableid)) {
> -		/* No route yet, so try to acquire one */
> -		ro->ro_rt = rtalloc_mpath(&ro->ro_dstsa, NULL, ro->ro_tableid);
> -	}
> +	rt = route_mpath(&inp->inp_route, &sin->sin_addr, NULL, rtableid);

Unrelated but this call is problematic. The issue here is that we have no
source, try to select one but then a following route_mpath() lookup may
actually return a different route. This is why local traffic using
multipath routing exits interfaces with the wrong IP.

Never figured out how this could be fixed in an elegant way.
Would it help to use toeplitz as a hash here?
 
>  	/*
>  	 * If we found a route, use the address
>  	 * corresponding to the outgoing interface.
>  	 */
> -	if (ro->ro_rt != NULL)
> -		ia = ifatoia(ro->ro_rt->rt_ifa);
> +	if (rt != NULL)
> +		ia = ifatoia(rt->rt_ifa);
>  
>  	/*
>  	 * Use preferred source address if :
> @@ -1001,8 +990,7 @@ in_pcbselsrc(struct in_addr *insrc, stru
>  	 * - preferred source address is set
>  	 * - output interface is UP
>  	 */
> -	if (ro->ro_rt && !(ro->ro_rt->rt_flags & RTF_LLINFO) &&
> -	    !(ro->ro_rt->rt_flags & RTF_HOST)) {
> +	if (rt && !(rt->rt_flags & RTF_LLINFO) && !(rt->rt_flags & RTF_HOST)) {
>  		ip4_source = rtable_getsource(rtableid, AF_INET);
>  		if (ip4_source != NULL) {
>  			struct ifaddr *ifa;
> Index: netinet6/in6_pcb.c
> ===================================================================
> RCS file: /data/mirror/openbsd/cvs/src/sys/netinet6/in6_pcb.c,v
> diff -u -p -r1.139 in6_pcb.c
> --- netinet6/in6_pcb.c	22 Feb 2024 14:25:58 -0000	1.139
> +++ netinet6/in6_pcb.c	26 Feb 2024 18:19:42 -0000
> @@ -561,16 +561,10 @@ in6_pcbnotify(struct inpcbtable *table, 
>  struct rtentry *
>  in6_pcbrtentry(struct inpcb *inp)
>  {
> -	struct route *ro = &inp->inp_route;
> -
>  	if (IN6_IS_ADDR_UNSPECIFIED(&inp->inp_faddr6))
>  		return (NULL);
> -	if (route6_cache(ro, &inp->inp_faddr6, &inp->inp_laddr6,
> -	    inp->inp_rtableid)) {
> -		ro->ro_rt = rtalloc_mpath(&ro->ro_dstsa,
> -		    &inp->inp_laddr6.s6_addr32[0], ro->ro_tableid);
> -	}
> -	return (ro->ro_rt);
> +	return (route6_mpath(&inp->inp_route, &inp->inp_faddr6,
> +	    &inp->inp_laddr6, inp->inp_rtableid));
>  }
>  
>  struct inpcb *
> Index: netinet6/in6_src.c
> ===================================================================
> RCS file: /data/mirror/openbsd/cvs/src/sys/netinet6/in6_src.c,v
> diff -u -p -r1.95 in6_src.c
> --- netinet6/in6_src.c	22 Feb 2024 14:25:58 -0000	1.95
> +++ netinet6/in6_src.c	26 Feb 2024 18:35:00 -0000
> @@ -95,7 +95,7 @@ in6_pcbselsrc(const struct in6_addr **in
>      struct inpcb *inp, struct ip6_pktopts *opts)
>  {
>  	struct ip6_moptions *mopts = inp->inp_moptions6;
> -	struct route *ro = &inp->inp_route;
> +	struct rtentry *rt;
>  	const struct in6_addr *laddr = &inp->inp_laddr6;
>  	u_int rtableid = inp->inp_rtableid;
>  	struct ifnet *ifp = NULL;
> @@ -118,7 +118,8 @@ in6_pcbselsrc(const struct in6_addr **in
>  		struct sockaddr_in6 sa6;
>  
>  		/* get the outgoing interface */
> -		error = in6_selectif(dst, opts, mopts, ro, &ifp, rtableid);
> +		error = in6_selectif(dst, opts, mopts, &inp->inp_route, &ifp,
> +		    rtableid);
>  		if (error)
>  			return (error);
>  
> @@ -179,9 +180,7 @@ in6_pcbselsrc(const struct in6_addr **in
>  	 * If route is known or can be allocated now,
>  	 * our src addr is taken from the i/f, else punt.
>  	 */
> -	if (route6_cache(ro, dst, NULL, rtableid)) {
> -		ro->ro_rt = rtalloc_mpath(&ro->ro_dstsa, NULL, ro->ro_tableid);
> -	}
> +	rt = route6_mpath(&inp->inp_route, dst, NULL, rtableid);
>  
>  	/*
>  	 * in_pcbconnect() checks out IFF_LOOPBACK to skip using
> @@ -190,14 +189,14 @@ in6_pcbselsrc(const struct in6_addr **in
>  	 * so doesn't check out IFF_LOOPBACK.
>  	 */
>  
> -	if (ro->ro_rt) {
> -		ifp = if_get(ro->ro_rt->rt_ifidx);
> +	if (rt != NULL) {
> +		ifp = if_get(rt->rt_ifidx);
>  		if (ifp != NULL) {
>  			ia6 = in6_ifawithscope(ifp, dst, rtableid);
>  			if_put(ifp);
>  		}
>  		if (ia6 == NULL) /* xxx scope error ?*/
> -			ia6 = ifatoia6(ro->ro_rt->rt_ifa);
> +			ia6 = ifatoia6(rt->rt_ifa);
>  	}
>  
>  	/*
> @@ -206,8 +205,7 @@ in6_pcbselsrc(const struct in6_addr **in
>  	 * - preferred source address is set
>  	 * - output interface is UP
>  	 */
> -	if (ro->ro_rt && !(ro->ro_rt->rt_flags & RTF_LLINFO) &&
> -	    !(ro->ro_rt->rt_flags & RTF_HOST)) {
> +	if (rt && !(rt->rt_flags & RTF_LLINFO) && !(rt->rt_flags & RTF_HOST)) {
>  		ip6_source = rtable_getsource(rtableid, AF_INET6);
>  		if (ip6_source != NULL) {
>  			struct ifaddr *ifa;
> @@ -304,11 +302,9 @@ in6_selectroute(const struct in6_addr *d
>  	 * a new one.
>  	 */
>  	if (ro) {
> -		if (route6_cache(ro, dst, NULL, rtableid)) {
> -			/* No route yet, so try to acquire one */
> -			ro->ro_rt = rtalloc_mpath(&ro->ro_dstsa, NULL,
> -			    ro->ro_tableid);
> -		}
> +		struct rtentry *rt;
> +
> +		rt = route6_mpath(ro, dst, NULL, rtableid);
>  
>  		/*
>  		 * Check if the outgoing interface conflicts with
> @@ -319,15 +315,13 @@ in6_selectroute(const struct in6_addr *d
>  		 */
>  		if (opts && opts->ip6po_pktinfo &&
>  		    opts->ip6po_pktinfo->ipi6_ifindex) {
> -			if (ro->ro_rt != NULL &&
> -			    !ISSET(ro->ro_rt->rt_flags, RTF_LOCAL) &&
> -			    ro->ro_rt->rt_ifidx !=
> -			    opts->ip6po_pktinfo->ipi6_ifindex) {
> +			if (rt != NULL && !ISSET(rt->rt_flags, RTF_LOCAL) &&
> +			    rt->rt_ifidx != opts->ip6po_pktinfo->ipi6_ifindex) {
>  			    	return (NULL);
>  			}
>  		}
>  
> -		return (ro->ro_rt);
> +		return (rt);
>  	}
>  
>  	return (NULL);
> @@ -338,7 +332,7 @@ in6_selectif(const struct in6_addr *dst,
>      struct ip6_moptions *mopts, struct route *ro, struct ifnet **retifp,
>      u_int rtableid)
>  {
> -	struct rtentry *rt = NULL;
> +	struct rtentry *rt;
>  	struct in6_pktinfo *pi = NULL;
>  
>  	/* If the caller specify the outgoing interface explicitly, use it. */
> @@ -377,11 +371,10 @@ in6_selectif(const struct in6_addr *dst,
>  	 * Although this may not be very harmful, it should still be confusing.
>  	 * We thus reject the case here.
>  	 */
> -	if (rt && (rt->rt_flags & (RTF_REJECT | RTF_BLACKHOLE)))
> +	if (ISSET(rt->rt_flags, RTF_REJECT | RTF_BLACKHOLE))
>  		return (rt->rt_flags & RTF_HOST ? EHOSTUNREACH : ENETUNREACH);
>  
> -	if (rt != NULL)
> -		*retifp = if_get(rt->rt_ifidx);
> +	*retifp = if_get(rt->rt_ifidx);
>  
>  	return (0);
>  }
> 

-- 
:wq Claudio