From: Alexander Bluhm Subject: Re: route cache mpath To: tech@openbsd.org Date: Mon, 26 Feb 2024 20:32:31 +0100 On Mon, Feb 26, 2024 at 05:22:50PM +0100, Claudio Jeker wrote: > On Mon, Feb 26, 2024 at 03:51:37PM +0100, Alexander Bluhm wrote: > > On Mon, Feb 26, 2024 at 11:56:08AM +0100, Claudio Jeker wrote: > > > On Mon, Feb 26, 2024 at 11:44:57AM +0100, Alexander Bluhm wrote: > > > > Hi, > > > > > > > > We can combine route_cache() and rtalloc_mpath() in a new route_mpath() > > > > function. Then the caller does not have to care about the uint32_t > > > > *src pointer and just pass struct in_addr. All the conversions are > > > > done inside the functions. ro->ro_rt is either valid or NULL. > > > > > > > > Also IP input should pass a struct route to IP forward. This is > > > > the same logic that is done when passing a route from IP forward > > > > to IP output. As a result the numbers of route cache lookups in > > > > netstat -s should be correct now. > > > > > > > > Finally I removed some inconsistencies between IPv4 and IPv4 and > > > > IP forward and IP output. > > > > > > > > ok? > > > > > > I would prefer if rtalloc_mpath() is fixed and just takes a struct route * > > > as argument. This uint32_t * in rtalloc_mpath() and rtable_match() needs > > > to be changed. Passing the source as a uint32_t * is bad in many regards. > > > > Do you have a diff to explain what you want? > > > > route6_mpath() takes struct in_addr and writes it into struct route. > > What is bad about this design? > > Nothing. I wonder if we need yet another wrapper around a wrapper so that > another wrapper can wrap it into a wrapper before passing it the function > doing the lookup. > > If rtalloc_mpath() would be as simple as > struct rtentry *rtalloc_mpath(const struct route *); > then a lot of this would just go away since it is not needed. > Actually most of the code of route_mpath() and route6_mpath() could be > collapsed into rtalloc_mpath() (or vice versa). We can get rid of rtalloc_mpath() later. There are only 3 callers left. pf route-to does not use struct route yet. IP output uses route cache without route lookup for multicast. Looks wrong, has to be investigated. After that fixed, we can remove rtalloc_mpath() and call rt_match() from route_mpath(). This will remove the wrapper. > Btw. I think the check to set 's' in route6_mpath() is reversed. Good find, the ! was missing. As multipath with IPv6 is broken anyway, test could not find it. > > We might get rid of rtalloc_mpath(). But then we have to use route > > cache everywhere. Rewriting the uint32_t * mess in the routing > > code is not on my agenda. I try to improve things in small steps. > > Moving part of rtalloc_mpath() calls to route6_mpath() makes it > > better. Are route6_mpath() parameter the arguments you wish? > > > > I like the changes to ip_forward and ip6_forward and the general trend to > > > make code more consistent. > > > > Splitting the diff to change ip_forward() first is hard. The API > > of route6_mpath() or rtalloc_mpath() decides the way how consistent > > code should look like. > > > > How do we proceed? I don't understand how the code you wish looks. > > Not against moving on with this but ideally there will be a few more steps > after this that slowly make rtalloc() and rtalloc_mpath() better functions. > As I said I like the trajectory of this work since it does a lot of good > cleanup at higher levels. > > > > > Or should I split the diff in smaller pieces? To make review easier I can split introduction of route_mpath() into smaller diff. This contains easy callers in in_pcb. Note that I removed useless rt != NULL check in in6_selectif(). ok? bluhm Index: net/route.c =================================================================== RCS file: /data/mirror/openbsd/cvs/src/sys/net/route.c,v diff -u -p -r1.433 route.c --- net/route.c 22 Feb 2024 14:25:58 -0000 1.433 +++ net/route.c 26 Feb 2024 18:19:42 -0000 @@ -239,6 +239,28 @@ route_cache(struct route *ro, const stru return (ESRCH); } +/* + * Check cache for route, else allocate a new one, potentially using multipath + * to select the peer. Update cache and return valid route or NULL. + */ +struct rtentry * +route_mpath(struct route *ro, const struct in_addr *dst, + const struct in_addr *src, u_int rtableid) +{ + if (route_cache(ro, dst, src, rtableid)) { + uint32_t *s = NULL; + + if (ro->ro_srcin.s_addr != INADDR_ANY) + s = &ro->ro_srcin.s_addr; + ro->ro_rt = rtalloc_mpath(&ro->ro_dstsa, s, ro->ro_tableid); + if (!rtisvalid(ro->ro_rt)) { + rtfree(ro->ro_rt); + ro->ro_rt = NULL; + } + } + return (ro->ro_rt); +} + #ifdef INET6 int route6_cache(struct route *ro, const struct in6_addr *dst, @@ -276,6 +298,24 @@ route6_cache(struct route *ro, const str ro->ro_srcin6 = *src; return (ESRCH); +} + +struct rtentry * +route6_mpath(struct route *ro, const struct in6_addr *dst, + const struct in6_addr *src, u_int rtableid) +{ + if (route6_cache(ro, dst, src, rtableid)) { + uint32_t *s = NULL; + + if (!IN6_IS_ADDR_UNSPECIFIED(&ro->ro_srcin6)) + s = &ro->ro_srcin6.s6_addr32[0]; + ro->ro_rt = rtalloc_mpath(&ro->ro_dstsa, s, ro->ro_tableid); + if (!rtisvalid(ro->ro_rt)) { + rtfree(ro->ro_rt); + ro->ro_rt = NULL; + } + } + return (ro->ro_rt); } #endif Index: net/route.h =================================================================== RCS file: /data/mirror/openbsd/cvs/src/sys/net/route.h,v diff -u -p -r1.207 route.h --- net/route.h 22 Feb 2024 14:25:58 -0000 1.207 +++ net/route.h 26 Feb 2024 18:19:42 -0000 @@ -465,7 +465,11 @@ struct bfd_config; void route_init(void); int route_cache(struct route *, const struct in_addr *, const struct in_addr *, u_int); +struct rtentry *route_mpath(struct route *, const struct in_addr *, + const struct in_addr *, u_int); int route6_cache(struct route *, const struct in6_addr *, + const struct in6_addr *, u_int); +struct rtentry *route6_mpath(struct route *, const struct in6_addr *, const struct in6_addr *, u_int); void rtm_ifchg(struct ifnet *); void rtm_ifannounce(struct ifnet *, int); Index: netinet/in_pcb.c =================================================================== RCS file: /data/mirror/openbsd/cvs/src/sys/netinet/in_pcb.c,v diff -u -p -r1.294 in_pcb.c --- netinet/in_pcb.c 22 Feb 2024 14:25:58 -0000 1.294 +++ netinet/in_pcb.c 26 Feb 2024 18:19:42 -0000 @@ -908,23 +908,15 @@ in_pcblookup_local_lock(struct inpcbtabl struct rtentry * in_pcbrtentry(struct inpcb *inp) { - struct route *ro; - #ifdef INET6 if (ISSET(inp->inp_flags, INP_IPV6)) return in6_pcbrtentry(inp); #endif - ro = &inp->inp_route; - if (inp->inp_faddr.s_addr == INADDR_ANY) return (NULL); - if (route_cache(ro, &inp->inp_faddr, &inp->inp_laddr, - inp->inp_rtableid)) { - ro->ro_rt = rtalloc_mpath(&ro->ro_dstsa, - &inp->inp_laddr.s_addr, ro->ro_tableid); - } - return (ro->ro_rt); + return (route_mpath(&inp->inp_route, &inp->inp_faddr, &inp->inp_laddr, + inp->inp_rtableid)); } /* @@ -938,7 +930,7 @@ in_pcbselsrc(struct in_addr *insrc, stru struct inpcb *inp) { struct ip_moptions *mopts = inp->inp_moptions; - struct route *ro = &inp->inp_route; + struct rtentry *rt; const struct in_addr *laddr = &inp->inp_laddr; u_int rtableid = inp->inp_rtableid; struct sockaddr *ip4_source = NULL; @@ -983,17 +975,14 @@ in_pcbselsrc(struct in_addr *insrc, stru * If route is known or can be allocated now, * our src addr is taken from the i/f, else punt. */ - if (route_cache(ro, &sin->sin_addr, NULL, rtableid)) { - /* No route yet, so try to acquire one */ - ro->ro_rt = rtalloc_mpath(&ro->ro_dstsa, NULL, ro->ro_tableid); - } + rt = route_mpath(&inp->inp_route, &sin->sin_addr, NULL, rtableid); /* * If we found a route, use the address * corresponding to the outgoing interface. */ - if (ro->ro_rt != NULL) - ia = ifatoia(ro->ro_rt->rt_ifa); + if (rt != NULL) + ia = ifatoia(rt->rt_ifa); /* * Use preferred source address if : @@ -1001,8 +990,7 @@ in_pcbselsrc(struct in_addr *insrc, stru * - preferred source address is set * - output interface is UP */ - if (ro->ro_rt && !(ro->ro_rt->rt_flags & RTF_LLINFO) && - !(ro->ro_rt->rt_flags & RTF_HOST)) { + if (rt && !(rt->rt_flags & RTF_LLINFO) && !(rt->rt_flags & RTF_HOST)) { ip4_source = rtable_getsource(rtableid, AF_INET); if (ip4_source != NULL) { struct ifaddr *ifa; Index: netinet6/in6_pcb.c =================================================================== RCS file: /data/mirror/openbsd/cvs/src/sys/netinet6/in6_pcb.c,v diff -u -p -r1.139 in6_pcb.c --- netinet6/in6_pcb.c 22 Feb 2024 14:25:58 -0000 1.139 +++ netinet6/in6_pcb.c 26 Feb 2024 18:19:42 -0000 @@ -561,16 +561,10 @@ in6_pcbnotify(struct inpcbtable *table, struct rtentry * in6_pcbrtentry(struct inpcb *inp) { - struct route *ro = &inp->inp_route; - if (IN6_IS_ADDR_UNSPECIFIED(&inp->inp_faddr6)) return (NULL); - if (route6_cache(ro, &inp->inp_faddr6, &inp->inp_laddr6, - inp->inp_rtableid)) { - ro->ro_rt = rtalloc_mpath(&ro->ro_dstsa, - &inp->inp_laddr6.s6_addr32[0], ro->ro_tableid); - } - return (ro->ro_rt); + return (route6_mpath(&inp->inp_route, &inp->inp_faddr6, + &inp->inp_laddr6, inp->inp_rtableid)); } struct inpcb * Index: netinet6/in6_src.c =================================================================== RCS file: /data/mirror/openbsd/cvs/src/sys/netinet6/in6_src.c,v diff -u -p -r1.95 in6_src.c --- netinet6/in6_src.c 22 Feb 2024 14:25:58 -0000 1.95 +++ netinet6/in6_src.c 26 Feb 2024 18:35:00 -0000 @@ -95,7 +95,7 @@ in6_pcbselsrc(const struct in6_addr **in struct inpcb *inp, struct ip6_pktopts *opts) { struct ip6_moptions *mopts = inp->inp_moptions6; - struct route *ro = &inp->inp_route; + struct rtentry *rt; const struct in6_addr *laddr = &inp->inp_laddr6; u_int rtableid = inp->inp_rtableid; struct ifnet *ifp = NULL; @@ -118,7 +118,8 @@ in6_pcbselsrc(const struct in6_addr **in struct sockaddr_in6 sa6; /* get the outgoing interface */ - error = in6_selectif(dst, opts, mopts, ro, &ifp, rtableid); + error = in6_selectif(dst, opts, mopts, &inp->inp_route, &ifp, + rtableid); if (error) return (error); @@ -179,9 +180,7 @@ in6_pcbselsrc(const struct in6_addr **in * If route is known or can be allocated now, * our src addr is taken from the i/f, else punt. */ - if (route6_cache(ro, dst, NULL, rtableid)) { - ro->ro_rt = rtalloc_mpath(&ro->ro_dstsa, NULL, ro->ro_tableid); - } + rt = route6_mpath(&inp->inp_route, dst, NULL, rtableid); /* * in_pcbconnect() checks out IFF_LOOPBACK to skip using @@ -190,14 +189,14 @@ in6_pcbselsrc(const struct in6_addr **in * so doesn't check out IFF_LOOPBACK. */ - if (ro->ro_rt) { - ifp = if_get(ro->ro_rt->rt_ifidx); + if (rt != NULL) { + ifp = if_get(rt->rt_ifidx); if (ifp != NULL) { ia6 = in6_ifawithscope(ifp, dst, rtableid); if_put(ifp); } if (ia6 == NULL) /* xxx scope error ?*/ - ia6 = ifatoia6(ro->ro_rt->rt_ifa); + ia6 = ifatoia6(rt->rt_ifa); } /* @@ -206,8 +205,7 @@ in6_pcbselsrc(const struct in6_addr **in * - preferred source address is set * - output interface is UP */ - if (ro->ro_rt && !(ro->ro_rt->rt_flags & RTF_LLINFO) && - !(ro->ro_rt->rt_flags & RTF_HOST)) { + if (rt && !(rt->rt_flags & RTF_LLINFO) && !(rt->rt_flags & RTF_HOST)) { ip6_source = rtable_getsource(rtableid, AF_INET6); if (ip6_source != NULL) { struct ifaddr *ifa; @@ -304,11 +302,9 @@ in6_selectroute(const struct in6_addr *d * a new one. */ if (ro) { - if (route6_cache(ro, dst, NULL, rtableid)) { - /* No route yet, so try to acquire one */ - ro->ro_rt = rtalloc_mpath(&ro->ro_dstsa, NULL, - ro->ro_tableid); - } + struct rtentry *rt; + + rt = route6_mpath(ro, dst, NULL, rtableid); /* * Check if the outgoing interface conflicts with @@ -319,15 +315,13 @@ in6_selectroute(const struct in6_addr *d */ if (opts && opts->ip6po_pktinfo && opts->ip6po_pktinfo->ipi6_ifindex) { - if (ro->ro_rt != NULL && - !ISSET(ro->ro_rt->rt_flags, RTF_LOCAL) && - ro->ro_rt->rt_ifidx != - opts->ip6po_pktinfo->ipi6_ifindex) { + if (rt != NULL && !ISSET(rt->rt_flags, RTF_LOCAL) && + rt->rt_ifidx != opts->ip6po_pktinfo->ipi6_ifindex) { return (NULL); } } - return (ro->ro_rt); + return (rt); } return (NULL); @@ -338,7 +332,7 @@ in6_selectif(const struct in6_addr *dst, struct ip6_moptions *mopts, struct route *ro, struct ifnet **retifp, u_int rtableid) { - struct rtentry *rt = NULL; + struct rtentry *rt; struct in6_pktinfo *pi = NULL; /* If the caller specify the outgoing interface explicitly, use it. */ @@ -377,11 +371,10 @@ in6_selectif(const struct in6_addr *dst, * Although this may not be very harmful, it should still be confusing. * We thus reject the case here. */ - if (rt && (rt->rt_flags & (RTF_REJECT | RTF_BLACKHOLE))) + if (ISSET(rt->rt_flags, RTF_REJECT | RTF_BLACKHOLE)) return (rt->rt_flags & RTF_HOST ? EHOSTUNREACH : ENETUNREACH); - if (rt != NULL) - *retifp = if_get(rt->rt_ifidx); + *retifp = if_get(rt->rt_ifidx); return (0); }