From: Alexander Bluhm Subject: Re: route gwroute mutex To: Vitaliy Makkoveev Cc: tech@openbsd.org Date: Thu, 13 Mar 2025 22:58:50 +0100 On Thu, Mar 06, 2025 at 05:47:59PM +0300, Vitaliy Makkoveev wrote: > On Thu, Mar 06, 2025 at 03:24:08PM +0100, Claudio Jeker wrote: > > On Thu, Mar 06, 2025 at 03:17:54PM +0100, Claudio Jeker wrote: > > > On Thu, Mar 06, 2025 at 02:47:10PM +0100, Alexander Bluhm wrote: > > > > Hi, > > > > > > > > I am still trying to make the leaves of the routing tree MP safe. > > > > > > > > A small step would be to protect rt_cachecnt and rt_gwroute writers > > > > with a per route mutex. I am aware that this is not complete, e.g. > > > > rt_flags also looks fishy. The READ_ONCE() in rt_getll() does not > > > > protect from use-after-free. I want to avoid mutex in the hot path. > > > > A previous attempt with SMR failed when running BGP. Let's make > > > > this first step and find a smarter solition later. route_cleargateway() > > > > is not in the hot packet path, so I added the mutex there. > > > > > > > > I have moved the rt_cachecnt from rt_setgwroute() to rt_putgwroute() > > > > to have the RTF_CACHED logic in one function. > > > > > > > > ok? > > > > > > I'm begrudgingly OK with this. Right now the situation is clearly > > > incorrect and the mutex solves this right now. > > > Long term we should move towards making struct rtentry more RCU friendly > > > so we can drop the mutex. But lets tackle that after 7.7 > > > > But please use the mutex in all cases and don't try to trick around it. > > I know it sucks but at least then the code is correct. > > I agree, it's better to use mutex. I'm curious, is the performance impact > such significant? I made some measurements for the different locks. http://bluhm.genua.de/perform/results/2025-03-07T02:19:19Z/perform.html The performance impact of mutex, read once, mutex + refcount, smr + refcount is best seen in this graphic. http://bluhm.genua.de/perform/results/2025-03-07T02:19:19Z/gnuplot/tcp.html The difference is very small, the variation of multiple measurements is bigger. It is most expensive when rt_getll() returns a refcounted route. Calling rtref() and rtfree() in the hot path is so expensive that the mutex is irrelevant. The effect is better visible in these kernel stack flame graphs. They show forwaring parallel UDP traffic. Click on ip_forward and search for getll. http://bluhm.genua.de/perform/results/2025-03-07T02:19:19Z/2025-03-07T00%3A00%3A00Z/btrace/netbench.pl_-v_-B1000000000_-b1000000_-d1_-f1_-i3_-N10_-cperform%40lt13_-sperform%40lt16_-a10.3.46.60_-t10_udpbench-btrace-kstack.0.svg?s=getll&x=784.6&y=213 http://bluhm.genua.de/perform/results/2025-03-07T02:19:19Z/patch-sys-rt-gwroute-mutex.0/btrace/netbench.pl_-v_-B1000000000_-b1000000_-d1_-f1_-i3_-N10_-cperform%40lt13_-sperform%40lt16_-a10.3.46.60_-t10_udpbench-btrace-kstack.0.svg?s=getll&x=769.9&y=197 http://bluhm.genua.de/perform/results/2025-03-07T02:19:19Z/patch-sys-rt-gwroute-readonce.0/btrace/netbench.pl_-v_-B1000000000_-b1000000_-d1_-f1_-i3_-N10_-cperform%40lt13_-sperform%40lt16_-a10.3.46.60_-t10_udpbench-btrace-kstack.0.svg?s=getll&x=809.4&y=197 http://bluhm.genua.de/perform/results/2025-03-07T02:19:19Z/patch-sys-rt-gwroute-refcnt.0/btrace/netbench.pl_-v_-B1000000000_-b1000000_-d1_-f1_-i3_-N10_-cperform%40lt13_-sperform%40lt16_-a10.3.46.60_-t10_udpbench-btrace-kstack.0.svg?s=getll&x=639.2&y=197 http://bluhm.genua.de/perform/results/2025-03-07T02:19:19Z/patch-sys-rt-gwroute-smr.0/btrace/netbench.pl_-v_-B1000000000_-b1000000_-d1_-f1_-i3_-N10_-cperform%40lt13_-sperform%40lt16_-a10.3.46.60_-t10_udpbench-btrace-kstack.0.svg?s=getll&x=676.9&y=213 SMR does not help anything here. When playing with it, I saw additional races due to smr_barrier(). Maybe putting a mutex around rt_gwroute in rt_isvalid() and rt_getll() is best. But it does not make the code more correct if rt_getll() does not refcount the route while holding the mutex. If I remember mpi@ correctly, the idea of the RTF_CACHED flag is, that rt_gwroute is not removed while in use. If that is true, READ_ONCE() should be sufficient. bluhm Index: net/route.c =================================================================== RCS file: /data/mirror/openbsd/cvs/src/sys/net/route.c,v diff -u -p -r1.443 route.c --- net/route.c 6 Mar 2025 23:09:02 -0000 1.443 +++ net/route.c 7 Mar 2025 18:52:03 -0000 @@ -324,10 +324,13 @@ rtisvalid(struct rtentry *rt) return (0); if (ISSET(rt->rt_flags, RTF_GATEWAY)) { - if (rt->rt_gwroute == NULL) + struct rtentry *gwroute; + + gwroute = READ_ONCE(rt->rt_gwroute); + if (gwroute == NULL) return (0); - KASSERT(!ISSET(rt->rt_gwroute->rt_flags, RTF_GATEWAY)); - if (!ISSET(rt->rt_gwroute->rt_flags, RTF_UP)) + KASSERT(!ISSET(gwroute->rt_flags, RTF_GATEWAY)); + if (!ISSET(gwroute->rt_flags, RTF_UP)) return (0); } @@ -599,8 +602,10 @@ rt_putgwroute(struct rtentry *rt, struct mtx_leave(&nhrt->rt_mtx); } + mtx_enter(&rt->rt_mtx); onhrt = rt->rt_gwroute; rt->rt_gwroute = nhrt; + mtx_leave(&rt->rt_mtx); if (onhrt != NULL) { mtx_enter(&onhrt->rt_mtx); @@ -1169,8 +1174,11 @@ struct rtentry * rt_getll(struct rtentry *rt) { if (ISSET(rt->rt_flags, RTF_GATEWAY)) { + struct rtentry *gwroute; + + gwroute = READ_ONCE(rt->rt_gwroute); /* We may return NULL here. */ - return (rt->rt_gwroute); + return (gwroute); } return (rt); Index: net/route.h =================================================================== RCS file: /data/mirror/openbsd/cvs/src/sys/net/route.h,v diff -u -p -r1.214 route.h --- net/route.h 6 Mar 2025 23:09:02 -0000 1.214 +++ net/route.h 7 Mar 2025 12:44:42 -0000 @@ -123,7 +123,7 @@ struct rtentry { caddr_t rt_llinfo; /* [L] pointer to link level info or an MPLS structure */ union { - struct rtentry *_nh; /* [X] rtentry for rt_gateway */ + struct rtentry *_nh; /* [r] rtentry for rt_gateway */ unsigned int _ref; /* [r] # gateway rtentry refs */ } RT_gw; #define rt_gwroute RT_gw._nh Index: net/rtsock.c =================================================================== RCS file: /data/mirror/openbsd/cvs/src/sys/net/rtsock.c,v diff -u -p -r1.381 rtsock.c --- net/rtsock.c 16 Feb 2025 11:39:28 -0000 1.381 +++ net/rtsock.c 7 Mar 2025 12:44:34 -0000 @@ -1343,9 +1343,11 @@ route_cleargateway(struct rtentry *rt, v { struct rtentry *nhrt = arg; + mtx_enter(&rt->rt_mtx); if (ISSET(rt->rt_flags, RTF_GATEWAY) && rt->rt_gwroute == nhrt && !ISSET(rt->rt_locks, RTV_MTU)) atomic_store_int(&rt->rt_mtu, 0); + mtx_leave(&rt->rt_mtx); return (0); }