Download raw body.
route gwroute mutex
On Thu, Mar 06, 2025 at 05:47:59PM +0300, Vitaliy Makkoveev wrote:
> On Thu, Mar 06, 2025 at 03:24:08PM +0100, Claudio Jeker wrote:
> > On Thu, Mar 06, 2025 at 03:17:54PM +0100, Claudio Jeker wrote:
> > > On Thu, Mar 06, 2025 at 02:47:10PM +0100, Alexander Bluhm wrote:
> > > > Hi,
> > > >
> > > > I am still trying to make the leaves of the routing tree MP safe.
> > > >
> > > > A small step would be to protect rt_cachecnt and rt_gwroute writers
> > > > with a per route mutex. I am aware that this is not complete, e.g.
> > > > rt_flags also looks fishy. The READ_ONCE() in rt_getll() does not
> > > > protect from use-after-free. I want to avoid mutex in the hot path.
> > > > A previous attempt with SMR failed when running BGP. Let's make
> > > > this first step and find a smarter solition later. route_cleargateway()
> > > > is not in the hot packet path, so I added the mutex there.
> > > >
> > > > I have moved the rt_cachecnt from rt_setgwroute() to rt_putgwroute()
> > > > to have the RTF_CACHED logic in one function.
> > > >
> > > > ok?
> > >
> > > I'm begrudgingly OK with this. Right now the situation is clearly
> > > incorrect and the mutex solves this right now.
> > > Long term we should move towards making struct rtentry more RCU friendly
> > > so we can drop the mutex. But lets tackle that after 7.7
> >
> > But please use the mutex in all cases and don't try to trick around it.
> > I know it sucks but at least then the code is correct.
>
> I agree, it's better to use mutex. I'm curious, is the performance impact
> such significant?
I made some measurements for the different locks.
http://bluhm.genua.de/perform/results/2025-03-07T02:19:19Z/perform.html
The performance impact of mutex, read once, mutex + refcount,
smr + refcount is best seen in this graphic.
http://bluhm.genua.de/perform/results/2025-03-07T02:19:19Z/gnuplot/tcp.html
The difference is very small, the variation of multiple measurements
is bigger. It is most expensive when rt_getll() returns a refcounted
route. Calling rtref() and rtfree() in the hot path is so expensive
that the mutex is irrelevant.
The effect is better visible in these kernel stack flame graphs.
They show forwaring parallel UDP traffic. Click on ip_forward and
search for getll.
http://bluhm.genua.de/perform/results/2025-03-07T02:19:19Z/2025-03-07T00%3A00%3A00Z/btrace/netbench.pl_-v_-B1000000000_-b1000000_-d1_-f1_-i3_-N10_-cperform%40lt13_-sperform%40lt16_-a10.3.46.60_-t10_udpbench-btrace-kstack.0.svg?s=getll&x=784.6&y=213
http://bluhm.genua.de/perform/results/2025-03-07T02:19:19Z/patch-sys-rt-gwroute-mutex.0/btrace/netbench.pl_-v_-B1000000000_-b1000000_-d1_-f1_-i3_-N10_-cperform%40lt13_-sperform%40lt16_-a10.3.46.60_-t10_udpbench-btrace-kstack.0.svg?s=getll&x=769.9&y=197
http://bluhm.genua.de/perform/results/2025-03-07T02:19:19Z/patch-sys-rt-gwroute-readonce.0/btrace/netbench.pl_-v_-B1000000000_-b1000000_-d1_-f1_-i3_-N10_-cperform%40lt13_-sperform%40lt16_-a10.3.46.60_-t10_udpbench-btrace-kstack.0.svg?s=getll&x=809.4&y=197
http://bluhm.genua.de/perform/results/2025-03-07T02:19:19Z/patch-sys-rt-gwroute-refcnt.0/btrace/netbench.pl_-v_-B1000000000_-b1000000_-d1_-f1_-i3_-N10_-cperform%40lt13_-sperform%40lt16_-a10.3.46.60_-t10_udpbench-btrace-kstack.0.svg?s=getll&x=639.2&y=197
http://bluhm.genua.de/perform/results/2025-03-07T02:19:19Z/patch-sys-rt-gwroute-smr.0/btrace/netbench.pl_-v_-B1000000000_-b1000000_-d1_-f1_-i3_-N10_-cperform%40lt13_-sperform%40lt16_-a10.3.46.60_-t10_udpbench-btrace-kstack.0.svg?s=getll&x=676.9&y=213
SMR does not help anything here. When playing with it, I saw
additional races due to smr_barrier().
Maybe putting a mutex around rt_gwroute in rt_isvalid() and rt_getll()
is best. But it does not make the code more correct if rt_getll()
does not refcount the route while holding the mutex. If I remember
mpi@ correctly, the idea of the RTF_CACHED flag is, that rt_gwroute
is not removed while in use. If that is true, READ_ONCE() should
be sufficient.
bluhm
Index: net/route.c
===================================================================
RCS file: /data/mirror/openbsd/cvs/src/sys/net/route.c,v
diff -u -p -r1.443 route.c
--- net/route.c 6 Mar 2025 23:09:02 -0000 1.443
+++ net/route.c 7 Mar 2025 18:52:03 -0000
@@ -324,10 +324,13 @@ rtisvalid(struct rtentry *rt)
return (0);
if (ISSET(rt->rt_flags, RTF_GATEWAY)) {
- if (rt->rt_gwroute == NULL)
+ struct rtentry *gwroute;
+
+ gwroute = READ_ONCE(rt->rt_gwroute);
+ if (gwroute == NULL)
return (0);
- KASSERT(!ISSET(rt->rt_gwroute->rt_flags, RTF_GATEWAY));
- if (!ISSET(rt->rt_gwroute->rt_flags, RTF_UP))
+ KASSERT(!ISSET(gwroute->rt_flags, RTF_GATEWAY));
+ if (!ISSET(gwroute->rt_flags, RTF_UP))
return (0);
}
@@ -599,8 +602,10 @@ rt_putgwroute(struct rtentry *rt, struct
mtx_leave(&nhrt->rt_mtx);
}
+ mtx_enter(&rt->rt_mtx);
onhrt = rt->rt_gwroute;
rt->rt_gwroute = nhrt;
+ mtx_leave(&rt->rt_mtx);
if (onhrt != NULL) {
mtx_enter(&onhrt->rt_mtx);
@@ -1169,8 +1174,11 @@ struct rtentry *
rt_getll(struct rtentry *rt)
{
if (ISSET(rt->rt_flags, RTF_GATEWAY)) {
+ struct rtentry *gwroute;
+
+ gwroute = READ_ONCE(rt->rt_gwroute);
/* We may return NULL here. */
- return (rt->rt_gwroute);
+ return (gwroute);
}
return (rt);
Index: net/route.h
===================================================================
RCS file: /data/mirror/openbsd/cvs/src/sys/net/route.h,v
diff -u -p -r1.214 route.h
--- net/route.h 6 Mar 2025 23:09:02 -0000 1.214
+++ net/route.h 7 Mar 2025 12:44:42 -0000
@@ -123,7 +123,7 @@ struct rtentry {
caddr_t rt_llinfo; /* [L] pointer to link level info or
an MPLS structure */
union {
- struct rtentry *_nh; /* [X] rtentry for rt_gateway */
+ struct rtentry *_nh; /* [r] rtentry for rt_gateway */
unsigned int _ref; /* [r] # gateway rtentry refs */
} RT_gw;
#define rt_gwroute RT_gw._nh
Index: net/rtsock.c
===================================================================
RCS file: /data/mirror/openbsd/cvs/src/sys/net/rtsock.c,v
diff -u -p -r1.381 rtsock.c
--- net/rtsock.c 16 Feb 2025 11:39:28 -0000 1.381
+++ net/rtsock.c 7 Mar 2025 12:44:34 -0000
@@ -1343,9 +1343,11 @@ route_cleargateway(struct rtentry *rt, v
{
struct rtentry *nhrt = arg;
+ mtx_enter(&rt->rt_mtx);
if (ISSET(rt->rt_flags, RTF_GATEWAY) && rt->rt_gwroute == nhrt &&
!ISSET(rt->rt_locks, RTV_MTU))
atomic_store_int(&rt->rt_mtu, 0);
+ mtx_leave(&rt->rt_mtx);
return (0);
}
route gwroute mutex