Download raw body.
route gwroute mutex
On Fri, Mar 14, 2025 at 01:29:47PM +0100, Claudio Jeker wrote:
> On Thu, Mar 13, 2025 at 10:58:50PM +0100, Alexander Bluhm wrote:
> > On Thu, Mar 06, 2025 at 05:47:59PM +0300, Vitaliy Makkoveev wrote:
> > > On Thu, Mar 06, 2025 at 03:24:08PM +0100, Claudio Jeker wrote:
> > > > On Thu, Mar 06, 2025 at 03:17:54PM +0100, Claudio Jeker wrote:
> > > > > On Thu, Mar 06, 2025 at 02:47:10PM +0100, Alexander Bluhm wrote:
> > > > > > Hi,
> > > > > >
> > > > > > I am still trying to make the leaves of the routing tree MP safe.
> > > > > >
> > > > > > A small step would be to protect rt_cachecnt and rt_gwroute writers
> > > > > > with a per route mutex. I am aware that this is not complete, e.g.
> > > > > > rt_flags also looks fishy. The READ_ONCE() in rt_getll() does not
> > > > > > protect from use-after-free. I want to avoid mutex in the hot path.
> > > > > > A previous attempt with SMR failed when running BGP. Let's make
> > > > > > this first step and find a smarter solition later. route_cleargateway()
> > > > > > is not in the hot packet path, so I added the mutex there.
> > > > > >
> > > > > > I have moved the rt_cachecnt from rt_setgwroute() to rt_putgwroute()
> > > > > > to have the RTF_CACHED logic in one function.
> > > > > >
> > > > > > ok?
> > > > >
> > > > > I'm begrudgingly OK with this. Right now the situation is clearly
> > > > > incorrect and the mutex solves this right now.
> > > > > Long term we should move towards making struct rtentry more RCU friendly
> > > > > so we can drop the mutex. But lets tackle that after 7.7
> > > >
> > > > But please use the mutex in all cases and don't try to trick around it.
> > > > I know it sucks but at least then the code is correct.
> > >
> > > I agree, it's better to use mutex. I'm curious, is the performance impact
> > > such significant?
> >
> > I made some measurements for the different locks.
> >
> > http://bluhm.genua.de/perform/results/2025-03-07T02:19:19Z/perform.html
> >
> > The performance impact of mutex, read once, mutex + refcount,
> > smr + refcount is best seen in this graphic.
> > http://bluhm.genua.de/perform/results/2025-03-07T02:19:19Z/gnuplot/tcp.html
> >
> > The difference is very small, the variation of multiple measurements
> > is bigger. It is most expensive when rt_getll() returns a refcounted
> > route. Calling rtref() and rtfree() in the hot path is so expensive
> > that the mutex is irrelevant.
> >
> > The effect is better visible in these kernel stack flame graphs.
> > They show forwaring parallel UDP traffic. Click on ip_forward and
> > search for getll.
> >
> > http://bluhm.genua.de/perform/results/2025-03-07T02:19:19Z/2025-03-07T00%3A00%3A00Z/btrace/netbench.pl_-v_-B1000000000_-b1000000_-d1_-f1_-i3_-N10_-cperform%40lt13_-sperform%40lt16_-a10.3.46.60_-t10_udpbench-btrace-kstack.0.svg?s=getll&x=784.6&y=213
> > http://bluhm.genua.de/perform/results/2025-03-07T02:19:19Z/patch-sys-rt-gwroute-mutex.0/btrace/netbench.pl_-v_-B1000000000_-b1000000_-d1_-f1_-i3_-N10_-cperform%40lt13_-sperform%40lt16_-a10.3.46.60_-t10_udpbench-btrace-kstack.0.svg?s=getll&x=769.9&y=197
> > http://bluhm.genua.de/perform/results/2025-03-07T02:19:19Z/patch-sys-rt-gwroute-readonce.0/btrace/netbench.pl_-v_-B1000000000_-b1000000_-d1_-f1_-i3_-N10_-cperform%40lt13_-sperform%40lt16_-a10.3.46.60_-t10_udpbench-btrace-kstack.0.svg?s=getll&x=809.4&y=197
> > http://bluhm.genua.de/perform/results/2025-03-07T02:19:19Z/patch-sys-rt-gwroute-refcnt.0/btrace/netbench.pl_-v_-B1000000000_-b1000000_-d1_-f1_-i3_-N10_-cperform%40lt13_-sperform%40lt16_-a10.3.46.60_-t10_udpbench-btrace-kstack.0.svg?s=getll&x=639.2&y=197
> > http://bluhm.genua.de/perform/results/2025-03-07T02:19:19Z/patch-sys-rt-gwroute-smr.0/btrace/netbench.pl_-v_-B1000000000_-b1000000_-d1_-f1_-i3_-N10_-cperform%40lt13_-sperform%40lt16_-a10.3.46.60_-t10_udpbench-btrace-kstack.0.svg?s=getll&x=676.9&y=213
> >
> > SMR does not help anything here. When playing with it, I saw
> > additional races due to smr_barrier().
> >
> > Maybe putting a mutex around rt_gwroute in rt_isvalid() and rt_getll()
> > is best. But it does not make the code more correct if rt_getll()
> > does not refcount the route while holding the mutex. If I remember
> > mpi@ correctly, the idea of the RTF_CACHED flag is, that rt_gwroute
> > is not removed while in use. If that is true, READ_ONCE() should
> > be sufficient.
> >
>
> Thanks for all this work. I have to say I dislike to take sortcuts for
> performance. Especially those that can turn bad later on.
>
> Your SMR diff is too simplistic and so you don't really get the benefit. I
> think the trick here is that you want to SMR all of struct rt_entry and
> then you only need to take a refcnt if you want to sleep holding the
> rt_entry.
>
> It is interesting that refcnt_take() is so expensive. Something is off
> there. I wonder why the mutex is cheap but the refcnt_take() is not.
> Both require similar atomic instructions and I would expect similar
> results for the two. Maybe this is some profiling artifact.
>
dt(4) provides this refcnt_take() performance impact.
> I doubt there are minimal patches to fix the issues around routes (without
> affecting performance). We need to sit down and rethink and reqrite a lot
> of this.
>
> > bluhm
> >
> > Index: net/route.c
> > ===================================================================
> > RCS file: /data/mirror/openbsd/cvs/src/sys/net/route.c,v
> > diff -u -p -r1.443 route.c
> > --- net/route.c 6 Mar 2025 23:09:02 -0000 1.443
> > +++ net/route.c 7 Mar 2025 18:52:03 -0000
> > @@ -324,10 +324,13 @@ rtisvalid(struct rtentry *rt)
> > return (0);
> >
> > if (ISSET(rt->rt_flags, RTF_GATEWAY)) {
> > - if (rt->rt_gwroute == NULL)
> > + struct rtentry *gwroute;
> > +
> > + gwroute = READ_ONCE(rt->rt_gwroute);
> > + if (gwroute == NULL)
> > return (0);
> > - KASSERT(!ISSET(rt->rt_gwroute->rt_flags, RTF_GATEWAY));
> > - if (!ISSET(rt->rt_gwroute->rt_flags, RTF_UP))
> > + KASSERT(!ISSET(gwroute->rt_flags, RTF_GATEWAY));
> > + if (!ISSET(gwroute->rt_flags, RTF_UP))
> > return (0);
> > }
> >
> > @@ -599,8 +602,10 @@ rt_putgwroute(struct rtentry *rt, struct
> > mtx_leave(&nhrt->rt_mtx);
> > }
> >
> > + mtx_enter(&rt->rt_mtx);
> > onhrt = rt->rt_gwroute;
> > rt->rt_gwroute = nhrt;
> > + mtx_leave(&rt->rt_mtx);
> >
> > if (onhrt != NULL) {
> > mtx_enter(&onhrt->rt_mtx);
> > @@ -1169,8 +1174,11 @@ struct rtentry *
> > rt_getll(struct rtentry *rt)
> > {
> > if (ISSET(rt->rt_flags, RTF_GATEWAY)) {
> > + struct rtentry *gwroute;
> > +
> > + gwroute = READ_ONCE(rt->rt_gwroute);
> > /* We may return NULL here. */
> > - return (rt->rt_gwroute);
> > + return (gwroute);
> > }
> >
> > return (rt);
> > Index: net/route.h
> > ===================================================================
> > RCS file: /data/mirror/openbsd/cvs/src/sys/net/route.h,v
> > diff -u -p -r1.214 route.h
> > --- net/route.h 6 Mar 2025 23:09:02 -0000 1.214
> > +++ net/route.h 7 Mar 2025 12:44:42 -0000
> > @@ -123,7 +123,7 @@ struct rtentry {
> > caddr_t rt_llinfo; /* [L] pointer to link level info or
> > an MPLS structure */
> > union {
> > - struct rtentry *_nh; /* [X] rtentry for rt_gateway */
> > + struct rtentry *_nh; /* [r] rtentry for rt_gateway */
> > unsigned int _ref; /* [r] # gateway rtentry refs */
> > } RT_gw;
> > #define rt_gwroute RT_gw._nh
> > Index: net/rtsock.c
> > ===================================================================
> > RCS file: /data/mirror/openbsd/cvs/src/sys/net/rtsock.c,v
> > diff -u -p -r1.381 rtsock.c
> > --- net/rtsock.c 16 Feb 2025 11:39:28 -0000 1.381
> > +++ net/rtsock.c 7 Mar 2025 12:44:34 -0000
> > @@ -1343,9 +1343,11 @@ route_cleargateway(struct rtentry *rt, v
> > {
> > struct rtentry *nhrt = arg;
> >
> > + mtx_enter(&rt->rt_mtx);
> > if (ISSET(rt->rt_flags, RTF_GATEWAY) && rt->rt_gwroute == nhrt &&
> > !ISSET(rt->rt_locks, RTV_MTU))
> > atomic_store_int(&rt->rt_mtu, 0);
> > + mtx_leave(&rt->rt_mtx);
> >
> > return (0);
> > }
> >
>
> --
> :wq Claudio
>
route gwroute mutex