Download raw body.
route gwroute mutex
On Thu, Mar 13, 2025 at 10:58:50PM +0100, Alexander Bluhm wrote:
> On Thu, Mar 06, 2025 at 05:47:59PM +0300, Vitaliy Makkoveev wrote:
> > On Thu, Mar 06, 2025 at 03:24:08PM +0100, Claudio Jeker wrote:
> > > On Thu, Mar 06, 2025 at 03:17:54PM +0100, Claudio Jeker wrote:
> > > > On Thu, Mar 06, 2025 at 02:47:10PM +0100, Alexander Bluhm wrote:
> > > > > Hi,
> > > > >
> > > > > I am still trying to make the leaves of the routing tree MP safe.
> > > > >
> > > > > A small step would be to protect rt_cachecnt and rt_gwroute writers
> > > > > with a per route mutex. I am aware that this is not complete, e.g.
> > > > > rt_flags also looks fishy. The READ_ONCE() in rt_getll() does not
> > > > > protect from use-after-free. I want to avoid mutex in the hot path.
> > > > > A previous attempt with SMR failed when running BGP. Let's make
> > > > > this first step and find a smarter solition later. route_cleargateway()
> > > > > is not in the hot packet path, so I added the mutex there.
> > > > >
> > > > > I have moved the rt_cachecnt from rt_setgwroute() to rt_putgwroute()
> > > > > to have the RTF_CACHED logic in one function.
> > > > >
> > > > > ok?
> > > >
> > > > I'm begrudgingly OK with this. Right now the situation is clearly
> > > > incorrect and the mutex solves this right now.
> > > > Long term we should move towards making struct rtentry more RCU friendly
> > > > so we can drop the mutex. But lets tackle that after 7.7
> > >
> > > But please use the mutex in all cases and don't try to trick around it.
> > > I know it sucks but at least then the code is correct.
> >
> > I agree, it's better to use mutex. I'm curious, is the performance impact
> > such significant?
>
> I made some measurements for the different locks.
>
> http://bluhm.genua.de/perform/results/2025-03-07T02:19:19Z/perform.html
>
> The performance impact of mutex, read once, mutex + refcount,
> smr + refcount is best seen in this graphic.
> http://bluhm.genua.de/perform/results/2025-03-07T02:19:19Z/gnuplot/tcp.html
>
> The difference is very small, the variation of multiple measurements
> is bigger. It is most expensive when rt_getll() returns a refcounted
> route. Calling rtref() and rtfree() in the hot path is so expensive
> that the mutex is irrelevant.
>
> The effect is better visible in these kernel stack flame graphs.
> They show forwaring parallel UDP traffic. Click on ip_forward and
> search for getll.
>
> http://bluhm.genua.de/perform/results/2025-03-07T02:19:19Z/2025-03-07T00%3A00%3A00Z/btrace/netbench.pl_-v_-B1000000000_-b1000000_-d1_-f1_-i3_-N10_-cperform%40lt13_-sperform%40lt16_-a10.3.46.60_-t10_udpbench-btrace-kstack.0.svg?s=getll&x=784.6&y=213
> http://bluhm.genua.de/perform/results/2025-03-07T02:19:19Z/patch-sys-rt-gwroute-mutex.0/btrace/netbench.pl_-v_-B1000000000_-b1000000_-d1_-f1_-i3_-N10_-cperform%40lt13_-sperform%40lt16_-a10.3.46.60_-t10_udpbench-btrace-kstack.0.svg?s=getll&x=769.9&y=197
> http://bluhm.genua.de/perform/results/2025-03-07T02:19:19Z/patch-sys-rt-gwroute-readonce.0/btrace/netbench.pl_-v_-B1000000000_-b1000000_-d1_-f1_-i3_-N10_-cperform%40lt13_-sperform%40lt16_-a10.3.46.60_-t10_udpbench-btrace-kstack.0.svg?s=getll&x=809.4&y=197
> http://bluhm.genua.de/perform/results/2025-03-07T02:19:19Z/patch-sys-rt-gwroute-refcnt.0/btrace/netbench.pl_-v_-B1000000000_-b1000000_-d1_-f1_-i3_-N10_-cperform%40lt13_-sperform%40lt16_-a10.3.46.60_-t10_udpbench-btrace-kstack.0.svg?s=getll&x=639.2&y=197
> http://bluhm.genua.de/perform/results/2025-03-07T02:19:19Z/patch-sys-rt-gwroute-smr.0/btrace/netbench.pl_-v_-B1000000000_-b1000000_-d1_-f1_-i3_-N10_-cperform%40lt13_-sperform%40lt16_-a10.3.46.60_-t10_udpbench-btrace-kstack.0.svg?s=getll&x=676.9&y=213
>
> SMR does not help anything here. When playing with it, I saw
> additional races due to smr_barrier().
>
> Maybe putting a mutex around rt_gwroute in rt_isvalid() and rt_getll()
> is best. But it does not make the code more correct if rt_getll()
> does not refcount the route while holding the mutex. If I remember
> mpi@ correctly, the idea of the RTF_CACHED flag is, that rt_gwroute
> is not removed while in use. If that is true, READ_ONCE() should
> be sufficient.
>
Thanks for all this work. I have to say I dislike to take sortcuts for
performance. Especially those that can turn bad later on.
Your SMR diff is too simplistic and so you don't really get the benefit. I
think the trick here is that you want to SMR all of struct rt_entry and
then you only need to take a refcnt if you want to sleep holding the
rt_entry.
It is interesting that refcnt_take() is so expensive. Something is off
there. I wonder why the mutex is cheap but the refcnt_take() is not.
Both require similar atomic instructions and I would expect similar
results for the two. Maybe this is some profiling artifact.
I doubt there are minimal patches to fix the issues around routes (without
affecting performance). We need to sit down and rethink and reqrite a lot
of this.
> bluhm
>
> Index: net/route.c
> ===================================================================
> RCS file: /data/mirror/openbsd/cvs/src/sys/net/route.c,v
> diff -u -p -r1.443 route.c
> --- net/route.c 6 Mar 2025 23:09:02 -0000 1.443
> +++ net/route.c 7 Mar 2025 18:52:03 -0000
> @@ -324,10 +324,13 @@ rtisvalid(struct rtentry *rt)
> return (0);
>
> if (ISSET(rt->rt_flags, RTF_GATEWAY)) {
> - if (rt->rt_gwroute == NULL)
> + struct rtentry *gwroute;
> +
> + gwroute = READ_ONCE(rt->rt_gwroute);
> + if (gwroute == NULL)
> return (0);
> - KASSERT(!ISSET(rt->rt_gwroute->rt_flags, RTF_GATEWAY));
> - if (!ISSET(rt->rt_gwroute->rt_flags, RTF_UP))
> + KASSERT(!ISSET(gwroute->rt_flags, RTF_GATEWAY));
> + if (!ISSET(gwroute->rt_flags, RTF_UP))
> return (0);
> }
>
> @@ -599,8 +602,10 @@ rt_putgwroute(struct rtentry *rt, struct
> mtx_leave(&nhrt->rt_mtx);
> }
>
> + mtx_enter(&rt->rt_mtx);
> onhrt = rt->rt_gwroute;
> rt->rt_gwroute = nhrt;
> + mtx_leave(&rt->rt_mtx);
>
> if (onhrt != NULL) {
> mtx_enter(&onhrt->rt_mtx);
> @@ -1169,8 +1174,11 @@ struct rtentry *
> rt_getll(struct rtentry *rt)
> {
> if (ISSET(rt->rt_flags, RTF_GATEWAY)) {
> + struct rtentry *gwroute;
> +
> + gwroute = READ_ONCE(rt->rt_gwroute);
> /* We may return NULL here. */
> - return (rt->rt_gwroute);
> + return (gwroute);
> }
>
> return (rt);
> Index: net/route.h
> ===================================================================
> RCS file: /data/mirror/openbsd/cvs/src/sys/net/route.h,v
> diff -u -p -r1.214 route.h
> --- net/route.h 6 Mar 2025 23:09:02 -0000 1.214
> +++ net/route.h 7 Mar 2025 12:44:42 -0000
> @@ -123,7 +123,7 @@ struct rtentry {
> caddr_t rt_llinfo; /* [L] pointer to link level info or
> an MPLS structure */
> union {
> - struct rtentry *_nh; /* [X] rtentry for rt_gateway */
> + struct rtentry *_nh; /* [r] rtentry for rt_gateway */
> unsigned int _ref; /* [r] # gateway rtentry refs */
> } RT_gw;
> #define rt_gwroute RT_gw._nh
> Index: net/rtsock.c
> ===================================================================
> RCS file: /data/mirror/openbsd/cvs/src/sys/net/rtsock.c,v
> diff -u -p -r1.381 rtsock.c
> --- net/rtsock.c 16 Feb 2025 11:39:28 -0000 1.381
> +++ net/rtsock.c 7 Mar 2025 12:44:34 -0000
> @@ -1343,9 +1343,11 @@ route_cleargateway(struct rtentry *rt, v
> {
> struct rtentry *nhrt = arg;
>
> + mtx_enter(&rt->rt_mtx);
> if (ISSET(rt->rt_flags, RTF_GATEWAY) && rt->rt_gwroute == nhrt &&
> !ISSET(rt->rt_locks, RTV_MTU))
> atomic_store_int(&rt->rt_mtu, 0);
> + mtx_leave(&rt->rt_mtx);
>
> return (0);
> }
>
--
:wq Claudio
route gwroute mutex