Download raw body.
make route rt_locks atomic
On Mon, Jan 20, 2025 at 02:51:02PM +0100, Alexander Bluhm wrote:
> On Mon, Jan 20, 2025 at 02:09:03PM +0100, Claudio Jeker wrote:
> > On Tue, Jan 14, 2025 at 12:16:59AM +0100, Alexander Bluhm wrote:
> > > Hi,
> > >
> > > rt_locks is accessed lockless. Setting the bits has to be done
> > > atomically. Loads should be marked as atomic to avoid wrong compiler
> > > optimizations.
> > >
> > > The correlation between rt_locks and rt_mtu is very weak. When
> > > RTV_MTU is set, stop modifying rt_mtu. But is is not really necessary
> > > to stop updating rt_mtu immediately when another thread modifies
> > > rt_locks. The events in the network stack changing those values
> > > are not synchronized anyway. So I came to the conclusion that
> > > memory barriers are not necessary.
> > >
> > > In route message output, I have left setting the flags in exclusive
> > > net lock as they are modified together with other route attributes.
> > >
> > > ok?
> >
> > I'm probably late to the party but again, how can rt_locks be atomic when
> > the value rt_locks protects is changed not in an atomic transaction with
> > the rt_locks value?
> > This is not truly atomic and while what you do is probably OK it is not
> > correct.
>
> Changes of rt_locks are atomic with this diff. Old code is lucky
> without atomic_setbits_int() as RTV_MTU is the only flag used in
> rt_locks.
I know. The current code is not correct. Our routing code was declared MP
safe without ever looking at what happens on the nodes.
Or actually parallel softnet was enabled without solving that piece of
the puzzle.
> Another thing is how rt_locks and rt_mtu are related. Do they need
> a common mutex? I say no. Do the need memmory barries? I also
> say no, as rt_mtu is only valid if RTV_MTU is clean. If one CPU
> misses the RTV_MTU update, there is no rt_mtu update that must be
> handled. And if it misses the rt_mtu update, it does not matter
> as RTV_MTU is set.
Why is rt_mtu only valid if RTV_MTU is unset? That is not true.
You can lock the MTU of a route to some value from userland.
So if one cpu sets RTV_MTU and another cpu updates rt_mtu at the same time
you may end up with the wrong value locked.
Also the rtv_mtu is not fully ignored when RTV_MTU is set. So I think we
need to be careful here.
What about RTV_EXPIRE? This is not only about one value of rt_locks.
Now RTV_EXPIRE is not really used (apart from route(8)). Still it is part
of rt_locks.
I dislike that we mark a value as [a] when in reality it is more [a*]
with a big * comment why we consider it atomic.
> The only problematic path may be clearing RTV_MTU. But I left this
> code in exclusive net lock.
>
> Note that the diff makes things only better. If path MTU is really
> not correct, we have to backout parallel softnet. Or provide a
> diff which fixes it. I cannot see a bug.
Yes, we introduced this mess with parallel softnet. I fear with unlocking
tcp the situation will get worse at first since tcp is the number one
cause that fiddles with rt_mtu and RTV_MTU.
> The reason I want the volatile around rt_locks is to avoid compiler
> misoptmisations. Nothing can go wrong by CPU reordering in this
> case.
This is fine. I think it would be good to have an atomic_checkbits_int()
function since many of the checks of such atomic bitfields scare me as
well.
> bluhm
>
> > > Index: net/route.c
> > > ===================================================================
> > > RCS file: /data/mirror/openbsd/cvs/src/sys/net/route.c,v
> > > diff -u -p -r1.438 route.c
> > > --- net/route.c 3 Jan 2025 21:27:40 -0000 1.438
> > > +++ net/route.c 13 Jan 2025 22:50:54 -0000
> > > @@ -557,7 +557,7 @@ rt_setgwroute(struct rtentry *rt, const
> > > * If the MTU of next hop is 0, this will reset the MTU of the
> > > * route to run PMTUD again from scratch.
> > > */
> > > - if (!ISSET(rt->rt_locks, RTV_MTU)) {
> > > + if (!ISSET(atomic_load_int(&rt->rt_locks), RTV_MTU)) {
> > > u_int mtu, nhmtu;
> > >
> > > mtu = atomic_load_int(&rt->rt_mtu);
> > > Index: net/route.h
> > > ===================================================================
> > > RCS file: /data/mirror/openbsd/cvs/src/sys/net/route.h,v
> > > diff -u -p -r1.211 route.h
> > > --- net/route.h 3 Jan 2025 21:27:40 -0000 1.211
> > > +++ net/route.h 13 Jan 2025 22:50:54 -0000
> > > @@ -38,6 +38,7 @@
> > > /*
> > > * Locks used to protect struct members in this file:
> > > * I immutable after creation
> > > + * a atomic operations
> > > * N net lock
> > > * X exclusive net lock, or shared net lock + kernel lock
> > > * R art (rtable) lock
> > > @@ -60,7 +61,7 @@
> > > struct rt_kmetrics {
> > > u_int64_t rmx_pksent; /* packets sent using this route */
> > > int64_t rmx_expire; /* lifetime for route, e.g. redirect */
> > > - u_int rmx_locks; /* Kernel must leave these values */
> > > + u_int rmx_locks; /* [a] Kernel must leave these values */
> > > u_int rmx_mtu; /* [a] MTU for this path */
> > > };
> > > #endif
> > > Index: net/rtsock.c
> > > ===================================================================
> > > RCS file: /data/mirror/openbsd/cvs/src/sys/net/rtsock.c,v
> > > diff -u -p -r1.377 rtsock.c
> > > --- net/rtsock.c 9 Jan 2025 18:20:29 -0000 1.377
> > > +++ net/rtsock.c 13 Jan 2025 22:50:54 -0000
> > > @@ -1198,8 +1198,9 @@ change:
> > > }
> > > if_group_routechange(info->rti_info[RTAX_DST],
> > > info->rti_info[RTAX_NETMASK]);
> > > - rt->rt_locks &= ~(rtm->rtm_inits);
> > > - rt->rt_locks |= (rtm->rtm_inits & rtm->rtm_rmx.rmx_locks);
> > > + atomic_clearbits_int(&rt->rt_locks, rtm->rtm_inits);
> > > + atomic_setbits_int(&rt->rt_locks,
> > > + rtm->rtm_inits & rtm->rtm_rmx.rmx_locks);
> > > NET_UNLOCK();
> > > break;
> > > case RTM_GET:
> > > @@ -1344,7 +1345,7 @@ route_cleargateway(struct rtentry *rt, v
> > > struct rtentry *nhrt = arg;
> > >
> > > if (ISSET(rt->rt_flags, RTF_GATEWAY) && rt->rt_gwroute == nhrt &&
> > > - !ISSET(rt->rt_locks, RTV_MTU))
> > > + !ISSET(atomic_load_int(&rt->rt_locks), RTV_MTU))
> > > atomic_store_int(&rt->rt_mtu, 0);
> > >
> > > return (0);
> > > @@ -1420,7 +1421,7 @@ rtm_getmetrics(const struct rtentry *rt,
> > > }
> > >
> > > bzero(out, sizeof(*out));
> > > - out->rmx_locks = in->rmx_locks;
> > > + out->rmx_locks = atomic_load_int(&in->rmx_locks);
> > > out->rmx_mtu = atomic_load_int(&in->rmx_mtu);
> > > out->rmx_expire = expire;
> > > out->rmx_pksent = in->rmx_pksent;
> > > Index: netinet/ip_icmp.c
> > > ===================================================================
> > > RCS file: /data/mirror/openbsd/cvs/src/sys/netinet/ip_icmp.c,v
> > > diff -u -p -r1.198 ip_icmp.c
> > > --- netinet/ip_icmp.c 3 Jan 2025 21:27:40 -0000 1.198
> > > +++ netinet/ip_icmp.c 13 Jan 2025 22:50:54 -0000
> > > @@ -1054,9 +1054,9 @@ icmp_mtudisc(struct icmp *icp, u_int rta
> > > * on a route. We should be using a separate flag
> > > * for the kernel to indicate this.
> > > */
> > > - if ((rt->rt_locks & RTV_MTU) == 0) {
> > > + if (!ISSET(atomic_load_int(&rt->rt_locks), RTV_MTU)) {
> > > if (mtu < 296 || mtu > ifp->if_mtu)
> > > - rt->rt_locks |= RTV_MTU;
> > > + atomic_setbits_int(&rt->rt_locks, RTV_MTU);
> > > else if (rtmtu > mtu || rtmtu == 0)
> > > atomic_cas_uint(&rt->rt_mtu, rtmtu, mtu);
> > > }
> > > @@ -1090,7 +1090,7 @@ icmp_mtudisc_timeout(struct rtentry *rt,
> > > (*ctlfunc)(PRC_MTUINC, sintosa(&sin),
> > > rtableid, NULL);
> > > } else {
> > > - if ((rt->rt_locks & RTV_MTU) == 0)
> > > + if (!ISSET(atomic_load_int(&rt->rt_locks), RTV_MTU))
> > > atomic_store_int(&rt->rt_mtu, 0);
> > > }
> > >
> > > Index: netinet/ip_output.c
> > > ===================================================================
> > > RCS file: /data/mirror/openbsd/cvs/src/sys/netinet/ip_output.c,v
> > > diff -u -p -r1.402 ip_output.c
> > > --- netinet/ip_output.c 3 Jan 2025 21:27:40 -0000 1.402
> > > +++ netinet/ip_output.c 13 Jan 2025 22:50:54 -0000
> > > @@ -384,7 +384,7 @@ sendit:
> > > * the route's MTU is locked.
> > > */
> > > if ((flags & IP_MTUDISC) && ro && ro->ro_rt &&
> > > - (ro->ro_rt->rt_locks & RTV_MTU) == 0)
> > > + (!ISSET(atomic_load_int(&ro->ro_rt->rt_locks), RTV_MTU)))
> > > ip->ip_off |= htons(IP_DF);
> > >
> > > #ifdef IPSEC
> > > @@ -471,7 +471,7 @@ sendit:
> > > */
> > > if (rtisvalid(ro->ro_rt) &&
> > > ISSET(ro->ro_rt->rt_flags, RTF_HOST) &&
> > > - !(ro->ro_rt->rt_locks & RTV_MTU)) {
> > > + !ISSET(atomic_load_int(&ro->ro_rt->rt_locks), RTV_MTU)) {
> > > u_int rtmtu;
> > >
> > > rtmtu = atomic_load_int(&ro->ro_rt->rt_mtu);
> > > Index: netinet/tcp_subr.c
> > > ===================================================================
> > > RCS file: /data/mirror/openbsd/cvs/src/sys/netinet/tcp_subr.c,v
> > > diff -u -p -r1.204 tcp_subr.c
> > > --- netinet/tcp_subr.c 3 Jan 2025 17:23:51 -0000 1.204
> > > +++ netinet/tcp_subr.c 13 Jan 2025 22:50:54 -0000
> > > @@ -868,7 +868,8 @@ tcp_mtudisc(struct inpcb *inp, int errno
> > >
> > > rt = in_pcbrtentry(inp);
> > > if (rt != NULL) {
> > > - unsigned int orig_mtulock = (rt->rt_locks & RTV_MTU);
> > > + unsigned int orig_mtulock =
> > > + (atomic_load_int(&rt->rt_locks) & RTV_MTU);
> > >
> > > /*
> > > * If this was not a host route, remove and realloc.
> > > @@ -878,7 +879,7 @@ tcp_mtudisc(struct inpcb *inp, int errno
> > > if ((rt = in_pcbrtentry(inp)) == NULL)
> > > return;
> > > }
> > > - if (orig_mtulock < (rt->rt_locks & RTV_MTU))
> > > + if (orig_mtulock < (atomic_load_int(&rt->rt_locks) & RTV_MTU))
> > > change = 1;
> > > }
> > > tcp_mss(tp, -1);
> > > Index: netinet/tcp_timer.c
> > > ===================================================================
> > > RCS file: /data/mirror/openbsd/cvs/src/sys/netinet/tcp_timer.c,v
> > > diff -u -p -r1.80 tcp_timer.c
> > > --- netinet/tcp_timer.c 5 Jan 2025 12:18:48 -0000 1.80
> > > +++ netinet/tcp_timer.c 13 Jan 2025 22:50:54 -0000
> > > @@ -282,7 +282,7 @@ tcp_timer_rexmt(void *arg)
> > > rt = in_pcbrtentry(inp);
> > > /* Check if path MTU discovery is disabled already */
> > > if (rt && (rt->rt_flags & RTF_HOST) &&
> > > - (rt->rt_locks & RTV_MTU))
> > > + ISSET(atomic_load_int(&rt->rt_locks), RTV_MTU))
> > > goto leave;
> > >
> > > rt = NULL;
> > > @@ -303,8 +303,8 @@ tcp_timer_rexmt(void *arg)
> > > }
> > > if (rt != NULL) {
> > > /* Disable path MTU discovery */
> > > - if ((rt->rt_locks & RTV_MTU) == 0) {
> > > - rt->rt_locks |= RTV_MTU;
> > > + if (!ISSET(atomic_load_int(&rt->rt_locks), RTV_MTU)) {
> > > + atomic_setbits_int(&rt->rt_locks, RTV_MTU);
> > > in_rtchange(inp, 0);
> > > }
> > >
> > > Index: netinet6/icmp6.c
> > > ===================================================================
> > > RCS file: /data/mirror/openbsd/cvs/src/sys/netinet6/icmp6.c,v
> > > diff -u -p -r1.256 icmp6.c
> > > --- netinet6/icmp6.c 3 Jan 2025 21:27:40 -0000 1.256
> > > +++ netinet6/icmp6.c 13 Jan 2025 22:50:54 -0000
> > > @@ -1016,7 +1016,7 @@ icmp6_mtudisc_update(struct ip6ctlparam
> > > rt = icmp6_mtudisc_clone(&sin6, m->m_pkthdr.ph_rtableid, 0);
> > >
> > > if (rt != NULL && ISSET(rt->rt_flags, RTF_HOST) &&
> > > - !(rt->rt_locks & RTV_MTU)) {
> > > + !ISSET(atomic_load_int(&rt->rt_locks), RTV_MTU)) {
> > > u_int rtmtu;
> > >
> > > rtmtu = atomic_load_int(&rt->rt_mtu);
> > > @@ -1851,7 +1851,7 @@ icmp6_mtudisc_timeout(struct rtentry *rt
> > > if ((rt->rt_flags & (RTF_DYNAMIC|RTF_HOST)) == (RTF_DYNAMIC|RTF_HOST)) {
> > > rtdeletemsg(rt, ifp, rtableid);
> > > } else {
> > > - if (!(rt->rt_locks & RTV_MTU))
> > > + if (!ISSET(atomic_load_int(&rt->rt_locks), RTV_MTU))
> > > atomic_store_int(&rt->rt_mtu, 0);
> > > }
> > >
> > > Index: netinet6/ip6_output.c
> > > ===================================================================
> > > RCS file: /data/mirror/openbsd/cvs/src/sys/netinet6/ip6_output.c,v
> > > diff -u -p -r1.294 ip6_output.c
> > > --- netinet6/ip6_output.c 3 Jan 2025 21:27:40 -0000 1.294
> > > +++ netinet6/ip6_output.c 13 Jan 2025 22:50:54 -0000
> > > @@ -1047,7 +1047,7 @@ ip6_getpmtu(struct rtentry *rt, struct i
> > > * field isn't locked).
> > > */
> > > mtu = ifp->if_mtu;
> > > - if (!(rt->rt_locks & RTV_MTU))
> > > + if (!ISSET(atomic_load_int(&rt->rt_locks), RTV_MTU))
> > > atomic_cas_uint(&rt->rt_mtu, rtmtu, mtu);
> > > }
> > > } else {
> > >
> >
> > --
> > :wq Claudio
>
--
:wq Claudio
make route rt_locks atomic