From: Vitaliy Makkoveev Subject: Re: keep socket lock in sonewconn To: Alexander Bluhm Cc: OpenBSD tech Date: Sun, 26 Jan 2025 02:18:28 +0300 > On 26 Jan 2025, at 02:10, Alexander Bluhm wrote: > > On Sat, Jan 25, 2025 at 06:37:49PM +0300, Vitaliy Makkoveev wrote: >> On Fri, Jan 24, 2025 at 10:02:09AM +0100, Alexander Bluhm wrote: >>> Hi, >>> >>> For TCP input unlocking we need a consistent lock of a newly created >>> socket. Instead of releasing the lock in sonewconn() and grabbing >>> it again later, I think it is better that sonewconn() returns a >>> locked socket. >>> >>> Currently I only changed syn_cache_get() which calls in_pcbsounlock_rele() >>> at the end. Following diffs will push the unlock into tcp_input(). >>> >>> ok? >> >> UNIX sockets are much complicated in this path, because we need to >> handle 3 sockets and keep right locking order between them. Also, at >> his stage newly created UNIX socket linked to `so_q0' and accessible >> only from garbage collector, which will not close it. >> >> However, this is not true for TCP socket, because connection is >> already established and concurrent FIN or RST packet could start >> demolition. >> >> ok mvs, but keep in mind locking order between sockets, we always lock >> listening socket aka `head' first. > > Maybe we can simplify locking dance unp_connect() a bit. But I > wanted to be carefull and did a minimal change. > >> Isn't it better to pass `oldso' to sonewconn()? Does the `head' looks >> more reasonable here? >> >>> + oldso = so; >>> + oldinp = inp; >>> so = sonewconn(so, SS_ISCONNECTED, M_DONTWAIT); > > "old" is quite generic. TCP does not talk about "head". How about > "listen" socket? > > ok? > head used in socket’s layer, but listen is fine too. ok mvs > bluhm > > Index: netinet/tcp_input.c > =================================================================== > RCS file: /data/mirror/openbsd/cvs/src/sys/netinet/tcp_input.c,v > diff -u -p -r1.424 tcp_input.c > --- netinet/tcp_input.c 25 Jan 2025 22:06:41 -0000 1.424 > +++ netinet/tcp_input.c 25 Jan 2025 23:04:10 -0000 > @@ -3532,10 +3532,10 @@ syn_cache_get(struct sockaddr *src, stru > { > struct syn_cache *sc; > struct syn_cache_head *scp; > - struct inpcb *inp, *oldinp; > + struct socket *listenso; > + struct inpcb *inp, *listeninp; > struct tcpcb *tp = NULL; > struct mbuf *am; > - struct socket *oldso; > u_int rtableid; > > NET_ASSERT_LOCKED(); > @@ -3573,35 +3573,35 @@ syn_cache_get(struct sockaddr *src, stru > * connection when the SYN arrived. If we can't create > * the connection, abort it. > */ > - oldso = so; > - oldinp = inp; > - so = sonewconn(so, SS_ISCONNECTED, M_DONTWAIT); > + listenso = so; > + listeninp = inp; > + so = sonewconn(listenso, SS_ISCONNECTED, M_DONTWAIT); > if (so == NULL) > goto resetandabort; > soassertlocked(so); > soref(so); > inp = sotoinpcb(so); > + tp = intotcpcb(inp); > > #ifdef IPSEC > /* > - * We need to copy the required security levels > - * from the old pcb. Ditto for any other > - * IPsec-related information. > + * We need to copy the required security levels from the listen pcb. > + * Ditto for any other IPsec-related information. > */ > - inp->inp_seclevel = oldinp->inp_seclevel; > + inp->inp_seclevel = listeninp->inp_seclevel; > #endif /* IPSEC */ > #ifdef INET6 > if (ISSET(inp->inp_flags, INP_IPV6)) { > - KASSERT(ISSET(oldinp->inp_flags, INP_IPV6)); > + KASSERT(ISSET(listeninp->inp_flags, INP_IPV6)); > > - inp->inp_ipv6.ip6_hlim = oldinp->inp_ipv6.ip6_hlim; > - inp->inp_hops = oldinp->inp_hops; > + inp->inp_ipv6.ip6_hlim = listeninp->inp_ipv6.ip6_hlim; > + inp->inp_hops = listeninp->inp_hops; > } else > #endif > { > - KASSERT(!ISSET(oldinp->inp_flags, INP_IPV6)); > + KASSERT(!ISSET(listeninp->inp_flags, INP_IPV6)); > > - inp->inp_ip.ip_ttl = oldinp->inp_ip.ip_ttl; > + inp->inp_ip.ip_ttl = listeninp->inp_ip.ip_ttl; > inp->inp_options = ip_srcroute(m); > if (inp->inp_options == NULL) { > inp->inp_options = sc->sc_ipopts; > @@ -3639,8 +3639,7 @@ syn_cache_get(struct sockaddr *src, stru > } > (void) m_free(am); > > - tp = intotcpcb(inp); > - tp->t_flags = sototcpcb(oldso)->t_flags & (TF_NOPUSH|TF_NODELAY); > + tp->t_flags = intotcpcb(listeninp)->t_flags & (TF_NOPUSH|TF_NODELAY); > if (sc->sc_request_r_scale != 15) { > tp->requested_s_scale = sc->sc_requested_s_scale; > tp->request_r_scale = sc->sc_request_r_scale;