Index | Thread | Search

From:
Jack Burton <jack@saosce.com.au>
Subject:
Re: [patch 2 of 3] iscsid(8): avoid double free & race during cleanup
To:
Claudio Jeker <cjeker@diehard.n-r-g.com>
Cc:
tech@openbsd.org
Date:
Fri, 17 Jan 2025 18:04:32 +1030

Download raw body.

Thread
On Thu, 16 Jan 2025 21:27:34 +0100
Claudio Jeker <cjeker@diehard.n-r-g.com> wrote:

> On Thu, Jan 16, 2025 at 01:37:13PM +1030, Jack Burton wrote:
> > On Wed, 15 Jan 2025 14:46:02 +0100
> > Claudio Jeker <cjeker@diehard.n-r-g.com> wrote:  
> > > On Tue, Dec 31, 2024 at 06:33:08PM +1030, Jack Burton wrote:  
> > > > Fixing that one unmasked a latent race condition during
> > > > shutdown. Depending on whether session reestablishment attempts
> > > > result in any new connections entering the login phase before
> > > > ISCSI_EXIT_WAIT expires, EITHER conn_free() would get called
> > > > multiple times on the same connection OR there'd be an outright
> > > > access after free.
> > > > 
> > > > To avoid adding extra states to the fsm, simply check
> > > > exit_rounds (which gets set in shutdown_cb) before freeing
> > > > connections in sess_do_free(): if it's set then we're about to
> > > > exit anyway, so it doesn't matter that some connections may get
> > > > reallocated without freeing the old ones.    
> > > 
> > > Uhm. I don't understand how conn_free() can be called multiple
> > > times.  
> > 
> > sess_do_free() calls conn_free() in the loop for each connection in
> > the TAILQ.  Then sess_do_conn_fail() calls conn_free() because
> > connection state is CONN_FREE.
> > 
> > I don't really understand why sess_do_conn_fail() is getting called
> > at all, as after sess_do_free() the current session state should be
> > SESS_FREE (not SESS_RUNNING) ... but somehow it does get called.
> >   
> 
> Ah, I see. The problem is that sessev has another pointer to the
> connection and the sessev is not removed in conn_free().
> I need to reread the code to see how we can fix that mess.
> But what happens is that we have a event queued with a pointer to a
> connection that was already freed.

Thanks, that's just the clue I needed!

You're right, it is a bit of a mess.  Not your fault though: most of
that mess is necessary if we want to keep the fsm reasonably true to the
standard.

Simplest way I could think of to avoid the double free (*without* the sledge-hammer approach of my original patch #2) was to keep track
of any one-shot event queued for each connection, dequeuing it (if any)
in conn_free().

[I am of course assuming that no connection ever gets more than one
one-shot event pending at the same time ... but in my testing I have not
been able to produce any scenario that violates that assumption.]

The patch below does just that.  With this patch applied I get clean
shutdowns on SIGTERM, regardless of which target(s) iscsid is connected
to and regardless of whether each of them is in login, running, or
failed when I send the signal.

Is this a reasonable approach for the time being?

If so, I'll rewrite my patch #3 to suit once this one is in the tree.

Index: connection.c
===================================================================
RCS file: /cvs/src/usr.sbin/iscsid/connection.c,v
diff -u -p -r1.22 connection.c
--- connection.c	16 Jan 2025 16:19:39 -0000	1.22
+++ connection.c	17 Jan 2025 07:10:39 -0000
@@ -115,6 +115,11 @@ conn_free(struct connection *c)
 
 	event_del(&c->ev);
 	event_del(&c->wev);
+	if (c->oev != NULL) {
+		event_del(c->oev);
+		free(c->oev);
+		c->oev = NULL;
+	}
 	if (c->fd != -1)
 		close(c->fd);
 
Index: iscsid.h
===================================================================
RCS file: /cvs/src/usr.sbin/iscsid/iscsid.h,v
diff -u -p -r1.20 iscsid.h
--- iscsid.h	16 Jan 2025 16:19:39 -0000	1.20
+++ iscsid.h	17 Jan 2025 07:10:39 -0000
@@ -263,6 +263,7 @@ struct session_poll {
 
 struct connection {
 	struct event		 ev;
+	struct event		*oev;
 	struct event		 wev;
 	TAILQ_ENTRY(connection)	 entry;
 	struct connection_params mine;
Index: session.c
===================================================================
RCS file: /cvs/src/usr.sbin/iscsid/session.c,v
diff -u -p -r1.10 session.c
--- session.c	16 Jan 2025 16:17:32 -0000	1.10
+++ session.c	17 Jan 2025 07:10:39 -0000
@@ -209,12 +209,14 @@ session_fsm(struct session *s, enum s_ev
 {
 	struct timeval tv;
 	struct sessev *sev;
+	struct event *oev;
 
 	log_debug("session_fsm[%s]: %s ev %s timeout %d",
 	    s->config.SessionName, sess_state(s->state),
 	    sess_event(ev), timeout);
 
-	if ((sev = malloc(sizeof(*sev))) == NULL)
+	if ((oev = malloc(sizeof(*oev))) == NULL
+	    || (sev = malloc(sizeof(*sev))) == NULL)
 		fatal("session_fsm");
 	sev->conn = c;
 	sev->sess = s;
@@ -222,8 +224,12 @@ session_fsm(struct session *s, enum s_ev
 
 	timerclear(&tv);
 	tv.tv_sec = timeout;
-	if (event_once(-1, EV_TIMEOUT, session_fsm_callback, sev, &tv)
== -1) +
+	evtimer_set(oev, session_fsm_callback, sev);
+	if (event_add(oev, &tv))
 		fatal("session_fsm");
+	if (c != NULL)
+		c->oev = oev;
 }
 
 struct {