From: Jack Burton Subject: Re: [patch 2 of 3] iscsid(8): avoid double free & race during cleanup To: Claudio Jeker Cc: tech@openbsd.org Date: Fri, 17 Jan 2025 18:04:32 +1030 On Thu, 16 Jan 2025 21:27:34 +0100 Claudio Jeker wrote: > On Thu, Jan 16, 2025 at 01:37:13PM +1030, Jack Burton wrote: > > On Wed, 15 Jan 2025 14:46:02 +0100 > > Claudio Jeker wrote: > > > On Tue, Dec 31, 2024 at 06:33:08PM +1030, Jack Burton wrote: > > > > Fixing that one unmasked a latent race condition during > > > > shutdown. Depending on whether session reestablishment attempts > > > > result in any new connections entering the login phase before > > > > ISCSI_EXIT_WAIT expires, EITHER conn_free() would get called > > > > multiple times on the same connection OR there'd be an outright > > > > access after free. > > > > > > > > To avoid adding extra states to the fsm, simply check > > > > exit_rounds (which gets set in shutdown_cb) before freeing > > > > connections in sess_do_free(): if it's set then we're about to > > > > exit anyway, so it doesn't matter that some connections may get > > > > reallocated without freeing the old ones. > > > > > > Uhm. I don't understand how conn_free() can be called multiple > > > times. > > > > sess_do_free() calls conn_free() in the loop for each connection in > > the TAILQ. Then sess_do_conn_fail() calls conn_free() because > > connection state is CONN_FREE. > > > > I don't really understand why sess_do_conn_fail() is getting called > > at all, as after sess_do_free() the current session state should be > > SESS_FREE (not SESS_RUNNING) ... but somehow it does get called. > > > > Ah, I see. The problem is that sessev has another pointer to the > connection and the sessev is not removed in conn_free(). > I need to reread the code to see how we can fix that mess. > But what happens is that we have a event queued with a pointer to a > connection that was already freed. Thanks, that's just the clue I needed! You're right, it is a bit of a mess. Not your fault though: most of that mess is necessary if we want to keep the fsm reasonably true to the standard. Simplest way I could think of to avoid the double free (*without* the sledge-hammer approach of my original patch #2) was to keep track of any one-shot event queued for each connection, dequeuing it (if any) in conn_free(). [I am of course assuming that no connection ever gets more than one one-shot event pending at the same time ... but in my testing I have not been able to produce any scenario that violates that assumption.] The patch below does just that. With this patch applied I get clean shutdowns on SIGTERM, regardless of which target(s) iscsid is connected to and regardless of whether each of them is in login, running, or failed when I send the signal. Is this a reasonable approach for the time being? If so, I'll rewrite my patch #3 to suit once this one is in the tree. Index: connection.c =================================================================== RCS file: /cvs/src/usr.sbin/iscsid/connection.c,v diff -u -p -r1.22 connection.c --- connection.c 16 Jan 2025 16:19:39 -0000 1.22 +++ connection.c 17 Jan 2025 07:10:39 -0000 @@ -115,6 +115,11 @@ conn_free(struct connection *c) event_del(&c->ev); event_del(&c->wev); + if (c->oev != NULL) { + event_del(c->oev); + free(c->oev); + c->oev = NULL; + } if (c->fd != -1) close(c->fd); Index: iscsid.h =================================================================== RCS file: /cvs/src/usr.sbin/iscsid/iscsid.h,v diff -u -p -r1.20 iscsid.h --- iscsid.h 16 Jan 2025 16:19:39 -0000 1.20 +++ iscsid.h 17 Jan 2025 07:10:39 -0000 @@ -263,6 +263,7 @@ struct session_poll { struct connection { struct event ev; + struct event *oev; struct event wev; TAILQ_ENTRY(connection) entry; struct connection_params mine; Index: session.c =================================================================== RCS file: /cvs/src/usr.sbin/iscsid/session.c,v diff -u -p -r1.10 session.c --- session.c 16 Jan 2025 16:17:32 -0000 1.10 +++ session.c 17 Jan 2025 07:10:39 -0000 @@ -209,12 +209,14 @@ session_fsm(struct session *s, enum s_ev { struct timeval tv; struct sessev *sev; + struct event *oev; log_debug("session_fsm[%s]: %s ev %s timeout %d", s->config.SessionName, sess_state(s->state), sess_event(ev), timeout); - if ((sev = malloc(sizeof(*sev))) == NULL) + if ((oev = malloc(sizeof(*oev))) == NULL + || (sev = malloc(sizeof(*sev))) == NULL) fatal("session_fsm"); sev->conn = c; sev->sess = s; @@ -222,8 +224,12 @@ session_fsm(struct session *s, enum s_ev timerclear(&tv); tv.tv_sec = timeout; - if (event_once(-1, EV_TIMEOUT, session_fsm_callback, sev, &tv) == -1) + + evtimer_set(oev, session_fsm_callback, sev); + if (event_add(oev, &tv)) fatal("session_fsm"); + if (c != NULL) + c->oev = oev; } struct {