Download raw body.
[patch 2 of 3] iscsid(8): avoid double free & race during cleanup
[patch 2 of 3] iscsid(8): avoid double free & race during cleanup
[patch 2 of 3] iscsid(8): avoid double free & race during cleanup
On Thu, 16 Jan 2025 21:27:34 +0100
Claudio Jeker <cjeker@diehard.n-r-g.com> wrote:
> On Thu, Jan 16, 2025 at 01:37:13PM +1030, Jack Burton wrote:
> > On Wed, 15 Jan 2025 14:46:02 +0100
> > Claudio Jeker <cjeker@diehard.n-r-g.com> wrote:
> > > On Tue, Dec 31, 2024 at 06:33:08PM +1030, Jack Burton wrote:
> > > > Fixing that one unmasked a latent race condition during
> > > > shutdown. Depending on whether session reestablishment attempts
> > > > result in any new connections entering the login phase before
> > > > ISCSI_EXIT_WAIT expires, EITHER conn_free() would get called
> > > > multiple times on the same connection OR there'd be an outright
> > > > access after free.
> > > >
> > > > To avoid adding extra states to the fsm, simply check
> > > > exit_rounds (which gets set in shutdown_cb) before freeing
> > > > connections in sess_do_free(): if it's set then we're about to
> > > > exit anyway, so it doesn't matter that some connections may get
> > > > reallocated without freeing the old ones.
> > >
> > > Uhm. I don't understand how conn_free() can be called multiple
> > > times.
> >
> > sess_do_free() calls conn_free() in the loop for each connection in
> > the TAILQ. Then sess_do_conn_fail() calls conn_free() because
> > connection state is CONN_FREE.
> >
> > I don't really understand why sess_do_conn_fail() is getting called
> > at all, as after sess_do_free() the current session state should be
> > SESS_FREE (not SESS_RUNNING) ... but somehow it does get called.
> >
>
> Ah, I see. The problem is that sessev has another pointer to the
> connection and the sessev is not removed in conn_free().
> I need to reread the code to see how we can fix that mess.
> But what happens is that we have a event queued with a pointer to a
> connection that was already freed.
Thanks, that's just the clue I needed!
You're right, it is a bit of a mess. Not your fault though: most of
that mess is necessary if we want to keep the fsm reasonably true to the
standard.
Simplest way I could think of to avoid the double free (*without* the sledge-hammer approach of my original patch #2) was to keep track
of any one-shot event queued for each connection, dequeuing it (if any)
in conn_free().
[I am of course assuming that no connection ever gets more than one
one-shot event pending at the same time ... but in my testing I have not
been able to produce any scenario that violates that assumption.]
The patch below does just that. With this patch applied I get clean
shutdowns on SIGTERM, regardless of which target(s) iscsid is connected
to and regardless of whether each of them is in login, running, or
failed when I send the signal.
Is this a reasonable approach for the time being?
If so, I'll rewrite my patch #3 to suit once this one is in the tree.
Index: connection.c
===================================================================
RCS file: /cvs/src/usr.sbin/iscsid/connection.c,v
diff -u -p -r1.22 connection.c
--- connection.c 16 Jan 2025 16:19:39 -0000 1.22
+++ connection.c 17 Jan 2025 07:10:39 -0000
@@ -115,6 +115,11 @@ conn_free(struct connection *c)
event_del(&c->ev);
event_del(&c->wev);
+ if (c->oev != NULL) {
+ event_del(c->oev);
+ free(c->oev);
+ c->oev = NULL;
+ }
if (c->fd != -1)
close(c->fd);
Index: iscsid.h
===================================================================
RCS file: /cvs/src/usr.sbin/iscsid/iscsid.h,v
diff -u -p -r1.20 iscsid.h
--- iscsid.h 16 Jan 2025 16:19:39 -0000 1.20
+++ iscsid.h 17 Jan 2025 07:10:39 -0000
@@ -263,6 +263,7 @@ struct session_poll {
struct connection {
struct event ev;
+ struct event *oev;
struct event wev;
TAILQ_ENTRY(connection) entry;
struct connection_params mine;
Index: session.c
===================================================================
RCS file: /cvs/src/usr.sbin/iscsid/session.c,v
diff -u -p -r1.10 session.c
--- session.c 16 Jan 2025 16:17:32 -0000 1.10
+++ session.c 17 Jan 2025 07:10:39 -0000
@@ -209,12 +209,14 @@ session_fsm(struct session *s, enum s_ev
{
struct timeval tv;
struct sessev *sev;
+ struct event *oev;
log_debug("session_fsm[%s]: %s ev %s timeout %d",
s->config.SessionName, sess_state(s->state),
sess_event(ev), timeout);
- if ((sev = malloc(sizeof(*sev))) == NULL)
+ if ((oev = malloc(sizeof(*oev))) == NULL
+ || (sev = malloc(sizeof(*sev))) == NULL)
fatal("session_fsm");
sev->conn = c;
sev->sess = s;
@@ -222,8 +224,12 @@ session_fsm(struct session *s, enum s_ev
timerclear(&tv);
tv.tv_sec = timeout;
- if (event_once(-1, EV_TIMEOUT, session_fsm_callback, sev, &tv)
== -1) +
+ evtimer_set(oev, session_fsm_callback, sev);
+ if (event_add(oev, &tv))
fatal("session_fsm");
+ if (c != NULL)
+ c->oev = oev;
}
struct {
[patch 2 of 3] iscsid(8): avoid double free & race during cleanup
[patch 2 of 3] iscsid(8): avoid double free & race during cleanup
[patch 2 of 3] iscsid(8): avoid double free & race during cleanup