Download raw body.
[patch 2 of 3] iscsid(8): avoid double free & race during cleanup
[patch 2 of 3] iscsid(8): avoid double free & race during cleanup
[patch 2 of 3] iscsid(8): avoid double free & race during cleanup
On Mon, 20 Jan 2025 13:54:20 +0100 Claudio Jeker <cjeker@diehard.n-r-g.com> wrote: > On Fri, Jan 17, 2025 at 06:04:32PM +1030, Jack Burton wrote: > > On Thu, 16 Jan 2025 21:27:34 +0100 > > Claudio Jeker <cjeker@diehard.n-r-g.com> wrote: > > > > > On Thu, Jan 16, 2025 at 01:37:13PM +1030, Jack Burton wrote: > > > > On Wed, 15 Jan 2025 14:46:02 +0100 > > > > Claudio Jeker <cjeker@diehard.n-r-g.com> wrote: > > > > > On Tue, Dec 31, 2024 at 06:33:08PM +1030, Jack Burton wrote: > > > > > > > > > > > Fixing that one unmasked a latent race condition during > > > > > > shutdown. Depending on whether session reestablishment > > > > > > attempts result in any new connections entering the login > > > > > > phase before ISCSI_EXIT_WAIT expires, EITHER conn_free() > > > > > > would get called multiple times on the same connection OR > > > > > > there'd be an outright access after free. > > > > > > > > > > > > To avoid adding extra states to the fsm, simply check > > > > > > exit_rounds (which gets set in shutdown_cb) before freeing > > > > > > connections in sess_do_free(): if it's set then we're about > > > > > > to exit anyway, so it doesn't matter that some connections > > > > > > may get reallocated without freeing the old ones. > > > > > > > > > > Uhm. I don't understand how conn_free() can be called multiple > > > > > times. > > > > > > > > sess_do_free() calls conn_free() in the loop for each > > > > connection in the TAILQ. Then sess_do_conn_fail() calls > > > > conn_free() because connection state is CONN_FREE. > > > > > > > > I don't really understand why sess_do_conn_fail() is getting > > > > called at all, as after sess_do_free() the current session > > > > state should be SESS_FREE (not SESS_RUNNING) ... but somehow it > > > > does get called. > > > > > > Ah, I see. The problem is that sessev has another pointer to the > > > connection and the sessev is not removed in conn_free(). > > > I need to reread the code to see how we can fix that mess. > > > But what happens is that we have a event queued with a pointer to > > > a connection that was already freed. > > > > Thanks, that's just the clue I needed! > > > > You're right, it is a bit of a mess. Not your fault though: most of > > that mess is necessary if we want to keep the fsm reasonably true > > to the standard. > > > > Simplest way I could think of to avoid the double free (*without* > > the sledge-hammer approach of my original patch #2) was to keep > > track of any one-shot event queued for each connection, dequeuing > > it (if any) in conn_free(). > > > > [I am of course assuming that no connection ever gets more than one > > one-shot event pending at the same time ... but in my testing I > > have not been able to produce any scenario that violates that > > assumption.] > > > > The patch below does just that. With this patch applied I get clean > > shutdowns on SIGTERM, regardless of which target(s) iscsid is > > connected to and regardless of whether each of them is in login, > > running, or failed when I send the signal. > > > > Is this a reasonable approach for the time being? > > > > I took the liberty and cleaned up this a bit further. I added a struct > sessev to the session and connection structs and use those for the > session_fsm. So now a pending sessev event can be removed in > connection_free. > > What you think about that? Can you test this against the non-netbsd > targets? It's a bit more intrusive ... but it does make the code paths clearer to anyone reading the source, so yes I like your approach better. Testing gave the same results as my last iteration: clean shutdown on SIGTERM regardless of what targets we're connected to and their current state(s). As an added bonus, in the special case where all connections are in a failed state when SIGTERM is received, shutdown is much faster with your patch than with my last one. Many thanks.
[patch 2 of 3] iscsid(8): avoid double free & race during cleanup
[patch 2 of 3] iscsid(8): avoid double free & race during cleanup
[patch 2 of 3] iscsid(8): avoid double free & race during cleanup