Download raw body.
sdhc(4): fix intr status race
> Date: Mon, 5 Aug 2024 00:31:01 +0200
> From: Patrick Wildt <patrick@blueri.se>
>
> Hi,
>
> on my RK3588 the eMMC only worked when running with GENERIC.MP, but with
> GENERIC or RAMDISK it failed to read command responses unless I compiled
> it with MULTIPROCESSOR, even if I only run it with one core.
>
> As it turns out, this led to a timing change in the command sequence:
>
> As one of the initialization steps, the eMMC is "enumerated". It is
> supposed to timeout/error if there's no second "card". That's normal.
> After enumeration, it tries to read more information, which is where
> it then failed.
>
> > sdhc_exec_command: start fresh command (ENUMERATION)
> > dwmshc0: interrupt status=8000<ERROR>
> > dwmshc0: interrupt status=1<CMD>
> > sdhc_exec_command: start fresh command (READ)
> > dwmshc0: interrupt status=1<CMD>
I wonder what other SDHC controllers do. Do they set ERROR and CMD
simultaneously? Or don't they set CMD at all?
> As can be seen, the error-ing command seems to throw two separate
> interrupts.
>
> This first interrupt will:
> * hp->intr_status |= ERROR;
> * wakeup() the tsleep-ing sdhc_wait_intr(), which will return the
> information that the command failed
>
> Before the next command is started, the second interrupt will:
> * hp->intr_status |= CMD_COMPLETE;
>
> When the next command is executed, CMD_COMPLETE remains set, hence
> sdhc_wait_intr() will assume it's already complete, even though it
> isn't. Hence invalid data is read.
>
> I'm not sure if there's a more elegant solution, if we should block
> CMD completion interrupts if we're not starting a new command or
> something like that, but the easiest thing seems to be to just reset
> the state before we start the next command.
>
> Opinions?
As far as I can tell there is no guarantee that the CMD interrupt
happens before you clear the variable. But maybe that is just a
theoretical problem.
I wonder how other OSes hadle this. I quickly looked at NetBSD this
morning but didn't immediately spot something that would fix this
issue in their version of the code.
> diff --git a/sys/dev/sdmmc/sdhc.c b/sys/dev/sdmmc/sdhc.c
> index eb0a8e5374b..637c99f485d 100644
> --- a/sys/dev/sdmmc/sdhc.c
> +++ b/sys/dev/sdmmc/sdhc.c
> @@ -1057,6 +1057,9 @@ sdhc_start_command(struct sdhc_host *hp, struct sdmmc_command *cmd)
> DPRINTF(1,("%s: cmd=%#x mode=%#x blksize=%d blkcount=%d\n",
> DEVNAME(hp->sc), command, mode, blksize, blkcount));
>
> + /* We're starting a new command, reset state. */
> + hp->intr_status = 0;
> +
> /*
> * Start a CPU data transfer. Writing to the high order byte
> * of the SDHC_COMMAND register triggers the SD command. (1.5)
>
>
sdhc(4): fix intr status race