From: Mark Kettenis Subject: Re: sdhc(4): fix intr status race To: Patrick Wildt Cc: tech@openbsd.org, kettenis@openbsd.org, dlg@openbsd.org Date: Mon, 05 Aug 2024 11:07:58 +0200 > Date: Mon, 5 Aug 2024 00:31:01 +0200 > From: Patrick Wildt > > Hi, > > on my RK3588 the eMMC only worked when running with GENERIC.MP, but with > GENERIC or RAMDISK it failed to read command responses unless I compiled > it with MULTIPROCESSOR, even if I only run it with one core. > > As it turns out, this led to a timing change in the command sequence: > > As one of the initialization steps, the eMMC is "enumerated". It is > supposed to timeout/error if there's no second "card". That's normal. > After enumeration, it tries to read more information, which is where > it then failed. > > > sdhc_exec_command: start fresh command (ENUMERATION) > > dwmshc0: interrupt status=8000 > > dwmshc0: interrupt status=1 > > sdhc_exec_command: start fresh command (READ) > > dwmshc0: interrupt status=1 I wonder what other SDHC controllers do. Do they set ERROR and CMD simultaneously? Or don't they set CMD at all? > As can be seen, the error-ing command seems to throw two separate > interrupts. > > This first interrupt will: > * hp->intr_status |= ERROR; > * wakeup() the tsleep-ing sdhc_wait_intr(), which will return the > information that the command failed > > Before the next command is started, the second interrupt will: > * hp->intr_status |= CMD_COMPLETE; > > When the next command is executed, CMD_COMPLETE remains set, hence > sdhc_wait_intr() will assume it's already complete, even though it > isn't. Hence invalid data is read. > > I'm not sure if there's a more elegant solution, if we should block > CMD completion interrupts if we're not starting a new command or > something like that, but the easiest thing seems to be to just reset > the state before we start the next command. > > Opinions? As far as I can tell there is no guarantee that the CMD interrupt happens before you clear the variable. But maybe that is just a theoretical problem. I wonder how other OSes hadle this. I quickly looked at NetBSD this morning but didn't immediately spot something that would fix this issue in their version of the code. > diff --git a/sys/dev/sdmmc/sdhc.c b/sys/dev/sdmmc/sdhc.c > index eb0a8e5374b..637c99f485d 100644 > --- a/sys/dev/sdmmc/sdhc.c > +++ b/sys/dev/sdmmc/sdhc.c > @@ -1057,6 +1057,9 @@ sdhc_start_command(struct sdhc_host *hp, struct sdmmc_command *cmd) > DPRINTF(1,("%s: cmd=%#x mode=%#x blksize=%d blkcount=%d\n", > DEVNAME(hp->sc), command, mode, blksize, blkcount)); > > + /* We're starting a new command, reset state. */ > + hp->intr_status = 0; > + > /* > * Start a CPU data transfer. Writing to the high order byte > * of the SDHC_COMMAND register triggers the SD command. (1.5) > >