From: Patrick Wildt Subject: Re: sdhc(4): fix intr status race To: Mark Kettenis Cc: tech@openbsd.org, dlg@openbsd.org Date: Mon, 5 Aug 2024 20:00:47 +0200 Am Mon, Aug 05, 2024 at 11:07:58AM +0200 schrieb Mark Kettenis: > > Date: Mon, 5 Aug 2024 00:31:01 +0200 > > From: Patrick Wildt > > > > Hi, > > > > on my RK3588 the eMMC only worked when running with GENERIC.MP, but with > > GENERIC or RAMDISK it failed to read command responses unless I compiled > > it with MULTIPROCESSOR, even if I only run it with one core. > > > > As it turns out, this led to a timing change in the command sequence: > > > > As one of the initialization steps, the eMMC is "enumerated". It is > > supposed to timeout/error if there's no second "card". That's normal. > > After enumeration, it tries to read more information, which is where > > it then failed. > > > > > sdhc_exec_command: start fresh command (ENUMERATION) > > > dwmshc0: interrupt status=8000 > > > dwmshc0: interrupt status=1 > > > sdhc_exec_command: start fresh command (READ) > > > dwmshc0: interrupt status=1 > > I wonder what other SDHC controllers do. Do they set ERROR and CMD > simultaneously? Or don't they set CMD at all? On the Armada 8040 I *only* see ERROR, no CMD. Sometimes I wonder if the CMD is caused by the host reset we execute. > > As can be seen, the error-ing command seems to throw two separate > > interrupts. > > > > This first interrupt will: > > * hp->intr_status |= ERROR; > > * wakeup() the tsleep-ing sdhc_wait_intr(), which will return the > > information that the command failed > > > > Before the next command is started, the second interrupt will: > > * hp->intr_status |= CMD_COMPLETE; > > > > When the next command is executed, CMD_COMPLETE remains set, hence > > sdhc_wait_intr() will assume it's already complete, even though it > > isn't. Hence invalid data is read. > > > > I'm not sure if there's a more elegant solution, if we should block > > CMD completion interrupts if we're not starting a new command or > > something like that, but the easiest thing seems to be to just reset > > the state before we start the next command. > > > > Opinions? > > As far as I can tell there is no guarantee that the CMD interrupt > happens before you clear the variable. But maybe that is just a > theoretical problem. On this machine I definitely get the IRQ immediately after the splx() in sdhc_wait_intr(). I would hope/expect that the host reset should ensure there are no other ones incoming later. > I wonder how other OSes hadle this. I quickly looked at NetBSD this > morning but didn't immediately spot something that would fix this > issue in their version of the code. > > > diff --git a/sys/dev/sdmmc/sdhc.c b/sys/dev/sdmmc/sdhc.c > > index eb0a8e5374b..637c99f485d 100644 > > --- a/sys/dev/sdmmc/sdhc.c > > +++ b/sys/dev/sdmmc/sdhc.c > > @@ -1057,6 +1057,9 @@ sdhc_start_command(struct sdhc_host *hp, struct sdmmc_command *cmd) > > DPRINTF(1,("%s: cmd=%#x mode=%#x blksize=%d blkcount=%d\n", > > DEVNAME(hp->sc), command, mode, blksize, blkcount)); > > > > + /* We're starting a new command, reset state. */ > > + hp->intr_status = 0; > > + > > /* > > * Start a CPU data transfer. Writing to the high order byte > > * of the SDHC_COMMAND register triggers the SD command. (1.5) > > > >