Download raw body.
sdhc(4): fix intr status race
Am Mon, Aug 05, 2024 at 11:07:58AM +0200 schrieb Mark Kettenis:
> > Date: Mon, 5 Aug 2024 00:31:01 +0200
> > From: Patrick Wildt <patrick@blueri.se>
> >
> > Hi,
> >
> > on my RK3588 the eMMC only worked when running with GENERIC.MP, but with
> > GENERIC or RAMDISK it failed to read command responses unless I compiled
> > it with MULTIPROCESSOR, even if I only run it with one core.
> >
> > As it turns out, this led to a timing change in the command sequence:
> >
> > As one of the initialization steps, the eMMC is "enumerated". It is
> > supposed to timeout/error if there's no second "card". That's normal.
> > After enumeration, it tries to read more information, which is where
> > it then failed.
> >
> > > sdhc_exec_command: start fresh command (ENUMERATION)
> > > dwmshc0: interrupt status=8000<ERROR>
> > > dwmshc0: interrupt status=1<CMD>
> > > sdhc_exec_command: start fresh command (READ)
> > > dwmshc0: interrupt status=1<CMD>
>
> I wonder what other SDHC controllers do. Do they set ERROR and CMD
> simultaneously? Or don't they set CMD at all?
On the Armada 8040 I *only* see ERROR, no CMD. Sometimes I wonder if
the CMD is caused by the host reset we execute.
> > As can be seen, the error-ing command seems to throw two separate
> > interrupts.
> >
> > This first interrupt will:
> > * hp->intr_status |= ERROR;
> > * wakeup() the tsleep-ing sdhc_wait_intr(), which will return the
> > information that the command failed
> >
> > Before the next command is started, the second interrupt will:
> > * hp->intr_status |= CMD_COMPLETE;
> >
> > When the next command is executed, CMD_COMPLETE remains set, hence
> > sdhc_wait_intr() will assume it's already complete, even though it
> > isn't. Hence invalid data is read.
> >
> > I'm not sure if there's a more elegant solution, if we should block
> > CMD completion interrupts if we're not starting a new command or
> > something like that, but the easiest thing seems to be to just reset
> > the state before we start the next command.
> >
> > Opinions?
>
> As far as I can tell there is no guarantee that the CMD interrupt
> happens before you clear the variable. But maybe that is just a
> theoretical problem.
On this machine I definitely get the IRQ immediately after the splx()
in sdhc_wait_intr(). I would hope/expect that the host reset should
ensure there are no other ones incoming later.
> I wonder how other OSes hadle this. I quickly looked at NetBSD this
> morning but didn't immediately spot something that would fix this
> issue in their version of the code.
>
> > diff --git a/sys/dev/sdmmc/sdhc.c b/sys/dev/sdmmc/sdhc.c
> > index eb0a8e5374b..637c99f485d 100644
> > --- a/sys/dev/sdmmc/sdhc.c
> > +++ b/sys/dev/sdmmc/sdhc.c
> > @@ -1057,6 +1057,9 @@ sdhc_start_command(struct sdhc_host *hp, struct sdmmc_command *cmd)
> > DPRINTF(1,("%s: cmd=%#x mode=%#x blksize=%d blkcount=%d\n",
> > DEVNAME(hp->sc), command, mode, blksize, blkcount));
> >
> > + /* We're starting a new command, reset state. */
> > + hp->intr_status = 0;
> > +
> > /*
> > * Start a CPU data transfer. Writing to the high order byte
> > * of the SDHC_COMMAND register triggers the SD command. (1.5)
> >
> >
sdhc(4): fix intr status race