Index | Thread | Search

From:
"Pavel Knoblokh" <knoblokh@gmail.com>
Subject:
Re: ahci: fix interrupt issues
To:
"Jan Klemkow" <jan@openbsd.org>
Cc:
<tech@openbsd.org>, <rob.schmersel@bahnhof.se>
Date:
Fri, 13 Dec 2024 21:22:59 +1000

Download raw body.

Thread
Thanks for the fix! If only this patch had come out a month earlier...

I tried to debug this and even figured out that the interrupt never happened, but could not find why. I set the timeout to a low value to be able to do anything with the disks, but gave up after a few months of intermittent attempts to fix it. It was still good learning.

I think the same problem was reported in [1], and I mentioned two affected controllers at the end of [2]. Unfortunately, I can't test them anymore as I had to install FreeBSD on that machine to get things going. I can still drop off an ASMedia ASM1166 M.2 adapter [3] somewhere around Brisbane as I know some devs live nearby. Please DM if interested in giving it a shot.

Also, since you are around, would it be possible to get the patch from [2] through, or explain why not? Any feedback would be really appreciated. Thanks.

[1] https://marc.info/?l=openbsd-misc&m=172207535832024&w=2
[2] https://marc.info/?l=openbsd-tech&m=172266168914808&w=2
[3] https://www.aliexpress.com/item/1005005669800665.html

Regards,
Pavel Knoblokh

On Fri Nov 22, 2024 at 3:20 AM AEST, Jan Klemkow wrote:
> Hi,
>
> Attaching ahci(4) with newer AMD 600 Series AHCI controler hangs.  This is
> caused via atascsi_attach() which creates residual bits in the interrupt status
> register.  atascsi_attach() runs in polling mode, but sets the interrupt status
> bits anyway.  When these bits are already set, we never get an interrupt and
> run in a 60 second timeout for every ATA command send from sd(4) layer.  Which
> causes a kernel boot to last over an hour.
>
> The SATA AHCI 1.3.1 specification also saids:
>
> 	10.1.2 System Software Specific Initialization
>
> 	[...] system software must always ensure that the PxIS (clear
> 	this first) and IS.IPS (clear this second) registers are cleared to ‘0’
> 	before programming the PxIE and GHC.IE registers.  This will prevent any
> 	residual bits set in these registers from causing an interrupt to be
> 	asserted.
>
> The diff below simply clears the interupt status register, which fixes this
> issue.
>
> Tested with:
> ahci0 at pci12 dev 0 function 0 "AMD 600 Series AHCI" rev 0x01: msi, AHCI 1.3.1
>
> ok?
>
> bye,
> Jan
>
> Index: dev/ic/ahci.c
> ===================================================================
> RCS file: /cvs/src/sys/dev/ic/ahci.c,v
> diff -u -p -r1.42 ahci.c
> --- dev/ic/ahci.c	4 Sep 2024 07:54:52 -0000	1.42
> +++ dev/ic/ahci.c	21 Nov 2024 16:39:16 -0000
> @@ -321,6 +321,9 @@ noccc:
>  
>  	sc->sc_atascsi = atascsi_attach(&sc->sc_dev, &aaa);
>  
> +	/* Flush all residual bits of the interrupt status register */
> +	ahci_write(sc, AHCI_REG_IS, ahci_read(sc, AHCI_REG_IS));
> +
>  	/* Enable interrupts */
>  	ahci_write(sc, AHCI_REG_GHC, AHCI_REG_GHC_AE | AHCI_REG_GHC_IE);
>