Download raw body.
sys/ihidev: prevent crash on interrupt storm
> On 25 Dec 2024, at 00:09, Kirill A. Korinsky <kirill@korins.ky> wrote:
>
> On Tue, 24 Dec 2024 10:40:08 +0100,
> Vitaliy Makkoveev <otto@bsdbox.dev> wrote:
>>
>>> On 24 Dec 2024, at 12:17, Martin Pieuchot <mpi@grenadille.net> wrote:
>>>
>>> On 23/12/24(Mon) 16:58, Kirill A. Korinsky wrote:
>>>> Vitaliy,
>>>>
>>>> Thanks for review!
>>>>
>>>> On Mon, 23 Dec 2024 12:39:19 +0100,
>>>> Vitaliy Makkoveev <otto@bsdbox.dev> wrote:
>>>>>
>>>>>> On 23 Dec 2024, at 14:23, Vitaliy Makkoveev <otto@bsdbox.dev> wrote:
>>>>>>
>>>>>>> On 23 Dec 2024, at 13:21, Kirill A. Korinsky <kirill@korins.ky> wrote:
>>>>>>>
>>>>>>> tech@,
>>>>>>>
>>>>>>> Before I move forward with quirks, I'd like to fix the cause of a crash in a
>>>>>>> system on HONOR MagicBook Art 14 Snapdragon.
>>>>>>>
>>>>>>> Let's assume that we have a device on a machine I2C bus which goes into a
>>>>>>> so-called interrupt storm for some reason. Under this condition, the system
>>>>>>> crashes if an interrupt arrives before we allocate a buffer to read it.
>>>>>>>
>>>>>>> Here's an example of a trace:
>>>>>>>
>>>>>>> panic: attempt to access user address 0x0 from EL1
>>>>>>> Stopped at panic+0x140: cmp w21, #0x0
>>>>>>>
>>>>>>> TID PID UID PRFLAGS PFLAGS CPU COMMAND
>>>>>>> db_enter() at panic+0x13c
>>>>>>> panic() at kdata_abort+0x180
>>>>>>> do_el0_sync() at handle_el1h_sync+0x68
>>>>>>> handle_el1h_sync() at qciic_exec+0x2d4
>>>>>>> qciic_exec() at ihidev_intr+0x70
>>>>>>> ihidev_intr() at qcgpio_intr+0xac
>>>>>>> qcgpio_intr() at agintc_irq_handler+0x2bc
>>>>>>>
>>>>>>> Tested on the same machine, without any additional patches. System boots,
>>>>>>> but touchpad doesn't work.
>>>>>>>
>>>>>>> Ok?
>>>>>>>
>>>>>>
>>>>>>
>>>>>> This is not related to your diff, but since you are working in this area,
>>>>>> please look at the following code.
>>>>>>
>>>>>> timeout_del_barrier(9) looks very suspicious to me. As I understand, the
>>>>>> following code could be executed from interrupt context, so barrier can’t
>>>>>> be used here. Sorry if I’m wrong and this is always thread context.
>>>>>>
>>>>>
>>>>> This is not the thread context, even the timer is interrupt context :)
>>>>>
>>>>
>>>> Well, ihidev_intr migth be called from different places, at least from:
>>>> 1. a device interrup
>>>> 2. a timeout for the poll
>>>> 3. ikbd_cngetc
>>>>
>>>> I think moving place where iic_intr_establish is called is a wrong way. The
>>>> right way is confirm inside ihidev_intr that devices is fully attached, and
>>>> don't try to poll anything from not attached devices.
>>>>
>>>> Also, you're right regarding timeout_del_barrier. I have replaced them to
>>>> just timeout_del because this code is called from different contexts.
>>>>
>>>> So, resulted diff which boot on HONOR MagicBook Art 14 Snapdragon:
>>>
>>> Instead of adding an `sc_attached' variable I'd suggest moving the block
>>> that calls iic_intr_establish() down.
>>
>> According "HID over I2C Protocol Specification" and [1] we do device
>> attachment incorrect. HID device should assert interrupt after reset
>> command, and the reset command should precede to get report descriptor.
>>
>> So, we should install interrupt handler before reset command and wait
>> interrupt for 10 seconds. In the interrupt handler we should handle
>> reset condition separately, because buffer is not yet allocated. Current
>> code doesn’t do that.
>>
>> Something like below:
>>
>> ihidev_attach()
>> {
>> sc->st = INITIALIZING;
>>
>> ihidev_hid_command(I2C_HID_CMD_DESCR);
>>
>> iic_intr_establish(ihidev_intr);
>>
>> ihidev_hid_command(I2C_HID_CMD_SET_POWER);
>>
>> sc->st = RESET_WAIT;
>> ihidev_hid_command(I2C_HID_CMD_RESET);
>> error = tsleep_nsec(sc, .., SEC_TO_NSEC(10));
>> if (error == EWOULDBLOCK) {
>> printf("reset timeout\n");
>> goto err;
>> }
>> if(sc->st != RESET_DONE){
>> printf("reset failed\n");
>> goto err;
>> }
>> ihidev_hid_command(I2C_HID_REPORT_DESCR);
>> sc->sc_ibuf = malloc();
>>
>> err:
>> return (error);
>> }
>>
>> ihidev_intr()
>> {
>> switch(sc->st) {
>> case CMD_RESET: {
>> uint8_t buf[2];
>> iic_exec(.., I2C_OP_READ_WITH_STOP, buf, sizeof(buf), ..);
>> if (buf[0] == 0x00 && buf[1] == 0x00)
>> sc->st = RESET_DONE;
>> else
>> sc->st = RESET_FAILED;
>> wakeup_one(sc);
>> break;
>> }
>> case ..:
>> }
>> }
>>
>>
>
> Thanks for the suggestion. I tried to implement it but discovered that
> tsleep inside attach sleeps for 1 microsecond. I tried using delay(9)
> instead, to sleep for 10 seconds, but no interrupt arrived while it was
> waiting. My guess that it waits until autoconfiguration is over, but I'm not
> sure.
>
> Extracting part of the logic from attach to some function that is called
> from intr and by the timer could be a solution, but this function uses
> config_found_sm and it requires i2c_attach_args. All of that seems like the
> wrong way. Am I mistaken?
>
My HID touchpad 04f3:3282 doesn’t throw interrupt reset. We had no
reports before, so I suspect the most devices do the same.
delay(9) is arch specific, IIRC it doesn’t interrupts on x86.
> + sc->sc_st = I2C_HID_RESET_WAIT;
> +
> + /* device may report failure on RESET, but works */
> + if (ihidev_hid_command(sc, I2C_HID_CMD_RESET, 0)) {
> + printf("%s: failed to reset\n", sc->sc_dev.dv_xname);
> + goto error;
> + }
> +
> + was = nsecuptime();
> + for (i = 0, err = 0; i < 10 && err == 0; i++) {
> + err = tsleep_nsec(sc, PZERO, "ihidev_reset", SEC_TO_NSEC(1));
> + DPRINTF(("%s: %d sleep for %lld ns, res: %d\n",
> + sc->sc_dev.dv_xname, i, nsecuptime() - was, err));
> + }
> +
> + if (err == EWOULDBLOCK) {
> + printf("%s: reset timeout\n", sc->sc_dev.dv_xname);
> + goto error;
> + }
> +
> + if (sc->sc_st != I2C_HID_RESET_DONE) {
> + printf("%s: unexpected reset response, state: 0x%x\n",
> + sc->sc_dev.dv_xname, sc->sc_st);
> + goto error;
> + }
At least you should check err in the for loop, otherwise it could
be overwritten with the EWOULDBLOCK even in the non error case. Also
sc_st check and sleep should be serialised with sc_st modification
within interrupt handler, otherwise the wakeup could be lost.
I propose you to try something like below:
sc->sc_st = I2C_HID_RESET_WAIT;
for (i = 0; i < 3; ++i) {
ihidev_hid_command(sc, I2C_HID_CMD_RESET);
delay(10 * 1000 * 1000);
if (sc->sc_st != I2C_HID_RESET_WAIT)
break;
}
if (sc->sc_st != I2C_HID_RESET_DONE) {
if (sc->sc_st == I2C_HID_RESET_WAIT)
printf("timeout");
else
printf("other error");
goto error;
}
> But I might be totally wrong, so here’s the diff that I've used:
>
> Index: sys/dev/i2c/ihidev.c
> ===================================================================
> RCS file: /home/cvs/src/sys/dev/i2c/ihidev.c,v
> diff -u -p -r1.33 ihidev.c
> --- sys/dev/i2c/ihidev.c 18 Oct 2024 12:53:49 -0000 1.33
> +++ sys/dev/i2c/ihidev.c 24 Dec 2024 20:10:06 -0000
> @@ -30,7 +30,7 @@
>
> #include <dev/hid/hid.h>
>
> -/* #define IHIDEV_DEBUG */
> +#define IHIDEV_DEBUG
>
> #ifdef IHIDEV_DEBUG
> #define DPRINTF(x) printf x
> @@ -57,6 +57,14 @@ enum {
> I2C_HID_REPORT_DESCR = 0x100,
> };
>
> +enum {
> + I2C_HID_INITIALIZING = 0x1,
> + I2C_HID_RESET_WAIT = 0x2,
> + I2C_HID_RESET_DONE = 0x3,
> + I2C_HID_RESET_FAILED = 0x4,
> + I2C_HID_BROKEN = 0x5,
> +};
> +
> static int I2C_HID_POWER_ON = 0x0;
> static int I2C_HID_POWER_OFF = 0x1;
>
> @@ -105,8 +113,11 @@ ihidev_attach(struct device *parent, str
> struct i2c_attach_args *ia = aux;
> struct ihidev_attach_arg iha;
> struct device *dev;
> - int repid, repsz;
> + int repid, repsz, err, i;
> int repsizes[256];
> + uint64_t was;
> +
> + sc->sc_st = I2C_HID_INITIALIZING;
>
> sc->sc_tag = ia->ia_tag;
> sc->sc_addr = ia->ia_addr;
> @@ -115,7 +126,7 @@ ihidev_attach(struct device *parent, str
> if (ihidev_hid_command(sc, I2C_HID_CMD_DESCR, NULL) ||
> ihidev_hid_desc_parse(sc)) {
> printf(", failed fetching initial HID descriptor\n");
> - return;
> + goto error;
> }
>
> if (ia->ia_intr) {
> @@ -137,9 +148,49 @@ ihidev_attach(struct device *parent, str
> letoh16(sc->hid_desc.wVendorID), letoh16(sc->hid_desc.wProductID),
> (char *)ia->ia_cookie);
>
> + if (ihidev_hid_command(sc, I2C_HID_CMD_SET_POWER, &I2C_HID_POWER_ON)) {
> + printf("%s: failed to power on\n", sc->sc_dev.dv_xname);
> + goto error;
> + }
> +
> + sc->sc_st = I2C_HID_RESET_WAIT;
> +
> + /* device may report failure on RESET, but works */
> + if (ihidev_hid_command(sc, I2C_HID_CMD_RESET, 0)) {
> + printf("%s: failed to reset\n", sc->sc_dev.dv_xname);
> + goto error;
> + }
> +
> + was = nsecuptime();
> + for (i = 0, err = 0; i < 10 && err == 0; i++) {
> + err = tsleep_nsec(sc, PZERO, "ihidev_reset", SEC_TO_NSEC(1));
> + DPRINTF(("%s: %d sleep for %lld ns, res: %d\n",
> + sc->sc_dev.dv_xname, i, nsecuptime() - was, err));
> + }
> +
> + if (err == EWOULDBLOCK) {
> + printf("%s: reset timeout\n", sc->sc_dev.dv_xname);
> + goto error;
> + }
> +
> + if (sc->sc_st != I2C_HID_RESET_DONE) {
> + printf("%s: unexpected reset response, state: 0x%x\n",
> + sc->sc_dev.dv_xname, sc->sc_st);
> + goto error;
> + }
> +
> + if (ihidev_hid_command(sc, I2C_HID_REPORT_DESCR, 0)) {
> + printf("%s: failed fetching HID report\n",
> + sc->sc_dev.dv_xname);
> + goto error;
> + }
> +
> sc->sc_nrepid = ihidev_maxrepid(sc->sc_report, sc->sc_reportlen);
> - if (sc->sc_nrepid < 0)
> - return;
> + if (sc->sc_nrepid < 0) {
> + printf("%s: invalid report ID\n",
> + sc->sc_dev.dv_xname);
> + goto error;
> + }
>
> printf("%s: %d report id%s\n", sc->sc_dev.dv_xname, sc->sc_nrepid,
> sc->sc_nrepid > 1 ? "s" : "");
> @@ -203,6 +254,12 @@ ihidev_attach(struct device *parent, str
> printf("%s: failed to power down\n", sc->sc_dev.dv_xname);
> return;
> }
> +
> + return;
> +
> +error:
> + sc->sc_st = I2C_HID_BROKEN;
> + printf("%s: marked as broken\n", sc->sc_dev.dv_xname);
> }
>
> int
> @@ -570,8 +627,6 @@ ihidev_reset(struct ihidev_softc *sc)
> int
> ihidev_hid_desc_parse(struct ihidev_softc *sc)
> {
> - int retries = 3;
> -
> /* must be v01.00 */
> if (letoh16(sc->hid_desc.bcdVersion) != 0x0100) {
> printf("%s: bad HID descriptor bcdVersion (0x%x)\n",
> @@ -597,26 +652,9 @@ ihidev_hid_desc_parse(struct ihidev_soft
> return (1);
> }
>
> - while (retries-- > 0) {
> - if (ihidev_reset(sc)) {
> - if (retries == 0)
> - return(1);
> -
> - ihidev_sleep(sc, 10);
> - }
> - else
> - break;
> - }
> -
> sc->sc_reportlen = letoh16(sc->hid_desc.wReportDescLength);
> sc->sc_report = malloc(sc->sc_reportlen, M_DEVBUF, M_WAITOK | M_ZERO);
>
> - if (ihidev_hid_command(sc, I2C_HID_REPORT_DESCR, 0)) {
> - printf("%s: failed fetching HID report\n",
> - sc->sc_dev.dv_xname);
> - return (1);
> - }
> -
> return (0);
> }
>
> @@ -638,6 +676,43 @@ ihidev_intr(void *arg)
> int psize, res, i, fast = 0;
> u_char *p;
> u_int rep = 0;
> +
> + switch (sc->sc_st) {
> + case I2C_HID_RESET_WAIT: {
> + uint8_t buf[2];
> +
> + iic_acquire_bus(sc->sc_tag, I2C_F_POLL);
> + res = iic_exec(sc->sc_tag, I2C_OP_READ_WITH_STOP,
> + sc->sc_addr, NULL, 0,
> + buf, sizeof(buf), I2C_F_POLL);
> + iic_release_bus(sc->sc_tag, I2C_F_POLL);
> +
> + DPRINTF(("%s: read at reset: 0x%x, 0x%x\n",
> + sc->sc_dev.dv_xname, buf[0], buf[1]));
> +
> + if (buf[0] == 0x00 && buf[1] == 0x00)
> + sc->sc_st = I2C_HID_RESET_DONE;
> + else
> + sc->sc_st = I2C_HID_RESET_FAILED;
> +
> + wakeup_one(sc);
> +
> + return (0);
> + }
> +
> + case I2C_HID_RESET_DONE:
> + break;
> +
> + case I2C_HID_INITIALIZING:
> + case I2C_HID_RESET_FAILED:
> + return (1);
> +
> + default:
> + printf("%s: unexpected state 0x%x, marked as failed\n",
> + sc->sc_dev.dv_xname, sc->sc_st);
> + sc->sc_st = I2C_HID_RESET_FAILED;
> + return (1);
> + }
>
> if (sc->sc_poll && !sc->sc_frompoll) {
> DPRINTF(("%s: received interrupt while polling, disabling "
> Index: sys/dev/i2c/ihidev.h
> ===================================================================
> RCS file: /home/cvs/src/sys/dev/i2c/ihidev.h,v
> diff -u -p -r1.9 ihidev.h
> --- sys/dev/i2c/ihidev.h 3 Sep 2022 15:48:16 -0000 1.9
> +++ sys/dev/i2c/ihidev.h 24 Dec 2024 20:10:06 -0000
> @@ -77,6 +77,8 @@ struct ihidev_softc {
> struct i2c_hid_desc hid_desc;
> };
>
> + int sc_st;
> +
> uint8_t *sc_report;
> int sc_reportlen;
>
>
>
> --
> wbr, Kirill
sys/ihidev: prevent crash on interrupt storm