From: Vitaliy Makkoveev Subject: Re: sys/ihidev: prevent crash on interrupt storm To: "Kirill A. Korinsky" Cc: Martin Pieuchot , OpenBSD tech Date: Wed, 25 Dec 2024 01:26:30 +0300 > On 25 Dec 2024, at 00:09, Kirill A. Korinsky wrote: > > On Tue, 24 Dec 2024 10:40:08 +0100, > Vitaliy Makkoveev wrote: >> >>> On 24 Dec 2024, at 12:17, Martin Pieuchot wrote: >>> >>> On 23/12/24(Mon) 16:58, Kirill A. Korinsky wrote: >>>> Vitaliy, >>>> >>>> Thanks for review! >>>> >>>> On Mon, 23 Dec 2024 12:39:19 +0100, >>>> Vitaliy Makkoveev wrote: >>>>> >>>>>> On 23 Dec 2024, at 14:23, Vitaliy Makkoveev wrote: >>>>>> >>>>>>> On 23 Dec 2024, at 13:21, Kirill A. Korinsky wrote: >>>>>>> >>>>>>> tech@, >>>>>>> >>>>>>> Before I move forward with quirks, I'd like to fix the cause of a crash in a >>>>>>> system on HONOR MagicBook Art 14 Snapdragon. >>>>>>> >>>>>>> Let's assume that we have a device on a machine I2C bus which goes into a >>>>>>> so-called interrupt storm for some reason. Under this condition, the system >>>>>>> crashes if an interrupt arrives before we allocate a buffer to read it. >>>>>>> >>>>>>> Here's an example of a trace: >>>>>>> >>>>>>> panic: attempt to access user address 0x0 from EL1 >>>>>>> Stopped at panic+0x140: cmp w21, #0x0 >>>>>>> >>>>>>> TID PID UID PRFLAGS PFLAGS CPU COMMAND >>>>>>> db_enter() at panic+0x13c >>>>>>> panic() at kdata_abort+0x180 >>>>>>> do_el0_sync() at handle_el1h_sync+0x68 >>>>>>> handle_el1h_sync() at qciic_exec+0x2d4 >>>>>>> qciic_exec() at ihidev_intr+0x70 >>>>>>> ihidev_intr() at qcgpio_intr+0xac >>>>>>> qcgpio_intr() at agintc_irq_handler+0x2bc >>>>>>> >>>>>>> Tested on the same machine, without any additional patches. System boots, >>>>>>> but touchpad doesn't work. >>>>>>> >>>>>>> Ok? >>>>>>> >>>>>> >>>>>> >>>>>> This is not related to your diff, but since you are working in this area, >>>>>> please look at the following code. >>>>>> >>>>>> timeout_del_barrier(9) looks very suspicious to me. As I understand, the >>>>>> following code could be executed from interrupt context, so barrier can’t >>>>>> be used here. Sorry if I’m wrong and this is always thread context. >>>>>> >>>>> >>>>> This is not the thread context, even the timer is interrupt context :) >>>>> >>>> >>>> Well, ihidev_intr migth be called from different places, at least from: >>>> 1. a device interrup >>>> 2. a timeout for the poll >>>> 3. ikbd_cngetc >>>> >>>> I think moving place where iic_intr_establish is called is a wrong way. The >>>> right way is confirm inside ihidev_intr that devices is fully attached, and >>>> don't try to poll anything from not attached devices. >>>> >>>> Also, you're right regarding timeout_del_barrier. I have replaced them to >>>> just timeout_del because this code is called from different contexts. >>>> >>>> So, resulted diff which boot on HONOR MagicBook Art 14 Snapdragon: >>> >>> Instead of adding an `sc_attached' variable I'd suggest moving the block >>> that calls iic_intr_establish() down. >> >> According "HID over I2C Protocol Specification" and [1] we do device >> attachment incorrect. HID device should assert interrupt after reset >> command, and the reset command should precede to get report descriptor. >> >> So, we should install interrupt handler before reset command and wait >> interrupt for 10 seconds. In the interrupt handler we should handle >> reset condition separately, because buffer is not yet allocated. Current >> code doesn’t do that. >> >> Something like below: >> >> ihidev_attach() >> { >> sc->st = INITIALIZING; >> >> ihidev_hid_command(I2C_HID_CMD_DESCR); >> >> iic_intr_establish(ihidev_intr); >> >> ihidev_hid_command(I2C_HID_CMD_SET_POWER); >> >> sc->st = RESET_WAIT; >> ihidev_hid_command(I2C_HID_CMD_RESET); >> error = tsleep_nsec(sc, .., SEC_TO_NSEC(10)); >> if (error == EWOULDBLOCK) { >> printf("reset timeout\n"); >> goto err; >> } >> if(sc->st != RESET_DONE){ >> printf("reset failed\n"); >> goto err; >> } >> ihidev_hid_command(I2C_HID_REPORT_DESCR); >> sc->sc_ibuf = malloc(); >> >> err: >> return (error); >> } >> >> ihidev_intr() >> { >> switch(sc->st) { >> case CMD_RESET: { >> uint8_t buf[2]; >> iic_exec(.., I2C_OP_READ_WITH_STOP, buf, sizeof(buf), ..); >> if (buf[0] == 0x00 && buf[1] == 0x00) >> sc->st = RESET_DONE; >> else >> sc->st = RESET_FAILED; >> wakeup_one(sc); >> break; >> } >> case ..: >> } >> } >> >> > > Thanks for the suggestion. I tried to implement it but discovered that > tsleep inside attach sleeps for 1 microsecond. I tried using delay(9) > instead, to sleep for 10 seconds, but no interrupt arrived while it was > waiting. My guess that it waits until autoconfiguration is over, but I'm not > sure. > > Extracting part of the logic from attach to some function that is called > from intr and by the timer could be a solution, but this function uses > config_found_sm and it requires i2c_attach_args. All of that seems like the > wrong way. Am I mistaken? > My HID touchpad 04f3:3282 doesn’t throw interrupt reset. We had no reports before, so I suspect the most devices do the same. delay(9) is arch specific, IIRC it doesn’t interrupts on x86. > + sc->sc_st = I2C_HID_RESET_WAIT; > + > + /* device may report failure on RESET, but works */ > + if (ihidev_hid_command(sc, I2C_HID_CMD_RESET, 0)) { > + printf("%s: failed to reset\n", sc->sc_dev.dv_xname); > + goto error; > + } > + > + was = nsecuptime(); > + for (i = 0, err = 0; i < 10 && err == 0; i++) { > + err = tsleep_nsec(sc, PZERO, "ihidev_reset", SEC_TO_NSEC(1)); > + DPRINTF(("%s: %d sleep for %lld ns, res: %d\n", > + sc->sc_dev.dv_xname, i, nsecuptime() - was, err)); > + } > + > + if (err == EWOULDBLOCK) { > + printf("%s: reset timeout\n", sc->sc_dev.dv_xname); > + goto error; > + } > + > + if (sc->sc_st != I2C_HID_RESET_DONE) { > + printf("%s: unexpected reset response, state: 0x%x\n", > + sc->sc_dev.dv_xname, sc->sc_st); > + goto error; > + } At least you should check err in the for loop, otherwise it could be overwritten with the EWOULDBLOCK even in the non error case. Also sc_st check and sleep should be serialised with sc_st modification within interrupt handler, otherwise the wakeup could be lost. I propose you to try something like below: sc->sc_st = I2C_HID_RESET_WAIT; for (i = 0; i < 3; ++i) { ihidev_hid_command(sc, I2C_HID_CMD_RESET); delay(10 * 1000 * 1000); if (sc->sc_st != I2C_HID_RESET_WAIT) break; } if (sc->sc_st != I2C_HID_RESET_DONE) { if (sc->sc_st == I2C_HID_RESET_WAIT) printf("timeout"); else printf("other error"); goto error; } > But I might be totally wrong, so here’s the diff that I've used: > > Index: sys/dev/i2c/ihidev.c > =================================================================== > RCS file: /home/cvs/src/sys/dev/i2c/ihidev.c,v > diff -u -p -r1.33 ihidev.c > --- sys/dev/i2c/ihidev.c 18 Oct 2024 12:53:49 -0000 1.33 > +++ sys/dev/i2c/ihidev.c 24 Dec 2024 20:10:06 -0000 > @@ -30,7 +30,7 @@ > > #include > > -/* #define IHIDEV_DEBUG */ > +#define IHIDEV_DEBUG > > #ifdef IHIDEV_DEBUG > #define DPRINTF(x) printf x > @@ -57,6 +57,14 @@ enum { > I2C_HID_REPORT_DESCR = 0x100, > }; > > +enum { > + I2C_HID_INITIALIZING = 0x1, > + I2C_HID_RESET_WAIT = 0x2, > + I2C_HID_RESET_DONE = 0x3, > + I2C_HID_RESET_FAILED = 0x4, > + I2C_HID_BROKEN = 0x5, > +}; > + > static int I2C_HID_POWER_ON = 0x0; > static int I2C_HID_POWER_OFF = 0x1; > > @@ -105,8 +113,11 @@ ihidev_attach(struct device *parent, str > struct i2c_attach_args *ia = aux; > struct ihidev_attach_arg iha; > struct device *dev; > - int repid, repsz; > + int repid, repsz, err, i; > int repsizes[256]; > + uint64_t was; > + > + sc->sc_st = I2C_HID_INITIALIZING; > > sc->sc_tag = ia->ia_tag; > sc->sc_addr = ia->ia_addr; > @@ -115,7 +126,7 @@ ihidev_attach(struct device *parent, str > if (ihidev_hid_command(sc, I2C_HID_CMD_DESCR, NULL) || > ihidev_hid_desc_parse(sc)) { > printf(", failed fetching initial HID descriptor\n"); > - return; > + goto error; > } > > if (ia->ia_intr) { > @@ -137,9 +148,49 @@ ihidev_attach(struct device *parent, str > letoh16(sc->hid_desc.wVendorID), letoh16(sc->hid_desc.wProductID), > (char *)ia->ia_cookie); > > + if (ihidev_hid_command(sc, I2C_HID_CMD_SET_POWER, &I2C_HID_POWER_ON)) { > + printf("%s: failed to power on\n", sc->sc_dev.dv_xname); > + goto error; > + } > + > + sc->sc_st = I2C_HID_RESET_WAIT; > + > + /* device may report failure on RESET, but works */ > + if (ihidev_hid_command(sc, I2C_HID_CMD_RESET, 0)) { > + printf("%s: failed to reset\n", sc->sc_dev.dv_xname); > + goto error; > + } > + > + was = nsecuptime(); > + for (i = 0, err = 0; i < 10 && err == 0; i++) { > + err = tsleep_nsec(sc, PZERO, "ihidev_reset", SEC_TO_NSEC(1)); > + DPRINTF(("%s: %d sleep for %lld ns, res: %d\n", > + sc->sc_dev.dv_xname, i, nsecuptime() - was, err)); > + } > + > + if (err == EWOULDBLOCK) { > + printf("%s: reset timeout\n", sc->sc_dev.dv_xname); > + goto error; > + } > + > + if (sc->sc_st != I2C_HID_RESET_DONE) { > + printf("%s: unexpected reset response, state: 0x%x\n", > + sc->sc_dev.dv_xname, sc->sc_st); > + goto error; > + } > + > + if (ihidev_hid_command(sc, I2C_HID_REPORT_DESCR, 0)) { > + printf("%s: failed fetching HID report\n", > + sc->sc_dev.dv_xname); > + goto error; > + } > + > sc->sc_nrepid = ihidev_maxrepid(sc->sc_report, sc->sc_reportlen); > - if (sc->sc_nrepid < 0) > - return; > + if (sc->sc_nrepid < 0) { > + printf("%s: invalid report ID\n", > + sc->sc_dev.dv_xname); > + goto error; > + } > > printf("%s: %d report id%s\n", sc->sc_dev.dv_xname, sc->sc_nrepid, > sc->sc_nrepid > 1 ? "s" : ""); > @@ -203,6 +254,12 @@ ihidev_attach(struct device *parent, str > printf("%s: failed to power down\n", sc->sc_dev.dv_xname); > return; > } > + > + return; > + > +error: > + sc->sc_st = I2C_HID_BROKEN; > + printf("%s: marked as broken\n", sc->sc_dev.dv_xname); > } > > int > @@ -570,8 +627,6 @@ ihidev_reset(struct ihidev_softc *sc) > int > ihidev_hid_desc_parse(struct ihidev_softc *sc) > { > - int retries = 3; > - > /* must be v01.00 */ > if (letoh16(sc->hid_desc.bcdVersion) != 0x0100) { > printf("%s: bad HID descriptor bcdVersion (0x%x)\n", > @@ -597,26 +652,9 @@ ihidev_hid_desc_parse(struct ihidev_soft > return (1); > } > > - while (retries-- > 0) { > - if (ihidev_reset(sc)) { > - if (retries == 0) > - return(1); > - > - ihidev_sleep(sc, 10); > - } > - else > - break; > - } > - > sc->sc_reportlen = letoh16(sc->hid_desc.wReportDescLength); > sc->sc_report = malloc(sc->sc_reportlen, M_DEVBUF, M_WAITOK | M_ZERO); > > - if (ihidev_hid_command(sc, I2C_HID_REPORT_DESCR, 0)) { > - printf("%s: failed fetching HID report\n", > - sc->sc_dev.dv_xname); > - return (1); > - } > - > return (0); > } > > @@ -638,6 +676,43 @@ ihidev_intr(void *arg) > int psize, res, i, fast = 0; > u_char *p; > u_int rep = 0; > + > + switch (sc->sc_st) { > + case I2C_HID_RESET_WAIT: { > + uint8_t buf[2]; > + > + iic_acquire_bus(sc->sc_tag, I2C_F_POLL); > + res = iic_exec(sc->sc_tag, I2C_OP_READ_WITH_STOP, > + sc->sc_addr, NULL, 0, > + buf, sizeof(buf), I2C_F_POLL); > + iic_release_bus(sc->sc_tag, I2C_F_POLL); > + > + DPRINTF(("%s: read at reset: 0x%x, 0x%x\n", > + sc->sc_dev.dv_xname, buf[0], buf[1])); > + > + if (buf[0] == 0x00 && buf[1] == 0x00) > + sc->sc_st = I2C_HID_RESET_DONE; > + else > + sc->sc_st = I2C_HID_RESET_FAILED; > + > + wakeup_one(sc); > + > + return (0); > + } > + > + case I2C_HID_RESET_DONE: > + break; > + > + case I2C_HID_INITIALIZING: > + case I2C_HID_RESET_FAILED: > + return (1); > + > + default: > + printf("%s: unexpected state 0x%x, marked as failed\n", > + sc->sc_dev.dv_xname, sc->sc_st); > + sc->sc_st = I2C_HID_RESET_FAILED; > + return (1); > + } > > if (sc->sc_poll && !sc->sc_frompoll) { > DPRINTF(("%s: received interrupt while polling, disabling " > Index: sys/dev/i2c/ihidev.h > =================================================================== > RCS file: /home/cvs/src/sys/dev/i2c/ihidev.h,v > diff -u -p -r1.9 ihidev.h > --- sys/dev/i2c/ihidev.h 3 Sep 2022 15:48:16 -0000 1.9 > +++ sys/dev/i2c/ihidev.h 24 Dec 2024 20:10:06 -0000 > @@ -77,6 +77,8 @@ struct ihidev_softc { > struct i2c_hid_desc hid_desc; > }; > > + int sc_st; > + > uint8_t *sc_report; > int sc_reportlen; > > > > -- > wbr, Kirill