From: Kirill A. Korinsky Subject: Re: sys/ihidev: prevent crash on interrupt storm To: Vitaliy Makkoveev Cc: Martin Pieuchot , OpenBSD tech Date: Tue, 24 Dec 2024 22:09:39 +0100 On Tue, 24 Dec 2024 10:40:08 +0100, Vitaliy Makkoveev wrote: > > > On 24 Dec 2024, at 12:17, Martin Pieuchot wrote: > > > > On 23/12/24(Mon) 16:58, Kirill A. Korinsky wrote: > >> Vitaliy, > >> > >> Thanks for review! > >> > >> On Mon, 23 Dec 2024 12:39:19 +0100, > >> Vitaliy Makkoveev wrote: > >>> > >>>> On 23 Dec 2024, at 14:23, Vitaliy Makkoveev wrote: > >>>> > >>>>> On 23 Dec 2024, at 13:21, Kirill A. Korinsky wrote: > >>>>> > >>>>> tech@, > >>>>> > >>>>> Before I move forward with quirks, I'd like to fix the cause of a crash in a > >>>>> system on HONOR MagicBook Art 14 Snapdragon. > >>>>> > >>>>> Let's assume that we have a device on a machine I2C bus which goes into a > >>>>> so-called interrupt storm for some reason. Under this condition, the system > >>>>> crashes if an interrupt arrives before we allocate a buffer to read it. > >>>>> > >>>>> Here's an example of a trace: > >>>>> > >>>>> panic: attempt to access user address 0x0 from EL1 > >>>>> Stopped at panic+0x140: cmp w21, #0x0 > >>>>> > >>>>> TID PID UID PRFLAGS PFLAGS CPU COMMAND > >>>>> db_enter() at panic+0x13c > >>>>> panic() at kdata_abort+0x180 > >>>>> do_el0_sync() at handle_el1h_sync+0x68 > >>>>> handle_el1h_sync() at qciic_exec+0x2d4 > >>>>> qciic_exec() at ihidev_intr+0x70 > >>>>> ihidev_intr() at qcgpio_intr+0xac > >>>>> qcgpio_intr() at agintc_irq_handler+0x2bc > >>>>> > >>>>> Tested on the same machine, without any additional patches. System boots, > >>>>> but touchpad doesn't work. > >>>>> > >>>>> Ok? > >>>>> > >>>> > >>>> > >>>> This is not related to your diff, but since you are working in this area, > >>>> please look at the following code. > >>>> > >>>> timeout_del_barrier(9) looks very suspicious to me. As I understand, the > >>>> following code could be executed from interrupt context, so barrier can’t > >>>> be used here. Sorry if I’m wrong and this is always thread context. > >>>> > >>> > >>> This is not the thread context, even the timer is interrupt context :) > >>> > >> > >> Well, ihidev_intr migth be called from different places, at least from: > >> 1. a device interrup > >> 2. a timeout for the poll > >> 3. ikbd_cngetc > >> > >> I think moving place where iic_intr_establish is called is a wrong way. The > >> right way is confirm inside ihidev_intr that devices is fully attached, and > >> don't try to poll anything from not attached devices. > >> > >> Also, you're right regarding timeout_del_barrier. I have replaced them to > >> just timeout_del because this code is called from different contexts. > >> > >> So, resulted diff which boot on HONOR MagicBook Art 14 Snapdragon: > > > > Instead of adding an `sc_attached' variable I'd suggest moving the block > > that calls iic_intr_establish() down. > > According "HID over I2C Protocol Specification" and [1] we do device > attachment incorrect. HID device should assert interrupt after reset > command, and the reset command should precede to get report descriptor. > > So, we should install interrupt handler before reset command and wait > interrupt for 10 seconds. In the interrupt handler we should handle > reset condition separately, because buffer is not yet allocated. Current > code doesn’t do that. > > Something like below: > > ihidev_attach() > { > sc->st = INITIALIZING; > > ihidev_hid_command(I2C_HID_CMD_DESCR); > > iic_intr_establish(ihidev_intr); > > ihidev_hid_command(I2C_HID_CMD_SET_POWER); > > sc->st = RESET_WAIT; > ihidev_hid_command(I2C_HID_CMD_RESET); > error = tsleep_nsec(sc, .., SEC_TO_NSEC(10)); > if (error == EWOULDBLOCK) { > printf("reset timeout\n"); > goto err; > } > if(sc->st != RESET_DONE){ > printf("reset failed\n"); > goto err; > } > ihidev_hid_command(I2C_HID_REPORT_DESCR); > sc->sc_ibuf = malloc(); > > err: > return (error); > } > > ihidev_intr() > { > switch(sc->st) { > case CMD_RESET: { > uint8_t buf[2]; > iic_exec(.., I2C_OP_READ_WITH_STOP, buf, sizeof(buf), ..); > if (buf[0] == 0x00 && buf[1] == 0x00) > sc->st = RESET_DONE; > else > sc->st = RESET_FAILED; > wakeup_one(sc); > break; > } > case ..: > } > } > > Thanks for the suggestion. I tried to implement it but discovered that tsleep inside attach sleeps for 1 microsecond. I tried using delay(9) instead, to sleep for 10 seconds, but no interrupt arrived while it was waiting. My guess that it waits until autoconfiguration is over, but I'm not sure. Extracting part of the logic from attach to some function that is called from intr and by the timer could be a solution, but this function uses config_found_sm and it requires i2c_attach_args. All of that seems like the wrong way. Am I mistaken? But I might be totally wrong, so here’s the diff that I've used: Index: sys/dev/i2c/ihidev.c =================================================================== RCS file: /home/cvs/src/sys/dev/i2c/ihidev.c,v diff -u -p -r1.33 ihidev.c --- sys/dev/i2c/ihidev.c 18 Oct 2024 12:53:49 -0000 1.33 +++ sys/dev/i2c/ihidev.c 24 Dec 2024 20:10:06 -0000 @@ -30,7 +30,7 @@ #include -/* #define IHIDEV_DEBUG */ +#define IHIDEV_DEBUG #ifdef IHIDEV_DEBUG #define DPRINTF(x) printf x @@ -57,6 +57,14 @@ enum { I2C_HID_REPORT_DESCR = 0x100, }; +enum { + I2C_HID_INITIALIZING = 0x1, + I2C_HID_RESET_WAIT = 0x2, + I2C_HID_RESET_DONE = 0x3, + I2C_HID_RESET_FAILED = 0x4, + I2C_HID_BROKEN = 0x5, +}; + static int I2C_HID_POWER_ON = 0x0; static int I2C_HID_POWER_OFF = 0x1; @@ -105,8 +113,11 @@ ihidev_attach(struct device *parent, str struct i2c_attach_args *ia = aux; struct ihidev_attach_arg iha; struct device *dev; - int repid, repsz; + int repid, repsz, err, i; int repsizes[256]; + uint64_t was; + + sc->sc_st = I2C_HID_INITIALIZING; sc->sc_tag = ia->ia_tag; sc->sc_addr = ia->ia_addr; @@ -115,7 +126,7 @@ ihidev_attach(struct device *parent, str if (ihidev_hid_command(sc, I2C_HID_CMD_DESCR, NULL) || ihidev_hid_desc_parse(sc)) { printf(", failed fetching initial HID descriptor\n"); - return; + goto error; } if (ia->ia_intr) { @@ -137,9 +148,49 @@ ihidev_attach(struct device *parent, str letoh16(sc->hid_desc.wVendorID), letoh16(sc->hid_desc.wProductID), (char *)ia->ia_cookie); + if (ihidev_hid_command(sc, I2C_HID_CMD_SET_POWER, &I2C_HID_POWER_ON)) { + printf("%s: failed to power on\n", sc->sc_dev.dv_xname); + goto error; + } + + sc->sc_st = I2C_HID_RESET_WAIT; + + /* device may report failure on RESET, but works */ + if (ihidev_hid_command(sc, I2C_HID_CMD_RESET, 0)) { + printf("%s: failed to reset\n", sc->sc_dev.dv_xname); + goto error; + } + + was = nsecuptime(); + for (i = 0, err = 0; i < 10 && err == 0; i++) { + err = tsleep_nsec(sc, PZERO, "ihidev_reset", SEC_TO_NSEC(1)); + DPRINTF(("%s: %d sleep for %lld ns, res: %d\n", + sc->sc_dev.dv_xname, i, nsecuptime() - was, err)); + } + + if (err == EWOULDBLOCK) { + printf("%s: reset timeout\n", sc->sc_dev.dv_xname); + goto error; + } + + if (sc->sc_st != I2C_HID_RESET_DONE) { + printf("%s: unexpected reset response, state: 0x%x\n", + sc->sc_dev.dv_xname, sc->sc_st); + goto error; + } + + if (ihidev_hid_command(sc, I2C_HID_REPORT_DESCR, 0)) { + printf("%s: failed fetching HID report\n", + sc->sc_dev.dv_xname); + goto error; + } + sc->sc_nrepid = ihidev_maxrepid(sc->sc_report, sc->sc_reportlen); - if (sc->sc_nrepid < 0) - return; + if (sc->sc_nrepid < 0) { + printf("%s: invalid report ID\n", + sc->sc_dev.dv_xname); + goto error; + } printf("%s: %d report id%s\n", sc->sc_dev.dv_xname, sc->sc_nrepid, sc->sc_nrepid > 1 ? "s" : ""); @@ -203,6 +254,12 @@ ihidev_attach(struct device *parent, str printf("%s: failed to power down\n", sc->sc_dev.dv_xname); return; } + + return; + +error: + sc->sc_st = I2C_HID_BROKEN; + printf("%s: marked as broken\n", sc->sc_dev.dv_xname); } int @@ -570,8 +627,6 @@ ihidev_reset(struct ihidev_softc *sc) int ihidev_hid_desc_parse(struct ihidev_softc *sc) { - int retries = 3; - /* must be v01.00 */ if (letoh16(sc->hid_desc.bcdVersion) != 0x0100) { printf("%s: bad HID descriptor bcdVersion (0x%x)\n", @@ -597,26 +652,9 @@ ihidev_hid_desc_parse(struct ihidev_soft return (1); } - while (retries-- > 0) { - if (ihidev_reset(sc)) { - if (retries == 0) - return(1); - - ihidev_sleep(sc, 10); - } - else - break; - } - sc->sc_reportlen = letoh16(sc->hid_desc.wReportDescLength); sc->sc_report = malloc(sc->sc_reportlen, M_DEVBUF, M_WAITOK | M_ZERO); - if (ihidev_hid_command(sc, I2C_HID_REPORT_DESCR, 0)) { - printf("%s: failed fetching HID report\n", - sc->sc_dev.dv_xname); - return (1); - } - return (0); } @@ -638,6 +676,43 @@ ihidev_intr(void *arg) int psize, res, i, fast = 0; u_char *p; u_int rep = 0; + + switch (sc->sc_st) { + case I2C_HID_RESET_WAIT: { + uint8_t buf[2]; + + iic_acquire_bus(sc->sc_tag, I2C_F_POLL); + res = iic_exec(sc->sc_tag, I2C_OP_READ_WITH_STOP, + sc->sc_addr, NULL, 0, + buf, sizeof(buf), I2C_F_POLL); + iic_release_bus(sc->sc_tag, I2C_F_POLL); + + DPRINTF(("%s: read at reset: 0x%x, 0x%x\n", + sc->sc_dev.dv_xname, buf[0], buf[1])); + + if (buf[0] == 0x00 && buf[1] == 0x00) + sc->sc_st = I2C_HID_RESET_DONE; + else + sc->sc_st = I2C_HID_RESET_FAILED; + + wakeup_one(sc); + + return (0); + } + + case I2C_HID_RESET_DONE: + break; + + case I2C_HID_INITIALIZING: + case I2C_HID_RESET_FAILED: + return (1); + + default: + printf("%s: unexpected state 0x%x, marked as failed\n", + sc->sc_dev.dv_xname, sc->sc_st); + sc->sc_st = I2C_HID_RESET_FAILED; + return (1); + } if (sc->sc_poll && !sc->sc_frompoll) { DPRINTF(("%s: received interrupt while polling, disabling " Index: sys/dev/i2c/ihidev.h =================================================================== RCS file: /home/cvs/src/sys/dev/i2c/ihidev.h,v diff -u -p -r1.9 ihidev.h --- sys/dev/i2c/ihidev.h 3 Sep 2022 15:48:16 -0000 1.9 +++ sys/dev/i2c/ihidev.h 24 Dec 2024 20:10:06 -0000 @@ -77,6 +77,8 @@ struct ihidev_softc { struct i2c_hid_desc hid_desc; }; + int sc_st; + uint8_t *sc_report; int sc_reportlen; -- wbr, Kirill