From: Vitaliy Makkoveev Subject: Re: sys/ihidev: prevent crash on interrupt storm To: Martin Pieuchot Cc: OpenBSD tech Date: Tue, 24 Dec 2024 12:40:08 +0300 > On 24 Dec 2024, at 12:17, Martin Pieuchot wrote: > > On 23/12/24(Mon) 16:58, Kirill A. Korinsky wrote: >> Vitaliy, >> >> Thanks for review! >> >> On Mon, 23 Dec 2024 12:39:19 +0100, >> Vitaliy Makkoveev wrote: >>> >>>> On 23 Dec 2024, at 14:23, Vitaliy Makkoveev wrote: >>>> >>>>> On 23 Dec 2024, at 13:21, Kirill A. Korinsky wrote: >>>>> >>>>> tech@, >>>>> >>>>> Before I move forward with quirks, I'd like to fix the cause of a crash in a >>>>> system on HONOR MagicBook Art 14 Snapdragon. >>>>> >>>>> Let's assume that we have a device on a machine I2C bus which goes into a >>>>> so-called interrupt storm for some reason. Under this condition, the system >>>>> crashes if an interrupt arrives before we allocate a buffer to read it. >>>>> >>>>> Here's an example of a trace: >>>>> >>>>> panic: attempt to access user address 0x0 from EL1 >>>>> Stopped at panic+0x140: cmp w21, #0x0 >>>>> >>>>> TID PID UID PRFLAGS PFLAGS CPU COMMAND >>>>> db_enter() at panic+0x13c >>>>> panic() at kdata_abort+0x180 >>>>> do_el0_sync() at handle_el1h_sync+0x68 >>>>> handle_el1h_sync() at qciic_exec+0x2d4 >>>>> qciic_exec() at ihidev_intr+0x70 >>>>> ihidev_intr() at qcgpio_intr+0xac >>>>> qcgpio_intr() at agintc_irq_handler+0x2bc >>>>> >>>>> Tested on the same machine, without any additional patches. System boots, >>>>> but touchpad doesn't work. >>>>> >>>>> Ok? >>>>> >>>> >>>> >>>> This is not related to your diff, but since you are working in this area, >>>> please look at the following code. >>>> >>>> timeout_del_barrier(9) looks very suspicious to me. As I understand, the >>>> following code could be executed from interrupt context, so barrier can’t >>>> be used here. Sorry if I’m wrong and this is always thread context. >>>> >>> >>> This is not the thread context, even the timer is interrupt context :) >>> >> >> Well, ihidev_intr migth be called from different places, at least from: >> 1. a device interrup >> 2. a timeout for the poll >> 3. ikbd_cngetc >> >> I think moving place where iic_intr_establish is called is a wrong way. The >> right way is confirm inside ihidev_intr that devices is fully attached, and >> don't try to poll anything from not attached devices. >> >> Also, you're right regarding timeout_del_barrier. I have replaced them to >> just timeout_del because this code is called from different contexts. >> >> So, resulted diff which boot on HONOR MagicBook Art 14 Snapdragon: > > Instead of adding an `sc_attached' variable I'd suggest moving the block > that calls iic_intr_establish() down. According "HID over I2C Protocol Specification" and [1] we do device attachment incorrect. HID device should assert interrupt after reset command, and the reset command should precede to get report descriptor. So, we should install interrupt handler before reset command and wait interrupt for 10 seconds. In the interrupt handler we should handle reset condition separately, because buffer is not yet allocated. Current code doesn’t do that. Something like below: ihidev_attach() { sc->st = INITIALIZING; ihidev_hid_command(I2C_HID_CMD_DESCR); iic_intr_establish(ihidev_intr); ihidev_hid_command(I2C_HID_CMD_SET_POWER); sc->st = RESET_WAIT; ihidev_hid_command(I2C_HID_CMD_RESET); error = tsleep_nsec(sc, .., SEC_TO_NSEC(10)); if (error == EWOULDBLOCK) { printf("reset timeout\n"); goto err; } if(sc->st != RESET_DONE){ printf("reset failed\n"); goto err; } ihidev_hid_command(I2C_HID_REPORT_DESCR); sc->sc_ibuf = malloc(); err: return (error); } ihidev_intr() { switch(sc->st) { case CMD_RESET: { uint8_t buf[2]; iic_exec(.., I2C_OP_READ_WITH_STOP, buf, sizeof(buf), ..); if (buf[0] == 0x00 && buf[1] == 0x00) sc->st = RESET_DONE; else sc->st = RESET_FAILED; wakeup_one(sc); break; } case ..: } } 1. https://learn.microsoft.com/en-us/windows-hardware/drivers/hid/plug-and-play-support-and-power-management