Index | Thread | Search

From:
Vitaliy Makkoveev <otto@bsdbox.dev>
Subject:
Re: sys/ihidev: prevent crash on interrupt storm
To:
"Kirill A. Korinsky" <kirill@korins.ky>
Cc:
Martin Pieuchot <mpi@grenadille.net>, OpenBSD tech <tech@openbsd.org>
Date:
Wed, 25 Dec 2024 01:26:30 +0300

Download raw body.

Thread
> On 25 Dec 2024, at 00:09, Kirill A. Korinsky <kirill@korins.ky> wrote:
> 
> On Tue, 24 Dec 2024 10:40:08 +0100,
> Vitaliy Makkoveev <otto@bsdbox.dev> wrote:
>> 
>>> On 24 Dec 2024, at 12:17, Martin Pieuchot <mpi@grenadille.net> wrote:
>>> 
>>> On 23/12/24(Mon) 16:58, Kirill A. Korinsky wrote:
>>>> Vitaliy,
>>>> 
>>>> Thanks for review!
>>>> 
>>>> On Mon, 23 Dec 2024 12:39:19 +0100,
>>>> Vitaliy Makkoveev <otto@bsdbox.dev> wrote:
>>>>> 
>>>>>> On 23 Dec 2024, at 14:23, Vitaliy Makkoveev <otto@bsdbox.dev> wrote:
>>>>>> 
>>>>>>> On 23 Dec 2024, at 13:21, Kirill A. Korinsky <kirill@korins.ky> wrote:
>>>>>>> 
>>>>>>> tech@,
>>>>>>> 
>>>>>>> Before I move forward with quirks, I'd like to fix the cause of a crash in a
>>>>>>> system on HONOR MagicBook Art 14 Snapdragon.
>>>>>>> 
>>>>>>> Let's assume that we have a device on a machine I2C bus which goes into a
>>>>>>> so-called interrupt storm for some reason. Under this condition, the system
>>>>>>> crashes if an interrupt arrives before we allocate a buffer to read it.
>>>>>>> 
>>>>>>> Here's an example of a trace:
>>>>>>> 
>>>>>>>     panic: attempt to access user address 0x0 from EL1
>>>>>>>     Stopped at panic+0x140: cmp w21, #0x0
>>>>>>> 
>>>>>>>     TID PID UID PRFLAGS PFLAGS CPU COMMAND
>>>>>>>     db_enter() at panic+0x13c
>>>>>>>     panic() at kdata_abort+0x180
>>>>>>>     do_el0_sync() at handle_el1h_sync+0x68
>>>>>>>     handle_el1h_sync() at qciic_exec+0x2d4
>>>>>>>     qciic_exec() at ihidev_intr+0x70
>>>>>>>     ihidev_intr() at qcgpio_intr+0xac
>>>>>>>     qcgpio_intr() at agintc_irq_handler+0x2bc
>>>>>>> 
>>>>>>> Tested on the same machine, without any additional patches. System boots,
>>>>>>> but touchpad doesn't work.
>>>>>>> 
>>>>>>> Ok?
>>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> This is not related to your diff, but since you are working in this area,
>>>>>> please look at the following code.
>>>>>> 
>>>>>> timeout_del_barrier(9) looks very suspicious to me. As I understand, the
>>>>>> following code could be executed from interrupt context, so barrier can’t
>>>>>> be used here. Sorry if I’m wrong and this is always thread context.
>>>>>> 
>>>>> 
>>>>> This is not the thread context, even the timer is interrupt context :)
>>>>> 
>>>> 
>>>> Well, ihidev_intr migth be called from different places, at least from:
>>>> 1. a device interrup
>>>> 2. a timeout for the poll
>>>> 3. ikbd_cngetc
>>>> 
>>>> I think moving place where iic_intr_establish is called is a wrong way. The
>>>> right way is confirm inside ihidev_intr that devices is fully attached, and
>>>> don't try to poll anything from not attached devices.
>>>> 
>>>> Also, you're right regarding timeout_del_barrier. I have replaced them to
>>>> just timeout_del because this code is called from different contexts.
>>>> 
>>>> So, resulted diff which boot on HONOR MagicBook Art 14 Snapdragon:
>>> 
>>> Instead of adding an `sc_attached' variable I'd suggest moving the block
>>> that calls iic_intr_establish() down.
>> 
>> According "HID over I2C Protocol Specification" and [1] we do device
>> attachment incorrect. HID device should assert interrupt after reset
>> command, and the reset command should precede to get report descriptor.
>> 
>> So, we should install interrupt handler before reset command and wait
>> interrupt for 10 seconds. In the interrupt handler we should handle
>> reset condition separately, because buffer is not yet allocated. Current
>> code doesn’t do that.
>> 
>> Something like below:
>> 
>> ihidev_attach()
>> {
>> 	sc->st = INITIALIZING;
>> 
>> 	ihidev_hid_command(I2C_HID_CMD_DESCR);
>> 
>> 	iic_intr_establish(ihidev_intr);
>> 
>> 	ihidev_hid_command(I2C_HID_CMD_SET_POWER);
>> 
>> 	sc->st = RESET_WAIT;
>> 	ihidev_hid_command(I2C_HID_CMD_RESET);
>> 	error = tsleep_nsec(sc, .., SEC_TO_NSEC(10));
>> 	if (error == EWOULDBLOCK) {
>> 		printf("reset timeout\n");
>> 		goto err;
>> 	}
>> 	if(sc->st != RESET_DONE){
>> 		printf("reset failed\n");
>> 		goto err;
>> 	}
>> 	ihidev_hid_command(I2C_HID_REPORT_DESCR);
>> 	sc->sc_ibuf = malloc();
>> 
>> err:
>> 	return (error);
>> }
>> 
>> ihidev_intr()
>> {
>> 	switch(sc->st) {
>> 	case CMD_RESET: {
>> 		uint8_t buf[2];
>> 		iic_exec(.., I2C_OP_READ_WITH_STOP, buf, sizeof(buf), ..);
>> 		if (buf[0] == 0x00 && buf[1] == 0x00)
>> 			sc->st = RESET_DONE;
>> 		else
>> 			sc->st = RESET_FAILED;
>> 		wakeup_one(sc);
>> 		break;
>> 	}
>> 	case ..:
>> 	}
>> } 
>> 
>> 
> 
> Thanks for the suggestion. I tried to implement it but discovered that
> tsleep inside attach sleeps for 1 microsecond. I tried using delay(9)
> instead, to sleep for 10 seconds, but no interrupt arrived while it was
> waiting. My guess that it waits until autoconfiguration is over, but I'm not
> sure.
> 
> Extracting part of the logic from attach to some function that is called
> from intr and by the timer could be a solution, but this function uses
> config_found_sm and it requires i2c_attach_args. All of that seems like the
> wrong way. Am I mistaken?
> 

My HID touchpad 04f3:3282 doesn’t throw interrupt reset. We had no
reports before, so I suspect the most devices do the same.

delay(9) is arch specific, IIRC it doesn’t interrupts on x86.

> +	sc->sc_st = I2C_HID_RESET_WAIT;
> +
> +	/* device may report failure on RESET, but works */
> +	if (ihidev_hid_command(sc, I2C_HID_CMD_RESET, 0)) {
> +		printf("%s: failed to reset\n", sc->sc_dev.dv_xname);
> +		goto error;
> +	}
> +
> +	was = nsecuptime();
> +	for (i = 0, err = 0; i < 10 && err == 0; i++) {
> +		err = tsleep_nsec(sc, PZERO, "ihidev_reset", SEC_TO_NSEC(1));
> +		DPRINTF(("%s: %d sleep for %lld ns, res: %d\n",
> +		    sc->sc_dev.dv_xname, i, nsecuptime() - was, err));
> +	}
> +
> +	if (err == EWOULDBLOCK) {
> +		printf("%s: reset timeout\n", sc->sc_dev.dv_xname);
> +		goto error;
> +	}
> +
> +	if (sc->sc_st != I2C_HID_RESET_DONE) {
> +		printf("%s: unexpected reset response, state: 0x%x\n",
> +		    sc->sc_dev.dv_xname, sc->sc_st);
> +		goto error;
> +	}

At least you should check err in the for loop, otherwise it could
be overwritten with the EWOULDBLOCK even in the non error case. Also
sc_st check and sleep should be serialised with sc_st modification
within interrupt handler, otherwise the wakeup could be lost.

I propose you to try something like below:

	sc->sc_st = I2C_HID_RESET_WAIT;
	for (i = 0; i < 3; ++i) {
		ihidev_hid_command(sc, I2C_HID_CMD_RESET);
		delay(10 * 1000 * 1000);		
		if (sc->sc_st != I2C_HID_RESET_WAIT)
			break;
	}
	if (sc->sc_st != I2C_HID_RESET_DONE) {
		if (sc->sc_st == I2C_HID_RESET_WAIT)
			printf("timeout");
		else
			printf("other error");
		goto error;
	}



> But I might be totally wrong, so here’s the diff that I've used:
> 
> Index: sys/dev/i2c/ihidev.c
> ===================================================================
> RCS file: /home/cvs/src/sys/dev/i2c/ihidev.c,v
> diff -u -p -r1.33 ihidev.c
> --- sys/dev/i2c/ihidev.c	18 Oct 2024 12:53:49 -0000	1.33
> +++ sys/dev/i2c/ihidev.c	24 Dec 2024 20:10:06 -0000
> @@ -30,7 +30,7 @@
> 
> #include <dev/hid/hid.h>
> 
> -/* #define IHIDEV_DEBUG */
> +#define IHIDEV_DEBUG
> 
> #ifdef IHIDEV_DEBUG
> #define DPRINTF(x) printf x
> @@ -57,6 +57,14 @@ enum {
> 	I2C_HID_REPORT_DESCR	= 0x100,
> };
> 
> +enum {
> +	I2C_HID_INITIALIZING	= 0x1,
> +	I2C_HID_RESET_WAIT	= 0x2,
> +	I2C_HID_RESET_DONE	= 0x3,
> +	I2C_HID_RESET_FAILED	= 0x4,
> +	I2C_HID_BROKEN		= 0x5,
> +};
> +
> static int I2C_HID_POWER_ON	= 0x0;
> static int I2C_HID_POWER_OFF	= 0x1;
> 
> @@ -105,8 +113,11 @@ ihidev_attach(struct device *parent, str
> 	struct i2c_attach_args *ia = aux;
> 	struct ihidev_attach_arg iha;
> 	struct device *dev;
> -	int repid, repsz;
> +	int repid, repsz, err, i;
> 	int repsizes[256];
> +	uint64_t was;
> +
> +	sc->sc_st = I2C_HID_INITIALIZING;
> 
> 	sc->sc_tag = ia->ia_tag;
> 	sc->sc_addr = ia->ia_addr;
> @@ -115,7 +126,7 @@ ihidev_attach(struct device *parent, str
> 	if (ihidev_hid_command(sc, I2C_HID_CMD_DESCR, NULL) ||
> 	    ihidev_hid_desc_parse(sc)) {
> 		printf(", failed fetching initial HID descriptor\n");
> -		return;
> +		goto error;
> 	}
> 
> 	if (ia->ia_intr) {
> @@ -137,9 +148,49 @@ ihidev_attach(struct device *parent, str
> 	    letoh16(sc->hid_desc.wVendorID), letoh16(sc->hid_desc.wProductID),
> 	    (char *)ia->ia_cookie);
> 
> +	if (ihidev_hid_command(sc, I2C_HID_CMD_SET_POWER, &I2C_HID_POWER_ON)) {
> +		printf("%s: failed to power on\n", sc->sc_dev.dv_xname);
> +		goto error;
> +	}
> +
> +	sc->sc_st = I2C_HID_RESET_WAIT;
> +
> +	/* device may report failure on RESET, but works */
> +	if (ihidev_hid_command(sc, I2C_HID_CMD_RESET, 0)) {
> +		printf("%s: failed to reset\n", sc->sc_dev.dv_xname);
> +		goto error;
> +	}
> +
> +	was = nsecuptime();
> +	for (i = 0, err = 0; i < 10 && err == 0; i++) {
> +		err = tsleep_nsec(sc, PZERO, "ihidev_reset", SEC_TO_NSEC(1));
> +		DPRINTF(("%s: %d sleep for %lld ns, res: %d\n",
> +		    sc->sc_dev.dv_xname, i, nsecuptime() - was, err));
> +	}
> +
> +	if (err == EWOULDBLOCK) {
> +		printf("%s: reset timeout\n", sc->sc_dev.dv_xname);
> +		goto error;
> +	}
> +
> +	if (sc->sc_st != I2C_HID_RESET_DONE) {
> +		printf("%s: unexpected reset response, state: 0x%x\n",
> +		    sc->sc_dev.dv_xname, sc->sc_st);
> +		goto error;
> +	}
> +
> +	if (ihidev_hid_command(sc, I2C_HID_REPORT_DESCR, 0)) {
> +		printf("%s: failed fetching HID report\n",
> +		    sc->sc_dev.dv_xname);
> +		goto error;
> +	}
> +
> 	sc->sc_nrepid = ihidev_maxrepid(sc->sc_report, sc->sc_reportlen);
> -	if (sc->sc_nrepid < 0)
> -		return;
> +	if (sc->sc_nrepid < 0) {
> +		printf("%s: invalid report ID\n",
> +		    sc->sc_dev.dv_xname);
> +		goto error;
> +	}
> 
> 	printf("%s: %d report id%s\n", sc->sc_dev.dv_xname, sc->sc_nrepid,
> 	    sc->sc_nrepid > 1 ? "s" : "");
> @@ -203,6 +254,12 @@ ihidev_attach(struct device *parent, str
> 		printf("%s: failed to power down\n", sc->sc_dev.dv_xname);
> 		return;
> 	}
> +
> +	return;
> +
> +error:
> +	sc->sc_st = I2C_HID_BROKEN;
> +	printf("%s: marked as broken\n", sc->sc_dev.dv_xname);
> }
> 
> int
> @@ -570,8 +627,6 @@ ihidev_reset(struct ihidev_softc *sc)
> int
> ihidev_hid_desc_parse(struct ihidev_softc *sc)
> {
> -	int retries = 3;
> -
> 	/* must be v01.00 */
> 	if (letoh16(sc->hid_desc.bcdVersion) != 0x0100) {
> 		printf("%s: bad HID descriptor bcdVersion (0x%x)\n",
> @@ -597,26 +652,9 @@ ihidev_hid_desc_parse(struct ihidev_soft
> 		return (1);
> 	}
> 
> -	while (retries-- > 0) {
> -		if (ihidev_reset(sc)) {
> -			if (retries == 0)
> -				return(1);
> -
> -			ihidev_sleep(sc, 10);
> -		}
> -		else
> -			break;
> -	}
> -
> 	sc->sc_reportlen = letoh16(sc->hid_desc.wReportDescLength);
> 	sc->sc_report = malloc(sc->sc_reportlen, M_DEVBUF, M_WAITOK | M_ZERO);
> 
> -	if (ihidev_hid_command(sc, I2C_HID_REPORT_DESCR, 0)) {
> -		printf("%s: failed fetching HID report\n",
> -		    sc->sc_dev.dv_xname);
> -		return (1);
> -	}
> -
> 	return (0);
> }
> 
> @@ -638,6 +676,43 @@ ihidev_intr(void *arg)
> 	int psize, res, i, fast = 0;
> 	u_char *p;
> 	u_int rep = 0;
> +
> +	switch (sc->sc_st) {
> +	case I2C_HID_RESET_WAIT: {
> +		uint8_t buf[2];
> +
> +		iic_acquire_bus(sc->sc_tag, I2C_F_POLL);
> +		res = iic_exec(sc->sc_tag, I2C_OP_READ_WITH_STOP,
> +			       sc->sc_addr, NULL, 0,
> +			       buf, sizeof(buf), I2C_F_POLL);
> +		iic_release_bus(sc->sc_tag, I2C_F_POLL);
> +
> +		DPRINTF(("%s: read at reset: 0x%x, 0x%x\n",
> +		    sc->sc_dev.dv_xname, buf[0], buf[1]));
> +
> +		if (buf[0] == 0x00 && buf[1] == 0x00)
> +			sc->sc_st = I2C_HID_RESET_DONE;
> +		else
> +			sc->sc_st = I2C_HID_RESET_FAILED;
> +
> +		wakeup_one(sc);
> +
> +		return (0);
> +	}
> +
> +	case I2C_HID_RESET_DONE:
> +		break;
> +
> +	case I2C_HID_INITIALIZING:
> +	case I2C_HID_RESET_FAILED:
> +		return (1);
> +
> +	default:
> +		printf("%s: unexpected state 0x%x, marked as failed\n",
> +		    sc->sc_dev.dv_xname, sc->sc_st);
> +		sc->sc_st = I2C_HID_RESET_FAILED;
> +		return (1);
> +	}
> 
> 	if (sc->sc_poll && !sc->sc_frompoll) {
> 		DPRINTF(("%s: received interrupt while polling, disabling "
> Index: sys/dev/i2c/ihidev.h
> ===================================================================
> RCS file: /home/cvs/src/sys/dev/i2c/ihidev.h,v
> diff -u -p -r1.9 ihidev.h
> --- sys/dev/i2c/ihidev.h	3 Sep 2022 15:48:16 -0000	1.9
> +++ sys/dev/i2c/ihidev.h	24 Dec 2024 20:10:06 -0000
> @@ -77,6 +77,8 @@ struct ihidev_softc {
> 		struct i2c_hid_desc hid_desc;
> 	};
> 
> +	int		sc_st;
> +
> 	uint8_t		*sc_report;
> 	int		sc_reportlen;
> 
> 
> 
> -- 
> wbr, Kirill