Index | Thread | Search

From:
Kirill A. Korinsky <kirill@korins.ky>
Subject:
Re: sys/uvideo: avoid using queue.h
To:
marcus@nazgul.ch, tech@openbsd.org
Date:
Thu, 20 Mar 2025 10:57:21 +0100

Download raw body.

Thread
On Thu, 20 Mar 2025 10:00:52 +0100,
Martin Pieuchot <mpi@grenadille.net> wrote:
> 
> On 19/03/25(Wed) 23:27, Kirill A. Korinsky wrote:
> > On Wed, 19 Mar 2025 22:47:21 +0100,
> > Marcus Glocker <marcus@nazgul.ch> wrote:
> > > 
> > > On Wed, Mar 19, 2025 at 09:04:34PM GMT, Marcus Glocker wrote:
> > > 
> > > > On Wed, Mar 19, 2025 at 11:35:09AM GMT, Martin Pieuchot wrote:
> > > > 
> > > > > There are two issues here.  The first one is that uvideo_detach()
> > > > > doesn't "close" the USB endpoints which should abort the transfers.  I'd
> > > > > suggest uvideo_detach() calls uvideo_close().
> > > > > 
> > > > > The second issue is that uvideo_close() doesn't close the pipe due to
> > > > > the wrong isdying() check.  This is what aborts transfers and stop the
> > > > > callbacks.
> > > > > 
> > > > > Here's an untested diff that illustrates all of this.  Note that the
> > > > > kernel might now call uvideo_close() twice and some pointers might have
> > > > > to be cleaned.
> > > > > 
> > > > > The "dying" check should be avoided as much as possible.  I don't think
> > > > > it is necessary in the callback if the pipe is aborted during the close
> > > > > operation.  Btw usbd_close_pipe() implies usbd_abort_pipe().
> > > > 
> > > > Thanks for the diff!
> > > > 
> > > > I don't get an uvm fault panic anymore, but I still can manage to
> > > > freeze the system when using ffplay and detaching the device during
> > > > that in X.  I also tried to remove both usbd_delay_ms() for a test,
> > > > but makes no difference.  I currently can't see what is happening here.
> > > > 
> > > > We anyway could remove the usbd_delay_ms() in detach, since the only
> > > > purpose was to get around this issue by naively waiting for
> > > > uvideo_vs_cb() to complete, which doesn't work out very well we know
> > > > now ;-)
> > > > 
> > > > Needs some more deep dive it looks like ...
> > > 
> > > I was playing a little bit puzzle with both of your diffs to understand
> > > which part fixes what.
> > > 
> > > With the attached diff I can't provoke an uvm fault panic, nor a freeze
> > > anymore.  While mpi's@ diff was taking care about the uvm fault panic,
> > > kirill's@ diff was fixing the freeze.
> > > 
> > > What did I change:
> > > 
> > > mpi@ diff [1]:
> > > 
> > > - I removed the usbd_delay_ms() from uvideo_detach(), since now it's
> > >   simple not required anymore.
> > > - I removed the usbd_is_dying() comment from uvideo_vs_cb(), since it
> > >   seems to be enough to check for the usb xfer status.
> > > 
> > > kirill@ diff [2]:
> > > 
> > > - I only took over the videoioctl() peace to understand where ffplay
> > >   went crazy.  I'm not sure if ffplay is just keep looping over
> > >   ioctl().  device_lookup() seems to be useful, since it does additional
> > >   checks to find out if the device is still active, like DVF_ACTIVE, as
> > >   already stated by kirill@ in another mail.  I'm not sure about
> > >   device_unref(), but audio(4) uses it as well.
> > >  
> > > In case testing for others would be successful too, I think we should
> > > commit the fixes in separate parts.  The uvideo(4) diff for fixing
> > > the uvm fault panic, the video(4) diff to fix the freeze.  I'm
> > > potentially fine with kirill's@ full video(4) diff [2], but would like
> > > to have mpi@'s feedback on that as well.
> > > 
> > > [1] https://marc.info/?l=openbsd-tech&m=174238030006419&w=2
> > > [2] https://marc.info/?l=openbsd-tech&m=174180759929893&w=2
> > >
> > 
> > I had tried your short diff and it works, but after multiple attach and
> > detach of webcam (I haven't count, but it was something like few dozen
> > times) the system simple crashed with stacktrace:
> > 
> >         panic: attempt to access user address 0x28 from EL1
> >         Stopped at    panic+0x140:    cmp    w21, #0x0
> >             TID  PID    UID  PRFLAGS    PFLAGS  CPU COMMAND
> >         *123765 9168     0   0x14000     0x200   1% usbtask
> >         db_enter() at panic+0x13c
> >         panic() at kdata_abort+0x180
> >         do_el0_sync() at handle_el1h_sync+0x68
> >         handle_el1h_sync() at uvideo_vs_close+0x20
> >         uvideo_vs_close() at uvideo_close+0x20
> 
> Which instruction is that?  Which NULL pointer is being dereferenced?
> 
> The offset of 0x28 suggests that `sc_vs_cur' is NULL.  Could it be that
> uvideo_attach() didn't finish without error and that detaching such
> device is busted?
>

It was this line:

        if (sc->sc_vs_cur->bulk_running == 1) {

so, you're right `sc_vs_cur' is NULL.

> >         uvideo_close() at uvideo_detach+0x40
> >         uvideo_detach() at config_detach+0x160
> >         https://www.openbsd.org/ddb.html describes the minimum info required in bug
> >         reports.  Insufficient info makes it difficult to find and fix bugs.
> >         ddb{1}>
> 
> That means the detach thread called uvideo_close() on an incomplete
> descriptor.
> 
> Please make sure to execute "ps" when you see such backtrace it is helpful
> to understand if another thread is currently sleeping inside uvideo(4) code
> and where.

Unfortently I can reproduce uvideo issues quite easly only on snpadragon
laptop which has i2c keyboard which doesn't work in ddb.

So, I can't use ddb at all here.

On another machines reproducing uvideo issue requires much more effort, and
it wasn't so easy, so I never tried.

> 
> What did you do in your test?  Was the /dev/video* being closed at the
> same time you detached it?  What it the kind of race that you were
> triggering?
> 

Physically attach and detach webcam, and running ffmpeg in console between
it.

BTW I was able to reproduce it with enabled debug log, and this crashes only
if I detach webcam before it hadn't parsed descriptors.

So, here a small tweak of the last Markus diff which I can't crash anymore
when I use it with https://marc.info/?l=openbsd-tech&m=174180759929893&w=2

Index: sys/dev/usb/uvideo.c
===================================================================
RCS file: /home/cvs/src/sys/dev/usb/uvideo.c,v
diff -u -p -r1.253 uvideo.c
--- sys/dev/usb/uvideo.c	18 Mar 2025 13:38:15 -0000	1.253
+++ sys/dev/usb/uvideo.c	20 Mar 2025 09:43:10 -0000
@@ -685,13 +685,12 @@ uvideo_detach(struct device *self, int f
 	struct uvideo_softc *sc = (struct uvideo_softc *)self;
 	int rv = 0;
 
-	/* Wait for outstanding requests to complete */
-	usbd_delay_ms(sc->sc_udev, UVIDEO_NFRAMES_MAX);
-
 	if (sc->sc_videodev != NULL)
 		rv = config_detach(sc->sc_videodev, flags);
 
-	uvideo_vs_free_frame(sc);
+	/* avoid free when device detached before descriptor was parsed */
+	if (sc->sc_vs_cur)
+		uvideo_close(sc);
 
 	return (rv);
 }
@@ -2136,9 +2135,6 @@ skip_set_alt:
 void
 uvideo_vs_close(struct uvideo_softc *sc)
 {
-	if (usbd_is_dying(sc->sc_udev))
-		return;
-
 	if (sc->sc_vs_cur->bulk_running == 1) {
 		sc->sc_vs_cur->bulk_running = 0;
 
@@ -2154,6 +2150,10 @@ uvideo_vs_close(struct uvideo_softc *sc)
 		sc->sc_vs_cur->pipeh = NULL;
 	}
 
+	/* No need to mess with HW if the device is gone. */
+	if (usbd_is_dying(sc->sc_udev))
+		return;
+
 	if (sc->sc_vs_cur->bulk_endpoint) {
 		/*
 		 * UVC doesn't specify how to notify a bulk-based device
@@ -2173,8 +2173,7 @@ uvideo_vs_close(struct uvideo_softc *sc)
 		usbd_delay_ms(sc->sc_udev, 100);
 
 		/* switch back to default interface (turns off cam LED) */
-		if (!usbd_is_dying(sc->sc_udev))
-			(void)usbd_set_interface(sc->sc_vs_cur->ifaceh, 0);
+		(void)usbd_set_interface(sc->sc_vs_cur->ifaceh, 0);
 	}
 }
 
@@ -2282,9 +2281,6 @@ uvideo_vs_start_isoc_ixfer(struct uvideo
 	usbd_status error;
 
 	DPRINTF(2, "%s: %s\n", DEVNAME(sc), __func__);
-
-	if (usbd_is_dying(sc->sc_udev))
-		return;
 
 	for (i = 0; i < sc->sc_nframes; i++)
 		ixfer->size[i] = sc->sc_vs_cur->psize;