From: Martin Pieuchot Subject: Re: sys/uvideo: avoid using queue.h To: marcus@nazgul.ch, tech@openbsd.org Date: Thu, 20 Mar 2025 12:59:43 +0100 On 20/03/25(Thu) 10:57, Kirill A. Korinsky wrote: > On Thu, 20 Mar 2025 10:00:52 +0100, > Martin Pieuchot wrote: > > > > On 19/03/25(Wed) 23:27, Kirill A. Korinsky wrote: > > > On Wed, 19 Mar 2025 22:47:21 +0100, > > > Marcus Glocker wrote: > > > > > > > > On Wed, Mar 19, 2025 at 09:04:34PM GMT, Marcus Glocker wrote: > > > > > > > > > On Wed, Mar 19, 2025 at 11:35:09AM GMT, Martin Pieuchot wrote: > > > > > > > > > > > There are two issues here. The first one is that uvideo_detach() > > > > > > doesn't "close" the USB endpoints which should abort the transfers. I'd > > > > > > suggest uvideo_detach() calls uvideo_close(). > > > > > > > > > > > > The second issue is that uvideo_close() doesn't close the pipe due to > > > > > > the wrong isdying() check. This is what aborts transfers and stop the > > > > > > callbacks. > > > > > > > > > > > > Here's an untested diff that illustrates all of this. Note that the > > > > > > kernel might now call uvideo_close() twice and some pointers might have > > > > > > to be cleaned. > > > > > > > > > > > > The "dying" check should be avoided as much as possible. I don't think > > > > > > it is necessary in the callback if the pipe is aborted during the close > > > > > > operation. Btw usbd_close_pipe() implies usbd_abort_pipe(). > > > > > > > > > > Thanks for the diff! > > > > > > > > > > I don't get an uvm fault panic anymore, but I still can manage to > > > > > freeze the system when using ffplay and detaching the device during > > > > > that in X. I also tried to remove both usbd_delay_ms() for a test, > > > > > but makes no difference. I currently can't see what is happening here. > > > > > > > > > > We anyway could remove the usbd_delay_ms() in detach, since the only > > > > > purpose was to get around this issue by naively waiting for > > > > > uvideo_vs_cb() to complete, which doesn't work out very well we know > > > > > now ;-) > > > > > > > > > > Needs some more deep dive it looks like ... > > > > > > > > I was playing a little bit puzzle with both of your diffs to understand > > > > which part fixes what. > > > > > > > > With the attached diff I can't provoke an uvm fault panic, nor a freeze > > > > anymore. While mpi's@ diff was taking care about the uvm fault panic, > > > > kirill's@ diff was fixing the freeze. > > > > > > > > What did I change: > > > > > > > > mpi@ diff [1]: > > > > > > > > - I removed the usbd_delay_ms() from uvideo_detach(), since now it's > > > > simple not required anymore. > > > > - I removed the usbd_is_dying() comment from uvideo_vs_cb(), since it > > > > seems to be enough to check for the usb xfer status. > > > > > > > > kirill@ diff [2]: > > > > > > > > - I only took over the videoioctl() peace to understand where ffplay > > > > went crazy. I'm not sure if ffplay is just keep looping over > > > > ioctl(). device_lookup() seems to be useful, since it does additional > > > > checks to find out if the device is still active, like DVF_ACTIVE, as > > > > already stated by kirill@ in another mail. I'm not sure about > > > > device_unref(), but audio(4) uses it as well. > > > > > > > > In case testing for others would be successful too, I think we should > > > > commit the fixes in separate parts. The uvideo(4) diff for fixing > > > > the uvm fault panic, the video(4) diff to fix the freeze. I'm > > > > potentially fine with kirill's@ full video(4) diff [2], but would like > > > > to have mpi@'s feedback on that as well. > > > > > > > > [1] https://marc.info/?l=openbsd-tech&m=174238030006419&w=2 > > > > [2] https://marc.info/?l=openbsd-tech&m=174180759929893&w=2 > > > > > > > > > > I had tried your short diff and it works, but after multiple attach and > > > detach of webcam (I haven't count, but it was something like few dozen > > > times) the system simple crashed with stacktrace: > > > > > > panic: attempt to access user address 0x28 from EL1 > > > Stopped at panic+0x140: cmp w21, #0x0 > > > TID PID UID PRFLAGS PFLAGS CPU COMMAND > > > *123765 9168 0 0x14000 0x200 1% usbtask > > > db_enter() at panic+0x13c > > > panic() at kdata_abort+0x180 > > > do_el0_sync() at handle_el1h_sync+0x68 > > > handle_el1h_sync() at uvideo_vs_close+0x20 > > > uvideo_vs_close() at uvideo_close+0x20 > > > > Which instruction is that? Which NULL pointer is being dereferenced? > > > > The offset of 0x28 suggests that `sc_vs_cur' is NULL. Could it be that > > uvideo_attach() didn't finish without error and that detaching such > > device is busted? > > > > It was this line: > > if (sc->sc_vs_cur->bulk_running == 1) { > > so, you're right `sc_vs_cur' is NULL. > > > > uvideo_close() at uvideo_detach+0x40 > > > uvideo_detach() at config_detach+0x160 > > > https://www.openbsd.org/ddb.html describes the minimum info required in bug > > > reports. Insufficient info makes it difficult to find and fix bugs. > > > ddb{1}> > > > > That means the detach thread called uvideo_close() on an incomplete > > descriptor. > > > > Please make sure to execute "ps" when you see such backtrace it is helpful > > to understand if another thread is currently sleeping inside uvideo(4) code > > and where. > > Unfortently I can reproduce uvideo issues quite easly only on snpadragon > laptop which has i2c keyboard which doesn't work in ddb. > > So, I can't use ddb at all here. > > On another machines reproducing uvideo issue requires much more effort, and > it wasn't so easy, so I never tried. > > > > > What did you do in your test? Was the /dev/video* being closed at the > > same time you detached it? What it the kind of race that you were > > triggering? > > > > Physically attach and detach webcam, and running ffmpeg in console between > it. > > BTW I was able to reproduce it with enabled debug log, and this crashes only > if I detach webcam before it hadn't parsed descriptors. > > So, here a small tweak of the last Markus diff which I can't crash anymore > when I use it with https://marc.info/?l=openbsd-tech&m=174180759929893&w=2 I think we're on the right track. > Index: sys/dev/usb/uvideo.c > =================================================================== > RCS file: /home/cvs/src/sys/dev/usb/uvideo.c,v > diff -u -p -r1.253 uvideo.c > --- sys/dev/usb/uvideo.c 18 Mar 2025 13:38:15 -0000 1.253 > +++ sys/dev/usb/uvideo.c 20 Mar 2025 09:43:10 -0000 > @@ -685,13 +685,12 @@ uvideo_detach(struct device *self, int f > struct uvideo_softc *sc = (struct uvideo_softc *)self; > int rv = 0; > > - /* Wait for outstanding requests to complete */ > - usbd_delay_ms(sc->sc_udev, UVIDEO_NFRAMES_MAX); > - > if (sc->sc_videodev != NULL) > rv = config_detach(sc->sc_videodev, flags); > > - uvideo_vs_free_frame(sc); > + /* avoid free when device detached before descriptor was parsed */ > + if (sc->sc_vs_cur) > + uvideo_close(sc); > > return (rv); > } How is this possible? How can a device be detached before it is attached? Anyway, if we are going to check if `sc_vs_cur' is set, can we do it in uvideo_open() to ensure it is not possible to open such device? Then, I'd suggest you move the check inside uvideo_close() to ease the understanding and for symmetry ;) And should we set this pointer just before /* init map queue */ in uvideo_attach_hook()? Before that some operations might fail...