Index | Thread | Search

From:
Kirill A. Korinsky <kirill@korins.ky>
Subject:
Re: sys/uvideo: avoid using queue.h
To:
Marcus Glocker <marcus@nazgul.ch>
Cc:
tech@openbsd.org
Date:
Tue, 18 Mar 2025 00:50:38 +0100

Download raw body.

Thread
On Mon, 17 Mar 2025 21:58:50 +0100,
Marcus Glocker <marcus@nazgul.ch> wrote:
> 
> On Mon, Mar 17, 2025 at 08:07:56PM GMT, Marcus Glocker wrote:
> 
> > > Well, I've just tried your diff and it doesn't work. Idea was to perform a
> > > stress test of your and my diff against uvideo.c,v 1.252 but the system had
> > > crashed before ffmpeg gets any frame from the device. OCRed stacktrace:
> > > 
> > >         panic: attempt to access user address 0x20 from EL1
> > >         Stopped at      panic+0x140:    cmp     w21, #0x0
> > >              TID     PID    UID  PRFLAGS     PC PFLAGS     CPU COMMAND
> > >           236976   22771   1000     0x3  0x4000000       1 ffmpeg
> > >         db_enter() at panic+0x13c
> > >         panic() at kdata_abort+0x180
> > >         do_el0_sync() at handle_el1h_sync+0x68
> > >         handle_el1h_sync() at uvideo_vs_cb+0xc0
> > >         uvideo_vs_cb() at usb_transfer_complete+0x220
> > >         usb_transfer_complete() at xhci_event_dequeue+0x10c
> > >         xhci_event_dequeue() at xhci_softintr+0x34
> > >         https://www.openbsd.org/ddb.html describes the minimum info required in bug
> > >         reports.  Insufficient info makes it difficult to find and fix bugs.
> > >         ddb{0}>
> > > 
> > > The test was to use Elgato webcam to stream 4K 60 fps stream to null for 100
> > > seconds as ffmpeg command:
> > > 
> > >         ffmpeg -input_format mjpeg -video_size 3840x2160 -framerate 60 -i /dev/video0 -t 100 -f null -
> > > 
> > > Anyway, the same test works well on the same machine with the same device
> > > with and without my diff.
> > 
> > Ah crap, I had a left over in that diff from the first version, which
> > most probably caused this panic.  Is this better by any chance?
> 
> OK.  I think we're getting closer.  I did some more analysis of this
> "detach cam during streaming and system freezes" issue and came to the
> following conclusion:
> 
> - It happens with the -current code.
> - It happens with your "own queue implementation" diff.
> - It happens with my improved frame buffer utilization and mutex diff.
> - It happens with my improved frame buffer utilization without mutex
>   diff (new, attached).
> 
> The only diff which helps to fix the detach-freeze issue is this diff
> from you: [1].
> 
> For me this means:
> 
> - SIMPLEQ isn't the issue.
> - Missing mutex locking isn't the issue.
> 
> If the following diff works for your devices, I would propose to only
> get this in, to fix the frame buffer utilization in uvideo.  Then as a
> next step we review [1], and hopefully can bring that in next to fix
> the detach-freeze issue.
> 
> [1] https://marc.info/?l=openbsd-tech&m=174180759929893&w=2
> 

Both your diff doesn't crash system, so, I was able to make a performance
testing of all 3 diffs. ffmpeg can read with ~35 fps from all three of them
with 3840x2160 and 60 fps when I stream as 1920x1080. THe bottel neck is
ffmpeg because it had hit almost 100% CPU on large stream.

Regarding freezing on detach. I was able to reproduce it with your latest
diff where you improved buffer utilization (like on snapshot), and with my
diff and your mutex diff the system crashes with OCRed stacktrace:

	panic: attempt to access user address 0x62d98 from ELI
	Stopped at  panic+0x114f  cmp   w21, #0x0
	TID   PID   UID   FLAGS   PFLAGS   CPU   COMMAND
	*1076599 3134  35    0x12    0       0     xorg
	338189  39674 0     0x10000 0       3     hotplugd

	db_enter() at panic+0x1c
	panic() at kdata_abort+0x180
	do_el0_sync() at handle_el1h_sync+0x68
	handle_el1h_sync() at uvideo_vs_decode_stream_header+0x9c
	uvideo_vs_decode_stream_header() at uvideo_vs_cb+0x94
	uvideo_vs_cb() at usb_transfer_complete+0x220
	usb_transfer_complete() at xhci_event_dequeue+0x10c

Ok, next. Adding only my diff with device_lookup changes the crash to:

        panic: uvm_fault failed: fffff80002fd48 esr 96000004 far dead07fdeadbf37
	Stopped at  panic+0x140: cmp    w21, #0x0
	TID   PID   UID   FLAGS   PFLAGS   CPU   COMMAND
	304169 16111 35    0x12    0       7     xorg

	db_enter() at panic+0x13c
	panic() at kdata_abort+0x170
	kdata_abort() at handle_el1h_sync+0x68
	handle_el1h_sync() at usbd_free_xfer+0x58
	usbd_free_xfer() at uvideo_vs_free_isoc+0x48
	uvideo_vs_free_isoc() at uvideo_close+0xe0
	uvideo_close() at videoclose+0x5c

but it only happens if ffplay doesn't quit on detach and consumes a lot of
wired errors (really a lot), and I forced to kill it.

Yeah, non of diffs help with that, and I think that all consumer should be
stop before device is detached.

Anyway, I'd like to point that I like your diff, and I think it should go
because it improves behaviour and make it compatible with Linux.

I also would like to point that I had discovered the freeze after nuke of
the second bcopy was commited, and I recall that I can't reproduce it easy
if I had enabled debug. I think that this issue may turn back when we
removed bcopy again, but so far I agree that SIMPLEQ is the best option.

-- 
wbr, Kirill