From: Vitaliy Makkoveev <mvs@openbsd.org>
Subject: Re: mp-safe drm*_filtops
To: tech@openbsd.org
Date: Tue, 22 Jul 2025 00:46:21 +0300

On Mon, Jul 21, 2025 at 05:10:45PM +1000, Jonathan Gray wrote:
> On Fri, Jul 18, 2025 at 11:31:05PM +0300, Vitaliy Makkoveev wrote:
> > This resurrects my old diff. There is no reason for X to be stuck on
> > the kernel lock while polling.
> > 
> > The `event_lock' mutex(9) already protects the `event_list' checked by
> > filt_drmread(), so use it to protect `drmread_filtops' too.
> > 
> > The filt_drmkms() doesn't touch any data and could be easily converted
> > to always return 1, so the lock is only required for `drm_filtops'.
> > Introduce the new `note_mtx' mutex(9) for that purpose.
> > 
> > I have no radeondrm(4), so I tested with inteldrm(4) only. Last time I
> > had successful reports from people with radeondrm(4) too.
> 
> I'm not sure of the best way to test this or what any problems would
> look like.
> 
> In general I'm concerned about unlocking drm due to rcu.
> 
> on t500 with rv635 radeondrm and this patch
> 
> 7747 tests into 44335 on piglit
> 
> drm:pid11671:radeon_ring_test_lockup *ERROR* ring 0 stalled for more than 10170msec
> drm:pid11671:radeon_fence_check_lockup *WARNING* GPU lockup (current fence id 0x0000000000088108 last fence id 0x0000000000088114 on ring 0)
> ..
> [drm] *ERROR* radeon: fence wait failed (-11).
> [drm] *ERROR* radeon: failed testing IB on GFX ring (-11).
> 
> and windows in X stop updating, cursor still moves, and can switch vt
> 
> on reboot
> WARNING bo->pin_count failed at /usr/src/sys/dev/pci/drm/ttm/ttm_bo.c:252
> 
> Without the patch I also see the ring stalls around the same place.
> But the entire screen is black, which doesn't change on switching vt.
> 

I used the following machine to test inteldrm(4):

    inteldrm0 at pci0 dev 2 function 0 "Intel Graphics" rev 0x0c
    inteldrm0: msi, ALDERLAKE_P, gen 12

I ran "piglit run tests/all results/all-reference" with the following
results. No crashes, no artifacts, no differences from an unpatched kernel:

[59174/59174] skip: 6892, pass: 50949, warn: 9, fail: 152, crash: 1172

Could we put this diff into snaps? According to your report, this diff
doesn't make radeondrm(4) any worse. I doubt anyone will see any
fallout.