From: Mark Kettenis Subject: Re: fstat sysctl crashes kernel To: Alexander Bluhm Cc: otto@bsdbox.dev, tech@openbsd.org Date: Fri, 09 Aug 2024 12:37:25 +0200 > Date: Fri, 9 Aug 2024 10:20:41 +0200 > From: Alexander Bluhm > > On Thu, Aug 08, 2024 at 09:27:36PM +0300, Vitaliy Makkoveev wrote: > > > On 8 Aug 2024, at 20:46, Alexander Bluhm wrote: > > > > > > On Thu, Aug 08, 2024 at 09:47:27AM -0600, Theo de Raadt wrote: > > >> Todd C. Miller wrote: > > >> > > >>> On Thu, 08 Aug 2024 13:21:54 +0200, Alexander Bluhm wrote: > > >>> > > >>>> Independent of that, can we commit the chunk below? When searching > > >>>> for a specific process, there is not need to traverse the list to > > >>>> the end. Also I check for arg >= 0 as this is cosistent with the > > >>>> arg < -1 check before. Then you don't have to think about what is > > >>>> special about process 0. There is no functional difference as > > >>>> process 0 has PS_SYSTEM set and is skipped anyway. > > >>> > > >>> Yes, this makes sense. OK millert@ > > >> > > >> No matter what we do, we don't get an atomic view. Punting the problem > > >> to userland sidesteps the difficult problem in the kernel. > > > > > > For me is important that we commit something instead of discussing > > > pro and con of various diffs. > > > > > > The idea with the marker sounds like the correct thing, but adds a > > > lot of complexity. mvs@ already worked on something simmilar. The > > > easiest fix without locking concerns is my userland fstat diff. > > > Locking the whole process list while doing copyout over NFS may > > > open another can of worms. > > > > > > > Userland pages are wired at this moment, you can???t stuck in swap > > on nfs here, sysctl_vslock() cares about that. > > > > > Both bluhm@ and mvs@ diff survive my stress test for hours. Usually > > > the test crashes or hangs within minutes. Currently I am testing > > > with witness to find fallout. > > > > > > > Nice to hear. Personal me, I prefer allprocess rwlock(9). It is much > > simpler and left userland as is. Also, is allows to remove kernel > > lock around some process stuff like prfind(). For example, from the > > filt_procattach() where claudio@ left it to make ps dereference safe. > > mvs@ diff has finished witness testing. It complains already during > boot. > > reordering: ld.so libc libcrypto sshd sshd-session ssh-agent. > starting early daemons: syslogd pflogd ntpd. > starting RPC daemons:. > witness: lock order reversal: > 1st 0xd7af9474 inode (&ip->i_lock) > 2nd 0xd0f32d68 allpslk (&allprocess_lock) > lock order [1] inode (&ip->i_lock) -> [2] allpslk (&allprocess_lock) > lock order data 0xd0ca8820 -> 0xd0cf51b4 is missing > lock order [2] allpslk (&allprocess_lock) -> [1] inode (&ip->i_lock) > #0 witness_checkorder+0x291 > #1 rw_enter+0x4d > #2 rrw_enter+0x41 > #3 ufs_lock+0x27 > #4 VOP_LOCK+0x50 > #5 vn_lock+0x90 > #6 vrele+0x35 > #7 unveil_destroy+0x76 > #8 exit1+0x378 > #9 sys_exit+0x14 > #10 syscall+0x3f2 > #11 Xsyscall_untramp+0xa9 > savecore: no core dump > checking quotas: done. > clearing /tmp > kern.securelevel: 0 -> 1 > turning on accounting > witness: lock order reversal: > 1st 0xd0f32d68 allpslk (&allprocess_lock) > 2nd 0xd0ed1114 acctlk (acctlk) > lock order [1] allpslk (&allprocess_lock) -> [2] acctlk (acctlk) > #0 witness_checkorder+0x291 > #1 rw_enter_read+0x32 > #2 acct_process+0x38 > #3 exit1+0x35d > #4 sys_exit+0x14 > #5 syscall+0x3f2 > #6 Xsyscall_untramp+0xa9 > lock order [2] acctlk (acctlk) -> [1] allpslk (&allprocess_lock) > #0 witness_checkorder+0x291 > #1 rw_enter_write+0x30 > #2 fork1+0x138 > #3 kthread_create+0x32 > #4 sys_acct+0xfe > #5 syscall+0x3f2 > #6 Xsyscall_untramp+0xa9 > creating runtime link editor directory cache. > preserving editor files. > starting network daemons: sshd snmpd smtpd sndiod. We should not introduce more lock order reversals. And I actually agree with Mateusz: the fact that you can sleep while holding an rwlock doesn't mean it is a good idea.