From: Vitaliy Makkoveev Subject: Re: Don't release NET_LOCK in vio(4) ioctl To: Stefan Fritsch Cc: Alexander Bluhm , tech@openbsd.org Date: Wed, 17 Apr 2024 20:57:02 +0300 > On 17 Apr 2024, at 00:49, Stefan Fritsch wrote: > > On Tue, 16 Apr 2024, Alexander Bluhm wrote: > >> On Fri, Apr 12, 2024 at 03:38:44PM +0300, Vitaliy Makkoveev wrote: >>> On Fri, Apr 12, 2024 at 02:05:05PM +0200, Stefan Fritsch wrote: >>>> Hi, >>>> >>>> commits f0b002d01d5 "Release the netlock when sleeping for control >>>> messages in in vioioctl()" and 126b881f71 "Insert a workaround for per-ifp >>>> ioctl being called w/o NET_LOCK()." in vio(4) fixed a deadlock but may >>>> cause a crash with a protection fault trap at rt_ifa_del if addresses are >>>> added/removed concurrently. >>>> >>>> Reproducible by running >>>> >>>> while true; do ifconfig vio0 -inet; done & >>>> while true; do ifconfig vio0 10.1.6.11/24; done & >>>> while true; do ifconfig vio0 down; done & >>>> while true; do ifconfig vio0 up; done & >>>> while true; do ifconfig vio0 -inet6 ; done & >>>> while true; do ifconfig vio0 inet6 2001:a61:3446:ddff:94be:52d1:d57c:1/64 ; done & >>>> >>>> in parallel. >>>> >>>> The diff below reverts the two above commits and changes the vio_ctrl_* >>>> functions to prevent any concurrent control queue requests while the >>>> device is being reset. I have seen no deadlocks or crashes with that >>>> patch. >>>> >>>> Comments? OKs? >>>> >>> >>> Sleep with netlock held stops packet processing and operations in inet >>> sockets. This is much worse when use another lock with netlock release. >>> pflowioctl() could be good example. >> >> Holding net lock for a long time is not good. But kernel crashes >> are worse. I prefer short stop of packet processing during ifconfig >> over a crash in netinet code. Here is the original bug report >> >> https://marc.info/?l=openbsd-bugs&m=166124228103568&w=2 >> >> I would try get this in and work on fine grained vio(4) locking >> afterwards. > > I agree. I have tried with a separate lock that serializes the whole > vio_ioctl function but I still get crashes with that: > > db_command_loop() at db_command_loop+0xea > db_trap(4,0) at db_trap+0x129 > db_ktrap(4,0,ffff8000239d0f60) at db_ktrap+0x111 > kerntrap(ffff8000239d0f60) at kerntrap+0xa7 > alltraps_kern_meltdown() at alltraps_kern_meltdown+0x7b > rt_ifa_addlocal(ffff8000008d7b00) at rt_ifa_addlocal+0x2a > in6_update_ifa(ffff8000000af2a8,ffff8000239d1230,0) at > in6_update_ifa+0x581 > in6_ifattach_linklocal(ffff8000000af2a8,0) at in6_ifattach_linklocal+0x204 > in6_ifattach(ffff8000000af2a8) at in6_ifattach+0x103 > ifioctl(fffffd8051ac7dd8,801169ab,ffff8000239d13f0,ffff8000239834a8) at > ifioctl+0xc99 > sys_ioctl(ffff8000239834a8,ffff8000239d15a0,ffff8000239d1510) at > sys_ioctl+0x2af > syscall(ffff8000239d15a0) at syscall+0x588 > Xsyscall() at Xsyscall+0x128 > end of kernel > end trace frame: 0x773f17f14c30, count: -13 > Just for curiosity, can I see your diff? > I suspect that the global ifioctl() function needs some changes in order > to allow NIC drivers' ioctl functions to drop the NET_LOCK. Periodically, I think about that. At least for some pseudo drivers netlock could be avoided.