From: Mike Larkin Subject: Re: Sysupgrade on i386 vmm to current snapshot hangs on boot into upgrade kernel To: tech@openbsd.org Date: Tue, 31 Mar 2026 09:36:13 -0700 On Tue, Mar 31, 2026 at 09:30:30AM -0700, Mike Larkin wrote: > On Tue, Mar 31, 2026 at 06:01:19PM +0200, hshoexer wrote: > > Hi, > > > > I've done some debugging on this: I've hacked vmd/vmm to inject a > > NMI into the guest. On OpenBSD we drop into ddb when receiving a > > NMI. I've built a i386 bsd.rd with ddb enabled. > > > > For me i386 bsd.rd hangs with a chance of more then 50% when booting > > through bios. In that case the vcpu thread goes up to 100% CPU > > usage. Breaking into ddb by NMI injection I see traces like this: > > > > ... > > scsibus2 at softraid0: 256 targets > > root on rd0a swap on rd0b dump on rd0b > > WARNING: CHECK AND RESET THE DATE! > > NMI ... going to debugger > > Stopped at schedclock+0x3e: leal 0(%eax,%ecx,2),%ecx > > ddb> bt > > schedclock(d72574f0) at schedclock+0x3e > > statclock(d0acd498,e4abdd98,0) at statclock+0x213 > > clockintr_dispatch(e4abdd98) at clockintr_dispatch+0x1d2 > > clockintr(e4abdd98) at clockintr+0x3c > > intr_handler(e4abdd98,d727bd40) at intr_handler+0x13 > > Xintr_legacy0_untramp() at Xintr_legacy0_untramp+0xfb > > bzero() at bzero+0x23 > > pmap_zero_page(d5247088) at pmap_zero_page+0xf > > uvm_pagezero_thread(d72574f0) at uvm_pagezero_thread+0x92 > > ddb> > > schedclock(d72574f0) at schedclock+0x3e > > ddb> c > > NMI ... going to debugger > > Stopped at statclock+0x1f4: cmpl %esi,%ebx > > ddb> bt > > statclock(d0acd498,e4abdd98,0) at statclock+0x1f4 > > clockintr_dispatch(e4abdd98) at clockintr_dispatch+0x1d2 > > clockintr(e4abdd98) at clockintr+0x3c > > intr_handler(e4abdd98,d727bd40) at intr_handler+0x13 > > Xintr_legacy0_untramp() at Xintr_legacy0_untramp+0xfb > > bzero() at bzero+0x23 > > pmap_zero_page(d5247088) at pmap_zero_page+0xf > > uvm_pagezero_thread(d72574f0) at uvm_pagezero_thread+0x92 > > ddb> c > > NMI ... going to debugger > > Stopped at statclock+0x21b: addl $0x1,%eax > > ddb> bt > > statclock(d0acd498,e4abdd98,0) at statclock+0x21b > > clockintr_dispatch(e4abdd98) at clockintr_dispatch+0x1d2 > > clockintr(e4abdd98) at clockintr+0x3c > > intr_handler(e4abdd98,d727bd40) at intr_handler+0x13 > > Xintr_legacy0_untramp() at Xintr_legacy0_untramp+0xfb > > bzero() at bzero+0x23 > > pmap_zero_page(d5247088) at pmap_zero_page+0xf > > uvm_pagezero_thread(d72574f0) at uvm_pagezero_thread+0x92 > > ddb> c > > NMI ... going to debugger > > Stopped at statclock+0x1fa: jnb statclock+0x225 > > ddb> bt > > statclock(d0acd498,e4abdd98,0) at statclock+0x1fa > > clockintr_dispatch(e4abdd98) at clockintr_dispatch+0x1d2 > > clockintr(e4abdd98) at clockintr+0x3c > > intr_handler(e4abdd98,d727bd40) at intr_handler+0x13 > > Xintr_legacy0_untramp() at Xintr_legacy0_untramp+0xfb > > bzero() at bzero+0x23 > > pmap_zero_page(d5247088) at pmap_zero_page+0xf > > uvm_pagezero_thread(d72574f0) at uvm_pagezero_thread+0x92 > > ddb> c boot reboot > > rebooting... > > > > So bsd.rd is busy handling clock interrupts. However, I've verified > > that vmd has only injecting a single clock interrupt when the hang > > is kicked off. > > > > With a "if count > 100 printf" in clockintr_hardclock() I see this: > > > > clockintr_hardclock: count 359957 > > clockintr_hardclock: count 32604107809 > > > > So it's not flodding the guest with interrupts. It seems to be > > nsec_advance() (used to calculate count) coming up with an high > > value for "elapsed". Causing not an infinite but very long loop, > > repeatedly (first 359957, then 32604107809 iterations). > > > > On i386 bsd.rd have kern.timecounter.hardware=i8254. On i386 GENERIC > > we have kern.timecounter.hardware=pvclock. > > > > bluhm@ came up with this experiment: On i386 GENERIC switching > > kern.timecounter.hardware from pvclock to i8254 is likely to cause > > an instant hang. In that case ddb shows a trace similar to the > > hang on i386 bsd.rd: > > > > # sysctl kern.timecounter.hardware > > kern.timecounter.hardware=pvclock0 > > # sysctl kern.timecounter.hardware=i8254 > > kern.timecounter.hardware: pvclock0 ->clockintr_hardclock: cound 359956 > > clockintr_hardclock: cound 9992852590 > > NMI ... going to debugger > > Stopped at i8254_get_timecount+0xd: movzbl %al,%edx > > ddb> > > ddb> bt > > i8254_get_timecount(d0f57528) at i8254_get_timecount+0xd > > nanouptime(f6608668) at nanouptime+0x23 > > timeout_hardclock_update() at timeout_hardclock_update+0x1f9 > > clockintr_hardclock(d0f5349c,f6608714,0) at clockintr_hardclock+0xea > > clockintr_dispatch(f6608714) at clockintr_dispatch+0x1ed > > clockintr(f6608714) at clockintr+0x3f > > intr_handler(f6608714,d78e5dc0) at intr_handler+0x18 > > Xintr_legacy0_untramp() at Xintr_legacy0_untramp+0xfb > > cpu_idle_cycle() at cpu_idle_cycle+0xf > > ddb> c > > NMI ... going to debugger > > Stopped at i8254_get_timecount+0xb: inb $0x40,%al > > ddb> bt > > i8254_get_timecount(d0f57528) at i8254_get_timecount+0xb > > tc_windup(0,0,0) at tc_windup+0x74 > > tc_ticktock() at tc_ticktock+0x3c > > hardclock(f6608714) at hardclock+0x8 > > clockintr_hardclock(d0f5349c,f6608714,0) at clockintr_hardclock+0xea > > clockintr_dispatch(f6608714) at clockintr_dispatch+0x1ed > > clockintr(f6608714) at clockintr+0x3f > > intr_handler(f6608714,d78e5dc0) at intr_handler+0x18 > > Xintr_legacy0_untramp() at Xintr_legacy0_untramp+0xfb > > cpu_idle_cycle() at cpu_idle_cycle+0xf > > ddb> c > > NMI ... going to debugger > > Stopped at i8254_get_timecount+0xb: inb $0x40,%al > > ddb> reboot > > rebooting... > > > > And the printf indicates we are looping due to high counter values. > > > > Doing similar experiments with qemu and kvm we don't see hangs. So > > this might be related to vmd i8254 emulation? > > > > Take care, > > Hans-Joerg > > > > maybe > > the problem with i8253 has always been its periodicity; I think the counter > wraps in like 35ms or something as it's only 16 bit and has a fixed frequency. > > we can handle one wrap (eg, "count went negative") but not two. > > last time I looked at this, there was a timeout problem here due to HZ=100 on > the host causing up to 20ms delay between timeouts firing (this was years ago > though, and I know there was some work done since then). > > while the problem may not be 100% related to HZ, various things in the scheduler > have changed over the years (and changes elsewhere that lead to different > run profiles now) that maybe we are victims of excessive latency on the i8253 > latching and missing two wraps. > > > For me i386 bsd.rd hangs with a chance of more then 50% when booting > > through bios. > > here is an experiment you can do: > > rebuild the host kernel with HZ=1000 and see if running that makes the problem > go away. if that's the case then the only real option to "fix" this is to move > i8253 into vmm(4) so we don't have to make multiple round trips to vmd(8) ; one * another option would be to finish the tickless work cheloha started but I think that's a long ways away from being complete. > set of round trips to write the counter selector, then another to latch the > counter value (and then a couple more to read the values but since it's latched > at that point, those last ones aren't sensitive to the latency). > > N.B. - rebuilding with new HZ I think means a make clean then full build. > > -ml >