Mailing List Archive

On Tue, Mar 31, 2026 at 06:01:19PM +0200, hshoexer wrote: > Hi, > > I've done some debugging on this: I've hacked vmd/vmm to inject a > NMI into the guest. On OpenBSD we drop into ddb when receiving a > NMI. I've built a i386 bsd.rd with ddb enabled. > > For me i386 bsd.rd hangs with a chance of more then 50% when booting > through bios. In that case the vcpu thread goes up to 100% CPU > usage. Breaking into ddb by NMI injection I see traces like this: > > ... > scsibus2 at softraid0: 256 targets > root on rd0a swap on rd0b dump on rd0b > WARNING: CHECK AND RESET THE DATE! > NMI ... going to debugger > Stopped at schedclock+0x3e: leal 0(%eax,%ecx,2),%ecx > ddb> bt > schedclock(d72574f0) at schedclock+0x3e > statclock(d0acd498,e4abdd98,0) at statclock+0x213 > clockintr_dispatch(e4abdd98) at clockintr_dispatch+0x1d2 > clockintr(e4abdd98) at clockintr+0x3c > intr_handler(e4abdd98,d727bd40) at intr_handler+0x13 > Xintr_legacy0_untramp() at Xintr_legacy0_untramp+0xfb > bzero() at bzero+0x23 > pmap_zero_page(d5247088) at pmap_zero_page+0xf > uvm_pagezero_thread(d72574f0) at uvm_pagezero_thread+0x92 > ddb> > schedclock(d72574f0) at schedclock+0x3e > ddb> c > NMI ... going to debugger > Stopped at statclock+0x1f4: cmpl %esi,%ebx > ddb> bt > statclock(d0acd498,e4abdd98,0) at statclock+0x1f4 > clockintr_dispatch(e4abdd98) at clockintr_dispatch+0x1d2 > clockintr(e4abdd98) at clockintr+0x3c > intr_handler(e4abdd98,d727bd40) at intr_handler+0x13 > Xintr_legacy0_untramp() at Xintr_legacy0_untramp+0xfb > bzero() at bzero+0x23 > pmap_zero_page(d5247088) at pmap_zero_page+0xf > uvm_pagezero_thread(d72574f0) at uvm_pagezero_thread+0x92 > ddb> c > NMI ... going to debugger > Stopped at statclock+0x21b: addl $0x1,%eax > ddb> bt > statclock(d0acd498,e4abdd98,0) at statclock+0x21b > clockintr_dispatch(e4abdd98) at clockintr_dispatch+0x1d2 > clockintr(e4abdd98) at clockintr+0x3c > intr_handler(e4abdd98,d727bd40) at intr_handler+0x13 > Xintr_legacy0_untramp() at Xintr_legacy0_untramp+0xfb > bzero() at bzero+0x23 > pmap_zero_page(d5247088) at pmap_zero_page+0xf > uvm_pagezero_thread(d72574f0) at uvm_pagezero_thread+0x92 > ddb> c > NMI ... going to debugger > Stopped at statclock+0x1fa: jnb statclock+0x225 > ddb> bt > statclock(d0acd498,e4abdd98,0) at statclock+0x1fa > clockintr_dispatch(e4abdd98) at clockintr_dispatch+0x1d2 > clockintr(e4abdd98) at clockintr+0x3c > intr_handler(e4abdd98,d727bd40) at intr_handler+0x13 > Xintr_legacy0_untramp() at Xintr_legacy0_untramp+0xfb > bzero() at bzero+0x23 > pmap_zero_page(d5247088) at pmap_zero_page+0xf > uvm_pagezero_thread(d72574f0) at uvm_pagezero_thread+0x92 > ddb> c boot reboot > rebooting... > > So bsd.rd is busy handling clock interrupts. However, I've verified > that vmd has only injecting a single clock interrupt when the hang > is kicked off. > > With a "if count > 100 printf" in clockintr_hardclock() I see this: > > clockintr_hardclock: count 359957 > clockintr_hardclock: count 32604107809 > > So it's not flodding the guest with interrupts. It seems to be > nsec_advance() (used to calculate count) coming up with an high > value for "elapsed". Causing not an infinite but very long loop, > repeatedly (first 359957, then 32604107809 iterations). > > On i386 bsd.rd have kern.timecounter.hardware=i8254. On i386 GENERIC > we have kern.timecounter.hardware=pvclock. > > bluhm@ came up with this experiment: On i386 GENERIC switching > kern.timecounter.hardware from pvclock to i8254 is likely to cause > an instant hang. In that case ddb shows a trace similar to the > hang on i386 bsd.rd: > > <root@i386:1># sysctl kern.timecounter.hardware > kern.timecounter.hardware=pvclock0 > <root@i386:2># sysctl kern.timecounter.hardware=i8254 > kern.timecounter.hardware: pvclock0 ->clockintr_hardclock: cound 359956 > clockintr_hardclock: cound 9992852590 > NMI ... going to debugger > Stopped at i8254_get_timecount+0xd: movzbl %al,%edx > ddb> > ddb> bt > i8254_get_timecount(d0f57528) at i8254_get_timecount+0xd > nanouptime(f6608668) at nanouptime+0x23 > timeout_hardclock_update() at timeout_hardclock_update+0x1f9 > clockintr_hardclock(d0f5349c,f6608714,0) at clockintr_hardclock+0xea > clockintr_dispatch(f6608714) at clockintr_dispatch+0x1ed > clockintr(f6608714) at clockintr+0x3f > intr_handler(f6608714,d78e5dc0) at intr_handler+0x18 > Xintr_legacy0_untramp() at Xintr_legacy0_untramp+0xfb > cpu_idle_cycle() at cpu_idle_cycle+0xf > ddb> c > NMI ... going to debugger > Stopped at i8254_get_timecount+0xb: inb $0x40,%al > ddb> bt > i8254_get_timecount(d0f57528) at i8254_get_timecount+0xb > tc_windup(0,0,0) at tc_windup+0x74 > tc_ticktock() at tc_ticktock+0x3c > hardclock(f6608714) at hardclock+0x8 > clockintr_hardclock(d0f5349c,f6608714,0) at clockintr_hardclock+0xea > clockintr_dispatch(f6608714) at clockintr_dispatch+0x1ed > clockintr(f6608714) at clockintr+0x3f > intr_handler(f6608714,d78e5dc0) at intr_handler+0x18 > Xintr_legacy0_untramp() at Xintr_legacy0_untramp+0xfb > cpu_idle_cycle() at cpu_idle_cycle+0xf > ddb> c > NMI ... going to debugger > Stopped at i8254_get_timecount+0xb: inb $0x40,%al > ddb> reboot > rebooting... > > And the printf indicates we are looping due to high counter values. > > Doing similar experiments with qemu and kvm we don't see hangs. So > this might be related to vmd i8254 emulation? > > Take care, > Hans-Joerg > maybe the problem with i8253 has always been its periodicity; I think the counter wraps in like 35ms or something as it's only 16 bit and has a fixed frequency. we can handle one wrap (eg, "count went negative") but not two. last time I looked at this, there was a timeout problem here due to HZ=100 on the host causing up to 20ms delay between timeouts firing (this was years ago though, and I know there was some work done since then). while the problem may not be 100% related to HZ, various things in the scheduler have changed over the years (and changes elsewhere that lead to different run profiles now) that maybe we are victims of excessive latency on the i8253 latching and missing two wraps. > For me i386 bsd.rd hangs with a chance of more then 50% when booting > through bios. here is an experiment you can do: rebuild the host kernel with HZ=1000 and see if running that makes the problem go away. if that's the case then the only real option to "fix" this is to move i8253 into vmm(4) so we don't have to make multiple round trips to vmd(8) ; one set of round trips to write the counter selector, then another to latch the counter value (and then a couple more to read the values but since it's latched at that point, those last ones aren't sensitive to the latency). N.B. - rebuilding with new HZ I think means a make clean then full build. -ml

- 2026-03-30 06:16 Florian Obser:
  Sysupgrade on i386 vmm to current snapshot hangs on boot into upgrade kernel

2026-03-31 16:01 hshoexer:

Sysupgrade on i386 vmm to current snapshot hangs on boot into upgrade kernel

2026-03-31 16:30 Mike Larkin:
Sysupgrade on i386 vmm to current snapshot hangs on boot into upgrade kernel
- 2026-03-31 16:36 Mike Larkin:
  Sysupgrade on i386 vmm to current snapshot hangs on boot into upgrade kernel
- 2026-03-31 18:33 Alexander Bluhm:
  Sysupgrade on i386 vmm to current snapshot hangs on boot into upgrade kernel