Index | Thread | Search

From:
Mike Larkin <mlarkin@nested.page>
Subject:
Re: Sysupgrade on i386 vmm to current snapshot hangs on boot into upgrade kernel
To:
tech@openbsd.org
Date:
Tue, 31 Mar 2026 09:30:30 -0700

Download raw body.

Thread
  • hshoexer:

    Sysupgrade on i386 vmm to current snapshot hangs on boot into upgrade kernel

  • On Tue, Mar 31, 2026 at 06:01:19PM +0200, hshoexer wrote:
    > Hi,
    >
    > I've done some debugging on this:  I've hacked vmd/vmm to inject a
    > NMI into the guest.  On OpenBSD we drop into ddb when receiving a
    > NMI.  I've built a i386 bsd.rd with ddb enabled.
    >
    > For me i386 bsd.rd hangs with a chance of more then 50% when booting
    > through bios.  In that case the vcpu thread goes up to 100% CPU
    > usage.  Breaking into ddb by NMI injection I see traces like this:
    >
    > ...
    > scsibus2 at softraid0: 256 targets
    > root on rd0a swap on rd0b dump on rd0b
    > WARNING: CHECK AND RESET THE DATE!
    > NMI ... going to debugger
    > Stopped at      schedclock+0x3e:        leal    0(%eax,%ecx,2),%ecx
    > ddb> bt
    > schedclock(d72574f0) at schedclock+0x3e
    > statclock(d0acd498,e4abdd98,0) at statclock+0x213
    > clockintr_dispatch(e4abdd98) at clockintr_dispatch+0x1d2
    > clockintr(e4abdd98) at clockintr+0x3c
    > intr_handler(e4abdd98,d727bd40) at intr_handler+0x13
    > Xintr_legacy0_untramp() at Xintr_legacy0_untramp+0xfb
    > bzero() at bzero+0x23
    > pmap_zero_page(d5247088) at pmap_zero_page+0xf
    > uvm_pagezero_thread(d72574f0) at uvm_pagezero_thread+0x92
    > ddb>
    > schedclock(d72574f0) at schedclock+0x3e
    > ddb> c
    > NMI ... going to debugger
    > Stopped at      statclock+0x1f4:        cmpl    %esi,%ebx
    > ddb> bt
    > statclock(d0acd498,e4abdd98,0) at statclock+0x1f4
    > clockintr_dispatch(e4abdd98) at clockintr_dispatch+0x1d2
    > clockintr(e4abdd98) at clockintr+0x3c
    > intr_handler(e4abdd98,d727bd40) at intr_handler+0x13
    > Xintr_legacy0_untramp() at Xintr_legacy0_untramp+0xfb
    > bzero() at bzero+0x23
    > pmap_zero_page(d5247088) at pmap_zero_page+0xf
    > uvm_pagezero_thread(d72574f0) at uvm_pagezero_thread+0x92
    > ddb> c
    > NMI ... going to debugger
    > Stopped at      statclock+0x21b:        addl    $0x1,%eax
    > ddb> bt
    > statclock(d0acd498,e4abdd98,0) at statclock+0x21b
    > clockintr_dispatch(e4abdd98) at clockintr_dispatch+0x1d2
    > clockintr(e4abdd98) at clockintr+0x3c
    > intr_handler(e4abdd98,d727bd40) at intr_handler+0x13
    > Xintr_legacy0_untramp() at Xintr_legacy0_untramp+0xfb
    > bzero() at bzero+0x23
    > pmap_zero_page(d5247088) at pmap_zero_page+0xf
    > uvm_pagezero_thread(d72574f0) at uvm_pagezero_thread+0x92
    > ddb> c
    > NMI ... going to debugger
    > Stopped at      statclock+0x1fa:        jnb     statclock+0x225
    > ddb> bt
    > statclock(d0acd498,e4abdd98,0) at statclock+0x1fa
    > clockintr_dispatch(e4abdd98) at clockintr_dispatch+0x1d2
    > clockintr(e4abdd98) at clockintr+0x3c
    > intr_handler(e4abdd98,d727bd40) at intr_handler+0x13
    > Xintr_legacy0_untramp() at Xintr_legacy0_untramp+0xfb
    > bzero() at bzero+0x23
    > pmap_zero_page(d5247088) at pmap_zero_page+0xf
    > uvm_pagezero_thread(d72574f0) at uvm_pagezero_thread+0x92
    > ddb> c boot reboot
    > rebooting...
    >
    > So bsd.rd is busy handling clock interrupts.  However, I've verified
    > that vmd has only injecting a single clock interrupt when the hang
    > is kicked off.
    >
    > With a "if count > 100 printf" in clockintr_hardclock() I see this:
    >
    > clockintr_hardclock: count 359957
    > clockintr_hardclock: count 32604107809
    >
    > So it's not flodding the guest with interrupts.  It seems to be
    > nsec_advance() (used to calculate count) coming up with an high
    > value for "elapsed".  Causing not an infinite but very long loop,
    > repeatedly (first 359957, then 32604107809 iterations).
    >
    > On i386 bsd.rd have kern.timecounter.hardware=i8254.  On i386 GENERIC
    > we have kern.timecounter.hardware=pvclock.
    >
    > bluhm@ came up with this experiment:  On i386 GENERIC switching
    > kern.timecounter.hardware from pvclock to i8254 is likely to cause
    > an instant hang.  In that case ddb shows a trace similar to the
    > hang on i386 bsd.rd:
    >
    > <root@i386:1># sysctl kern.timecounter.hardware
    > kern.timecounter.hardware=pvclock0
    > <root@i386:2># sysctl kern.timecounter.hardware=i8254
    > kern.timecounter.hardware: pvclock0 ->clockintr_hardclock: cound 359956
    > clockintr_hardclock: cound 9992852590
    > NMI ... going to debugger
    > Stopped at      i8254_get_timecount+0xd:        movzbl  %al,%edx
    > ddb>
    > ddb> bt
    > i8254_get_timecount(d0f57528) at i8254_get_timecount+0xd
    > nanouptime(f6608668) at nanouptime+0x23
    > timeout_hardclock_update() at timeout_hardclock_update+0x1f9
    > clockintr_hardclock(d0f5349c,f6608714,0) at clockintr_hardclock+0xea
    > clockintr_dispatch(f6608714) at clockintr_dispatch+0x1ed
    > clockintr(f6608714) at clockintr+0x3f
    > intr_handler(f6608714,d78e5dc0) at intr_handler+0x18
    > Xintr_legacy0_untramp() at Xintr_legacy0_untramp+0xfb
    > cpu_idle_cycle() at cpu_idle_cycle+0xf
    > ddb> c
    > NMI ... going to debugger
    > Stopped at      i8254_get_timecount+0xb:        inb     $0x40,%al
    > ddb> bt
    > i8254_get_timecount(d0f57528) at i8254_get_timecount+0xb
    > tc_windup(0,0,0) at tc_windup+0x74
    > tc_ticktock() at tc_ticktock+0x3c
    > hardclock(f6608714) at hardclock+0x8
    > clockintr_hardclock(d0f5349c,f6608714,0) at clockintr_hardclock+0xea
    > clockintr_dispatch(f6608714) at clockintr_dispatch+0x1ed
    > clockintr(f6608714) at clockintr+0x3f
    > intr_handler(f6608714,d78e5dc0) at intr_handler+0x18
    > Xintr_legacy0_untramp() at Xintr_legacy0_untramp+0xfb
    > cpu_idle_cycle() at cpu_idle_cycle+0xf
    > ddb> c
    > NMI ... going to debugger
    > Stopped at      i8254_get_timecount+0xb:        inb     $0x40,%al
    > ddb> reboot
    > rebooting...
    >
    > And the printf indicates we are looping due to high counter values.
    >
    > Doing similar experiments with qemu and kvm we don't see hangs. So
    > this might be related to vmd i8254 emulation?
    >
    > Take care,
    > Hans-Joerg
    >
    
    maybe
    
    the problem with i8253 has always been its periodicity; I think the counter
    wraps in like 35ms or something as it's only 16 bit and has a fixed frequency.
    
    we can handle one wrap (eg, "count went negative") but not two.
    
    last time I looked at this, there was a timeout problem here due to HZ=100 on
    the host causing up to 20ms delay between timeouts firing (this was years ago
    though, and I know there was some work done since then).
    
    while the problem may not be 100% related to HZ, various things in the scheduler
    have changed over the years (and changes elsewhere that lead to different
    run profiles now) that maybe we are victims of excessive latency on the i8253
    latching and missing two wraps.
    
    > For me i386 bsd.rd hangs with a chance of more then 50% when booting
    > through bios.
    
    here is an experiment you can do:
    
    rebuild the host kernel with HZ=1000 and see if running that makes the problem
    go away. if that's the case then the only real option to "fix" this is to move
    i8253 into vmm(4) so we don't have to make multiple round trips to vmd(8) ; one
    set of round trips to write the counter selector, then another to latch the
    counter value (and then a couple more to read the values but since it's latched
    at that point, those last ones aren't sensitive to the latency).
    
    N.B. - rebuilding with new HZ I think means a make clean then full build.
    
    -ml
    
    
  • hshoexer:

    Sysupgrade on i386 vmm to current snapshot hangs on boot into upgrade kernel