
From:
Mike Larkin <mlarkin@nested.page>
Subject:
Re: Sysupgrade on i386 vmm to current snapshot hangs on boot into upgrade kernel
To:
Brent Cook <busterb@gmail.com>
Cc:
Dave Voutila <dv@sisu.io>, Alexander Bluhm <bluhm@openbsd.org>, tech@openbsd.org
Date:
Mon, 6 Apr 2026 08:22:52 -0700


Thread
  • Alexander Bluhm:

    Sysupgrade on i386 vmm to current snapshot hangs on boot into upgrade kernel

  • hshoexer:

    Sysupgrade on i386 vmm to current snapshot hangs on boot into upgrade kernel

  • On Mon, Apr 06, 2026 at 08:46:02AM -0500, Brent Cook wrote:
    > On Mon, Apr 6, 2026 at 7:06 AM Dave Voutila <dv@sisu.io> wrote:
    >
    > > Brent Cook <busterb@gmail.com> writes:
    > >
    > > > On Sun, Apr 5, 2026 at 2:36 PM Mike Larkin <mlarkin@nested.page> wrote:
    > > >
    > > >  On Sun, Apr 05, 2026 at 01:08:51PM -0500, Brent Cook wrote:
    > > >  > On Sun, Apr 05, 2026 at 04:58:11AM -0500, Brent Cook wrote:
    > > >  > > On Tue, Mar 31, 2026 at 1:35 PM Alexander Bluhm <bluhm@openbsd.org>
    > > wrote:
    > > >  > >
    > > >  > > > On Tue, Mar 31, 2026 at 09:30:30AM -0700, Mike Larkin wrote:
    > > >  > > > > rebuild the host kernel with HZ=1000 and see if running that
    > > >  > > > > makes the problem go away.
    > > >  > > >
    > > >  > > > I have option HZ=1000 in host's sys/conf/GENERIC now.  It does
    > > >  > > > not help.
    > > >  > > > Guest's GENERIC is unmodified.
    > > >  > > >
    > > >  > > > When the i386 guest boots, its GENERIC either says
    > > >  > > > cpu0: AMD EPYC 73F3 16-Core Processor ("AuthenticAMD" 686-class, 512KB L2 cache) 3.50 GHz, 19-01-01
    > > >  > > > or
    > > >  > > > cpu0: AMD EPYC 73F3 16-Core Processor ("AuthenticAMD" 686-class, 512KB L2 cache) 74 MHz, 19-01-01
    > > >  > > >
    > > >  > > >
    > > >  > > Same thing here, but this is a clue as to why RAMDISK_CD hangs but
    > > >  > > GENERIC does not, at least for me.
    > > >  > >
    > > >  > > Adding pvclock0 allows RAMDISK_CD to boot 100% of the time in my
    > > >  > > local vmm machine.
    > > >  > >
    > > >  > > diff --git a/sys/arch/i386/conf/RAMDISK_CD b/sys/arch/i386/conf/RAMDISK_CD
    > > >  > > index c00f6dcbbb4..24bd4e465a9 100644
    > > >  > > --- a/sys/arch/i386/conf/RAMDISK_CD
    > > >  > > +++ b/sys/arch/i386/conf/RAMDISK_CD
    > > >  > > @@ -28,6 +28,7 @@ config                bsd root on rd0a swap on rd0b and wd0b and sd0b
    > > >  > >  mainbus0       at root
    > > >  > >
    > > >  > >  pvbus0         at mainbus0     # Paravirtual device bus
    > > >  > > +pvclock0       at pvbus?       # KVM/VMD paravirtual clock
    > > >  > >
    > > >  > >  acpi0          at bios?
    > > >  > >  #acpitimer*    at acpi?
    > > >  > >
    > > >  > >
    > > >  >
    > > >  > Ugh, that last paste was ugly, sorry. Switching mail clients:
    > > >  >
    > > >  > Would anyone be able to give this diff a try? It's working locally; I
    > > >  > was able to reliably get the i8254 to calibrate to 4GHz, and see no
    > > >  > hangs either in RAMDISK_CD or when switching sysctl
    > > >  > kern.timecounter.hardware=i8254. Now my i386 VMM machine can
    > > >  > sysupgrade reliably.
    > > >  >
    > > >  > The main change here is to set the timestamp right in the vcpu thread
    > > >  > instead of later in the event loop in i8253_reset. Otherwise, a latch
    > > >  > command issued before i8253_reset runs will be using a new start
    > > >  > value, but an old stale ts value. Maybe this makes the clock_gettime
    > > >  > in i8253_reset unnecessary, but I left it in.
    > > >  >
    > > >  > This also adjusts a couple of the register states to match that of
    > > >  > Bhyve when the mode is updated or there is a latch command:
    > > >  >
    > > >  > https://github.com/freebsd/freebsd-src/blob/main/sys/amd64/vmm/io/vatpit.c#L239
    > > >  >
    > > >  > https://github.com/freebsd/freebsd-src/blob/main/sys/amd64/vmm/io/vatpit.c#L330
    > > >  >
    > > >  > Hopefully this makes sense, or if it's the wrong approach, at least
    > > >  > inspires the correct fix.
    > > >  >
    > > >
    > > >  this makes sense to me; I think the key fix here is the resetting of
    > > >  last_r to 2, indicating that 2 bytes must be read. this makes sense
    > > >  after a latch and makes more sense WRT the 8253 state machine.
    > > >
    > > >  The other bits may or may not be needed but they seem correct also
    > > >  and/or harmless, so I'd say leave them in for consistency.
    > > >
    > > >  I don't have any machines that exhibit the issue so as long as you get
    > > >  this tested on some various guest OSes, ok mlarkin. (eg test linux,
    > > >  openbsd amd64 and openbsd i386, make sure nothing gets broken)
    > > >
    > > >  -ml
    > > >
    > > > Thanks, so far everything is looking good on Debian i386/amd64, Alpine
    > > > x86_64, and the OpenBSDs.
    > > >
    > > > Alpine x86 32-bit is dying when booting the installer with what looks
    > > > like a memory probe that isn't handled yet by vmm.
    > > >
    > > > vmx_handle_np_fault: unknown memory type 2 for GPA 0xc4e89a0c
    > > >
    > > >  But I think that is a pre-existing issue.
    > >
    > > Yeah that looks like the result of unemulated mmio and you can disregard
    > > that for moving forward with this diff. Can you share the Alpine version
    > > and if it's iso vs. disk image?
    > >
    >
    > It was the ISO, started like so:
    >
    > doas vmctl start -m 1G -L -i 1 -c -r alpine-standard-3.23.3-x86.iso -d alpine-x86.qcow alpine-x86
    >
    > Giving it 4G fills in the memory space though, and allows it to boot
    > successfully.
    >
    >
    > >
    > > Comment below though...
    > >
    > > >  > diff --git a/usr.sbin/vmd/i8253.c b/usr.sbin/vmd/i8253.c
    > > >  > index 9e32d9382b6..091716a2f32 100644
    > > >  > --- a/usr.sbin/vmd/i8253.c
    > > >  > +++ b/usr.sbin/vmd/i8253.c
    > > >  > @@ -262,12 +262,14 @@ vcpu_exit_i8253(struct vm_run_params *vrp)
    > > >  >                                           ticks % i8253_channel[sel].start;
    > > >  >                               } else
    > > >  >                                       i8253_channel[sel].olatch = 0;
    > > >  > +                             i8253_channel[sel].last_r = 2;
    > >
    > > I'm not convinced the above line is correct given how last_r and last_w
    > > are working in i8253.c...but they themselves may be incorrect. Both
    > > fields are used only to index the last byte of the 2-byte register being
    > > read or written and values other than 0 and 1 are never tested.
    > >
    > > Apparently we're initializing last_r to 1. Checks are only done for if
    > > last_r == 0 or non-zero.
    > >
    > > Given that, I think this _should_ be set to 1, not 2.
    > >
    > > Honestly this could just be a big mistake in the emulation...but
    > > semantically 2 doesn't fit the rest of the code.
    > >
    >
    > Yeah, got it - it should be a 1 here. I got things confused when comparing
    > different i8253 emulations.
    >
    > vmm is using it as a binary state value, while bhyve uses 'olbyte' as a
    > kind of tri-state, where when it gets to 0, the output latch follows the
    > counter instead of shifting bytes from the last latched value:
    >
    > https://github.com/freebsd/freebsd-src/blob/main/sys/amd64/vmm/io/vatpit.c#L396
    >
    >
    
    I think the key here is "not 0". If 1 works, then we should use that to be
    consistent.
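
To make the "not 0" point concrete, here is a minimal sketch of a two-byte latch read driven by a flag that is only ever compared against zero, as described earlier in this thread. It is not the code in usr.sbin/vmd/i8253.c: `struct pit_chan`, `pit_latch`, `pit_read_byte` and `pit_read16` are hypothetical names, and the byte order (low byte first when the flag is non-zero) is an assumption made for the illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical channel state for illustration; not vmd's struct. */
struct pit_chan {
	uint16_t olatch;	/* value captured by the last latch command */
	uint8_t  last_r;	/* 0: low byte was read last, high byte is next;
				 * non-zero: low byte is next */
};

/* Capture a counter value. Deliberately leaves last_r alone. */
static void
pit_latch(struct pit_chan *c, uint16_t count)
{
	assert(c != NULL);
	c->olatch = count;
}

/* Return the next byte of the latched value and flip the flag. */
static uint8_t
pit_read_byte(struct pit_chan *c)
{
	assert(c != NULL);
	if (c->last_r == 0) {
		c->last_r = 1;
		return (uint8_t)(c->olatch >> 8);
	}
	c->last_r = 0;
	return (uint8_t)(c->olatch & 0xff);
}

/* Read two bytes the way a guest would and reassemble them low-first. */
static uint16_t
pit_read16(struct pit_chan *c)
{
	uint16_t lo = pit_read_byte(c);
	uint16_t hi = pit_read_byte(c);

	return (uint16_t)(lo | (hi << 8));
}
```

In this model, a latch that arrives while last_r is 0 (a half-finished read) hands the guest the bytes in swapped order, whereas storing any non-zero value in last_r at latch time restarts the sequence at the low byte. Because the flag is only tested against zero, 1 and 2 behave identically here; choosing 1 just matches the value the flag already takes elsewhere.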
    
    > >
    > > >  >                               goto ret;
    > > >  >                       } else if (rw != TIMER_16BIT) {
    > > >  >                               log_warnx("%s: i8253 PIT: unsupported counter "
    > > >  >                                   "%d rw mode 0x%x selected", __func__,
    > > >  >                                   sel, (rw & TIMER_16BIT));
    > > >  >                       }
    > > >  > +                     i8253_channel[sel].last_w = 0;
    > > >  >                       i8253_channel[sel].mode = (out_data & 0xe) >> 1;
    > > >  >
    > > >  >                       goto ret;
    > > >  > @@ -293,6 +295,9 @@ vcpu_exit_i8253(struct vm_run_params *vrp)
    > > >  >                               if (i8253_channel[sel].start == 0)
    > > >  >                                       i8253_channel[sel].start = 0xffff;
    > > >  >
    > > >  > +                             clock_gettime(CLOCK_MONOTONIC,
    > > >  > +                                 &i8253_channel[sel].ts);
    > > >  > +
    > > >  >                               DPRINTF("%s: channel %d reset, mode=%d, "
    > > >  >                                   "start=%d\n", __func__,
    > > >  >                                   sel, i8253_channel[sel].mode,
    > >
    
    