Download raw body.
Sysupgrade on i386 vmm to current snapshot hangs on boot into upgrade kernel
Hi, I've done some debugging on this: I've hacked vmd/vmm to inject a NMI into the guest. On OpenBSD we drop into ddb when receiving a NMI. I've built a i386 bsd.rd with ddb enabled. For me i386 bsd.rd hangs with a chance of more then 50% when booting through bios. In that case the vcpu thread goes up to 100% CPU usage. Breaking into ddb by NMI injection I see traces like this: ... scsibus2 at softraid0: 256 targets root on rd0a swap on rd0b dump on rd0b WARNING: CHECK AND RESET THE DATE! NMI ... going to debugger Stopped at schedclock+0x3e: leal 0(%eax,%ecx,2),%ecx ddb> bt schedclock(d72574f0) at schedclock+0x3e statclock(d0acd498,e4abdd98,0) at statclock+0x213 clockintr_dispatch(e4abdd98) at clockintr_dispatch+0x1d2 clockintr(e4abdd98) at clockintr+0x3c intr_handler(e4abdd98,d727bd40) at intr_handler+0x13 Xintr_legacy0_untramp() at Xintr_legacy0_untramp+0xfb bzero() at bzero+0x23 pmap_zero_page(d5247088) at pmap_zero_page+0xf uvm_pagezero_thread(d72574f0) at uvm_pagezero_thread+0x92 ddb> schedclock(d72574f0) at schedclock+0x3e ddb> c NMI ... going to debugger Stopped at statclock+0x1f4: cmpl %esi,%ebx ddb> bt statclock(d0acd498,e4abdd98,0) at statclock+0x1f4 clockintr_dispatch(e4abdd98) at clockintr_dispatch+0x1d2 clockintr(e4abdd98) at clockintr+0x3c intr_handler(e4abdd98,d727bd40) at intr_handler+0x13 Xintr_legacy0_untramp() at Xintr_legacy0_untramp+0xfb bzero() at bzero+0x23 pmap_zero_page(d5247088) at pmap_zero_page+0xf uvm_pagezero_thread(d72574f0) at uvm_pagezero_thread+0x92 ddb> c NMI ... going to debugger Stopped at statclock+0x21b: addl $0x1,%eax ddb> bt statclock(d0acd498,e4abdd98,0) at statclock+0x21b clockintr_dispatch(e4abdd98) at clockintr_dispatch+0x1d2 clockintr(e4abdd98) at clockintr+0x3c intr_handler(e4abdd98,d727bd40) at intr_handler+0x13 Xintr_legacy0_untramp() at Xintr_legacy0_untramp+0xfb bzero() at bzero+0x23 pmap_zero_page(d5247088) at pmap_zero_page+0xf uvm_pagezero_thread(d72574f0) at uvm_pagezero_thread+0x92 ddb> c NMI ... going to debugger Stopped at statclock+0x1fa: jnb statclock+0x225 ddb> bt statclock(d0acd498,e4abdd98,0) at statclock+0x1fa clockintr_dispatch(e4abdd98) at clockintr_dispatch+0x1d2 clockintr(e4abdd98) at clockintr+0x3c intr_handler(e4abdd98,d727bd40) at intr_handler+0x13 Xintr_legacy0_untramp() at Xintr_legacy0_untramp+0xfb bzero() at bzero+0x23 pmap_zero_page(d5247088) at pmap_zero_page+0xf uvm_pagezero_thread(d72574f0) at uvm_pagezero_thread+0x92 ddb> c boot reboot rebooting... So bsd.rd is busy handling clock interrupts. However, I've verified that vmd has only injecting a single clock interrupt when the hang is kicked off. With a "if count > 100 printf" in clockintr_hardclock() I see this: clockintr_hardclock: count 359957 clockintr_hardclock: count 32604107809 So it's not flodding the guest with interrupts. It seems to be nsec_advance() (used to calculate count) coming up with an high value for "elapsed". Causing not an infinite but very long loop, repeatedly (first 359957, then 32604107809 iterations). On i386 bsd.rd have kern.timecounter.hardware=i8254. On i386 GENERIC we have kern.timecounter.hardware=pvclock. bluhm@ came up with this experiment: On i386 GENERIC switching kern.timecounter.hardware from pvclock to i8254 is likely to cause an instant hang. In that case ddb shows a trace similar to the hang on i386 bsd.rd: <root@i386:1># sysctl kern.timecounter.hardware kern.timecounter.hardware=pvclock0 <root@i386:2># sysctl kern.timecounter.hardware=i8254 kern.timecounter.hardware: pvclock0 ->clockintr_hardclock: cound 359956 clockintr_hardclock: cound 9992852590 NMI ... going to debugger Stopped at i8254_get_timecount+0xd: movzbl %al,%edx ddb> ddb> bt i8254_get_timecount(d0f57528) at i8254_get_timecount+0xd nanouptime(f6608668) at nanouptime+0x23 timeout_hardclock_update() at timeout_hardclock_update+0x1f9 clockintr_hardclock(d0f5349c,f6608714,0) at clockintr_hardclock+0xea clockintr_dispatch(f6608714) at clockintr_dispatch+0x1ed clockintr(f6608714) at clockintr+0x3f intr_handler(f6608714,d78e5dc0) at intr_handler+0x18 Xintr_legacy0_untramp() at Xintr_legacy0_untramp+0xfb cpu_idle_cycle() at cpu_idle_cycle+0xf ddb> c NMI ... going to debugger Stopped at i8254_get_timecount+0xb: inb $0x40,%al ddb> bt i8254_get_timecount(d0f57528) at i8254_get_timecount+0xb tc_windup(0,0,0) at tc_windup+0x74 tc_ticktock() at tc_ticktock+0x3c hardclock(f6608714) at hardclock+0x8 clockintr_hardclock(d0f5349c,f6608714,0) at clockintr_hardclock+0xea clockintr_dispatch(f6608714) at clockintr_dispatch+0x1ed clockintr(f6608714) at clockintr+0x3f intr_handler(f6608714,d78e5dc0) at intr_handler+0x18 Xintr_legacy0_untramp() at Xintr_legacy0_untramp+0xfb cpu_idle_cycle() at cpu_idle_cycle+0xf ddb> c NMI ... going to debugger Stopped at i8254_get_timecount+0xb: inb $0x40,%al ddb> reboot rebooting... And the printf indicates we are looping due to high counter values. Doing similar experiments with qemu and kvm we don't see hangs. So this might be related to vmd i8254 emulation? Take care, Hans-Joerg
Sysupgrade on i386 vmm to current snapshot hangs on boot into upgrade kernel