From: "Lorenz (xha)"
Subject: arm64 without swap
To: tech@openbsd.org
Date: Sun, 3 Nov 2024 11:02:23 +0100

hi,

arm64 is currently broken without swap: processes are "randomly" getting killed. this is very easy to reproduce by booting without swap. daemons such as smtpd or ntpd, or the kernel/lib relink, will crash.

this is caused by one of the following switch cases in do_el0_sync() (arm64/trap.c), which is called by handle_el0_sync:

	switch (exception) {
	...
	case EXCP_INSN_ABORT_L:
		udata_abort(frame, esr, far, 1);
		break;
	...
	case EXCP_DATA_ABORT_L:
		udata_abort(frame, esr, far, 0);
		break;

these call udata_abort(), which, among other things, calls uvm_fault():

	/* Handle referenced/modified emulation */
	if (pmap_fault_fixup(map->pmap, va, access_type))
		return;

	error = uvm_fault(map, va, 0, access_type);

	if (error == 0) {
		uvm_grow(p, va);
		return;
	}

	if (error == ENOMEM) {
		sig = SIGKILL;
		code = 0;
	...
	sv.sival_ptr = (void *)far;
	trapsignal(p, sig, esr, code, sv);

uvm_fault() fails with ENOMEM, and udata_abort() then sends SIGKILL to the process.

in uvm_fault(), only two functions can return ENOMEM: uvm_fault_lower() and uvm_fault_upper(). both have branches that, when they *think* they are out of memory, check uvm_swapisfull(). if it returns true, they return ENOMEM; otherwise they wait for more RAM. of course, uvm_swapisfull() always returns true if there is no swap!

i have observed a few interesting things when building the kernel on arm64 without swap. first, make without any "-j" flag works; no processes get killed. second, with "make -j2" or "make -j4", processes get killed, usually cc. however, after repeating the build three or four times, it completes without any process getting killed.

i have tested this issue on a "Raspberry Pi 4 Model B" and a Firefly iCore-3588Q ("MNT RK3588 Processor Module").
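a rough userspace model of the decision chain above (out-of-memory branch, uvm_swapisfull(), ENOMEM, SIGKILL) may make the failure mode easier to see. it is a sketch, not kernel code: struct swapstate, enum fault_outcome and the model_* functions are hypothetical names, the 99% threshold is my reading of uvm_swapisfull() in uvm_swap.c (worth double-checking against the tree), and the "wait for more RAM" path is reduced to a single enum value instead of sleeping and restarting the fault.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * hypothetical userspace model; none of these names exist in the
 * kernel. it only mirrors the control flow described above.
 */
struct swapstate {
	long swpages;	/* total swap pages configured (0 = no swap) */
	long swpgonly;	/* swap pages whose only copy is in swap */
};

enum fault_outcome {
	FAULT_WAIT_AND_RETRY,	/* sleep for free RAM, then restart the fault */
	FAULT_ENOMEM		/* give up: uvm_fault() returns ENOMEM */
};

/*
 * assumed to match uvm_swapisfull(): swap counts as "full" once
 * swpgonly reaches 99% of swpages. with no swap, 0 >= 0 is true.
 */
bool
model_swapisfull(const struct swapstate *s)
{
	assert(s->swpgonly <= s->swpages);
	return s->swpgonly >= s->swpages * 99 / 100;
}

/*
 * the out-of-memory branch in uvm_fault_upper()/uvm_fault_lower():
 * bail out with ENOMEM if swap is "full", otherwise wait for RAM.
 */
enum fault_outcome
model_fault_alloc_failed(const struct swapstate *s)
{
	if (model_swapisfull(s))
		return FAULT_ENOMEM;
	return FAULT_WAIT_AND_RETRY;
}

/* udata_abort(): an ENOMEM from uvm_fault() becomes SIGKILL. */
bool
model_udata_abort_kills(enum fault_outcome o)
{
	return o == FAULT_ENOMEM;
}
```

with swpages == 0, swpgonly is also 0, so the comparison 0 >= 0 holds and every allocation failure in those branches turns into ENOMEM and then SIGKILL, whereas a machine with free swap would sleep and retry.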
i want to debug this further, since i have to investigate another, probably related, pmap/uvm issue anyway. however, before that i wanted to ask whether someone has dug into this issue or has any clues about what could be going on.

thanks!