
From: "Lorenz (xha)" <me@xha.li>
Subject: arm64 without swap
To: tech@openbsd.org
Date: Sun, 3 Nov 2024 11:02:23 +0100

hi,

arm64 is currently broken without swap: processes are "randomly"
getting killed.  this is very easy to reproduce by booting without
swap.  daemons such as smtpd or ntpd will crash, as will the
kernel/lib relink.

this is caused by one of the following switch cases in
do_el0_sync() (arm64/trap.c), which is called by "handle_el0_sync":

        switch (exception) {
        /* ... */
        case EXCP_INSN_ABORT_L:
                udata_abort(frame, esr, far, 1);
                break;
        /* ... */
        case EXCP_DATA_ABORT_L:
                udata_abort(frame, esr, far, 0);
                break;
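for readers less familiar with arm64 traps: the exception class this
switch dispatches on is taken from bits [31:26] of ESR_EL1.  below is a
small userspace sketch of that decoding, not kernel code.  0x20 and 0x24
are the Armv8-A encodings for instruction and data aborts taken from a
lower exception level; make_esr() and abort_exec_flag() are hypothetical
helpers, and the latter only mirrors the last argument passed to
udata_abort() in the two cases above.

```c
#include <assert.h>
#include <stdint.h>

/* Armv8-A exception class (EC) values for aborts taken from EL0. */
#define EXCP_INSN_ABORT_L	0x20	/* instruction abort, lower EL */
#define EXCP_DATA_ABORT_L	0x24	/* data abort, lower EL */

/* The EC field is held in ESR_EL1 bits [31:26]. */
static unsigned int
esr_exception_class(uint64_t esr)
{
	return (unsigned int)((esr >> 26) & 0x3f);
}

/* Hypothetical helper: build a synthetic ESR carrying only an EC. */
static uint64_t
make_esr(unsigned int ec)
{
	assert(ec <= 0x3f);	/* EC is a 6-bit field */
	return (uint64_t)ec << 26;
}

/*
 * Hypothetical model of the dispatch in do_el0_sync(): returns the last
 * argument that would be passed to udata_abort() (1 for an instruction
 * abort, 0 for a data abort), or -1 for any other exception class.
 */
static int
abort_exec_flag(uint64_t esr)
{
	switch (esr_exception_class(esr)) {
	case EXCP_INSN_ABORT_L:
		return 1;
	case EXCP_DATA_ABORT_L:
		return 0;
	default:
		return -1;
	}
}
```

either way, both cases end up in udata_abort(); only that flag differs.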

these call udata_abort(), which, among other things, calls uvm_fault():

        /* Handle referenced/modified emulation */
        if (pmap_fault_fixup(map->pmap, va, access_type))
                return;

        error = uvm_fault(map, va, 0, access_type);

        if (error == 0) {
                uvm_grow(p, va);
                return;
        }

        if (error == ENOMEM) {
                sig = SIGKILL;
                code = 0;
        /* ... */
        sv.sival_ptr = (void *)far;
        trapsignal(p, sig, esr, code, sv);

uvm_fault() fails with ENOMEM, and udata_abort() then sends SIGKILL to
the process.
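to make that control flow explicit, here is a hypothetical userspace
model of the udata_abort() excerpt above (not the real function): a bool
stands in for the result of pmap_fault_fixup() and an int for the error
returned by uvm_fault().  only the ENOMEM case is modeled from the
excerpt; the other error codes are handled in code elided above, so the
model just returns -1 for them.

```c
#define _POSIX_C_SOURCE 200809L

#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <stdbool.h>

/*
 * Hypothetical model of udata_abort(): returns the signal that would be
 * delivered, 0 if none, or -1 for error paths elided from the excerpt.
 */
static int
model_udata_abort(bool fixup_handled, int fault_error)
{
	/* errno-style results: 0 on success, a positive code on failure */
	assert(fault_error >= 0);

	/* referenced/modified emulation resolved the fault */
	if (fixup_handled)
		return 0;

	/* uvm_fault() succeeded: no signal */
	if (fault_error == 0)
		return 0;

	/* uvm_fault() reported ENOMEM: the process is killed outright */
	if (fault_error == ENOMEM)
		return SIGKILL;

	/* other error codes: handled by code not shown in the excerpt */
	return -1;
}
```

note that SIGKILL cannot be caught or ignored, so a daemon has no chance
to recover from this path.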

in uvm_fault(), only two functions can return ENOMEM:
uvm_fault_lower() and uvm_fault_upper().  both have branches that,
when they *think* they are out of memory, check uvm_swapisfull().  if
it returns true, they return ENOMEM; otherwise they wait for more RAM.
of course, uvm_swapisfull() always returns true if there is no swap!
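a minimal model of that decision, assuming uvm_swapisfull() compares
uvmexp.swpgonly against 99% of uvmexp.swpages (check this against your
tree).  struct swap_state, model_swapisfull(), model_fault_oom_branch()
and MODEL_WAIT are hypothetical names for illustration only.

```c
#define _POSIX_C_SOURCE 200809L

#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the uvmexp swap counters (in pages). */
struct swap_state {
	int swpages;	/* total swap pages configured */
	int swpgonly;	/* pages that live only in swap */
};

/*
 * Model of uvm_swapisfull(), assuming an "at least 99% used" test.
 * With no swap configured both counters are 0, and 0 >= 0 is true.
 */
static bool
model_swapisfull(const struct swap_state *s)
{
	assert(s->swpgonly <= s->swpages);	/* sanity check */
	return s->swpgonly >= s->swpages * 99 / 100;
}

/* Hypothetical marker for "sleep until memory is freed, then retry". */
#define MODEL_WAIT	(-1)

/*
 * Model of the out-of-memory branch in uvm_fault_upper() and
 * uvm_fault_lower(): give up with ENOMEM if swap is "full",
 * otherwise wait for free memory.
 */
static int
model_fault_oom_branch(const struct swap_state *s)
{
	if (model_swapisfull(s))
		return ENOMEM;
	return MODEL_WAIT;
}
```

with { 0, 0 } (no swap configured) the check reduces to 0 >= 0, so the
branch can never choose to wait and always returns ENOMEM, which
udata_abort() turns into SIGKILL.  with 1000 swap pages and only 10 in
use, the same branch waits instead.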

i have observed a few interesting things when building the kernel
on arm64 without swap.  first, make without any "-j" flag works; no
processes get killed.  second, with "make -j2" or "make -j4",
processes do get killed, usually cc.  however, after repeating the
build three or four times, it completes without any processes
getting killed.

i have tested this issue on a "Raspberry Pi 4 Model B" and a
Firefly iCore-3588Q ("MNT RK3588 Processor Module").

i want to debug this further, since i have to investigate another,
probably related, pmap/uvm issue anyway.  however, before that i
wanted to ask if anyone has dug into this issue or has any clues
about what could be going on.  thanks!