From: Ingo Schwarze <schwarze@usta.de>
Subject: timing-dependent(?) display in xterm(1)
To: tech@openbsd.org
Date: Sat, 10 May 2025 16:19:06 +0200

Hello,

while working on VI command line editing mode in ksh(1), i stumbled
over the following, which i believe may be a quirk in xterm(1).
I'm not yet sure what is going on, hence the question mark after
the word "timing-dependent".

I'm running xterm(1) in the UTF-8 capable mode that is the default
on OpenBSD.  First observe that the command

   $ printf "\xc3\xa9\n"

causes xterm(1) to display an "e accent aigu" as expected.
This is not timing-dependent: the following two commands
display the same "e accent aigu" character just fine:

   $ printf "\xc3"; printf "\xa9\n"
   $ printf "\xc3"; sleep 1; printf "\xa9\n"

Now observe that the command

   $ printf "\x80\n"

displays a U+FFFD REPLACEMENT CHARACTER because a lonely UTF-8
continuation byte is not valid UTF-8.  So far, so good.

Now the strangeness begins.  The command

   $ printf "\x80x\n"

does *not* display the replacement character, but only the 'x' character,
which might be a bug (i'm not yet sure whether it is).
Finally, if i change the timing(?) as follows,

   $ printf "\x80"; printf "x\n"

i see both the replacement character and the 'x' as expected.

Does anybody have an idea what might be going on here?
Myself, i suspect this might be a bug in xterm(1) because
it seems to me what is displayed should only depend on the
sequence of bytes received, not on the times at which the
individual bytes arrive.
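
One way to rule out the sending side is to check that both variants
really emit the same bytes.  The following is only a sketch: it assumes
od(1) and tr(1) are available, it writes the octal escape \200 in place
of \x80 (an assumption made because not every printf supports \x), and
the function names are made up for illustration.

```shell
# Hypothetical helper: print stdin as one string of hex digits.
hexbytes() { od -An -v -tx1 | tr -d '[:space:]'; echo; }

# The two variants from above, with \200 standing in for \x80.
one_printf()  { printf '\200x\n'; }
two_printfs() { printf '\200'; printf 'x\n'; }

one_printf  | hexbytes
two_printfs | hexbytes
```

If both lines show the same digits, the byte sequence is identical
and only the grouping into separate writes can differ.  Note that a
pipe into od(1) cannot show that grouping itself.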

I found this quirk after finding a bug in ed_mov_opt() in vi.c
in our ksh(1) and writing a patch to fix that bug (i did not yet send
that patch out because it is not yet sufficiently tested).  While
testing the patch, i noticed that even though it produces a stream
of bytes that i deem correct, xterm(1) sometimes omits a UTF-8
replacement character even though it receives a lone UTF-8
continuation byte.  This problem
persists even when i insert an fflush(3) call after writing each byte
to disable stdio buffering in the shell.  The problem goes away when
i attach egdb(1) to the ksh(1) process and manually step through
the ksh(1) code.  I suspect stepping makes the problem go away
because it causes the bytes to arrive at xterm(1) one by one,
sufficiently far apart in time.
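
If the gap between bytes matters, it might be possible to narrow it
down from the shell without a debugger.  This is a rough sketch under
assumptions: split_send is a hypothetical helper, \200 again stands in
for \x80, and fractional arguments to sleep(1) are an extension that
POSIX does not guarantee.  It may well turn out that separate printf
calls always show the replacement character, whatever the delay.

```shell
# Hypothetical helper: emit the lone continuation byte, wait, then 'x'.
# $1 is the delay in seconds; fractional values are an assumption
# about sleep(1), not something POSIX guarantees.
split_send() {
	printf '\200'
	sleep "$1"
	printf 'x\n'
}

# Try progressively longer gaps and watch which lines show U+FFFD.
for delay in 0 0.01 0.1 1; do
	printf '%s: ' "$delay"
	split_send "$delay"
done
```

Run directly in the xterm(1) under test; each output line is labelled
with the delay that was used.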

I must say Heisenbugs are among my favourites: as soon as you try
to observe them in a debugger, they are no longer there.  =:c(

I'm not yet sure which is the best course of action.

 (a) assume xterm(1) is broken, shrug, and fix ksh(1) only for now
     arguing that the shell is more important than the terminal
     (and likely the code is much simpler in the shell than in
      the terminal, so progress will likely be faster)
 (b) or suspend work on ksh(1), debug xterm(1) first,
     fix that, then return to the shell once xterm(1) works
     (because one could maybe argue that a good shell buys us
      little without a working terminal)
 (c) or is there an even better option or explanation?

Yours,
  Ingo