timing-dependent(?) display in xterm(1)
Hello Walter,

Walter Alejandro Iglesias wrote on Sat, May 10, 2025 at 08:46:42PM +0200:
> On Sat, May 10, 2025 at 04:19:06PM +0200, Ingo Schwarze wrote:

>> while working on VI command line editing mode in ksh(1), i stumbled
>> over the following, which i believe might possibly be a quirk
>> in xterm(1).  I'm not yet sure what is going on, hence the question
>> mark after the word "timing-dependent".
>> [...]

> I've been reading this:
> https://invisible-island.net/xterm/bad-utf8/

Thank you, that is interesting.

It does not talk about my question, though (xterm(1) output being
inconsistent with itself).  To the contrary, that document appears
to implicitly support my assumption that every terminal should
represent every sequence of input bytes in some well-defined way,
even though many different ways to handle sequences that are invalid
in UTF-8 exist.  But the implicit assumption seems to be that every
terminal should pick one way, not change its ways depending on the
weather.

> And downloaded this file:
> https://invisible-island.net/xterm/bad-utf8/UTF-8-test-20150828.txt

Right, that is somewhat similar to the mandoc UTF-8 test suite -
except that it's not a test suite.

> In many cases, under xterm, UTF-8 continuation bytes followed by an
> ASCII character do weird things.  The most weird case is:
>
>   $ printf "\x9ax\n"

   $ printf "\x9ax\n"
  x
  ^[[?1;2c
   $ 1;2c

WOW.  That is horrific.

Note the "1;2c" ends up in the editable area for the next command
line, so if i press ENTER again, i get

  ksh: 1: not found
  ksh: 2c: not found
   $

That's not only utterly broken but maybe even a security risk,
in particular considering that the printed digit-punctuation-salad
looks suspiciously similar to ANSI escape sequences - and then we
have this in /usr/src/gnu/usr.bin/perl/lib/unicore/UnicodeData.txt:

  009A;<control>;Cc;0;BN;;;;;N;SINGLE CHARACTER INTRODUCER;;;;
  009B;<control>;Cc;0;BN;;;;;N;CONTROL SEQUENCE INTRODUCER;;;;

So i suspect that xterm(1) incorrectly reinterprets the invalid
byte 0x9a as U+009A (which would be c2 9a in UTF-8, not a lone 9a)
and then proceeds to use that incorrectly translated character as
a C1 control character.

That is all the more outrageous because IIRC, xterm(1) promises to
never interpret C1 controls in UTF-8 mode.  I believe i tested that
this was actually the case some years ago.  It appears this got
broken, and in our default configuration, xterm(1) now not only
interprets C1 sequences from UTF-8 - which is a very unsafe practice
in itself - but even *generates* such sequences from invalid input
in invalid ways.

What this boils down to is that at least one security feature
i believed we had in our xterm(1) is no longer effective.

At least that answers my question: re-securing xterm(1) is clearly
more important than improving UTF-8 support in ksh(1).

I cannot say yet how this happened, but *if* this was caused by an
update to xterm(1) from upstream, then that means trusting upstream
with xterm(1) is not a good idea.

Yours,
  Ingo
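
The byte-level distinction drawn above (a lone 9a versus the sequence
c2 9a) can be checked with a minimal Python sketch.  The helper name
classify() is hypothetical, and the Latin-1 decode at the end merely
imitates the reinterpretation suspected above; it makes no claim about
how xterm(1) is actually implemented.

```python
import unicodedata


def classify(data: bytes) -> str:
    """Return the code points of data if it is valid UTF-8 (hypothetical helper)."""
    try:
        text = data.decode("utf-8")  # strict decoding rejects malformed input
    except UnicodeDecodeError:
        return "invalid UTF-8"
    return " ".join(f"U+{ord(ch):04X}" for ch in text)


LONE = b"\x9a"         # a bare continuation byte, as sent by printf "\x9ax\n"
ENCODED = b"\xc2\x9a"  # the proper UTF-8 encoding of U+009A

print(classify(LONE))     # the lone byte is not valid UTF-8
print(classify(ENCODED))  # the two-byte sequence decodes to U+009A

# One well-defined way to handle the invalid byte: substitute U+FFFD.
replaced = (LONE + b"x").decode("utf-8", errors="replace")
print([f"U+{ord(ch):04X}" for ch in replaced])

# Assumption for illustration only: treating the byte as Latin-1 turns it
# into U+009A, which Unicode classifies as a control character (Cc).
mistaken = LONE.decode("latin-1")
print(f"U+{ord(mistaken):04X}", unicodedata.category(mistaken))
```

A strict decoder never yields U+009A from the lone byte; that code
point only appears when the byte is treated as if it were Latin-1.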