Mailing List Archive

Hello Henry, Henry Ford wrote on Wed, May 14, 2025 at 02:25:06PM -0400: > On Mon, May 12, 2025 at 09:40:35 AM -0400, Walter Alejandro Iglesias wrote: >> On Sat, May 10, 2025 at 11:12:49PM +0200, Ingo Schwarze wrote: >>> I cannot say yet how this happened, but *if* this was caused by >>> an update to xterm(1) from upstream, then that means trusting >>> upstream with xterm(1) is not a good idea. >> The bug was introduced (intentionally?) in xterm 391: >> >> https://invisible-island.net/xterm/xterm.log.html#xterm_391 >> >> I've downloaded, compiled and tested 390 and 391 versions. I could >> reproduce the bug with 391 version, xtrem-390 works as expected. Test >> with xterm-390: >> >> $ printf '\x9bc\n' >> �c > This behaviour is caused by the following code in xterm(1), which allows > bytes between 0x80 and 0xc0 if they are followed immediately by a byte > between 0x20 and 0x80 (and a UTF-8 sequence was not already started). > These bytes are then processed as normal, i.e if they are C1 control > characters they are acted on. > This only works if the next byte is immediately available to xterm(1), > it will not wait for input to become available. Thank you very much for digging into this. This saves me quite some time and effort in researching what is going on. Compare this to https://undeadly.org/cgi?action=article&sid=20160308204011 where i said, among other things, Actually, even when fed garbage or unsupported encodings, a UTF-8 xterm(1) is more robust than a US-ASCII xterm(1) because the UTF-8 xterm(1) honours *fewer* terminal escape codes than the US-ASCII xterm(1). That may seem surprising at first because Unicode defines *more* control characters than US-ASCII does. But as explained on http://invisible-island.net/xterm/ctlseqs/ctlseqs.html xterm(1) never treats decoded multibyte characters as terminal control codes, so the ISO 6429 C1 control codes do not take effect in UTF-8 mode; but they do take effect in US-ASCII mode, even though they fall outside the scope of ASCII. Consequently, in the interest of safe and sane defaults, i recently switched our xterm(1) to enable UTF-8 mode by default. I did that by adding this resource to /usr/X11R6/share/X11/app-defaults/XTerm: *locale: UTF-8 The main goal is improving robustness. But it also improves usability. So i think given your research results which i'm leaving in place below. we have positive proof that upstream *deliberately* and *systematically* ruined the security features of xterm(1), even those that have been - and still are - documented on their own website. Apparently, we have been caught unaware. Multiple patches of such large size that they were essentially unreviewable have been merged into our tree, with OKs from good but apparently unsuspecting developers. I'm also to blame myself because i did not review those patches and probably should have. I previously said "we cannot trust upstream" based on my own testing and code review results. In view of the additional, valuable results of your research, we can go one step further. I think the following stronger statement has been conclusively proven: We can trust upstream to consistently and systematically pursue development goals that are clearly opposed to the OpenBSD project goals. In particular, the following project goals stated on https://www.openbsd.org/goals.html are systematically being violated: 1. Transparency: "ability to look at CVS tree changes directly" 2. Robustness. 3. Security. Instead, upstream appears to focus on goals incompatible with OpenBSD project goals, in particular support for antiquated and obscure systems, excessive complication, and massive and quickly proliferating featuritis. Even though tedu@'s statement that Thomas Dickey values standards compliance may quite possibly be true (i did not check), i do not see how that could possibly provide any benefit in view of the above findings. Yours, Ingo > (from xenocara app/xterm/ptydata.c r1.161 line 98): > } else if (c < 0xc0) { > /* We received a continuation byte */ > if (utf_count < 1) { > if (screen->c1_printable) { > data->utf_data = (IChar) c; > } else if ((i + 1) < length > && data->next[i + 1] > 0x20 > && data->next[i + 1] < 0x80) { > /* > * Allow for C1 control string if the next byte is > * available for inspection. > */ > data->utf_data = (IChar) c; > } else { > /* > * We received a continuation byte before receiving a > * sequence state, or a failed attempt to use a C1 control > * string. > */ > data->utf_data = (IChar) UCS_REPL; > } > data->utf_size = (i + 1); > break; > > This was changed in xterm snapshot 390d, the diff for which is > available at: https://github.com/ThomasDickey/xterm-snapshots/commit/69daabcca67cfc021973ae910ec2020ef32d14cf > > The relevant portion of the diff is reproduced below. > If this diff is reverted inside the xterm directory of a xenocara > source tree the behaviour returns to way it was pre xterm-391. > > It seems like the intent here is to disambiguate invalid UTF-8 from > C1 control characters (and other data in the range 0x80-0xc0) by > checking if they are followed by a printable ascii byte. > Unfortunately this is implemented in such a way that the behaviour > depends on the timing with which the bytes are sent. > > --- ptydata.c > +++ ptydata.c > @@ -98,13 +98,24 @@ decodeUtf8(TScreen *screen, PtyData *data) > } else if (c < 0xc0) { > /* We received a continuation byte */ > if (utf_count < 1) { > - /* > - * We received a continuation byte before receiving a sequence > - * state. Or an attempt to use a C1 control string. Either > - * way, it is mapped to the replacement character, unless > - * allowed by optional feature. > - */ > - data->utf_data = (IChar) (screen->c1_printable ? c : UCS_REPL); > + if (screen->c1_printable) { > + data->utf_data = (IChar) c; > + } else if ((i + 1) < length > + && data->next[i + 1] > 0x20 > + && data->next[i + 1] < 0x80) { > + /* > + * Allow for C1 control string if the next byte is > + * available for inspection. > + */ > + data->utf_data = (IChar) c; > + } else { > + /* > + * We received a continuation byte before receiving a > + * sequence state, or a failed attempt to use a C1 control > + * string. > + */ > + data->utf_data = (IChar) UCS_REPL; > + } > data->utf_size = (i + 1); > break; > } else if (screen->utf8_weblike

2025-05-12 13:40 Walter Alejandro Iglesias:
timing-dependent(?) display in xterm(1)
- 2025-05-14 18:25 Henry Ford:
  timing-dependent(?) display in xterm(1)
- - 2025-05-14 19:33 Ingo Schwarze:
    timing-dependent(?) display in xterm(1)