Index | Thread | Search

From:
Ingo Schwarze <schwarze@usta.de>
Subject:
Re: timing-dependent(?) display in xterm(1)
To:
Henry Ford <henryfordkjv@gmail.com>
Cc:
tech@openbsd.org
Date:
Wed, 14 May 2025 21:33:05 +0200

Download raw body.

Thread
Hello Henry,

Henry Ford wrote on Wed, May 14, 2025 at 02:25:06PM -0400:
> On Mon, May 12, 2025 at 09:40:35 AM -0400, Walter Alejandro Iglesias wrote:
>> On Sat, May 10, 2025 at 11:12:49PM +0200, Ingo Schwarze wrote:

>>> I cannot say yet how this happened, but *if* this was caused by
>>> an update to xterm(1) from upstream, then that means trusting
>>> upstream with xterm(1) is not a good idea.

>> The bug was introduced (intentionally?) in xterm 391:
>> 
>>   https://invisible-island.net/xterm/xterm.log.html#xterm_391
>> 
>> I've downloaded, compiled and tested 390 and 391 versions.  I could
>> reproduce the bug with 391 version, xtrem-390 works as expected.  Test
>> with xterm-390:
>> 
>>   $ printf '\x9bc\n'
>>   �c

> This behaviour is caused by the following code in xterm(1), which allows
> bytes between 0x80 and 0xc0 if they are followed immediately by a byte
> between 0x20 and 0x80 (and a UTF-8 sequence was not already started).
> These bytes are then processed as normal, i.e if they are C1 control
> characters they are acted on.
> This only works if the next byte is immediately available to xterm(1),
> it will not wait for input to become available.

Thank you very much for digging into this.  This saves me quite some
time and effort in researching what is going on.  Compare this to

  https://undeadly.org/cgi?action=article&sid=20160308204011

where i said, among other things,

  Actually, even when fed garbage or unsupported encodings, a UTF-8
  xterm(1) is more robust than a US-ASCII xterm(1) because the UTF-8
  xterm(1) honours *fewer* terminal escape codes than the US-ASCII
  xterm(1). That may seem surprising at first because Unicode defines
  *more* control characters than US-ASCII does. But as explained on
    http://invisible-island.net/xterm/ctlseqs/ctlseqs.html
  xterm(1) never treats decoded multibyte characters as terminal
  control codes, so the ISO 6429 C1 control codes do not take effect
  in UTF-8 mode; but they do take effect in US-ASCII mode, even though
  they fall outside the scope of ASCII.
  Consequently, in the interest of safe and sane defaults, i recently
  switched our xterm(1) to enable UTF-8 mode by default. I did that
  by adding this resource to /usr/X11R6/share/X11/app-defaults/XTerm:
    *locale: UTF-8
  The main goal is improving robustness. But it also improves usability.

So i think given your research results which i'm leaving in place below.
we have positive proof that upstream *deliberately* and *systematically*
ruined the security features of xterm(1), even those that have been -
and still are - documented on their own website.

Apparently, we have been caught unaware.  Multiple patches of such
large size that they were essentially unreviewable have been merged
into our tree, with OKs from good but apparently unsuspecting
developers.  I'm also to blame myself because i did not review
those patches and probably should have.

I previously said "we cannot trust upstream" based on my own testing
and code review results.  In view of the additional, valuable results
of your research, we can go one step further.  I think the following
stronger statement has been conclusively proven:

  We can trust upstream to consistently and systematically pursue
  development goals that are clearly opposed to the OpenBSD project
  goals.  In particular, the following project goals stated on
    https://www.openbsd.org/goals.html
  are systematically being violated:

   1. Transparency: "ability to look at CVS tree changes directly"
   2. Robustness.
   3. Security.

Instead, upstream appears to focus on goals incompatible with OpenBSD
project goals, in particular support for antiquated and obscure
systems, excessive complication, and massive and quickly proliferating
featuritis.

Even though tedu@'s statement that Thomas Dickey values standards
compliance may quite possibly be true (i did not check), i do not see how
that could possibly provide any benefit in view of the above findings.

Yours,
  Ingo


> (from xenocara app/xterm/ptydata.c r1.161 line 98):
> 	} else if (c < 0xc0) {
> 	    /* We received a continuation byte */
> 	    if (utf_count < 1) {
> 		if (screen->c1_printable) {
> 		    data->utf_data = (IChar) c;
> 		} else if ((i + 1) < length
> 			   && data->next[i + 1] > 0x20
> 			   && data->next[i + 1] < 0x80) {
> 		    /*
> 		     * Allow for C1 control string if the next byte is
> 		     * available for inspection.
> 		     */
> 		    data->utf_data = (IChar) c;
> 		} else {
> 		    /*
> 		     * We received a continuation byte before receiving a
> 		     * sequence state, or a failed attempt to use a C1 control
> 		     * string.
> 		     */
> 		    data->utf_data = (IChar) UCS_REPL;
> 		}
> 		data->utf_size = (i + 1);
> 		break;
>  
> This was changed in xterm snapshot 390d, the diff for which is
> available at: https://github.com/ThomasDickey/xterm-snapshots/commit/69daabcca67cfc021973ae910ec2020ef32d14cf
> 
> The relevant portion of the diff is reproduced below.
> If this diff is reverted inside the xterm directory of a xenocara
> source tree the behaviour returns to way it was pre xterm-391.
>  
> It seems like the intent here is to disambiguate invalid UTF-8 from
> C1 control characters (and other data in the range 0x80-0xc0) by
> checking if they are followed by a printable ascii byte.
> Unfortunately this is implemented in such a way that the behaviour
> depends on the timing with which the bytes are sent.
> 
> --- ptydata.c
> +++ ptydata.c
> @@ -98,13 +98,24 @@ decodeUtf8(TScreen *screen, PtyData *data)
>  	} else if (c < 0xc0) {
>  	    /* We received a continuation byte */
>  	    if (utf_count < 1) {
> -		/*
> -		 * We received a continuation byte before receiving a sequence
> -		 * state.  Or an attempt to use a C1 control string.  Either
> -		 * way, it is mapped to the replacement character, unless
> -		 * allowed by optional feature.
> -		 */
> -		data->utf_data = (IChar) (screen->c1_printable ? c : UCS_REPL);
> +		if (screen->c1_printable) {
> +		    data->utf_data = (IChar) c;
> +		} else if ((i + 1) < length
> +			   && data->next[i + 1] > 0x20
> +			   && data->next[i + 1] < 0x80) {
> +		    /*
> +		     * Allow for C1 control string if the next byte is
> +		     * available for inspection.
> +		     */
> +		    data->utf_data = (IChar) c;
> +		} else {
> +		    /*
> +		     * We received a continuation byte before receiving a
> +		     * sequence state, or a failed attempt to use a C1 control
> +		     * string.
> +		     */
> +		    data->utf_data = (IChar) UCS_REPL;
> +		}
>  		data->utf_size = (i + 1);
>  		break;
>  	    } else if (screen->utf8_weblike