Mailing List Archive

From: Ingo Schwarze <schwarze@usta.de>
Subject: Re: [REPOST] ksh: utf8 full width character support for emacs.c
To: Gong Zhile <gongzl@stu.hebust.edu.cn>
Cc: tech@openbsd.org
Date: Fri, 21 Mar 2025 12:15:27 +0100


Thread

- 2025-03-20 04:37 Christian Schulte:
  [REPOST] ksh: utf8 full width character support for emacs.c
2025-03-21 08:53 Christian Schulte:
[REPOST] ksh: utf8 full width character support for emacs.c
2025-03-21 11:15 Ingo Schwarze:
[REPOST] ksh: utf8 full width character support for emacs.c
- 2025-03-21 22:38 Anthony J. Bentley:
  [REPOST] ksh: utf8 full width character support for emacs.c
- - 2025-03-22 14:03 Ingo Schwarze:
    [REPOST] ksh: utf8 full width character support for emacs.c

Hello,

Gong Zhile wrote on Wed, Mar 19, 2025 at 10:15:42AM +0800:

> There isn't any wchar_t involved in that patch. It took a UTF-8 rune
> (codepoint) from a cstring and process it. But, as enh has pointed out,
> refactoring it to elevate wcwidth(3) is surely a good idea.

I wouldn't say "surely", but i would say "more likely".

You have picked quite a tricky task here.  Hurdles include:
 * To use wcwidth(3), you need wchar_t values.
 * I see no reasonable way to get such values
   other than by using libc functions like mbtowc(3).
 * To use these functions, it is necessary to use setlocale(3)
   or functions like newlocale(3)/uselocale(3), which the shell
   does not use yet.
 * One needs to consider whether using ???locale(3) in the shell
   carries any risk, or needs to be restricted to certain areas of
   the code.  It's not yet clear to me that suddenly running all
   the code in the shell under a locale different from the C locale
   would have no detrimental consequences.
 * Security and reliability implications would have to be considered.
   For example, in some situations it might be a catastrophic
   vulnerability if the shell somehow ended up dying from EILSEQ.
 * Performance implications would have to be considered.
   For example, switching the locale back and forth with setlocale(3)
   multiple times can sometimes cause massive performance degradation,
   depending on the circumstances.  Even calling mbtowc(3) too often
   might have performance implications.
 * Consequences for the simplicity and maintainability of the
   code ought to be considered, and minimized.
 * Implications for the SMALL version of the shell need to be
   carefully considered.
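
To illustrate the first few hurdles, here is a minimal sketch of how
a column count could be obtained from mbtowc(3) and wcwidth(3).  It is
not taken from the patch or from ksh: the helper name display_width()
is hypothetical, and counting invalid or non-printable input as one
column per byte is only one possible policy, chosen here so that
EILSEQ never becomes fatal.

```c
#define _XOPEN_SOURCE 700	/* some systems need this for wcwidth(3) */
#include <locale.h>
#include <stdlib.h>
#include <string.h>
#include <wchar.h>

/*
 * Hypothetical helper, not part of ksh: return the number of
 * terminal columns needed to display the multibyte string s
 * in the current LC_CTYPE locale.  Invalid bytes and non-printable
 * characters are counted as one column each instead of being
 * treated as a fatal error.
 */
static int
display_width(const char *s)
{
	size_t left = strlen(s);
	int cols = 0;

	/* Reset the internal conversion state of mbtowc(3). */
	(void)mbtowc(NULL, NULL, 0);

	while (left > 0) {
		wchar_t wc;
		int len, w;

		len = mbtowc(&wc, s, left);
		if (len <= 0) {
			/* Invalid sequence (EILSEQ): skip one byte. */
			(void)mbtowc(NULL, NULL, 0);
			len = 1;
			w = 1;
		} else if ((w = wcwidth(wc)) < 0)
			w = 1;	/* non-printable character */
		s += len;
		left -= len;
		cols += w;
	}
	return cols;
}
```

For this to recognize UTF-8, the caller would have to select a
suitable locale first, for example with a single
setlocale(LC_CTYPE, "") at startup rather than switching back and
forth.  In the default C locale, what mbtowc(3) does with bytes above
0x7f varies between systems.  Whether either approach is acceptable
for the shell is exactly the open question raised in the list above.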

I'm not trying to discourage you, i'm merely pointing out that
this is absolutely not a beginner-level task.  I think honesty
requires being open about that non-obvious fact.

Yours,
  Ingo
