From: Ingo Schwarze <schwarze@usta.de>
Subject: Re: [REPOST] ksh: utf8 full width character support for emacs.c
To: enh@google.com
Cc: Gong Zhile <gongzl@stu.hebust.edu.cn>, tech@openbsd.org
Date: Fri, 21 Mar 2025 11:21:28 +0100

Hi,

enh wrote on Mon, Mar 17, 2025 at 10:09:10AM -0400:

> any reason you don't just let libc's wcwidth() handle tracking
> unicode?

See https://www.openbsd.org/papers/eurobsdcon2016-utf8.pdf
for how UTF-8 support is currently designed in the OpenBSD base system,
including in OpenBSD ksh(1).

In particular, see page 19 for which techniques described in that talk
are used in ksh(1) and page 23 regarding additional details about ksh(1).

Most importantly, ksh(1) by design uses neither wchar_t nor an ad-hoc
replacement type for wchar_t, but intentionally works purely on the "char"
level.

> i've certainly made changes to bionic's (icu-based) wcwidth()
> in the last year --- this stuff's tricky enough that you probably
> don't want to duplicate it unless you absolutely have to.

I strongly agree that reimplementing wcwidth(3) inside ksh(1) would
be unacceptable.  If we come to the conclusion that ksh(1) needs wcwidth(3),
there is no alternative to using the wcwidth(3) from libc because
that version is very carefully maintained by afresh1@ via Perl and
duplicating that work is out of the question.

So far, double-width character support was intentionally not attempted
in ksh(1) because it's quite a tricky problem and the tradeoff between
the potential benefit and complicating the code is not immediately clear.

Maybe the decision to not support double-width characters can be
revisited.  But that at the very least requires a complete understanding
of the presentation cited above, and care has to be taken to keep changes
minimal and as easy to maintain as possible.  The patches floating here
on tech@ are clearly much too large and much too dirty.

*Maybe* it is possible to use the combined technique 4+5 mentioned
on page 29 of the slides in one strategic place (certainly not all
over the place, though) in order to *locally* get the wchar_t values
needed for wcwidth(3).  I'm far from sure though whether that is
actually feasible and how exactly.
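
To illustrate what "locally" getting a wchar_t value could mean, here is
a minimal sketch.  It is an assumption for illustration only: it is not
taken from the slides or from any patch on tech@, the helper name
char_cols() is hypothetical, and whether mbtowc(3) is the right decoder
for ksh(1) is exactly the open question.  The point is merely that the
wchar_t exists only inside one small function, while the caller keeps
working on a plain "char" buffer.

```c
#define _XOPEN_SOURCE 700	/* exposes wcwidth(3) on some systems */
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <wchar.h>

/*
 * Hypothetical helper, not part of ksh(1).
 * Decode the one character starting at s (at most len bytes),
 * store its display width in *width, and return the number of
 * bytes it occupies.  The wchar_t never leaves this function.
 */
static size_t
char_cols(const char *s, size_t len, int *width)
{
	wchar_t wc;
	int n, w;

	assert(s != NULL && width != NULL && len > 0);

	n = mbtowc(&wc, s, len);
	if (n <= 0) {
		/*
		 * Invalid sequence or NUL byte.  Assumption: reset the
		 * conversion state and count one byte as one column.
		 */
		(void)mbtowc(NULL, NULL, 0);
		*width = 1;
		return 1;
	}

	/* Assumption: treat non-printable characters as one column. */
	w = wcwidth(wc);
	*width = w < 0 ? 1 : w;
	return (size_t)n;
}
```

A caller would walk the char buffer by the returned byte count and add
up the widths.  For UTF-8 input this only gives useful results after
setlocale(LC_CTYPE, "") has selected a UTF-8 locale; in the default "C"
locale it still handles ASCII.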

Yours,
  Ingo