From: Ingo Schwarze
Subject: Re: [REPOST] ksh: utf8 full width character support for emacs.c
To: enh@google.com
Cc: Gong Zhile, tech@openbsd.org
Date: Fri, 21 Mar 2025 11:21:28 +0100

Hi,

enh wrote on Mon, Mar 17, 2025 at 10:09:10AM -0400:

> any reason you don't just let libc's wcwidth() handle tracking
> unicode?

See

  https://www.openbsd.org/papers/eurobsdcon2016-utf8.pdf

for how UTF-8 support is currently designed in the OpenBSD base
system, including in OpenBSD ksh(1). In particular, see page 19
for which techniques described in that talk are used in ksh(1)
and page 23 regarding additional details about ksh(1).

Notably, ksh(1) by design uses neither wchar_t nor an ad-hoc
replacement type for wchar_t but intentionally works purely
on the "char" level.

> i've certainly made changes to bionic's (icu-based) wcwidth()
> in the last year --- this stuff's tricky enough that you probably
> don't want to duplicate it unless you absolutely have to.

I strongly agree that reimplementing wcwidth(3) inside ksh(1) would
be unacceptable. If we come to the conclusion that ksh(1) needs
wcwidth(3), there is no alternative to using the wcwidth(3) from
libc because that version is very carefully maintained by afresh1@
via Perl and duplicating that work is out of the question.

So far, double-width character support was intentionally not
attempted in ksh(1) because it's quite a tricky problem and the
tradeoff between the potential benefit and complicating the code
is not immediately clear.

Maybe the decision to not support double-width characters can be
revisited. But that at the very least requires a complete
understanding of the presentation cited above, and care has to be
taken to keep changes minimal and as easy to maintain as possible.
The patches floating here on tech@ are clearly much too large and
much too dirty.

*Maybe* it is possible to use the combined technique 4+5 mentioned
on page 29 of the slides in one strategic place (certainly not all
over the place, though) in order to *locally* get the wchar_t values
needed for wcwidth(3). I'm far from sure, though, whether that is
actually feasible, and if so, how exactly.

Yours,
  Ingo
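
For illustration only, and not taken from any patch posted on tech@:
a minimal sketch of what *locally* getting wchar_t values for
wcwidth(3) might look like, assuming plain mbtowc(3) is used for the
decoding. Whether this matches techniques 4+5 from the slides is not
established here. The helper name display_width() and the one-column
fallback for invalid bytes and non-printable characters are
assumptions made for this example.

```c
#define _XOPEN_SOURCE 700	/* some libcs need this for wcwidth(3) */
#include <stdlib.h>
#include <wchar.h>

/*
 * Hypothetical helper: count the display columns of a char buffer.
 * The buffer stays a plain char array; a wchar_t exists only as a
 * local variable inside this one function.
 */
int
display_width(const char *s, size_t len)
{
	wchar_t wc;
	int n, w, cols = 0;

	(void)mbtowc(NULL, NULL, 0);		/* reset conversion state */
	while (len > 0) {
		n = mbtowc(&wc, s, len);
		if (n == 0)
			break;			/* NUL character: stop */
		if (n == -1) {
			/* Assumption: an invalid byte takes one column. */
			(void)mbtowc(NULL, NULL, 0);
			n = 1;
			w = 1;
		} else if ((w = wcwidth(wc)) == -1) {
			/* Assumption: non-printable takes one column. */
			w = 1;
		}
		cols += w;
		s += n;
		len -= n;
	}
	return cols;
}
```

A caller would first have to select a UTF-8 LC_CTYPE via setlocale(3);
in the default "C" locale, only ASCII input is decoded as characters
here and the width is simply the number of bytes processed.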