From: ori@eigenstate.org
Subject: Re: [REPOST] ksh: utf8 full width character support for emacs.c
To: op@omarpolo.com, schwarze@usta.de
Cc: tech@openbsd.org
Date: Sun, 06 Apr 2025 22:48:12 -0400

Quoth Ingo Schwarze:
> Hello Omar,
>
> Omar Polo wrote on Sun, Mar 30, 2025 at 08:37:06PM +0200:
>
> > grapheme clusters (i.e. what a user perceives as a "character")
> > can be more than one code point long.
>
> That is (more or less) accurate - and ludicrously complicated.
> There is a long annex to the Unicode standard on this topic,
> Unicode Text Segmentation, Unicode Standard Annex #29
> https://www.unicode.org/reports/tr29/
>
> Note that "grapheme clusters" are not the same as "user-perceived
> characters", and there are several different types of grapheme clusters
> (legacy, extended, tailored, ...).
>

For extra trivia: there are some Indian languages (Kannada, IIRC, is one)
where a combining codepoint combines with multiple surrounding glyphs,
not just the one codepoint before it.

Rendering text is hard.