From: Ingo Schwarze
Subject: Re: watch(1): fix UTF-8
To: Job Snijders
Cc: tech@openbsd.org
Date: Wed, 21 May 2025 14:49:35 +0200

Hi Job,

Job Snijders wrote on Wed, May 21, 2025 at 12:19:58PM +0000:

> Florian noticed that this results in art which does not spark joy.
>
> $ ftp https://www.cl.cam.ac.uk/~mgk25/ucs/examples/UTF-8-demo.txt
> $ watch cat UTF-8-demo.txt
>
> I took inspiration from tmux/tmux.c.

I strongly object for several reasons.

First, "en_US.UTF-8" is always valid on OpenBSD, so the first
setlocale(3) call always succeeds, so the "if" block is never entered,
so the program always runs with a UTF-8 locale, even when the user
explicitly requested otherwise.  That is not acceptable.

Secondly, this is overengineering.  We do not put large amounts of
complicated code into OpenBSD for the sake of portability.  If a
portable version of an OpenBSD program is created, which I do not
expect to happen for watch(1), the portable version is the place to
add portability goo as needed.  Locale names are not portable (by
POSIX definition), but that does not excuse extensive portability
gymnastics inside OpenBSD.

I did not yet investigate what exactly in watch(1) causes the garbled
output of

  $ env LC_CTYPE=C.UTF-8 watch cat UTF-8-demo.txt

but this clearly needs a simpler solution.
Yours,
  Ingo


> Index: watch.c
> ===================================================================
> RCS file: /cvs/src/usr.bin/watch/watch.c,v
> diff -u -p -r1.23 watch.c
> --- watch.c	21 May 2025 08:32:10 -0000	1.23
> +++ watch.c	21 May 2025 12:15:54 -0000
> @@ -26,6 +26,7 @@
>  #include
>  #include
>  #include
> +#include
>  #include
>  #include
>  #include
> @@ -124,7 +125,17 @@ main(int argc, char *argv[])
>  	struct event ev_sigint, ev_sighup, ev_sigterm, ev_sigwinch, ev_stdin;
>  	size_t len, rem;
>  	int i, ch;
> +	const char *s;
>  	char *p;
> +
> +	if (setlocale(LC_CTYPE, "en_US.UTF-8") == NULL &&
> +	    setlocale(LC_CTYPE, "C.UTF-8") == NULL) {
> +		if (setlocale(LC_CTYPE, "") == NULL)
> +			errx(1, "invalid LC_ALL, LC_CTYPE or LANG");
> +		s = nl_langinfo(CODESET);
> +		if (strcasecmp(s, "UTF-8") != 0 && strcasecmp(s, "UTF8") != 0)
> +			errx(1, "need UTF-8 locale (LC_CTYPE) but have %s", s);
> +	}
>
>  	while ((ch = getopt(argc, argv, "cls:wx")) != -1)
>  		switch (ch) {