watch(1): fix UTF-8
Hi Job,
Job Snijders wrote on Wed, May 21, 2025 at 12:19:58PM +0000:
> Florian noticed that this results in art which does not spark joy.
>
> $ ftp https://www.cl.cam.ac.uk/~mgk25/ucs/examples/UTF-8-demo.txt
> $ watch cat UTF-8-demo.txt
>
> I took inspiration from tmux/tmux.c.
I strongly object for several reasons.
First, "en_US.UTF-8" is always valid on OpenBSD. The first
setlocale(3) call therefore always succeeds, the "if" block is
never entered, and the program always runs with a UTF-8 locale,
even when the user explicitly requested otherwise.
That is not acceptable.
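To make the first point concrete, here is a small sketch that
reproduces the locale logic of the patch, with errx(3) reduced to
return values so the effect can be inspected. The function names
are hypothetical and do not come from watch.c. With LC_ALL=C set,
the user's choice only survives on a system where neither
"en_US.UTF-8" nor "C.UTF-8" is accepted.

```c
#define _POSIX_C_SOURCE 200809L

#include <locale.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical reproduction of the patch's locale logic, error
 * handling reduced to a return value.  Returns 1 if the user's
 * environment (LC_ALL, LC_CTYPE, LANG) was consulted and 0 if a
 * hardcoded locale name was used instead.
 */
static int
patched_locale_init(void)
{
	if (setlocale(LC_CTYPE, "en_US.UTF-8") == NULL &&
	    setlocale(LC_CTYPE, "C.UTF-8") == NULL) {
		/* Only reached if neither hardcoded name is accepted. */
		(void)setlocale(LC_CTYPE, "");
		return 1;
	}
	/* A succeeding first call short-circuits the "&&". */
	return 0;
}

/*
 * Hypothetical helper: simulate a user who explicitly sets
 * LC_ALL=C and report whether that choice is still in effect
 * afterwards (1 = honored, 0 = overridden, -1 = setenv failed).
 */
static int
user_c_locale_honored(void)
{
	const char *active;

	if (setenv("LC_ALL", "C", 1) == -1)
		return -1;
	(void)patched_locale_init();
	active = setlocale(LC_CTYPE, NULL);
	return active != NULL && strcmp(active, "C") == 0;
}
```

On OpenBSD, user_c_locale_honored() should return 0, because the
hardcoded "en_US.UTF-8" wins over the explicit LC_ALL=C.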
Second, this is overengineering. We do not put large amounts
of complicated code into OpenBSD for the sake of portability.
If a portable version of an OpenBSD program is created, which
I do not expect to happen for watch(1), the portable version
is the place to add portability goo as needed.
Locale names are not portable (POSIX leaves every name other
than "C" and "POSIX" implementation-defined), but that does not
excuse extensive portability gymnastics inside OpenBSD.
I have not yet investigated what exactly in watch(1) causes the
garbled output of
$ env LC_CTYPE=C.UTF-8 watch cat UTF-8-demo.txt
but this clearly needs a simpler solution.
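For comparison, a much simpler pattern is to call setlocale(3)
once with "" and let the user's environment decide. The sketch
below assumes watch(1) only needs to know whether multibyte output
is possible; ctype_init() is a hypothetical name, and the sketch
does not address whatever causes the C.UTF-8 garbling above.

```c
#define _POSIX_C_SOURCE 200809L

#include <locale.h>
#include <stdlib.h>

/*
 * Hypothetical sketch: honor LC_ALL, LC_CTYPE and LANG exactly as
 * the user set them, then report whether multibyte output can be
 * used.  MB_CUR_MAX is 1 in the "C" and "POSIX" locales.
 * Returns -1 if the environment names an unknown locale,
 * 1 for a multibyte (e.g. UTF-8) locale, and 0 otherwise.
 */
static int
ctype_init(void)
{
	if (setlocale(LC_CTYPE, "") == NULL)
		return -1;
	return MB_CUR_MAX > 1;
}
```

A caller would then decide how to render non-ASCII bytes based on
the return value instead of overriding the user's locale.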
Yours,
Ingo
> Index: watch.c
> ===================================================================
> RCS file: /cvs/src/usr.bin/watch/watch.c,v
> diff -u -p -r1.23 watch.c
> --- watch.c 21 May 2025 08:32:10 -0000 1.23
> +++ watch.c 21 May 2025 12:15:54 -0000
> @@ -26,6 +26,7 @@
> #include <err.h>
> #include <errno.h>
> #include <event.h>
> +#include <langinfo.h>
> #include <locale.h>
> #include <paths.h>
> #include <signal.h>
> @@ -124,7 +125,17 @@ main(int argc, char *argv[])
> struct event ev_sigint, ev_sighup, ev_sigterm, ev_sigwinch, ev_stdin;
> size_t len, rem;
> int i, ch;
> + const char *s;
> char *p;
> +
> + if (setlocale(LC_CTYPE, "en_US.UTF-8") == NULL &&
> + setlocale(LC_CTYPE, "C.UTF-8") == NULL) {
> + if (setlocale(LC_CTYPE, "") == NULL)
> + errx(1, "invalid LC_ALL, LC_CTYPE or LANG");
> + s = nl_langinfo(CODESET);
> + if (strcasecmp(s, "UTF-8") != 0 && strcasecmp(s, "UTF8") != 0)
> + errx(1, "need UTF-8 locale (LC_CTYPE) but have %s", s);
> + }
>
> while ((ch = getopt(argc, argv, "cls:wx")) != -1)
> switch (ch) {