From: Florian Obser <florian@openbsd.org>
Subject: Re: watch(1): fix UTF-8
To: tech@openbsd.org
Date: Wed, 21 May 2025 14:35:56 +0200

curses(3) has this:

   Initialization
     The library uses the locale which the calling program has initialized.
     That is normally done with setlocale(3):

         setlocale(LC_ALL, "");

     If the locale is not initialized, the library assumes that characters are
     printable as in ISO-8859-1, to work with certain legacy programs.  You
     should initialize the locale and not rely on specific details of the
     library when the locale has not been setup.
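
A minimal sketch of that advice, for illustration only. The helper names
init_locale() and codeset_is_utf8() are hypothetical and do not exist in
watch.c; the codeset comparison simply mirrors the strcasecmp(3) checks in
the diff quoted below.

```c
#include <langinfo.h>
#include <locale.h>
#include <stddef.h>
#include <strings.h>

/*
 * Hypothetical helper: return 1 if the codeset name denotes UTF-8,
 * accepting both the "UTF-8" and "UTF8" spellings, case-insensitively.
 */
static int
codeset_is_utf8(const char *s)
{
	return (strcasecmp(s, "UTF-8") == 0 || strcasecmp(s, "UTF8") == 0);
}

/*
 * Hypothetical helper: adopt the locale from the environment (LC_ALL,
 * LC_CTYPE, LANG), as curses(3) suggests, and return the codeset name
 * in effect.  If setlocale(3) fails, the locale is left unchanged and
 * the codeset of the current locale is returned instead.
 */
static const char *
init_locale(void)
{
	(void)setlocale(LC_ALL, "");
	return (nl_langinfo(CODESET));
}
```

A caller would run init_locale() at the top of main(), before initscr(),
and could use codeset_is_utf8() on the result to decide how to proceed.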

On 2025-05-21 12:19 GMT, Job Snijders <job@openbsd.org> wrote:
> Florian noticed that this results in art which does not spark joy.
>
> 	$ ftp https://www.cl.cam.ac.uk/~mgk25/ucs/examples/UTF-8-demo.txt
> 	$ watch cat UTF-8-demo.txt
>
> I took inspiration from tmux/tmux.c.
>
> OK?
>
> Index: watch.c
> ===================================================================
> RCS file: /cvs/src/usr.bin/watch/watch.c,v
> diff -u -p -r1.23 watch.c
> --- watch.c	21 May 2025 08:32:10 -0000	1.23
> +++ watch.c	21 May 2025 12:15:54 -0000
> @@ -26,6 +26,7 @@
>  #include <err.h>
>  #include <errno.h>
>  #include <event.h>
> +#include <langinfo.h>
>  #include <locale.h>
>  #include <paths.h>
>  #include <signal.h>
> @@ -124,7 +125,17 @@ main(int argc, char *argv[])
>  	struct event ev_sigint, ev_sighup, ev_sigterm, ev_sigwinch, ev_stdin;
>  	size_t len, rem;
>  	int i, ch;
> +	const char *s;
>  	char *p;
> +
> +	if (setlocale(LC_CTYPE, "en_US.UTF-8") == NULL &&
> +	    setlocale(LC_CTYPE, "C.UTF-8") == NULL) {
> +		if (setlocale(LC_CTYPE, "") == NULL)
> +			errx(1, "invalid LC_ALL, LC_CTYPE or LANG");
> +		s = nl_langinfo(CODESET);
> +		if (strcasecmp(s, "UTF-8") != 0 && strcasecmp(s, "UTF8") != 0)
> +			errx(1, "need UTF-8 locale (LC_CTYPE) but have %s", s);
> +	}
>  
>  	while ((ch = getopt(argc, argv, "cls:wx")) != -1)
>  		switch (ch) {
>

-- 
In my defence, I have been left unsupervised.