From: Walter Alejandro Iglesias <wai@roquesor.com>
Subject: Re: mail(1) set Date and User-Agent [was: Re: Back to rfc2045]
To: Omar Polo <op@omarpolo.com>
Cc: tech@openbsd.org
Date: Thu, 1 Aug 2024 21:47:32 +0200

On Thu, Aug 01, 2024 at 08:00:57PM +0200, Omar Polo wrote:
> On 2024/08/01 10:41:40 +0200, Walter Alejandro Iglesias <wai@roquesor.com> wrote:
> > I fixed some mistakes and simplified a bit the code:
> > 
> >   https://en.roquesor.com/Downloads/mail_patches.tar.gz
> > 
> > I understand why my proposal doesn't attract too much attention, I can
> > count only one person here using mail(1) as a MUA.  Anyway, I'd like to
> > know if I'm doing something wrong.
> > 
> > At least I'd appreciate some guidance about the general procedure.
> > Regarding the size, for example, I understand conceptually the
> > convenience of splitting and committing a diff in small chunks to gain
> > control and avoid introducing regressions, but isn't it necessary to
> > test the full functional patches first, to evaluate what they do?
> 
> No, it is not.  For example, Date and User-agent are completely
> orthogonal to MIME support or UTF8.

Haven't I posted separates diffs, one including Date and User-agent and
another with the MIME headers?!!!

> So, while I'm happy to see this
> interest in improving mail(1),

OK.  So according to you, mine is just interest.  My work and my time
doesn't count at all.

> let's try to get it down to small pieces
> that can be reviewed and everyone agree on.

So far you aren't helping in what I asked: how to split the diffs or
reducing its size without breaking its functionality.  You are just
confusing things like you did at the time of my first proposal.  On that
occasion you told me off-list that you agreed with what I had done, here
is what you told me (name censored for privacy):

   "P.S.  I personally don't like the suggestion proposed by <.....>
   about having mail spawning another program to decide, seems too much
   complex for the task at hand."

And right after hat you posted on tech@ a diff of yours doing exactly
the opposite.


> Especially when some
> "controversial" or, rather, delicate topics like UTF-8 are involved.

Are we discussing politics, religion?  I pasted below the program[1] I
wrote to design the function to base64 encode words, it also includes
the function I wrote to wrap the headers.  Feel free anyone to review
the code, test it and then *technically* tell me what I did wrong or in
which way can be improved.


> The risk is to derail the thread on some details and so loosing all the
> other stuff.

This time there is no room to derails, contrary to what happened with my
first proposal, now I am quite clear about what I want to do, and the
work is done.  Again, my doubt was about how to split id.diff, which is
large, without breaking its functionality (at deraadt@ request.)


> 
> So, let's discuss the first (?) patch, the one to add the Date and
> User-Agent header.
> We'll do separate threads for the others when I'll
> get to review them.
> 
> Personally I don't feel a strong need for User-Agent,
> but given that
> even mblaze(1) sets it by default I think it's fine.  I'm a bit unsure
> about the format, your diff has "OpenBSD mail(1)" and the (...) are
> usually comments in mail headers, so I've changed it to just "OpenBSD
> mail".

I don't care about the User-Agent.  That's cosmetic.  I included it to
check if someone really tested my patches.

> 
> Then, we can't use strftime() since it depends on the locale.  (On
> OpenBSD this technically doesn't matter, but let's do the right thing.)
> So, I've stolen a function from OpenSMTPD.  (which opens the problem of
> the copyright and ISC vs BSD 3 clausole)

I'd also read the comment and the function in OpenSMTP code, I could've
stolen it as you did.  I used strftime() instead for two reasons, first
because of the advice (in my first proposal) of Stefan Sperling to use
libc, second to keep my diffs as small as possible at deraadt@ request.
Then knowing that under OpenBSD that problem with the locale doesn't
exist (OpenSMTP is used in operating systems supporting more locales)
and also having verified it after applying my patches, I decided to use
strftime().  So, sorry Omar, but you're not adding nothing new to what I
already did.

> 
> Anyway, here's the diff.  Except for the copyright "problem" from which
> I'd like to hear other opinions, I think this is fine and is ok op@ if
> someone wants to go ahead with this.

I don't see any improvement in your solution compared to mine.  But
since, despite me, you have credentials to commit your patch, do
whatever you want.  It's totally fine for me.

> 
> 
> Cheers,
> 
> Omar Polo
> 

(1) Here is the little program I mentioned above:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <err.h>
#include <errno.h>
#include <stdint.h>
#include <unistd.h>
#include <ctype.h>
#include <locale.h>

static int newline_at_end = 0;

void		encode_word(char **p);
static void	filecopy(FILE *, FILE *);
void		usage(void);
void		base64(char **p);
unsigned	b64len(unsigned len);
void		wrap_header(char **p, size_t name_len);

int
main(int argc, char *argv[])
{
	FILE *fp;
	int option;

	while ((option = getopt(argc, argv, "h")) != -1)
		switch (option) {
		case 'h':
			usage();
			break;
		default:
			usage();
		}

	argc -= optind;
	argv += optind;

	if (argc > 0)
		while (argc-- > 0)
			if ((fp = fopen(*argv++, "r")) == NULL)
				warn("%s", *(argv - 1));
			else {
				filecopy(fp, stdout);
				fclose(fp);
			}
	else
		filecopy(stdin, stdout);

	return errno;
}

void
filecopy(FILE * ifp, FILE * ofp)
{
	char *p = NULL;
	size_t size = 0;
	size_t i = 0;
	int c;

	while ((c = getc(ifp)) != EOF) {
		if (i == size) {
			p = realloc(p, size + 100);
			if (p == NULL)
				err(1, NULL);
			size += 100;
		}
		/* Strip control characters */
		if (!iscntrl(c) || c == '\t' || c == '\n')
			p[i++] = c;
	}

	if (i == size) {
		p = realloc(p, size + 1);
		if (p == NULL)
			err(1, NULL);
	}

	p[i] = '\0';

	encode_word(&p);
	wrap_header(&p, 0);
	printf("%s", p);

	free(p);
}

void
encode_word(char **p)
{
	char *s = NULL;
	char *r = NULL;
	char *word = NULL;
	size_t i = 0;
	size_t n = 0;
	size_t size = 0;
	size_t word_len = 0;
	char otag[] = "=?UTF-8?B?";
	char ctag[] = "?=";
	size_t tags_len = strlen(otag) + strlen(ctag);
	int utf8word = 0;

	setlocale(LC_CTYPE, "en_US.UTF-8");

	size = strlen(*p);
	s = malloc(size);
	if (s == NULL)
		err(1, NULL);

	r = *p;

	while (r[n] != '\0') {
		if (n == 0 || (n > 0 &&
		    (isspace(r[n - 1]) || r[n - 1] == '\n'))) {
			if (n == 0)
				while (isspace(r[n]))
					n++;
			word_len = strcspn(&r[n], "\t\n ");

			if (n != 0 && utf8word)
				word_len++;
			word = malloc(word_len + 1);
			if (word == NULL)
				err(1, NULL);

			if (n != 0 && utf8word) {
				snprintf(word, word_len + 1, "%s", &r[n - 1]);
				n--;
			} else
				snprintf(word, word_len + 1, "%s", &r[n]);

			if (word_len > mbstowcs(NULL, word, 0) ) {
				utf8word = 1;
				s = realloc(s, size + tags_len +
				    (b64len(word_len) - word_len));
				if (s == NULL)
					err(1, NULL);
				size += tags_len +
				    (b64len(word_len) - word_len);

				base64(&word);
				snprintf(&s[i], i + b64len(word_len) +
				    tags_len + 1, "%s%s%s", otag, word, ctag);
				i += b64len(word_len) + tags_len;

				n += word_len;
			} else {
				utf8word = 0;
				if (isspace(r[n]) || r[n] == '\n')
					n++;
			}
			free(word);
		}
		s[i++] = r[n++];
	}
	if (i == size) {
		s = realloc(s, size + 1);
		if (s == NULL)
			err(1, NULL);
	}
	s[i] = '\0';

	/* Remove this free() from mail patch */
	free(r);

	*p = s;
}

void
base64(char **p)
{
	size_t len, olen, i, j;
	uint32_t a, b, c, triple;
	char *enc = NULL;
	unsigned char *in = NULL;
	const char b64table[] =
	    "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
	int b64mod[] = { 0, 2, 1 };

	in = (unsigned char*)*p;

	i = 0;
	while (in[i] != '\0') {
		/* Replace control characters by spaces */
		if (iscntrl(in[i])) {
			/* ---- Remove from mail(1) patch ----- */
			if (in[i] == '\n' && in[i + 1] == '\0')
				newline_at_end = 1;
			/* ------------------------------------ */
			in[i] = ' ';
		}
		i++;
	}

	len = strlen(*p);

	enc = malloc(b64len(len));
	if (enc == NULL)
		err(1, NULL);
	olen = b64len(len);

	i = j = 0;
	while (in[i] != '\0') {
		a = i < len ? in[i++] : 0;
		b = i < len ? in[i++] : 0;
		c = i < len ? in[i++] : 0;

		triple = (a << 0x10) + (b << 0x08) + c;

		enc[j++] = b64table[(triple >> 3 * 6) & 0x3F];
		enc[j++] = b64table[(triple >> 2 * 6) & 0x3F];
		enc[j++] = b64table[(triple >> 1 * 6) & 0x3F];
		enc[j++] = b64table[(triple >> 0 * 6) & 0x3F];
	}

	i = 0;
	while (i < b64mod[len % 3])
		enc[olen - 1 - i++] = '=';

	if (olen == j) {
		enc = realloc(enc, olen + 1);
		if (enc == NULL)
			err(1, NULL);
	}
	enc[j] = '\0';

	*p = enc;

	/* Remove from mail(1) patch */
	free(in);
}

unsigned
b64len(unsigned len)
{
	return (len + 2) / 3 * 4;
}

void
wrap_header(char **p, size_t name_len)
{
	char *s = NULL;
	char *r = NULL;
	size_t i = 0;
	size_t n = 0;
	size_t size = 0;
	int col = name_len;	/* let room for header name */
	int wrap = 72;

	size = strlen(*p);
	s = malloc(size);
	if (s == NULL)
		err(1, NULL);

	r = *p;

	while (r[n] != '\0') {
		/* Replace newlines by spaces */
		if (r[n] == '\n' && r[n + 1] != '\0')
			r[n] = ' ';
		if (n != 0 && isspace(r[n]) &&
		    col + strcspn(&r[n + 1], "\t\n ") >= wrap) {
			s = realloc(s, size + 1);
			if (s == NULL)
				err(1, NULL);
			size++;
			s[i++] = '\n';
			while (isspace(r[n + 1]))
				n++;	
			col = 0;
			if (r[n] == '\t')
				col += 7;
		}
		s[i++] = r[n++];
		if (r[n] == '\t')
			col += 7;
		col++;
	}
	if (i == size) {
		s = realloc(s, size + 1);
		if (s == NULL)
			err(1, NULL);
	}
	s[i] = '\0';

	/* Remove this free from mail patch */
	free(r);

	*p = s;
}

void
usage(void)
{
	extern char *__progname;

	fprintf(stderr,
		"Usage: %s [-bhlnp] [-w width] [file ...]\n"
		"  -h   print this help\n",
		__progname);
	exit(1);
}

-- 
Walter