Download raw body.
ksh vi mode: prevent display corruption if line starts with utf8cont
ksh vi mode: prevent display corruption if line starts with utf8cont
Hi Ingo,
This reads fines and does solve the bug I showed to you. OK lucas.
On Fri, May 16, 2025 at 03:18:36PM +0200, Ingo Schwarze wrote:
> Hello,
>
> Ingo Schwarze wrote on Tue, Apr 29, 2025 at 09:14:30PM +0200:
>
> > unfortunately, ksh(1) VI mode contains multiple situations where the
> > cursor can run out of the buffer when UTF-8 continuation bytes appear
> > in places where they do not form valid UTF-8 sequences. In a few
> > cases, i have even seen the shell die from SIGSEGV as a result.
> >
> > Before i can really start to track down and fix these one by one,
> > or address the issue with the 'e' command that Walter reported,
> > the function display() needs to be fixed. Right now, invalid UTF-8
> > at the beginning of the command line can corrupt the display so badly
> > that working on other issues is seriously hindered.
>
> After partially exploring various rabbit holes that i will have to
> return to later, i finally committed my display() diff yesterday because
> i finally convinced myself that it indeed makes things better and that
> it is a necessary step, and i also had an OK from lucas@.
>
> I did not yet commit the regression test though because lucas@ pointed
> out that some similar sequences of input still misbehave (thanks for
> that feedback). It seems better to first get more aspects of
> continuation byte display fixed, then commit a set of related tests
> together.
>
> The next step is fixing the function ed_mov_opt(); specifically, the
> part of it that advances the cursor from the column with the number
> "cur_col" to the column with the number "col".
>
> The problem to solve is this: If the line starts with a UTF-8
> continuation byte, the code ignores it as if it were part of the last
> character of the prompt and puts the *next* character right after the
> prompt, omitting the UTF-8 continuation byte.
>
> The solution that seems easiest to understand restructures the loop in
> several respects:
> * Terminate the loop when we reach the target column (ci == col) *and*
> we do not need to finish a character we started - the latter can be
> the case because we are still at the left margin and never started
> any character in the first place (ci == pwidth) or because the
> current byte is not a continuation byte.
> * To simplify the condition controlling the actual printing, advance
> the column count up front, but without advancing the byte pointer
> yet, such that "ci" always contains the current column, no matter
> whether we are on a start or continuation byte.
> * UTF-8 continuation bytes advance the column count if and only if they
> are at the beginning of the line. So if after newcol() we are still
> at the left margin, advance.
> * Now we can simply print the byte if and only if it is to the right of
> what we printer earlier (ci > cur_col).
> * Only advance the byte pointer at the very end of the loop, when we
> really want to consider the next byte.
>
> Testing this requires doing rightward movements over various byte
> sequences. It helps to first type a sufficiently long string such that
> the '0' command moves left by doing a carriage return and printing a
> fresh prompt over the existing prompt. Then you can move right in small
> steps (e.g. with 'l') or large steps, including with '$'.
>
> Note that when the line starts with a UTF-8 continuation byte and
> the cursors sits at the beginnin of the line, the 'l' command
> arguably still doesn't do the right thing and advances one character
> too far - but that's a separate issue to be fixed later.
>
> I'm not planning to commit the two new tests with this patch because
> that's still not a complete suite for this kind of issues and there are
> still related unfixed issues that hinder regression testing, but i'm
> including them in this mail because they serve as a preliminary
> demonstration of the partial progress we are making.
>
> OK?
> Ingo
>
>
>
> Index: bin/ksh/vi.c
> ===================================================================
> RCS file: /cvs/src/bin/ksh/vi.c,v
> diff -u -p -r1.64 vi.c
> --- bin/ksh/vi.c 15 May 2025 17:07:13 -0000 1.64
> +++ bin/ksh/vi.c 16 May 2025 12:40:46 -0000
> @@ -2006,10 +2006,15 @@ ed_mov_opt(int col, char *wb)
>
> /* Advance the cursor. */
>
> - for (ci = pwidth; ci < col || isu8cont(*wb);
> - ci = newcol((unsigned char)*wb++, ci))
> - if (ci > cur_col || (ci == cur_col && !isu8cont(*wb)))
> + ci = pwidth;
> + while (ci < col || (ci > pwidth && isu8cont(*wb))) {
> + ci = newcol((unsigned char)*wb, ci);
> + if (ci == pwidth)
> + ci++;
> + if (ci > cur_col)
> x_putc(*wb);
> + wb++;
> + }
> cur_col = ci;
> }
>
> Index: regress/bin/ksh/edit/vi.sh
> ===================================================================
> RCS file: /cvs/src/regress/bin/ksh/edit/vi.sh,v
> diff -u -p -r1.12 vi.sh
> --- regress/bin/ksh/edit/vi.sh 27 Apr 2025 17:06:58 -0000 1.12
> +++ regress/bin/ksh/edit/vi.sh 16 May 2025 12:41:13 -0000
> @@ -103,6 +103,9 @@ testseq "hello\003302flD" " # hello\b\b\
> # i: Insert.
> testseq "hello\00332hix" " # hello\b\b\bxllo\b\b\b"
> testseq "hello\00332\b2ix\0033" " # hello\b\b\bxllo\b\b\bxllo\b\b\b\b"
> +testseq "\0270\0202A\00330i\0340\0033" " # \0270\0202A\b\b\0340\0270\0202A\b\b"
> +testseq "\0200abcdef\00330ix\0033" \
> + " # \0200abcdef\b\r # x\0200abcdef\r # x\0200\b"
>
> # I: Insert before first non-blank.
> # ^: Move to the first non-whitespace character.
>
ksh vi mode: prevent display corruption if line starts with utf8cont
ksh vi mode: prevent display corruption if line starts with utf8cont