ksh vi mode: make 'D' work with UTF-8 input
Hello,
here is a simple patch to make the "delete to EOL" command (D)
work with UTF-8 characters in ksh(1) VI mode.
The problem is that after the deletion, the current implementation
backs up to the last *byte* remaining on the line, which may be a
UTF-8 continuation byte. When you then insert anything, it gets
inserted into the middle of the UTF-8 sequence, resulting in
invalid encoding.
For example,
1. Type two UTF-8 characters.
2. Type one ASCII character.
3. Press ESCAPE. The cursor now sits on the ASCII character.
4. Press D. The ASCII character is deleted into the yank buffer,
and the cursor now appears to be sitting on the second UTF-8
character, but it is actually sitting on its last byte.
5. Press P. The ASCII character from the yank buffer gets
inserted into the middle of the UTF-8 character,
resulting in a corrupted line similar to:
<UTF-8 char> <invalid byte> <ASCII char> <invalid byte>
With the patch below, we get this desired result instead:
<UTF-8 char> <ASCII char> <UTF-8 char>
because after the del_range(), we have es->cursor == es->linelen
(del_range() has the side effect of updating es->linelen)
and insert == 0 (because 'D' does not enter insert mode),
so after the end of the switch block, the code enters
the default backup code
	while (es->cursor > 0)
		if (!isu8cont(es->cbuf[--es->cursor]))
			break;
backing up the whole character and not just its last byte.
OK?
Ingo
Index: bin/ksh/vi.c
===================================================================
RCS file: /cvs/src/bin/ksh/vi.c,v
diff -u -p -r1.62 vi.c
--- bin/ksh/vi.c 25 Apr 2025 18:28:33 -0000 1.62
+++ bin/ksh/vi.c 25 Apr 2025 20:23:14 -0000
@@ -865,8 +865,6 @@ vi_cmd(int argcnt, const char *cmd)
case 'D':
yank_range(es->cursor, es->linelen);
del_range(es->cursor, es->linelen);
- if (es->cursor != 0)
- es->cursor--;
break;
case 'g':
Index: regress/bin/ksh/edit/vi.sh
===================================================================
RCS file: /cvs/src/regress/bin/ksh/edit/vi.sh,v
diff -u -p -r1.11 vi.sh
--- regress/bin/ksh/edit/vi.sh 25 Apr 2025 18:28:33 -0000 1.11
+++ regress/bin/ksh/edit/vi.sh 25 Apr 2025 20:23:14 -0000
@@ -72,6 +72,8 @@ testseq "one 2.0\0033BD" " # one 2.0\b\b
testseq "one ab.cd\0033bDa.\00332bD" \
" # one ab.cd\b\b \b\b\b..\b\b\b\b \b\b\b\b\b"
testseq "one two\0033bCrep" " # one two\b\b\b \b\b\brep"
+testseq "\0302\0251\0303\0200a\0033DP" \
+ " # \0302\0251\0303\0200a\b \b\ba\0303\0200\b\b"
# c: Change region.
testseq "one two\0033cbrep" " # one two\b\b\bo \b\b\bro\beo\bpo\b"