Download raw body.
ksh(1) VI mode: fix rewindow()
Hello,
in ksh(1) VI editing mode, when you move the cursor to the left or
right out of the editing window, the function rewindow() gets called.
Its purpose is to calculate a new value for es->winleft, which is
the index of the first byte of the editing buffer es->cbuf[] to be
displayed. After rewindow() returns, display() redraws the line,
starting at the just selected position.
The algorithm rewindow() currently uses goes like this:
1. The overall goal is to choose es->winleft in such a way that
the cursor position (corresponding to es->cursor) ends up
approximately in the middle of the editing window.
The task is non-trivial because byte positions in the
buffer do not directly correspond to column positions.
In particular, UTF-8 continuation bytes do not take up
column positions, whereas tabs may take up more than one
column, and so do ASCII control characters (and high bytes
if the option vi-show8 is set). Given that some bytes
take up more than one column, it is not always possible
to get the cursor *exactly* to the middle of the window,
but fortunately, *approximately* is good enough.
2. The function iterates the whole command line from the very
beginning up to the cursor position - even if the cursor is far
out to the right and the beginning is completely irrelevant.
3. During this iteration, it remembers the last two places
where the column position is (approximately) an integer
multiple of half the window width - even though we are not
really interested in distances from the beginning, but
rather in distances from the cursor.
4. Finally, it starts *another* iteration at the last but one
multiple of winwidth/2 (which may be too far left) and
goes to the right until it thinks that starting there
might put the cursor in the middle.
This algorithm has multiple defects.
a) It is written in such a way that it starts the editing window
with a UTF-8 continuation byte whenever it gets the chance,
resulting in garbled output that is not valid UTF-8.
b) Needlessly iterating the whole buffer is inefficient.
c) Iterating over bytes again that we already inspected during
the first iteration is inefficient.
d) Even with the second iteration in step 4, the result is not
guaranteed to be optimal because the width of a tab depends
on the position of the editing window, which the second
iteration in step 4 does not take into account.
e) The code, even though relatively short, is hard to follow,
and it is hard to see why it does what it does.
Fixing everything, including completely fixing defect d),
would probably require increasing the order of the algorithm.
Currently, it is something like O(C + W) where C is the cursor
position in bytes from the beginning of the line and W is half
the width of the editing window. Reliably finding the optimal
position in every case involving tabs and control characters
would probably require an algorithm of the order O(W^2) because
one would have to try several starting position and do an
iteration calculating the cursor position for each of them,
resulting in a quadratic algorithm. Well, picking a tricky
search algorithm, one might get that down to O(W * log(W)) -
but either way, an exact solution is likely not worth the
significant complication of the code it would cause.
What i propose is solving a), b), c), and e) by using an
algorithm of the order O(W) - i.e. slightly faster -
that is slightly longer, but much easier to understand.
It doesn't become worse with respect to defect d) -
it may be a handful of bytes off to the left or right in
some cases involving literal tabs, but i consider that
irrelevant for practical purposes.
The key to the new algorithm is iterating *backwards* rather than
forwards - which makes sense because we want to start at the
cursor position and than step left until we have backed up
by about half the window width.
OK?
Ingo
Index: vi.c
===================================================================
RCS file: /cvs/src/bin/ksh/vi.c,v
diff -u -p -r1.66 vi.c
--- vi.c 19 May 2025 14:27:38 -0000 1.66
+++ vi.c 25 May 2025 21:41:00 -0000
@@ -1789,25 +1789,61 @@ outofwin(void)
static void
rewindow(void)
{
- int tcur, tcol;
- int holdcur1, holdcol1;
- int holdcur2, holdcol2;
-
- holdcur1 = holdcur2 = tcur = 0;
- holdcol1 = holdcol2 = tcol = 0;
- while (tcur < es->cursor) {
- if (tcol - holdcol2 > winwidth / 2) {
- holdcur1 = holdcur2;
- holdcol1 = holdcol2;
- holdcur2 = tcur;
- holdcol2 = tcol;
+ int cur; /* byte# in the main command line buffer */
+ int col; /* corresponding display column */
+ int tabc; /* columns a tab character can take up */
+ int thisc; /* columns the current character requires */
+ unsigned char uc; /* the current byte */
+
+ /* The desired cursor position is near the middle of the window. */
+ cur = es->cursor;
+ col = winwidth / 2;
+ tabc = 0;
+
+ /* Step left to find the desired left margin. */
+ while (cur > 0 && col > 0) {
+ uc = es->cbuf[--cur];
+
+ /* Never start the window on a continuation byte. */
+ if (isu8cont(uc))
+ continue;
+
+ if (uc == '\t') {
+ /*
+ * If two tabs occur close together,
+ * count the right one, including optional
+ * characters between both, as 8 columns.
+ */
+ if (tabc > 0) {
+ col -= 8;
+ /* Prefer starting after a tab. */
+ if (col <= 0)
+ cur++;
+ }
+
+ /*
+ * A tab can be preceded by up to 7 characters
+ * without taking up additional space.
+ */
+ tabc = 7;
+ continue;
+ }
+ thisc = char_len(uc);
+ if (tabc > 0) {
+ if (tabc > thisc) {
+ /* The character still fits in the tab. */
+ tabc -= thisc;
+ continue;
+ }
+ col -= 8; /* The tab is now full. */
+ thisc -= tabc; /* This may produce overflow. */
+ tabc = 0;
}
- tcol = newcol((unsigned char) es->cbuf[tcur++], tcol);
+
+ /* Handle a normal character or the overflow. */
+ col -= thisc;
}
- while (tcol - holdcol1 > winwidth / 2)
- holdcol1 = newcol((unsigned char) es->cbuf[holdcur1++],
- holdcol1);
- es->winleft = holdcur1;
+ es->winleft = cur;
}
/* Printing the byte ch at display column col moves to which column? */
ksh(1) VI mode: fix rewindow()