From: Ingo Schwarze Subject: ksh(1) VI mode: fix rewindow() To: tech@openbsd.org Date: Mon, 26 May 2025 00:54:12 +0200 Hello, in ksh(1) VI editing mode, when you move the cursor to the left or right out of the editing window, the function rewindow() gets called. Its purpose is to calculate a new value for es->winleft, which is the index of the first byte of the editing buffer es->cbuf[] to be displayed. After rewindow() returns, display() redraws the line, starting at the just selected position. The algorithm rewindow() currently uses goes like this: 1. The overall goal is to choose es->winleft in such a way that the cursor position (corresponding to es->cursor) ends up approximately in the middle of the editing window. The task is non-trivial because byte positions in the buffer do not directly correspond to column positions. In particular, UTF-8 continuation bytes do not take up column positions, whereas tabs may take up more than one column, and so do ASCII control characters (and high bytes if the option vi-show8 is set). Given that some bytes take up more than one column, it is not always possible to get the cursor *exactly* to the middle of the window, but fortunately, *approximately* is good enough. 2. The function iterates the whole command line from the very beginning up to the cursor position - even if the cursor is far out to the right and the beginning is completely irrelevant. 3. During this iteration, it remembers the last two places where the column position is (approximately) an integer multiple of half the window width - even though we are not really interested in distances from the beginning, but rather in distances from the cursor. 4. Finally, it starts *another* iteration at the last but one multiple of winwidth/2 (which may be too far left) and goes to the right until it thinks that starting there might put the cursor in the middle. This algorithm has multiple defects. a) It is written in such a way that it starts the editing window with a UTF-8 continuation byte whenever it gets the chance, resulting in garbled output that is not valid UTF-8. b) Needlessly iterating the whole buffer is inefficient. c) Iterating over bytes again that we already inspected during the first iteration is inefficient. d) Even with the second iteration in step 4, the result is not guaranteed to be optimal because the width of a tab depends on the position of the editing window, which the second iteration in step 4 does not take into account. e) The code, even though relatively short, is hard to follow, and it is hard to see why it does what it does. Fixing everything, including completely fixing defect d), would probably require increasing the order of the algorithm. Currently, it is something like O(C + W) where C is the cursor position in bytes from the beginning of the line and W is half the width of the editing window. Reliably finding the optimal position in every case involving tabs and control characters would probably require an algorithm of the order O(W^2) because one would have to try several starting position and do an iteration calculating the cursor position for each of them, resulting in a quadratic algorithm. Well, picking a tricky search algorithm, one might get that down to O(W * log(W)) - but either way, an exact solution is likely not worth the significant complication of the code it would cause. What i propose is solving a), b), c), and e) by using an algorithm of the order O(W) - i.e. slightly faster - that is slightly longer, but much easier to understand. It doesn't become worse with respect to defect d) - it may be a handful of bytes off to the left or right in some cases involving literal tabs, but i consider that irrelevant for practical purposes. The key to the new algorithm is iterating *backwards* rather than forwards - which makes sense because we want to start at the cursor position and than step left until we have backed up by about half the window width. OK? Ingo Index: vi.c =================================================================== RCS file: /cvs/src/bin/ksh/vi.c,v diff -u -p -r1.66 vi.c --- vi.c 19 May 2025 14:27:38 -0000 1.66 +++ vi.c 25 May 2025 21:41:00 -0000 @@ -1789,25 +1789,61 @@ outofwin(void) static void rewindow(void) { - int tcur, tcol; - int holdcur1, holdcol1; - int holdcur2, holdcol2; - - holdcur1 = holdcur2 = tcur = 0; - holdcol1 = holdcol2 = tcol = 0; - while (tcur < es->cursor) { - if (tcol - holdcol2 > winwidth / 2) { - holdcur1 = holdcur2; - holdcol1 = holdcol2; - holdcur2 = tcur; - holdcol2 = tcol; + int cur; /* byte# in the main command line buffer */ + int col; /* corresponding display column */ + int tabc; /* columns a tab character can take up */ + int thisc; /* columns the current character requires */ + unsigned char uc; /* the current byte */ + + /* The desired cursor position is near the middle of the window. */ + cur = es->cursor; + col = winwidth / 2; + tabc = 0; + + /* Step left to find the desired left margin. */ + while (cur > 0 && col > 0) { + uc = es->cbuf[--cur]; + + /* Never start the window on a continuation byte. */ + if (isu8cont(uc)) + continue; + + if (uc == '\t') { + /* + * If two tabs occur close together, + * count the right one, including optional + * characters between both, as 8 columns. + */ + if (tabc > 0) { + col -= 8; + /* Prefer starting after a tab. */ + if (col <= 0) + cur++; + } + + /* + * A tab can be preceded by up to 7 characters + * without taking up additional space. + */ + tabc = 7; + continue; + } + thisc = char_len(uc); + if (tabc > 0) { + if (tabc > thisc) { + /* The character still fits in the tab. */ + tabc -= thisc; + continue; + } + col -= 8; /* The tab is now full. */ + thisc -= tabc; /* This may produce overflow. */ + tabc = 0; } - tcol = newcol((unsigned char) es->cbuf[tcur++], tcol); + + /* Handle a normal character or the overflow. */ + col -= thisc; } - while (tcol - holdcol1 > winwidth / 2) - holdcol1 = newcol((unsigned char) es->cbuf[holdcur1++], - holdcol1); - es->winleft = holdcur1; + es->winleft = cur; } /* Printing the byte ch at display column col moves to which column? */