exact floating point calculations in roff(7)
Hello,
yesterday, bluhm@ made me aware that on i386, the regression test
roff/nr/scale fails because it calculates
100M = 100 * 24/100 = 23 # sic, 23 instead of 24
when compiled in the default way, i.e. with -O2. The problem goes
away and yields the correct result 24 when compiling with -O0 instead.
On amd64, we get the correct result 24 with both -O2 and -O0.
In a private mail, bluhm@ told me (that's above my pay grade,
i'm merely quoting his words):
> i386 keeps doubles temporarily as 10 bytes within the FPU co-processor.
> amd64 uses SSE floating point instructions which operate on IEEE
> conforming 8 byte doubles. That would explain why rounding behaves
> differently on i386.
I tried to find a minimal reproducer of the issue but have not found
one so far.
In particular, even on i386 with -O2, the following code yields
exactly 24.0:
double x, y;
x = 100.0;
y = 24.0/100.0;
x *= y;
So apparently, the result depends on how exactly or in which context
this calculation is done - not sure about the details.
I'm asking for advice because i suspect this might have to do with
undefined behaviour (but i'm not sure, maybe results depending on
the -O level might possibly even indicate a compiler bug?), and also
because the code has the following very unusual combination of two
requirements:
1. On the one hand, the roff(7) language requires parsing of
arbitrary decimal fractions. For example,
0.0009765625f = 64
and this works as desired with both -O2 and -O0 even on i386.
2. On the other hand, for some specific inputs, the roff(7) language
requires the same calculations to yield exact results followed
by floor(3)-like rounding.
This combination of requirements seems very unusual to me.
From designing floating point calculations in scientific contexts,
for example analyzing experimental or generating simulation data
in high energy physics, i'm used to the requirement 1, but then
every floating point number always comes with an error or standard
deviation and testing for exact equality never makes any sense.
That scientific mode of operation implies best practices like
1.a) Do not (tiny + huge) - huge because that might result in zero
or at least lose precision in the mantissa. Instead,
do tiny + (huge - huge) which will correctly yield tiny.
1.b) Do not (huge * huge) / huge because that might overflow the
exponent. Instead, do huge * (huge / huge) which will
correctly yield huge.
None of these typical FP gotchas appear to be relevant here, though.
On the other hand, from mathematical contexts, i'm used to the concept
that when doing rational arithmetic (as opposed to real arithmetic),
requirement 2 can sometimes be met by finding the least common denominator
of all possible inputs up front, multiplying all numbers by that, and doing
all calculations using integer rather than floating point arithmetic,
which naturally yields exact results. Unfortunately, that is not an
option here because of requirement 1. When users can provide
arbitrary decimal fractions, there is no least common denominator
that could be used.
So i think we are stuck with having to do exact arithmetic in floating
point, which i never had to do before. How does one make that
well-defined, or at least as likely as possible to succeed?
I think the problem with 100 * 24/100 is that dividing by 100 divides,
among other factors, by the odd prime factor 5, which can result in an
infinite 2-adic expansion and thus ultimately in roundoff error.
The following patch employs the strategy of doing all multiplications
first and only dividing at the end, hoping to avoid intermediate results
with infinite 2-adic expansions by keeping as many prime factors
as possible in the integer part of the number before starting divisions.
Doing this feels very weird to a science-educated person like me
because it is kind of the opposite of best practice 1.b) above.
It also feels suspiciously like bogus original research - i mean,
even though the combination of requirements the roff(7) language
imposes is no doubt unusual, i can't really be the first person
encountering this kind of issue in practice?
Does the following patch make any sense to you?
It does fix the failing unit test on i386,
but what is the standard way to deal with this weird combination
of requirements, if there is any standard way?
Also note that the unit tests are *not* over-testing.
That 100M = 100 mini-em units = 1 em unit = 24 basic units
yields the correct result "24" and not the off-by-one "23"
is potentially relevant for practical work with roff(7) documents.
I also audited the rest of the mandoc(1) code for the use of the
double and float data types and did not find any other places where
the above issue might blow up, except in this one function
roff_getnum() in the file roff.c, which is used for parsing
various roff requests, most prominently assignments to number
registers (.nr), but also some others like conditionals, .it,
.ce, .rj, and .shift.
Puzzled,
Ingo
Index: roff.c
===================================================================
RCS file: /cvs/src/usr.bin/mandoc/roff.c,v
diff -u -p -r1.276 roff.c
--- roff.c 6 Jan 2025 18:48:13 -0000 1.276
+++ roff.c 2 Apr 2025 13:52:53 -0000
@@ -2468,7 +2468,8 @@ roff_getnum(const char *v, int *pos, int
myres *= 240.0;
break;
case 'c':
- myres *= 240.0 / 2.54;
+ myres *= 24000.0;
+ myres /= 254.0;
break;
case 'v':
case 'P':
@@ -2479,12 +2480,14 @@ roff_getnum(const char *v, int *pos, int
myres *= 24.0;
break;
case 'p':
- myres *= 40.0 / 12.0;
+ myres *= 40.0;
+ myres /= 12.0;
break;
case 'u':
break;
case 'M':
- myres *= 24.0 / 100.0;
+ myres *= 24.0;
+ myres /= 100.0;
break;
default:
break;