IEEE arithmetic
This page presents some of the trade-offs in computing IEEE arithmetic for REAL_64 and REAL_32 as implemented in EiffelStudio, where NaN is not an unordered value but a value smaller than all other values (some other frameworks define it as the largest value instead). In other words:
- NaN = NaN yields True
- NaN < x for all x but NaN
To best show the trade-offs, we start with some benchmark results.
Benchmarks
The code
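The code below defines an equality function with five alternative implementations (METH1 to METH5), selected at compile time; `comparison_function` in the loop stands for whichever function is being measured. The test is divided into two parts: first the initialization, then the computation.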
#include <stdio.h>
#include <stdlib.h>
/* EIF_REAL_64, EIF_NATURAL_64 and RTU64C come from the EiffelStudio runtime headers. */

/* Reinterpret the bits of a REAL_64 as a 64-bit natural. */
static EIF_NATURAL_64 to_raw_bits (EIF_REAL_64 d)
{
    return *((EIF_NATURAL_64 *)&d);
}

static int eif_is_nan_bits (EIF_NATURAL_64 value)
{
    /* Clear the sign mark. */
    EIF_NATURAL_64 jvalue = (value & ~RTU64C(0x8000000000000000));
    /* Ensure that it starts with 0x7ff and that the mantissa is not 0. */
    return (jvalue > RTU64C(0x7ff0000000000000));
}

static int eif_is_nan (EIF_REAL_64 value)
{
    return eif_is_nan_bits(to_raw_bits (value));
}

static int eif_equal_real_64 (EIF_REAL_64 d1, EIF_REAL_64 d2)
{
#ifdef METH1
    /* Here the base comparison is IEEE arithmetic (NaN is not equal to NaN). */
    return (d1 == d2);
#elif defined(METH2)
    /* Conversion to perform comparison on the binary representation. */
    EIF_NATURAL_64 f1 = to_raw_bits(d1);
    EIF_NATURAL_64 f2 = to_raw_bits(d2);
    return (f1 == f2 ? 1 : (eif_is_nan_bits (f1) && eif_is_nan_bits(f2)));
#elif defined(METH3)
    /* Use IEEE arithmetic to compare and find out if we have NaNs. */
    return (d1 == d2 ? 1 : ((d1 != d1) && (d2 != d2)));
#elif defined(METH4)
    /* Pessimist case, we assume that we compare mostly NaNs. */
    return (d1 == d1 ? d1 == d2 : d2 != d2);
#elif defined(METH5)
    /* Use IEEE arithmetic to compare but use binary representation to
     * find out if we have NaNs. */
    return (d1 == d2 ? 1 : (eif_is_nan (d1) && eif_is_nan(d2)));
#endif
}

#define ARR_SIZE 100000

int main(void)
{
    /* `res' must start at 0 since it accumulates the comparison results. */
    EIF_NATURAL_64 res = 0, i;
    EIF_REAL_64 *d = (EIF_REAL_64 *) malloc (sizeof(EIF_REAL_64) * ARR_SIZE + 1);

    /* Initialization of `d'. */
    ...

    for (i = 0; i <= 0x3FFFFFFF; i++) {
        /* Substitute comparison_function with what needs to be tested. */
        res = res + comparison_function (d [i % ARR_SIZE], d[(i - 1) % ARR_SIZE]);
    }
    /* `res' is a 64-bit unsigned value, so print it with a matching format. */
    printf ("%llu\n", (unsigned long long) res);
    free (d);
    return 0;
}
Testing for NaN values
The array is filled with NaN values, which gives the following results:
Method used | Timing VC++ 2005 x64 | Timing gcc 4.4.1 x64 |
METH1 | 4.777s | ? |
METH2 | 5.440s | ? |
METH3 | 6.266s | ? |
METH4 | 5.560s | ? |
METH5 | 7.413s | ? |
Testing for non-NaN values which are always different
The array is filled with non-NaN values which are always different, which gives the following results:
Method used | Timing VC++ 2005 x64 | Timing gcc 4.4.1 x64 |
METH1 | 5.068s | ? |
METH2 | 6.495s | ? |
METH3 | 6.448s | ? |
METH4 | 6.424s | ? |
METH5 | 6.832s | ? |
Testing for non-NaN values which are always the same
The array is filled with non-NaN values which are always the same, which gives the following results:
Method used | Timing VC++ 2005 x64 | Timing gcc 4.4.1 x64 |
METH1 | 5.242s | ? |
METH2 | 5.464s | ? |
METH3 | 5.308s | ? |
METH4 | 6.428s | ? |
METH5 | 5.384s | ? |
Testing for 50% NaN values and 50% non-NaN values
The array is filled with 50% NaN values and 50% distinct non-NaN values, which gives the following results:
Method used | Timing VC++ 2005 x64 | Timing gcc 4.4.1 x64 |
METH1 | 6.889s | ? |
METH2 | 6.914s | ? |
METH3 | 6.405s | ? |
METH4 | 6.068s | ? |
METH5 | 6.880s | ? |
Testing for 25% NaN values and 75% non-NaN values
The array is filled with 25% NaN values and 75% distinct non-NaN values, which gives the following results:
Method used | Timing VC++ 2005 x64 | Timing gcc 4.4.1 x64 |
METH1 | 7.553s | ? |
METH2 | 6.672s | ? |
METH3 | 8.997s | ? |
METH4 | 6.250s | ? |
METH5 | 8.063s | ? |