--- title: Rounding to Decimal Digits in Binary author: Martin Mächler address: ETH Zurich and R Core Team date: 'July 4, 2020; rendered on `r Sys.Date()`' output: html_vignette: css: style.css vignette: > %\VignetteEngine{knitr::rmarkdown} %\VignetteIndexEntry{Rounding to Decimal Digits in Binary} %\VignetteEncoding{UTF-8} %\VignetteDepends{round} --- ## Intro In R (and its predecessors S and S+), we have always known and often used `round(x, digits)` to round a numeric (or complex) vector of numbers `x` to `digits` decimals after the (decimal) point. However, not many R users (nor scientists for that matter) have been aware of the fact that such rounding is not trivial because our computers use binary (_base_ __2__) arithmetic and we are rounding to decimal digits, aka decimals, i.e., in _base_ __10__. On the topic of floating point computation, we have had the _most_ frequently asked question (FAQ) about R, the infamous [R FAQ 7.31][], and in 2017, Romain François even created an R package [seven31][] (not on CRAN) to help useRs exploring what we say in the FAQ. Recently, there has been an official R bug report (on R's bugzilla ), [`PR#17668`][] with summary title __Artificial numerical error in `round()` causing round to even to fail__. Adam Wheeler started with a shorter version (just using digits = 1,2,..,8) of the following examples and his own remarks about correctness: ```{r} op <- options(digits = 15) ##' maybe add to package -- it's really useful here : formatF <- function(x, ...) noquote(format(x, scientific=FALSE, drop0trailing=TRUE, ...)) formatF(rbind( # R <= 3.6.x : round(55.5, 0), # 56 <- correct round(55.55, 1), # 55.5 round(55.555, 2), # 55.55 round(55.5555, 3), # 55.556 <- correct round(55.55555, 4), # 55.5555 round(55.555555, 5), # 55.55555 round(55.5555555, 6), # 55.555555 round(55.55555555, 7), # 55.5555556 <- correct round(55.555555555, 8), # 55.55555555 round(55.5555555555, 9), # 55.555555555 round(55.55555555555,10),# 55.5555555556 <- correct round(55.555555555555,11)# 55.55555555556 <- correct )) ``` (Note that we eventually will *not agree* on the above `correct` judgements, as the matter is more subtle than one might think at first) Whereas the exact result of the R code above currently depends on your version of R, our `round` package's `roundX(x, dig, version = "r1.C")` now provides these, using the same C source code as R 3.6.2 (Note we adopt the convention to use `"r.C"`, ending in `.C` for round()ing versions where R calls package C code): ```{r} require(round) formatF(rbind( roundX(55.5, 0, "r1.C"), roundX(55.55, 1, "r1.C"), roundX(55.555, 2, "r1.C"), roundX(55.5555, 3, "r1.C"), roundX(55.55555, 4, "r1.C"), roundX(55.555555, 5, "r1.C"), roundX(55.5555555, 6, "r1.C"), roundX(55.55555555, 7, "r1.C"), roundX(55.555555555, 8, "r1.C"), roundX(55.5555555555, 9, "r1.C"), roundX(55.55555555555,10, "r1.C"), roundX(55.555555555555,11, "r1.C"))) ``` Adam (see above) used his own C code to see what happens in R's C code for `round()` and proposed to simplify the C code, *not* doing offset calculations (which substract the integer part `intx`, round and re-add `intx`), as now available with `roundX(*, "r0.C")`. This has started an _"investigative"_ story which got quite a bit longer than anticipated. ## The Easy Problem "in Theory" ### Rounding to Integers Well, rounding is supposedly quite simple and most of us learned a version of it in our first years of school when learning the (decimal) numbers, calculating and then simple decimal fractions. What most pupils learn (first) is round _up_, and round _down_ (to an _integer_), and then end with the "best" __rounding to nearest__ i.e., rounding to the nearest _integer_, notably using _"round half up"_, which mathematically is simply computing $y = r_0(x) := \left\lfloor x + 0.5 \right\rfloor$, see e.g., [Wikipedia, Rounding][], or when taking _negative_ numbers into account and ensuring _symmetry_, $y = \mathrm{sgn}(x) \left\lfloor \left| x \right| + 0.5 \right\rfloor = -\mathrm{sgn}(x) \left\lceil -\left| x \right| - 0.5 \right\rceil$, where the _"floor"_ $\lfloor x \rfloor$ and _"ceiling"_ $\lceil x \rceil$ operators are defined (customary in mathematics) as $$ \lfloor x \rfloor := \max_{n \in \mathbb{Z}} \{n \le x\}, $$ and $$ \lceil x \rceil := \min_{n \in \mathbb{Z}} \{n \ge x\}. $$ Whereas these (_rounding to nearest_ and _"round half up"_) rules, also called _"commercial rounding"_, are widely taught and used, they lead to a small bias e.g., when numbers have all the same sign, and for this reason, for computing, __round_half_to_even__ has been proposed and adopted as default rounding mode in the IEEE 754 floating-point standard and it corresponds to C and C++ math library function `nearbyint()`, and has been called _convergent rounding_, _statistician's rounding_, _bankers' rounding_ (and more), see [Wikipedia, Round half to even][]. It is also what you get with R's `round(x)`, as `round()`'s second argument has default 0: ```{r, str-round} str(round) ``` So, for now, let's set and use $$ round(x) := r(x) := nearbyint(x) $$ Note that all this rounding _to integer_ is defined _independently_ of the base of your number representation system (such as _"decimal"_ or _"binary"_ ). ### Rounding to non-zero Digits Things change slightly, when rounding to a non-zero number of decimal places, i.e., _digits_. In theory, i.e., if computers would compute mathematically perfectly accurately, also this has a simple solution: $$ round(x, d) := r(x \cdot 10^d) / 10^d $$ for all integer $d \in \mathbb{Z}$ (i.e., including negative `digits` $d$). All the practical problems stem from the fact that on a (binary arithmetic) computer, the number $x/10$ cannot be represented exactly in binary unless $x$ is an integer multiple of 5. Indeed, the simple fraction 1/5 is an infinite length binary fraction: $$ \frac 1 5 = \frac{1 \cdot \frac{3}{16}}{5 \cdot \frac{3}{16}} = \frac{\frac{3}{16}}{1 - \frac{1}{16}} = \frac{3}{16} \frac{1}{1 - \frac{1}{16}} = \left(\frac{1}{8} + \frac{1}{16}\right) \cdot \left(1 + \frac{1}{16} + \frac{1}{16^2} + \ldots\right) \\ = (0.001100110011001100110011\ldots\ldots)_2 . $$ ## Versions of round()ing - The Story At first, [R bugzilla, comment #6][], I've committed a version of Adam's proposal to R-devel ([R svn r77609][], on 2019-12-21, [^1]) but found that the simplification improved the above examples in that it always rounded to even, but it clearly broke cases that were working correctly in R 3.6.x. That version is available with our `roundX(*, version = "r0.C")` (Version `0` as it is even simpler than version `1`). One CRAN package had relied on `round(x, digits = .Machine$integer.max)` to return integers unchanged, but ```{r} roundX(c(-2,2), digits = .Machine$integer.max, version = "r0.C") ``` gave non-sense, and there were less extreme cases of relatively large `digits` which had stopped working with "r0". See the two `roundX()` versions via simple wrapper `roundAll()`: ```{r, integer-x-large-digs} i <- c(-1,1)* 2^(33:16) stopifnot(i == floor(i)) # are integer valued roundAll(i, digits = 300, versions = c("r0.C", "r1.C")) ``` Looking at these, I also found that internally, R's `round()` had effectively worked as if `digits <- pmin(308, digits)`, i.e., truncated digits larger than 308. This is clearly not good enough for very small numbers (in absolute value), ```{r, small-x-large-digs} e <- 5.555555555555555555555e-308 d <- 312:305 ; names(d) <- paste0("d=", d) roundAll(e, d, versions = c("r0.C", "r1.C", "r2.C")) ``` As I was embarrassed to have blundered, I've worked and committed what now corresponds to `roundX(*, version = "r2.C")` to R-devel ([R svn r77618][], on 2019-12-24, 16:11). Also, Jeroen Ooms, maintainer of CRAN package `jsonlite`, contacted the CRAN team and me about the change in R-devel which broke one regression test of that package, and on Dec 27, he noticed that R 3.6.2's version of `round()` was seemingly compatible [^2] with the (C library dependent) versions of `sprintf()` and also with R's `format()` whereas the R-devel versions where not, for his example: (Note: `format(x, digits = d)` uses __significant__ digits `d`, not digits after the decimal point as `round()` does!) ```{r, JO-ex-sprintf} x <- 9.18665 format(x, digits = 5) # "9.1867" sprintf("%.4f", x) # "9.1867" ## but -- showing now roundAll(x, 4) ## but note that print(x, digits=18) ``` which (typically) shows `9.1866500000000002`, i.e., a number closer to 9.1867 than to 9.1866, and so really should be rounded up, not down. However, that is partly wrong: Whereas it is true that it's closer to 9.1867 than to 9.1866, one must be aware that these two decimal numbers are __neither__ exactly representable in binary double precision, and a further careful look shows actually the double precision version of these rounded numbers do have the exact same distance to `x` and that the main principle *round to nearest* here gives a tie: ```{r, JO-ex-diff} print(rbind(9.1866, 9.18665, 9.1867), digits=18) (dx <- c(9.1866, 9.1867) - x) # [1] -4.99999999998835e-05 4.99999999998835e-05 diff(abs(dx)) # is zero ! options(digits=7) # revert to (typical) default ``` and because of the tie, the *round to even* rule must apply which means rounding *down* to 9.1866, and so both libc's printf and hence R's `sprintf()` are as wrong as R 3.6.x has been, and indeed all our `roundX()` versions apart from `"sprintf"` and the '"r1*"' (previous R) ones, do round down: ```{r, roundAll-x} roundAll(x, 4) ``` Finally, I think we've seen the light and on one hand recalled what we have known for long (but most R users will not be aware of at all) > 1. Almost all finite decimal fractions are __not__ (exactly) representable as binary > double precision numbers, and consequently, > 2. *round to nearest* applies much more often _directly_ rather than via the > tie breaking rule *round to even* even for the case where the decimal fraction ends in a `5`. and hence, a _"correct"_ implementation must really _measure, not guess_ which of the two possible decimals is closer to `x`. This lead to our R level algorithm `round_r3()` which is the workhorse used by `roundX(x,d, version = "r3")` : ```{r, round_r3-def} round_r3 ``` and its two C level versions `"r3.C"` (using `long double`) and `"r3d.C"` (_"d"_ for _"double"_, as it uses double precision only). For R 4.0.0, this (equivalent of `"r3d.C")`, has been used for `round(x, digits)` now, as does not use `long double` and is very close to the R level implementation `"r3"` (i.e., `round_r3()`) makes it potentially less platform dependent and easier to explain and document. Lastly, note that the original set of examples is then treated differently from all previous proposals: ```{r, roundAll-55.5} op <- options(digits = 15, width = 2*3*23) formatF(rbind( d0 = roundAll(55.5, 0) ,d1 = roundAll(55.55, 1) ,d2 = roundAll(55.555, 2) ,d3 = roundAll(55.5555, 3) ,d4 = roundAll(55.55555, 4) ,d5 = roundAll(55.555555, 5) ,d6 = roundAll(55.5555555, 6) ,d7 = roundAll(55.55555555, 7) ,d8 = roundAll(55.555555555, 8) ,d9 = roundAll(55.5555555555, 9) ,d10= roundAll(55.55555555555,10))); options(op) ``` ## Alternative Approaches As I had asked for comments about my proposal(s), on the R-devel mailing lists, Steven Dirkse (@ GAMS) mentioned he was not quite happy with the considerations above, and in the end (not on-list) summarized his own rational approach in the following way -- which I like to present as it is coherent and simple, summarized as: 1. all double precision numbers are rationals mathematically, i.e., $\in \mathbb{Q}$. 2. `round(x, n)` as a mathematical function is well defined unambigously on the rationals. In our notation above, simply $round(x, n) := r(x \cdot 10^n) / 10^n$, where $r(x) := round(x) = nearbyint(x)$, using _round to even_ if desired. These two define _exact rounding_ and he would want R to use that. Note for that, R would have to use exact arithmetic with rational numbers which has been available for years via the C library `GNU MP` aka [`GMP`] and in R via CRAN pkg [gmp][]. In the vignette [Q Round][], we indeed use CRAN package `gmp` to perform such exact rounding using exact rational numbers, and compare these with the (double precision arithmetic) algorithms we have introduced here. However, even if R would _"come with GMP"_ in the future (which is conceivable for other reasons), we should note that this would still keep `rx <- round(x, n)` in R inaccurate in most cases, since `rx` is `numeric`, i.e., a double precision number, and almost all rational numbers are _not_ exactly representable as double. [Q Round]: https://CRAN.R-project.org/package=round/vignettes/rationalRound.html [R FAQ 7.31]: https://cran.r-project.org/doc/FAQ/R-FAQ.html#Why-doesn_0027t-R-think-these-numbers-are-equal_003f [seven31]: https://github.com/ThinkR-open/seven31 "seven31" [`PR#17668`]: https://bugs.r-project.org/show_bug.cgi?id=17668 [Wikipedia, Rounding]: https://en.wikipedia.org/wiki/Rounding [Wikipedia, Round half to even]: https://en.wikipedia.org/wiki/Rounding#Round_half_to_even [R bugzilla, comment #6]: https://bugs.r-project.org/show_bug.cgi?id=17668#c6 [`GMP`]: https://gmplib.org/ [gmp]: https://CRAN.R-project.org/package=gmp [Rounding]: https://CRAN.R-project.org/package=round/vignettes/Rounding.html [^1]: using Winston Chang's semi-official mirror of R's official SVN repository at (as `https://svn.r-project.org/R/trunk@77609` has not been enabled in its server) [^2]: do note that `sprintf()` and R 3.6.x's version of `round()` are *not* at all equivalent, even if they are for Jeroen's example [R svn r77609]: https://github.com/wch/r-source/commit/33d95f1 [R svn r77618]: https://github.com/wch/r-source/commit/7a55bbeb ### Session information (Note half a dozen non-standard packages present only as dependences of `rmarkdown` we use for rendering this vignette) ```{r, echo=FALSE} print(sessionInfo(), locale=FALSE) ```