Stata and R compute percentiles differently. Let us load the auto dataset and compute the 75th percentile of price using Stata's centile command:
    . sysuse auto, clear
    (1978 automobile data)

    . centile price, centile(75)

                                                              Binom. interp.
        Variable │       Obs  Percentile    Centile        [95% conf. interval]
    ─────────────┼─────────────────────────────────────────────────────────────
           price │        74         75        6378        5798.432      9691.6

    . save auto, replace
    file auto.dta saved
We find that the 75th percentile is 6378.
Now let us do the same with R. We'll use the haven package to read the Stata file:
    > library(haven)
    > auto <- read_dta("auto.dta")
    > q <- quantile(auto$price, 0.75); q
        75% 
    6332.25 
According to R, the 75th percentile is 6332.25.
It turns out that R's quantile() function implements nine types of quantiles, and the default is type 7. To get the same result as centile, specify type 6, which gives 6378.
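To see where the two answers come from, here is a minimal sketch in R. It assumes the usual Hyndman and Fan description of these two types, in which both interpolate linearly between adjacent order statistics but at different positions: (n + 1)p for type 6 and (n - 1)p + 1 for type 7. The data vector is made up for illustration and interp_at() is a hypothetical helper, not part of R.

```r
# Toy data (made up for illustration; this is not the auto dataset)
x <- c(10, 20, 30, 40, 50, 60, 70, 80, 90, 100)
p <- 0.75
n <- length(x)

# Hypothetical helper: interpolate linearly between adjacent order
# statistics at the fractional position h, clamped to the range 1..n.
interp_at <- function(x, h) {
  s <- sort(x)
  n <- length(s)
  h <- min(max(h, 1), n)
  lo <- floor(h)
  hi <- min(lo + 1, n)
  s[lo] + (h - lo) * (s[hi] - s[lo])
}

h6 <- (n + 1) * p       # type 6 position (assumed rule): 8.25
h7 <- (n - 1) * p + 1   # type 7 position (assumed rule): 7.75

manual6 <- interp_at(x, h6)
manual7 <- interp_at(x, h7)

# Built-in results for comparison
builtin6 <- quantile(x, p, type = 6, names = FALSE)
builtin7 <- quantile(x, p, type = 7, names = FALSE)

print(c(manual6 = manual6, builtin6 = builtin6))
print(c(manual7 = manual7, builtin7 = builtin7))

# The same positions for n = 74 observations, as in the auto data
pos74 <- c(type6 = (74 + 1) * p, type7 = (74 - 1) * p + 1)
print(pos74)
```

With 74 observations the two rules point at positions 56.25 and 55.75, so type 6 interpolates between the 56th and 57th ordered prices while type 7 interpolates between the 55th and 56th. With the data loaded as above, the matching call would be quantile(auto$price, 0.75, type = 6).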
The Stata commands summarize, detail, xtile, pctile and _pctile use yet another method, equivalent to R's type 2. These give the third quartile as 6342. The last three commands have an altdef option that gives the same answer as centile.
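The type 2 rule can be sketched in a few lines of R. This assumes the standard description of type 2: take the order statistic at position ceiling(np), and average the two neighbouring order statistics when np is a whole number. The data are made up, type2_manual() is a hypothetical helper, and the sketch only covers 0 < p < 1.

```r
# Toy data (made up for illustration; this is not the auto dataset)
x <- c(10, 20, 30, 40, 50, 60, 70, 80, 90, 100)

# Hypothetical helper for the type 2 rule, valid for 0 < p < 1:
# no interpolation, except averaging when n * p is a whole number.
type2_manual <- function(x, p) {
  s <- sort(x)
  np <- length(s) * p
  if (np == floor(np)) {
    (s[np] + s[np + 1]) / 2   # whole number: average the two neighbours
  } else {
    s[ceiling(np)]            # otherwise: round the position up
  }
}

q75_manual  <- type2_manual(x, 0.75)   # n * p = 7.5, so the 8th value
q50_manual  <- type2_manual(x, 0.50)   # n * p = 5, so average 5th and 6th

# Built-in results for comparison
q75_builtin <- quantile(x, 0.75, type = 2, names = FALSE)
q50_builtin <- quantile(x, 0.50, type = 2, names = FALSE)

print(c(q75_manual = q75_manual, q75_builtin = q75_builtin))
print(c(q50_manual = q50_manual, q50_builtin = q50_builtin))
```

For the auto data, n * p = 74 * 0.75 = 55.5 is not a whole number, so under this rule the third quartile is simply the 56th ordered price, with no interpolation.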
For a discussion of these methods see Hyndman, R. J. and Fan, Y. (1996) Sample quantiles in statistical packages, American Statistician 50:361-365.