Quantiles in Stata and R

Starting with version 2.1, markstat lets you combine Stata, Mata and R code, both in code blocks and inline. Here is a simple example involving the calculation of quantiles.

qsr.stmd
% Quantiles in Stata and R

Stata and R compute percentiles differently. Let us load the `auto`
dataset and compute the 75th percentile of `price` using Stata's `centile`
command:

```s
    sysuse auto, clear
    centile price, centile(75)
    save auto, replace
```

We find that the 75th percentile is `s r(c_1)`.

Now let us do the same with R. We use the `haven` package to read the
Stata file saved above:

```r
    library(haven)
    auto <- read_dta("auto.dta")
    q <- quantile(auto$price, 0.75); q
```

According to R, the 75th percentile is `r round(q, 1)`.

It turns out that R offers nine types of quantiles, and the default is type 7.
To get the same result as `centile`, specify type 6, which gives `r quantile(auto$price, 0.75, type=6)`.
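
To see where the type 6 value comes from, here is a minimal sketch of the rule as defined by Hyndman and Fan: sort the data, locate position h = (n + 1)p, and interpolate linearly between the neighbouring order statistics, clamping to the smallest or largest value at the ends. The helper `q_type6` and the toy vector `x` are hypothetical and not part of the original script, and the sketch handles a single probability p only.

```r
    # Hypothetical helper (not part of markstat or base R): Hyndman-Fan
    # type 6 quantile for a single probability p.
    q_type6 <- function(x, p) {
      x <- sort(x)
      n <- length(x)
      h <- (n + 1) * p               # position in the sorted data
      if (h <= 1) return(x[1])       # clamp below the smallest value
      if (h >= n) return(x[n])       # clamp above the largest value
      j <- floor(h)
      g <- h - j                     # fractional part used as the weight
      (1 - g) * x[j] + g * x[j + 1]  # linear interpolation
    }

    x <- c(3, 1, 4, 1, 5, 9, 2, 6)   # toy data, sorted: 1 1 2 3 4 5 6 9
    q_type6(x, 0.75)                 # h = 6.75, so 5 + 0.75 * (6 - 5) = 5.75
    quantile(x, 0.75, type = 6)      # R's built-in type 6, also 5.75
```

By contrast, the default type 7 uses position 1 + (n - 1)p, which for this toy vector gives 5.25.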

The Stata commands `summarize, detail`, `xtile`, `pctile` and `_pctile` use yet 
another method, equivalent to R's type 2. These give the third quartile as
`r quantile(auto$price, 0.75, type=2)`. The last three commands have an 
`altdef` option that gives the same answer as `centile`.
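
Type 2 can be sketched the same way: it inverts the empirical distribution function and averages the two neighbouring order statistics when np falls exactly on an integer. As before, `q_type2` and the toy vector are hypothetical illustrations rather than part of the original script, and unlike R's `quantile()` this sketch does not add a small tolerance to guard against rounding error in np.

```r
    # Hypothetical helper: Hyndman-Fan type 2 quantile for a single p.
    q_type2 <- function(x, p) {
      x <- sort(x)
      n <- length(x)
      np <- n * p
      j <- floor(np)
      lo <- x[max(j, 1)]                  # x_j, taking x_0 as x_1
      hi <- x[min(j + 1, n)]              # x_(j+1), taking x_(n+1) as x_n
      if (np > j) hi else (lo + hi) / 2   # average only when np is an integer
    }

    x <- c(3, 1, 4, 1, 5, 9, 2, 6)   # toy data, sorted: 1 1 2 3 4 5 6 9
    q_type2(x, 0.75)                 # np = 6 exactly, so (5 + 6) / 2 = 5.5
    quantile(x, 0.75, type = 2)      # R's built-in type 2, also 5.5
```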

For a discussion of these methods see
Hyndman, R. J. and Fan, Y. (1996) Sample quantiles in statistical packages, 
*American Statistician* 50:361-365.

As you can see, we handle R code the same way as Stata and Mata code, using code fences labeled r instead of s or m. You can copy and paste this script, or download it to your working directory using the command

copy https://grodri.github.io/markstat/qsr.stmd .

To run this script in Stata, use the command

markstat using qsr

The script uses the strict syntax, but markstat 2.1 and higher detects the use of code fences and sets strict mode automatically. (The strict option remains available for rare cases where autodetection will not work, such as files with indented Markdown but no Stata, Mata or R code.)

You can see the HTML output here.

For this to work you need to have R installed, and you need to use whereis from SSC to register the location of R on your computer. I recommend that you first update whereis to make sure you have the latest version. Then follow the R instructions on Getting Started, which has registration examples for Windows 10 and Mac OS X.

This particular script also requires R’s haven package to read Stata files. Stas Kolenikov pointed out that you could modify the script to install the package on demand, replacing library(haven) with

tryCatch(library("haven"), 
    error = function(e) install.packages("haven", repos="https://cloud.r-project.org"),
    finally = library("haven"))
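
A common alternative pattern checks for the package with requireNamespace() and installs it only when it is missing, rather than relying on an error to trigger the installation. The sketch below wraps that check in a small helper; the name load_or_install is hypothetical, and this is not the version Stas Kolenikov suggested.

```r
# Hypothetical helper: load a package, installing it from CRAN first
# only if it is not already available.
load_or_install <- function(pkg, repos = "https://cloud.r-project.org") {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    install.packages(pkg, repos = repos)
  }
  library(pkg, character.only = TRUE)   # attach the package by name
  invisible(TRUE)
}
```

In the script you would then replace library(haven) with load_or_install("haven").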

For a more extensive example, see this page, which uses Bootstrap tabs to switch between Stata and R in a Cox regression.

Reference

Hyndman, R. J. and Fan, Y. (1996) Sample quantiles in statistical packages, American Statistician 50:361-365.
