Germán Rodríguez

Introducing R
Princeton University

These notes have hardly scratched the surface of R, which has many more statistical functions. These include functions to calculate the density, cdf, and inverse cdf of distributions such as chi-squared, t, F, lognormal, logistic and others.
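
As a quick illustration, the distribution functions in base R follow a common naming pattern: a `d` prefix for the density, `p` for the cdf, and `q` for the inverse cdf (quantile function). The sketch below uses arbitrary values and degrees of freedom chosen only for the example.

```r
# Chi-squared with 1 degree of freedom
d <- dchisq(3.84, df = 1)   # density at 3.84
p <- pchisq(3.84, df = 1)   # cdf at 3.84
q <- qchisq(0.95, df = 1)   # inverse cdf: the 95th percentile

# The same pattern applies to other distributions
t_crit <- qt(0.975, df = 20)                             # two-sided 5% critical value for t
f_tail <- pf(4, df1 = 2, df2 = 30, lower.tail = FALSE)   # upper-tail probability for F
ln_med <- qlnorm(0.5, meanlog = 0, sdlog = 1)            # median of a lognormal
lg_cdf <- plogis(0)                                      # logistic cdf at zero
```

Each family also has an `r` function, such as `rchisq()`, for generating random draws.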

The `survival` library includes methods for the estimation
of survival curves, tests of differences between survival curves, and
Cox proportional hazards models. The library `lme4` includes
code for fitting generalized linear mixed-effects models, including
multilevel models. Many new statistical procedures are first made
available to the research community in the form of R functions.
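
Here is a minimal sketch of the three survival tasks just mentioned. It assumes the `lung` dataset that is bundled with the `survival` package; the choice of `sex` and `age` as predictors is mine, purely for illustration.

```r
library(survival)  # a recommended package, included in standard R installations

# Kaplan-Meier estimates of the survival curve for each sex
km <- survfit(Surv(time, status) ~ sex, data = lung)

# Log-rank test of the difference between the two curves
lr <- survdiff(Surv(time, status) ~ sex, data = lung)

# Cox proportional hazards model with sex and age as predictors
cox <- coxph(Surv(time, status) ~ sex + age, data = lung)
summary(cox)
```

Calling `plot(km)` draws the estimated curves, and printing `lr` shows the chi-squared statistic for the test.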

To produce really nice graphs consider installing the `ggplot2`
package. To draw a plot you specify a data frame,
aesthetics that map variables to aspects of the graph, and geometries
that specify whether to use points, lines, or other primitives. You will
find more information at https://ggplot2.tidyverse.org/.
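
The three ingredients can be sketched as follows, assuming `ggplot2` is installed. I use the built-in `mtcars` data frame, and the choice of variables and labels is just for illustration.

```r
library(ggplot2)

# Data frame: mtcars; aesthetics: weight on the x-axis, fuel efficiency on the y-axis
p <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point() +                                           # geometry: points
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE)  # geometry: fitted line

# Labels are optional; these names are only for the example
p <- p + labs(x = "Weight (1000 lbs)", y = "Miles per gallon")
print(p)
```

Because a plot is built by adding layers, you can store it in an object and keep adding geometries or labels before printing it.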

For data management I recommend that you install the `dplyr`
package, which includes tools for adding new
variables, selecting cases or variables (rows or columns), as well as
summarizing and re-arranging your data. Check the overview at https://dplyr.tidyverse.org/.
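
A short sketch of these tools, assuming `dplyr` is installed and again using the built-in `mtcars` data. The new variable `kpl` and the particular selections are hypothetical choices made for the example.

```r
library(dplyr)

result <- mtcars %>%
  mutate(kpl = mpg * 0.4251) %>%            # add a new variable (kilometers per liter)
  filter(cyl %in% c(4, 6)) %>%              # select cases (rows)
  select(cyl, mpg, kpl) %>%                 # select variables (columns)
  group_by(cyl) %>%
  summarize(mean_kpl = mean(kpl), n = n()) %>%  # summarize within groups
  arrange(desc(mean_kpl))                   # re-arrange the result

result
```

Each verb takes a data frame and returns a data frame, which is what makes it natural to chain the steps with the pipe operator.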

You can also run `install.packages("tidyverse")` to
install all the packages in the tidyverse, including `ggplot2` and
`dplyr`, as well as `tidyr` (for help tidying data), `readr` (for
reading rectangular data like csv files), `purrr` (for an
alternative to loops), `tibble` (for tidy data frames),
`stringr` (for working with strings) and `forcats`
(for working with factors). Learn more at https://www.tidyverse.org/packages/.

In addition, R is a full-fledged programming language, with a rich
complement of mathematical functions, matrix operations and control
structures. It is very easy to write your own functions. To learn more
about programming in R, I recommend Wickham's (2019) book *Advanced R*.
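
To give a flavor of how functions, matrix operations and control structures fit together, here is a small sketch of a function that computes least-squares coefficients from the normal equations. The name `ols` and its arguments are hypothetical, and in practice you would simply use `lm()`.

```r
# Least-squares coefficients via the normal equations (X'X) b = X'y
ols <- function(X, y) {
  if (!is.matrix(X)) {
    stop("X must be a matrix")      # a simple control structure
  }
  X <- cbind(Intercept = 1, X)      # add a column of ones for the constant
  b <- solve(crossprod(X), crossprod(X, y))
  drop(b)                           # return a named vector rather than a one-column matrix
}

b <- ols(as.matrix(mtcars[, c("wt", "hp")]), mtcars$mpg)
b

# The result should agree with the built-in fit
coef(lm(mpg ~ wt + hp, data = mtcars))
```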

R is an interpreted language but it is reasonably fast, particularly if you take advantage of the fact that operations are vectorized, and try to avoid looping. Where efficiency is crucial you can always write a function in a compiled language such as C++ and then call it from R. Some of my work on multilevel generalized linear models used this approach.
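
As a small illustration of vectorization, the sketch below computes a sum of squares twice, once with an explicit loop and once with vectorized operations. The function names are made up for the example, and the timings will depend on your machine.

```r
set.seed(1)
x <- runif(1e5)

# Explicit loop: one element at a time
sum_sq_loop <- function(x) {
  total <- 0
  for (xi in x) {
    total <- total + xi^2
  }
  total
}

# Vectorized: square all elements at once, then add them up
sum_sq_vec <- function(x) sum(x^2)

total_loop <- sum_sq_loop(x)
total_vec <- sum_sq_vec(x)
all.equal(total_loop, total_vec)

# Compare elapsed times
system.time(sum_sq_loop(x))
system.time(sum_sq_vec(x))
```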

Last, but most certainly not least, you will want to learn about dynamic documents using R Markdown. The basic idea here is to combine a narrative written in Markdown with R code, an approach that has excellent support in RStudio. The definitive book on the subject is Xie, Allaire, and Grolemund (2019).

This tutorial has been written in R Markdown. You can download the
source code `introducingR.Rmd`, the bibliography file
`introducingR.bib`, and the image file `RStudioIDE.png` from GitHub.
To reproduce the PDF document you also need `tweaks.tex` from
the same source. To generate an HTML document instead, change the output
specification near the top of the script.

Becker, Richard A., and John M. Chambers. 1984. *S: An
Interactive Environment for Data Analysis and Graphics*. Belmont,
CA: Wadsworth.

Becker, Richard A., John M. Chambers, and Allan R. Wilks. 1988. *The
New S Language*. Pacific Grove, CA: Wadsworth.

Braun, W. John, and Duncan J. Murdoch. 2016. *A First Course in
Statistical Programming with R*. Second Edition.
Cambridge University Press.

Chambers, John M. 1998. *Programming with Data*. New York:
Springer.

———. 2008. *Software for Data Analysis: Programming with
R*. New York: Springer.

———. 2016. *Extending R*. Boca Raton, FL: Chapman
& Hall/CRC.

Chambers, John M., and Trevor J. Hastie, eds. 1992. *Statistical
Models in S*. Pacific Grove, CA: Wadsworth.

Dalgaard, Peter. 2008. *Introductory Statistics with
R*. Second Edition. New York: Springer.

Fox, John. 2002. *An R and S-Plus Companion
to Applied Regression*. Thousand Oaks, CA: SAGE.

Hothorn, Torsten, and Brian S. Everitt. 2014. *A Handbook of
Statistical Analyses Using R*. Third Edition. Boca
Raton, FL: CRC Press.

Krause, Andreas, and Melvin Olson. 1997. *The Basics of S
and S-Plus*. New York: Springer.

Murrell, Paul. 2006. *R Graphics*. Boca Raton, FL: Chapman
& Hall/CRC.

Pinheiro, José C., and Douglas M. Bates. 2000. *Mixed-Effects Models
in S and S-Plus*. New York: Springer.

Therneau, Terry M., and Patricia M. Grambsch. 2000. *Modeling
Survival Data*. New York: Springer.

Venables, W. N., and B. D. Ripley. 2000. *S
Programming*. New York: Springer.

———. 2002. *Modern Applied Statistics with S-Plus*.
Fourth Edition. New York: Springer.

Wickham, Hadley. 2016. *ggplot2: Elegant
Graphics for Data Analysis*. Second Edition. New York: Springer. https://ggplot2-book.org/.

———. 2019. *Advanced R*. Second Edition. Boca Raton,
FL: CRC Press. https://adv-r.hadley.nz/.

Wickham, Hadley, and Garrett Grolemund. 2017. *R for Data
Science*. Sebastopol, CA: O’Reilly. https://r4ds.had.co.nz/.

Wilkinson, Leland. 2005. *The Grammar of Graphics*. Second
Edition. New York: Springer.

Xie, Yihui, J. J. Allaire, and Garrett Grolemund. 2019. *R Markdown:
The Definitive Guide*. Boca Raton, FL: Chapman & Hall/CRC. https://bookdown.org/yihui/rmarkdown/.