Germán Rodríguez

Introducing R
Princeton University
The purpose of these notes is to provide a quick introduction to R,
particularly as a tool for fitting linear and generalized linear models.
A PDF version is available here. The web
pages and PDF were created using `markdown` as explained in
the last section. You will find many applications in the computing logs
for the statistics courses on this site.

R is a powerful environment for statistical computing which runs on several platforms. These notes are written especially for users running the Windows version, but most of the material applies to the Mac and Linux versions as well. Following some bibliographic remarks and tips for getting started, I describe reading and examining data, and fitting linear and generalized linear models. I close with a few pointers and references on where to go from here. I have tried to introduce key features of R as needed by students in my statistics classes. As a result, I often postpone (or altogether omit) discussion of some of the more powerful features of R as a programming language.

R was first written as a research project by Ross Ihaka and Robert Gentleman, and is now under active development by a group of statisticians called ‘the R core team’, with a home page at https://www.r-project.org.

R was designed to be ‘not unlike’ the S language developed by John Chambers and others at Bell Labs. A commercial version of S with additional features was developed and marketed as S-Plus by Statistical Sciences, eventually becoming part of TIBCO Spotfire. You can view R and S-Plus as alternative implementations of the same underlying S language. The modern R implementation, however, is by far the most popular.

R is available free of charge and is distributed under the terms of the Free Software Foundation’s GNU General Public License. You can download the program from the Comprehensive R Archive Network (CRAN). Ready-to-run ‘binaries’ are available for Windows, Mac OS, and Linux. The source code is also available for download and can be compiled for other platforms.

S was first introduced by Becker and Chambers [-@beckerChambers1984] in what’s known as the ‘brown’ book. The new S language was described by Becker, Chambers and Wilks [-@beckerChambersWilks1988] in the ‘blue’ book. Chambers and Hastie [-@chambersHastie1992] edited a book discussing statistical modeling in S, called the ‘white’ book. The latest version of the S language is described by @chambers1998 in the ‘green’ book, but R is largely an implementation of the versions documented in the blue and white books. Chambers’s latest books, @chambers2008 and @chambers2016, focus on programming with R.

Venables and Ripley [-@venablesRipley2002] have written an excellent
book on *Modern Applied Statistics with S-PLUS* that is now in its fourth
edition. The latest edition is particularly useful to R users because
the main text explains differences between S-Plus and R where relevant.
A companion volume called *S Programming* appeared in 2000 and
applies to both S-Plus and R [@venablesRipley2000]. These authors have also
made available on their website an extensive collection of complements
to their books; follow the links at MASS 4.

There is now an extensive and rapidly growing literature on R. Good
introductions include the books by @krauseOlson1997, @dalgaard2008, and @braunMurdoch2016. Beginners will probably
benefit from working through the examples in *A Handbook of
Statistical Analyses Using R* by @hothornEveritt2014, now in its third edition, or @fox2002’s companion to applied regression.
Among more specialized books my favorites include @murrell2006, an essential reference on R
graphics, @pinheiroBates2000, a book on
mixed models, and @therneauGrambsch2000’s
book on survival models. (Therneau wrote the survival analysis code used
in S-Plus and R.)

Hadley Wickham has made a number of ground-breaking contributions to
R that deserve special mention. He is the author of the `ggplot2`
package [@wickham2016], a very popular graphics package
that has brought to R the grammar of graphics proposed by @wilkinson2005. He has also contributed a number
of data-management packages, of which the most notable is `dplyr`,
and has advocated the use of “tidy” datasets. His
approach to data management is explained in detail in @wickhamGrolemund2017, with its own website at
https://www.tidyverse.org/. He has also written an
advanced book on programming R, now in its second edition [@wickham2019]. You will find most of his work
available online; follow the links in the list of references at the end
of this tutorial.

The official R manuals are distributed as PDF files. These include
*An Introduction to R* (a nice 100-page introduction), a manual
on *R Data Import/Export* describing facilities for transferring
data to and from other packages, and useful notes on *R Installation
and Administration*. More specialized documents include a draft of
the *R Language Definition*, a guide to *Writing R
Extensions*, documentation on *R Internals* including coding
standards, and finally the massive *R Reference Index* (~3000
pages). The online help facility is excellent. For additional references
see the annotated list at R Books.
