Packages v. Libraries in R

Posted on January 2nd, 2013

In the past I've used the terms "R library" and "R package" synonymously (e.g. this blog post and this paper), but a careful reader has called me out. Mark Sharp notes that there are differences between libraries and packages.

Chapter one of the R manual "Writing R Extensions" gives the details:

A package is a directory of files which extend R, either a source package (the master files of a package), or a tarball containing the files of a source package, or an installed package, the result of running R CMD INSTALL on a source package. On some platforms there are also binary packages, a zip file or tarball containing the files of an installed package which can be unpacked rather than installing from sources.

A package is not a library. The latter is used in two senses in R documentation. The first is a directory into which packages are installed, e.g. /usr/lib/R/library: in that sense it is sometimes referred to as a library directory or library tree (since the library is a directory which contains packages as directories, which themselves contain directories). The second sense is that used by the operating system, as a shared library or static library or (especially on Windows) a DLL, where the second L stands for ‘library’. Installed packages may contain compiled code in what is known on most Unix-alikes as a shared object and on Windows as a DLL (and used to be called a shared library on some Unix-alikes). The concept of a shared library (dynamic library on Mac OS X) as a collection of compiled code to which a package might link is also used, especially for R itself on some platforms.
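
Both senses of "library" can be seen from an R session. The following is only a minimal sketch, not something taken from the manual: it assumes a standard R installation, and it uses the stats package purely as an example because it ships with R. The variable names (lib_trees, pkg_dir, lib_dir, dll_path) are my own.

```r
# Sense 1: a library is a directory into which packages are installed.
# .libPaths() returns the library trees known to this R session.
lib_trees <- .libPaths()
print(lib_trees)

# An installed package is a directory inside one of those library trees.
# "stats" is used here only because it ships with R.
pkg_dir <- find.package("stats")
print(pkg_dir)

# The parent of the package directory is the library that contains it.
lib_dir <- dirname(pkg_dir)
print(lib_dir)

# Sense 2: the compiled code inside an installed package
# (a shared object on most Unix-alikes, a DLL on Windows).
loadNamespace("stats")
dll_path <- getLoadedDLLs()[["stats"]][["path"]]
print(dll_path)
```

The paths printed will differ from machine to machine, but the package directory should sit one level below a library directory, and the compiled file should sit inside the package directory.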

However, the manual also gives me a little credit.

This is common mis-usage. It seems to stem from S, whose analogues of R's packages were officially known as library sections and later as chapters, but almost always referred to as libraries.

Indeed, it seems like I'm not alone.

It is a little counter-intuitive that you load packages with the library() function. Perhaps this contributes to the persistence of the mis-usage. However, as someone else points out:

Even if we don't like the current semantics, the *name* of library() in itself should not be a problem. After all, calling summary() does not imply that your primary argument is a summary, so why should calling library() imply that its primary argument is a "library"?
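
To make the point concrete, here is a small sketch of how the two terms show up in a call to library(). It is an illustration under my own assumptions rather than anything from the quoted message: the tools package is chosen only because it ships with R, and the variable names are hypothetical.

```r
# The primary argument of library() is a *package*; the optional
# lib.loc argument names the *library* (directory) to look in.
library(tools, lib.loc = .libPaths())

# Once attached, the package appears on the search path as "package:<name>".
attached <- search()
print(attached)

# Called with no arguments, library() lists the packages available
# in each library tree; here we keep the package name and its library.
available <- library()$results[, c("Package", "LibPath"), drop = FALSE]
print(head(available))
```

In other words, library() attaches a package from a library, which is at least consistent with the distinction the manual draws.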

Even the Quick-R site makes a careful distinction:

Packages are collections of R functions, data, and compiled code in a well-defined format. The directory where packages are stored is called the library.

Thanks to Mark for pointing this out. In the future, I'll definitely be more careful.