Principal Component Analysis, or
PCA is a widely used method to explore linear associations among
variables of large datasets. There is, unfortunately, no consistent
implementation of this technique in R, which is even more a problem
because the numerous additional R packages that provide enhanced
versions of PCA, or additional tools, have no consistent interfaces. In
the stats package, there are two implementations called
princomp()
and prcomp()
that both create S3
objects of the same name. There are a few methods available, like
print()
, summary()
, plot()
,
predict()
or biplot()
. The whole set is rather
deceptive and produces less interesting plots than other (more
specialized) software can do. For instance, there is nothing to plot the
so-called “graph of the variables” in the French terminology and you
have to program it yourself.
Of course, there are several specialized R packages available that
provide more powerful and/or more extended implementations, among
others: ade4, FactoMineR and
vegan. Each of these packages has a totally different
approach: ade4 creates a c('pca', 'dudi')
S3 object and proposes nice graphs but has an interface that is
completely inconsistent with usual R analyses (no optional formula
interface, exotic names of arguments, non-standard handling of missing
data, etc.). Object orientation and name of objects are obscure and do
not facilitate first use of the PCA in ade4. A PCA is
done, indeed, using the dudi.pca()
function (or possibly,
nipals()
, but that creates a different nipals
object). The same remarks can be made about the interface of functions
in FactoMineR: they use strange arguments and do not
respect the general organization of analyses in R (an object constructs
the analysis, possibly defined using a formula; methods summarize or
plot the results piece by piece). At least, name of function and object
related to PCA are clear in FactoMineR:
PCA()
. There is also a non conventional handling of missing
observations. But the function is powerful and allows for several
investigations around the PCA. In vegan, there is no
PCA function, but a redundancy analysis rda()
, which
reduces to a classical PCA when arguments X =
and
Y =
are missing. It creates a c('cca', 'rda')
S3 object which is not optimized at all for holding pure PCA data (many
unnecessary items in it for a PCA). Finally, labdsv
uses the default prcomp()
, but it wraps it into a
pca
S3 object, in order to define additional plotting
methods that are consistent with the other analyses and objects in that
package. Note that both pca
S3 objects in
ade4 and labdsv are completely
different, and you are likely to get very bad results in case you load
both packages and mix their respective methods!
So, given that chaotic set of PCA functions in R, would it be
possible to design an object with minimal code that reuses code in the
stats package (princomp()
and prcomp()
),
provides a couple of additional methods to make decent variables and
individuals plots (possibly with ellipses or convex hulls for subgroups)
in a way that a whole analysis would be easy to perform and to read in R
code? We have tried to do so in the present SciViews
package.
It would have been nice to name our object pca
. But that
pca
S3 object could not be compatible with
ade4’s pca
object. So, we decided to give
it a different name: pcomp
.
pcomp
objectThe pcomp
S3 object, inherits from pca
and
princomp
. It returns a list with components:
loadings
: (also required for
labdsv’s pca
object). This is
$rotation
in prcomp
, and a
loadings
object in princomp
,
scores
: (also required for labdsv’s
pca
object). Note for scores in princomp
,
components are Comp.1
, Comp.2
, etc., in
prcomp
, they are named PC1
, PC2
,
…, as well as in pca
). So, we keep PC1
,
PC2
, … This is $x
in prcomp
. For
princomp()
, the argument scores = TRUE
(by
default) must be used to get these.
sdev
: (also required for **labdsv*’s
pca
object). princomp()
uses names (to rename
into PC1
, PC2
, …), while prcomp()
does not,
totdev
: the total deviance, as required to be
compliant with labdsv’s pca
object.
n.obs
: the number of observations,
center
: (use 0 for all, if not centered),
scale
: (use 1 for all, if not scaled),
method
: currently only either svd
(and
the computation is the same as prcomp()
), or
eigen
(and the computation is the same as
princomp()
),
call
: the matched call,
na.action
: if relevant.