The {modelit} package enhances
statistical models calculated using lm()
,
glm()
or nls()
with nice charts and
rich-formatted tables. It used both the
fun$type(data = ...., formula)
syntax. It also considers
the labels of the variables in its outputs if a label
attribute is defined.
Load the required packages for this example.
The Loblolly
dataset is used to illustrate some
{modelit} features. It focuses on the growth of Loblolly pine trees
(Pinus tadea L. 1753). The dataset has three variables:
height
, age
, and the seed
.
data("Loblolly", package = "datasets")
# convert feet to meters
Loblolly$height <- round(Loblolly$height * 0.3048, 2)
# Add labels and units
Loblolly <- data.io::labelise(Loblolly,
label = list(height = "Heigth", age = "Age", Seed = "Seed"),
units = list(height = "m", age = "years"))
# First and last rows of the dataset with tabularise
tabularise$headtail(Loblolly)
#> Warning in set2(resolve(...)): The object is read-only and cannot be modified.
#> If you have to modify it for a legitimate reason, call the method $lock(FALSE)
#> on the object before $set(). Using $lock(FALSE) to modify the object will be
#> enforced in future versions of knitr and this warning will become an error.
Heigth [m] | Age [years] | Seed |
---|---|---|
1.37 | 3 | 301 |
3.32 | 5 | 301 |
8.75 | 10 | 301 |
12.72 | 15 | 301 |
16.06 | 20 | 301 |
... | ... | ... |
2.76 | 5 | 331 |
7.88 | 10 | 331 |
11.93 | 15 | 331 |
14.97 | 20 | 331 |
18.13 | 25 | 331 |
First and last 5 rows of a total of 84 |
Here is a plot of the data with chart()
, automatically
using labels and units defined here above:
We only keep one measurement per tree (alias
Seed
) to get a dataset with independent observations to
adjust a simple linear model.
set.seed(3652)
pine <- dplyr::slice_sample(Loblolly, n = 1, by = Seed)
chart(pine, height ~ age) +
geom_point()
We fit a quadratic model to the data.
pine_lm <- lm(data = pine, height ~ age + I(age^2))
summary(pine_lm)
#>
#> Call:
#> lm(formula = height ~ age + I(age^2), data = pine)
#>
#> Residuals:
#> Min 1Q Median 3Q Max
#> -0.55076 -0.13703 0.08873 0.13593 0.64271
#>
#> Coefficients:
#> Estimate Std. Error t value Pr(>|t|)
#> (Intercept) -2.039295 0.289531 -7.043 2.14e-05 ***
#> age 1.134437 0.058449 19.409 7.37e-10 ***
#> I(age^2) -0.012705 0.002095 -6.064 8.14e-05 ***
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#>
#> Residual standard error: 0.3345 on 11 degrees of freedom
#> Multiple R-squared: 0.9981, Adjusted R-squared: 0.9978
#> F-statistic: 2951 on 2 and 11 DF, p-value: 9.617e-16
Compare the “raw” summary of the lm object above
with the version formatted using tabularise()
.
summary(pine_lm) |>
tabularise()
#> Warning in set2(resolve(...)): The object is read-only and cannot be modified.
#> If you have to modify it for a legitimate reason, call the method $lock(FALSE)
#> on the object before $set(). Using $lock(FALSE) to modify the object will be
#> enforced in future versions of knitr and this warning will become an error.
Linear model | |||||
---|---|---|---|---|---|
| |||||
Term | Estimate | Standard Error | t value | p value | |
| -2.0393 | 0.2895 | -7.04 | 2.14·10-05 | *** |
| 1.1344 | 0.0584 | 19.41 | 7.37·10-10 | *** |
| -0.0127 | 0.0021 | -6.06 | 8.14·10-05 | *** |
0 <= '***' < 0.001 < '**' < 0.01 < '*' < 0.05 | |||||
Residuals range: [-0.5508, 0.6427] | |||||
Residuals standard error: 0.3345 on 11 degrees of freedom | |||||
Multiple R2: 0.9981 - adjusted R2: 0.9978 | |||||
F-statistic: 2951 on 2 and 11 df - p value: 9.6175e-16 |
The equation can be directly displayed in an R Markdown or Quarto
document with the equation(pine_lm)
code in an inline R
chunk.
Heigth [m] = α + β1(Age [years]) + β2(Age2 [years2]) + ϵ
Several plots are also available with chart()
, starting
with the graph showing the model fitted to the data:
The plots of the analysis of the residuals are chart()
variants that can be indicated in a short form as
chart$<type>()
function and the possible types are:
"resfitted"
, "qqplot"
,
"scalelocation"
, "cooksd"
,
"resleverage"
, "cookleverage"
,
"reshist"
, and "resautocor"
. A composite plot,
incorporating the four most used residual analysis charts, is obtained
with the "residuals"
type (note that the four charts are
labeled A-D to facilitate their description in the legend).
Other objects are also supported by {modelit}. For instance, a nonlinear model can be fitted to the same dataset. A Gompertz model is more suitable for such growth measurements.
pine_nls <- nls(data = pine, height ~ SSgompertz(age, a, b1, b2))
summary(pine_nls) |>
tabularise()
#> Warning in set2(resolve(...)): The object is read-only and cannot be modified.
#> If you have to modify it for a legitimate reason, call the method $lock(FALSE)
#> on the object before $set(). Using $lock(FALSE) to modify the object will be
#> enforced in future versions of knitr and this warning will become an error.
Nonlinear least squares Gompertz model | |||||
---|---|---|---|---|---|
| |||||
Term | Estimate | Standard Error | tobs. value | p value | |
a | 20.765 | 0.72637 | 28.6 | 1.13·10-11 | *** |
b1 | 3.755 | 0.20549 | 18.3 | 1.40·10-09 | *** |
b2 | 0.876 | 0.00827 | 105.9 | < 2·10-16 | *** |
0 <= '***' < 0.001 < '**' < 0.01 < '*' < 0.05 | |||||
Residual standard error : 0.4076 on 11 degrees of freedom | |||||
Number of iterations to convergence : 4 | |||||
Achieved convergence tolerance : 5.673e-06 |
The plot of the model superposed to the data can be obtained once
again with chart()
.