The {modelit} package enhances
statistical models calculated using lm(),
glm() or nls() with nice charts and
rich-formatted tables. It used both the
fun$type(data = ...., formula) syntax. It also considers
the labels of the variables in its outputs if a label
attribute is defined.
Load the required packages for this example.
The Loblolly dataset is used to illustrate some
{modelit} features. It focuses on the growth of Loblolly pine trees
(Pinus tadea L. 1753). The dataset has three variables:
height, age, and the seed.
data("Loblolly", package = "datasets")
# convert feet to meters
Loblolly$height <- round(Loblolly$height * 0.3048, 2)
# Add labels and units
Loblolly <- svBase::labelise(Loblolly,
label = list(height = "Heigth", age = "Age", Seed = "Seed"),
units = list(height = "m", age = "years"))
# First and last rows of the dataset with tabularise
tabularise$headtail(Loblolly)Heigth [m] | Age [years] | Seed |
|---|---|---|
1.37 | 3 | 301 |
3.32 | 5 | 301 |
8.75 | 10 | 301 |
12.72 | 15 | 301 |
16.06 | 20 | 301 |
... | ... | ... |
2.76 | 5 | 331 |
7.88 | 10 | 331 |
11.93 | 15 | 331 |
14.97 | 20 | 331 |
18.13 | 25 | 331 |
First and last 5 rows of a total of 84 | ||
Here is a plot of the data with chart(), automatically
using labels and units defined here above:
We only keep one measurement per tree (alias
Seed) to get a dataset with independent observations to
adjust a simple linear model.
set.seed(3652)
pine <- dplyr::slice_sample(Loblolly, n = 1, by = Seed)
chart(pine, height ~ age) +
geom_point()We fit a quadratic model to the data.
pine_lm <- lm(data = pine, height ~ age + I(age^2))
summary(pine_lm)
#>
#> Call:
#> lm(formula = height ~ age + I(age^2), data = pine)
#>
#> Residuals:
#> Min 1Q Median 3Q Max
#> -0.55076 -0.13703 0.08873 0.13593 0.64271
#>
#> Coefficients:
#> Estimate Std. Error t value Pr(>|t|)
#> (Intercept) -2.039295 0.289531 -7.043 2.14e-05 ***
#> age 1.134437 0.058449 19.409 7.37e-10 ***
#> I(age^2) -0.012705 0.002095 -6.064 8.14e-05 ***
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#>
#> Residual standard error: 0.3345 on 11 degrees of freedom
#> Multiple R-squared: 0.9981, Adjusted R-squared: 0.9978
#> F-statistic: 2951 on 2 and 11 DF, p-value: 9.617e-16Compare the “raw” summary of the lm object above
with the version formatted using tabularise().
summary(pine_lm) |>
tabularise()
#> Warning in set2(resolve(...)): The object is read-only and cannot be modified.
#> If you have to modify it for a legitimate reason, call the method $lock(FALSE)
#> on the object before $set(). Using $lock(FALSE) to modify the object will be
#> enforced in future versions of knitr and this warning will become an error.Linear model | |||||
|---|---|---|---|---|---|
| |||||
Term | Estimate | Standard Error | t value | p value | |
| -2.0393 | 0.2895 | -7.04 | 2.14·10-05 | *** |
| 1.1344 | 0.0584 | 19.41 | 7.37·10-10 | *** |
| -0.0127 | 0.0021 | -6.06 | 8.14·10-05 | *** |
0 <= '***' < 0.001 < '**' < 0.01 < '*' < 0.05 | |||||
Residuals range: [-0.5508, 0.6427] | |||||
The equation can be directly displayed in an R Markdown or Quarto
document with the equation(pine_lm) code in an inline R
chunk.
\[ \operatorname{Heigth \ [m]} = \alpha + \beta_{1}(\operatorname{Age \ [years]}) + \beta_{2}(\operatorname{Age^{2} \ [years^{2}]}) + \epsilon \]
Several plots are also available with chart(), starting
with the graph showing the model fitted to the data:
chart(pine_lm) # Equivalent to : chart$model(pine_lm)
#> Warning: `aes_string()` was deprecated in ggplot2 3.0.0.
#> ℹ Please use tidy evaluation idioms with `aes()`.
#> ℹ See also `vignette("ggplot2-in-packages")` for more information.
#> ℹ The deprecated feature was likely used in the modelit package.
#> Please report the issue at <https://github.com/SciViews/modelit/issues>.
#> This warning is displayed once per session.
#> Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
#> generated.The plots of the analysis of the residuals are chart()
variants that can be indicated in a short form as
chart$<type>() function and the possible types are:
"resfitted", "qqplot",
"scalelocation", "cooksd",
"resleverage", "cookleverage",
"reshist", and "resautocor". A composite plot,
incorporating the four most used residual analysis charts, is obtained
with the "residuals" type (note that the four charts are
labeled A-D to facilitate their description in the legend).
chart$residuals(pine_lm)
#> Warning: Using `size` aesthetic for lines was deprecated in ggplot2 3.4.0.
#> ℹ Please use `linewidth` instead.
#> ℹ The deprecated feature was likely used in the modelit package.
#> Please report the issue at <https://github.com/SciViews/modelit/issues>.
#> This warning is displayed once per session.
#> Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
#> generated.Other objects are also supported by {modelit}. For instance, a nonlinear model can be fitted to the same dataset. A Gompertz model is more suitable for such growth measurements.
pine_nls <- nls(data = pine, height ~ SSgompertz(age, a, b1, b2))
summary(pine_nls) |>
tabularise()
#> Warning in set2(resolve(...)): The object is read-only and cannot be modified.
#> If you have to modify it for a legitimate reason, call the method $lock(FALSE)
#> on the object before $set(). Using $lock(FALSE) to modify the object will be
#> enforced in future versions of knitr and this warning will become an error.Nonlinear least squares Gompertz model | |||||
|---|---|---|---|---|---|
| |||||
Term | Estimate | Standard Error | tobs. value | p value | signif |
a | 20.765 | 0.72637 | 28.6 | 1.13·10-11 | *** |
b1 | 3.755 | 0.20549 | 18.3 | 1.40·10-09 | *** |
b2 | 0.876 | 0.00827 | 105.9 | < 2·10-16 | *** |
0 <= '***' < 0.001 < '**' < 0.01 < '*' < 0.05 | |||||
Residuals standard error: 0.4076 on 11 degrees of freedom | |||||
Number of iterations to convergence: 4 | |||||
The plot of the model superposed to the data can be obtained once
again with chart().