Statistical Models for ‘SciViews::R’

The {modelit} package enhances statistical models calculated using lm(), glm() or nls() with nice charts and rich-formatted tables. It used both the fun$type(data = ...., formula) syntax. It also considers the labels of the variables in its outputs if a label attribute is defined.

Example

Load the required packages for this example.

library(modelit)
library(tabularise)
library(chart)
library(dplyr)

The Loblolly dataset is used to illustrate some {modelit} features. It focuses on the growth of Loblolly pine trees (Pinus tadea L. 1753). The dataset has three variables: height, age, and the seed.

data("Loblolly", package = "datasets")
# convert feet to meters
Loblolly$height <- round(Loblolly$height * 0.3048, 2)
# Add labels and units
Loblolly <- data.io::labelise(Loblolly,
  label = list(height = "Heigth", age = "Age", Seed = "Seed"),
  units = list(height = "m", age = "years"))

# First and last rows of the dataset with tabularise
tabularise$headtail(Loblolly)
#> Warning in set2(resolve(...)): The object is read-only and cannot be modified.
#> If you have to modify it for a legitimate reason, call the method $lock(FALSE)
#> on the object before $set(). Using $lock(FALSE) to modify the object will be
#> enforced in future versions of knitr and this warning will become an error.

Heigth [m]	Age [years]	Seed
1.37	3	301
3.32	5	301
8.75	10	301
12.72	15	301
16.06	20	301
...	...	...
2.76	5	331
7.88	10	331
11.93	15	331
14.97	20	331
18.13	25	331
First and last 5 rows of a total of 84

Here is a plot of the data with chart(), automatically using labels and units defined here above:

chart(Loblolly, height ~ age %col=% Seed) +
  geom_line()

We only keep one measurement per tree (alias Seed) to get a dataset with independent observations to adjust a simple linear model.

set.seed(3652)
pine <- dplyr::slice_sample(Loblolly, n = 1, by = Seed)
chart(pine, height ~ age) + 
  geom_point()

We fit a quadratic model to the data.

pine_lm <- lm(data = pine, height ~ age + I(age^2))
summary(pine_lm)
#> 
#> Call:
#> lm(formula = height ~ age + I(age^2), data = pine)
#> 
#> Residuals:
#>      Min       1Q   Median       3Q      Max 
#> -0.55076 -0.13703  0.08873  0.13593  0.64271 
#> 
#> Coefficients:
#>              Estimate Std. Error t value Pr(>|t|)    
#> (Intercept) -2.039295   0.289531  -7.043 2.14e-05 ***
#> age          1.134437   0.058449  19.409 7.37e-10 ***
#> I(age^2)    -0.012705   0.002095  -6.064 8.14e-05 ***
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> 
#> Residual standard error: 0.3345 on 11 degrees of freedom
#> Multiple R-squared:  0.9981, Adjusted R-squared:  0.9978 
#> F-statistic:  2951 on 2 and 11 DF,  p-value: 9.617e-16

Compare the “raw” summary of the lm object above with the version formatted using tabularise().

summary(pine_lm) |> 
  tabularise() 
#> Warning in set2(resolve(...)): The object is read-only and cannot be modified.
#> If you have to modify it for a legitimate reason, call the method $lock(FALSE)
#> on the object before $set(). Using $lock(FALSE) to modify the object will be
#> enforced in future versions of knitr and this warning will become an error.

Linear model
$\operatorname{Heigth \ [m]} = \alpha + \beta_{1}(\operatorname{Age \ [years]}) + \beta_{2}(\operatorname{Age^{2} \ [years^{2}]}) + \epsilon$
Term	Estimate	Standard Error	t value	p value
$\alpha$	-2.0393	0.2895	-7.04	2.14·10-05	***
$\beta_{1}$	1.1344	0.0584	19.41	7.37·10-10	***
$\beta_{2}$	-0.0127	0.0021	-6.06	8.14·10-05	***
0 <= '*' < 0.001 < '' < 0.01 < '*' < 0.05
Residuals range: [-0.5508, 0.6427]
Residuals standard error: 0.3345 on 11 degrees of freedom
Multiple R2: 0.9981 - adjusted R2: 0.9978
F-statistic: 2951 on 2 and 11 df - p value: 9.6175e-16

The equation can be directly displayed in an R Markdown or Quarto document with the equation(pine_lm) code in an inline R chunk.

Heigth [m] = α + β₁(Age [years]) + β₂(Age² [years²]) + ϵ

Several plots are also available with chart(), starting with the graph showing the model fitted to the data:

chart(pine_lm) # Equivalent to : chart$model(pine_lm)

The plots of the analysis of the residuals are chart() variants that can be indicated in a short form as chart$<type>() function and the possible types are: "resfitted", "qqplot", "scalelocation", "cooksd", "resleverage", "cookleverage", "reshist", and "resautocor". A composite plot, incorporating the four most used residual analysis charts, is obtained with the "residuals" type (note that the four charts are labeled A-D to facilitate their description in the legend).

chart$residuals(pine_lm)

Nonlinear model

Other objects are also supported by {modelit}. For instance, a nonlinear model can be fitted to the same dataset. A Gompertz model is more suitable for such growth measurements.

pine_nls <- nls(data = pine, height ~ SSgompertz(age, a, b1, b2))
summary(pine_nls) |> 
  tabularise()
#> Warning in set2(resolve(...)): The object is read-only and cannot be modified.
#> If you have to modify it for a legitimate reason, call the method $lock(FALSE)
#> on the object before $set(). Using $lock(FALSE) to modify the object will be
#> enforced in future versions of knitr and this warning will become an error.

Nonlinear least squares Gompertz model
$\begin{aligned} \operatorname{height} = a \cdot e^{(-b1 \cdot b2^{\operatorname{age}})} + \epsilon \end{aligned}$
Term	Estimate	Standard Error	tobs. value	p value
a	20.765	0.72637	28.6	1.13·10-11	***
b1	3.755	0.20549	18.3	1.40·10-09	***
b2	0.876	0.00827	105.9	< 2·10-16	***
0 <= '*' < 0.001 < '' < 0.01 < '*' < 0.05
Residual standard error : 0.4076 on 11 degrees of freedom
Number of iterations to convergence : 4
Achieved convergence tolerance : 5.673e-06

The plot of the model superposed to the data can be obtained once again with chart().

chart(pine_nls, lang = "fr")

- Example
- Nonlinear model