Statistical Models for ‘SciViews::R’

The {modelit} package enhances statistical models calculated using lm(), glm() or nls() with nice charts and rich-formatted tables. It used both the fun$type(data = ...., formula) syntax. It also considers the labels of the variables in its outputs if a label attribute is defined.

Example

Load the required packages for this example.

library(modelit)
library(tabularise)
library(chart)
library(dplyr)

The Loblolly dataset is used to illustrate some {modelit} features. It focuses on the growth of Loblolly pine trees (Pinus tadea L. 1753). The dataset has three variables: height, age, and the seed.

data("Loblolly", package = "datasets")
# convert feet to meters
Loblolly$height <- round(Loblolly$height * 0.3048, 2)
# Add labels and units
Loblolly <- data.io::labelise(Loblolly,
  label = list(height = "Heigth", age = "Age", Seed = "Seed"),
  units = list(height = "m", age = "years"))

# First and last rows of the dataset with tabularise
tabularise$headtail(Loblolly)
#> Warning in set2(resolve(...)): The object is read-only and cannot be modified.
#> If you have to modify it for a legitimate reason, call the method $lock(FALSE)
#> on the object before $set(). Using $lock(FALSE) to modify the object will be
#> enforced in future versions of knitr and this warning will become an error.

Heigth [m]

Age [years]

Seed

1.37

3

301

3.32

5

301

8.75

10

301

12.72

15

301

16.06

20

301

...

...

...

2.76

5

331

7.88

10

331

11.93

15

331

14.97

20

331

18.13

25

331

First and last 5 rows of a total of 84

Here is a plot of the data with chart(), automatically using labels and units defined here above:

chart(Loblolly, height ~ age %col=% Seed) +
  geom_line()

We only keep one measurement per tree (alias Seed) to get a dataset with independent observations to adjust a simple linear model.

set.seed(3652)
pine <- dplyr::slice_sample(Loblolly, n = 1, by = Seed)
chart(pine, height ~ age) + 
  geom_point()

We fit a quadratic model to the data.

pine_lm <- lm(data = pine, height ~ age + I(age^2))
summary(pine_lm)
#> 
#> Call:
#> lm(formula = height ~ age + I(age^2), data = pine)
#> 
#> Residuals:
#>      Min       1Q   Median       3Q      Max 
#> -0.55076 -0.13703  0.08873  0.13593  0.64271 
#> 
#> Coefficients:
#>              Estimate Std. Error t value Pr(>|t|)    
#> (Intercept) -2.039295   0.289531  -7.043 2.14e-05 ***
#> age          1.134437   0.058449  19.409 7.37e-10 ***
#> I(age^2)    -0.012705   0.002095  -6.064 8.14e-05 ***
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> 
#> Residual standard error: 0.3345 on 11 degrees of freedom
#> Multiple R-squared:  0.9981, Adjusted R-squared:  0.9978 
#> F-statistic:  2951 on 2 and 11 DF,  p-value: 9.617e-16

Compare the “raw” summary of the lm object above with the version formatted using tabularise().

summary(pine_lm) |> 
  tabularise() 
#> Warning in set2(resolve(...)): The object is read-only and cannot be modified.
#> If you have to modify it for a legitimate reason, call the method $lock(FALSE)
#> on the object before $set(). Using $lock(FALSE) to modify the object will be
#> enforced in future versions of knitr and this warning will become an error.

Linear model

Heigth [m]=α+β1(Age [years])+β2(Age2 [years2])+ϵ\operatorname{Heigth \ [m]} = \alpha + \beta_{1}(\operatorname{Age \ [years]}) + \beta_{2}(\operatorname{Age^{2} \ [years^{2}]}) + \epsilon

Term

Estimate

Standard Error

t value

p value

α\alpha

-2.0393

0.2895

-7.04

2.14·10-05

***

β1\beta_{1}

1.1344

0.0584

19.41

7.37·10-10

***

β2\beta_{2}

-0.0127

0.0021

-6.06

8.14·10-05

***

0 <= '***' < 0.001 < '**' < 0.01 < '*' < 0.05

Residuals range: [-0.5508, 0.6427]

Residuals standard error: 0.3345 on 11 degrees of freedom

Multiple R2: 0.9981 - adjusted R2: 0.9978

F-statistic: 2951 on 2 and 11 df - p value: 9.6175e-16

The equation can be directly displayed in an R Markdown or Quarto document with the equation(pine_lm) code in an inline R chunk.

Heigth [m] = α + β1(Age [years]) + β2(Age2 [years2]) + ϵ

Several plots are also available with chart(), starting with the graph showing the model fitted to the data:

chart(pine_lm) # Equivalent to : chart$model(pine_lm)

The plots of the analysis of the residuals are chart() variants that can be indicated in a short form as chart$<type>() function and the possible types are: "resfitted", "qqplot", "scalelocation", "cooksd", "resleverage", "cookleverage", "reshist", and "resautocor". A composite plot, incorporating the four most used residual analysis charts, is obtained with the "residuals" type (note that the four charts are labeled A-D to facilitate their description in the legend).

chart$residuals(pine_lm)

Nonlinear model

Other objects are also supported by {modelit}. For instance, a nonlinear model can be fitted to the same dataset. A Gompertz model is more suitable for such growth measurements.

pine_nls <- nls(data = pine, height ~ SSgompertz(age, a, b1, b2))
summary(pine_nls) |> 
  tabularise()
#> Warning in set2(resolve(...)): The object is read-only and cannot be modified.
#> If you have to modify it for a legitimate reason, call the method $lock(FALSE)
#> on the object before $set(). Using $lock(FALSE) to modify the object will be
#> enforced in future versions of knitr and this warning will become an error.

Nonlinear least squares Gompertz model

height=ae(b1b2age)+ϵ\begin{aligned} \operatorname{height} = a \cdot e^{(-b1 \cdot b2^{\operatorname{age}})} + \epsilon \end{aligned}

Term

Estimate

Standard Error

tobs. value

p value

a

20.765

0.72637

28.6

1.13·10-11

***

b1

3.755

0.20549

18.3

1.40·10-09

***

b2

0.876

0.00827

105.9

< 2·10-16

***

0 <= '***' < 0.001 < '**' < 0.01 < '*' < 0.05

Residual standard error : 0.4076 on 11 degrees of freedom

Number of iterations to convergence : 4

Achieved convergence tolerance : 5.673e-06

The plot of the model superposed to the data can be obtained once again with chart().

chart(pine_nls, lang = "fr")