--- title: "Statistical Models for 'SciViews::R'" author: "Philippe Grosjean & Guyliann Engels" date: "`r Sys.Date()`" output: rmarkdown::html_vignette: toc: true toc_depth: 3 fig_caption: yes vignette: > %\VignetteIndexEntry{Statistical Models for 'SciViews::R'} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- ```{r setup, include=FALSE} knitr::opts_chunk$set(collapse = TRUE, comment = "#>", fig.align = 'center', fig.width = 7, fig.height = 5, out.width = "100%") library(modelit) library(tabularise) library(chart) library(dplyr) ``` The {modelit} package enhances statistical models calculated using `lm()`, `glm()` or `nls()` with nice charts and rich-formatted tables. It used both the `fun$type(data = ...., formula)` syntax. It also considers the labels of the variables in its outputs if a `label` attribute is defined. ## Example Load the required packages for this example. ```{r} library(modelit) library(tabularise) library(chart) library(dplyr) ``` The `Loblolly` dataset is used to illustrate some {modelit} features. It focuses on the growth of Loblolly pine trees (*Pinus tadea* L. 1753). The dataset has three variables: `height`, `age`, and the `seed`. ```{r} data("Loblolly", package = "datasets") # convert feet to meters Loblolly$height <- round(Loblolly$height * 0.3048, 2) # Add labels and units Loblolly <- data.io::labelise(Loblolly, label = list(height = "Heigth", age = "Age", Seed = "Seed"), units = list(height = "m", age = "years")) # First and last rows of the dataset with tabularise tabularise$headtail(Loblolly) ``` Here is a plot of the data with `chart()`, automatically using labels and units defined here above: ```{r} chart(Loblolly, height ~ age %col=% Seed) + geom_line() ``` We only keep one measurement per tree (*alias* `Seed`) to get a dataset with independent observations to adjust a simple linear model. ```{r} set.seed(3652) pine <- dplyr::slice_sample(Loblolly, n = 1, by = Seed) chart(pine, height ~ age) + geom_point() ``` We fit a quadratic model to the data. ```{r} pine_lm <- lm(data = pine, height ~ age + I(age^2)) summary(pine_lm) ``` Compare the "raw" summary of the **lm** object above with the version formatted using `tabularise()`. ```{r} summary(pine_lm) |> tabularise() ``` The equation can be directly displayed in an R Markdown or Quarto document with the `equation(pine_lm)` code in an inline R chunk. `r equation(pine_lm)` Several plots are also available with `chart()`, starting with the graph showing the model fitted to the data: ```{r} chart(pine_lm) # Equivalent to : chart$model(pine_lm) ``` The plots of the analysis of the residuals are `chart()` variants that can be indicated in a short form as `chart$()` function and the possible types are: `"resfitted"`, `"qqplot"`, `"scalelocation"`, `"cooksd"`, `"resleverage"`, `"cookleverage"`, `"reshist"`, and `"resautocor"`. A composite plot, incorporating the four most used residual analysis charts, is obtained with the `"residuals"` type (note that the four charts are labeled A-D to facilitate their description in the legend). ```{r} chart$residuals(pine_lm) ``` ## Nonlinear model Other objects are also supported by {modelit}. For instance, a nonlinear model can be fitted to the same dataset. A Gompertz model is more suitable for such growth measurements. ```{r} pine_nls <- nls(data = pine, height ~ SSgompertz(age, a, b1, b2)) summary(pine_nls) |> tabularise() ``` The plot of the model superposed to the data can be obtained once again with `chart()`. ```{r} chart(pine_nls, lang = "fr") ```