--- title: "Introduction to {equatiomatic}" output: rmarkdown::html_vignette vignette: > %\VignetteIndexEntry{Introduction to {equatiomatic}} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- ```{r, include = FALSE} library(equatiomatic) knitr::opts_chunk$set( collapse = TRUE, comment = "#>" ) ``` # Background and Motivation If you use R to statistically analyze your data, you might be used to seeing and interpreting the output from functions for models, like `lm()` and `glm()`. For example, here is the code and output for a single regression model, fit using the `lm()` function. We'll examine how the depth of penguin's bills relates to their bill length using data from the [{palmerpenguins}](https://github.com/allisonhorst/palmerpenguins) package: ```{r, warning = FALSE, results = 'markup'} lm(bill_length_mm ~ bill_depth_mm, penguins) ``` At the same time, you might have come across---or written!---equations that appear in books, journal articles, and reports. Equations that look like this: ```{r, echo = FALSE} library(equatiomatic) extract_eq(lm(bill_length_mm ~ bill_depth_mm, penguins)) ``` The purpose of [{equatiomatic}](https://datalorax.github.io/equatiomatic/) is to help you go from model output, like the `lm()` output above, to the equation we just saw. As we detail in this vignette, {equatiomatic} provides the underlying equation corresponding to the statistical model output. It does this through the use of the [TeX](https://en.wikipedia.org/wiki/TeX) equation formatting system, which can then be included in documents written in a number of formats, including (importantly) R Markdown documents rendered to either HTML or PDF formats. # Example Usage ## A basic example for a multiple linear regression Let's look at another basic example, again using the {palmerpenguins} data: ```{r fit-m1} library(equatiomatic) # fit a basic multiple linear regression model m <- lm(bill_length_mm ~ bill_depth_mm + flipper_length_mm, penguins) ``` Now we can pull the TeX code with `extract_eq` ```{r extract_eq1} extract_eq(m) ``` You can of course set `echo = FALSE` as well, and then you'll get just the equation, which will look like the below. ```{r extract_eq1-no-echo, echo = FALSE} extract_eq(m) ``` Let's dive into a bit more complex example. ## An example with a slightly more complicated model The above was super simple, but---often---you have categorical variables with lots of levels and interactions, which can make the output of your models (and writing a model equation for your models) a bit more complicated. {equatiomatic} is intended to "smooth out" some of these issues, requiring the same process as in the above example to extract (and present) the model's equation: For example, here is an example using a categorical variable (`island`) in an interaction with the `bill_depth_mm` variable we used in the example above: ```{r eq2} m2 <- lm(bill_length_mm ~ bill_depth_mm * island, penguins) extract_eq(m2) ``` Sometimes, for such models, the equations can get overly long. That's where the `wrap` and `terms_per_line` arguments come in. ```{r eq2-wrap} extract_eq(m2, wrap = TRUE) # default terms_per_line = 4 extract_eq(m2, wrap = TRUE, terms_per_line = 2) ``` Maybe you want different intercept notation, such as $\beta_0$? Simply pass "beta" to the `intercept` argument, as follows. ```{r eq2-intercept-beta} extract_eq(m2, wrap = TRUE, intercept = "beta") ``` We also wrap all the variable names in `\operatorname` by default so they show up as plain text, but if you'd like your variable names to be italicized just set `ital_vars = TRUE`. ```{r eq2-ital-vars} extract_eq(m2, wrap = TRUE, ital_vars = TRUE) ``` ## Raw TeX code Currently, the `intercept` argument defaults to `"alpha"` and only takes one additional argument, `"beta"`. However, the `raw_tex` and `greek` arguments allows you to specify whatever syntax you would like both for the intercept and for the coefficients. For example ```{r raw-tex} extract_eq(m2, wrap = TRUE, intercept = "\\hat{\\phi}", greek = "\\hat{\\gamma}", raw_tex = TRUE ) ``` ## Coefficients In many cases, you're interested more in including the coefficient estimates (e.g., `3.04`), than the Greek symbols (e.g., $\\beta\_1$). This may be helpful for communicating the results of a model (and, possibly, for teaching about the statistical model). To do this, simply change the `use_coefs` argument: ```{r use_coefs} extract_eq(m2, wrap = TRUE, use_coefs = TRUE) ``` # Other models While the above examples focused on regression models (and the `lm()` function), {equatiomatic} supports output from other model types as well, specifically: ```{r echo = FALSE} supported <- data.frame( model = c( "linear regression", "logistic regression", "probit regression", "ordinal logistic regression", "ordinal probit regression", "auto-regressive integrated moving average", "regression with auto-regressive integrated moving average errors" ), packages = c( "`stats::lm`", "`stats::glm(family = binomial(link = 'logit'))`", "`stats::glm(family = binomial(link = 'probit'))`", "`MASS::polr(method = 'logistic')`; `ordinal::clm(link = 'logit')`", "`MASS::polr(method = 'probit')`; `ordinal::clm(link = 'probit')`", "`forecast::Arima`", "`forecast::Arima`" ) ) knitr::kable(supported, col.names = c("Model", "Packages/Functions")) ``` Here are a few basic examples using these supported model types: ## Logistic Regression ```{r log-reg1} lr <- glm(sex ~ species * bill_length_mm, data = penguins, family = binomial(link = "logit") ) extract_eq(lr, wrap = TRUE) ``` You can also (optionally) show the how the data are assumed to be distributed. ```{r log-reg2} extract_eq(lr, wrap = TRUE, show_distribution = TRUE) ``` ## Probit Regression Probit regression works similarly to logistic regression: ```{r prob-reg1} pr <- glm(sex ~ species * bill_length_mm, data = penguins, family = binomial(link = "probit") ) extract_eq(pr, wrap = TRUE) ``` Again, you can (optionally) show the how the data are assumed distributed. ```{r prob-reg2} extract_eq(pr, wrap = TRUE, show_distribution = TRUE) ``` ## Ordinal Regression For these examples, we'll use wine tasting data from the [{ordinal}](https://github.com/runehaubo/ordinal) package. If you don't already have this package, you can download it with ```{r install-ordinal, eval = FALSE} install.packages("ordinal") ``` The data look like this ```{r wine-data, echo = FALSE} knitr::kable(head(ordinal::wine), align = "l") ``` We'll fit a model from the documentation, with the ordinal rating response predicted by an interaction between the temperature and and contact. You can fit ordinal response models with either `MASS::polr` or `ordinal::clm`, and {equatiomatic} works with either, and with logistic or probit link functions. ### {MASS} #### Logit method ```{r mass-ologit} mass_ologit <- MASS::polr(rating ~ temp * contact, data = ordinal::wine ) extract_eq(mass_ologit, wrap = TRUE, terms_per_line = 2) ``` #### Probit method ```{r mass-oprobit} mass_oprobit <- MASS::polr(rating ~ temp * contact, data = ordinal::wine, method = "probit" ) extract_eq(mass_oprobit, wrap = TRUE, terms_per_line = 2) ``` ### {ordinal} And we can do the same thing with the {ordinal} package #### Logit method ```{r ordinal-ologit} ordinal_ologit <- ordinal::clm(rating ~ temp * contact, data = ordinal::wine, link = "logit" ) extract_eq(ordinal_ologit, wrap = TRUE, terms_per_line = 2) ``` #### Probit method ```{r ordinal-oprobit} ordinal_probit <- ordinal::clm(rating ~ temp * contact, data = ordinal::wine, link = "probit" ) extract_eq(ordinal_probit, wrap = TRUE, terms_per_line = 2) ``` # Things the package does not yet do We are aware of a few things the package doesn't yet do, but that we hope to add later. These include: - Math functions (e.g., `log`, `exp`, `sqrt`) - Polynomial (e.g., `lm(y ~ poly(x, 3))`) - A range of other models, including multi-level (or mixed effects) models Regarding this last point, we are hopeful that we can incorporate essentially all the models covered by [broom](https://broom.tidymodels.org). Multilevel models are particularly high on our wish list. But we have not yet had the time to develop these. # Contributing We would LOVE to have you as a contributor! Is there a model that we don't currently fit that you want implemented? There are a few ways to go about this. You can either (a) fork the repository and implement the method on your own, then submit a PR, or (b) file an issue. If you file an issue it would be *really* helpful if you could provide an example of a fitted model and what the equation for that model should look like. We will try to get to these as soon as possible. Also, the next planned vignette (at this particular moment) is on contributing to the package, with a step-by-step example of implementing a new method. So stay tuned for that, if you're interested in (a) but not yet sure how to get started.