Title: | Transform Models into 'LaTeX' Equations |
---|---|
Description: | The goal of 'equatiomatic' is to reduce the pain associated with writing 'LaTeX' formulas from fitted models. The primary function of the package, extract_eq(), takes a fitted model object as its input and returns the corresponding 'LaTeX' code for the model. |
Authors: | Daniel Anderson [aut], Andrew Heiss [aut], Jay Sumners [aut], Joshua Rosenberg [ctb], Jonathan Sidi [ctb], Ellis Hughes [ctb], Thomas Fung [ctb], Reza Norouzian [ctb], Indrajeet Patil [ctb] (<https://orcid.org/0000-0003-1995-6531>, @patilindrajeets), Quinn White [ctb], David Kane [ctb], Philippe Grosjean [cre] |
Maintainer: | Philippe Grosjean <[email protected]> |
License: | CC BY 4.0 |
Version: | 0.3.4 |
Built: | 2024-12-22 05:05:08 UTC |
Source: | https://github.com/datalorax/equatiomatic |
Arrest data from Gelman & Hill's book, used in Chapter 6 (and others). The data have been aggregated by precinct and race/ethnicity, with the sum of prior arrests and stops calculated. You can download the original data here: http://www.stat.columbia.edu/~gelman/arm/examples/police/
arrests
A tibble with 225 rows and 4 variables:
An integer denoting the precinct identification number.
A factor with the coded race/ethnicity
The number of police stops
The number of prior arrests (this is used as an offset variable in the book)
Extract the variable names from a model to produce a 'LaTeX' equation.
Supports any model where there is a broom::tidy() method. This is a generic function with methods for lmerMod objects obtained with lme4::lmer(), glmerMod objects with lme4::glmer(), forecast_ARIMA with forecast::Arima() and default, with the latter further covering most "base" R models implemented in broom::tidy() like lm objects with stats::lm(), glm objects with stats::glm() or polr objects with MASS::polr(). The default method also supports clm objects obtained with ordinal::clm().
extract_eq(
  model,
  intercept = "alpha",
  greek = "beta",
  greek_colors = NULL,
  subscript_colors = NULL,
  var_colors = NULL,
  var_subscript_colors = NULL,
  raw_tex = FALSE,
  swap_var_names = NULL,
  swap_subscript_names = NULL,
  ital_vars = FALSE,
  label = NULL,
  index_factors = FALSE,
  show_distribution = FALSE,
  wrap = FALSE,
  terms_per_line = 4,
  operator_location = "end",
  align_env = "aligned",
  use_coefs = FALSE,
  coef_digits = 2,
  fix_signs = TRUE,
  font_size = NULL,
  mean_separate = NULL,
  return_variances = FALSE,
  se_subscripts = FALSE,
  ...
)
model |
A fitted model |
intercept |
How should the intercept be displayed? Default is "alpha". |
greek |
What notation should be used for coefficients? Currently only accepts "beta". |
greek_colors |
The colors of the greek notation in the equation. Must be a single color (named or HTML hex code) or a vector of colors (which will be recycled if smaller than the number of terms in the model). When rendering to PDF, I suggest using HTML hex codes, as not all named colors are recognized by LaTeX, but equatiomatic will internally create the color definitions for you if HTML codes are supplied. Note that this is not yet implemented for mixed effects models (lme4). |
subscript_colors |
The colors of the subscripts for the greek notation.
The argument structure is equivalent to |
var_colors |
The color of the variable names. This takes a named vector
of the form |
var_subscript_colors |
The colors of the factor subscripts for
categorical variables. The interface for this is equivalent to
|
raw_tex |
Logical. Is the greek code being passed to denote coefficients raw tex code? |
swap_var_names |
A vector of the form c("old_var_name" = "new name"). For example: c("bill_length_mm" = "Bill Length (MM)"). |
swap_subscript_names |
A vector of the form c("old_subscript_name" = "new name"). For example: c("f" = "Female"). |
ital_vars |
Logical, defaults to FALSE. |
label |
A label for the equation, which can then be used for in-text
references. See example here.
Note that this only works for PDF output. The in-text references also
must match the label exactly, and must be formatted as
|
index_factors |
Logical, defaults to FALSE. |
show_distribution |
Logical. When fitting a logistic or probit regression, should the binomial distribution be displayed? Defaults to FALSE. |
wrap |
Logical, defaults to FALSE. |
terms_per_line |
Integer, defaults to 4. The number of right-hand side terms to include per line. Used only when wrap = TRUE. |
operator_location |
Character, one of “end” (the default) or
“start”. When terms are split across multiple lines, they are split
at mathematical operators like |
align_env |
TeX environment to wrap around equation. Must be one of
|
use_coefs |
Logical, defaults to FALSE. |
coef_digits |
Integer, defaults to 2. The number of decimal places to round to when displaying model estimates. |
fix_signs |
Logical, defaults to TRUE. |
font_size |
The font size of the equation. Defaults to default of the output format. Takes any of the standard LaTeX arguments (see here). |
mean_separate |
Currently only support for |
return_variances |
Logical. When |
se_subscripts |
Logical. If |
... |
Additional arguments (for future development; not currently used). |
The different methods all use the same arguments, but not all arguments are suitable for all models. Check the argument descriptions above to determine whether a feature is implemented for a given model.
A character string of class “equation”.
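As a brief illustration of the renaming arguments and the return value described above, the following sketch fits a model on the penguins data shipped with the package, relabels one variable and one factor-level subscript, and inspects the class of the result. The replacement label "Chin" is an arbitrary choice for illustration, and the level name "Chinstrap" is taken from the dataset description.

```r
library(equatiomatic)

# Fit a linear model on the penguins data bundled with the package
mod <- lm(body_mass_g ~ bill_length_mm + species,
          data = equatiomatic::penguins)

# Relabel a variable name and a factor-level subscript in the equation
# ("Chin" is an arbitrary illustrative label)
eq <- extract_eq(
  mod,
  swap_var_names = c("bill_length_mm" = "Bill Length (MM)"),
  swap_subscript_names = c("Chinstrap" = "Chin")
)

# The returned object is a character string of class "equation"
class(eq)
```

Printing eq at the console shows the LaTeX code, which can then be pasted into a document or left to knitr to render.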
library(equatiomatic)

# Simple model
mod1 <- lm(mpg ~ cyl + disp, mtcars)
extract_eq(mod1)

# Include all variables
mod2 <- lm(mpg ~ ., mtcars)
extract_eq(mod2)

# Works for categorical variables too, putting levels as subscripts
mod3 <- lm(body_mass_g ~ bill_length_mm + species, penguins)
extract_eq(mod3)

set.seed(8675309)
d <- data.frame(
  cat1 = rep(letters[1:3], 100),
  cat2 = rep(LETTERS[1:3], each = 100),
  cont1 = rnorm(300, 100, 1),
  cont2 = rnorm(300, 50, 5),
  out = rnorm(300, 10, 0.5)
)
mod4 <- lm(out ~ ., d)
extract_eq(mod4)

# Don't italicize terms
extract_eq(mod1, ital_vars = FALSE)

# Wrap equations in an "aligned" environment
extract_eq(mod2, wrap = TRUE)

# Wider equation wrapping
extract_eq(mod2, wrap = TRUE, terms_per_line = 4)

# Include model estimates instead of Greek letters
extract_eq(mod2, wrap = TRUE, terms_per_line = 2, use_coefs = TRUE)

# Don't fix doubled-up "+ -" signs
extract_eq(mod2, wrap = TRUE, terms_per_line = 4, use_coefs = TRUE, fix_signs = FALSE)

# Use indices for factors instead of subscripts
extract_eq(mod2, wrap = TRUE, terms_per_line = 4, index_factors = TRUE)

# Use other model types, like glm
set.seed(8675309)
d <- data.frame(
  out = sample(0:1, 100, replace = TRUE),
  cat1 = rep(letters[1:3], 100),
  cat2 = rep(LETTERS[1:3], each = 100),
  cont1 = rnorm(300, 100, 1),
  cont2 = rnorm(300, 50, 5)
)
mod5 <- glm(out ~ ., data = d, family = binomial(link = "logit"))
extract_eq(mod5, wrap = TRUE)
Format 'LaTeX' equations built with extract_eq().
## S3 method for class 'equation'
format(x, ..., latex = knitr::is_latex_output())
x |
'LaTeX' equation built with extract_eq(). |
... |
not used |
latex |
Logical, whether the output is LaTeX or not. The default value uses knitr::is_latex_output(). |
A character string with the equation formatted either as proper LaTeX code, or as a display equation tag (surrounded by $$...$$) for R Markdown or Quarto documents.
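The two output forms described above can be compared with a small sketch like the one below. It only relies on the documented latex argument; the exact strings produced are not shown here because they depend on the package version.

```r
library(equatiomatic)

mod <- lm(mpg ~ cyl + disp, data = mtcars)
eq <- extract_eq(mod)

# Ask for proper LaTeX code, as used for PDF output
tex <- format(eq, latex = TRUE)

# Ask for the display-equation form used in R Markdown or Quarto
md <- format(eq, latex = FALSE)

# Inspect both results side by side
cat(tex, "\n")
cat(md, "\n")
```

In normal use the latex argument can be left at its default, so that the form is chosen from the current knitr output format.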
This is the dataset used throughout Raudenbush & Bryk (2002).
hsb
A tibble with 7185 rows and 8 variables:
An integer denoting the school identification number. There are 160 unique schools.
Individual students' math score.
The number of students in the school.
A dummy variable (integer) denoting whether the school is public (sector = 0) or Catholic (sector = 1). There are 90 public schools and 70 Catholic schools.
A group-mean centered SES variable at the school level
A dummy variable indicating if the student was coded as white (minority = 0) or not (minority = 1).
A dummy variable indicating if the student was coded as female (female = 1) or not (female = 0).
A student-level composite variable indicating the student's socio-economic status.
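Because students are nested within schools, this dataset lends itself to the lmerMod method of extract_eq(). The sketch below fits a random-intercept model; the column names math, ses and sch.id are assumptions that are not stated in this page, so check names(hsb) before running it.

```r
library(equatiomatic)
library(lme4)

# ASSUMPTION: the columns are named `math` (score), `ses` (student SES)
# and `sch.id` (school identifier); verify with names(equatiomatic::hsb)
mod_hsb <- lmer(math ~ ses + (1 | sch.id), data = equatiomatic::hsb)

# extract_eq() dispatches on the lmerMod class of the fitted model
eq_hsb <- extract_eq(mod_hsb)
eq_hsb
```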
Print 'LaTeX' equations built with extract_eq() nicely in R Markdown environments.
## S3 method for class 'equation'
knit_print(
  x,
  ...,
  tex_packages = "\\renewcommand*\\familydefault{\\rmdefault}"
)
x |
'LaTeX' equation built with extract_eq(). |
... |
not used |
tex_packages |
A string with LaTeX code to include in the header, usually to include LaTeX packages in the output. |
A string with the equation formatted according to R Markdown's output format (different output for HTML, PDF, docx, gfm, markdown_strict). The format is detected automatically, so you do not have to specify it.
Data originally from palmerpenguins. Includes measurements for penguin species, island in Palmer Archipelago, size (flipper length, body mass, bill dimensions), and sex.
penguins
A tibble with 344 rows and 8 variables:
a factor denoting penguin species (Adélie, Chinstrap and Gentoo)
a factor denoting island in Palmer Archipelago, Antarctica (Biscoe, Dream or Torgersen)
a number denoting bill length (millimeters)
a number denoting bill depth (millimeters)
an integer denoting flipper length (millimeters)
an integer denoting body mass (grams)
a factor denoting penguin sex (female, male)
an integer denoting the study year (2007, 2008, or 2009)
Adélie penguins: Palmer Station Antarctica LTER and K. Gorman. 2020. Structural size measurements and isotopic signatures of foraging among adult male and female Adélie penguins (Pygoscelis adeliae) nesting along the Palmer Archipelago near Palmer Station, 2007-2009 ver 5. Environmental Data Initiative doi:10.6073/pasta/98b16d7d563f265cb52372c8ca99e60f
Gentoo penguins: Palmer Station Antarctica LTER and K. Gorman. 2020. Structural size measurements and isotopic signatures of foraging among adult male and female Gentoo penguin (Pygoscelis papua) nesting along the Palmer Archipelago near Palmer Station, 2007-2009 ver 5. Environmental Data Initiative doi:10.6073/pasta/7fca67fb28d56ee2ffa3d9370ebda689
Chinstrap penguins: Palmer Station Antarctica LTER and K. Gorman. 2020. Structural size measurements and isotopic signatures of foraging among adult male and female Chinstrap penguin (Pygoscelis antarcticus) nesting along the Palmer Archipelago near Palmer Station, 2007-2009 ver 6. Environmental Data Initiative doi:10.6073/pasta/c14dfcfada8ea13a17536e73eb6fbe9e
Originally published in: Gorman KB, Williams TD, Fraser WR (2014) Ecological Sexual Dimorphism and Environmental Variability within a Community of Antarctic Penguins (Genus Pygoscelis). PLoS ONE 9(3): e90081. doi:10.1371/journal.pone.0090081
This is the dataset used in Gelman & Hill's book, Data Analysis Using Regression and Multilevel/Hierarchical Models. These are polling data on the 1988 presidential election, collected one week before the election. You can download all the data from the book here: http://www.stat.columbia.edu/~gelman/arm/examples/ARM_Data.zip. Note that this dataset includes only a few of the variables from the original data supplied with the book.
polls
A tibble with 13,544 rows and 7 variables:
An integer denoting the state identification number.
An ordered factor stating the education level of the respondent
An unordered factor stating the age range of the respondent
A dummy variable (integer) denoting whether the respondent was coded as male (female = 0) or female (female = 1).
A dummy variable (integer) denoting whether the respondent was coded as Black (black = 1) or not Black (black = 0).
A sampling weight
Whether the respondent stated they were in favor of voting for George Bush Sr.
Print 'LaTeX' equations built with extract_eq().
## S3 method for class 'equation'
print(x, ...)
x |
'LaTeX' equation built with extract_eq(). |
... |
not used |
The unmodified object 'x' is returned invisibly. The function is used for its side effect of printing the equation.
These are a set of functions designed to help render equations in shiny applications (see the vignette about Shiny).
renderEq(expr, env = parent.frame(), quoted = FALSE, outputArgs = list())
eqOutput(outputId)
expr |
An R expression, specifically a call to |
env |
The environment |
quoted |
Is the expression quoted? |
outputArgs |
list of output arguments |
outputId |
The identifier of the output from the server. Should be passed as a string. |
Render the equation in a suitable way for Shiny with renderEq() in an eqOutput() equation output element that can be included in a panel.
renderEq(): Rendering function
eqOutput(): Output function
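A minimal sketch of how the two functions pair up in an app is given below. The output id "equation", the page title and the model are illustrative choices, not taken from the package documentation.

```r
library(shiny)
library(equatiomatic)

# UI: eqOutput() reserves a slot for the equation, identified by a string id
ui <- fluidPage(
  titlePanel("Model equation"),
  mainPanel(eqOutput("equation"))
)

# Server: renderEq() wraps a call to extract_eq()
server <- function(input, output) {
  model <- lm(mpg ~ cyl + disp, data = mtcars)
  output$equation <- renderEq(extract_eq(model))
}

app <- shinyApp(ui, server)
# Launch interactively with: runApp(app)
```

The string passed to eqOutput() has to match the name assigned on the output object in the server function, as with other Shiny render/output pairs.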
Data are simulated to be similar to longitudinal data collected within schools/districts.
sim_longitudinal
A tibble with 1000 rows and 8 variables:
An integer denoting the individual student. There are 100 students.
An integer denoting the school. There are 15 schools.
An integer denoting the school district. There are 5 districts.
A character variable denoting the instructional level of the student, low, medium, or high.
A factor indicating whether the student received the intervention treatment (0 = no treatment received; 1 = treatment received).
The proportion of students in the school in the low instructional group.
The assessment wave. Each student has nine waves of data collection.
The individual student's score at the given wave.
Output from set.seed(42); simple_ts <- ts(rnorm(1000),freq = 4).
This is included primarily for unit testing.
simple_ts
A quarterly time series (ts object) with 1000 simulated values:
First quarter simulated values.
Second quarter simulated values.
Third quarter simulated values.
Fourth quarter simulated values.
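The object can be rebuilt from the documented call using base R only, as sketched below. The call spells out frequency = 4 where the original uses the abbreviation freq = 4; whether the values match the packaged object exactly depends on using the same default random number generator settings.

```r
# Rebuild the series from the documented seed and call
set.seed(42)
simple_ts <- ts(rnorm(1000), frequency = 4)

length(simple_ts)     # total number of simulated values
frequency(simple_ts)  # observations per unit of time (four quarters)
cycle(simple_ts)[1:8] # quarter index of the first eight observations
```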
Output from set.seed(42); ts_reg_list <- list(x1 = rnorm(1000), x2 = rnorm(1000), ts_rnorm = rnorm(1000)).
ts_reg_list
A list of three numeric vectors, each of length 1000:
Random normal simulated data.
Random normal simulated data.
Random normal simulated data.