Package 'svBase' reference manual

Title:	Base Objects like Data Frames for 'SciViews::R'
Description:	Functions to manipulate the three main classes of "data frames" for 'SciViews::R': data.frame, data.table and tibble. Allows selecting the preferred one and converting more carefully between the three, taking care of the correct presentation of row names and data.table's keys. Provides a more homogeneous way of creating these three data frames and of printing them on the R console.
Authors:	Philippe Grosjean [aut, cre]
Maintainer:	Philippe Grosjean <[email protected]>
License:	MIT + file LICENSE
Version:	1.4.0
Built:	2025-04-01 04:32:41 UTC
Source:	https://github.com/SciViews/svBase

Base Objects like Data Frames for 'SciViews::R'

Description

The {svBase} package sets up the way data frames (with objects like base R's data.frame, data.table and tibble tbl_df) are managed in SciViews::R. The user can select the class of object used by default, and many other SciViews::R functions return that format. Conversion from one to the other is made easier, including the management of data.frame's row names and data.table's keys. Homogeneous ways to create a data frame or to print it are also provided.

Important functions

dtx() creates a data frame in the preferred format, while dtbl(), dtf() and dtt() force the creation of a tibble tbl_df, a data.frame and a data.table, respectively. The function used to convert into the preferred format is obtained with getOption("SciViews.as_dtx", default = as_dtt); it can be changed with options(SciViews.as_dtx = ...).
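
The option lookup described above can be sketched in base R. This is only an illustration of the pattern, not the {svBase} code: the option name `demo.as_dtx`, the helper `as_preferred()` and the class `demo_tbl` are hypothetical names invented for this sketch.

```r
# Hypothetical helper: look up the preferred converter in an option,
# falling back to a default when the option is not set
as_preferred <- function(x) {
  convert <- getOption("demo.as_dtx", default = as.data.frame)
  convert(x)
}

# Hypothetical alternative converter that tags its result with an extra class
as_demo_tbl <- function(x) {
  structure(as.data.frame(x), class = c("demo_tbl", "data.frame"))
}

m <- matrix(1:4, ncol = 2)
class_default <- class(as_preferred(m))   # the fallback converter is used

old <- options(demo.as_dtx = as_demo_tbl) # select another preferred format
class_custom <- class(as_preferred(m))
options(old)                              # restore the previous setting
```

Storing a function in the option, rather than a class name, means that callers only have to apply whatever converter is currently registered.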

Alternate assignment (multiple and/or collect results from dplyr)

Description

These alternate assignment operators can be used to perform multiple assignment (also known as destructuring assignment). They are imported from the {zeallot} package (see the corresponding help page at zeallot::operator for a complete description). They also perform a dplyr::collect(), which allows getting results from dplyr extensions like {dtplyr} for data.tables, or {dbplyr} for databases. Finally, these two assignment operators also make sure that the preferred data frame object is returned by using default_dtx().

Usage

value %->% x

x %<-% value

## Default S3 method:
collect(x, ...)

Arguments

`value`	The object to be assigned.
`x`	A name, or a name structure for multiple (destructuring) assignment, or any object that does not have a specific `dplyr::collect()` method for `collect.default()`.
`...`	Further arguments passed to the method (not used for the default one).

Details

These assignment operators are overloaded to get interesting properties in the context of {tidyverse} pipelines and to make sure they always return our preferred data frame object (data.frame, data.table, or tibble). Thus, before being assigned, value is modified by calling dplyr::collect() on it and by applying default_dtx().

Value

These operators invisibly return value. collect.default() simply returns x.
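
A minimal base R sketch of how such an assignment operator could be written is given here. It is not the {svBase} or {zeallot} implementation: the operator name `%<~%` is hypothetical, as.data.frame() stands in for default_dtx(), the collect step is omitted, and only a single name is accepted on the left-hand side.

```r
`%<~%` <- function(x, value) {
  target <- substitute(x)            # capture the unevaluated left-hand side
  stopifnot(is.name(target))         # this sketch handles a single name only
  # Stand-in for the conversion step: normalize data frames, leave the rest
  if (is.data.frame(value)) value <- as.data.frame(value)
  assign(as.character(target), value, envir = parent.frame())
  invisible(value)                   # return the assigned value invisibly
}

res %<~% head(mtcars, 3)
class(res)
```

Because the value is transformed before assign() is called, every object bound through the operator has already gone through the conversion step.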

Examples

# The alternate assignment operator performs three steps:
# 1) Collect results from dbplyr or dtplyr
library(dplyr)
library(data.table)
library(dtplyr)
library(svBase)
dtt <- data.table(x = 1:5, y = rnorm(5))
dtt |>
  mutate(x2 = x^2) |>
  select(x2, y) ->
  res

print(res)
class(res) # This is a data frame

dtt |>
  lazy_dt() |>
  mutate(x2 = x^2) |>
  select(x2, y) ->
  res

print(res)
class(res) # This is NOT a data frame

# Same pipeline, but assigning with %->%
dtt |>
  lazy_dt() |>
  mutate(x2 = x^2) |>
  select(x2, y) %->%
  res

print(res)
class(res) # res is the preferred data frame (data.table by default)

# 2) Convert data frame in the chosen format using default_dtx()
dtf <- data.frame(x = 1:5, y = rnorm(5))
class(dtf)
res %<-% dtf
class(res) # A data.table by default
# but it can be changed with options(SciViews.as_dtx = ...)

# 3) If the zeallot syntax is used, make multiple assignment
c(X, Y) %<-% dtf # Variables of dtf assigned to different names
X
Y

# The %->% is meant to be used in pipelines, otherwise it does the same

Coerce objects into data.frames, data.tables, tibbles or matrices

Description

Objects are coerced into the desired class. For as_dtx(), the desired class is obtained from getOption("SciViews.as_dtx"), with a default value producing a data.table object. If the data are grouped with dplyr::group_by(), the resulting data frame is also dplyr::ungroup()ed in the process.

Usage

as_dtx(x, ..., rownames = NULL, keep.key = TRUE, byref = FALSE)

as_dtf(x, ..., rownames = NULL, keep.key = TRUE, byref = NULL)

as_dtt(x, ..., rownames = NULL, keep.key = TRUE, byref = FALSE)

as_dtbl(x, ..., rownames = NULL, keep.key = TRUE, byref = NULL)

default_dtx(x, ..., rownames = NULL, keep.key = TRUE, byref = FALSE)

## S3 method for class 'tbl_df'
as.matrix(x, row.names = NULL, optional = FALSE, ...)

as_matrix(x, rownames = NULL, ...)

Arguments

`x`	An object.
`...`	Further arguments passed to the methods (not used yet).
`rownames`	The name of the column with row names. If `NULL`, it is taken from `getOption("SciViews.dtx.rownames")`.
`keep.key`	Should the data.table key be kept in a "key" attribute, or restored as the `data.table` key from that attribute?
`byref`	If `TRUE`, the object is modified by reference when converted into a `data.table` (faster, but not conventional). This is `FALSE` by default, or `NULL` if the argument does not apply in the context.
`row.names`	Same as `rownames`, but for base R functions.
`optional`	Logical. If `TRUE`, setting row names and converting column names to syntactically correct names is optional.

Value

The coerced object. For as_dtx(), the coercion is determined from getOption("SciViews.as_dtx"), which must return one of the three other as_dt...() functions (as_dtt by default). default_dtx() does the same as as_dtx() if the object is a data.frame, a data.table, or a tibble, but it returns the unmodified object for any other class (including subclassed data frames). This is a convenient function to force conversion only between those three object classes.

Note

Use as_matrix() instead of base::as.matrix(): it has different default arguments to better account for rownames in data.table and tibble!
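
The problem mentioned in this note can be reproduced in base R alone. The sketch below is not the as_matrix() implementation. It only shows, on a plain data.frame that stores its row names in a `.rownames` column (the default column name indicated in this manual), why a direct as.matrix() call is unsatisfactory and what a row-names-aware conversion is expected to produce.

```r
rn_col <- ".rownames"  # default row names column, per this manual
df <- data.frame(
  .rownames = c("a", "b", "c"),
  x = 1:3,
  y = c(1.5, 2.5, 3.5)
)

# Direct conversion: the character column forces a character matrix
m_direct <- as.matrix(df)
typeof(m_direct)

# Row-names-aware conversion: move the column into the matrix row names
m_aware <- as.matrix(df[, setdiff(names(df), rn_col), drop = FALSE])
rownames(m_aware) <- df[[rn_col]]
typeof(m_aware)
dimnames(m_aware)
```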

Examples

# A data.frame
dtf <- dtf(
  x = 1:5,
  y = rnorm(5),
  f = letters[1:5],
  l = sample(c(TRUE, FALSE), 5, replace = TRUE))

# Convert into a tibble
(dtbl <- as_dtbl(dtf))
# Since row names are trivial (1 -> 5), a .rownames column is not added

dtf2 <- dtf
rownames(dtf2) <- letters[1:5]
dtf2

# Now, the conversion into a tibble adds .rownames
(dtbl2 <- as_dtbl(dtf2))
# and data frame row names are set again when converted back to dtf
as_dtf(dtbl2)

# It also works for conversions data.frame <-> data.table
(dtt2 <- as_dtt(dtf2))
as_dtf(dtt2)

# It does not work when converting a tibble or a data.table into a matrix
# with as.matrix()
as.matrix(dtbl2)
# ... but as_matrix() does the job!
as_matrix(dtbl2)

# The name of the row names column in dtt and dtbl is in:
# (data.frame's row names are converted into a column with this name)
getOption("SciViews.dtx.rownames", default = ".rownames")

# Convert into the preferred data frame object (data.table by default)
(dtx2 <- as_dtx(dtf2))
class(dtx2)

# The default data frame object used:
getOption("SciViews.as_dtx", default = as_dtt)

# default_dtx() does the same as as_dtx(),
# but it also does not change other objects
# So, it is safe to use whatever the object you pass to it
(dtx2 <- default_dtx(dtf2))
class(dtx2)
# Any other object than data.frame, data.table or tbl_df is not converted
res <- default_dtx(1:5)
class(res)
# No conversion if the data frame is subclassed
dtf3 <- dtf2
class(dtf3) <- c("subclassed", "data.frame")
class(default_dtx(dtf3))

# data.table keys are converted into a 'key' attribute and back
library(data.table)
setkey(dtt2, 'x')
haskey(dtt2)
key(dtt2)

(dtf3 <- as_dtf(dtt2))
attributes(dtf3)
# Key is restored when converted back into a data.table (also from a tibble)
(dtt3 <- as_dtt(dtf3))
haskey(dtt3)
key(dtt3)

# Grouped tibbles are ungrouped with as_dtbl() or as_dtx()/default_dtx()!
mtcars |> dplyr::group_by(cyl) -> mtcars_grouped
class(mtcars_grouped)
mtcars2 <- as_dtbl(mtcars_grouped)
class(mtcars2)

Force computation of a lazy tidyverse object

Description

When {dplyr} or {tidyr} verbs are applied to a data.table or a database connection, they do not output data frames but objects like dtplyr_step or tbl_sql that are called lazy data frames. The actual process is triggered by using as_dtx(), or more explicitly with dplyr::collect() which coerces the result to a tibble. If you want the default {svBase} data frame object instead, use collect_dtx(), or if you want a specific object, use one of the other variants.

Usage

collect_dtx(x, ...)

collect_dtf(x, ...)

collect_dtt(x, ...)

collect_dtbl(x, ...)

Arguments

`x`	A data.frame, data.table, tibble or a lazy data frame (dtplyr_step, tbl_sql...).
`...`	Arguments passed on to methods for `dplyr::collect()`.

Value

A data frame (data.frame, data.table or tibble's tbl_df), the default version for collect_dtx().

Examples

# Assuming the default data frame for svBase is a data.table
mtcars_dtt <- as_dtt(mtcars)
library(dplyr)
library(dtplyr)
# A lazy data frame, not a "real" data frame!
mtcars_dtt |> lazy_dt() |> select(mpg:disp) |> class()
# A data frame
mtcars |> select(mpg:disp) |> class()
# A data table
mtcars_dtt |> select(mpg:disp) |> class()
# A tibble, always!
mtcars_dtt |> lazy_dt() |> select(mpg:disp) |> collect() |> class()
# The data frame object you want, default one specified for svBase
mtcars_dtt |> lazy_dt() |> select(mpg:disp) |> collect_dtx() |> class()

Create a data frame (base's data.frame, data.table or tibble's tbl_df)

Description

Create a data frame (base's data.frame, data.table or tibble's tbl_df)

Usage

dtx(..., .name_repair = c("check_unique", "unique", "universal", "minimal"))

dtbl(..., .name_repair = c("check_unique", "unique", "universal", "minimal"))

dtf(..., .name_repair = c("check_unique", "unique", "universal", "minimal"))

dtt(..., .name_repair = c("check_unique", "unique", "universal", "minimal"))

Arguments

`...`	A set of name-value pairs. The content of the data frame. See `tibble()` for more details on the way dynamic-dots are processed.
`.name_repair`	The way problematic column names are treated, see also `tibble()` for details.

Value

A data frame as a tbl_df object for dtbl(), a data.frame for dtf() and a data.table for dtt().

Note

data.table and tibble's tbl_df do not use row names. However, you can add a column named .rownames (by default), or the name that is in getOption("SciViews.dtx.rownames"), and it will be automatically set as row names when the object is converted into a data.frame with as_dtf(). For dtf(), just create a column of this name and it is directly used as row names for the resulting data.frame object.

Examples

dtbl1 <- dtbl(
  x = 1:5,
  y = rnorm(5),
  f = letters[1:5],
  l = sample(c(TRUE, FALSE), 5, replace = TRUE)
)
class(dtbl1)

dtf1 <- dtf(
  x = 1:5,
  y = rnorm(5),
  f = letters[1:5],
  l = sample(c(TRUE, FALSE), 5, replace = TRUE)
)
class(dtf1)

dtt1 <- dtt(
  x = 1:5,
  y = rnorm(5),
  f = letters[1:5],
  l = sample(c(TRUE, FALSE), 5, replace = TRUE))
class(dtt1)

# Using dtx(), one constructs the preferred data frame object
# (a data.table by default, can be changed with options(SciViews.as_dtx = ...))
dtx1 <- dtx(
  x = 1:5,
  y = rnorm(5),
  f = letters[1:5],
  l = sample(c(TRUE, FALSE), 5, replace = TRUE))
class(dtx1) # data.table by default

# With svBase data.table and data.frame objects have the same nice print as tibbles
dtbl1
dtf1
dtt1

# Use tribble() inside dtx() to easily create a data frame:
library(tibble)
dtx2 <- dtx(tribble(
  ~x, ~y, ~f,
   1,  3, 'a',
   2,  4, 'b'
))
dtx2
class(dtx2)

# This is how you specify row names for dtf (data.frame)
dtf(x = 1:3, y = 4:6, .rownames = letters[1:3])

Row-wise creation of a data frame

Description

The presentation of the data (see examples) is easier to read than with the traditional column-wise entry in dtx(). This can be used to enter small tables in R, but do not overuse it!

Usage

dtx_rows(...)

dtf_rows(...)

dtt_rows(...)

dtbl_rows(...)

Arguments

`...`	Specify the structure of the data frame by using formulas for variable names like `~x` for variable `x`. Then, use one argument per value in the data frame. It is possible to unquote with `!!` and to unquote-splice with `!!!`.

Value

A data frame of class data.frame for dtf_rows(), data.table for dtt_rows(), tibble tbl_df for dtbl_rows() and the default object with dtx_rows().
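
To make the row-wise layout concrete, the base R sketch below regroups a flat sequence of values, entered row by row, into named columns. It is only an assumption-laden illustration of the idea and not how the `*_rows()` functions are implemented: formulas such as `~x` are replaced by a plain character vector of column names, and unquoting is not handled.

```r
cols <- c("x", "y", "group")            # stands in for ~x, ~y, ~group
vals <- list(1, 3, "A",
             6, 2, "A",
             10, 4, "B")                # one value per cell, row by row

# Cut the flat list into chunks of length(cols): one chunk per row
n_rows <- length(vals) / length(cols)
row_id <- rep(seq_len(n_rows), each = length(cols))
rows <- split(vals, row_id)

# Turn each chunk into a one-row data.frame and stack the rows
df <- do.call(rbind, lapply(rows, function(r) {
  as.data.frame(setNames(r, cols))
}))
df
```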

Examples

df <- dtx_rows(
  ~x, ~y, ~group,
   1,  3,    "A",
   6,  2,    "A",
   10, 4,    "B"
)
df

Fast (flexible and friendly) statistical functions (mainly from collapse) for matrix-like and data frame objects

Description

The fast statistical functions, or fast-flexible-friendly statistical functions, are prefixed with "f". These vectorized functions supersede the no-f functions, bringing the capacity to work smoothly on matrix-like and data frame objects. For instance, base mean() operates on a vector, but not on a data frame. A matrix is recognized as a vector and a single mean is returned. On the contrary, fmean() calculates one mean per column. It does the same for a data frame, and it usually does so faster than base functions. There is no need for a separate function like colMeans().

Fast statistical functions also recognize grouping with fgroup_by(), sgroup_by() or group_by() and calculate the mean by group in this case. Again, there is no need for a different function like stats::ave(). Finally, these functions also have a TRA= argument that computes, for instance with TRA = "-", (x - f(x)) very efficiently (for example, to calculate residuals by subtracting the mean). Another particularity is the na.rm= argument, which is TRUE by default, while it is FALSE by default for mean().

These are generic functions with methods for matrix, data.frame, grouped_df and a default method used for simple numeric vectors. Most of them are defined in the {collapse} package, but there are a couple more here, together with an alternate syntax to replace TRA= with %__f%.
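
For comparison, the base R behavior that this description contrasts with can be sketched as follows. Only base R and stats functions are used here, so nothing in this block shows {collapse} output. It merely spells out the separate tools (colMeans(), sweep(), ave()) that the "f" functions are described as replacing.

```r
m <- matrix(1:6, ncol = 2)   # columns are 1:3 and 4:6

overall <- mean(m)           # a single mean: the matrix is treated as a vector
per_col <- colMeans(m)       # one mean per column, via a separate function

# Base R counterpart of the TRA = "-" idea: subtract each column mean
centered <- sweep(m, 2, per_col, "-")

# Base R counterpart of a grouped mean with TRA = "-": residuals by group
resid_by_group <- iris$Sepal.Length - ave(iris$Sepal.Length, iris$Species)

# Missing values: base mean() keeps them unless na.rm = TRUE is requested
mean(c(1, NA))
mean(c(1, NA), na.rm = TRUE)
```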

Usage

list_fstat_functions()

fn(x, ...)

fna(x, ...)

x %replacef% expr

x %replace_fillf% expr

x %-f% expr

x %+f% expr

x %-+f% expr

x %/f% expr

x %/*100f% expr

x %*f% expr

x %modf% expr

x %-modf% expr

Arguments

`x`	A numeric vector, matrix, data frame or grouped data frame (class 'grouped_df').
`...`	Further arguments passed to the method, like `w=`, a numeric vector of (non-negative) weights that may contain missing values, or `TRA=`, a quoted operator indicating the transformation to perform: `"replace"` to get a vector of the same size as `x` with results, `"replace_fill"` idem but also replacing missing data, `"-"` to subtract, `"+"` to add, `"-+"` to subtract and add the global statistic, `"/"` to divide, `"%"` to divide and multiply by 100 (percent), `"*"` to multiply, `"%%"` to take the modulus (remainder from division by the statistic) and `"-%%"` to subtract the modulus (i.e., to floor the data by the statistic), see `collapse::TRA()`. Also `na.rm=`, a logical indicating whether missing values in `x` are skipped (`TRUE` by default). If `FALSE`, `NA` is returned whenever `x` contains missing data. For details and other arguments, see the corresponding help page in the collapse package.
`expr`	The expression to evaluate as RHS of the `%__f%` operators.

Value

The number of all observations for fn() or the number of missing observations for fna(). list_fstat_functions() returns a list of all the known fast statistical functions.

Note

The page collapse::fast-statistical-functions gives more details. fn() counts all observations, including NAs, and fna() counts only NAs, whereas fnobs() counts non-missing observations. Instead of TRA= one can use the %__f% functions where __ is replace, replace_fill, -, +, -+, /, /*100 for TRA="%", *, mod for TRA="%%", or -mod for TRA="-%%". See the examples.

Examples

library(collapse)
data(iris)
iris_num <- iris[, -5] # Only numerical variables
mean(iris$Sepal.Length) # OK, but mean(iris_num) does not work
colMeans(iris_num)
# Same
fmean(iris_num)
# Idem, but mean by group for all 4 numerical variables
iris |> fgroup_by(Species) |> fmean()
# Residuals (x - mean(x)) by group
iris |> fgroup_by(Species) |> fmean(TRA = "-")
# The same calculation, in a little bit more expressive way
iris |> fgroup_by(Species) %-f% fmean()
# or:
iris_num %-f% fmean(g = iris$Species)

Test if the object is a data frame (data.frame, data.table or tibble)

Description

Test if the object is a data frame (data.frame, data.table or tibble)

Usage

is_dtx(x, strict = TRUE)

is_dtf(x, strict = TRUE)

is_dtt(x, strict = TRUE)

is_dtbl(x, strict = TRUE)

Arguments

`x`	An object.
`strict`	Should this be strictly the corresponding class (`TRUE`, by default), or could it be subclassed too (`FALSE`)? Even with `strict = TRUE`, the grouped_df tibbles and grouped_ts tsibbles are also accepted (tibbles or tsibbles where `dplyr::group_by()` was applied).

Value

These functions return TRUE if the object is of the correct class, otherwise they return FALSE. is_dtx() returns TRUE if x is a data.frame, a data.table or a tibble.
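
The difference between a strict and a non-strict class test can be sketched in base R. The function `is_df_sketch()` is a hypothetical stand-in written for this illustration, limited to plain data.frame objects. It does not reproduce the special handling of grouped tibbles described for the real functions.

```r
# Hypothetical helper: strict = TRUE requires exactly class "data.frame",
# strict = FALSE also accepts subclasses that inherit from it
is_df_sketch <- function(x, strict = TRUE) {
  if (strict) {
    identical(class(x), "data.frame")
  } else {
    inherits(x, "data.frame")
  }
}

sub <- structure(mtcars, class = c("subclassed", "data.frame"))

is_df_sketch(mtcars)               # plain data.frame
is_df_sketch(sub)                  # subclassed object, strict test
is_df_sketch(sub, strict = FALSE)  # subclassed object, relaxed test
is_df_sketch("some string")        # not a data frame at all
```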

Examples

# data(mtcars)
is_dtf(mtcars) # TRUE
is_dtx(mtcars) # Also TRUE
is_dtt(mtcars) # FALSE
is_dtbl(mtcars) # FALSE
# but...
is_dtt(as_dtt(mtcars)) # TRUE
is_dtx(as_dtt(mtcars)) # TRUE
is_dtbl(as_dtbl(mtcars)) # TRUE
is_dtx(as_dtbl(mtcars)) # TRUE
is_dtx(as_dtbl(mtcars) |> dplyr::group_by(cyl)) # TRUE (special case)

is_dtx("some string") # FALSE

Speedy functions (mainly from collapse and data.table) to manipulate data frames

Description

The Tidyverse defines a coherent set of tools to manipulate data frames that use non-standard evaluation and sometimes require extra care. These functions, like mutate() or summarise(), are defined in the {dplyr} and {tidyr} packages. The {collapse} package proposes a couple of functions with a similar interface, but with different and much faster code. For instance, fselect() is similar to select(), and fsummarise() is similar to summarise(). Not all functions are implemented, arguments and argument names differ, and the behavior may be very different, like frename() which uses old_name = new_name, while rename() uses new_name = old_name!

The speedy functions are all prefixed with an "s", like smutate(), and build on the work initiated in {collapse} to propose a series of functions paired with the tidy ones. So, smutate() and mutate() are "speedy" and "tidy" counterparts and they are used in a very similar, if not identical, way. The "s" prefix is there to draw attention to their particularities. Their classes are function and speedy_fn. Avoid mixing tidy, speedy and non-tidy/speedy functions in the same pipeline.

This is a global page that presents all the speedy functions in one place. It is not meant to be a clear and detailed help page for each individual "s" function. Please refer to the help page of the corresponding non-"s" paired function for more details! You can use the {svMisc} .?smutate syntax to go to the help page of the non-"s" function with a message.

Usage

list_speedy_functions()

sgroup_by(.data, ...)

sungroup(.data, ...)

srename(.data, ...)

srename_with(.data, .fn, .cols = everything(), ...)

sfilter(.data, ...)

sfilter_ungroup(.data, ...)

sselect(.data, ...)

smutate(.data, ..., .keep = "all")

smutate_ungroup(.data, ..., .keep = "all")

stransmute(.data, ...)

stransmute_ungroup(.data, ...)

ssummarise(.data, ...)

sfull_join(x, y, by = NULL, suffix = c(".x", ".y"), copy = FALSE, ...)

sleft_join(x, y, by = NULL, suffix = c(".x", ".y"), copy = FALSE, ...)

sright_join(x, y, by = NULL, suffix = c(".x", ".y"), copy = FALSE, ...)

sinner_join(x, y, by = NULL, suffix = c(".x", ".y"), copy = FALSE, ...)

sbind_rows(..., .id = NULL)

scount(
  x,
  ...,
  wt = NULL,
  sort = FALSE,
  name = NULL,
  .drop = dplyr::group_by_drop_default(x),
  sort_cat = TRUE,
  decreasing = FALSE
)

stally(
  x,
  wt = NULL,
  sort = FALSE,
  name = NULL,
  sort_cat = TRUE,
  decreasing = FALSE
)

sadd_count(
  x,
  ...,
  wt = NULL,
  sort = FALSE,
  name = NULL,
  .drop = NULL,
  sort_cat = TRUE,
  decreasing = FALSE
)

sadd_tally(
  x,
  wt = NULL,
  sort = FALSE,
  name = NULL,
  sort_cat = TRUE,
  decreasing = FALSE
)

sbind_cols(
  ...,
  .name_repair = c("unique", "universal", "check_unique", "minimal")
)

sarrange(.data, ..., .by_group = FALSE)

spull(.data, var = -1, name = NULL, ...)

sdistinct(.data, ..., .keep_all = FALSE)

sdrop_na(data, ...)

sreplace_na(data, replace, ...)

spivot_longer(data, cols, names_to = "name", values_to = "value", ...)

spivot_wider(data, names_from = name, values_from = value, ...)

suncount(data, weights, .remove = TRUE, .id = NULL)

sunite(data, col, ..., sep = "_", remove = TRUE, na.rm = FALSE)

sseparate(
  data,
  col,
  into,
  sep = "[^[:alnum:]]+",
  remove = TRUE,
  convert = FALSE,
  ...
)

sseparate_rows(data, ..., sep = "[^[:alnum:].]+", convert = FALSE)

sfill(data, ..., .direction = c("down", "up", "downup", "updown"))

sextract(
  data,
  col,
  into,
  regex = "([[:alnum:]]+)",
  remove = TRUE,
  convert = FALSE,
  ...
)

Arguments

`.data`	A data frame (data.frame, data.table or tibble's tbl_df)
`...`	Arguments that depend on the context of the function and that are, most of the time, not evaluated in a standard way (cf. the tidyverse approach).
`.fn`	A function to use.
`.cols`	The columns to which the transformation is applied. For the moment, only all existing columns, that is, `.cols = everything()`, is implemented.
`.keep`	Which columns to keep. The default is `"all"`, possible values are `"used"`, `"unused"`, or `"none"` (see `mutate()`).
`x`	A data frame (data.frame, data.table or tibble's tbl_df).
`y`	A second data frame.
`by`	A list of names of the columns to use for joining the two data frames.
`suffix`	The suffix to the column names to use to differentiate the columns that come from the first or the second data frame. By default it is `c(".x", ".y")`.
`copy`	This argument is there for compatibility with the "t" matching functions, but it is not used here.
`.id`	The name of the column for the origin id, either names if all other arguments are named, or numbers.
`wt`	Frequency weights. Can be `NULL` or a variable. Use data masking.
`sort`	If `TRUE`, the largest groups are shown on top.
`name`	The name of the new column in the output (`n` by default; no existing column may have this name, or an error is generated).
`.drop`	Are levels with no observations dropped (`TRUE` by default)?
`sort_cat`	Are levels sorted (`TRUE` by default)?
`decreasing`	Is sorting done in decreasing order (`FALSE` by default)?
`.name_repair`	How should the names be "repaired" to avoid duplicate column names? See `dplyr::bind_cols()` for more details.
`.by_group`	Logical. If `TRUE`, rows are first arranged by the grouping variables, if any. `FALSE` by default.
`var`	A variable specified as a name, a positive or a negative integer (counting from the end). The default is `-1` and returns last variable.
`.keep_all`	If `TRUE` keep all variables in `.data`.
`data`	A data frame, or for `replace_na()` a vector or a data frame.
`replace`	If `data` is a vector, a unique value to replace `NA`s, otherwise, a list of values, one per column of the data frame.
`cols`	A selection of the columns using tidy-select syntax, see `tidyr::pivot_longer()`.
`names_to`	A character vector with the name or names of the columns for the names.
`values_to`	A string with the name of the column that receives the values.
`names_from`	The column or columns containing the names (use tidy selection and do not quote the names).
`values_from`	Idem for the column or columns that contain the values.
`weights`	A vector of weights to use to "uncount" `data`.
`.remove`	If `TRUE`, and `weights` is the name of a column, that column is removed from `data`.
`col`	The name (quoted or not) of the new column with the united variables.
`sep`	Separator to use between values for united or separated columns.
`remove`	If `TRUE` the initial columns that are separated are also removed from `data`.
`na.rm`	If `TRUE`, `NA`s are eliminated before uniting the values.
`into`	Names of the new columns that receive the separated variables. Use `NA` for items to drop.
`convert`	If `TRUE`, resulting values are converted into numeric, integer or logical.
`.direction`	Direction in which to fill missing data: `"down"` (by default), `"up"`, or `"downup"` (first down, then up), `"updown"` (the opposite).
`regex`	A regular expression used to extract the desired values (use one group with `(` and `)` for each element of `into`).
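
As a minimal sketch of how `regex`, `into` and `convert` interact, the following calls tidyr::extract() (the "non-s" function, assuming {tidyr} is installed) on a small made-up data frame. The data, the column names and the two-group regular expression are illustrative assumptions. sextract() is documented above with the same arguments, but it is not called here.

```r
# Made-up data: one column holding a letter and a number separated by "-"
df <- data.frame(id = c("a-1", "b-2", "c-3"))

# One capture group per name in `into`.
# remove = TRUE drops the original `id` column,
# convert = TRUE turns the extracted digits into integers.
res <- tidyr::extract(
  df,
  col = id,
  into = c("letter", "number"),
  regex = "([[:alpha:]]+)-([[:digit:]]+)",
  remove = TRUE,
  convert = TRUE
)
res
```

Each parenthesised group in `regex` fills one column named in `into`, which is why the number of groups has to match the length of `into`.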

Value

See the corresponding "non-s" function for its full help page, which indicates the return values.

Note

The ssummarise() function does not support n() as does dplyr::summarise(). You can use fn() instead, but then, you must give a variable name as argument. The fn() alternative can also be used in summarise() for homogeneous syntax between the two. From {dplyr}, the slice() and slice_xxx() functions are not added yet because they are not available for {dbplyr}. Also anti_join(), semi_join() and nest_join() are not implemented yet. From {tidyr} expand(), chop(), unchop(), nest(), unnest(), unnest_longer(), unnest_wider(), hoist(), pack() and unpack() are not implemented yet.

Examples

# TODO...

Tidy functions (mainly from dplyr and tidyr) to manipulate data frames

Description

The Tidyverse defines a coherent set of tools to manipulate data frames that use non-standard evaluation and sometimes require extra care. These functions, like mutate() or summarise(), are defined in the {dplyr} and {tidyr} packages. When using variants, like {dtplyr} for data.table objects or {dbplyr} to work with external databases, successive commands in a pipeline are pooled together but not computed. One has to collect() the result to get its final form. Most of the tidy functions that have a "speedy" counterpart prefixed with "s" are listed with list_tidy_functions(). Their main usages are (excluding less used arguments, or those that are not compatible with the speedy "s" counterpart functions):

group_by(.data, ...)
ungroup(.data)
rename(.data, ...)
rename_with(.data, .fn, .cols = everything(), ...)
filter(.data, ...)
select(.data, ...)
mutate(.data, ..., .keep = "all")
transmute(.data, ...)
summarise(.data, ...)
full_join(x, y, by = NULL, suffix = c(".x", ".y"), copy = FALSE, ...)
left_join(x, y, by = NULL, suffix = c(".x", ".y"), copy = FALSE, ...)
right_join(x, y, by = NULL, suffix = c(".x", ".y"), copy = FALSE, ...)
inner_join(x, y, by = NULL, suffix = c(".x", ".y"), copy = FALSE, ...)
bind_rows(..., .id = NULL)
bind_cols(..., .name_repair = c("unique", "universal", "check_unique", "minimal"))
arrange(.data, ..., .by_group = FALSE)
count(x, ..., wt = NULL, sort = FALSE, name = NULL)
tally(x, wt = NULL, sort = FALSE, name = NULL)
add_count(x, ..., wt = NULL, sort = FALSE, name = NULL)
add_tally(x, wt = NULL, sort = FALSE, name = NULL)
pull(.data, var = -1, name = NULL)
distinct(.data, ..., .keep_all = FALSE)
drop_na(data, ...)
replace_na(data, replace)
pivot_longer(data, cols, names_to = "name", values_to = "value")
pivot_wider(data, names_from = name, values_from = value)
uncount(data, weights, .remove = TRUE, .id = NULL)
unite(data, col, ..., sep = "_", remove = TRUE, na.rm = FALSE)
separate(data, col, into, sep = "[^[:alnum:]]+", remove = TRUE, convert = FALSE)
separate_rows(data, ..., sep = "[^[:alnum:].]+", convert = FALSE)
fill(data, ..., .direction = c("down", "up", "downup", "updown"))
extract(data, col, into, regex = "([[:alnum:]]+)", remove = TRUE, convert = FALSE)

plus the functions defined below.
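
The Description says that pipelines on lazy variants are pooled but not computed until collect() is called. A minimal sketch of this follows, assuming {dplyr}, {dtplyr} and {data.table} are installed. The built-in `mtcars` dataset and the object names `lazy`, `query` and `result` are illustrative choices, not part of this package.

```r
library(dplyr)
library(dtplyr)

# Wrap a data frame in a lazy data.table: verbs applied to it are recorded,
# not executed
lazy <- lazy_dt(mtcars)

# Nothing is computed yet; `query` only describes the pipeline
query <- lazy |>
  group_by(cyl) |>
  summarise(mean_mpg = mean(mpg))
class(query)

# collect() runs the pooled pipeline and returns the final result as a tibble
result <- collect(query)
result
```

Printing `query` before collecting shows a preview together with the translated data.table call, which is a convenient way to check what will be executed.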

Usage

list_tidy_functions()

filter_ungroup(.data, ...)

mutate_ungroup(.data, ..., .keep = "all")

transmute_ungroup(.data, ...)

Arguments

`.data`	A data frame, data frame extension (e.g. a tibble), or a lazy data frame (e.g. from dbplyr or dtplyr). See `mutate()` for more details.
`...`	Arguments that depend on the context of the function and that are, most of the time, not evaluated in a standard way (cf. the tidyverse approach).
`.keep`	Which columns to keep. The default is `"all"`, possible values are `"used"`, `"unused"`, or `"none"` (see `mutate()`).
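
The names filter_ungroup(), mutate_ungroup() and transmute_ungroup() suggest that each one applies the verb and then ungroups the result. That reading is an assumption, since this page does not spell out the behaviour. Under that assumption, the plain {dplyr} pipeline below sketches what mutate_ungroup() would do; the data frame and column names are made up.

```r
library(dplyr)

# Made-up grouped data
df <- data.frame(g = c("a", "a", "b"), x = c(1, 3, 10))

# Group-wise mutate followed by an explicit ungroup():
# the assumed equivalent of mutate_ungroup(group_by(df, g), x_mean = mean(x))
res <- df |>
  group_by(g) |>
  mutate(x_mean = mean(x)) |>
  ungroup()

res
is_grouped_df(res)
```

Ungrouping right after the computation avoids the grouping silently affecting later steps in the pipeline.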

Value

See corresponding "non-t" function for the full help page with indication of the return values. list_tidy_functions() returns a list of all the tidy(verse) functions that have their speedy "s" counterpart, see speedy_functions.

Note

The help page here is very basic and it aims mainly to list all the tidy functions. For more complete help, see the {dplyr} or {tidyr} packages. From {dplyr}, the slice() and slice_xxx() functions are not added yet because they are not available for {dbplyr}. Also anti_join(), semi_join() and nest_join() are not implemented yet. From {tidyr} expand(), chop(), unchop(), nest(), unnest(), unnest_longer(), unnest_wider(), hoist(), pack() and unpack() are not implemented yet.

Examples

# TODO...

Package 'svBase'

Help Index

Base Objects like Data Frames for 'SciViews::R'
Alternate assignment (multiple and/or collect results from dplyr)
Coerce objects into data.frames, data.tables, tibbles or matrices
Force computation of a lazy tidyverse object
Create a data frame (base's data.frame, data.table or tibble's tbl_df)
Row-wise creation of a data frame
Fast (flexible and friendly) statistical functions (mainly from collapse) for matrix-like and data frame objects
Test if the object is a data frame (data.frame, data.table or tibble)
Speedy functions (mainly from collapse and data.table) to manipulate data frames
Tidy functions (mainly from dplyr and tidyr) to manipulate data frames