Title: | Base Objects like Data Frames for 'SciViews::R' |
---|---|
Description: | Functions to manipulate the three main classes of "data frames" for 'SciViews::R': data.frame, data.table and tibble. Allows selecting the preferred one and converting more carefully between the three, taking care of the correct presentation of row names and data.table's keys. Provides a more homogeneous way of creating these three data frames and of printing them on the R console. |
Authors: | Philippe Grosjean [aut, cre] |
Maintainer: | Philippe Grosjean |
License: | MIT + file LICENSE |
Version: | 1.4.0 |
Built: | 2024-11-02 04:43:23 UTC |
Source: | https://github.com/SciViews/svBase |
The {svBase} package sets up the way data frames (objects like R base's data.frame, data.table and tibble's tbl_df) are managed in SciViews::R. Users can select the class of object they use by default, and many other SciViews::R functions return that format. Conversion from one class to another is made easier, including the management of data.frame row names and data.table keys. Homogeneous ways to create and to print data frames are also provided.
dtx() creates a data frame in the preferred format, while dtbl(), dtf() and dtt() force the creation of a tibble, a data.frame or a data.table, respectively. The preferred format is given by getOption("SciViews.as_dtx", default = as_dtt), which returns the function used to convert into the preferred format; change it with options(SciViews.as_dtx = ...).
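The option mechanism described above can be pictured with a small base-R sketch. Everything here is hypothetical: the option name demo.as_dtx and the function as_preferred() are invented for the illustration and are not part of svBase; they only mimic how a converter stored in an option can decide the output class.

```r
# Hypothetical sketch: a converter chosen through a global option, in the
# spirit of getOption("SciViews.as_dtx", default = as_dtt).
# "demo.as_dtx" and as_preferred() are invented names, not svBase code.
as_preferred <- function(x, ...) {
  converter <- getOption("demo.as_dtx", default = as.data.frame)
  converter(x, ...)
}

# With no option set, the default converter (as.data.frame) is used
res1 <- as_preferred(list(a = 1:3, b = letters[1:3]))
class(res1)  # "data.frame"

# Changing the option changes the output format for every caller
old <- options(demo.as_dtx = as.matrix)
res2 <- as_preferred(data.frame(a = 1:3, b = 4:6))
class(res2)  # "matrix" "array" (R >= 4.0)
options(old)  # restore the previous setting
```

With svBase itself, the same switch is made with options(SciViews.as_dtx = as_dtf), so that dtx() and as_dtx() should then return data.frame objects.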
These alternate assignment operators can be used to perform multiple assignment (also known as destructuring assignment). They are imported from the {zeallot} package (see the corresponding help page at zeallot::operator for a complete description). They also perform a dplyr::collect(), which retrieves results from dplyr extensions like {dtplyr} for data.tables or {dbplyr} for databases. Finally, these two assignment operators also make sure that the preferred data frame object is returned by using default_dtx().
value %->% x
x %<-% value

## Default S3 method:
collect(x, ...)
Argument | Description |
---|---|
value | The object to be assigned. |
x | A name, or a name structure for multiple (destructuring) assignment, or any object that does not have a specific dplyr::collect() method for … |
... | Further arguments passed to the method (not used for the default one). |
These assignment operators are overloaded to get interesting properties in the context of {tidyverse} pipelines and to make sure they always return our preferred data frame object (data.frame, data.table, or tibble). Thus, before being assigned, value is modified by calling dplyr::collect() on it and by applying default_dtx().
These operators invisibly return value. collect.default() simply returns x.
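To make the destructuring part more concrete, here is a base-R sketch of a multiple-assignment operator. The %into% operator is hypothetical (it is neither zeallot's nor svBase's code) and, unlike svBase's %<-%, it skips the dplyr::collect() and default_dtx() steps; it only shows how the names in c(...) can be matched to the columns of a data frame.

```r
# Hypothetical operator %into%, for illustration only: assigns the columns
# (or elements) of `value` to the names listed in c(...) on its left side.
# svBase's %<-% and %->% also call dplyr::collect() and default_dtx() first.
`%into%` <- function(targets, value) {
  env <- parent.frame()
  tgt <- substitute(targets)  # keep the target names unevaluated
  if (is.call(tgt) && identical(tgt[[1]], as.name("c"))) {
    nms <- vapply(as.list(tgt)[-1], deparse, character(1))
    vals <- as.list(value)    # a data frame becomes a list of columns
    if (length(vals) != length(nms))
      stop("the number of names and of values differ")
    for (i in seq_along(nms)) assign(nms[i], vals[[i]], envir = env)
  } else {
    assign(deparse(tgt), value, envir = env)  # single, ordinary assignment
  }
  invisible(value)
}

c(X, Y) %into% data.frame(x = 1:5, y = 6:10)
X  # [1] 1 2 3 4 5
Y  # [1]  6  7  8  9 10
```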
# The alternate assignment operator performs three steps:
# 1) Collect results from dbplyr or dtplyr
library(dplyr)
library(data.table)
library(dtplyr)
library(svBase)
dtt <- data.table(x = 1:5, y = rnorm(5))
dtt |> mutate(x2 = x^2) |> select(x2, y) -> res
print(res)
class(res) # This is a data frame
dtt |> lazy_dt() |> mutate(x2 = x^2) |> select(x2, y) -> res
print(res)
class(res) # This is NOT a data frame
# Same pipeline, but assigning with %->%
dtt |> lazy_dt() |> mutate(x2 = x^2) |> select(x2, y) %->% res
print(res)
class(res) # res is the preferred data frame (data.table by default)
# 2) Convert data frame in the chosen format using default_dtx()
dtf <- data.frame(x = 1:5, y = rnorm(5))
class(dtf)
res %<-% dtf
class(res) # A data.table by default
# but it can be changed with options(SciViews.as_dtx = ...)
# 3) If the zeallot syntax is used, make multiple assignment
c(X, Y) %<-% dtf # Variables of dtf assigned to different names
X
Y
# %->% is meant to be used in pipelines, otherwise it does the same as %<-%
Objects are coerced into the desired class. For as_dtx(), the desired class is obtained from getOption("SciViews.as_dtx"), with a default value producing a data.table object. If the data are grouped with dplyr::group_by(), the resulting data frame is also ungrouped (as with dplyr::ungroup()) in the process.
as_dtx(x, ..., rownames = NULL, keep.key = TRUE, byref = FALSE)
as_dtf(x, ..., rownames = NULL, keep.key = TRUE, byref = NULL)
as_dtt(x, ..., rownames = NULL, keep.key = TRUE, byref = FALSE)
as_dtbl(x, ..., rownames = NULL, keep.key = TRUE, byref = NULL)
default_dtx(x, ..., rownames = NULL, keep.key = TRUE, byref = FALSE)

## S3 method for class 'tbl_df'
as.matrix(x, row.names = NULL, optional = FALSE, ...)

as_matrix(x, rownames = NULL, ...)
Argument | Description |
---|---|
x | An object. |
... | Further arguments passed to the methods (not used yet). |
rownames | The name of the column with row names. If … |
keep.key | Do we keep the data.table key into a "key" attribute or do we restore … |
byref | If … |
row.names | Same as … |
optional | logical. If … |
The coerced object. For as_dtx(), the coercion is determined from getOption("SciViews.as_dtx"), which must return one of the three other as_dt...() functions (as_dtt by default). default_dtx() does the same as as_dtx() if the object is a data.frame, a data.table, or a tibble, but it returns the unmodified object for any other class (including subclassed data frames). It is a convenient function to force conversion only between those three object classes.
Use as_matrix() instead of base::as.matrix(): it has different default arguments to better account for row names in data.table and tibble objects!
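The reason why as.matrix() is not enough is that a .rownames column is just another column: with mixed types, as.matrix() turns everything into character strings. A minimal base-R sketch of the idea behind as_matrix() (the helper to_matrix_with_rownames() is hypothetical, not svBase code) removes that column and uses it as row names instead:

```r
# Hypothetical helper (not svBase's as_matrix()): turn a data frame that keeps
# its row names in a ".rownames" column into a matrix with real row names.
to_matrix_with_rownames <- function(x, rownames = ".rownames") {
  x <- as.data.frame(x)
  rn <- NULL
  if (rownames %in% names(x)) {
    rn <- as.character(x[[rownames]])
    x[[rownames]] <- NULL  # drop it, so it does not end up inside the matrix
  }
  m <- as.matrix(x)
  rownames(m) <- rn
  m
}

tbl_like <- data.frame(.rownames = c("a", "b"), x = 1:2, y = 3:4)
to_matrix_with_rownames(tbl_like)
#   x y
# a 1 3
# b 2 4
```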
# A data.frame
dtf <- dtf(
  x = 1:5,
  y = rnorm(5),
  f = letters[1:5],
  l = sample(c(TRUE, FALSE), 5, replace = TRUE))
# Convert into a tibble
(dtbl <- as_dtbl(dtf))
# Since row names are trivial (1 -> 5), a .rownames column is not added
dtf2 <- dtf
rownames(dtf2) <- letters[1:5]
dtf2
# Now, the conversion into a tibble adds .rownames
(dtbl2 <- as_dtbl(dtf2))
# and data frame row names are set again when converted back to dtf
as_dtf(dtbl2)
# It also works for conversions data.frame <-> data.table
(dtt2 <- as_dtt(dtf2))
as_dtf(dtt2)
# It does not work when converting a tibble or a data.table into a matrix
# with as.matrix()
as.matrix(dtbl2)
# ... but as_matrix() does the job!
as_matrix(dtbl2)
# The name for row names in dtt and dtbl is in:
# (data.frame's row names are converted into a column with this name)
getOption("SciViews.dtx.rownames", default = ".rownames")
# Convert into the preferred data frame object (data.table by default)
(dtx2 <- as_dtx(dtf2))
class(dtx2)
# The default data frame object used:
getOption("SciViews.as_dtx", default = as_dtt)
# default_dtx() does the same as as_dtx(),
# but it does not change other objects
# So, it is safe to use whatever the object you pass to it
(dtx2 <- default_dtx(dtf2))
class(dtx2)
# Any object other than data.frame, data.table or tbl_df is not converted
res <- default_dtx(1:5)
class(res)
# No conversion if the data frame is subclassed
dtf3 <- dtf2
class(dtf3) <- c("subclassed", "data.frame")
class(default_dtx(dtf3))
# data.table keys are converted into a 'key' attribute and back
library(data.table)
setkey(dtt2, 'x')
haskey(dtt2)
key(dtt2)
(dtf3 <- as_dtf(dtt2))
attributes(dtf3)
# Key is restored when converted back into a data.table (also from a tibble)
(dtt3 <- as_dtt(dtf3))
haskey(dtt3)
key(dtt3)
# Grouped tibbles are ungrouped with as_dtbl() or as_dtx()/default_dtx()!
mtcars |> dplyr::group_by(cyl) -> mtcars_grouped
class(mtcars_grouped)
mtcars2 <- as_dtbl(mtcars_grouped)
class(mtcars2)
When {dplyr} or {tidyr} verbs are applied to a lazy data.table (created with dtplyr::lazy_dt()) or to a database connection, they do not output data frames but objects like dtplyr_step or tbl_sql, which are called lazy data frames. The actual computation is triggered by as_dtx(), or more explicitly by dplyr::collect(), which coerces the result to a tibble. If you want the default {svBase} data frame object instead, use collect_dtx(), or if you want a specific object, use one of the other variants.
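The difference between a lazy data frame and a real one can be sketched in a few lines of base R. This toy version is an assumption made for illustration only (lazy_demo(), add_step() and collect_demo() are hypothetical); dtplyr and dbplyr translate verbs into data.table or SQL code rather than storing R functions, but the principle is similar: nothing is computed until the result is collected and coerced.

```r
# Hypothetical "lazy data frame": verbs only record steps, and nothing is
# computed until collect_demo() replays them on the data.
lazy_demo <- function(data) {
  structure(list(data = data, steps = list()), class = "lazy_demo")
}

add_step <- function(lz, fun) {
  lz$steps <- c(lz$steps, list(fun))
  lz
}

collect_demo <- function(lz, coerce_fun = as.data.frame) {
  # Apply the recorded steps in order, starting from the original data
  res <- Reduce(function(d, f) f(d), lz$steps, lz$data)
  coerce_fun(res)  # final coercion, like collect() or collect_dtx()
}

lz <- lazy_demo(mtcars)
lz <- add_step(lz, function(d) d[d$cyl == 4, ])
lz <- add_step(lz, function(d) d[, c("mpg", "cyl", "disp")])
class(lz)   # "lazy_demo": not a data frame yet
res <- collect_demo(lz)
class(res)  # "data.frame"
```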
collect_dtx(x, ...)
collect_dtf(x, ...)
collect_dtt(x, ...)
collect_dtbl(x, ...)
Argument | Description |
---|---|
x | A data.frame, data.table, tibble or a lazy data frame (dtplyr_step, tbl_sql...). |
... | Arguments passed on to methods for … |
A data frame (data.frame, data.table or tibble's tbl_df); for collect_dtx(), the default (preferred) one.
# Assuming the default data frame for svBase is a data.table
mtcars_dtt <- as_dtt(mtcars)
library(dplyr)
library(dtplyr)
# A lazy data frame, not a "real" data frame!
mtcars_dtt |> lazy_dt() |> select(mpg:disp) |> class()
# A data frame
mtcars |> select(mpg:disp) |> class()
# A data table
mtcars_dtt |> select(mpg:disp) |> class()
# A tibble, always!
mtcars_dtt |> lazy_dt() |> select(mpg:disp) |> collect() |> class()
# The data frame object you want, default one specified for svBase
mtcars_dtt |> lazy_dt() |> select(mpg:disp) |> collect_dtx() |> class()
Create a data frame (base's data.frame, data.table or tibble's tbl_df)
dtx(..., .name_repair = c("check_unique", "unique", "universal", "minimal"))
dtbl(..., .name_repair = c("check_unique", "unique", "universal", "minimal"))
dtf(..., .name_repair = c("check_unique", "unique", "universal", "minimal"))
dtt(..., .name_repair = c("check_unique", "unique", "universal", "minimal"))
Argument | Description |
---|---|
... | A set of name-value pairs. The content of the data frame. See … |
.name_repair | The way problematic column names are treated, see also … |
A data frame as a tbl_df object for dtbl(), a data.frame for dtf(), a data.table for dtt(), and the preferred data frame object for dtx().
data.table and tibble's tbl_df do not use row names. However, you can add a column named .rownames (by default), or the name that is in getOption("SciViews.dtx.rownames"), and it is automatically set as row names when the object is converted into a data.frame with as_dtf(). For dtf(), just create a column with this name and it is directly used as row names for the resulting data.frame object.
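The .rownames convention can be illustrated with two hypothetical base-R helpers (they are not the svBase functions): one moves non-trivial row names into a column, the other restores them. Like as_dtbl() in the examples, the sketch adds no column when the row names are just 1 to n.

```r
# Hypothetical helpers, for illustration only (not svBase's code).
# The default column name follows the SciViews.dtx.rownames option.
rn_default <- function() getOption("SciViews.dtx.rownames", default = ".rownames")

rownames_to_column_demo <- function(x, col = rn_default()) {
  rn <- rownames(x)
  # Trivial row names ("1", "2", ...) carry no information: keep x as is
  if (identical(rn, as.character(seq_len(nrow(x))))) return(x)
  rn_df <- setNames(data.frame(rn, stringsAsFactors = FALSE), col)
  x <- cbind(rn_df, x)
  rownames(x) <- NULL
  x
}

column_to_rownames_demo <- function(x, col = rn_default()) {
  if (!col %in% names(x)) return(x)
  rownames(x) <- as.character(x[[col]])
  x[[col]] <- NULL
  x
}

df_named <- data.frame(x = 1:3, row.names = c("a", "b", "c"))
(flat <- rownames_to_column_demo(df_named))  # row names moved into a column
column_to_rownames_demo(flat)                # and back as row names
```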
dtbl1 <- dtbl(
  x = 1:5,
  y = rnorm(5),
  f = letters[1:5],
  l = sample(c(TRUE, FALSE), 5, replace = TRUE)
)
class(dtbl1)
dtf1 <- dtf(
  x = 1:5,
  y = rnorm(5),
  f = letters[1:5],
  l = sample(c(TRUE, FALSE), 5, replace = TRUE)
)
class(dtf1)
dtt1 <- dtt(
  x = 1:5,
  y = rnorm(5),
  f = letters[1:5],
  l = sample(c(TRUE, FALSE), 5, replace = TRUE))
class(dtt1)
# Using dtx(), one constructs the preferred data frame object
# (a data.table by default, can be changed with options(SciViews.as_dtx = ...))
dtx1 <- dtx(
  x = 1:5,
  y = rnorm(5),
  f = letters[1:5],
  l = sample(c(TRUE, FALSE), 5, replace = TRUE))
class(dtx1) # data.table by default
# With svBase, data.table and data.frame objects have the same nice print as tibbles
dtbl1
dtf1
dtt1
# Use tribble() inside dtx() to easily create a data frame:
library(tibble)
dtx2 <- dtx(tribble(
  ~x, ~y, ~f,
   1,  3, 'a',
   2,  4, 'b'
))
dtx2
class(dtx2)
# This is how you specify row names for dtf (data.frame)
dtf(x = 1:3, y = 4:6, .rownames = letters[1:3])
The presentation of the data (see examples) is easier to read than the traditional column-wise entry in dtx(). This can be used to enter small tables in R, but do not abuse it!
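A minimal base-R sketch of row-wise entry, assuming the same layout as dtx_rows() (column names given first as formulas, then values row by row). rows_to_df() is a hypothetical helper, not svBase's code:

```r
# Hypothetical sketch of row-wise data entry (not svBase's dtx_rows()).
rows_to_df <- function(...) {
  vals <- list(...)
  # Formulas such as ~x give the column names
  is_hdr <- vapply(vals, inherits, logical(1), what = "formula")
  nms <- vapply(vals[is_hdr], function(f) all.vars(f)[1], character(1))
  vals <- vals[!is_hdr]
  n_col <- length(nms)
  if (n_col == 0 || length(vals) == 0 || length(vals) %% n_col != 0)
    stop("give column names as formulas, then complete rows of values")
  # Column j collects every n_col-th value, starting at position j
  cols <- lapply(seq_len(n_col), function(j)
    unlist(vals[seq(j, length(vals), by = n_col)]))
  as.data.frame(setNames(cols, nms), stringsAsFactors = FALSE)
}

rows_to_df(
  ~x, ~y, ~group,
   1,  3,    "A",
   6,  2,    "A",
  10,  4,    "B"
)
```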
dtx_rows(...)
dtf_rows(...)
dtt_rows(...)
dtbl_rows(...)
Argument | Description |
---|---|
... | Specify the structure of the data frame by using formulas for variable names like ~x … |
A data frame of class data.frame for dtf_rows(), data.table for dtt_rows(), tibble tbl_df for dtbl_rows() and the default object with dtx_rows().
df <- dtx_rows(
  ~x, ~y, ~group,
   1,  3,    "A",
   6,  2,    "A",
  10,  4,    "B"
)
df
The fast statistical functions, or fast-flexible-friendly statistical functions, are prefixed with "f". These vectorized functions supersede the non-f functions, bringing the capacity to work smoothly on matrix-like and data frame objects. Most of them are defined in the {collapse} package.
For instance, base mean() operates on a vector, but not on a data frame. A matrix is treated as a vector and a single mean is returned. On the contrary, fmean() calculates one mean per column. It does the same for a data frame, usually faster than base functions. There is no need for colMeans(), a separate function to do so. Fast statistical functions also recognize grouping with fgroup_by(), sgroup_by() or group_by() and calculate the mean by group in this case. Again, there is no need for a different function like stats::ave().
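For comparison, here is a base-R sketch of what fgroup_by() |> fmean() computes: one mean per column and per group. group_col_means() is a hypothetical helper that assumes there are no missing values; it is not how {collapse} implements fmean().

```r
# Base-R sketch (no {collapse}): one mean per column and per group.
# Assumes no missing values in x.
group_col_means <- function(x, g) {
  sums   <- rowsum(as.matrix(x), g)       # one row of column sums per group
  counts <- rowsum(rep(1, length(g)), g)  # group sizes, in the same order
  sums / as.vector(counts)                # each row divided by its group size
}

group_col_means(iris[, -5], iris$Species)
```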
Finally, these functions also have a TRA= argument that computes, for instance, x - f(x) very efficiently if TRA = "-" (for example, to calculate residuals by subtracting the mean).
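The effect of TRA = "-" can be reproduced in base R with stats::ave(), which returns the group mean aligned with each observation. center_by_group() is a hypothetical helper, shown only to make x - f(x) by group explicit; it is not how {collapse} computes it.

```r
# Base-R sketch of TRA = "-": replace each value by its deviation from the
# mean of its group, column by column (ave() uses mean() by default).
center_by_group <- function(x, g) {
  as.data.frame(lapply(x, function(col) col - ave(col, g)))
}

resid <- center_by_group(iris[, -5], iris$Species)
head(resid, 3)
```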
Another particularity is the na.rm= argument, which is TRUE by default, while it is FALSE by default for mean().
These are generic functions with methods for matrix, data.frame, grouped_df and a default method used for simple numeric vectors. Most of them are defined in the {collapse} package, but there are a couple more here, together with an alternate syntax that replaces TRA= with %__f% operators.
list_fstat_functions()
fn(x, ...)
fna(x, ...)
x %replacef% expr
x %replace_fillf% expr
x %-f% expr
x %+f% expr
x %-+f% expr
x %/f% expr
x %/*100f% expr
x %*f% expr
x %modf% expr
x %-modf% expr
Argument | Description |
---|---|
x | A numeric vector, matrix, data frame or grouped data frame (class 'grouped_df'). |
... | Further arguments passed to the method, like … |
expr | The expression to evaluate as RHS of the … |
The number of all observations for fn() or the number of missing observations for fna(). list_fstat_functions() returns a list of all the known fast statistical functions.
The page collapse::fast-statistical-functions gives more details.
fn() counts all observations, including NAs; fna() counts only NAs, whereas fnobs() counts non-missing observations.
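The three counts differ only in how missing values are handled. Per-column base-R equivalents (hypothetical helpers, not the svBase or {collapse} code) make the relation fn() = fna() + fnobs() visible on airquality, which contains missing values:

```r
# Base-R equivalents (a sketch) of the three counts, per column.
n_all <- function(x) vapply(x, length, integer(1))                     # like fn()
n_na  <- function(x) vapply(x, function(v) sum(is.na(v)), integer(1))  # like fna()
n_obs <- function(x) vapply(x, function(v) sum(!is.na(v)), integer(1)) # like fnobs()

aq <- airquality[, c("Ozone", "Solar.R")]
n_all(aq)  # 153 for both columns
n_na(aq)   # Ozone: 37, Solar.R: 7
n_obs(aq)
```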
Instead of TRA=, one can use the %__f% operators, where __ is one of replace, replace_fill, -, +, -+, /, /*100 (for TRA = "%"), *, mod (for TRA = "%%") or -mod (for TRA = "-%%"). See example.
library(collapse)
data(iris)
iris_num <- iris[, -5] # Only numerical variables
mean(iris$Sepal.Length) # OK, but mean(iris_num) does not work
colMeans(iris_num)
# Same
fmean(iris_num)
# Idem, but mean by group for all 4 numerical variables
iris |> fgroup_by(Species) |> fmean()
# Residuals (x - mean(x)) by group
iris |> fgroup_by(Species) |> fmean(TRA = "-")
# The same calculation, in a little bit more expressive way
iris |> fgroup_by(Species) %-f% fmean()
# or:
iris_num %-f% fmean(g = iris$Species)
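The %__f% operators take the statistic call on their right-hand side unevaluated. A hypothetical operator %-stat% (not the svBase implementation) sketches the mechanism with base R only: it inserts x as the first argument of the call, evaluates it, and subtracts the result column by column, which corresponds to TRA = "-" without grouping.

```r
# Hypothetical operator, for illustration only: x %-stat% f() computes
# x - f(x) column-wise, with f() given as an unevaluated call.
`%-stat%` <- function(x, expr) {
  call <- substitute(expr)                                    # e.g. colMeans()
  call <- as.call(c(list(call[[1]], x), as.list(call)[-1]))   # insert x first
  stat <- eval(call, parent.frame())
  sweep(as.matrix(x), 2, stat)                                # x - stat, by column
}

iris_num <- iris[, -5]
centered <- iris_num %-stat% colMeans()
round(colMeans(centered), 10)  # all (numerically) zero
```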
Test if the object is a data frame (data.frame, data.table or tibble)
is_dtx(x, strict = TRUE)
is_dtf(x, strict = TRUE)
is_dtt(x, strict = TRUE)
is_dtbl(x, strict = TRUE)
Argument | Description |
---|---|
x | An object. |
strict | Should this be strictly the corresponding class … |
These functions return TRUE if the object is of the correct class, otherwise they return FALSE. is_dtx() returns TRUE if x is a data.frame, a data.table or a tibble.
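The strict argument can be pictured as the difference between an exact class test and an inheritance test. The helper is_df_demo() is hypothetical and only covers data.frame; it does not reproduce the special case of grouped tibbles accepted by is_dtx().

```r
# Hypothetical sketch of `strict`: exact class versus inheritance.
is_df_demo <- function(x, strict = TRUE) {
  if (strict) identical(class(x), "data.frame") else inherits(x, "data.frame")
}

sub_df <- structure(mtcars, class = c("subclassed", "data.frame"))
is_df_demo(mtcars)                  # TRUE
is_df_demo(sub_df)                  # FALSE: not exactly a data.frame
is_df_demo(sub_df, strict = FALSE)  # TRUE: inherits from data.frame
is_df_demo("some string")           # FALSE
```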
# data(mtcars)
is_dtf(mtcars) # TRUE
is_dtx(mtcars) # Also TRUE
is_dtt(mtcars) # FALSE
is_dtbl(mtcars) # FALSE
# but...
is_dtt(as_dtt(mtcars)) # TRUE
is_dtx(as_dtt(mtcars)) # TRUE
is_dtbl(as_dtbl(mtcars)) # TRUE
is_dtx(as_dtbl(mtcars)) # TRUE
is_dtx(as_dtbl(mtcars) |> dplyr::group_by(cyl)) # TRUE (special case)
is_dtx("some string") # FALSE
The Tidyverse defines a coherent set of tools to manipulate data frames that use non-standard evaluation and sometimes require extra care. These functions, like mutate() or summarise(), are defined in the {dplyr} and {tidyr} packages. The {collapse} package proposes a couple of functions with a similar interface, but with different and much faster code. For instance, fselect() is similar to select(), and fsummarise() is similar to summarise(). Not all functions are implemented, arguments and argument names differ, and the behavior may be very different, like frename(), which uses old_name = new_name, while rename() uses new_name = old_name! All the speedy functions are prefixed with an "s", like smutate(), and build on the work initiated in {collapse} to propose a series of functions paired with the tidy ones. So, smutate() and mutate() are "speedy" and "tidy" counterparts, and they are used in a very similar, if not identical, way. The "s" prefix is there to draw attention to their particularities. Their classes are function and speedy_fn. Avoid mixing tidy, speedy and non-tidy/speedy functions in the same pipeline.
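Because the argument order for renaming differs between packages, it helps to see the new_name = old_name convention used by rename() spelled out. rename_new_old() is a hypothetical base-R helper for illustration; it is neither dplyr::rename() nor collapse::frename().

```r
# Hypothetical helper using the new_name = "old_name" convention of rename().
rename_new_old <- function(.data, ...) {
  map <- c(...)  # names are the new names, values are the old names
  idx <- match(map, names(.data))
  if (anyNA(idx))
    stop("unknown column(s): ", paste(map[is.na(idx)], collapse = ", "))
  names(.data)[idx] <- names(map)
  .data
}

names(rename_new_old(iris, SL = "Sepal.Length", Sp = "Species"))
# "SL" "Sepal.Width" "Petal.Length" "Petal.Width" "Sp"
```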
This is a global page that presents all the speedy functions in one place. It is not meant to be a clear and detailed help page for each individual "s" function. Please refer to the help page of the corresponding non-"s" paired function for more details! You can use the {svMisc}'s .?smutate syntax to go to the help page of the non-"s" function with a message.
list_speedy_functions()
sgroup_by(.data, ...)
sungroup(.data, ...)
srename(.data, ...)
srename_with(.data, .fn, .cols = everything(), ...)
sfilter(.data, ...)
sfilter_ungroup(.data, ...)
sselect(.data, ...)
smutate(.data, ..., .keep = "all")
smutate_ungroup(.data, ..., .keep = "all")
stransmute(.data, ...)
stransmute_ungroup(.data, ...)
ssummarise(.data, ...)
sfull_join(x, y, by = NULL, suffix = c(".x", ".y"), copy = FALSE, ...)
sleft_join(x, y, by = NULL, suffix = c(".x", ".y"), copy = FALSE, ...)
sright_join(x, y, by = NULL, suffix = c(".x", ".y"), copy = FALSE, ...)
sinner_join(x, y, by = NULL, suffix = c(".x", ".y"), copy = FALSE, ...)
sbind_rows(..., .id = NULL)
scount(x, ..., wt = NULL, sort = FALSE, name = NULL, .drop = dplyr::group_by_drop_default(x), sort_cat = TRUE, decreasing = FALSE)
stally(x, wt = NULL, sort = FALSE, name = NULL, sort_cat = TRUE, decreasing = FALSE)
sadd_count(x, ..., wt = NULL, sort = FALSE, name = NULL, .drop = NULL, sort_cat = TRUE, decreasing = FALSE)
sadd_tally(x, wt = NULL, sort = FALSE, name = NULL, sort_cat = TRUE, decreasing = FALSE)
sbind_cols(..., .name_repair = c("unique", "universal", "check_unique", "minimal"))
sarrange(.data, ..., .by_group = FALSE)
spull(.data, var = -1, name = NULL, ...)
sdistinct(.data, ..., .keep_all = FALSE)
sdrop_na(data, ...)
sreplace_na(data, replace, ...)
spivot_longer(data, cols, names_to = "name", values_to = "value", ...)
spivot_wider(data, names_from = name, values_from = value, ...)
suncount(data, weights, .remove = TRUE, .id = NULL)
sunite(data, col, ..., sep = "_", remove = TRUE, na.rm = FALSE)
sseparate(data, col, into, sep = "[^[:alnum:]]+", remove = TRUE, convert = FALSE, ...)
sseparate_rows(data, ..., sep = "[^[:alnum:].]+", convert = FALSE)
sfill(data, ..., .direction = c("down", "up", "downup", "updown"))
sextract(data, col, into, regex = "([[:alnum:]]+)", remove = TRUE, convert = FALSE, ...)
Argument | Description |
---|---|
.data | A data frame (data.frame, data.table or tibble's tbl_df). |
... | Arguments dependent on the context of the function and, most of the time, not evaluated in a standard way (cf. the tidyverse approach). |
.fn | A function to use. |
.cols | The list of the columns where the transformation is applied. For the moment, only all existing columns, which means … |
.keep | Which columns to keep. The default is … |
x | A data frame (data.frame, data.table or tibble's tbl_df). |
y | A second data frame. |
by | A list of names of the columns to use for joining the two data frames. |
suffix | The suffix added to the column names to differentiate the columns that come from the first or the second data frame. By default it is … |
copy | This argument is there for compatibility with the "t" matching functions, but it is not used here. |
.id | The name of the column for the origin id, either names if all other arguments are named, or numbers. |
wt | Frequency weights. Can be … |
sort | If … |
name | The name of the new column in the output (…) |
.drop | Are levels with no observations dropped (…)? |
sort_cat | Are levels sorted (…)? |
decreasing | Is sorting done in decreasing order (…)? |
.name_repair | How should the names be "repaired" to avoid duplicate column names? See … |
.by_group | Logical. If … |
var | A variable specified as a name, a positive or a negative integer (counting from the end). The default is … |
.keep_all | If … |
data | A data frame, or for … |
replace | If … |
cols | A selection of the columns using tidy-select syntax, see … |
names_to | A character vector with the name or names of the columns for the names. |
values_to | A string with the name of the column that receives the values. |
names_from | The column or columns containing the names (use tidy selection and do not quote the names). |
values_from | Idem for the column or columns that contain the values. |
weights | A vector of weights to use to "uncount" … |
.remove | If … |
col | The name (quoted or not) of the new column with the united variables. |
sep | Separator to use between values for united or separated columns. |
remove | If … |
na.rm | If … |
into | Name of the new column to put separated variables. Use … |
convert | If … |
.direction | Direction in which to fill missing data: … |
regex | A regular expression used to extract the desired values (use one group with … |
See the corresponding "non-s" function for the full help page with indication of the return values.
Unlike dplyr::summarise(), the ssummarise() function does not support n(). You can use fn() instead, but then you must give a variable name as argument. The fn() alternative can also be used in summarise() for a homogeneous syntax between the two.
From {dplyr}, the slice() and slice_xxx() functions are not added yet because they are not available for {dbplyr}. Also, anti_join(), semi_join() and nest_join() are not implemented yet.
From {tidyr}, expand(), chop(), unchop(), nest(), unnest(), unnest_longer(), unnest_wider(), hoist(), pack() and unpack() are not implemented yet.
# TODO...
The Tidyverse defines a coherent set of tools to manipulate data frames that use non-standard evaluation and sometimes require extra care. These functions, like mutate() or summarise(), are defined in the {dplyr} and {tidyr} packages. When using variants, like {dtplyr} for data.table objects or {dbplyr} to work with external databases, successive commands in a pipeline are pooled together but not computed. One has to collect() the result to get its final form. Most of the tidy functions that have their "speedy" counterpart prefixed with "s" are listed with list_tidy_functions(). Their main usages are (excluding less used arguments, or those that are not compatible with the speedy "s" counterpart functions):
group_by(.data, ...)
ungroup(.data)
rename(.data, ...)
rename_with(.data, .fn, .cols = everything(), ...)
filter(.data, ...)
select(.data, ...)
mutate(.data, ..., .keep = "all")
transmute(.data, ...)
summarise(.data, ...)
full_join(x, y, by = NULL, suffix = c(".x", ".y"), copy = FALSE, ...)
left_join(x, y, by = NULL, suffix = c(".x", ".y"), copy = FALSE, ...)
right_join(x, y, by = NULL, suffix = c(".x", ".y"), copy = FALSE, ...)
inner_join(x, y, by = NULL, suffix = c(".x", ".y"), copy = FALSE, ...)
bind_rows(..., .id = NULL)
bind_cols(..., .name_repair = c("unique", "universal", "check_unique", "minimal"))
arrange(.data, ..., .by_group = FALSE)
count(x, ..., wt = NULL, sort = FALSE, name = NULL)
tally(x, wt = NULL, sort = FALSE, name = NULL)
add_count(x, ..., wt = NULL, sort = FALSE, name = NULL)
add_tally(x, wt = NULL, sort = FALSE, name = NULL)
pull(.data, var = -1, name = NULL)
distinct(.data, ..., .keep_all = FALSE)
drop_na(data, ...)
replace_na(data, replace)
pivot_longer(data, cols, names_to = "name", values_to = "value")
pivot_wider(data, names_from = name, values_from = value)
uncount(data, weights, .remove = TRUE, .id = NULL)
unite(data, col, ..., sep = "_", remove = TRUE, na.rm = FALSE)
separate(data, col, into, sep = "[^[:alnum:]]+", remove = TRUE, convert = FALSE)
separate_rows(data, ..., sep = "[^[:alnum:].]+", convert = FALSE)
fill(data, ..., .direction = c("down", "up", "downup", "updown"))
extract(data, col, into, regex = "([[:alnum:]]+)", remove = TRUE, convert = FALSE)
plus the functions documented below.
list_tidy_functions()
filter_ungroup(.data, ...)
mutate_ungroup(.data, ..., .keep = "all")
transmute_ungroup(.data, ...)
Argument | Description |
---|---|
.data | A data frame, data frame extension (e.g. a tibble), or a lazy data frame (e.g. from dbplyr or dtplyr). See … |
... | Arguments dependent on the context of the function and, most of the time, not evaluated in a standard way (cf. the tidyverse approach). |
.keep | Which columns to keep. The default is … |
See the corresponding "non-t" function for the full help page with indication of the return values. list_tidy_functions() returns a list of all the tidy(verse) functions that have their speedy "s" counterpart, see speedy_functions.
The help page here is very basic and it aims mainly to list all the tidy functions. For more complete help, see the {dplyr} or {tidyr} packages. From {dplyr}, the slice() and slice_xxx() functions are not added yet because they are not available for {dbplyr}. Also, anti_join(), semi_join() and nest_join() are not implemented yet.
From {tidyr}, expand(), chop(), unchop(), nest(), unnest(), unnest_longer(), unnest_wider(), hoist(), pack() and unpack() are not implemented yet.
See also collapse::num_vars() to easily keep only numeric columns from a data frame, and collapse::fscale() for scaling and centering matrix-like objects and data frames.
# TODO...