Title: | Data Analysis Work Flow and Pipeline Operator for 'SciViews::R' |
---|---|
Description: | Data work flow analysis using 'proto' objects and pipe operator that integrates non-standard evaluation and the 'lazyeval' mechanism. |
Authors: | Philippe Grosjean [aut, cre], Louis Kates [ctb] (Author of 'proto' package, from which 'svFlow' is derived), Thomas Petzoldt [ctb] (Author of 'proto' package, from which 'svFlow' is derived) |
Maintainer: | Philippe Grosjean <[email protected]> |
License: | MIT + file LICENSE |
Version: | 1.2.1 |
Built: | 2024-11-14 06:00:52 UTC |
Source: | https://github.com/SciViews/svFlow |
Data (work)flow analysis using proto objects (see proto()) and a pipe
operator that integrates non-standard evaluation and the tidyeval mechanism
as transparently as possible.
%>.% and %>_% are two alternative pipe operators designed to supplement
magrittr's pipe operator, which is widely used in the tidyverse and elsewhere.
They are provided for good reasons.
%>.% always requires the position of . to be indicated explicitly in the
pipeline expression. The expression is not modified. As a consequence, it can
never surprise you with unexpected behavior, and all valid R expressions are
usable in the pipeline. Another consequence: it is very fast.
%>_% works with Flow objects that allow for encapsulation of satellite objects
(data or functions) within the pipeline. It is self-contained. The pipeline can
be interrupted and restarted at any time. It also allows for a class-less
object-oriented approach with single inheritance (useful for easily testing
different scenarios on the same pipeline, and for prototyping objects that are
"pipe-aware"). It also manages the tidyeval mechanism for non-standard
expressions as transparently as possible: the only "rule" to remember is to
suffix the names of variables that need special treatment with an underscore
(_), and the pipe operator manages the rest for you.
debug_flow() provides a convenient way to debug problematic pipelines built
with our own pipe operators %>.% and %>_%. Everything from the step that raised
an error is available: the piped data, the expression to be evaluated, and
possibly the last state of the Flow object. Everything can be inspected and
modified, and the expression can be rerun as if you were still right in the
middle of the pipeline evaluation.
flow() constructs a Flow object that is pipe-aware and tidyeval-aware.
This opens new horizons in your analysis workflow. You start by building a
simple ad hoc pipeline; then you can include satellite data or functions right
inside it, and perhaps also test different scenarios by using the object
inheritance features of Flow (common parts are shared among the different
scenarios, thus reducing the memory footprint). As your pipeline matures, you
gradually and naturally move towards either a functional sequence or a
dedicated object. The functional sequence pathway consists of building a
reusable function to recycle your pipeline in a different context. The object
pathway is not fully developed yet in the present version. In the future, the
object-oriented nature of Flow will also be leveraged, so that you could
automatically translate your "flow pipeline" into an S3 or R6 object, with
satellite data becoming object attributes and satellite functions becoming
methods. The pipeline itself would then become the default method for that
object. Of course, both functions and objects derived from a "flow pipeline"
will be directly compatible with the tidyeval mechanism, as they will be as
tidyverse-friendly as possible by construction.
str.Flow() compactly displays the content of a Flow object.
as.quosure(), and the unary + and - operators combined with formula objects,
provide an alternative way to create quosures.
quos_underscore() automatically converts arguments whose name ends with _
into quosures; this mechanism is used by our flow pipe operator to implement
the tidyeval mechanism as transparently as possible inside "flow pipelines".
Pass the first argument as a dot (.) to run the code in the second argument, for pipe operators that do not natively support a dot-replacement scheme (such as the base R pipe operator |>)
._(x, expr)
Argument | Description |
---|---|
x | Object to pass to |
expr | Expression to execute, containing |
As a side effect, the function assigns x to . and the unevaluated expr to
.call in the calling environment. Therefore, make sure you do not use . or
.call there for something else. If expr fails in the middle of a series of
chained pipes, you can inspect . and .call, or rerun a modified version of the
failing instruction on them, for easier debugging.
The result of executing expr in the parent environment.
# The function is really supposed to be used in a pipe instruction
# This example only runs on R >= 4.1
## Not run: 
library(svFlow)
# lm() has data = as its second argument, which does not fit well with the pipe |>
# In R 4.1, one should write:
iris |> (\(.) lm(data = ., Sepal.Length ~ Petal.Length + Species))()
# which is not very elegant! With ._() it is more concise and straightforward:
iris |> ._(lm(data = ., Sepal.Length ~ Petal.Length + Species))
## End(Not run)
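To make the documented side effects concrete, here is a minimal base-R sketch of a helper that behaves like ._(). The name dot_run() is hypothetical and this is not svFlow's source: it only assigns x as . and the unevaluated expr as .call in the calling environment, then evaluates expr there. The base pipe |> requires R >= 4.1.

```r
# Hypothetical helper mimicking the documented behaviour of ._()
# (a sketch, not svFlow's implementation).
dot_run <- function(x, expr) {
  env <- parent.frame()               # the caller's environment
  call <- substitute(expr)            # capture expr unevaluated
  assign(".", x, envir = env)         # side effect: x becomes `.`
  assign(".call", call, envir = env)  # side effect: keep the call for debugging
  eval(call, envir = env)             # run expr where `.` is now visible
}

# With the base pipe (R >= 4.1), the left-hand side becomes the first
# argument, so `.` can be placed anywhere inside the second one:
fit <- mtcars |> dot_run(lm(data = ., mpg ~ wt))
coef(fit)

# After the call, `.` and `.call` remain available for inspection:
.call
```

Because the base pipe inserts its left-hand side as the first argument, a two-argument helper of this shape is enough to recover dot placement anywhere in the expression.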
Flow objects, as explicitly created by flow(), or implicitly by the %>_% pipe
operator, are proto objects (class-less objects with possible inheritance)
that can be combined nicely with pipelines using the specialized flow pipe
operator %>_% (or by using $).
They allow for encapsulating satellite objects/variables related to the
pipeline, and they handle non-standard evaluation automatically through the
tidyeval mechanism, with minimal changes required from the user.
flow(. = NULL, .value = NULL, ...)
enflow(.value, env = caller_env(), objects = ls(env))
is.flow(x)
is_flow(x)
as.flow(x, ...)
as_flow(x, ...)
## S3 method for class 'Flow'
x$name
## S3 replacement method for class 'Flow'
x$name <- value
Argument | Description |
---|---|
. | If a Flow object is provided, inherit from it, otherwise, create a new Flow object inheriting from |
.value | The pipe value to pass to the object (used instead of |
... | For |
env | The environment to use for populating the Flow object. All objects from this environment are injected into it, with the objects not starting with a dot and ending with an underscore ( |
objects | A character string with the name of the objects from |
x | An object (a Flow object, or anything to test if it is a Flow object in |
name | The name of the item to get from a Flow object. If |
value | The value or expression to assign to |
enflow() creates a Flow object at the head of a "flow pipeline" in the
context of a functional sequence, that is, a function that converts an
ad hoc, single-use pipeline into a function reusable in a different
context. Satellite data become arguments of the function.
When a Flow object is created from scratch, it always inherits from
.GlobalEnv, no matter where the expression was executed (in fact, it
inherits from an empty root Flow object, itself inheriting from
.GlobalEnv). This is a deliberate design choice to overcome some
difficulties and limitations of proto objects; see proto().
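Proto objects, and hence Flow objects, are R environments, and lookup in an environment falls back on its parent. The base-R sketch below uses plain environments as stand-ins for Flow objects (the names root, scenario_a and scenario_b are illustrative only) to show why single inheritance lets several scenarios share common parts while overriding only what differs.

```r
# Plain environments stand in for Flow objects here (illustration only).
root <- new.env(parent = globalenv())   # like a root object inheriting from .GlobalEnv
root$threshold <- 5                     # shared satellite data
root$transform <- function(v) log(v)    # shared satellite function

# Two "scenarios" inheriting from the same parent
scenario_a <- new.env(parent = root)
scenario_b <- new.env(parent = root)
scenario_b$threshold <- 10              # override only what differs

# get() searches the object first, then its ancestors
get("threshold", envir = scenario_a)    # inherited from root
get("threshold", envir = scenario_b)    # local override
identical(get("transform", envir = scenario_a),
          get("transform", envir = scenario_b))  # one shared copy
```

Only scenario_b stores its own threshold; everything else lives once in root, which is the memory-sharing effect described for Flow scenarios.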
enflow() creates a Flow object and automatically populates it with all
the objects present in env= (by default, the calling environment).
It is primarily intended to be used inside a function, as the first
instruction of a "flow pipeline". Hence, it conveniently collects all
function arguments inside that pipeline.
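The core idea behind enflow(), gathering everything visible in a function's frame into a new object, can be sketched in base R with mget() and list2env(). The function collect_args() is hypothetical and returns a plain environment rather than a Flow object; the real enflow() also handles the pipe value and underscore-suffixed arguments, which this sketch ignores.

```r
# Hypothetical illustration: copy all objects of a function's frame
# (i.e., its arguments) into a fresh environment.
collect_args <- function(data, scale = 2, label = "demo") {
  frame <- environment()
  arg_names <- setdiff(ls(frame), "frame")  # exclude our own helper variable
  objs <- mget(arg_names, envir = frame)    # forces the argument promises
  list2env(objs, envir = new.env(parent = globalenv()))
}

obj <- collect_args(1:3, label = "test")
ls(obj)      # the three argument names
obj$label
```

Used as the first instruction of a function body, this turns every argument, including defaults, into satellite data of the new object.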
str.Flow, quos_underscore, %>_%
library(svFlow)
library(dplyr)
data(iris)
foo <- function(data, x_ = Sepal.Length, y_ = log_SL, fun_ = mean,
                na_rm = TRUE) {
  enflow(data) %>_%
    mutate(., y_ = log(x_)) %>_%
    summarise(., fun_ = fun_(y_, na.rm = na_rm_)) %>_%
    .
}
foo(iris)
foo(iris, x_ = Petal.Width)
foo(iris, x_ = Petal.Width, fun_ = median)
# Unfortunately, this does not work, due to limitations of tidyeval's :=
#foo(iris, x_ = Petal.Width, fun_ = stats::median)
foo2 <- function(., x_ = Sepal.Length, y_ = log_SL, na_rm = TRUE) enflow(.)
foo2
foo2(1:10) -> foo_obj
ls(foo_obj)
A graph showing the inheritance relationships among all Flow objects is computed and displayed.
graph_flow(env = .GlobalEnv, child_to_parent = TRUE, plotit = TRUE, ...)
Argument | Description |
---|---|
env | The environment in which to look for Flow objects. By default, it is |
child_to_parent | Do the arrows go from child to parent (by default), or in the other direction? |
plotit | Do we plot the graph (by default)? |
... | Further parameters passed to |
An igraph object (returned invisibly if plotit = TRUE).
library(svFlow)
a <- flow()
b <- a$flow()
c <- b$flow()
d <- a$flow()
# Use of custom names
e <- flow(.name = "parent")
f <- e$flow(.name = "child")
graph_flow()
# Arrows pointing from parents to children, and do not plot it
g <- graph_flow(child_to_parent = FALSE, plotit = FALSE)
g
plot(g)
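graph_flow() builds an igraph object, but the underlying question (which object inherits from which) can be sketched in base R by comparing each environment's parent.env() with the other environments found in env. The function inheritance_edges() below is hypothetical, works on plain environments rather than Flow objects, and returns a child/parent edge list instead of an igraph object.

```r
# Hypothetical sketch: list child -> parent links among environments in `env`
inheritance_edges <- function(env = globalenv()) {
  objs <- mget(ls(env), envir = env)
  objs <- objs[vapply(objs, is.environment, logical(1))]  # environments only
  edges <- data.frame(child = character(0), parent = character(0),
                      stringsAsFactors = FALSE)
  for (child in names(objs)) {
    par <- parent.env(objs[[child]])
    # identical() compares environments by reference
    is_par <- vapply(objs, identical, logical(1), par)
    if (any(is_par))
      edges <- rbind(edges, data.frame(child = child,
                                       parent = names(objs)[is_par][1],
                                       stringsAsFactors = FALSE))
  }
  edges
}

demo <- new.env()
demo$a <- new.env()
demo$b <- new.env(parent = demo$a)
demo$c <- new.env(parent = demo$b)
demo$d <- new.env(parent = demo$a)
inheritance_edges(demo)
```

The edge list mirrors the a/b/c/d hierarchy of the example above; reversing the two columns corresponds to child_to_parent = FALSE.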
Pipe operators. %>.% is a very simple and efficient pipe operator. %>_% is
more complex: it forces conversion to a Flow object inside a pipeline and
automatically manages non-standard evaluation through the creation and
unquoting of quosures for named arguments whose name ends with _.
x %>.% expr
x %>_% expr
debug_flow()
Argument | Description |
---|---|
x | Value or Flow object to pass to the pipeline. |
expr | Expression to evaluate in the pipeline. |
With %>.%, the value must be explicitly indicated with a . inside the
expression. The expression is not modified, but the value is first assigned
into the calling environment as . (warning! this possibly replaces any
existing value... do not use . to name other objects). The expression is also
saved as .call in the calling environment so that debug_flow() can retrieve
and rerun it easily. If a Flow object is used with %>.%, its .value is first
extracted from it into . (and thus the Flow object is lost).
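The behaviour described above is simple enough to sketch in a few lines of base R. The operator below is deliberately named %dot>% (a hypothetical name, not part of svFlow) so as not to mask the real %>.%. It assigns . and .call in the calling environment and evaluates the expression unchanged, which is also why both are left behind for debugging when a step fails.

```r
# Hypothetical dot-pipe mimicking the documented behaviour of %>.%
`%dot>%` <- function(x, expr) {
  env <- parent.frame()
  call <- substitute(expr)            # expression is used as-is, not rewritten
  assign(".", x, envir = env)         # forces x, i.e. the previous pipeline step
  assign(".call", call, envir = env)  # kept for debugging
  eval(call, envir = env)
}

# Custom %op% operators are left-associative, so steps run from left to right
res <- mtcars %dot>%
  subset(., cyl == 4) %dot>%
  nrow(.)
res   # 11

# A failing step leaves `.` and `.call` behind for inspection
try(mtcars %dot>% subset(., no_such_column > 0), silent = TRUE)
.call   # the expression that failed
dim(.)  # the data it received
```

Since nothing is rewritten, the cost per step is one substitute(), two assign() calls and one eval(), which is consistent with the speed claim made for %>.%.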
With %>_%, the Flow object that is passed (or created) is also assigned in the
calling environment under the name .. (two dots). This can be used to refer
to the Flow object content within the pipeline expressions (e.g., ..$var).
For %>_%, the expression is reworked in such a way that a suitable tidyeval
syntax is constructed for each variable whose name ends with _, and that
variable is explicitly searched for starting from .. (the Flow object). Thus,
x_ is replaced by !!..$x. When such a variable appears to the left of an =
sign, the = is also replaced by := to keep the R syntax correct (var_ =
becomes !!..$var :=). This way, you just need to suffix special variables
with _, both in the flow() function arguments (to create quosures) and in the
NSE expressions used inside the pipeline, to get the job done! The raw
expression is saved as .call_raw, while the reworked call is saved as .call
for possible further inspection and debugging.
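The rewriting rule can be sketched as a recursive walk over the call tree in base R. The functions rework_underscore() and bang_bang_ref() are hypothetical and only illustrate the documented transformation (x_ becomes !!..$x, and var_ = becomes !!..$var :=); svFlow's actual code may differ. The sketch only builds the reworked call without evaluating it, so rlang is not needed, and it does not handle empty arguments such as x[, 1].

```r
# Hypothetical sketch of the documented rewrite performed by %>_%
bang_bang_ref <- function(name) {
  # Builds the call !!..$name, i.e. !(!(..$name))
  call("!", call("!", call("$", as.name(".."), as.name(name))))
}

rework_underscore <- function(expr) {
  if (is.symbol(expr)) {
    nm <- as.character(expr)
    if (nchar(nm) > 1 && endsWith(nm, "_"))
      return(bang_bang_ref(sub("_$", "", nm)))  # x_  ->  !!..$x
    return(expr)
  }
  if (!is.call(expr)) return(expr)              # constants are left untouched
  parts <- as.list(expr)
  arg_names <- names(parts)
  if (is.null(arg_names)) arg_names <- rep("", length(parts))
  out <- lapply(parts, rework_underscore)       # recurse into every element
  for (i in seq_along(out)) {
    if (nzchar(arg_names[i]) && endsWith(arg_names[i], "_")) {
      # var_ = value  ->  !!..$var := value
      out[[i]] <- call(":=", bang_bang_ref(sub("_$", "", arg_names[i])),
                       out[[i]])
      arg_names[i] <- ""
    }
  }
  names(out) <- if (any(nzchar(arg_names))) arg_names else NULL
  as.call(out)
}

rework_underscore(quote(mutate(., y_ = log(x_))))
```

Ordinary names such as . or na.rm pass through untouched, so only the underscore-suffixed variables are redirected to the Flow object.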
Finally, for %>_%, if expr is ., then the last value from the pipe is
extracted from the Flow object and returned. It is thus equivalent to
flow_obj$.value.
You can mix %>.% and %>_% within the same pipeline. If you use %>.% in a flow
pipeline, it "unflows" it: .value is extracted from the Flow object and fed
further down the pipeline.
# A simple pipeline with %>.% (explicit position of '.' required)
library(svFlow)
library(dplyr)
data(iris)
iris2 <- iris %>.%
  mutate(., log_SL = log(Sepal.Length)) %>.%
  filter(., Species == "setosa")
# The %>.% operator is much faster than magrittr's %>%
# (although this has no noticeable impact in most situations when the
# pipeline is used in an ad hoc way, outside of loops or other constructs
# that call it a large number of times)
The expressions provided for all arguments whose names end with _ are
automatically converted into quosures, and also assigned to a name without
the trailing _. The other arguments are evaluated in the usual way.
quos_underscore(...)
Argument | Description |
---|---|
... | The named arguments provided to be either converted into quosures or evaluated. |
An object of class quosures is returned. It can be used directly in tidyeval-aware contexts.
library(svFlow)
foo <- function(...) quos_underscore(...)
foo(x = 1:10,               # "Normal" argument
    y_ = 1:10,              # Transformed into a quosure
    z_ = non_existing_name) # Expressions in quosures are not evaluated
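A base-R approximation of this behaviour can rely on one-sided formulas, which, like quosures, bundle an unevaluated expression with its environment. capture_underscore() is a hypothetical name: it returns a plain list (not an rlang quosures object) and, as a simplifying assumption, stores each captured argument only under its name without the trailing underscore. All arguments must be named and passed directly.

```r
# Hypothetical approximation of quos_underscore() using base R only
capture_underscore <- function(...) {
  exprs <- as.list(substitute(list(...)))[-1]   # unevaluated arguments
  nms <- names(exprs)
  if (is.null(nms) || any(!nzchar(nms)))
    stop("all arguments must be named")
  env <- parent.frame()
  out <- vector("list", length(exprs))
  for (i in seq_along(exprs)) {
    if (endsWith(nms[i], "_")) {
      # One-sided formula ~expr: keeps the expression AND its environment
      out[i] <- list(eval(call("~", exprs[[i]]), env))
      nms[i] <- sub("_$", "", nms[i])
    } else {
      out[i] <- list(eval(exprs[[i]], env))     # "normal" argument
    }
  }
  names(out) <- nms
  out
}

res <- capture_underscore(x = 1:10, y_ = 1:10, z_ = non_existing_name)
res$x   # evaluated
res$z   # ~non_existing_name, never evaluated
```

The out[i] <- list(...) form is used instead of out[[i]] <- ... so that a NULL value would not silently drop an element.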
Quosures are defined in the {rlang} package as part of the tidy evaluation
of non-standard expressions (see quo()). Here, we provide an alternative
mechanism using -~expr as a synonym of quo(expr). Also, +quo_obj is
equivalent to !!quo_obj in {rlang}, and ++quo_obj both unquotes and evaluates
it in the right environment. Quosures are keystone objects in the tidy
evaluation mechanism, so they deserve a special, clean and concise syntax to
create and manipulate them.
The as_xxx() and is_xxx() functions further ease the manipulation of
quosures or related objects.
## S3 method for class 'formula'
e1 - e2
## S3 method for class 'formula'
e1 + e2
## S3 method for class 'quosure'
e1 ^ e2
## S3 method for class 'quosure'
e1 + e2
## S3 method for class 'unquoted'
e1 + e2
## S3 method for class 'unquoted'
print(x, ...)
as.quosure(x, env = caller_env())
is.quosure(x)
is.formula(x)
is.bare_formula(x)
`!!`(x)
Argument | Description |
---|---|
e1 | Unary operator member, or first member of a binary operator. |
e2 | Second member of a binary operator (not used here, except for |
x | An expression |
... | Further arguments passed to the |
env | An environment specified for scoping of the quosure. |
The unary - operator is defined for formula objects (it is not defined for
them in base R, and hence is not supposed to be used otherwise). Thus, -~expr
just converts a formula built using the base ~expr instruction into a
quosure. as.quosure() does the same when the expression is provided directly,
and also allows the enclosing environment to be defined (by default, it is the
environment where the code is evaluated, which is also the case when using
-~expr).
Similarly, the unary + operator is defined for quosures in order to easily
"reverse" the quoting of an expression with a logically complementary
operator. It does something similar to !! in {rlang}, but it can be used
outside of tidy eval expressions. Since unary + has higher syntax precedence
than ! in R, it is less likely to require parentheses (only ^ for
exponentiation, indexing/subsetting operators like $ or [, and the namespace
operators :: and ::: have higher precedence). A specific ^ operator for
quosures solves the precedence issue with ^, and :: or ::: are very unlikely
to be used in this context.
++quosure is in fact a two-step operation (+(+quosure)). It first unquotes
the quosure, returning an unquoted object. Then, the second + evaluates the
unquoted object. This allows for fine-grained manipulation of quosures: you
can unquote at one place, and evaluate the unquoted object elsewhere (and, of
course, the contained expression is always evaluated in the right
environment, despite all these manipulations).
`!!`() just evaluates its argument and passes the result on. It is only
useful inside a quasi-quoted argument; see quasiquotation.
These functions build or manipulate quosures and return such objects.
+quosure creates an unquoted object. The unary + operator applied to
unquoted objects evaluates the expression contained in the quosure in the
right environment.
library(svFlow)
x <- 1:10
# Create a quosure (same as quo(x))
x_quo <- -~x
x_quo
# Unquote it (same as !!x, but usable everywhere)
+x_quo
# Unquote and evaluate the quosure
++x_quo
# Syntax precedence issues (^ has higher precedence than unary +)
# are solved by redefining ^ for unquoted objects:
++x_quo^2
# acts as if ++ had higher precedence than ^, i.e., as if it were (++x_quo)^2
# Assign the unquoted expression
x_unquo <- +x_quo
# ... and use x_unquo in a different context
foo <- function(x) +x
foo(x_unquo)
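The two-step + / ++ design can be mimicked with a pair of S3 classes in base R. Everything below is a hypothetical sketch (the classes sketch_quo and sketch_unquoted and the function as_sketch_quo() are illustrative names) that stores the expression and environment of a one-sided formula; it is not svFlow's implementation, which works on rlang quosures, and it deliberately does not redefine - for formulas. It shows why the expression is always evaluated in the environment where it was captured, even when the unquoted object is used elsewhere.

```r
# Hypothetical quosure-like object built from a one-sided formula
as_sketch_quo <- function(f) {
  stopifnot(inherits(f, "formula"), length(f) == 2L)   # ~expr only
  structure(list(expr = f[[2]], env = environment(f)), class = "sketch_quo")
}

# Unary + on a "quosure": unquote it (no evaluation yet)
`+.sketch_quo` <- function(e1, e2) {
  if (!missing(e2)) stop("only unary + is supported in this sketch")
  structure(unclass(e1), class = "sketch_unquoted")
}

# Unary + on an unquoted object: evaluate in the captured environment
`+.sketch_unquoted` <- function(e1, e2) {
  if (!missing(e2)) stop("only unary + is supported in this sketch")
  eval(e1$expr, e1$env)
}

make_quo <- function() {
  x <- 42
  as_sketch_quo(~x)     # the formula records this function's environment
}

x <- 1:10
q <- make_quo()
u <- +q                 # step 1: unquote
use_elsewhere <- function(obj) +obj
use_elsewhere(u)        # step 2: evaluates to 42, not 1:10
++q                     # both steps at once, parsed as +(+q)
```

Because the environment travels with the object, evaluation inside use_elsewhere() still finds the x defined in make_quo(), not the global x.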
Print short informative strings about the Flow object and everything it contains, plus, possibly, inheritance information.
## S3 method for class 'Flow'
str(
  object,
  max.level = 1L,
  nest.lev = 0L,
  indent.str = paste(rep.int(" ", max(0L, nest.lev + 1L)), collapse = ".."),
  ...
)
Argument | Description |
---|---|
object | A Flow object. |
max.level | The maximum nesting level to use for displaying nested structures. |
nest.lev | Used internally for pretty printing nested objects (you probably don't want to change the default value). |
indent.str | Idem. |
... | Further arguments passed to |
library(svFlow)
# A Flow object
data(iris)
fl <- flow(iris, x = 1:10, var_ = Sepal.Length)
fl       # Shows the .value contained in fl
str(fl)  # Provides compact information about satellite data contained in fl
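For plain environments, base R's utils::ls.str() gives a comparable compact summary with one entry per object. The snippet below uses an ordinary environment named satellites as a stand-in for a Flow object; it is an illustration of the idea, not the output format of str.Flow().

```r
library(utils)
satellites <- new.env()
satellites$x <- 1:10
satellites$threshold <- 2.5
satellites$transform <- function(v) log(v)
ls.str(satellites)   # one compact entry per object, in alphabetical order
```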