This document still needs substantial editing. It is kept here because it may still be useful in its current state.
Here is a pipeline:
library(dplyr)
library(svFlow)
threshold <- 1.5
iris %>.%
  filter(., Petal.Length > threshold) %>.%
  mutate(., log_var = log(Petal.Length)) %>.%
  head(.) %>.% .
#> Sepal.Length Sepal.Width Petal.Length Petal.Width Species log_var
#> 1 5.4 3.9 1.7 0.4 setosa 0.5306283
#> 2 4.8 3.4 1.6 0.2 setosa 0.4700036
#> 3 5.7 3.8 1.7 0.3 setosa 0.5306283
#> 4 5.4 3.4 1.7 0.2 setosa 0.5306283
#> 5 5.1 3.3 1.7 0.5 setosa 0.5306283
#> 6 4.8 3.4 1.9 0.2 setosa 0.6418539
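To make the idea of an explicit-dot pipe more concrete, here is a minimal base-R sketch of an operator that binds the left-hand value to `.` before evaluating the right-hand call. The operator name `%dot>%` is hypothetical and this is not svFlow's implementation of `%>.%`; it only mimics the calling convention used in the pipeline above, with base `subset()` and `transform()` standing in for the dplyr verbs.

```r
# Hypothetical explicit-dot pipe in base R (an illustration only,
# NOT how svFlow implements %>.%).
`%dot>%` <- function(lhs, rhs) {
  rhs_expr <- substitute(rhs)       # capture the right-hand call unevaluated
  caller <- parent.frame()          # where the pipeline was written
  env <- new.env(parent = caller)   # scratch environment that will hold '.'
  assign(".", lhs, envir = env)     # bind the piped value to '.'
  eval(rhs_expr, envir = env)       # evaluate the call with '.' visible
}

threshold <- 1.5
res <- iris %dot>%
  subset(., Petal.Length > threshold) %dot>%
  transform(., log_var = log(Petal.Length)) %dot>%
  head(.)
res
```

Because `.` lives in a scratch environment whose parent is the calling frame, ordinary variables such as `threshold` are still found by the usual scoping rules.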
Use flow() to add local variables inside the pipeline, and to get
convenient and transparent resolution of the lazyeval mechanism:
flow(iris, var_ = Petal.Length, threshold = 1.5) %>_%
  filter(., var_ > threshold_) %>_%
  {..$tab <- mutate(., log_var = log(var_))} %>_%
  head(.) %>_% .
#> Warning: Assigning non-quosure objects to quosure lists is deprecated as of rlang 0.3.0.
#> Please coerce to a bare list beforehand with `as.list()`
#> This warning is displayed once every 8 hours.
#> Sepal.Length Sepal.Width Petal.Length Petal.Width Species log_var
#> 1 5.4 3.9 1.7 0.4 setosa 0.5306283
#> 2 4.8 3.4 1.6 0.2 setosa 0.4700036
#> 3 5.7 3.8 1.7 0.3 setosa 0.5306283
#> 4 5.4 3.4 1.7 0.2 setosa 0.5306283
#> 5 5.1 3.3 1.7 0.5 setosa 0.5306283
#> 6 4.8 3.4 1.9 0.2 setosa 0.6418539
Convert this into a reusable function by replacing flow() with
function() and starting the pipeline with enflow():
my_process <- function(data, var_ = Petal.Length, threshold = 1.5)
  enflow(data) %>_%
    filter(., var_ > threshold_) %>_%
    {..$tab <- mutate(., log_var = log(var_))} %>_%
    head(.) %>_% .
Then use it just like a plain function. Arguments ending with _ are
also a good way to spot immediately those that are treated specially by
the tidyeval mechanism! Here, we redo the analysis:
my_process(iris)
#> Sepal.Length Sepal.Width Petal.Length Petal.Width Species log_var
#> 1 5.4 3.9 1.7 0.4 setosa 0.5306283
#> 2 4.8 3.4 1.6 0.2 setosa 0.4700036
#> 3 5.7 3.8 1.7 0.3 setosa 0.5306283
#> 4 5.4 3.4 1.7 0.2 setosa 0.5306283
#> 5 5.1 3.3 1.7 0.5 setosa 0.5306283
#> 6 4.8 3.4 1.9 0.2 setosa 0.6418539
Here, we change the variable and the threshold:
my_process(iris, var_ = Sepal.Width, threshold = 0.5)
#> Sepal.Length Sepal.Width Petal.Length Petal.Width Species log_var
#> 1 5.1 3.5 1.4 0.2 setosa 1.252763
#> 2 4.9 3.0 1.4 0.2 setosa 1.098612
#> 3 4.7 3.2 1.3 0.2 setosa 1.163151
#> 4 4.6 3.1 1.5 0.2 setosa 1.131402
#> 5 5.0 3.6 1.4 0.2 setosa 1.280934
#> 6 5.4 3.9 1.7 0.4 setosa 1.360977
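For comparison, the same reusable function can be sketched with plain dplyr and the `{{ }}` (embrace) operator from rlang, without svFlow. The name `my_process_tidy` is hypothetical, and this version requires the column to be passed explicitly instead of using a default. It is meant only to show the tidyeval bookkeeping that the `_` suffix takes care of in the Flow version.

```r
library(dplyr)

# Hypothetical plain-tidyeval counterpart of my_process(); no svFlow involved.
my_process_tidy <- function(data, var, threshold = 1.5) {
  data %>%
    filter({{ var }} > threshold) %>%     # {{ var }} injects the column the caller named
    mutate(log_var = log({{ var }})) %>%  # same column, log-transformed
    head()
}

res_tidy <- my_process_tidy(iris, Sepal.Width, threshold = 0.5)
res_tidy
```

Here the embrace operator has to be written at every place where the column is used, whereas in the Flow version the trailing underscore in the argument name marks it once.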
Flow objects can be subclassed. This is somewhat similar to branches
in git repositories (although with automatic updates from the master
branch). It would be nice to keep this comparison as close as possible
and to make both approaches conceptually similar, so that one can work
with flow() in much the same way as with git. We need tools to create,
delete, switch to, merge (into master only?), and rebase + diff.
The biggest difference is that git branches do not dynamically inherit
objects from their parents, whereas proto/Flow objects do (in this
case, the main branch is indeed the ancestor). So, we could create a
function branch() that does something like this:
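The dynamic inheritance mentioned above can be illustrated with plain R environments. This is a minimal sketch under stated assumptions, not the branch() function itself: the names `main` and `branch` are hypothetical, and only base R is used (no svFlow or proto calls).

```r
# Base-R illustration of dynamic inheritance between a parent and a child.
# 'main' and 'branch' are illustrative names, not svFlow objects.
main <- new.env()
assign("threshold", 1.5, envir = main)

# The child environment plays the role of a branch: its parent is 'main'.
branch <- new.env(parent = main)
assign("label", "experiment", envir = branch)

# Lookup falls through to the parent, so the branch sees 'threshold'.
inherited <- get("threshold", envir = branch)

# A later change in the parent is immediately visible from the child,
# which is what a git branch would NOT do.
assign("threshold", 2, envir = main)
updated <- get("threshold", envir = branch)

# Overriding the value in the child shadows the parent without changing it.
assign("threshold", 0.5, envir = branch)
in_branch <- get("threshold", envir = branch)
in_main <- get("threshold", envir = main)
```

In git terms, the second lookup behaves as if every branch were continuously rebased on its parent, while a local override behaves like a commit made on the branch only.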
TODO: an older version of the proto package had a nice graph.proto()
function, and I made my own one with other dependencies => reimplement
it for {svFlow} in order to show the workflow in the same way a git
repository is often depicted.