Here we introduce the Bioconductor toolchain for using and developing reproducible bioinformatics pipelines with the Rcwl and RcwlPipelines packages.
Rcwl provides a simple way to wrap command line tools and build CWL
data analysis pipelines programmatically within R. It increases the
ease of use, development, and maintenance of CWL
pipelines. RcwlPipelines manages a collection of more than a hundred
pre-built and tested CWL tools and pipelines, which are highly
modularized and easy to customize to meet different bioinformatics
data analysis needs.
In this vignette, we will introduce how to build and run CWL pipelines
within R/Bioconductor using the Rcwl package. More details about CWL
can be found at https://www.commonwl.org.
if (!requireNamespace("BiocManager", quietly = TRUE))
install.packages("BiocManager")
BiocManager::install("Rcwl")
The development version, with the most up-to-date functionality, is also available from GitHub.
BiocManager::install("rworkflow/Rcwl")
library(Rcwl)
cwlProcess is the main constructor function to wrap a command line
tool into an R tool as a cwlProcess object (S4 class). Let’s
start with a simple example to wrap the echo command and execute
echo hello world in R.
First, we need to define the input parameter for the base command
echo; here it is a string without a prefix. The id argument is
required.
input1 <- InputParam(id = "sth")
Second, we can construct a cwlProcess object by specifying the
baseCommand for the command line tool, and InputParamList for the
input parameters.
echo <- cwlProcess(baseCommand = "echo", inputs = InputParamList(input1))
Now we have converted the command line tool echo into an R tool:
an R object of class cwlProcess named echo. We can
take a look at this R object and use some utility functions to
extract specific information.
echo
## class: cwlProcess
## cwlClass: CommandLineTool
## cwlVersion: v1.0
## baseCommand: echo
## inputs:
## sth (string):
## outputs:
## output:
## type: stdout
class(echo)
## [1] "cwlProcess"
## attr(,"package")
## [1] "Rcwl"
cwlClass(echo)
## [1] "CommandLineTool"
cwlVersion(echo)
## [1] "v1.0"
baseCommand(echo)
## [1] "echo"
inputs(echo)
## inputs:
## sth (string):
outputs(echo)
## outputs:
## output:
## type: stdout
inputs(echo) will show the value once it is assigned in the next
step. Since we didn’t define any outputs for this tool, its standard
output is streamed to a temporary file by default.
The third step is to assign a value (here, “Hello World!”) to the input parameter.
echo$sth <- "Hello World!"
inputs(echo)
## inputs:
## sth (string): Hello World!
Now this R version of command line tool echo is ready to be
executed.
We can install cwltool first to make sure a cwl-runner is
available.
invisible(install_cwltool())
## + /home/biocbuild/.cache/R/basilisk/1.18.0/0/bin/conda create --yes --prefix /home/biocbuild/.cache/R/basilisk/1.18.0/Rcwl/1.22.0/env_Rcwl 'python=3.11' --quiet -c conda-forge --override-channels
## + /home/biocbuild/.cache/R/basilisk/1.18.0/0/bin/conda install --yes --prefix /home/biocbuild/.cache/R/basilisk/1.18.0/Rcwl/1.22.0/env_Rcwl 'python=3.11' -c conda-forge --override-channels
## + /home/biocbuild/.cache/R/basilisk/1.18.0/0/bin/conda install --yes --prefix /home/biocbuild/.cache/R/basilisk/1.18.0/Rcwl/1.22.0/env_Rcwl -c conda-forge 'python=3.11' 'python=3.11' --override-channels
The function runCWL runs the tool from R and returns a list of: 1)
the command line that was actually executed, 2) the file path to the output, and
3) the running logs. The output directory defaults to the working
directory, but can be specified with the outdir argument.
r1 <- runCWL(echo, outdir = tempdir())
## }INFO Final process status is success
r1
## List of length 3
## names(3): command output logs
r1$command
## [1] "\033[1;30mINFO\033[0m [job echo.cwl] /home/biocbuild/bbs-3.20-bioc/tmpdir/l7gwztsf$ echo \\"
## [2] " 'Hello World!' > /home/biocbuild/bbs-3.20-bioc/tmpdir/l7gwztsf/a6830052b55968f63ce2cac256926f6a2544ce19"
readLines(r1$output)
## [1] "Hello World!"
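For intuition, the command that cwltool ran above (see r1$command) is roughly equivalent to the base R call below. This is a minimal sketch that bypasses CWL entirely; it assumes a Unix-like system with an echo command on the PATH and is not how runCWL works internally.

```r
# Rough base-R equivalent of the CWL job shown in r1$command:
#   echo 'Hello World!' > <outfile>
outfile <- tempfile(fileext = ".txt")
# system2() does not quote arguments itself, so shQuote() protects the space and "!"
system2("echo", args = shQuote("Hello World!"), stdout = outfile)
readLines(outfile)  # "Hello World!"
```

What CWL adds on top of this is the declarative description of inputs and outputs, which lets the same tool be re-run, containerized, and composed into pipelines.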
r1$logs
## [1] "\033[1;30mINFO\033[0m /home/biocbuild/.cache/R/basilisk/1.18.0/Rcwl/1.22.0/env_Rcwl/bin/cwltool 3.1.20230719185429"
## [2] "\033[1;30mINFO\033[0m Resolved '/home/biocbuild/bbs-3.20-bioc/tmpdir/RtmpJtCSDC/filea47732d0485ce/echo.cwl' to 'file:///home/biocbuild/bbs-3.20-bioc/tmpdir/RtmpJtCSDC/filea47732d0485ce/echo.cwl'"
## [3] "\033[1;30mINFO\033[0m [job echo.cwl] /home/biocbuild/bbs-3.20-bioc/tmpdir/l7gwztsf$ echo \\"
## [4] " 'Hello World!' > /home/biocbuild/bbs-3.20-bioc/tmpdir/l7gwztsf/a6830052b55968f63ce2cac256926f6a2544ce19"
## [5] "\033[1;30mINFO\033[0m [job echo.cwl] completed success"
## [6] "{"
## [7] " \"output\": {"
## [8] " \"location\": \"file:///home/biocbuild/bbs-3.20-bioc/tmpdir/RtmpJtCSDC/a6830052b55968f63ce2cac256926f6a2544ce19\","
## [9] " \"basename\": \"a6830052b55968f63ce2cac256926f6a2544ce19\","
## [10] " \"class\": \"File\","
## [11] " \"checksum\": \"sha1$a0b65939670bc2c010f4d5d6a0b3e4e4590fb92b\","
## [12] " \"size\": 13,"
## [13] " \"path\": \"/home/biocbuild/bbs-3.20-bioc/tmpdir/RtmpJtCSDC/a6830052b55968f63ce2cac256926f6a2544ce19\""
## [14] " }"
## [15] "}\033[1;30mINFO\033[0m Final process status is success"
Users can also have the log printed out by specifying showLog = TRUE.
r1 <- runCWL(echo, outdir = tempdir(), showLog = TRUE)
## }
A utility function, writeCWL, converts the cwlProcess object into two
files: a .cwl file for the command and a .yml file for the inputs.
These are the internal CWL files that are executed when runCWL is
invoked. The internal execution requires a cwl-runner (e.g.,
cwltool), which will be installed automatically with runCWL.
writeCWL(echo)
## cwlout
## "/home/biocbuild/bbs-3.20-bioc/tmpdir/RtmpJtCSDC/filea477370efb423/echo.cwl"
## ymlout
## "/home/biocbuild/bbs-3.20-bioc/tmpdir/RtmpJtCSDC/filea477370efb423/echo.yml"
The package provides functions to define CWL syntax for command-line tools in an intuitive way. The functions were developed based on the CWL Command Line Tool Description (v1.0). More details can be found in the official document: https://www.commonwl.org/v1.0/CommandLineTool.html.
For input parameters, three options usually need to be defined: id, type, and prefix. The type can be string, int, long, float, double, and so on. More details can be found at: https://www.commonwl.org/v1.0/CommandLineTool.html#CWLType.
Here is an example from the CWL user
guide. We define
an echo tool with input parameters of different types using InputParam. The
stdout option can be used to capture the standard output stream to a
file.
e1 <- InputParam(id = "flag", type = "boolean", prefix = "-f")
e2 <- InputParam(id = "string", type = "string", prefix = "-s")
e3 <- InputParam(id = "int", type = "int", prefix = "-i")
e4 <- InputParam(id = "file", type = "File", prefix = "--file=", separate = FALSE)
echoA <- cwlProcess(baseCommand = "echo",
inputs = InputParamList(e1, e2, e3, e4),
stdout = "output.txt")
Then we give it a try by setting values for the inputs.
echoA$flag <- TRUE
echoA$string <- "Hello"
echoA$int <- 1
tmpfile <- tempfile()
write("World", tmpfile)
echoA$file <- tmpfile
r2 <- runCWL(echoA, outdir = tempdir())
## }INFO Final process status is success
r2$command
## [1] "\033[1;30mINFO\033[0m [job echoA.cwl] /home/biocbuild/bbs-3.20-bioc/tmpdir/8hp5wec5$ echo \\"
## [2] " --file=/home/biocbuild/bbs-3.20-bioc/tmpdir/5sh7bb1k/stg5f0b4982-4d52-4b9c-9590-a16a067cea78/filea4773552e1032 \\"
## [3] " -f \\"
## [4] " -i \\"
## [5] " 1 \\"
## [6] " -s \\"
## [7] " Hello > /home/biocbuild/bbs-3.20-bioc/tmpdir/8hp5wec5/output.txt"
The command shows that the parameters work as we defined them. Parameters are ordered alphabetically by id by default, but the “position” option can be used to fix their order.
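The rules at work here can be mimicked in a few lines of base R. The helpers render_scalar and build_command below are hypothetical (they are not Rcwl functions), and the sketch only covers the cases used above: a boolean flag, prefixed values with or without separate, and sorting by position (default 0) and then by id.

```r
# Hypothetical helpers (not part of Rcwl) mimicking how a CWL runner
# turns scalar inputs into command-line arguments.
render_scalar <- function(value, prefix = NULL, separate = TRUE) {
  if (is.logical(value)) {
    # boolean: TRUE emits only the prefix (a flag), FALSE emits nothing
    return(if (isTRUE(value)) prefix else character(0))
  }
  value <- as.character(value)
  if (is.null(prefix)) return(value)
  # separate = TRUE -> "-i" "1"; separate = FALSE -> "--file=path"
  if (separate) c(prefix, value) else paste0(prefix, value)
}

build_command <- function(base, params) {
  # sort key: position (default 0), then the input id
  pos <- vapply(params, function(p) {
    if (is.null(p[["position"]])) 0 else as.numeric(p[["position"]])
  }, numeric(1))
  ord <- order(pos, names(params))
  args <- lapply(params[ord], function(p) {
    sep <- if (is.null(p[["separate"]])) TRUE else p[["separate"]]
    render_scalar(p[["value"]], p[["prefix"]], sep)
  })
  c(base, unlist(args, use.names = FALSE))
}

params <- list(
  flag   = list(value = TRUE,    prefix = "-f"),
  string = list(value = "Hello", prefix = "-s"),
  int    = list(value = 1L,      prefix = "-i"),
  file   = list(value = "/tmp/world.txt", prefix = "--file=", separate = FALSE)
)
cmd <- build_command("echo", params)
# "echo" "--file=/tmp/world.txt" "-f" "-i" "1" "-s" "Hello"

# Giving "file" a larger position moves it to the end of the command
params$file$position <- 1
cmd2 <- build_command("echo", params)
```

The file path here is a placeholder; in the runCWL output above, cwltool substitutes a staged copy of the input file.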
Similar to an example in the CWL user guide, we can define three different types of array inputs.
a1 <- InputParam(id = "A", type = "string[]", prefix = "-A")
a2 <- InputParam(id = "B",
type = InputArrayParam(items = "string",
prefix="-B=", separate = FALSE))
a3 <- InputParam(id = "C", type = "string[]", prefix = "-C=",
itemSeparator = ",", separate = FALSE)
echoB <- cwlProcess(baseCommand = "echo",
inputs = InputParamList(a1, a2, a3))
Then set values for the three inputs.
echoB$A <- letters[1:3]
echoB$B <- letters[4:6]
echoB$C <- letters[7:9]
echoB
## class: cwlProcess
## cwlClass: CommandLineTool
## cwlVersion: v1.0
## baseCommand: echo
## inputs:
## A (string[]): -A a b c
## B:
## type: array
## prefix: -B= d e f
## C (string[]): -C= g h i
## outputs:
## output:
## type: stdout
Now we can check whether the command behaves as we expected.
r3 <- runCWL(echoB, outdir = tempdir())
## }INFO Final process status is success
r3$command
## [1] "\033[1;30mINFO\033[0m [job echoB.cwl] /home/biocbuild/bbs-3.20-bioc/tmpdir/x07t77nc$ echo \\"
## [2] " -A \\"
## [3] " a \\"
## [4] " b \\"
## [5] " c \\"
## [6] " -B=d \\"
## [7] " -B=e \\"
## [8] " -B=f \\"
## [9] " -C=g,h,i > /home/biocbuild/bbs-3.20-bioc/tmpdir/x07t77nc/c9b62fbaac837fe1cb161c6486ec3147c18a533b"
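The three array styles can be reproduced with plain string operations. The sketch below uses hypothetical helpers (not part of Rcwl) and only restates what the command above shows: a binding on the array emits the prefix once, a binding on the items repeats the prefix for every item, and itemSeparator joins the items into a single argument.

```r
# Hypothetical helpers (not part of Rcwl) reproducing the three array styles above
# A: binding on the array itself -> prefix once, then every item
bind_array <- function(values, prefix) c(prefix, values)
# B: binding on each item, separate = FALSE -> prefix glued to every item
bind_items <- function(values, prefix) paste0(prefix, values)
# C: itemSeparator, separate = FALSE -> items joined into one argument
bind_joined <- function(values, prefix, itemSeparator = ",") {
  paste0(prefix, paste(values, collapse = itemSeparator))
}

arr_args <- c(bind_array(letters[1:3], "-A"),
              bind_items(letters[4:6], "-B="),
              bind_joined(letters[7:9], "-C="))
arr_args
# "-A" "a" "b" "c" "-B=d" "-B=e" "-B=f" "-C=g,h,i"
```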
The outputs, similar to the inputs, are a list of output parameters. Three options, id, type, and glob, can be defined. The glob option defines a pattern to find files relative to the output directory.
Here is an example to unzip a compressed gz file. First, we generate a compressed R script file.
zzfil <- file.path(tempdir(), "sample.R.gz")
zz <- gzfile(zzfil, "w")
cat("sample(1:10, 5)", file = zz, sep = "\n")
close(zz)
We define a cwlProcess object that uses “gzip” to uncompress an input file.
ofile <- "sample.R"
z1 <- InputParam(id = "uncomp", type = "boolean", prefix = "-d")
z2 <- InputParam(id = "out", type = "boolean", prefix = "-c")
z3 <- InputParam(id = "zfile", type = "File")
o1 <- OutputParam(id = "rfile", type = "File", glob = ofile)
gz <- cwlProcess(baseCommand = "gzip",
inputs = InputParamList(z1, z2, z3),
outputs = OutputParamList(o1),
stdout = ofile)
Now the gz object can be used to uncompress the previously generated compressed file.
gz$uncomp <- TRUE
gz$out <- TRUE
gz$zfile <- zzfil
r4 <- runCWL(gz, outdir = tempdir())
## }INFO Final process status is success
r4$output
## [1] "/home/biocbuild/bbs-3.20-bioc/tmpdir/RtmpJtCSDC/sample.R"
Alternatively, we can use arguments to set some fixed parameters.
z1 <- InputParam(id = "zfile", type = "File")
o1 <- OutputParam(id = "rfile", type = "File", glob = ofile)
Gz <- cwlProcess(baseCommand = "gzip",
arguments = list("-d", "-c"),
inputs = InputParamList(z1),
outputs = OutputParamList(o1),
stdout = ofile)
Gz
## class: cwlProcess
## cwlClass: CommandLineTool
## cwlVersion: v1.0
## baseCommand: gzip
## arguments: -d -c
## inputs:
## zfile (File):
## outputs:
## rfile:
## type: File
## outputBinding:
## glob: sample.R
## stdout: sample.R
Gz$zfile <- zzfil
r4a <- runCWL(Gz, outdir = tempdir())
## }INFO Final process status is success
To make the tool more general, we can define the glob pattern with JavaScript,
which requires node (from “nodejs”) to be installed in your
system PATH.
pfile <- "$(inputs.zfile.path.split('/').slice(-1)[0].split('.').slice(0,-1).join('.'))"
Alternatively, we can use the built-in CWL file property nameroot directly.
pfile <- "$(inputs.zfile.nameroot)"
o2 <- OutputParam(id = "rfile", type = "File", glob = pfile)
req1 <- requireJS()
GZ <- cwlProcess(baseCommand = "gzip",
arguments = list("-d", "-c"),
requirements = list(), ## assign list(req1) if node installed.
inputs = InputParamList(z1),
outputs = OutputParamList(o2),
stdout = pfile)
GZ$zfile <- zzfil
r4b <- runCWL(GZ, outdir = tempdir())
## }INFO Final process status is success
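The JavaScript expression and nameroot agree for a name such as sample.R.gz, but not for every file name. The base-R transcription below (js_style_root is a hypothetical name) follows the JavaScript expression step by step: take the last path component, split on dots, and drop the last piece.

```r
# Base-R transcription of the JavaScript glob expression used above:
#   path.split('/').slice(-1)[0].split('.').slice(0,-1).join('.')
js_style_root <- function(path) {
  parts <- strsplit(basename(path), ".", fixed = TRUE)[[1]]
  paste(head(parts, -1), collapse = ".")
}

js_style_root("/tmp/sample.R.gz")  # "sample.R"
js_style_root("/tmp/README")       # "" (no extension, so every piece is dropped)
```

For a file without an extension, the CWL nameroot property is the whole basename ("README"), whereas the JavaScript expression yields an empty string. This is one reason to prefer the built-in property.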
We can also capture multiple output files with a glob pattern.
a <- InputParam(id = "a", type = InputArrayParam(items = "string"))
b <- OutputParam(id = "b", type = OutputArrayParam(items = "File"),
glob = "*.txt")
touch <- cwlProcess(baseCommand = "touch", inputs = InputParamList(a),
outputs = OutputParamList(b))
touch$a <- c("a.txt", "b.log", "c.txt")
r5 <- runCWL(touch, outdir = tempdir())
## }INFO Final process status is success
r5$output
## [1] "/home/biocbuild/bbs-3.20-bioc/tmpdir/RtmpJtCSDC/a.txt"
## [2] "/home/biocbuild/bbs-3.20-bioc/tmpdir/RtmpJtCSDC/c.txt"
The “touch” command generates three files, but the output only collects
two files with “.txt” suffix as defined in the OutputParam using the
“glob” option.
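The same selection can be reproduced outside CWL with base R's Sys.glob. This is only an illustration of the pattern matching, run in a fresh temporary directory, and not what cwltool executes.

```r
# Recreate the files from the touch example and apply the same "*.txt" pattern
glob_dir <- tempfile("glob_demo_")
dir.create(glob_dir)
invisible(file.create(file.path(glob_dir, c("a.txt", "b.log", "c.txt"))))
matched <- Sys.glob(file.path(glob_dir, "*.txt"))
basename(matched)  # "a.txt" "c.txt": b.log does not match the pattern
```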
CWL can work with Docker to simplify software management and
to pass files between the host and the container. The Docker container can
be defined with the hints or requirements option.
d1 <- InputParam(id = "rfile", type = "File")
req1 <- requireDocker("r-base")
doc <- cwlProcess(baseCommand = "Rscript",
inputs = InputParamList(d1),
stdout = "output.txt",
hints = list(req1))
doc$rfile <- r4$output
r6 <- runCWL(doc)
Tools defined with Docker requirements can also be run locally by
disabling the docker option. If your Rscript depends on some local
libraries to run, the cwltool option
“--preserve-entire-environment” can be used to pass all environment
variables.
r6a <- runCWL(doc, docker = FALSE, outdir = tempdir(),
cwlArgs = "--preserve-entire-environment")
## }INFO Final process status is success
CWL can also run on high-performance clusters with batch-queuing
systems, such as SGE, PBS, and SLURM, using the Bioconductor
package BiocParallel. Here is an example of submitting jobs with
“Multicore” and “SGE”.
library(BiocParallel)
sth.list <- as.list(LETTERS)
names(sth.list) <- LETTERS
## submit with multicore
result1 <- runCWLBatch(cwl = echo, outdir = tempdir(), inputList = list(sth = sth.list),
BPPARAM = MulticoreParam(26))
## submit with SGE
result2 <- runCWLBatch(cwl = echo, outdir = tempdir(), inputList = list(sth = sth.list),
BPPARAM = BatchtoolsParam(workers = 26, cluster = "sge",
resources = list(queue = "all.q")))
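Conceptually, inputList = list(sth = sth.list) describes 26 independent jobs, one per element, which the BiocParallel backend can then distribute to workers. The sketch below shows that expansion in base R. expand_jobs is a hypothetical helper, and pairing values across several inputs by position is our assumption for illustration, not a description of runCWLBatch internals.

```r
# Hypothetical helper: split a named inputList into one input set per job,
# taking the i-th value of every input for job i
expand_jobs <- function(inputList) {
  n <- unique(lengths(inputList))
  stopifnot(length(n) == 1)  # all inputs must supply the same number of values
  jobs <- lapply(seq_len(n), function(i) lapply(inputList, `[[`, i))
  names(jobs) <- names(inputList[[1]])
  jobs
}

sth.list <- as.list(LETTERS)
names(sth.list) <- LETTERS
jobs <- expand_jobs(list(sth = sth.list))
length(jobs)  # 26
jobs[["C"]]   # a list holding the single input sth = "C"
```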
We can connect multiple tools together into a pipeline. Here is an
example to uncompress an R script and execute it with Rscript.
Here we define a simple Rscript tool without using docker.
d1 <- InputParam(id = "rfile", type = "File")
Rs <- cwlProcess(baseCommand = "Rscript",
inputs = InputParamList(d1))
Rs
## class: cwlProcess
## cwlClass: CommandLineTool
## cwlVersion: v1.0
## baseCommand: Rscript
## inputs:
## rfile (File):
## outputs:
## output:
## type: stdout
Test run:
Rs$rfile <- r4$output
tres <- runCWL(Rs, outdir = tempdir())
## }INFO Final process status is success
readLines(tres$output)
## [1] "[1] 7 6 10 4 2"
The pipeline includes two steps: decompressing with the predefined
cwlProcess GZ, and running the script with the cwlProcess Rs. The
input to the first (“Uncomp”) step is a compressed file.
i1 <- InputParam(id = "cwl_zfile", type = "File")
s1 <- cwlStep(id = "Uncomp", run = GZ,
In = list(zfile = "cwl_zfile"))
s2 <- cwlStep(id = "Compile", run = Rs,
In = list(rfile = "Uncomp/rfile"))
In step 1 (‘s1’), the pipeline runs the cwlProcess GZ, where
the input zfile is defined in ‘i1’ with the id “cwl_zfile”. In step 2
(‘s2’), the pipeline runs the cwlProcess Rs, where the input
rfile comes from the output of step 1 (“Uncomp/rfile”), using the
format <step>/<output>.
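The wiring rule (a source is either a workflow input id or <step>/<output>) can be illustrated with a toy resolver. In the sketch below, plain R functions stand in for GZ and Rs, and resolve_source and run_toy_workflow are hypothetical names. It is not how Rcwl or cwltool execute workflows; in particular, a real runner derives the step order from these dependencies, while the toy simply runs the steps in the order listed.

```r
# Toy resolver: a source is either a workflow input id or "<step>/<output>"
resolve_source <- function(src, wf_inputs, step_outputs) {
  parts <- strsplit(src, "/", fixed = TRUE)[[1]]
  if (length(parts) == 1) return(wf_inputs[[src]])  # workflow-level input
  step_outputs[[parts[1]]][[parts[2]]]               # output of an earlier step
}

run_toy_workflow <- function(steps, wf_inputs) {
  step_outputs <- list()
  for (id in names(steps)) {  # steps must already be in dependency order
    s <- steps[[id]]
    args <- lapply(s$In, resolve_source,
                   wf_inputs = wf_inputs, step_outputs = step_outputs)
    step_outputs[[id]] <- do.call(s$run, args)
  }
  step_outputs
}

# Stand-ins for GZ (strip ".gz") and Rs (pretend to run the script)
steps <- list(
  Uncomp  = list(run = function(zfile) list(rfile = sub("\\.gz$", "", zfile)),
                 In  = list(zfile = "cwl_zfile")),
  Compile = list(run = function(rfile) list(output = paste("ran", rfile)),
                 In  = list(rfile = "Uncomp/rfile"))
)
res <- run_toy_workflow(steps, wf_inputs = list(cwl_zfile = "sample.R.gz"))
res$Compile$output  # "ran sample.R"
```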
The pipeline output is defined as the output of step 2
(“Compile/output”), again using the format <step>/<output>, as shown
below.
o1 <- OutputParam(id = "cwl_cout", type = "File",
outputSource = "Compile/output")
The cwlWorkflow function is used to initiate the pipeline by
specifying the inputs and outputs. Then we can simply use + to
connect all steps to build the final pipeline.
cwl <- cwlWorkflow(inputs = InputParamList(i1),
outputs = OutputParamList(o1))
cwl <- cwl + s1 + s2
cwl
## class: cwlWorkflow
## cwlClass: Workflow
## cwlVersion: v1.0
## inputs:
## cwl_zfile (File):
## outputs:
## cwl_cout:
## type: File
## outputSource: Compile/output
## steps:
## Uncomp:
## run: Uncomp.cwl
## in:
## zfile: cwl_zfile
## out:
## - rfile
## Compile:
## run: Compile.cwl
## in:
## rfile: Uncomp/rfile
## out:
## - output
Let’s run the pipeline.
cwl$cwl_zfile <- zzfil
r7 <- runCWL(cwl, outdir = tempdir())
## }INFO Final process status is success
readLines(r7$output)
## [1] "[1] 3 5 8 9 6"
Tip: Sometimes we need to adjust the arguments of certain tools in
a pipeline, in addition to the parameter inputs. The arguments function can
modify the arguments of a standalone tool, a tool in a pipeline, or even a tool
in a sub-workflow. For example,
arguments(cwl, step = "Uncomp") <- list("-d", "-c", "-f")
runs(cwl)$Uncomp
## class: cwlProcess
## cwlClass: CommandLineTool
## cwlVersion: v1.0
## baseCommand: gzip
## arguments: -d -c -f
## inputs:
## zfile (File): /home/biocbuild/bbs-3.20-bioc/tmpdir/RtmpJtCSDC/sample.R.gz
## outputs:
## rfile:
## type: File
## outputBinding:
## glob: $(inputs.zfile.nameroot)
## stdout: $(inputs.zfile.nameroot)
The scattering feature specifies that the associated workflow step or
subworkflow is executed separately over a list of input elements. To
use this feature, ScatterFeatureRequirement must be specified in the
workflow requirements. Different scatter methods can be used in the
associated step to decompose the input into a discrete set of
jobs. More details can be found at:
https://www.commonwl.org/v1.0/Workflow.html#WorkflowStep.
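For intuition about the scatter methods, the sketch below mimics two of them with plain R functions: dotproduct pairs the i-th elements of equally long inputs, and flat_crossproduct takes every combination, flattened into one list. The helper names are hypothetical and not Rcwl or cwltool code. The element order chosen for the cross product (first input varying slowest) is a choice made for this sketch, not a claim about the runner.

```r
# Toy scatter methods (hypothetical helpers, not Rcwl or cwltool code)
# dotproduct: pair the i-th elements of every scattered input
scatter_dot <- function(f, a, b) {
  stopifnot(length(a) == length(b))  # dotproduct needs equal lengths
  Map(f, a, b)
}
# flat_crossproduct: every combination, flattened into a single list;
# in this sketch the first input varies slowest
scatter_flat_cross <- function(f, a, b) {
  Map(f, rep(a, each = length(b)), rep(b, times = length(a)))
}

dot   <- unlist(scatter_dot(`+`, c(1, 2), c(10, 20)), use.names = FALSE)
cross <- unlist(scatter_flat_cross(paste0, c("a", "b"), c("x", "y")),
                use.names = FALSE)
dot    # 11 22
cross  # "ax" "ay" "bx" "by"
```

Scattering over a single input, as in the example that follows, reduces to running the step once per element.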
Here is an example that executes multiple R scripts. First, we need to
set the input and output types to arrays of “File”, and add the
requirement. In the “Compile” step, the scattered input must
be set with the scatter option.
i2 <- InputParam(id = "cwl_rfiles", type = "File[]")
o2 <- OutputParam(id = "cwl_couts", type = "File[]", outputSource = "Compile/output")
req1 <- requireScatter()
cwl2 <- cwlWorkflow(requirements = list(req1),
inputs = InputParamList(i2),
outputs = OutputParamList(o2))
s1 <- cwlStep(id = "Compile", run = Rs,
In = list(rfile = "cwl_rfiles"),
scatter = "rfile")
cwl2 <- cwl2 + s1
cwl2
## class: cwlWorkflow
## cwlClass: Workflow
## cwlVersion: v1.0
## requirements:
## - class: ScatterFeatureRequirement
## inputs:
## cwl_rfiles (File[]):
## outputs:
## cwl_couts:
## type: File[]
## outputSource: Compile/output
## steps:
## Compile:
## run: Compile.cwl
## in:
## rfile: cwl_rfiles
## out:
## - output
## scatter: rfile
Multiple R scripts can be assigned to the workflow inputs and executed.
cwl2$cwl_rfiles <- c(r4b$output, r4b$output)
r8 <- runCWL(cwl2, outdir = tempdir())
## }INFO Final process status is success
r8$output
## [1] "/home/biocbuild/bbs-3.20-bioc/tmpdir/RtmpJtCSDC/12fd933790cc9ba3c4880ef10f858803d284c62c"
## [2] "/home/biocbuild/bbs-3.20-bioc/tmpdir/RtmpJtCSDC/12fd933790cc9ba3c4880ef10f858803d284c62c_2"
The function plotCWL can be used to visualize the relationship of
inputs, outputs and the analysis for a tool or pipeline.
plotCWL(cwl)
Here we build a tool with different types of input parameters.
e1 <- InputParam(id = "flag", type = "boolean",
prefix = "-f", doc = "boolean flag")
e2 <- InputParam(id = "string", type = "string", prefix = "-s")
e3 <- InputParam(id = "option", type = "string", prefix = "-o")
e4 <- InputParam(id = "int", type = "int", prefix = "-i", default = 123)
e5 <- InputParam(id = "file", type = "File",
prefix = "--file=", separate = FALSE)
e6 <- InputParam(id = "array", type = "string[]", prefix = "-A",
doc = "separated by comma")
mulEcho <- cwlProcess(baseCommand = "echo", id = "mulEcho",
label = "Test parameter types",
inputs = InputParamList(e1, e2, e3, e4, e5, e6),
stdout = "output.txt")
mulEcho
## class: cwlProcess
## cwlClass: CommandLineTool
## cwlVersion: v1.0
## baseCommand: echo
## inputs:
## flag (boolean): -f
## string (string): -s
## option (string): -o
## int (int): -i 123
## file (File): --file=
## array (string[]): -A
## outputs:
## output:
## type: stdout
## stdout: output.txt
Some input parameters can be predefined in a list, which will be
converted to select options in the web app. The upload parameter can
be used to define whether to generate an upload interface for
File-type inputs. If FALSE, the upload field will be a text input (file
path) instead of a file input.
inputList <- list(option = c("option1", "option2"))
app <- cwlShiny(mulEcho, inputList, upload = TRUE)
runApp(app)
We can wrap an R function into a cwlProcess object by simply assigning the R function to baseCommand. This can be useful for summarizing results from other tools in a pipeline, or for benchmarking different parameters of a method written in R. Please note that this feature is specific to Rcwl and is not part of the Common Workflow Language.
fun1 <- function(x)x*2
testFun <- function(a, b){
cat(fun1(a) + b^2, sep="\n")
}
assign("fun1", fun1, envir = .GlobalEnv)
assign("testFun", testFun, envir = .GlobalEnv)
p1 <- InputParam(id = "a", type = "int", prefix = "a=", separate = FALSE)
p2 <- InputParam(id = "b", type = "int", prefix = "b=", separate = FALSE)
o1 <- OutputParam(id = "o", type = "File", glob = "rout.txt")
TestFun <- cwlProcess(baseCommand = testFun,
inputs = InputParamList(p1, p2),
outputs = OutputParamList(o1),
stdout = "rout.txt")
TestFun$a <- 1
TestFun$b <- 2
r1 <- runCWL(TestFun, cwlArgs = "--preserve-entire-environment")
## }INFO Final process status is success
readLines(r1$output)
## [1] "6"
The runCWL function automatically wrote the testFun function and its
dependencies into an R script file and called Rscript to
run the script with the parameters. Each parameter requires a prefix made of the
corresponding argument name in the R function followed by “=”, with no
separator. Here we assigned the R function and its dependencies to
the global environment because a new environment is started when
the vignette is compiled.
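We have not inspected the exact script that Rcwl generates, so the following is only an assumption about the mechanics: the script receives arguments such as a=1 and b=2, splits them into names and values, and calls the function with them. The sketch below shows that idea in base R; parse_kv_args is a hypothetical helper, and in a real script the argument vector would come from commandArgs(trailingOnly = TRUE).

```r
# Hypothetical sketch of the "name=value" convention described above
parse_kv_args <- function(args) {
  kv <- strsplit(args, "=", fixed = TRUE)
  # keep everything after the first "=" as the value
  vals <- lapply(kv, function(x) paste(x[-1], collapse = "="))
  names(vals) <- vapply(kv, `[`, character(1), 1)
  vals
}

fun1 <- function(x) x * 2
testFun <- function(a, b) cat(fun1(a) + b^2, sep = "\n")

kv_args <- c("a=1", "b=2")                            # produced by the prefixes a= and b=
parsed <- lapply(parse_kv_args(kv_args), as.numeric)  # values arrive as strings
out <- capture.output(do.call(testFun, parsed))
out  # "6"
```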
The Rcwl package can be used to develop pipelines that follow best
practices for reproducible research, especially in Bioinformatics.
Multiple Bioinformatics pipelines, such as RNA-seq alignment,
quality control and quantification, and DNA-seq alignment and variant
calling, have been developed with this tool in the R package
RcwlPipelines, which contains the CWL recipes and the scripts to
create the pipelines. Examples of analyzing real data are also included.
The package is currently available on GitHub.
To install the package:
BiocManager::install("rworkflow/RcwlPipelines")
The project website https://rcwl.org/ serves as a central hub for all related resources. It provides guidance for new users and tutorials for both users and developers. Specific resources are listed below.
The tutorial book provides detailed
instructions for developing Rcwl tools/pipelines, and also includes
examples of some commonly used tools and pipelines that cover a wide
range of Bioinformatics data analysis needs.
The R scripts to build the CWL tools and pipelines now reside
in a dedicated GitHub
repository, which is
intended to be a community effort to collect and contribute
Bioinformatics tools and pipelines using Rcwl and CWL.
Plenty of Bioinformatics tools and workflows in CWL format can be found
on GitHub. They can be imported into cwlProcess objects with the
readCWL function, or used directly.
Most Bioinformatics software is available in Docker containers, which makes it convenient to build portable CWL tools and pipelines.
sessionInfo()
## R version 4.4.1 (2024-06-14)
## Platform: x86_64-pc-linux-gnu
## Running under: Ubuntu 24.04.1 LTS
##
## Matrix products: default
## BLAS: /home/biocbuild/bbs-3.20-bioc/R/lib/libRblas.so
## LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.12.0
##
## locale:
## [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
## [3] LC_TIME=en_GB LC_COLLATE=C
## [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
## [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
## [9] LC_ADDRESS=C LC_TELEPHONE=C
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
##
## time zone: America/New_York
## tzcode source: system (glibc)
##
## attached base packages:
## [1] stats4 stats graphics grDevices utils datasets methods
## [8] base
##
## other attached packages:
## [1] Rcwl_1.22.0 S4Vectors_0.44.0 BiocGenerics_0.52.0
## [4] yaml_2.3.10 BiocStyle_2.34.0
##
## loaded via a namespace (and not attached):
## [1] dir.expiry_1.14.0 xfun_0.48 bslib_0.8.0
## [4] htmlwidgets_1.6.4 visNetwork_2.1.2 lattice_0.22-6
## [7] batchtools_0.9.17 vctrs_0.6.5 tools_4.4.1
## [10] generics_0.1.3 base64url_1.4 parallel_4.4.1
## [13] tibble_3.2.1 fansi_1.0.6 pkgconfig_2.0.3
## [16] R.oo_1.26.0 Matrix_1.7-1 data.table_1.16.2
## [19] checkmate_2.3.2 RColorBrewer_1.1-3 lifecycle_1.0.4
## [22] stringr_1.5.1 compiler_4.4.1 progress_1.2.3
## [25] codetools_0.2-20 httpuv_1.6.15 htmltools_0.5.8.1
## [28] sass_0.4.9 later_1.3.2 pillar_1.9.0
## [31] crayon_1.5.3 jquerylib_0.1.4 R.utils_2.12.3
## [34] BiocParallel_1.40.0 cachem_1.1.0 mime_0.12
## [37] basilisk_1.18.0 brew_1.0-10 tidyselect_1.2.1
## [40] digest_0.6.37 stringi_1.8.4 purrr_1.0.2
## [43] dplyr_1.1.4 bookdown_0.41 fastmap_1.2.0
## [46] grid_4.4.1 cli_3.6.3 magrittr_2.0.3
## [49] DiagrammeR_1.0.11 utf8_1.2.4 withr_3.0.2
## [52] prettyunits_1.2.0 filelock_1.0.3 promises_1.3.0
## [55] backports_1.5.0 rappdirs_0.3.3 rmarkdown_2.28
## [58] igraph_2.1.1 reticulate_1.39.0 png_0.1-8
## [61] R.methodsS3_1.8.2 hms_1.1.3 shiny_1.9.1
## [64] evaluate_1.0.1 knitr_1.48 basilisk.utils_1.18.0
## [67] rlang_1.1.4 Rcpp_1.0.13 xtable_1.8-4
## [70] glue_1.8.0 BiocManager_1.30.25 rstudioapi_0.17.1
## [73] debugme_1.2.0 jsonlite_1.8.9 R6_2.5.1