Use from R
Many ClarID workflows start as metadata tables. If those tables live in R, you can keep the familiar R data-frame workflow while still using the reference ClarID-Tools command-line implementation.
The pattern is intentionally simple: write a CSV from R, run the same
clarid-tools command you would run in a terminal, and read the encoded or
decoded CSV back into R. No R package or Perl-to-R bridge is required.
This is useful when R, R Markdown, Quarto, or an R-based pipeline prepares the metadata table but ClarID-Tools should remain the source of truth for encoding and decoding.
Before you start
Check whether clarid-tools is available in your shell:
Sys.which("clarid-tools")
If you are working from a repository checkout instead of an installed command, use the repository executable:
clarid_tools <- "bin/clarid-tools"
Otherwise:
clarid_tools <- "clarid-tools"
Minimal helper
Use system2() so each command-line option and value is passed as a separate
argument. It keeps the R code close to the CLI command while avoiding shell
quoting problems with paths and metadata values.
run_clarid <- function(args, output, executable = "clarid-tools") {
log <- system2(
executable,
args = args,
stdout = TRUE,
stderr = TRUE
)
status <- attr(log, "status")
if (!is.null(status) && status != 0) {
stop(paste(log, collapse = "\n"), call. = FALSE)
}
if (!file.exists(output)) {
stop(paste(log, collapse = "\n"), call. = FALSE)
}
output
}
Encode a table from R
Create a normal R data frame, write it to CSV, run clarid-tools code, and read
the encoded CSV back into R. The same approach works for larger tables generated
from Bioconductor objects, clinical metadata exports, or project spreadsheets.
clarid_tools <- "clarid-tools"
workdir <- tempfile("clarid-")
dir.create(workdir)
input_file <- file.path(workdir, "biosample.csv")
output_file <- file.path(workdir, "biosample_encoded.csv")
biosamples <- data.frame(
unique_id = c("samp001", "samp002"),
subject_id = c(1, 2),
project = c("TCGA-AML", "TCGA-AML"),
species = c("Human", "Mouse"),
tissue = c("Liver", "Brain"),
sample_type = c("Normal", "Tumor"),
assay = c("RNA_seq", "ChIP_seq"),
condition = c("Z77.22", "Z77.22"),
timepoint = c("Baseline", "Treatment"),
duration = c("P1D", "P7W"),
batch = c(1, 2),
replicate = c(5, 2),
check.names = FALSE
)
write.csv(biosamples, input_file, row.names = FALSE, quote = TRUE)
run_clarid(
args = c(
"code",
"--entity", "biosample",
"--format", "human",
"--action", "encode",
"--infile", input_file,
"--sep", ",",
"--outfile", output_file
),
output = output_file,
executable = clarid_tools
)
encoded <- read.csv(output_file, check.names = FALSE)
encoded$clar_id
Decode IDs from R
Use the same pattern for decoding. The input table needs a clar_id column for
human-format identifiers or a stub_id column for stub-format identifiers.
decode_input <- file.path(workdir, "biosample_to_decode.csv")
decode_output <- file.path(workdir, "biosample_decoded.csv")
write.csv(
encoded[, c("unique_id", "clar_id")],
decode_input,
row.names = FALSE,
quote = TRUE
)
run_clarid(
args = c(
"code",
"--entity", "biosample",
"--format", "human",
"--action", "decode",
"--infile", decode_input,
"--sep", ",",
"--outfile", decode_output
),
output = decode_output,
executable = clarid_tools
)
decoded <- read.csv(decode_output, check.names = FALSE)
decoded
Notes
- Add
--codebook /path/to/clarid-codebook.yamlwhen using a project-specific codebook. If omitted, ClarID-Tools uses the packaged default codebook. - Keep generated files in a project or temporary directory so R, shell scripts, and workflow managers can all inspect the same inputs and outputs.
- For Docker-based runs, use the same
system2()pattern but calldockeras the executable and pass the ClarID-Tools command as container arguments.