Use from R

Many ClarID workflows start as metadata tables. If those tables live in R, you can keep the familiar R data-frame workflow while still using the reference ClarID-Tools command-line implementation.

The pattern is intentionally simple: write a CSV from R, run the same clarid-tools command you would run in a terminal, and read the encoded or decoded CSV back into R. No R package or Perl-to-R bridge is required.

This is useful when R, R Markdown, Quarto, or an R-based pipeline prepares the metadata table but ClarID-Tools should remain the source of truth for encoding and decoding.

Before you start

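Check whether clarid-tools is on the PATH that R sees. Sys.which() returns the full path to the executable, or typically an empty string if it is not found: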

Sys.which("clarid-tools")

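If you are working from a repository checkout instead of an installed command, use the repository executable. The path is relative to R's working directory, so run the code from the repository root or use an absolute path: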

clarid_tools <- "bin/clarid-tools"

Otherwise:

clarid_tools <- "clarid-tools"
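
The two cases above can be folded into one lookup. This is a minimal sketch, not part of ClarID-Tools: the helper name resolve_clarid_tools() is hypothetical, and the fallback path assumes R is running from the repository root.

```r
# Hypothetical helper: prefer an installed clarid-tools on PATH and fall back
# to a repository checkout. The fallback is relative to the working directory.
resolve_clarid_tools <- function(command = "clarid-tools",
                                 fallback = file.path("bin", "clarid-tools")) {
  # Sys.which() typically returns "" when the command is not on PATH.
  on_path <- unname(Sys.which(command))
  if (!is.na(on_path) && nzchar(on_path)) {
    return(on_path)
  }

  if (file.exists(fallback)) {
    return(fallback)
  }

  stop(
    "Could not find '", command, "' on PATH or at '", fallback, "'",
    call. = FALSE
  )
}
```

With this in place, clarid_tools <- resolve_clarid_tools() either returns a usable path or stops with a clear message before any CSV is written.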

Minimal helper

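Use system2() so the executable and its arguments are given as separate values, which keeps the R code close to the CLI command. system2() quotes the command itself but pastes the arguments into the command line unquoted, so the helper wraps each argument in shQuote() to protect paths and metadata values that contain spaces or shell metacharacters.

run_clarid <- function(args, output, executable = "clarid-tools") {
  # Quote every argument: system2() does not do this for args.
  log <- system2(
    executable,
    args = shQuote(args),
    stdout = TRUE,
    stderr = TRUE
  )

  # A non-zero exit code is reported through the "status" attribute.
  status <- attr(log, "status")
  if (!is.null(status) && status != 0) {
    stop(paste(log, collapse = "\n"), call. = FALSE)
  }

  # Fail loudly if the command exited cleanly but wrote no output file.
  if (!file.exists(output)) {
    stop(
      "clarid-tools did not create ", output, "\n",
      paste(log, collapse = "\n"),
      call. = FALSE
    )
  }

  output
}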

Encode a table from R

Create a normal R data frame, write it to CSV, run the clarid-tools code command with --action encode, and read the encoded CSV back into R. The same approach works for larger tables generated from Bioconductor objects, clinical metadata exports, or project spreadsheets.

clarid_tools <- "clarid-tools"

workdir <- tempfile("clarid-")
dir.create(workdir)

input_file <- file.path(workdir, "biosample.csv")
output_file <- file.path(workdir, "biosample_encoded.csv")

biosamples <- data.frame(
  unique_id = c("samp001", "samp002"),
  subject_id = c(1, 2),
  project = c("TCGA-AML", "TCGA-AML"),
  species = c("Human", "Mouse"),
  tissue = c("Liver", "Brain"),
  sample_type = c("Normal", "Tumor"),
  assay = c("RNA_seq", "ChIP_seq"),
  condition = c("Z77.22", "Z77.22"),
  timepoint = c("Baseline", "Treatment"),
  duration = c("P1D", "P7W"),
  batch = c(1, 2),
  replicate = c(5, 2),
  check.names = FALSE
)

write.csv(biosamples, input_file, row.names = FALSE, quote = TRUE)

run_clarid(
  args = c(
    "code",
    "--entity", "biosample",
    "--format", "human",
    "--action", "encode",
    "--infile", input_file,
    "--sep", ",",
    "--outfile", output_file
  ),
  output = output_file,
  executable = clarid_tools
)

encoded <- read.csv(output_file, check.names = FALSE)
encoded$clar_id
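
Before using the encoded table downstream, it can help to confirm that every input row came back with an identifier. The sketch below is an illustration only: check_encoded() is a hypothetical helper, and the toy identifiers are placeholder strings, not real ClarID output. It relies only on the unique_id and clar_id columns used on this page.

```r
# Hypothetical helper: sanity-check an encoded table against its input.
check_encoded <- function(input, encoded,
                          key = "unique_id", id_col = "clar_id") {
  if (!id_col %in% names(encoded)) {
    stop("Encoded table has no '", id_col, "' column", call. = FALSE)
  }
  if (nrow(encoded) != nrow(input)) {
    stop(
      "Expected ", nrow(input), " encoded rows, got ", nrow(encoded),
      call. = FALSE
    )
  }

  # Treat both NA and empty strings as missing identifiers.
  ids <- as.character(encoded[[id_col]])
  missing_id <- is.na(ids) | !nzchar(ids)
  if (any(missing_id)) {
    stop(
      "Rows without an identifier: ",
      paste(which(missing_id), collapse = ", "),
      call. = FALSE
    )
  }

  # If the key column survived the round trip, the same keys must be present.
  if (key %in% names(encoded) && key %in% names(input)) {
    if (!setequal(as.character(input[[key]]), as.character(encoded[[key]]))) {
      stop("Values in '", key, "' differ between input and output", call. = FALSE)
    }
  }

  invisible(encoded)
}

# Toy data: the clar_id values are placeholders, not real identifiers.
toy_input <- data.frame(unique_id = c("samp001", "samp002"))
toy_encoded <- data.frame(
  unique_id = c("samp001", "samp002"),
  clar_id = c("PLACEHOLDER-1", "PLACEHOLDER-2")
)
check_encoded(toy_input, toy_encoded)
```

In the workflow above, the equivalent call would be check_encoded(biosamples, encoded).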

Decode IDs from R

Use the same pattern for decoding. The input table needs a clar_id column for human-format identifiers or a stub_id column for stub-format identifiers.

decode_input <- file.path(workdir, "biosample_to_decode.csv")
decode_output <- file.path(workdir, "biosample_decoded.csv")

write.csv(
  encoded[, c("unique_id", "clar_id")],
  decode_input,
  row.names = FALSE,
  quote = TRUE
)

run_clarid(
  args = c(
    "code",
    "--entity", "biosample",
    "--format", "human",
    "--action", "decode",
    "--infile", decode_input,
    "--sep", ",",
    "--outfile", decode_output
  ),
  output = decode_output,
  executable = clarid_tools
)

decoded <- read.csv(decode_output, check.names = FALSE)
decoded

Notes

  • Add --codebook /path/to/clarid-codebook.yaml when using a project-specific codebook. If omitted, ClarID-Tools uses the packaged default codebook.
  • Keep generated files in a project or temporary directory so R, shell scripts, and workflow managers can all inspect the same inputs and outputs.
  • For Docker-based runs, use the same system2() pattern but call docker as the executable and pass the ClarID-Tools command as container arguments.
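
The codebook and Docker notes above can be sketched as argument builders. Everything here is an assumption for illustration: the helper names are hypothetical, the image name example/clarid-tools:latest and the /data mount point are placeholders, and the sketch assumes the image exposes clarid-tools as a command inside the container. Only flags already shown on this page are used for clarid-tools itself.

```r
# Hypothetical helper: build the argument vector for `clarid-tools code`,
# appending --codebook only when a project-specific codebook is given.
clarid_code_args <- function(action, infile, outfile,
                             entity = "biosample", format = "human",
                             sep = ",", codebook = NULL) {
  args <- c(
    "code",
    "--entity", entity,
    "--format", format,
    "--action", action,
    "--infile", infile,
    "--sep", sep,
    "--outfile", outfile
  )
  if (!is.null(codebook)) {
    args <- c(args, "--codebook", codebook)
  }
  args
}

# Hypothetical helper: wrap the same arguments in a `docker run` call.
# host_dir is mounted at /data (placeholder), so infile and outfile must be
# given as paths inside the container.
docker_code_args <- function(host_dir, image, ...) {
  c(
    "run", "--rm",
    "-v", paste0(host_dir, ":/data"),
    image,
    "clarid-tools",
    clarid_code_args(...)
  )
}

with_codebook <- clarid_code_args(
  "encode", "biosample.csv", "biosample_encoded.csv",
  codebook = "/path/to/clarid-codebook.yaml"
)

docker_args <- docker_code_args(
  host_dir = "/tmp/clarid-work",
  image = "example/clarid-tools:latest",  # placeholder image name
  action = "encode",
  infile = "/data/biosample.csv",
  outfile = "/data/biosample_encoded.csv"
)
```

Either vector can be passed as args to run_clarid(). For the Docker variant, set executable = "docker" and give output as the host-side path of the result file, since the file check runs on the host.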