Benchmark

Setup

These numbers come from a simple local synthetic run against the bundled OMOP 5.4 DDL. Each input is a PERSON.csv file containing only valid rows, validated with the --json flag.

  • Linux 5.4
  • 12 CPUs on the host
  • single validator process
  • no intra-file parallelism
  • success-path validation only
  • helper script in bench/
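The helper script in bench/ is not reproduced here. As a rough illustration of how a single-process wall-clock measurement like this can be taken, the sketch below times an arbitrary command with the Python standard library. The function name time_command, the best-of-N policy, and the placeholder command are assumptions made for this example, not a description of the actual helper.

```python
import subprocess
import sys
import time


def time_command(argv, repeats=3):
    """Run argv `repeats` times and return the best wall-clock time in seconds."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        # check=True raises CalledProcessError on a non-zero exit,
        # so only successful (success-path) runs are timed.
        subprocess.run(
            argv,
            check=True,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
        timings.append(time.perf_counter() - start)
    return min(timings)


if __name__ == "__main__":
    # Placeholder command so the sketch runs anywhere. Replace it with the
    # real validator invocation (for example, one ending in --json or
    # --json --turbo) to time an actual run.
    demo = [sys.executable, "-c", "pass"]
    print(f"best of 3: {time_command(demo):.3f} s")
```

Taking the minimum over a few repeats is one common way to reduce noise from other activity on the host; a mean or median would be an equally reasonable choice.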

Results

Rows    File size    Default --json    --json --turbo
50K     4.8 MB       13.36 s           2.47 s
100K    9.7 MB       26.82 s           4.75 s
250K    24.5 MB      67.07 s           11.61 s
500K    49.3 MB      134.32 s          23.29 s

Figure: Two-engine synthetic benchmark
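To make the table easier to compare against other workloads, the timings can be converted to rows per second. The sketch below does that with the figures copied from the results above; the names RESULTS and rows_per_second are introduced only for this example.

```python
# (rows, default engine seconds, turbo engine seconds), copied from the table.
RESULTS = [
    (50_000, 13.36, 2.47),
    (100_000, 26.82, 4.75),
    (250_000, 67.07, 11.61),
    (500_000, 134.32, 23.29),
]


def rows_per_second(rows, seconds):
    """Average throughput for one run."""
    return rows / seconds


for rows, default_s, turbo_s in RESULTS:
    print(
        f"{rows:>7,} rows: "
        f"default {rows_per_second(rows, default_s):,.0f} rows/s, "
        f"turbo {rows_per_second(rows, turbo_s):,.0f} rows/s"
    )
```

On these numbers the default engine stays at about 3,700 rows/s at every size, and turbo lands between roughly 20,000 and 21,500 rows/s, which suggests both engines scale close to linearly with row count on this workload.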

Takeaway

The main result is straightforward: on this workload, --turbo was more than five times faster than the default engine at every file size tested.

For these local runs, the speedup was roughly:

  • 5.4x faster at 50K rows
  • 5.6x faster at 100K rows
  • 5.8x faster at 250K rows
  • 5.8x faster at 500K rows
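The speedup figures are simply the default time divided by the turbo time for each row count. A minimal sketch that recomputes them from the results table, with TIMINGS and speedup as names made up for this example:

```python
# rows -> (default engine seconds, turbo engine seconds), from the results table.
TIMINGS = {
    50_000: (13.36, 2.47),
    100_000: (26.82, 4.75),
    250_000: (67.07, 11.61),
    500_000: (134.32, 23.29),
}


def speedup(default_s, turbo_s):
    """How many times faster turbo is than the default engine."""
    return default_s / turbo_s


for rows, (default_s, turbo_s) in TIMINGS.items():
    print(f"{rows:,} rows: {speedup(default_s, turbo_s):.1f}x")
# 50,000 rows: 5.4x
# 100,000 rows: 5.6x
# 250,000 rows: 5.8x
# 500,000 rows: 5.8x
```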

So in practice:

  • default engine: safer baseline, slower
  • turbo engine: faster, but a separately maintained second implementation

Recommendation

Use the default engine when:

  • the CSV is not especially large
  • you want the most conservative path

Use --turbo when:

  • you are validating large files
  • runtime is the main reason to switch
  • you are staying within the current tested schema model

Caveat

Treat these numbers as a local synthetic reference, not a guarantee. Real throughput will move with disk speed, CPU, row width, error rate, and report mode.