Batch Plot Generation
Added in version 1.7.0: The batch command with parallel processing support.
Overview
The batch command generates multiple plots from a TOML configuration file.
It supports parallel execution for improved performance when processing many files.
Usage
# Sequential processing (default)
rgpycrumbs chemgp batch --config plots.toml
# Parallel processing with 4 workers
rgpycrumbs chemgp batch --config plots.toml --parallel 4
# Short form
rgpycrumbs chemgp batch -c plots.toml -j 4
Configuration File Format
The TOML configuration file specifies plots to generate:
[defaults]
input_dir = "./data"
output_dir = "./figures"
[[plots]]
type = "surface"
input = "mb_surface.h5"
output = "mb_surface.pdf"
width = 7.0
height = 5.0
[[plots]]
type = "convergence"
input = "convergence.h5"
output = "convergence.pdf"
[[plots]]
type = "quality"
input = "gp_quality.h5"
output = "gp_quality.pdf"
n-points = [100, 200, 400]
Configuration Options
Defaults Section
| Option | Type | Default | Description |
|---|---|---|---|
| input_dir | string | | Base directory for input files |
| output_dir | string | | Base directory for output files |
Plot Entries
Each [[plots]] entry specifies a single plot:
| Option | Type | Default | Description |
|---|---|---|---|
| type | string | | Plot type (see below) |
| input | string | | Input HDF5 file (relative to input_dir) |
| output | string | | Output PDF file (relative to output_dir) |
| width | float | 7.0 | Figure width in inches |
| height | float | 5.0 | Figure height in inches |
| dpi | int | 300 | Output resolution (dots per inch) |
| type-specific | | | Additional options per plot type |
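The options above could be combined with the defaults section roughly as follows. This is a sketch under stated assumptions, not the command's actual logic: `resolve_plot`, `COMMON_DEFAULTS`, and `CORE_KEYS` are hypothetical names, and falling back to the current directory when `input_dir` or `output_dir` is absent is an assumption, since the documentation lists no default for them.

```python
from pathlib import Path

# Default values taken from the options table above.
COMMON_DEFAULTS = {"width": 7.0, "height": 5.0, "dpi": 300}
REQUIRED_KEYS = {"type", "input", "output"}
CORE_KEYS = REQUIRED_KEYS | set(COMMON_DEFAULTS)


def resolve_plot(entry: dict, defaults: dict) -> dict:
    """Return a fully resolved plot spec (hypothetical helper)."""
    missing = REQUIRED_KEYS - entry.keys()
    if missing:
        raise ValueError(f"plot entry is missing: {sorted(missing)}")
    # Start from the common defaults, then apply whatever the entry overrides.
    spec = {**COMMON_DEFAULTS, **{k: v for k, v in entry.items() if k in CORE_KEYS}}
    # Paths are interpreted relative to the [defaults] directories
    # (assumption: "." is used when a directory is not configured).
    spec["input"] = Path(defaults.get("input_dir", ".")) / entry["input"]
    spec["output"] = Path(defaults.get("output_dir", ".")) / entry["output"]
    # Every remaining key is passed through as a type-specific option.
    spec["extra"] = {k: v for k, v in entry.items() if k not in CORE_KEYS}
    return spec


example = resolve_plot(
    {"type": "quality", "input": "gp.h5", "output": "gp.pdf", "n-points": [50, 100]},
    {"input_dir": "./data", "output_dir": "./figures"},
)
```

Keeping type-specific keys in a separate `extra` mapping is one way to let each plot type validate its own options without the batch layer knowing about them.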
Plot Types
Available plot types correspond to individual chemgp commands:
- surface - 2D PES contour plot
- convergence - Force/energy convergence curve
- quality - GP surrogate quality progression
- rff - RFF approximation quality
- nll - MAP-NLL landscape
- sensitivity - Hyperparameter sensitivity grid
- trust - Trust region illustration
- variance - GP variance overlay
- fps - FPS subset visualization
- profile - NEB energy profile
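A configuration can be checked against the ten plot types listed above before any plotting starts. The sketch below is illustrative only; `PLOT_TYPES` and `check_plot_type` are hypothetical names, and whether the batch command validates types up front is not stated here.

```python
# The plot type names documented above.
PLOT_TYPES = {
    "surface", "convergence", "quality", "rff", "nll",
    "sensitivity", "trust", "variance", "fps", "profile",
}


def check_plot_type(entry: dict) -> str:
    """Return the entry's plot type, or raise if it is not a known type."""
    plot_type = entry.get("type")
    if plot_type not in PLOT_TYPES:
        raise ValueError(
            f"unknown plot type {plot_type!r}; expected one of {sorted(PLOT_TYPES)}"
        )
    return plot_type
```

Checking every entry first means a typo in one `type` value is reported immediately rather than after the preceding plots have been rendered.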
Parallel Processing
The --parallel (or -j) option enables concurrent plot generation:
# Use 4 parallel workers
rgpycrumbs chemgp batch -c plots.toml -j 4
# Use 8 workers for large batches
rgpycrumbs chemgp batch -c plots.toml -j 8
Performance
Parallel processing speeds up batch operations by roughly 2-5x, depending on the number of workers:
| Workers | Speedup | Best For |
|---|---|---|
| 1 | 1x (baseline) | Small batches (< 5 plots) |
| 2-4 | 2-3x | Medium batches (5-20 plots) |
| 4-8 | 3-5x | Large batches (20+ plots) |
Note: Speedup depends on I/O bandwidth and the number of available CPU cores.
Examples
Basic Batch
Generate all plots from configuration:
rgpycrumbs chemgp batch -c my_plots.toml
Parallel Processing
Process 20 plots with 4 workers:
rgpycrumbs chemgp batch -c large_batch.toml -j 4
Custom Output Directory
[defaults]
input_dir = "./h5_data"
output_dir = "./publication/figures"
[[plots]]
type = "surface"
input = "reaction1.h5"
output = "reaction1_surface.pdf"
rgpycrumbs chemgp batch -c config.toml
Type-Specific Options
[[plots]]
type = "surface"
input = "mb.h5"
output = "mb_contour.pdf"
clamp-lo = -200.0
clamp-hi = 50.0
contour-step = 25.0
[[plots]]
type = "quality"
input = "gp.h5"
output = "gp_quality.pdf"
n-points = [50, 100, 200, 400]
Error Handling
The batch command continues processing remaining plots if one fails:
[OK] reaction1_surface.pdf
[FAIL] reaction2_convergence.pdf: Input not found: ./data/reaction2.h5
[OK] reaction3_quality.pdf
Batch complete: 2 ok, 1 failed
Exit code is 1 if any plots failed, 0 if all succeeded.
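The continue-on-failure behaviour and exit code described above can be sketched as a loop that catches per-plot errors. This is an illustration, not the command's source: `run_batch` and `demo_render` are hypothetical, and the renderer here is a stand-in that only simulates a missing input.

```python
from pathlib import Path
from typing import Callable


def run_batch(specs: list[dict], render: Callable[[dict], None]) -> int:
    """Render each spec in turn; return 1 if any failed, else 0."""
    ok = failed = 0
    for spec in specs:
        name = Path(spec["output"]).name
        try:
            render(spec)
        except Exception as exc:
            # A failure is reported but does not stop the remaining plots.
            failed += 1
            print(f"[FAIL] {name}: {exc}")
        else:
            ok += 1
            print(f"[OK] {name}")
    print(f"Batch complete: {ok} ok, {failed} failed")
    return 1 if failed else 0


def demo_render(spec: dict) -> None:
    """Stand-in renderer (assumption): fails for one specific input."""
    if spec["input"] == "reaction2.h5":
        raise FileNotFoundError("Input not found: ./data/reaction2.h5")


exit_code = run_batch(
    [
        {"input": "reaction1.h5", "output": "reaction1_surface.pdf"},
        {"input": "reaction2.h5", "output": "reaction2_convergence.pdf"},
        {"input": "reaction3.h5", "output": "reaction3_quality.pdf"},
    ],
    demo_render,
)
```

A caller would typically pass the returned value to `sys.exit`, so that shell scripts and CI jobs can detect a partially failed batch.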
Performance Tips
- **Use parallel processing** for batches > 5 plots
- **Match workers to CPU cores** (typically 4-8 workers optimal)
- **Group by input directory** to minimize I/O seeks
- **Use SSD storage** for best I/O performance
See Also
- Individual Plot Commands - Details on each plot type
- HDF5 Schema - Expected HDF5 file structure
- Quickstart Guide - Getting started with rgpycrumbs
Implementation Notes
The batch command uses concurrent.futures.ThreadPoolExecutor for parallel
processing. This pattern is adopted from the nebmmf_repro project’s
scripts/parse_results.py for consistent parallel file processing.
Progress tracking uses rich.progress for user-friendly output during
long-running batch operations.