Running Experiments
Using Reproducible Example Scripts
The project currently provides four reproducible risk cases via scripts
in examples/: R2, R9, R10, and R13. Each script loads a pre-configured
YAML file, runs the experiment, and writes results to results/.
# Tacit Collusion (R2)
cd examples/R2
python run_r2.py --condition C1
# Strategic Misreporting (R9)
cd ../R9
python run_r9.py
# Normative Deadlock (R10)
cd ../R10
python run_r10.py --condition e1
# Excessive Rigidity to Initial Directives (R13)
cd ../R13
python run_r13.py
All four scripts follow the same layout: each reads its YAML configs from
the configs/ subdirectory next to it and writes outputs to results/.
Using the Python API
For full programmatic control, use config_loader and
ExperimentRunner directly. This lets you modify components in code
before running, or integrate experiments into larger pipelines.
from risklab.experiments.config_loader import (
    load_experiment_config,
    build_experiment_from_config,
)
from risklab.experiments.runner import ExperimentRunner
# Load and build
config = load_experiment_config("path/to/config.yaml")
components = build_experiment_from_config(config)
# Run
runner = ExperimentRunner(**components)
results = runner.run() # returns list[dict], one per seed
# Access results
for result in results:
    print(result["risk_results"])    # {risk_id: {detected, score, ...}}
    print(result["metric_results"])  # {metric_name: value}
    print(result["task_result"])     # task evaluation or None
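The per-seed result dicts are plain Python data, so downstream analysis needs no framework code. As a sketch, here is one way to aggregate detection rates across seeds; the sample values below are hypothetical stand-ins for what runner.run() would return, with shapes mirroring the fields shown above:

```python
# Hypothetical per-seed results; in practice these come from runner.run().
results = [
    {"risk_results": {"R2": {"detected": True, "score": 0.8}},
     "metric_results": {"rounds_to_consensus": 4},
     "task_result": None},
    {"risk_results": {"R2": {"detected": False, "score": 0.3}},
     "metric_results": {"rounds_to_consensus": 6},
     "task_result": None},
]

# Fraction of seeds in which each risk was flagged.
flags_by_risk = {}
for result in results:
    for risk_id, outcome in result["risk_results"].items():
        flags_by_risk.setdefault(risk_id, []).append(outcome["detected"])

detection_rate = {rid: sum(flags) / len(flags)
                  for rid, flags in flags_by_risk.items()}
print(detection_rate)  # {'R2': 0.5}
```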
Multi-Seed Runs
LLM outputs are stochastic. Running with multiple seeds produces independent repetitions so you can measure variance.
results = runner.run(num_seeds=5) # 5 independent runs
Note
In the current framework, seed is a run index recorded in outputs.
It is not guaranteed to map to a deterministic random seed in external
LLM provider APIs.
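To quantify run-to-run variability, compute per-metric statistics across the per-seed result dicts. A minimal sketch with the standard library; the metric name and values below are hypothetical, standing in for real multi-seed output:

```python
from statistics import mean, stdev

# Hypothetical per-seed results; in practice use runner.run(num_seeds=5).
results = [
    {"seed": 0, "metric_results": {"agreement_score": 0.72}},
    {"seed": 1, "metric_results": {"agreement_score": 0.64}},
    {"seed": 2, "metric_results": {"agreement_score": 0.70}},
    {"seed": 3, "metric_results": {"agreement_score": 0.66}},
    {"seed": 4, "metric_results": {"agreement_score": 0.68}},
]

scores = [r["metric_results"]["agreement_score"] for r in results]
print(f"mean={mean(scores):.3f} stdev={stdev(scores):.3f}")
# mean=0.680 stdev=0.032
```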
Output Structure
results/
├── ExperimentId_aggregate.json # all seed results
└── trajectories/
├── ExperimentId_seed0_cyclic.json
├── ExperimentId_seed1_cyclic.json
└── ExperimentId_seed2_cyclic.json
Aggregate file: a list of result dicts, each containing risk_results,
metric_results, task_result, num_rounds, seed, and failure.
Trajectory files: full message logs for replay and analysis.
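The aggregate file is plain JSON, so it can be inspected without importing the framework. A sketch assuming the layout above; the file contents are hypothetical, and the file is written to a temporary directory here only so the example is self-contained:

```python
import json
import tempfile
from pathlib import Path

# Stand-in for results/ExperimentId_aggregate.json (hypothetical contents).
aggregate = [
    {"seed": 0, "num_rounds": 5, "risk_results": {"R10": {"detected": True}}},
    {"seed": 1, "num_rounds": 7, "risk_results": {"R10": {"detected": False}}},
]
path = Path(tempfile.mkdtemp()) / "ExperimentId_aggregate.json"
path.write_text(json.dumps(aggregate))

# Load the aggregate file and print a one-line summary per seed.
loaded = json.loads(path.read_text())
for entry in loaded:
    flagged = [rid for rid, r in entry["risk_results"].items() if r["detected"]]
    print(f"seed {entry['seed']}: {entry['num_rounds']} rounds, flagged={flagged}")
# seed 0: 5 rounds, flagged=['R10']
# seed 1: 7 rounds, flagged=[]
```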
Inspecting Configs
Validate and preview a config before running:
python -m risklab.inspect_config path/to/config.yaml -A
See CLI Reference for all inspector flags.