=====================
Running Experiments
=====================

Using Reproducible Example Scripts
----------------------------------

The project currently provides **four reproducible risk cases** via scripts in
``examples/``: R2, R9, R10, and R13. Each script loads a pre-configured YAML
file from its ``configs/`` subdirectory, runs the experiment, and writes
results to ``results/``.

.. code-block:: bash

   # Tacit Collusion (R2)
   cd examples/R2
   python run_r2.py --condition C1

   # Strategic Misreporting (R9)
   cd ../R9
   python run_r9.py

   # Normative Deadlock (R10)
   cd ../R10
   python run_r10.py --condition e1

   # Excessive Rigidity to Initial Directives (R13)
   cd ../R13
   python run_r13.py

Using the Python API
--------------------

For full programmatic control, use ``config_loader`` and ``ExperimentRunner``
directly. This lets you modify components in code before running, or integrate
experiments into larger pipelines.

.. code-block:: python

   from risklab.experiments.config_loader import (
       load_experiment_config,
       build_experiment_from_config,
   )
   from risklab.experiments.runner import ExperimentRunner

   # Load and build
   config = load_experiment_config("path/to/config.yaml")
   components = build_experiment_from_config(config)

   # Run
   runner = ExperimentRunner(**components)
   results = runner.run()  # returns list[dict], one per seed

   # Access results
   for result in results:
       print(result["risk_results"])    # {risk_id: {detected, score, ...}}
       print(result["metric_results"])  # {metric_name: value}
       print(result["task_result"])     # task evaluation or None

Multi-Seed Runs
---------------

LLM outputs are stochastic. Running with multiple seeds produces independent
repetitions so you can measure variance.

.. code-block:: python

   results = runner.run(num_seeds=5)  # 5 independent runs

.. note::

   In the current framework, ``seed`` is a run index recorded in outputs.
   It is not guaranteed to map to a deterministic random seed in external
   LLM provider APIs.

Output Structure
----------------

.. code-block:: text

   results/
   ├── ExperimentId_aggregate.json        # all seed results
   └── trajectories/
       ├── ExperimentId_seed0_cyclic.json
       ├── ExperimentId_seed1_cyclic.json
       └── ExperimentId_seed2_cyclic.json

- **Aggregate file** — list of result dicts with ``risk_results``,
  ``metric_results``, ``task_result``, ``num_rounds``, ``seed``, ``failure``
- **Trajectory files** — full message logs for replay and analysis

Inspecting Configs
------------------

Validate and preview a config before running:

.. code-block:: bash

   python -m risklab.inspect_config path/to/config.yaml -A

See CLI Reference for all inspector flags.
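Because the aggregate file is a plain JSON list with one result dict per seed, post-hoc analysis
needs no framework code. Below is a minimal sketch of computing per-risk detection rates across
seeds; the ``summarize_detection_rates`` helper and the file path are hypothetical, while the
``risk_results``/``detected`` keys follow the result schema described in this page:

.. code-block:: python

   import json
   from collections import defaultdict
   from pathlib import Path


   def summarize_detection_rates(aggregate_path):
       """Return {risk_id: fraction of seeds where the risk was detected}."""
       results = json.loads(Path(aggregate_path).read_text())
       detections = defaultdict(list)
       for result in results:  # one dict per seed
           for risk_id, risk in result["risk_results"].items():
               detections[risk_id].append(bool(risk["detected"]))
       return {
           risk_id: sum(flags) / len(flags)
           for risk_id, flags in detections.items()
       }


   # e.g. rates = summarize_detection_rates("results/ExperimentId_aggregate.json")

A detection rate well below 1.0 across seeds is a hint that the behavior is stochastic and more
seeds may be needed for a stable estimate.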