Grid-Based Inference

BESTA includes a direct model-grid inference workflow that does not require CosmoSIS samplers. This mode is useful when you already have a finite model library and want fast posterior evaluation over that library.

Why use the grid module instead of CosmoSIS sampling?

Use the grid module when:

Your model space is already discretized (for example: precomputed SED/SSP libraries).
You want deterministic, repeatable inference with no MCMC tuning.
You need high throughput over many objects using candidate selection and parallel workers.
You want direct control over priors/likelihoods in Python.

Prefer the CosmoSIS pipeline (see Pipeline Manager) when:

Your parameter space is continuous and not naturally represented by a fixed grid.
You need sampler diagnostics and chain-level convergence checks.
You rely on existing module wiring in the CosmoSIS runtime.

Core Concepts

besta.grid.grid.ModelGrid: container for model observables/targets and metadata.
besta.grid.grid.GridFitter: computes posterior weights over grid models.
besta.grid.prob: prior and likelihood building blocks.
besta.grid.binning: candidate selectors to avoid evaluating the full grid for every object.

Minimal Workflow

Build or load a ModelGrid.
Create a GridFitter with a likelihood and prior.
Evaluate posterior summaries for one object, or run fit_batch() for many.

Example (single-object posterior on one target):

import numpy as np
from besta.grid import ModelGrid, GridFitter
from besta.grid.prob import GaussianProductLikelihood, FlatPrior

# obs_models and target_models are placeholders for NumPy arrays you have
# already built or loaded: N models, P observables, Q targets
grid = ModelGrid(
    observables=obs_models,                # shape (N, P)
    targets=target_models,                 # shape (N, Q)
    observable_names=["mag_g", "mag_r", "mag_i"],
    target_names=["logM", "age", "Z", "z"],
)

fitter = GridFitter(
    grid=grid,
    likelihood=GaussianProductLikelihood(),
    prior=FlatPrior(),
    use_standardised=True,
)

x = np.array([22.1, 21.5, 21.2])          # observed data (P,)
sx = np.array([0.03, 0.03, 0.04])         # observational errors (P,)
bins = np.linspace(7.0, 12.0, 101)        # bins for logM

post_logM, centers = fitter.posterior_over_target(
    x_native=x,
    sigma_native=sx,
    target_col="logM",
    bins=bins,
)
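
Conceptually, a posterior over one target on a finite grid is a weighted histogram of that target's model values, with the posterior weights of the models as histogram weights. The sketch below illustrates this idea in plain NumPy. It does not call BESTA, the function name binned_posterior and the toy numbers are hypothetical, and whether posterior_over_target() normalises to unit sum or to a density is an assumption not confirmed here.

```python
import numpy as np

def binned_posterior(target_values, weights, bins):
    """Weighted histogram of one target column, normalised to unit sum."""
    weights = np.asarray(weights, dtype=float)
    hist, edges = np.histogram(target_values, bins=bins, weights=weights)
    total = hist.sum()
    if total > 0:
        hist = hist / total
    centers = 0.5 * (edges[:-1] + edges[1:])
    return hist, centers

# Toy grid: five models with hypothetical logM values and posterior weights.
logM_models = np.array([9.1, 9.4, 9.6, 10.2, 10.3])
weights = np.array([0.1, 0.2, 0.4, 0.2, 0.1])
bins = np.linspace(9.0, 10.5, 4)  # three bins of width 0.5

post, centers = binned_posterior(logM_models, weights, bins)
print(post, centers)
```

Models that fall outside the bin range are dropped by np.histogram, so the bins should cover the full span of the target column.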

Batch Inference (many objects)

For catalogs, use fit_batch(). It supports:

optional candidate selection (binner=),
thread/process parallelism (n_jobs, backend),
posterior truncation controls (posterior_keep_mass, etc.),
optional per-target summary statistics (stats_for, stats_bins),
optional HDF5 output (output_hdf5_path).

from besta.grid.binning import KDTreeBinner

binner = KDTreeBinner(dims=[0, 1, 2]).fit(grid)  # select candidates in observable space

results = fitter.fit_batch(
    X_native=X_catalog,                    # shape (M, P)
    SIG_native=SIG_catalog,                # shape (M, P)
    binner=binner,
    n_jobs=8,
    backend="thread",
    stats_for=["logM", "age"],
    stats_bins=[np.linspace(7, 12, 120), np.linspace(0, 14, 120)],
    output_hdf5_path="grid_fit_results.h5",
    output_hdf5_group="/run1",
    return_mode="iter",
)

for r in results:
    # r contains: m, candidates, post_models, truncation, and optional stats
    pass
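
One plausible reading of a posterior truncation control such as posterior_keep_mass is: keep the fewest models whose weights add up to a chosen fraction of the total posterior mass, then renormalise. The sketch below shows that idea in plain NumPy. It is an assumption about the intent of the option, not BESTA's implementation, and truncate_by_mass is a hypothetical helper.

```python
import numpy as np

def truncate_by_mass(weights, keep_mass=0.99):
    """Return indices and renormalised weights of the fewest models
    whose cumulative posterior mass reaches keep_mass."""
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()
    order = np.argsort(w)[::-1]            # heaviest models first
    cum = np.cumsum(w[order])
    # First position where the cumulative mass reaches keep_mass.
    n_keep = int(np.searchsorted(cum, keep_mass)) + 1
    n_keep = min(n_keep, w.size)
    kept = order[:n_keep]
    return kept, w[kept] / w[kept].sum()

# Toy posterior over four models.
kept, kept_w = truncate_by_mass([0.5, 0.3, 0.15, 0.05], keep_mass=0.9)
print(kept, kept_w)
```

Truncating this way keeps stored results small for large grids, at the cost of discarding the low-probability tail.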

Priors and Likelihoods

The grid module is fully Bayesian; you choose the ingredients from besta.grid.prob:

Priors: FlatPrior, CompositePrior, ObservableDependentPrior, and others.
Likelihoods: GaussianProductLikelihood, CompositeLikelihood, and others.

The statistical structure is the same as in sampling methods (posterior proportional to prior times likelihood), but the posterior is evaluated directly on a finite model set instead of being estimated from chains.
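
As an illustration of direct evaluation on a finite model set, the sketch below computes normalised posterior weights for N models under independent Gaussian errors and an optional per-model log-prior. It is a minimal NumPy sketch of the general technique, not the code behind GaussianProductLikelihood or GridFitter. The function grid_posterior_weights and its arguments are hypothetical.

```python
import numpy as np

def grid_posterior_weights(models, x, sigma, log_prior=None):
    """Normalised posterior weights over N grid models, shape (N,)."""
    models = np.asarray(models, dtype=float)
    # Independent Gaussian errors:
    # log L_i = -0.5 * sum_p ((x_p - m_ip) / sigma_p)**2 + const
    chi2 = np.sum(((x - models) / sigma) ** 2, axis=1)
    log_post = -0.5 * chi2
    if log_prior is not None:
        log_post = log_post + np.asarray(log_prior, dtype=float)
    # Subtract the maximum before exponentiating to avoid underflow.
    log_post = log_post - np.max(log_post)
    w = np.exp(log_post)
    return w / w.sum()

# Three toy models with two observables each; a flat prior is implied.
models = np.array([[22.0, 21.5], [22.1, 21.5], [23.0, 22.5]])
x = np.array([22.1, 21.5])
sigma = np.array([0.1, 0.1])

w = grid_posterior_weights(models, x, sigma)
print(w)
```

Working in log space and subtracting the maximum keeps the weights finite even when most models are many sigma away from the data.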

Input/Output and Reproducibility

ModelGrid can be written to and read from several formats:

FITS tables: from_fits_table(), to_fits_table()
HDF5: from_hdf5(), to_hdf5()
Pickle: from_pickle(), to_pickle()
Automatic loader: load_auto()

Practical Tips

Start with FlatPrior + GaussianProductLikelihood as a baseline.
If the grid is large, use a binner to cut candidate counts before posterior evaluation.
Use return_mode="iter" for low-memory streaming over large catalogs.
Write batch outputs to HDF5 for reproducible downstream post-processing.
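
To see why candidate selection helps on large grids, the sketch below preselects only the models that lie within a few sigma of the data in every observable, so that the posterior is evaluated on a small subset. This brute-force box cut is a stand-in chosen for illustration. It is not the algorithm used by KDTreeBinner, and select_candidates and n_sigma are hypothetical names.

```python
import numpy as np

def select_candidates(models, x, sigma, n_sigma=5.0):
    """Indices of models within n_sigma of the data in every observable."""
    z = np.abs((np.asarray(models, dtype=float) - x) / sigma)
    return np.flatnonzero(np.all(z <= n_sigma, axis=1))

# Four toy models with two observables each.
models = np.array([
    [22.0, 21.5],
    [22.1, 21.6],
    [25.0, 24.0],   # far from the data in both observables
    [22.4, 21.5],   # 6 sigma away in the first observable
])
x = np.array([22.1, 21.5])
sigma = np.array([0.05, 0.05])

cand = select_candidates(models, x, sigma, n_sigma=5.0)
print(cand)
```

A tree-based or binned selector reaches the same goal without scanning every model for every object, which matters once the grid has millions of entries.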