.. _postprocessing:

Post-processing and Posterior Analysis
======================================

This module provides a high-level interface to **summarise, analyse, and export**
the results produced by BESTA / CosmoSIS runs. It is designed to sit *after*
sampling and focuses on:

* Robust posterior summaries (MAP, mean, covariance, correlation)
* Credible intervals (percentiles and HDIs)
* One- and two-dimensional posterior PDFs
* Diagnostic plots (PDFs, corner plots, chains)
* Bayesian evidence estimation
* Machine-readable outputs (FITS and JSON)

The core abstraction is the :class:`besta.postprocess.ResultsSummary` object, which collects all
derived quantities in a single, reusable container.

Overview
--------

Typical workflow:

1. Load a CosmoSIS results table (or pass one directly).
2. Call :func:`besta.postprocess.summarize_results`.
3. Inspect or plot results via :class:`besta.postprocess.ResultsSummary`.
4. Export summaries to FITS and/or JSON.

The module is **sampler-agnostic** and works with any posterior sample stored
as a table with a log-posterior column.

Input requirements
------------------

The input table must contain:

* A **log-posterior** column (default name: ``post``)
* One column per sampled parameter, typically named using the convention::

    section--parameter

For evidence estimation, the table must also contain:

* A **log-prior** column (e.g. ``prior``)

If a separate log-likelihood column is not present, it is automatically
reconstructed as::

    loglike = post - prior

Basic usage
-----------

Summarising a results table:

.. code-block:: python

    from besta.postprocess import summarize_results

    summary = summarize_results(
        table,
        output_fits="besta_summary.fits",
        output_json="besta_summary.json",
    )

The returned object is an instance of :class:`besta.postprocess.ResultsSummary`.

Selecting parameters
--------------------

By default, all columns containing ``"--"`` are interpreted as sampled
parameters. You can override this behaviour:

.. code-block:: python

    summary = summarize_results(
        table,
        parameter_keys=[
            "sfh--tau",
            "sfh--age",
            "dust--av",
        ],
    )

Posterior summaries
-------------------

The following quantities are always computed:

* Maximum-a-posteriori (MAP) parameter vector
* Posterior-weighted mean
* Covariance matrix
* Correlation matrix

They are accessible as NumPy arrays:

.. code-block:: python

    summary.map
    summary.mean
    summary.covariance
    summary.correlation

Credible intervals
------------------

Percentiles
~~~~~~~~~~~

Percentiles are computed using **weighted quantiles**:

.. code-block:: python

    summary.percentiles
    summary.percentiles_values

By default, the percentiles are:

``[0.05, 0.16, 0.5, 0.84, 0.95]``

This can be customised:

.. code-block:: python

    summary = summarize_results(
        table,
        percentiles=[0.025, 0.16, 0.5, 0.84, 0.975],
    )

Highest-Density Intervals (HDI)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

In addition to percentiles, the module computes **highest-density intervals**
(HDIs), which are often more informative for skewed or multi-modal posteriors.

By default, a 68% HDI is computed:

.. code-block:: python

    summary.hdi_intervals["tau"]

HDIs may contain **multiple disjoint intervals** if the posterior is multi-modal.

One-dimensional PDFs
--------------------

For each parameter, the module computes:

* A histogram-based PDF
* An optional KDE-based PDF

These are stored in:

.. code-block:: python

    summary.pdf_1d["tau"]["grid"]
    summary.pdf_1d["tau"]["hist_pdf"]
    summary.pdf_1d["tau"]["kde_pdf"]

You can disable KDE estimation:

.. code-block:: python

    summary = summarize_results(
        table,
        kde_1d=False,
    )

Two-dimensional PDFs
--------------------

Joint PDFs for parameter pairs can be computed as:

.. code-block:: python

    summary = summarize_results(
        table,
        compute_2d=True,
        parameter_key_pairs=[
            ("sfh--tau", "sfh--age"),
            ("dust--av", "sfh--age"),
        ],
    )

For each pair, the following are stored:

* Regular grid in both parameters
* Joint PDF
* Enclosed-fraction (HPD) map

Accessible as:

.. code-block:: python

    pdf = summary.pdf_2d[("tau", "age")]["pdf"]
    frac = summary.pdf_2d[("tau", "age")]["enclosed_fraction"]

The enclosed-fraction map allows easy extraction of 68% / 95% credible contours.

Plotting
--------

1D PDFs
~~~~~~~

.. code-block:: python

    summary.plot_1d_pdfs("plots/pdf1d")

2D PDFs
~~~~~~~

.. code-block:: python

    summary.plot_2d_pdfs("plots/pdf2d")

Corner plots
~~~~~~~~~~~~

A lightweight built-in corner plot is provided:

.. code-block:: python

    summary.corner_plot("corner.png")

This avoids external dependencies, but more advanced diagnostics can be added
later (e.g. via ``corner`` or ``arviz``).

Evidence estimation
-------------------

If the results table contains:

* ``post``  – log posterior
* ``prior`` – log prior

then the Bayesian evidence can be estimated.

Laplace approximation (recommended)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. code-block:: python

    summary = summarize_results(
        table,
        estimate_evidence=True,
        evidence_method="laplace",
        logprior_key="prior",
    )

This computes:

.. math::

    Z \approx \mathcal{L}(\theta_\mathrm{MAP}) \,
              \pi(\theta_\mathrm{MAP}) \,
              (2\pi)^{d/2} \, |\Sigma|^{1/2}

where :math:`\Sigma` is the posterior covariance matrix.

Robust harmonic-mean estimator (sanity check)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. code-block:: python

    summary.estimate_evidence_from_table(
        table,
        method="hme_robust",
        logprior_key="prior",
    )

.. warning::

    The harmonic-mean estimator is **statistically unstable** and should only
    be used as a rough cross-check. The Laplace approximation is preferred.

The estimated evidence is available as:

.. code-block:: python

    summary.evidence.logz
    summary.evidence.method
    summary.evidence.details

Outputs
-------

FITS
~~~~~

The FITS output contains:

* Covariance and correlation matrices
* Percentile tables
* 1D PDFs
* 2D PDFs and enclosed-fraction maps
* Evidence (stored in the primary header)

JSON
~~~~

The JSON output provides a **fully serialised summary**, including:

* Posterior statistics
* Credible intervals
* PDFs
* Evidence
* User-provided metadata

This format is intended for:

* Dashboards
* Pipeline aggregation
* Downstream inference and validation

Design philosophy
-----------------

This module intentionally separates:

* **Sampling** (CosmoSIS / BESTA)
* **Inference** (this module)
* **Visualisation / reporting**

This makes posterior analysis reproducible, testable, and reusable across
projects and datasets.