tbnpy.inference¶
Overview¶
This module provides lightweight forward-sampling utilities for a Bayesian network defined by
a dictionary of probability objects (typically Cpt instances, but any object
with a compatible interface works).
A probability object P is assumed to expose:
P.childs: list of child variables (each variable has.name)P.parents: list of parent variables (each variable has.name)P.sample(...): sampling method (signature depends on whether parents exist)P.log_prob(...): log-probability evaluation for rows of[childs | parents](optionally)
P.sample_evidence(...): evidence-aligned sampling
The key idea is to:
collect all ancestors of query nodes,
order them topologically (parents before children),
forward-sample along that order, storing samples back into each probability object.
Glossary¶
probs:
dict[str, ProbObject]mapping node name → probability object.node name: a string key in
probs. In this module, node names are treated as variable names.Cs: sampled assignments stored as composite-state indices.
ps: stored per-sample probability values. In most usages here,
psis log probability.
Quick start¶
# probs: {"X": P(X), "Y": P(Y|X), ...}
ordered = get_ancestor_order(probs, query_nodes={"Y"})
probs_s = sample(probs, query_nodes={"Y"}, n_sample=10_000)
# probs_s["Y"].Cs contains samples for Y and its parents (if any)
# probs_s["Y"].ps contains per-sample (log) probabilities
Public API¶
Topological utilities¶
- get_ancestor_order(probs: dict, query_nodes: list[str] | set[str]) list[str]¶
Compute the set of all ancestors of the query nodes and return them in a valid topological order (parents appear before children).
Parameters¶
- probs
Mapping from node name → probability object. Each probability object must provide:
childs: list of child variablesparents: list of parent variables, each having.name
- query_nodes
Iterable of node names whose marginals (or descendant computations) are of interest.
Returns¶
- list[str]
Topologically sorted list of all ancestors of
query_nodes, including the query nodes.
Notes¶
The function performs validation and will raise
AssertionErrorif inputs are inconsistent (e.g., missing nodes, missing attributes).Cycles are detected indirectly via topological sorting consistency checks.
Forward sampling without evidence¶
- sample(probs: dict, query_nodes: list[str] | set[str], n_sample: int) dict¶
Forward-sample all ancestors of
query_nodesand return a deep-copied probability structure with stored samples and per-sample probabilities.Parameters¶
- probs
Mapping from node name → probability object.
- query_nodes
Node names to condition the ancestral subgraph selection.
- n_sample
Number of samples to generate.
Returns¶
- dict
A dictionary
{node_name: prob_object}restricted to the ancestral subgraph, in ancestor order. For each returned probability objectP:P.Csis a tensor with shape(n_sample, n_childs)or(n_sample, n_childs + n_parents)depending on the implementation ofP.sample.P.psis a tensor with shape(n_sample,)(often log-probabilities).
How sampling is performed¶
Compute ancestor order using
get_ancestor_order().Deep-copy the needed probability objects.
Build a lookup
var_to_sourceto find where each variable’s samples are stored.For each node in topological order:
if the node has no parents: call
P.sample(n_sample).if parents exist: assemble parent sample matrix
Cs_parof shape(n_sample, n_parents)and callP.sample(Cs_par).
Important¶
The module assumes each variable appears as a child of exactly one probability object. If a variable is a child in multiple objects, an
AssertionErroris raised.
Forward sampling with evidence¶
Evidence is provided as a table (typically a pandas DataFrame) whose columns are variable names and whose rows are evidence scenarios.
Two implementations are included:
sample_evidence_v0(): usesprob.sample_evidencewhen parents exist (vectorised), and usesprob.log_probfor observed children.sample_evidence(): uses onlyprob.sample(noprob.sample_evidence), which is sometimes easier to maintain/debug.
- sample_evidence_v0(probs: dict, query_nodes: list[str] | set[str], n_sample: int, evidence_df) dict¶
Forward-sample all ancestors of
query_nodesunder multiple evidence rows.Parameters¶
- probs
Mapping from node name → probability object.
- query_nodes
Node names of interest.
- n_sample
Number of samples per evidence row.
- evidence_df
A pandas-like DataFrame. Each column name must match a variable name. Shape
(n_evi, n_evidence_vars).
Returns¶
- dict
{node_name: prob_object}for the ancestral subgraph. Each returned object contains:prob_object.Csof shape(n_evi, n_sample, n_childs + n_parents)(or(n_evi, n_sample, n_childs)for root nodes / special cases)prob_object.psof shape(n_evi, n_sample)containing log-probabilities
Observed child handling¶
If a node is observed (its name is a column in
evidence_df), this function:sets child samples to the observed value repeated over samples, and
computes
psby evaluatingprob.log_probon the assembled[childs | parents]rows.
Parent handling¶
For each parent variable:
if the parent is observed in
evidence_df, the observed values are used,otherwise, sampled values from earlier nodes are used.
Notes¶
Evidence values are converted to torch tensors.
This implementation expects
prob.sample_evidence(Cs_pars)to accept parent samples of shape(n_evi, n_sample, n_parents).
- sample_evidence(probs: dict, query_nodes: list[str] | set[str], n_sample: int, evidence_df) dict¶
Forward-sample all ancestors of
query_nodesunder multiple evidence rows using onlyprob.sample(noprob.sample_evidence).Parameters¶
- probs
Mapping from node name → probability object.
- query_nodes
Node names of interest.
- n_sample
Number of samples per evidence row.
- evidence_df
A pandas-like DataFrame with evidence columns.
Returns¶
- dict
{node_name: prob_object}for the ancestral subgraph. Each returned object contains:prob_object.Csof shape(n_evi, n_sample, n_childs + n_parents)(non-root nodes) or(n_evi, n_sample, n_childs)(root / observed child cases)prob_object.psof shape(n_evi, n_sample)containing log-probabilities
Implementation sketch¶
Root nodes are sampled by generating
n_evi * n_samplesamples and reshaping to(n_evi, n_sample, ...).Non-root nodes build a parent matrix
Cs_par_flatof shape(n_evi*n_sample, n_parents), then callprob.sample(Cs_pars=Cs_par_flat), and reshape outputs back to evidence form.Observed nodes compute log-probability via
prob.log_probwithout sampling.