This is the first of a series of three vignettes introducing the R
package BCDAG
. In this vignette we focus on functions
rDAG()
and rDAGWishart()
which implement
random generation of DAG structures and DAG parameters under the
assumption that the joint distribution of variables X1,ββ¦,βXq
is Gaussian and the corresponding model (Choleski) parameters follow a
DAG-Wishart distribution. Finally, data generation from Gaussian DAG
models is described.
rDAG()
and
rDAGWishart()
Function rDAG()
can be used to randomly generate a DAG
structure πβ=β(V,βE),
where Vβ=β{1,ββ¦,βq}
and EβββVβ
Γβ
V is the
set of edges. rDAG()
has two arguments: the number of nodes
(variables) q and the prior
probability of edge inclusion wβββ[0,β1]; the latter can be tuned
to control the degree of sparsity in the resulting DAG. By fixing a
probability of edge inclusion wβ=β0.2, a DAG structure with qβ=β10 nodes can be generated as
follows:
DAG
#> 1 2 3 4 5 6 7 8 9 10
#> 1 0 0 0 0 0 0 0 0 0 0
#> 2 0 0 0 0 0 0 0 0 0 0
#> 3 0 0 0 0 0 0 0 0 0 0
#> 4 0 0 1 0 0 0 0 0 0 0
#> 5 1 0 0 0 0 0 0 0 0 0
#> 6 0 0 0 0 0 0 0 0 0 0
#> 7 1 0 1 0 0 0 0 0 0 0
#> 8 1 0 0 0 0 0 0 0 0 0
#> 9 0 0 0 1 0 0 1 0 0 0
#> 10 0 0 0 0 1 0 0 0 0 0
Output of rDAG()
is the 0-1 (q,βq) adjacency matrix of
the generated DAG, with element 1 at
position (u,βv)
indicating the presence of an edge uβββv. Notice that the
generated DAG is topologically ordered, meaning that edges are
allowed only from high to low nodes (nodes are labeled according to
rows/columns indexes); accordingly the DAG adjacency matrix is
lower-triangular.
Consider a Gaussian DAG model of the form where (L,βD) are model parameters providing the decomposition of the precision (inverse-covariance) matrix Ξ©β=βLDβ1Lβ€; specifically, L is a (q,βq) matrix of coefficients such that for each (u,βv)-element Luv with uββ βv, we have Luvββ β0 if and only if (u,βv)βββE, while Luuβ=β1 for each uβ=β1,ββ¦,βq; also, D is a (q,βq) diagonal matrix with (u,βu)-element Duu. The latter decomposition follows from the equivalent Structural Equation Model (SEM) representation of a Gaussian DAG model:
where xβ=β(X1,ββ¦,βXq)β€; see also Castelletti & Mascaro (2021).
Function rDAGWishart
implements random sampling from
(L,βD)β|βπββΌβDAG-Wishart(acπ,βU),
where U is the rate
parameter (a (q,βq)
s.p.d. matrix) and acπ
(a (q,β1) vector) is the shape
parameter of the DAG-Wishart distribution. This class of distributions
was introduced by Ben David et al.Β (2015) as a conjugate prior for
Gaussian DAG model-parameters. In its compatible version (Peluso &
Consonni, 2020), elements of the vector parameter acπ
are uniquely determined from a single common shape parameter
aβ>βqβ
ββ
1.
Inputs of rDAGWishart
are: the number of samples n, the underlying DAG π, the common shape parameter a and the rate parameter U. Given the DAG π generated before, the following example
implements a single (nβ=β1)
draw from a compatible DAG-Wishart distribution with parameters aβ=βq, Uβ=βIq:
The output of rDAGWishart()
consists of two elements: a
(q,βq,βn)-dimensional
array collecting the n sampled
matrices L(1),ββ¦,βL(n)
and a (q,βq,βn)-dimensional
array collecting the n sampled
matrices D(1),ββ¦,βD(n).
We refer the reader to Castelletti & Mascaro (2021) and Castelletti
& Mascaro (2022+) for more details.
Data generation from a Gaussian DAG model is then straightforward. Recall that Ξ©β=βLDβ1Lβ€, where Ξ© is the inverse-covariance (precision) matrix of a multivariate Gaussian model satisfying the constraints imposed by a DAG. Accordingly, we can recover the precision and covariance matrices as:
Next, i.i.d. draws from a Gaussian DAG model can be obtained through
the function rmvnorm()
provided within the R package
mvtnorm
:
Ben-David E, Li T, Massam H, Rajaratnam B (2015). βHigh dimensional Bayesian inference for Gaussian directed acyclic graph models.β arXiv pre-print.
Cao X, Khare K, Ghosh M (2019). βPosterior graph selection and estimation consistency for high-dimensional Bayesian DAG models.β The Annals of Statistics, 47(1), 319β348.
Castelletti F, Mascaro A (2021). βStructural learning and estimation of joint causal effects among network-dependent variables.β Statistical Methods & Applications, 30, 1289β1314.
Castelletti F, Mascaro A (2022). βBCDAG: An R package for Bayesian structural and Causal learning of Gaussian DAGs.β arXiv pre-print.
Peluso S, Consonni G (2020). βCompatible priors for model selection of high-dimensional Gaussian DAGs.β Electronic Journal of Statistics, 14(2), 4110β4132.