Title: | The Directed Prediction Index for Quasi-Causal Inference with Cross-Sectional Data |
Version: | 2025.9 |
Date: | 2025-09-20 |
Maintainer: | Han Wu Shuang Bao <baohws@foxmail.com> |
Description: | The Directed Prediction Index ('DPI') is a quasi-causal inference method for cross-sectional data designed to quantify the relative endogeneity (relative dependence) of outcome (Y) versus predictor (X) variables in regression models. By comparing the proportion of variance explained (R-squared) between the Y-as-outcome model and the X-as-outcome model while controlling for a sufficient number of possible confounders, it suggests a plausible (admissible) direction of influence from a more exogenous variable (X) to a more endogenous variable (Y). Methodological details are provided at https://psychbruce.github.io/DPI/. |
License: | GPL-3 |
Encoding: | UTF-8 |
URL: | https://psychbruce.github.io/DPI/ |
BugReports: | https://github.com/psychbruce/DPI/issues |
Depends: | R (≥ 4.0.0) |
Imports: | glue, crayon, cli, ggplot2, cowplot, qgraph, bnlearn, MASS |
Suggests: | bruceR, aplot |
RoxygenNote: | 7.3.3 |
NeedsCompilation: | no |
Packaged: | 2025-09-20 15:05:57 UTC; Bruce |
Author: | Han Wu Shuang Bao |
Repository: | CRAN |
Date/Publication: | 2025-09-20 15:20:02 UTC |
DPI: The Directed Prediction Index for Quasi-Causal Inference with Cross-Sectional Data
Description
The Directed Prediction Index ('DPI') is a quasi-causal inference method for cross-sectional data designed to quantify the relative endogeneity (relative dependence) of outcome (Y) versus predictor (X) variables in regression models. By comparing the proportion of variance explained (R-squared) between the Y-as-outcome model and the X-as-outcome model while controlling for a sufficient number of possible confounders, it suggests a plausible (admissible) direction of influence from a more exogenous variable (X) to a more endogenous variable (Y). Methodological details are provided at https://psychbruce.github.io/DPI/.
Author(s)
Maintainer: Han Wu Shuang Bao <baohws@foxmail.com>
See Also
Useful links:
- https://psychbruce.github.io/DPI/
- Report bugs at https://github.com/psychbruce/DPI/issues
The Directed Prediction Index (DPI).
Description
The Directed Prediction Index (DPI) is a quasi-causal inference method for cross-sectional data designed to quantify the relative endogeneity (relative dependence) of outcome (Y) vs. predictor (X) variables in regression models. By comparing the proportion of variance explained (R-squared) between the Y-as-outcome model and the X-as-outcome model while controlling for a sufficient number of possible confounders, it suggests a plausible (admissible) direction of influence from a more exogenous variable (X) to a more endogenous variable (Y). Methodological details are provided at https://psychbruce.github.io/DPI/.
Usage
DPI(
model,
y,
x,
data = NULL,
k.cov = 1,
n.sim = 1000,
alpha = 0.05,
seed = NULL,
progress,
file = NULL,
width = 6,
height = 4,
dpi = 500
)
Arguments
model |
Model object ( |
y |
Dependent (outcome) variable. |
x |
Independent (predictor) variable. |
data |
[Optional] Defaults to NULL. |
k.cov |
Number of random covariates (simulating potential omitted variables) added to each simulation sample. Defaults to 1. |
n.sim |
Number of simulation samples. Defaults to 1000. |
alpha |
Significance level for computing the
|
seed |
Random seed for replicable results. Defaults to NULL. |
progress |
Show progress bar. Defaults to |
file |
File name of saved plot ( |
width , height |
Width and height (in inches) of saved plot. Defaults to 6 and 4. |
dpi |
Dots per inch (figure resolution). Defaults to 500. |
Value
Return a data.frame of simulation results:
- DPI = Direction * Strength = (R2.Y - R2.X) * (1 - tanh(p.beta.xy/alpha/2))
- delta.R2 = R2.Y - R2.X
- R2.Y: R^2 of regression model predicting Y using X and all other covariates
- R2.X: R^2 of regression model predicting X using Y and all other covariates
- t.beta.xy: t value for coefficient of X predicting Y (always equal to t value for coefficient of Y predicting X) when controlling for all other covariates
- p.beta.xy: p value for coefficient of X predicting Y (always equal to p value for coefficient of Y predicting X) when controlling for all other covariates
- df.beta.xy: residual degrees of freedom (df) of t.beta.xy
- r.partial.xy: partial correlation (always with the same t value as t.beta.xy) between X and Y when controlling for all other covariates
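The quantities listed above can be computed by hand for a single sample. The following is a minimal sketch in base R, assuming complete cases of airquality and no random covariates, so it is not what DPI() itself runs (which adds k.cov random covariates to each of n.sim simulation samples). The object names fit.y and fit.x are illustrative only.

```r
# Complete cases only, so both models use the same rows
d <- na.omit(airquality)

# Y-as-outcome and X-as-outcome models with all other variables as covariates
fit.y <- lm(Ozone ~ ., data = d)    # Y = Ozone
fit.x <- lm(Solar.R ~ ., data = d)  # X = Solar.R

R2.Y <- summary(fit.y)$r.squared
R2.X <- summary(fit.x)$r.squared
delta.R2 <- R2.Y - R2.X

# Coefficient of X predicting Y, controlling for all other covariates
coef.y <- summary(fit.y)$coefficients
t.beta.xy <- coef.y["Solar.R", "t value"]
p.beta.xy <- coef.y["Solar.R", "Pr(>|t|)"]
df.beta.xy <- fit.y$df.residual

# The same t value is obtained from the reversed model
t.beta.yx <- summary(fit.x)$coefficients["Ozone", "t value"]

# Partial correlation recovered from t and df
r.partial.xy <- t.beta.xy / sqrt(t.beta.xy^2 + df.beta.xy)

# DPI = Direction * Strength
alpha <- 0.05
DPI <- delta.R2 * (1 - tanh(p.beta.xy / alpha / 2))

c(DPI = DPI, delta.R2 = delta.R2, t.xy = t.beta.xy, t.yx = t.beta.yx)
```

The Strength factor is close to 1 when p.beta.xy is far below alpha and shrinks toward 0 as p.beta.xy grows, so a large R-squared difference counts for little when X and Y are not reliably related.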
Examples
# input a fitted model
model = lm(Ozone ~ ., data=airquality)
DPI(model, y="Ozone", x="Solar.R", seed=1) # DPI > 0
DPI(model, y="Ozone", x="Wind", seed=1) # DPI > 0
DPI(model, y="Wind", x="Solar.R", seed=1) # unrelated
# input raw data, test with more random covs
DPI(data=airquality, y="Ozone", x="Solar.R", k.cov=10, seed=1)
DPI(data=airquality, y="Ozone", x="Wind", k.cov=10, seed=1)
DPI(data=airquality, y="Wind", x="Solar.R", k.cov=10, seed=1)
The DPI curve analysis.
Description
The DPI curve analysis.
Usage
DPI_curve(
model,
y,
x,
data = NULL,
k.covs = 1:10,
n.sim = 1000,
alpha = 0.05,
seed = NULL,
file = NULL,
width = 6,
height = 4,
dpi = 500
)
Arguments
model |
Model object ( |
y |
Dependent (outcome) variable. |
x |
Independent (predictor) variable. |
data |
[Optional] Defaults to NULL. |
k.covs |
An integer vector of the numbers of random covariates (simulating potential omitted variables) added to each simulation sample. Defaults to 1:10. |
n.sim |
Number of simulation samples. Defaults to 1000. |
alpha |
Significance level for computing the
|
seed |
Random seed for replicable results. Defaults to NULL. |
file |
File name of saved plot ( |
width , height |
Width and height (in inches) of saved plot. Defaults to 6 and 4. |
dpi |
Dots per inch (figure resolution). Defaults to 500. |
Value
Return a data.frame of DPI curve results.
Examples
model = lm(Ozone ~ ., data=airquality)
DPIs = DPI_curve(model, y="Ozone", x="Solar.R", seed=1)
plot(DPIs) # ggplot object
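
The idea behind the curve, namely tracking how the DPI changes as more random covariates are added, can be sketched in base R. This is an illustration under stated assumptions and not the package's algorithm: the helper dpi_once() is hypothetical, and the random covariates here are independent standard normal variables, which may differ from how DPI_curve() generates them.

```r
# Hypothetical helper: one simulated DPI value with k.cov random covariates
dpi_once <- function(data, y, x, k.cov = 1, alpha = 0.05) {
  d <- na.omit(data)
  if (k.cov > 0) {
    # Assumption: random covariates are independent N(0, 1) noise
    covs <- as.data.frame(matrix(rnorm(nrow(d) * k.cov), ncol = k.cov))
    names(covs) <- paste0("Random", seq_len(k.cov))
    d <- cbind(d, covs)
  }
  # Y-as-outcome and X-as-outcome models using all remaining columns
  fit.y <- lm(reformulate(setdiff(names(d), y), response = y), data = d)
  fit.x <- lm(reformulate(setdiff(names(d), x), response = x), data = d)
  p.beta.xy <- summary(fit.y)$coefficients[x, "Pr(>|t|)"]
  (summary(fit.y)$r.squared - summary(fit.x)$r.squared) *
    (1 - tanh(p.beta.xy / alpha / 2))
}

set.seed(1)
k.covs <- 1:5
# Mean DPI over 100 simulation samples for each number of random covariates
dpi.curve <- sapply(k.covs, function(k) {
  mean(replicate(100, dpi_once(airquality, "Ozone", "Solar.R", k.cov = k)))
})
data.frame(k.cov = k.covs, DPI = dpi.curve)
```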
[S3 methods] for DPI() and DPI_curve().
Description
summary(dpi)
- Summarize DPI results. Return a list (class summary.dpi) of summarized results and raw DPI data.frame.
print(summary.dpi)
- Print DPI summary.
plot(dpi)
- Plot DPI results. Return a ggplot object.
print(dpi)
- Print DPI summary and plot.
plot(dpi.curve)
- Plot DPI curve analysis results. Return a ggplot object.
Usage
## S3 method for class 'dpi'
summary(object, ...)
## S3 method for class 'summary.dpi'
print(x, digits = 3, ...)
## S3 method for class 'dpi'
plot(x, file = NULL, width = 6, height = 4, dpi = 500, ...)
## S3 method for class 'dpi'
print(x, digits = 3, ...)
## S3 method for class 'dpi.curve'
plot(x, file = NULL, width = 6, height = 4, dpi = 500, ...)
Arguments
object |
Object (class |
... |
Other arguments (currently not used). |
x |
Object (class |
digits |
Number of decimal places. Defaults to 3. |
file |
File name of saved plot ( |
width , height |
Width and height (in inches) of saved plot. Defaults to 6 and 4. |
dpi |
Dots per inch (figure resolution). Defaults to 500. |
[S3 methods] for cor_network() and dag_network().
Description
print(cor.net)
-
Plot (partial) correlation network results.
print(dag.net)
-
Plot Bayesian network (DAG) results.
Usage
## S3 method for class 'cor.net'
print(x, file = NULL, width = 6, height = 4, dpi = 500, ...)
## S3 method for class 'dag.net'
print(
x,
file = NULL,
width = 6,
height = 4,
dpi = 500,
algorithm = names(x),
...
)
Arguments
x |
Object (class |
file |
File name of saved plot ( |
width , height |
Width and height (in inches) of saved plot. Defaults to 6 and 4. |
dpi |
Dots per inch (figure resolution). Defaults to 500. |
... |
Other arguments (currently not used). |
algorithm |
[For |
Value
Invisibly return a grob object ("Grid Graphical Object", or a list of them) that can be further reused in ggplot2::ggsave() and cowplot::plot_grid().
Correlation and partial correlation networks.
Description
Correlation and partial correlation networks (also called Gaussian graphical models, GGMs).
Usage
cor_network(
data,
index = c("cor", "pcor"),
show.value = TRUE,
show.insig = FALSE,
show.cutoff = FALSE,
faded = FALSE,
node.text.size = 1.2,
node.group = NULL,
node.color = NULL,
edge.color.pos = "#0571B0",
edge.color.neg = "#CA0020",
edge.color.non = "#EEEEEEEE",
edge.label.mrg = 0.01,
title = NULL,
file = NULL,
width = 6,
height = 4,
dpi = 500,
...
)
Arguments
data |
Data. |
index |
Type of graph: "cor" (correlation network) or "pcor" (partial correlation network). Defaults to "cor". |
show.value |
Show correlation coefficients and their significance on edges. Defaults to TRUE. |
show.insig |
Show edges with insignificant correlations (p > 0.05). Defaults to FALSE. |
show.cutoff |
Show cut-off values of correlations. Defaults to FALSE. |
faded |
Transparency of edges according to the effect size of correlation. Defaults to FALSE. |
node.text.size |
Scalar on the font size of node (variable) labels. Defaults to 1.2. |
node.group |
A list that indicates which nodes belong together, with each element of list as a vector of integers identifying the column numbers of variables that belong together. |
node.color |
A vector with a color for each element in |
edge.color.pos |
Color for (significant) positive values. Defaults to "#0571B0". |
edge.color.neg |
Color for (significant) negative values. Defaults to "#CA0020". |
edge.color.non |
Color for insignificant values. Defaults to "#EEEEEEEE". |
edge.label.mrg |
Margin of the background box around the edge label. Defaults to 0.01. |
title |
Plot title. |
file |
File name of saved plot ( |
width , height |
Width and height (in inches) of saved plot. Defaults to 6 and 4. |
dpi |
Dots per inch (figure resolution). Defaults to 500. |
... |
Arguments passed on to |
Value
Return a list (class cor.net) of (partial) correlation results and qgraph object with its grob (Grid Graphical Object).
Examples
# correlation network
cor_network(airquality)
cor_network(airquality, show.insig=TRUE)
# partial correlation network
cor_network(airquality, "pcor")
cor_network(airquality, "pcor", show.insig=TRUE)
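
For readers unfamiliar with partial correlation networks, the edge weights of a Gaussian graphical model can be derived from the inverse of the correlation matrix. The following base R sketch shows that standard identity on complete cases of airquality. It is an illustration and makes no claim about how cor_network() computes its values internally.

```r
d <- na.omit(airquality)

# Zero-order correlations (edges of the "cor" network)
R <- cor(d)

# Partial correlations from the precision matrix P = R^{-1}:
# pcor_ij = -P_ij / sqrt(P_ii * P_jj)
P <- solve(R)
pcor <- -P / sqrt(outer(diag(P), diag(P)))
diag(pcor) <- 1

round(pcor, 3)

# Cross-check one edge against the residual-based definition
res.ozone <- resid(lm(Ozone ~ . - Wind, data = d))
res.wind <- resid(lm(Wind ~ . - Ozone, data = d))
c(from.precision = pcor["Ozone", "Wind"],
  from.residuals = cor(res.ozone, res.wind))
```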
Directed acyclic graphs (DAGs) via Bayesian networks (BNs).
Description
Directed acyclic graphs (DAGs) via Bayesian networks (BNs). It uses bnlearn::boot.strength() to estimate the strength of each edge as its empirical frequency over a set of networks learned from bootstrap samples. It computes (1) the probability of each edge (modulo its direction) and (2) the probabilities of each edge's directions conditional on the edge being present in the graph (in either direction). Stability thresholds are usually set as 0.85 for strength (i.e., an edge appearing in more than 85% of the BNs learned from bootstrap samples) and 0.50 for direction (i.e., a direction appearing in more than 50% of the BNs learned from bootstrap samples) (Briganti et al., 2023). Finally, for each chosen algorithm, it returns the stable Bayesian network as the final DAG.
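
The two stability thresholds can be illustrated with a small filter over bootstrap output. This is a sketch, not the internals of dag_network(): the helper stable_edges() is hypothetical, the toy numbers are made up, and the strict comparisons follow the "more than" wording above (whether the package uses strict or inclusive comparisons is an assumption).

```r
# Hypothetical helper: keep edges that pass both stability thresholds.
# `bs` needs the columns from, to, strength, direction,
# which is the layout returned by bnlearn::boot.strength().
stable_edges <- function(bs, strength = 0.85, direction = 0.5) {
  keep <- bs$strength > strength & bs$direction > direction
  bs[keep, , drop = FALSE]
}

# Toy input with the same columns (made-up numbers, not real results)
toy <- data.frame(
  from      = c("A", "B", "A", "C"),
  to        = c("B", "A", "C", "A"),
  strength  = c(0.95, 0.95, 0.40, 0.40),
  direction = c(0.80, 0.20, 0.55, 0.45)
)
stable_edges(toy)  # only A -> B passes both thresholds

# With bnlearn installed, the same filter applies to real bootstrap output
if (requireNamespace("bnlearn", quietly = TRUE)) {
  d <- na.omit(airquality)
  d[] <- lapply(d, as.numeric)  # convert integer columns to double for bnlearn
  set.seed(1)
  bs <- bnlearn::boot.strength(d, R = 50, algorithm = "hc")
  print(stable_edges(bs))
}
```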
Usage
dag_network(
data,
algorithm = c("pc.stable", "hc", "rsmax2"),
algorithm.args = list(),
n.boot = 1000,
seed = NULL,
strength = 0.85,
direction = 0.5,
node.text.size = 1.2,
edge.width.max = 1.5,
edge.label.mrg = 0.01,
file = NULL,
width = 6,
height = 4,
dpi = 500,
verbose = TRUE,
...
)
Arguments
data |
Data. |
algorithm |
Structure learning algorithms for building Bayesian networks (BNs). Should be function name(s) from the bnlearn package. Defaults to the most common algorithms: "pc.stable", "hc", and "rsmax2". |
algorithm.args |
An optional list of extra arguments passed to the algorithm. |
n.boot |
Number of bootstrap samples (for learning a more "stable" network structure). Defaults to 1000. |
seed |
Random seed for replicable results. Defaults to NULL. |
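strength |
Stability threshold of edge strength: the minimum proportion (probability) of BNs (among the n.boot bootstrap samples) in which each edge appears. Defaults to 0.85. |
direction |
Stability threshold of edge direction: the minimum proportion (probability) of BNs (among the n.boot bootstrap samples) in which a direction of each edge appears. Defaults to 0.5. |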
node.text.size |
Scalar on the font size of node (variable) labels. Defaults to 1.2. |
edge.width.max |
Maximum value of edge strength to scale all edge widths. Defaults to 1.5. |
edge.label.mrg |
Margin of the background box around the edge label. Defaults to 0.01. |
file |
File name of saved plot ( |
width , height |
Width and height (in inches) of saved plot. Defaults to 6 and 4. |
dpi |
Dots per inch (figure resolution). Defaults to 500. |
verbose |
Print information about BN algorithm and number of bootstrap samples when running the analysis. Defaults to TRUE. |
... |
Arguments passed on to |
Value
Return a list (class dag.net) of Bayesian network results and qgraph object with its grob (Grid Graphical Object).
References
Briganti, G., Scutari, M., & McNally, R. J. (2023). A tutorial on Bayesian networks for psychopathology researchers. Psychological Methods, 28(4), 947–961. doi:10.1037/met0000479
Burger, J., Isvoranu, A.-M., Lunansky, G., Haslbeck, J. M. B., Epskamp, S., Hoekstra, R. H. A., Fried, E. I., Borsboom, D., & Blanken, T. F. (2023). Reporting standards for psychological network analyses in cross-sectional data. Psychological Methods, 28(4), 806–824. doi:10.1037/met0000471
Scutari, M., & Denis, J.-B. (2021). Bayesian networks: With examples in R (2nd ed.). Chapman and Hall/CRC. doi:10.1201/9780429347436
Examples
bn = dag_network(airquality, seed=1)
bn
# bn$pc.stable
# bn$hc
# bn$rsmax2
## All DAG objects can be directly plotted
## or saved with print(..., file="xxx.png")
# bn$pc.stable$DAG.edge
# bn$pc.stable$DAG.strength
# bn$pc.stable$DAG.direction
# bn$pc.stable$DAG
# ...
## Not run:
print(bn, file="airquality.png")
# will save three plots with auto-modified file names:
# - "airquality_DAG.NET_BNs.01_pc.stable.png"
# - "airquality_DAG.NET_BNs.02_hc.png"
# - "airquality_DAG.NET_BNs.03_rsmax2.png"
# arrange multiple plots using aplot::plot_list()
# install.packages("aplot")
c1 = cor_network(airquality, "cor")
c2 = cor_network(airquality, "pcor")
bn = dag_network(airquality, seed=1)
p = aplot::plot_list(
c1$plot,
c2$plot,
bn$pc.stable$DAG$plot,
bn$hc$DAG$plot,
bn$rsmax2$DAG$plot,
design="111222
334455",
tag_levels="A"
) # return a patchwork object
ggplot2::ggsave(p, filename="p.png", width=12, height=8, dpi=500)
ggplot2::ggsave(p, filename="p.pdf", width=12, height=8)
## End(Not run)
Produce a symmetric correlation matrix from values.
Description
Produce a symmetric correlation matrix from values.
Usage
matrix_cor(...)
Arguments
... |
Correlation values to transform into the symmetric correlation matrix (by row). |
Value
Return a symmetric correlation matrix.
Examples
matrix_cor(
1.0, 0.7, 0.3,
0.7, 1.0, 0.5,
0.3, 0.5, 1.0
)
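
The "by row" convention can be illustrated in base R. This is a hypothetical stand-in that shows the idea only and is not the source of matrix_cor().

```r
# Values listed row by row, as in the example above
values <- c(
  1.0, 0.7, 0.3,
  0.7, 1.0, 0.5,
  0.3, 0.5, 1.0
)

k <- sqrt(length(values))   # number of variables
stopifnot(k == round(k))    # the values must fill a square matrix

# Fill the matrix row-wise, then check it is a valid correlation layout
cor.mat <- matrix(values, nrow = k, byrow = TRUE)
stopifnot(isSymmetric(cor.mat), all(diag(cor.mat) == 1))
cor.mat
```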
Simulate data from a multivariate normal distribution.
Description
Simulate data from a multivariate normal distribution.
Usage
sim_data(n, k, cor = NULL, exact = TRUE, seed = NULL)
Arguments
n |
Number of observations (cases). |
k |
Number of variables. Will be ignored if |
cor |
A correlation value or correlation matrix of the variables. Defaults to NULL. |
exact |
Ensure the sample correlation matrix to be exact as specified in cor. Defaults to TRUE. |
seed |
Random seed for replicable results. Defaults to NULL. |
Value
Return a data.frame of simulated data.
Examples
d1 = sim_data(n=100, k=5, seed=1)
cor_network(d1)
d2 = sim_data(n=100, k=5, cor=0.2, seed=1)
cor_network(d2)
cor.mat = matrix_cor(
1.0, 0.7, 0.3,
0.7, 1.0, 0.5,
0.3, 0.5, 1.0
)
d3 = sim_data(n=100, cor=cor.mat, seed=1)
cor_network(d3)
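
The difference between exact and approximate sample correlations can be seen with MASS::mvrnorm(), whose empirical argument forces the sample covariance to match Sigma. The DPI package imports MASS, but whether sim_data() calls mvrnorm() in this way is an assumption. The sketch below only demonstrates the general technique.

```r
library(MASS)  # provides mvrnorm()

cor.mat <- matrix(c(
  1.0, 0.7, 0.3,
  0.7, 1.0, 0.5,
  0.3, 0.5, 1.0
), nrow = 3, byrow = TRUE)

set.seed(1)
# empirical = TRUE: sample correlations equal cor.mat (up to rounding error)
x.exact <- mvrnorm(n = 100, mu = rep(0, 3), Sigma = cor.mat, empirical = TRUE)
# empirical = FALSE: sample correlations vary around cor.mat
x.approx <- mvrnorm(n = 100, mu = rep(0, 3), Sigma = cor.mat, empirical = FALSE)

round(cor(x.exact), 3)
round(cor(x.approx), 3)
```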
Simulate experiment-like data with independent binary Xs.
Description
Simulate experiment-like data with independent binary Xs.
Usage
sim_data_exp(
n,
r.xy,
approx = TRUE,
tol = 0.01,
max.iter = 30,
verbose = FALSE,
seed = NULL
)
Arguments
n |
Number of observations (cases). |
r.xy |
A vector of expected correlations of each X (binary independent variable: 0 or 1) with Y. |
approx |
Make the sample correlations approximate the values specified in r.xy more closely. Defaults to TRUE. |
tol |
Tolerance of absolute difference between specified and empirical correlations. Defaults to 0.01. |
max.iter |
Maximum iterations for approximation. More iterations bring the sample correlations closer to the specified values, but the absolute differences converge after about 30 iterations. Defaults to 30. |
verbose |
Print information about iterations that satisfy tolerance. Defaults to FALSE. |
seed |
Random seed for replicable results. Defaults to NULL. |
Value
Return a data.frame of simulated data.
Examples
data = sim_data_exp(n=1000, r.xy=c(0.5, 0.3), seed=1)
cor(data) # tol = 0.01
data = sim_data_exp(n=1000, r.xy=c(0.5, 0.3), seed=1,
verbose=TRUE)
cor(data) # print iteration information
data = sim_data_exp(n=1000, r.xy=c(0.5, 0.3), seed=1,
verbose=TRUE, tol=0.001)
cor(data) # more approximate, though not exact
data = sim_data_exp(n=1000, r.xy=c(0.5, 0.3), seed=1,
approx=FALSE)
cor(data) # far less exact
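
One simple way to obtain independent binary Xs with given expected correlations with Y is to build Y as a weighted sum of the standardized Xs plus noise. The following base R sketch shows that construction under stated assumptions: each X is Bernoulli(0.5), the sum of squared r.xy is below 1, and no iterative approximation is applied. It is not the algorithm used by sim_data_exp().

```r
set.seed(1)
n <- 5000
r.xy <- c(0.5, 0.3)  # expected correlation of each X with Y

# Independent binary predictors (0 or 1), one column per element of r.xy
X <- sapply(r.xy, function(r) rbinom(n, size = 1, prob = 0.5))
colnames(X) <- paste0("X", seq_along(r.xy))

# Standardize the Xs, weight them by r.xy, and add noise so that var(Y) is about 1
Z <- scale(X)
noise.sd <- sqrt(1 - sum(r.xy^2))
Y <- as.vector(Z %*% r.xy) + rnorm(n, sd = noise.sd)

sim <- data.frame(X, Y)
round(cor(sim), 2)  # cor(X, Y) approximates r.xy, and the Xs are nearly uncorrelated
```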