To generate causally-simulated data effectively, it is essential to
understand how to represent cause-and-effect relationships using causal
graphs. We use the igraph package to create and manipulate network
structures, and the ggnetwork package to visualize these networks with
ggplot2, which is loaded as part of the tidyverse. These libraries are
loaded in R as follows:
library(igraph)
library(ggnetwork)
library(tidyverse)
We use \(X\) and \(Y\) to denote a cause and an effect, respectively. Both are represented as vertices in a graph. If there is a cause-and-effect relationship between them, an edge is drawn from \(X\) to \(Y\). To encode this, we need a two-column data frame with the columns from and to.
# Create a data frame for the X and Y relationship
d <- data.frame(from = "X", to = "Y")
from | to |
---|---|
X | Y |
We convert the data frame into an igraph object as a directed graph.
# Convert the data frame into an igraph object.
g <- graph_from_data_frame(d, directed = TRUE)
print(g)
## IGRAPH ae95e2f DN-- 2 1 --
## + attr: name (v/c)
## + edge from ae95e2f (vertex names):
## [1] X->Y
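As a quick sanity check on the conversion, the sketch below reads back a few properties of the graph. It rebuilds the same one-edge graph so that it runs on its own; the variable names graph_is_directed, vertex_names, n_edges, and edge_list are illustrative and not part of the original workflow. The call to as_data_frame is prefixed with igraph:: on the assumption that the tidyverse is also attached, to avoid picking up a similarly named function from another package.

```r
library(igraph)

# Rebuild the one-edge graph from the text so this sketch is self-contained
d <- data.frame(from = "X", to = "Y")
g <- graph_from_data_frame(d, directed = TRUE)

# Confirm that the graph is directed
graph_is_directed <- is_directed(g)

# Vertex names, in the order igraph stored them
vertex_names <- V(g)$name

# Number of edges in the graph
n_edges <- ecount(g)

# Recover the edge list as a data frame
edge_list <- igraph::as_data_frame(g, what = "edges")
print(edge_list)
```

If the conversion worked as intended, the recovered edge list should match the original from/to data frame.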
To visualize the graph, we need to automatically determine the coordinates to plot \(X\) and \(Y\) and draw a line from \(X\) to \(Y\).
# Lay out the graph as a tree
g_layout <- layout_as_tree(g)
# Determine the coordinates with ggnetwork
set.seed(1)
g_coord <- ggnetwork(g, layout = g_layout)
 | x | y | name | xend | yend |
---|---|---|---|---|---|
2 | 0.5 | 1 | X | 0.5 | 0.025 |
1 | 0.5 | 1 | X | 0.5 | 1.000 |
21 | 0.5 | 0 | Y | 0.5 | 0.000 |
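To see what the layout step contributes before ggnetwork rescales it, the sketch below inspects the matrix returned by layout_as_tree. It assumes the same one-edge graph as above and only checks the shape of the result: one row per vertex and two coordinate columns. The name layout_dims is illustrative.

```r
library(igraph)

# Same one-edge graph as in the text
g <- graph_from_data_frame(data.frame(from = "X", to = "Y"), directed = TRUE)

# Tree layout: a numeric matrix with one row per vertex
g_layout <- layout_as_tree(g)

# Rows correspond to vertices; the two columns are the x and y coordinates
layout_dims <- dim(g_layout)
print(layout_dims)
print(g_layout)
```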
To draw the graph, we use ggplot for its flexibility in customizing graph
aesthetics. Closed, curved arrows clearly indicate the direction of the
causal effect and help prevent arrows from overlapping.
We may need to make the tree layout horizontal. This orientation helps in visualizing the causal flow from left to right, making it easier to follow.
# Define the plot area
g_plot <- ggplot(g_coord, aes(x, y, xend = xend, yend = yend))
# Draw edges with closed, curved arrows to emphasize direction
g_plot <- g_plot + geom_edges(arrow = arrow(type = "closed"), curvature = 0.15)
# Add node labels
g_plot <- g_plot + geom_nodelabel(aes(label = name))
# Make the tree layout horizontal
g_plot <- g_plot + coord_flip()
g_plot <- g_plot + scale_y_reverse()
# Apply a minimal theme
g_plot <- g_plot + theme_void()
# Display the graph
print(g_plot)
A variable is represented as a vertex, and a path is represented as one or more edges. If two variables \(X\) and \(Y\) have a path between them, they are dependent.
It is also possible for two variables to have no path between them, in which case \(X\) and \(Y\) are independent.
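The distinction between having a path and having none can be checked programmatically. The following is a minimal sketch, not part of the original workflow: it adds a hypothetical third variable Z with no edges, and defines an illustrative helper has_path that treats a finite distance as evidence of a path. It assumes edge direction is ignored when looking for a path (mode = "all"), and it only tests whether a path exists, which the text above treats as indicating dependence.

```r
library(igraph)

# One causal edge X -> Y, plus a hypothetical isolated variable Z
edges <- data.frame(from = "X", to = "Y")
vertices <- data.frame(name = c("X", "Y", "Z"))
g3 <- graph_from_data_frame(edges, directed = TRUE, vertices = vertices)

# Illustrative helper: TRUE if any path connects a and b, ignoring direction.
# distances() returns Inf when no path exists between the two vertices.
has_path <- function(graph, a, b) {
  is.finite(distances(graph, v = a, to = b, mode = "all")[1, 1])
}

has_path(g3, "X", "Y")  # TRUE: X and Y are connected by an edge
has_path(g3, "X", "Z")  # FALSE: Z has no edges at all
```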
If two variables have a path between them, that path can take one of four forms. The first consists of a single edge, which we call a causal path.