Introduction to Causal Graphs

To generate causally simulated data effectively, it is essential to understand how to represent cause-and-effect relationships using causal graphs. We use the igraph package to create and manipulate network structures, and the ggnetwork package to visualize these networks with ggplot2, which is loaded as part of the tidyverse. Below is how to load these libraries in R:

library(igraph)
library(ggnetwork)
library(tidyverse)

We will use \(X\) and \(Y\) to denote a cause and an effect, respectively. Both are represented as vertices in a graph. If there is a cause-and-effect relationship between them, an edge is drawn from \(X\) to \(Y\). To build this graph, we need a two-column data frame with the columns from and to.

# Create a data frame for the X and Y relationship
d <- data.frame(from = "X", to = "Y")
from to
X Y

We convert the data frame into an igraph object as a directed graph.

# Convert the data frame into an igraph object.
g <- graph_from_data_frame(d, directed = TRUE)
print(g)
## IGRAPH ae95e2f DN-- 2 1 -- 
## + attr: name (v/c)
## + edge from ae95e2f (vertex names):
## [1] X->Y
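
To confirm that the conversion produced what we expect, we can query the graph directly. The following is a minimal sketch that rebuilds g so it runs on its own; it uses only standard igraph accessors.

```r
library(igraph)

# Rebuild the X -> Y graph from its edge list
d <- data.frame(from = "X", to = "Y")
g <- graph_from_data_frame(d, directed = TRUE)

# Vertex names, in the order igraph stores them
V(g)$name

# Edge list as a two-column character matrix (one row per edge)
as_edgelist(g)

# Basic checks: the graph is directed, with 2 vertices and 1 edge
is_directed(g)
vcount(g)
ecount(g)
```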

To visualize the graph, we need to automatically determine the coordinates to plot \(X\) and \(Y\) and draw a line from \(X\) to \(Y\).

# Lay out the graph as a tree
g_layout <- layout_as_tree(g)

# Determine the coordinates with ggnetwork
set.seed(1)
g_coord <- ggnetwork(g, layout = g_layout)
     x y name xend  yend
2  0.5 1    X  0.5 0.025
1  0.5 1    X  0.5 1.000
21 0.5 0    Y  0.5 0.000
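
To see where these coordinates come from, it can help to look at the raw layout matrix before ggnetwork processes it. The sketch below rebuilds g so it runs on its own; naming the rows of the matrix is only for readability and is not required by ggnetwork.

```r
library(igraph)

g <- graph_from_data_frame(data.frame(from = "X", to = "Y"), directed = TRUE)

# layout_as_tree() returns one row per vertex: column 1 is x, column 2 is y
g_layout <- layout_as_tree(g)
rownames(g_layout) <- V(g)$name
print(g_layout)

# In a tree layout the root (X) is placed above its child (Y)
g_layout["X", 2] > g_layout["Y", 2]
```

The values in g_coord are a rescaled version of this matrix, with one row per vertex plus one row per edge carrying the edge's end point in xend and yend.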

To draw the graph, we use ggplot for its flexibility in customizing graph aesthetics. Closed arrowheads clearly indicate the direction of the causal effect, and curved edges help prevent arrows from overlapping.

We may need to make the tree layout horizontal. This orientation helps in visualizing the causal flow from left to right, making it easier to follow.

# Define the plot area
g_plot <- ggplot(g_coord, aes(x, y, xend = xend, yend = yend))

# Draw edges with closed, curved arrows to emphasize direction
g_plot <- g_plot + geom_edges(arrow = arrow(type = "closed"), curvature = 0.15)

# Add node labels
g_plot <- g_plot + geom_nodelabel(aes(label = name))

# Make the tree layout horizontal
g_plot <- g_plot + coord_flip()
g_plot <- g_plot + scale_y_reverse()

# Apply an empty theme (no axes, gridlines, or background)
g_plot <- g_plot + theme_void()

# Display the graph
print(g_plot)
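
Because the same plotting steps recur for every causal graph, they can be collected into a small helper. This is a minimal sketch under that assumption; the function name plot_causal_graph and its argument edges are hypothetical and are not part of igraph or ggnetwork.

```r
library(igraph)
library(ggnetwork)
library(ggplot2)

# Hypothetical helper: draw a directed graph from a from/to data frame
plot_causal_graph <- function(edges) {
  g <- graph_from_data_frame(edges, directed = TRUE)

  # Tree layout, converted to plotting coordinates by ggnetwork
  g_coord <- ggnetwork(g, layout = layout_as_tree(g))

  ggplot(g_coord, aes(x, y, xend = xend, yend = yend)) +
    geom_edges(arrow = arrow(type = "closed"), curvature = 0.15) +
    geom_nodelabel(aes(label = name)) +
    coord_flip() +        # make the tree horizontal
    scale_y_reverse() +   # put the cause on the left
    theme_void()
}

p <- plot_causal_graph(data.frame(from = "X", to = "Y"))
print(p)
```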

Vertex and edge

A variable is represented as a vertex, and a path is represented as one or more edges. Two variables \(X\) and \(Y\) may have a path between them, in which case they are dependent.

It is also possible for two variables to have no path between them. In that case, \(X\) and \(Y\) are independent.
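
Whether any path connects two vertices can be checked with igraph. The sketch below adds a third, isolated variable Z purely for illustration (Z is an assumption, not part of the example graph) and treats edges as undirected when searching for a path.

```r
library(igraph)

# X -> Y, plus a hypothetical isolated vertex Z with no edges
g_path <- graph_from_data_frame(
  data.frame(from = "X", to = "Y"),
  directed = TRUE,
  vertices = data.frame(name = c("X", "Y", "Z"))
)

# Path lengths ignoring edge direction; unreachable pairs are Inf
path_len <- distances(g_path, mode = "all")

# A finite length means some path exists between the two vertices
has_path_xy <- is.finite(path_len["X", "Y"])
has_path_xz <- is.finite(path_len["X", "Z"])
```

Here has_path_xy is TRUE because X and Y share an edge, while has_path_xz is FALSE because Z is not connected to anything.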

Path

If two variables have a path between them, that path can take one of four forms. The first consists of a single edge, which we call a causal path.
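
A single-edge path from one vertex to another can be detected with igraph's are_adjacent(), which respects edge direction in a directed graph. A minimal sketch, rebuilding the X and Y graph so it runs on its own (the variable names are illustrative):

```r
library(igraph)

g_causal <- graph_from_data_frame(data.frame(from = "X", to = "Y"), directed = TRUE)

# Is there an edge pointing from X to Y?
x_causes_y <- are_adjacent(g_causal, "X", "Y")

# The reverse direction has no edge in this graph
y_causes_x <- are_adjacent(g_causal, "Y", "X")
```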