Authors: Palaeoverse Development Team
Last updated: 2024-09-16
rmacrostrat
is an R package which allows users to easily
retrieve geological data from the Macrostrat database and facilitates
analyses of these data within the R environment. This vignette (or
tutorial, if you prefer) is provided to guide you through the
installation process and some of the functionality available within
rmacrostrat
. Specifically, we will focus on obtaining and
plotting a geologic column, containing stratigraphic and lithological
information for the San Juan Basin, located in the southwestern United
States.
The rmacrostrat
package can be installed via CRAN, or
its dedicated GitHub repository
if the development version is preferred. To install via CRAN, simply
use:
To install the development version, first install the
devtools
package, and then use install_github
to install rmacrostrat
directly from GitHub.
You can now load rmacrostrat
using the default
library
function, alongside some packages we will need for
plotting, namely ggplot2
, ggrepel
, and
deeptime
(don’t forget to install these packages if you
don’t have them already!):
library(rmacrostrat)
# Install and load data visualization packages
library(ggplot2)
library(ggrepel)
library(deeptime)
Before we get into the good stuff, the development team has a
small request. If you use rmacrostrat
in your
research, please cite the associated publication. This will help us to
continue our work in supporting you to do yours. You can access the
appropriate citation via:
## To cite rmacrostrat in publications, use the following citation:
##
## Jones, L.A., Dean, C.D., Gearty, W., and Allen, B.J. 2024. rmacrostrat: An R package for
## accessing and retrieving data from the Macrostrat geological database. EarthArXiv, 1–29,
## doi: 10.31223/X5XX37
##
## A BibTeX entry for LaTeX users is
##
## @Article{,
## title = {rmacrostrat: An R Package for fetching geologic data from the Macrostrat database.},
## author = {Lewis A. Jones and Christopher D. Dean and William Gearty and Bethany J. Allen},
## year = {2024},
## journal = {EarthArXiv},
## pages = {1–29},
## doi = {10.31223/X5XX37},
## }
The San Juan Basin is a large structural depression which spans parts
of New Mexico, Colorado, Utah, and Arizona. It is renowned for its oil
and natural gas reserves, but it is also well-known for its late
Cretaceous dinosaurs. In this vignette, we will investigate the geologic
attributes of the rocks of the San Juan Basin and use this information
to plot a stratigraphic column. In order to do this, we will make use of
the rmacrostrat
package to fetch data from the Macrostrat database.
Our first step is to search for geologic columns named “San Juan
Basin” and retrieve some basic information about them, using the
def_columns
function:
## col_id col_group_id col_name lat lng col_area max_thick ref_id status t_units
## 1 489 3 San Juan Basin 35.896 -108.068 35096.52 8040 1 active 47
## project_id notes
## 1 1 NA
From this call, we can see there is a single column named “San Juan
Basin”. We are also given some information about this column, such as
its geographic location (in the form of a latitude and longitude), the
area over which it spans in km2, its thickness in meters, and
the number of geologic units it contains (t_units
). We are
also given its col_id
, 489, which we can use to retrieve
more information through other functions in rmacrostrat
. We
will do this now, specifically using get_columns
:
We now have much more information relating to the San Juan Basin column. For example, we can look at the age range it spans:
## [1] 2050
## [1] 32.38
## [1] "Orosirian"
## [1] "Rupelian"
We can see that our column spans from 2050 to 32 million years ago, from the Orosirian (Paleoproterozoic) to the Rupelian (Cenozoic). Let’s take a look at the mix of lithologies contained in the column:
## [[1]]
## name type class prop lith_id
## 1 claystone siliciclastic sedimentary 0.0015 6
## 2 shale siliciclastic sedimentary 0.2867 8
## 3 siltstone siliciclastic sedimentary 0.0340 9
## 4 sandstone siliciclastic sedimentary 0.2773 10
## 5 arkose siliciclastic sedimentary 0.0177 11
## 6 conglomerate siliciclastic sedimentary 0.0606 14
## 7 limestone carbonate sedimentary 0.0771 30
## 8 dolomite carbonate sedimentary 0.0832 31
## 9 evaporite evaporite sedimentary 0.0019 34
## 10 gypsum evaporite sedimentary 0.0030 36
## 11 anhydrite evaporite sedimentary 0.0030 37
## 12 coal organic sedimentary 0.0039 38
## 13 granite plutonic igneous 0.0177 53
## 14 gneiss metamorphic metamorphic 0.0106 79
## 15 slate metasedimentary metamorphic 0.0071 82
## 16 phyllite metasedimentary metamorphic 0.0071 83
## 17 quartzite metasedimentary metamorphic 0.0147 85
## 18 clay siliciclastic sedimentary 0.0030 93
## 19 graywacke siliciclastic sedimentary 0.0213 113
## 20 arenite siliciclastic sedimentary 0.0097 134
## 21 syenogranite plutonic igneous 0.0035 145
## 22 quartz arenite siliciclastic sedimentary 0.0213 169
## 23 metavolcanic metaigneous metamorphic 0.0106 178
## 24 wacke siliciclastic sedimentary 0.0019 187
## 25 litharenite siliciclastic sedimentary 0.0213 198
So our column contains a total of 25 different lithologies, including sedimentary, metamorphic, and igneous rocks. We can quickly visualize the proportion of the column made up of these different rocks, colored by their class:
# Extract lithologies
lith_table <- san_juan$lith[[1]]
# Plot lithologies by proportion in San Juan Basin column
ggplot(lith_table, aes(y = prop, x = reorder(name, -prop))) +
# Plot bars
geom_bar(stat = "identity", aes(fill = class)) +
# Label axes
scale_x_discrete("Lithology") +
scale_y_continuous("Proportion of stratigraphic column") +
theme_classic() +
theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust = 1))
It seems the San Juan Basin is dominated by sedimentary rocks. Perhaps this is not surprising given that we know it is famed for fossils and energy resources! Speaking of these, let’s take a look at the column’s economic attributes:
## [[1]]
## name type class prop econ_id
## 1 oil reservoir hydrocarbon energy 0.0345 4
## 2 gas reservoir hydrocarbon energy 0.0345 5
## 3 gas indet. hydrocarbon energy 0.4138 8
## 4 oil indet. hydrocarbon energy 0.3103 9
## 5 uranium ore nuclear energy 0.1034 10
## 6 coal coal energy 0.0690 13
## 7 aquifer aquifer water 0.0345 17
Here we can see that not only does the San Juan Basin contain coal, oil, and natural gas, but it also has some uranium ore and aquifers. We can also see the number of Paleobiology Database collections linked to the San Juan Basin in our column information:
## [1] 646
To view more information about these fossil data, we need to use a
different rmacrostrat
function,
get_fossils
:
# Retrieve fossils associated with the San Juan Basin column
sj_fossils <- get_fossils(column_id = 489)
This table contains information about each of the ~650 Paleobiology Database collections associated with the San Juan Basin, including their ages and the number of occurrences they contain. Let’s find out how many fossil occurrences we have in total:
## [1] 2151
So over 2000 fossils are known from this single basin. We can also visualize the temporal distribution of these fossil collections, using the midpoint of their age ranges:
# Add midpoint age for plotting
sj_fossils$m_age <- (sj_fossils$b_age + sj_fossils$t_age) / 2
# Plot of fossil occurrence counts from the San Juan Basin through time
ggplot(sj_fossils, aes(x = m_age)) +
# Histogram with bars 5 million years wide
geom_histogram(binwidth = 5, boundary = 0) +
# Label y-axis
scale_y_continuous("Number of fossil collections") +
# Label x-axis and reverse direction
scale_x_reverse("Time (Ma)", breaks = c(250, 200, 150, 100, 50)) +
# Theming
theme_bw() +
# Add geological timescale
coord_geo()
From this plot, we can see that the San Juan Basin fossils range in age from the Permian to the Paleogene, but we have the highest concentration of fossil collections around the K-Pg boundary. Now that we have explored the data in the Macrostrat database on the San Juan Basin, it is time to plot our stratigraphic column.
To plot the stratigraphic column, we will need to obtain data for
each lithological unit contained within the San Juan Basin column. We
will do this using another rmacrostrat
function,
get_units
, and referencing the column_id
for
the San Juan Basin. To keep our column plot contained, we will limit it
to geological units which are Cretaceous in age:
# Using the column ID, retrieve the units in the San Juan Basin column
san_juan_units <- get_units(column_id = 489, interval_name = "Cretaceous")
# See the column names and number of rows
colnames(san_juan_units)
## [1] "unit_id" "section_id" "col_id" "project_id" "col_area"
## [6] "unit_name" "strat_name_id" "Mbr" "Fm" "Gp"
## [11] "SGp" "t_age" "b_age" "max_thick" "min_thick"
## [16] "outcrop" "pbdb_collections" "pbdb_occurrences" "lith" "environ"
## [21] "econ" "measure" "notes" "color" "text_color"
## [26] "t_int_id" "t_int_name" "t_int_age" "t_prop" "units_above"
## [31] "b_int_id" "b_int_name" "b_int_age" "b_prop" "units_below"
## [36] "strat_name_long" "refs" "clat" "clng" "t_plat"
## [41] "t_plng" "b_plat" "b_plng"
## [1] 17
We now have information for each of the 17 Cretaceous geologic units
contained within the San Juan Basin, including the age of the top and
bottom of each, which is what we will use to plot our stratigraphic
column. To reiterate, the y-axis on our plot is going to be time rather
than height or thickness, so any unconformities present in the column
will be evident. We can start out very simply, by using
geom_rect
in ggplot2
to plot a rectangle
corresponding to the age range of each unit in the section.
# Plot stratigraphic column
ggplot(san_juan_units, aes(ymin = b_age, ymax = t_age, xmin = 0, xmax = 1)) +
# Plot units
geom_rect(alpha = 0.5, color = "black") +
# Reverse direction of y-axis
scale_y_reverse(limits = c(145, 66), n.breaks = 10, name = "Time (Ma)") +
# Theming
theme_classic() +
theme(legend.position = "none",
axis.line.x = element_blank(),
axis.title.x = element_blank(),
axis.text.x = element_blank(),
axis.ticks.x = element_blank())
We can already see something that roughly resembles a stratigraphic column. One thing to notice here is that we seem to have some overlap between our units, resulting in a darker shade of gray. We can take a closer look at this by dodging the units horizontally.
# Plot stratigraphic column
ggplot(san_juan_units, aes(ymin = b_age, ymax = t_age, xmin = 0, xmax = 1)) +
# Plot units with position dodge
geom_rect(position = "dodge2", alpha = 0.5, color = "black") +
# Reverse direction of y-axis
scale_y_reverse(limits = c(145, 66), n.breaks = 10, name = "Time (Ma)") +
# Theming
theme_classic() +
theme(legend.position = "none",
axis.line.x = element_blank(),
axis.title.x = element_blank(),
axis.text.x = element_blank(),
axis.ticks.x = element_blank())
Indeed, there are two units that overlap with each other: the Gallup Sandstone and the Upper Shale Member of the Mancos Shale. We can make these units plot next to each other by adding columns to our dataframe which define the x-axis values.
# Specify x_min and x_max in dataframe
san_juan_units$x_min <- 0
san_juan_units$x_max <- 1
# Tweak values for overlapping units
san_juan_units$x_max[10] <- 0.5
san_juan_units$x_min[11] <- 0.5
# Plot stratigraphic column
ggplot(san_juan_units, aes(ymin = b_age, ymax = t_age,
xmin = x_min, xmax = x_max)) +
# Plot units
geom_rect(alpha = 0.5, color = "black") +
# Reverse direction of y-axis
scale_y_reverse(limits = c(145, 66), n.breaks = 10, name = "Time (Ma)") +
# Theming
theme_classic() +
theme(legend.position = "none",
axis.line.x = element_blank(),
axis.title.x = element_blank(),
axis.text.x = element_blank(),
axis.ticks.x = element_blank())
However, there is a lot we can do to improve the aesthetics of our
plot. For example, the column named color
in our dataframe
specifies the hexadecimal color corresponding to the dominant lithology
of the unit. We can use this to color-code the units by lithology.
# Plot stratigraphic column
ggplot(san_juan_units, aes(ymin = b_age, ymax = t_age,
xmin = x_min, xmax = x_max)) +
# Plot units, colored by rock type
geom_rect(fill = san_juan_units$color, color = "black") +
# Reverse direction of y-axis
scale_y_reverse(limits = c(145, 66), n.breaks = 10, name = "Time (Ma)") +
# Theming
theme_classic() +
theme(legend.position = "none",
axis.line.x = element_blank(),
axis.title.x = element_blank(),
axis.text.x = element_blank(),
axis.ticks.x = element_blank())
Great! Now let’s add labels indicating the names of the different units.
# Add midpoint age for plotting
san_juan_units$m_age <- (san_juan_units$b_age + san_juan_units$t_age) / 2
# Plot stratigraphic column
ggplot(san_juan_units, aes(ymin = b_age, ymax = t_age,
xmin = x_min, xmax = x_max)) +
# Plot units, colored by rock type
geom_rect(fill = san_juan_units$color, color = "black") +
# Add text labels
geom_text_repel(aes(x = x_max, y = m_age, label = unit_name),
size = 4, hjust = 0, force = 2,
min.segment.length = 0, direction = "y",
nudge_x = rep_len(x = c(2, 3), length.out = 17)) +
# Reverse direction of y-axis
scale_y_reverse(limits = c(145, 66), n.breaks = 10, name = "Time (Ma)") +
# Theming
theme_classic() +
theme(legend.position = "none",
axis.line.x = element_blank(),
axis.title.x = element_blank(),
axis.text.x = element_blank(),
axis.ticks.x = element_blank())
And finally, we can add a column along the y-axis indicating the
different stages of the Cretaceous, using the R package
deeptime
:
# Plot stratigraphic column
ggplot(san_juan_units, aes(ymin = b_age, ymax = t_age,
xmin = x_min, xmax = x_max)) +
# Plot units, colored by rock type
geom_rect(fill = san_juan_units$color, color = "black") +
# Add text labels
geom_text_repel(aes(x = x_max, y = m_age, label = unit_name),
size = 3.5, hjust = 0, force = 2,
min.segment.length = 0, direction = "y",
nudge_x = rep_len(x = c(2, 3), length.out = 17)) +
# Reverse direction of y-axis
scale_y_reverse(limits = c(145, 66), n.breaks = 10, name = "Time (Ma)") +
# Theming
theme_classic() +
theme(legend.position = "none",
axis.line.x = element_blank(),
axis.title.x = element_blank(),
axis.text.x = element_blank(),
axis.ticks.x = element_blank()) +
# Add geological time scale
coord_geo(pos = "left", dat = list("stages"), rot = 90)