Authors: The Palaeoverse Development Team
Last updated: 2024-04-16
palaeoverse
is an R package developed by
palaeobiologists, for palaeobiologists. The aim of the package is to
generate a community-driven toolkit of generic functions for the
palaeobiological community. The package does not provide implementations
of statistical analyses, rather it provides auxiliary functions to help
streamline analyses and improve code readability and reproducibility.
This vignette (or tutorial if you prefer) is provided to guide you
through the installation process and some of the functionality available
within palaeoverse
. To do so, we will focus on a usage
example looking at latitudinal trends in Phanerozoic reefs.
The palaeoverse
package can be installed via CRAN or its
dedicated GitHub
repository if the development version is preferred. To install via
the CRAN, simply use:
install.packages("palaeoverse")
To install the development version, first install the
devtools
package, and then use install_github
to install palaeoverse
directly from GitHub.
install.packages("devtools")
::install_github("palaeoverse/palaeoverse") devtools
You can now load palaeoverse
using the default
library
function:
library(palaeoverse)
Before we get onto the good stuff, the development team has a
small request. If you use palaeoverse
in your
research, please cite the associated publication. This will help us to
continue our work in supporting you to do yours. You can access the
appropriate citation via:
citation("palaeoverse")
Functions in palaeoverse
are designed around the
assumption that most users’ data are stored in an occurrence
data.frame
. As such, this is the expected input data format
for most functions. We provide two example occurrence datasets
(tetrapods
and reefs
) within the package from
different data sources. In this vignette, we will focus on the
reefs
dataset which is a compilation of Phanerozoic reef
occurrences (n = 4,363) from the PaleoReefs
Database (Kiessling & Krause, 2022).
# Load the dataset
data(reefs)
# View the first five rows & columns of reefs
1:5, 1:5] reefs[
## r_number name formation system series
## 1 1 Tilkideligi Tepe Triassic Upper Triassic
## 2 2 Hydra Pantokrator Limestone Triassic Upper Triassic
## 3 3 Shiraho, W-Pacific Neogene Holocene
## 4 4 Aksu-Terziler area Triassic Upper Triassic
## 5 5 Adnet, Salzburg Triassic Upper Triassic
Note the structure of the dataset. Each row contains a unique occurrence and each column contains a unique variable.
Reefs have long been considered good tracers of past tropical and subtropical conditions (Kiessling, 2001; Perrin & Kiessling, 2010). More specifically, their distribution has historically been used to infer the latitudinal extent of tropical and subtropical climatic conditions in deep time. This is of course an oversimplification given all we know about modern reefs systems and their engineers. However, for the purpose of this example, let’s not get weighed down by the finer details.
In this example, we will use the reefs
dataset to
examine latitudinal trends in Phanerozoic reefs and demonstrate some of
the functionality available in palaeoverse
.
Let’s start by exploring the dataset. Before conducting any analyses, it is always a good idea to explore your data and understand what you are working with. Details related to the example datasets can be accessed via the usual documentation call:
# Call documentation for dataset
?reefs# You can also use
help(reefs)
Let’s start by exploring the data a little:
# How many reefs are there in the dataset?
# Each row represents an individual reef.
nrow(reefs)
## [1] 4363
Not bad (n = 4363), that’s quite a bit of data to play with!
Remember that these are reefs (whole ecosystems), not individual
occurrences of fossil organisms like in tetrapods
. But how
much does this vary across time? We can make use of the
group_apply
wrapper function to run a function across
subsets of data.
# How many reefs per interval?
# Let's group by the interval column to test
<- group_apply(occdf = reefs, group = "interval", fun = nrow)
reef_counts head(reef_counts)
## nrow interval
## 1 2 (early?) Toarcian
## 2 1 Aalenian-Bajocian
## 3 4 Aalenian/Bajocian
## 4 1 Aalenian/lower Bajocian
## 5 1 Abadehian
## 6 6 Aeronian
Oh dear, we’ve just hit a common issue… our reef data does not
conform to a common time scale! Some reefs are resolved to stage level,
while others are resolved to a coarser or finer temporal resolution. In
some instances, regional names are also used, while international names
are used in others. This makes it challenging to make comparisons
through time… one solution could be to use age values to bin the reefs
into time bins, but no age range data is provided in the dataset
(i.e. “max_ma” and “min_ma”). We must therefore use the names of each
interval to link to a common time scale, the ‘International Geological
Stages’ established by the International Commission on Stratigraphy
(ICS). Fortunately, we can make use of the look_up
function
to do so. This function can be used to assign international geological
stages and numeric ages to occurrences, or use a user-defined interval
key to link to a common time scale. Here, we will use the example
interval_key
available in palaeoverse
to
assign international geological stage names and numeric ages.
# Load the interval key
data("interval_key")
# Assign a common time scale based on an interval key
<- look_up(occdf = reefs,
reefs early_interval = "interval",
late_interval = "interval",
int_key = interval_key)
# Example output
100:110, 15:19] reefs[
## early_stage late_stage interval_max_ma interval_mid_ma interval_min_ma
## 100 Ladinian Carnian 242.0 234.50 227.0
## 101 Norian Norian 227.0 217.75 208.5
## 102 Rhaetian Rhaetian 208.5 204.90 201.3
## 103 Ladinian Carnian 242.0 234.50 227.0
## 104 Norian Rhaetian 227.0 214.15 201.3
## 105 Anisian Anisian 247.2 244.60 242.0
## 106 Rhaetian Rhaetian 208.5 204.90 201.3
## 107 Sinemurian Sinemurian 199.3 195.05 190.8
## 108 Oxfordian Oxfordian 163.5 160.40 157.3
## 109 Tithonian Tithonian 152.1 148.55 145.0
## 110 Tithonian Tithonian 152.1 148.55 145.0
Great! We now have a common time scale and numeric ages. However,
some of our reefs seem to range through several stages and we can’t
count just by the early or late stage. Our data still need to be binned.
Let’s check out the time_bins
and bin_time
functions.
# Now we have numeric ages for our data, we can easily
# remove pre-Phanerozoic data to focus our study
<- subset(reefs, interval_max_ma <= 541)
reefs # Extract Phanerozoic stage-level stages for time bins
<- time_bins(interval = "Phanerozoic", rank = "stage")
bins # Bin data
# bin_time requires "max_ma" and "min_ma" columns
# Rename columns in reefs
colnames(reefs)[which(colnames(reefs) == "interval_max_ma")] <- "max_ma"
colnames(reefs)[which(colnames(reefs) == "interval_min_ma")] <- "min_ma"
# Five methods exist in bin_time for binning occurrence data
# You can see details on each via ?bin_time
# Bin by midpoint age
bin_time(occdf = reefs, bins = bins, method = "mid")
# Bin by overlap majority
bin_time(occdf = reefs, bins = bins, method = "majority")
# Bin into every bin within age range
bin_time(occdf = reefs, bins = bins, method = "all")
# Bin randomly into bins within age range (equal probability)
bin_time(occdf = reefs, bins = bins, method = "random", reps = 10)
# Bin randomly into bins within age range (uniform probability)
bin_time(occdf = reefs, bins = bins, method = "point", reps = 10)
# Let's go with "all" for this example!
<- bin_time(occdf = reefs, bins = bins, method = "all") reefs
Note that the number of rows of reefs
have now
increased. This is because the “all” method in bin_time
duplicates an occurrence for every bin the age range overlaps with.
Let’s check those reef numbers through time again using our binned data:
# Count number of occurrences per interval
<- group_apply(occdf = reefs, group = "interval_mid_ma", fun = nrow)
reefs_time # Check output
head(reefs_time)
## nrow interval_mid_ma
## 1 9 0.00585
## 2 85 0.07035
## 3 10 0.39285
## 4 17 0.4515
## 5 68 1.29585
## 6 9 1.3545
That’s more like it! But, let’s not forget it’s always useful to
visualise our data. Why don’t we plot the number of reefs through time?
We can even make use of the axis_geo
function to add the
Geological Time Scale to our plot.
# Plot data
plot(x = reefs_time$interval_mid_ma,
y = reefs_time$nrow,
xlab = "Time (Ma)",
ylab = "Number of reefs per stage",
xlim = c(541, 0),
xaxt = "n",
type = "b", pch = 20)
# Add axis of the geological time scale
axis_geo(side = 1, intervals = "periods")
You would (and should) of course want to explore your data a little more than this. However, for the sake of brevity, let’s focus on our research theme: latitudinal trends in Phanerozoic reefs.
When studying modern latitudinal trends (e.g. in biodiversity),
researchers can use the geographic coordinates of their samples to
conduct analyses. However, as the continents shift through time,
palaeobiologists must reconstruct the palaeogeographic coordinates of
fossil localities. To do so, plate rotation models are used (e.g. the
PALEOMAP model) to rotate the modern coordinates of fossil localities to
their location at time of deposition. The palaeorotate
function allows the user to do so using either the ‘point’ or ‘grid’
method. The first approach makes use of the GPlates Web Service and
allows point data to be rotated to specific ages using the available
models (see the GPlates Web Service
documentation). The second approach uses reconstruction files of
pre-generated palaeocoordinates to spatiotemporally link occurrences’
modern coordinates and age estimates with their respective
palaeocoordinates. Let’s give it a shot:
# Palaeorotate occurrences
<- palaeorotate(occdf = reefs, age = "bin_midpoint",
reefs method = "point", model = "PALEOMAP")
# Check palaeocoordinates
head(reefs[, c("p_lng", "p_lat")])
## p_lng p_lat
## 1 21.3630 10.0755
## 2 22.1341 2.3106
## 3 NA NA
## 4 NA NA
## 5 NA NA
## 6 26.3041 -0.3093
Now let’s plot that palaeolatitudinal distribution!
# Plot data
plot(x = reefs$bin_midpoint,
y = reefs$p_lat,
xlab = "Time (Ma)",
ylab = "Palaeolatitude (\u00B0)",
xlim = c(541, 0),
xaxt = "n",
type = "p", pch = 20)
axis_geo(side = 1, intervals = "periods")
Neat! But it’s hard to draw any trends from this data (there is a lot of it!). We can do better… let’s summarise this information… if we assume that reefs are broadly limited to tropical and subtropical climatic conditions, the most poleward reef in each time bin should give us an estimate of the palaeolatitudinal extent of these climatic conditions.
# Let's first assume hemispheric symmetry and convert
# palaeolatitudes to absolute palaeolatitudes
$p_lat <- abs(reefs$p_lat)
reefs# Now we can calculate the most poleward latitude per time bin
# Extract unique interval midpoints
<- sort(unique(reefs$bin_midpoint))
midpoints # Calculate the maximum palaeolatitude for each time bin
<- sapply(X = midpoints, FUN = function(x) {
reefs_max max(reefs[which(reefs$bin_midpoint == x), ]$p_lat, na.rm = TRUE)
} )# Plot data
plot(x = midpoints,
y = reefs_max,
xlab = "Time (Ma)",
ylab = "Palaeolatitude (\u00B0)",
xlim = c(541, 0),
xaxt = "n",
type = "b",
pch = 20)
# Add axis of the geological time scale
axis_geo(side = 1, intervals = "periods")
That’s definitely much clearer! Should we stop there? Well… how about one last thing… let’s consider how plate rotation model choice might impact these results using three different models: GOLONKA, PALEOMAP, and MERDITH2021 (Wright et al. 2013; Scotese & Wright, 2018; Merdith et al. 2021).
# We can call multiple models at once with palaeorotate
# First, let's define the models...
<- c("GOLONKA", "PALEOMAP", "MERDITH2021")
models # And now palaeorotate!
<- palaeorotate(occdf = reefs, age = "bin_midpoint",
reefs method = "point", model = models)
# Check palaeocoordinates
# When multiple models are called, the name of the model
# is added as a suffix to p_lng and p_lat
head(reefs[, c("p_lng_PALEOMAP", "p_lat_PALEOMAP",
"p_lng_GOLONKA", "p_lat_GOLONKA",
"p_lng_MERDITH2021", "p_lat_MERDITH2021")])
## p_lng_PALEOMAP p_lat_PALEOMAP p_lng_GOLONKA p_lat_GOLONKA p_lng_MERDITH2021
## 1 21.3630 10.0755 35.6565 8.3384 34.4223
## 2 22.1341 2.3106 31.0982 4.8069 30.8863
## 3 NA NA 124.2500 24.3330 124.2489
## 4 NA NA 124.2500 24.3330 124.2493
## 5 NA NA 124.2500 24.3330 124.2498
## 6 26.3041 -0.3093 38.0979 4.9270 36.3998
## p_lat_MERDITH2021
## 1 12.1964
## 2 4.7404
## 3 24.3367
## 4 24.3352
## 5 24.3337
## 6 4.5809
Now we’ve palaeorotated our data, let’s repeat what we did earlier and calculate the most poleward reef occurrence for each model.
# Let's code a little helper function to begin with.
# This is generally useful when repeating code several times!
<- function(occdf, midpoint, p_lat) {
p_lat_max # Get absolute palaeolatitudes
<- abs(occdf[, p_lat])
occdf[, p_lat] # Extract unique bin midpoints
<- sort(unique(occdf[, midpoint]))
midpoints # Calculate maximum palaeolatitude for each bin
<- sapply(X = midpoints, FUN = function(x) {
lat_max max(occdf[which(occdf[, midpoint] == x), ][, p_lat], na.rm = TRUE)
})# Bind data
<- cbind.data.frame(midpoints, lat_max)
lat_max # Return df
return(lat_max)
}# Calculate maximum palaeolatitude of reefs for each time bin for each model
<- p_lat_max(occdf = reefs,
paleomap midpoint = "bin_midpoint",
p_lat = "p_lat_PALEOMAP")
<- p_lat_max(occdf = reefs,
golonka midpoint = "bin_midpoint",
p_lat = "p_lat_GOLONKA")
<- p_lat_max(occdf = reefs,
merdith midpoint = "bin_midpoint",
p_lat = "p_lat_MERDITH2021")
# Set up plot
plot(x = NULL,
y = NULL,
xlab = "Time (Ma)",
ylab = "Palaeolatitude (\u00B0)",
xlim = c(541, 0),
ylim = c(0, 90),
xaxt = "n")
# Plot maximum palaeolatitudes for each model
lines(x = paleomap$midpoints, y = paleomap$lat_max,
type = "o", col = "#e41a1c", pch = 15)
lines(x = golonka$midpoints, y = golonka$lat_max,
type = "o", col = "#377eb8", pch = 16)
lines(x = merdith$midpoints, y = merdith$lat_max,
type = "o", col = "#4daf4a", pch = 17)
# Add legend
legend(530, 15, legend=c("PALEOMAP", "GOLONKA", "MERDITH2021"),
col = c("#e41a1c", "#377eb8", "#4daf4a"),
lty = 1, pch = c(15, 16, 17), bty = "n")
# Add geological time scale axis
axis_geo(side = 1, intervals = "periods")