The purpose of this vignette is to illustrate how the Control Polygon Reduction (CPR) method can be used to select a set of knots defining B-spline to get a low degree of freedom and smooth fit to data. We start with a primer on B-splines and control polygons then the development and use of CPR.
The term “spline” is likely derived from shipwright or draftsmen splines, thin wood strips, held in place by weights, used to define curves. These splines naturally minimize strain energy and the use of additional weights at strategic locations on the spline are needed to achieve specific curvatures. Cubic B-splines are not dissimilar.
Splines are piecewise polynomial curves that are differentiable up to a prescribed order. B-splines are based on a basis of polynomial functions.
Definitions and notation for uni-variable and multi-variable B-splines and the associated control polygons and control nets are presented in the following. Two very good references for splines are de Boor (2001) and Prautzsch, Boehm, and Paluszny (2002) if you wish to dig into the details.
A curve defined by \(f\left(x\right)\) is a spline of order \(k\) (degree \(k -1\)), with knots \(\xi_1, \xi_2, \ldots, \xi_m\) where \(\xi_j \leq x_{j+1}\) and \(\xi_{j} < \xi_{j + k} \forall j\) if \(f\left(x\right)\) is \(k-r-1\) times differentiable at any \(r\)-fold knot, and \(f\left(x\right)\) is a polynomial of order \(\leq k\) over each interval \(x \in \left[\xi_j, \xi_{j+1}\right]\) for \(j = 0, \ldots, m - 1.\)
In particular, B-splines, are defined as an affine combination:
\[\begin{equation} f \left( x \right) = \sum_{j} \theta_j B_{j,k,\boldsymbol{\xi}}\left(x\right) = \boldsymbol{B}_{k, \boldsymbol{\xi}} \left(x\right) \boldsymbol{\theta}_{\boldsymbol{\xi}} \label{eq:f} \end{equation}\]
where \(B_{j,k,\boldsymbol{\xi}}\left(x\right)\) is the \(j^{th}\) basis spline function, \(\boldsymbol{\xi}\) is a sequence of \(l\) interior knots and a total of \(2k\) boundary knots, i.e., the cardinality of the knot sequence is \(\left\lvert \boldsymbol{\xi} \right\rvert = 2k + l.\) The value of the \(k\) boundary knots is arbitrary, but a common choice is to use \(k\)-fold knots on the boundary:
\[ \xi_1 = \xi_2 = \cdots = \xi_k < \xi_{k+1} \leq \cdots \leq \xi_{k + l} = \xi_{k + l + 1} = \cdots = \xi_{2k + l}.\]
Alternative boundary knots can be used so long as the sequence \(\boldsymbol{\xi}\) is non-decreasing. More on the implications of \(k\)-fold boundary knots follow in the next section.
The \(j^{th}\) B-spline is defined as:
\[\begin{equation} B_{j,k,\boldsymbol{\xi}}\left(x\right) = \omega_{j,k,\boldsymbol{\xi}} \left(x\right) B_{j,k-1,\boldsymbol{\xi}}\left(x\right) + \left(1 - \omega_{j+1,k,\boldsymbol{\xi}} \right) B_{j+1,k-1,\boldsymbol{\xi}}\left(x\right), \end{equation}\] where \[\begin{equation} B_{j,k,\boldsymbol{\xi}}\left(x\right) = 0 \quad \text{for} \quad x \notin \left[\xi_j, \xi_{j+k} \right), \quad \quad B_{j,1,\boldsymbol{\xi}}\left(x\right) = \begin{cases} 1 & x \in \left[\xi_j, \xi_{j+k}\right) \\ 0 & \text{otherwise}, \end{cases} \end{equation}\] and \[\begin{equation} \omega_{j,k,\boldsymbol{\xi}}\left(x\right) = \begin{cases} 0 & x \leq \xi_j \\ \frac{x-\xi_j}{\xi_{j+k-1} - \xi_{j}} & \xi_j < x < \xi_{j+k-1} \\ 1 & \xi_{j+k-1} \leq x. \end{cases} \end{equation}\]
For a set of observations, \(\boldsymbol{x} = x_1, \ldots, x_n,\) the basis functions defined above generalize to a matrix:
\[\begin{equation} \boldsymbol{B}_{k, \boldsymbol{\xi}} \left(\boldsymbol{x}\right) = \begin{pmatrix} B_{1, k, \boldsymbol{\xi}} \left(x_1\right) & B_{2, k, \boldsymbol{\xi}} \left(x_1\right) & \ldots & B_{k + l, k, \boldsymbol{\xi}} \left(x_1\right) \\ \vdots & \vdots & \ddots & \vdots \\ B_{1, k, \boldsymbol{\xi}} \left(x_n\right) & B_{2, k, \boldsymbol{\xi}} \left(x_n\right) & \ldots & B_{k + l, k, \boldsymbol{\xi}} \left(x_n\right). \end{pmatrix} \end{equation}\]
Within the cpr package we can generate a basis matrix thusly:
x <- seq(0, 5.9999, length.out = 5000)
bmat <- bsplines(x, iknots = c(1, 1.5, 2.3, 4, 4.5), bknots = c(0, 6))
bmat
## Basis matrix dims: [5,000 x 9]
## Order: 4
## Number of internal knots: 5
##
## First 6 rows:
##
## [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9]
## [1,] 1.0000000 0.000000000 0.000000e+00 0.000000e+00 0 0 0 0 0
## [2,] 0.9964037 0.003593461 2.878634e-06 5.011451e-10 0 0 0 0 0
## [3,] 0.9928160 0.007172539 1.150485e-05 4.009161e-09 0 0 0 0 0
## [4,] 0.9892369 0.010737255 2.586411e-05 1.353092e-08 0 0 0 0 0
## [5,] 0.9856664 0.014287632 4.594188e-05 3.207329e-08 0 0 0 0 0
## [6,] 0.9821045 0.017823691 7.172363e-05 6.264314e-08 0 0 0 0 0
Note: the default order for bsplines
is 4, and the
default for the boundary knots is the range of x
. However,
relying on the default boundary knots can lead to unexpected behavior
as, by definition the splines on the \(k\)-fold upper boundary is 0.
We can quickly view the plot of each of these spline functions as well.
A few of many important properties of the basis functions:
\[B_{j,k,\boldsymbol{xi}} \left(x\right) \in \left[0, 1\right], \forall x \in \mathbb{R};\]
\[B_{j,k,\boldsymbol{\xi}} \left(x\right) > 0 \quad \text{for} \quad x \in \left(\xi_{j}, \xi_{j+k}\right);\]
and
\[\sum_j B_{j,k,\boldsymbol{xi}} \left(x\right) = \begin{cases} 1 & x \in \left[\xi_1, \xi_{2k+l}\right) \\ 0 & otherwise. \end{cases}\]
Part of the base R distribution is the splines package which build
B-splines by calling bs
. There are three areas where the
functions differ: 1. input arguments, 2. attributes of the returned
matrix, and 3. behavior at the right boundary knot.
args(bsplines)
## function (x, iknots = NULL, df = NULL, bknots = range(x), order = 4L)
## NULL
args(splines::bs)
## function (x, df = NULL, knots = NULL, degree = 3, intercept = FALSE,
## Boundary.knots = range(x), warn.outside = TRUE)
## NULL
cpr::bsplines | splines::bs | Notes |
---|---|---|
x | x | numeric vector; the predictor variable |
iknots | knots | internal knots |
bknots | Boundary.knots | boundary knots |
order | degree | polynomial order = polynomial degree + 1 |
df | df | degrees of freedom |
intercept | ||
warn.outside |
bsplines
does not have the intercept
nor
the warn.outside
arguments because the matrix generated by
bsplines
effectively always has
intercept = TRUE
and warn.outside = TRUE
. How
values of x
at the right boundary, and outside the boundary
are treated also differ between the two functions.
The default call for both B-spline functions returns a basis matrix
for a order 4 (degree 3; cubic) B-spline with boundary knots placed at
range(x)
. However, the returns are not the same.
bs_mat <- splines::bs(x, knots = attr(bmat, "iknots"), Boundary.knots = attr(bmat, "bknots"))
str(attributes(bmat))
## List of 10
## $ dim : int [1:2] 5000 9
## $ order : num 4
## $ df : num 9
## $ iknots : num [1:5] 1 1.5 2.3 4 4.5
## $ bknots : num [1:2] 0 6
## $ xi : num [1:13] 0 0 0 0 1 1.5 2.3 4 4.5 6 ...
## $ xi_star : num [1:9] 0 0.333 0.833 1.6 2.6 ...
## $ class : chr [1:2] "cpr_bs" "matrix"
## $ call : language bsplines(x = x, iknots = c(1, 1.5, 2.3, 4, 4.5), bknots = c(0, 6))
## $ environment:<environment: R_GlobalEnv>
str(attributes(bs_mat))
## List of 7
## $ dim : int [1:2] 5000 8
## $ dimnames :List of 2
## ..$ : NULL
## ..$ : chr [1:8] "1" "2" "3" "4" ...
## $ degree : int 3
## $ knots : num [1:5] 1 1.5 2.3 4 4.5
## $ Boundary.knots: num [1:2] 0 6
## $ intercept : logi FALSE
## $ class : chr [1:3] "bs" "basis" "matrix"
The bspline_mat
has additional attributes related to the
control polygons.
The major difference is the in the dimension of the matrices. By
default splines::bs
omits one column from the basis matrix
such that when using using the function is a regression formula the
resulting design matrix is not rank deficient. Using
bsplines
would suggest using a +0
or
-1
in the regression formula to omit the intercept (is
nuance is handled in calls to cp
so the end user need not
worry about it).
By definition, the \(\boldsymbol{B}_{j,k,\boldsymbol{\xi}}\left(x\right)\)
are non-negative right-continuous functions. bsplines
adheres to the definition strictly, whereas splines::bs
uses a pivoting method to allow for non-zero extrapolations outside the
support.
Example: for the cpr::bsplines
call, notice that the
first, third, and fifth rows, corresponding to values outside the
support are all zeros as are the row sums. Compare that to the
splines::bs
which returns negative values and in the
matrix, and all rows sum to 1.
bspline_eg <- bsplines(c(0, 1, 2, 5, 6), bknots = c(1, 5))
## Warning in bsplines(c(0, 1, 2, 5, 6), bknots = c(1, 5)): At least one x value <
## min(bknots)
## Warning in bsplines(c(0, 1, 2, 5, 6), bknots = c(1, 5)): At least one x value
## >= max(bknots)
bs_eg <- splines::bs(c(0, 1, 2, 5, 6), Boundary.knots = c(1, 5), intercept = TRUE )
## Warning in splines::bs(c(0, 1, 2, 5, 6), Boundary.knots = c(1, 5), intercept =
## TRUE): some 'x' values beyond boundary knots may cause ill-conditioned bases
head(bspline_eg)
## [,1] [,2] [,3] [,4]
## [1,] 0.000000 0.000000 0.000000 0.000000
## [2,] 1.000000 0.000000 0.000000 0.000000
## [3,] 0.421875 0.421875 0.140625 0.015625
## [4,] 0.000000 0.000000 0.000000 0.000000
## [5,] 0.000000 0.000000 0.000000 0.000000
rowSums(bspline_eg)
## [1] 0 1 1 0 0
head(bs_eg)
## 1 2 3 4
## [1,] 1.953125 -1.171875 0.234375 -0.015625
## [2,] 1.000000 0.000000 0.000000 0.000000
## [3,] 0.421875 0.421875 0.140625 0.015625
## [4,] 0.000000 0.000000 0.000000 1.000000
## [5,] -0.015625 0.234375 -1.171875 1.953125
rowSums(bs_eg)
## [1] 1 1 1 1 1
The spline \(f\left(\boldsymbol{x}\right) = \boldsymbol{B}_{k,\boldsymbol{\xi}}\left(\boldsymbol{x}\right)\) is a convex sum of the coefficients \(\boldsymbol{\theta}_{\boldsymbol{\xi}}.\) A meaningful geometric relationship between \(\boldsymbol{\xi}\) and \(\boldsymbol{\theta}_{\boldsymbol{\xi}}\) exist in the form of a control polygon, \(CP_{k,\boldsymbol{\xi},\boldsymbol{\theta}_{\boldsymbol{\xi}}},\) a strong convex hull for \(\boldsymbol{B}_{k,\boldsymbol{\xi}}\left(\boldsymbol{x}\right) \boldsymbol{\theta}_{\boldsymbol{\xi}}.\)
\[\begin{equation} CP_{k,\boldsymbol{\xi},\boldsymbol{\theta}_{\xi}} = \left\{ \left( \xi_{j}^{*}, \theta_{j,\boldsymbol{\xi}} \right) \right \}_{j=1}^{\left\lvert\boldsymbol{\theta}_{\boldsymbol{\xi}}\right\rvert}; \quad \quad \xi_{j}^{*} = \frac{1}{k-1} \sum_{i = 1}^{k-1} \xi_{j + i}. \end{equation}\]
\(CP_{k,\boldsymbol{\xi},\boldsymbol{\theta}_{\boldsymbol{\xi}}}\) is a sequence of \(\left\lvert \boldsymbol{\theta}_{\boldsymbol{\xi}} \right\rvert = k + l\) control vertices. The control polygon can be thought of as a piecewise linear function approximating the spline function. Changes in convexity and other subtle characteristics of the spline function are exaggerated by the control polygon.
For example, using the basis matrix defined above and the following coefficients we can easily define a spline function and control polygon.
Plotting the control polygon and the corresponding spline:
Consider two knot sequences \(\boldsymbol{\xi}\) and \(\boldsymbol{\xi} \cup \boldsymbol{\xi}'.\) Then, for a given polynomial order \(k,\) the spline space \(\mathbb{S}_{k,\boldsymbol{\xi}} \subset \mathbb{S}_{k,\boldsymbol{\xi} \cup \boldsymbol{\xi}'}\) (de Boor 2001, pg135). Given this relationship between spline spaces, and the convex sums generating spline functions, Boehm (1980) presented a method for inserting a knots into the knot sequence such that the spline function is unchanged. Specifically, for \[ \boldsymbol{\xi}' = \left\{\left. \xi_{j}' \right| \min\left(\boldsymbol{\xi}\right) < \xi_j < \max\left(\boldsymbol{\xi}\right) \quad \forall j\right\}, \] then there exist a \(\boldsymbol{\theta}_{\boldsymbol{\xi} \cup \boldsymbol{\xi}'}\) such that \[ \boldsymbol{B}_{k,\boldsymbol{\xi}} \left(x\right)\boldsymbol{\theta}_{\boldsymbol{\xi}} = \boldsymbol{B}_{k,\boldsymbol{\xi}\cup\boldsymbol{\xi}'}\left(x\right) \boldsymbol{\theta}_{\boldsymbol{\xi} \cup \boldsymbol{\xi}'} \quad \forall x \in \left[\min\left(\boldsymbol{\xi}\right), \max\left(\boldsymbol{\xi} \right) \right]. \]
When inserting a singleton \(\xi'\) into \(\boldsymbol{\xi},\) then \[ \boldsymbol{\theta}_{\boldsymbol{\xi} \cup \xi'} = \boldsymbol{W}_{k, \boldsymbol{\xi}}\left(\xi'\right) \boldsymbol{\theta}_{\boldsymbol{\xi}} \] where \(\boldsymbol{W}_{k,\boldsymbol{\xi}}\left(\xi'\right)\) is a \(\left( \left\lvert \boldsymbol{\theta} \right\rvert + 1\right) \times \left\lvert \boldsymbol{\theta} \right\rvert\) lower bi-diagonal matrix \[ \boldsymbol{W}_{k,\boldsymbol{\xi}}\left(\xi'\right) = \begin{pmatrix} 1 & 0 & \cdots & 0 \\ 1 - \omega_{1, k, \boldsymbol{\xi}}\left(\xi'\right) & \omega_{1, k, \boldsymbol{\xi}} \left(\xi'\right) & \cdots & 0 \\ 0 & 1 - \omega_{2, k, \boldsymbol{\xi}}\left(\xi'\right) & \cdots 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & 1 - \omega_{\left\lvert \boldsymbol{\theta} \right\rvert - 1, k, \boldsymbol{\xi}} \left(\xi'\right) & \omega_{\left\lvert \boldsymbol{\theta} \right\rvert - 1, k, \boldsymbol{\xi}} \left(\xi'\right) \\ 0 & 0 & 0 & 1 \end{pmatrix}, \] with \(\omega_{j,k,\boldsymbol{\xi}}\left(x\right)\) as defined above in the de~Boor recursive algorithm. Through recursion we can insert as many knots into \(\boldsymbol{\xi}\) without changing the value of the spline function.
For an example, insert a knot \(\xi' = 3\) into the control polygon defined above.
Here we derive a metric for assessing how much influence \(\xi_j \in \boldsymbol{\xi}\) has on \(\boldsymbol{B}_{k,\boldsymbol{\xi}} \left(x\right)\boldsymbol{\theta}_{\boldsymbol{\xi}}.\) Using the relationship defined by Boehm (1980), we can derive a metric for the influence of \(\xi_j\) on the spline function.
Start with an defined \(k, \boldsymbol{\xi},\) and \(\boldsymbol{\theta}_{k,\boldsymbol{\xi}}.\) The relationship \[ \boldsymbol{\theta}_{\boldsymbol{\xi}} = \boldsymbol{W}_{k,\boldsymbol{\xi}\backslash\xi_{j}}\left(\xi_{j}\right) \boldsymbol{\theta}_{\boldsymbol{\xi}\backslash\xi_{j}} \] holds if \(\xi_j\) has zero influence. However, in practice, we would expect that the relationship is \[ \boldsymbol{\theta}_{\boldsymbol{\xi}} = \boldsymbol{W}_{k,\boldsymbol{\xi}\backslash\xi_{j}}\left(\xi_{j}\right) \boldsymbol{\theta}_{\boldsymbol{\xi}\backslash\xi_{j}} + \boldsymbol{d} \] for a set of deviations \(\boldsymbol{d}\) which would be equal to \(\boldsymbol{0}\) if \(\xi_j\) has no influence on the spline.
We can estimate \(\boldsymbol{d}\) via least squares. To simplify the notation in the following we drop some of the subscripts and parenthesis, that is, let \(\boldsymbol{W} = \boldsymbol{W}_{k,\boldsymbol{\xi}\backslash\xi_{j}}\left(\xi_{j}\right).\)
\[ \begin{align} \boldsymbol{d} &= \boldsymbol{\theta}_{\boldsymbol{\xi}} - \boldsymbol{W} \boldsymbol{\theta}_{\boldsymbol{\xi}\backslash\xi_{j}} \\ &= \boldsymbol{\theta}_{\boldsymbol{\xi}} - \boldsymbol{W} \left(\boldsymbol{W}^{T}\boldsymbol{W}\right)^{-1}\boldsymbol{W}^{T} \boldsymbol{\theta}_{\boldsymbol{\xi}} \\ &= \left(\boldsymbol{I} - \boldsymbol{W} \left(\boldsymbol{W}^{T}\boldsymbol{W}\right)^{-1}\boldsymbol{W}^{T}\right) \boldsymbol{\theta}_{\boldsymbol{\xi}} \\ \end{align} \]
Finally, we define the influence of \(\xi_j\) on \(CP_{k,\boldsymbol{\xi},\boldsymbol{\theta}_{\boldsymbol{\xi}}}\) as \[ w_j = \left\lVert \boldsymbol{d} \right\rVert_{2}^{2}. \]
The influence of knots on the spline used in the above section.
x <- influence_of_iknots(cp0)
summary(x)
## j iknot influence influence_rank chisq chisq_rank p_value os_p_value
## 1 5 1.0 1.64661178 5 NA NA NA NA
## 2 6 1.5 0.29066719 2 NA NA NA NA
## 3 7 2.3 0.31205029 3 NA NA NA NA
## 4 8 4.0 0.07702981 1 NA NA NA NA
## 5 9 4.5 0.41987740 4 NA NA NA NA
Let’s look at the following plots to explore the influence of \(\xi_{7}\) (the third interior knot in a \(k=4\) order spline) on the spline. In panel (a) we see the original control polygon and spline along with the coarsened control polygon and spline. Note that the there are fewer control points, and the spline deviates from the original. In panel (b) we see that the restored control polygon is the result of inserting \(\xi_7\) into the coarsened control polygon of panel (a). These plots are also a good example of the local support and strong convexity of the control polygons as there are only \(k + 1 = 5\) control points which are impacted by the removal and re-insertion of \(\xi_7.\) Lastly, in panel (c) we see all three control polygons plotted together.
ggpubr::ggarrange(
ggpubr::ggarrange(
plot(x, j = 3, coarsened = TRUE, restored = FALSE, color = TRUE, show_spline = TRUE) +
ggplot2::theme(legend.position = "none")
, plot(x, j = 3, coarsened = FALSE, restored = TRUE, color = TRUE, show_spline = TRUE) +
ggplot2::theme(legend.position = "none")
, labels = c("(a)", "(b)")
, nrow = 1
)
, plot(x, j = 3, coarsened = TRUE, restored = TRUE, color = TRUE, show_spline = TRUE) +
ggplot2::theme(legend.position = "bottom")
, labels = c("", "(c)")
, nrow = 2
, ncol = 1
, heights = c(1, 2)
)
Next, consider the influence of \(\xi_8,\) the fourth interior knot. By the influence metric defined above, this is the least influential knot in the sequence. This can be seen easily as the spline between the original and the coarsened spline are very similar, this is despite the apparent large difference in the magnitude of the control point ordinates between the original and coarsened control polygons. However, when re-inserting \(\xi_8\) the recovered control polygon is very similar to the original, hence the low influence of \(\xi_8.\)
ggpubr::ggarrange(
ggpubr::ggarrange(
plot(x, j = 4, coarsened = TRUE, restored = FALSE, color = TRUE, show_spline = TRUE) +
ggplot2::theme(legend.position = "none")
, plot(x, j = 4, coarsened = FALSE, restored = TRUE, color = TRUE, show_spline = TRUE) +
ggplot2::theme(legend.position = "none")
, labels = c("(a)", "(b)")
, nrow = 1
)
, plot(x, j = 4, coarsened = TRUE, restored = TRUE, color = TRUE, show_spline = TRUE) +
ggplot2::theme(legend.position = "bottom")
, labels = c("", "(c)")
, nrow = 2
, ncol = 1
, heights = c(1, 2)
)
If you were required to omit an internal knot, it would be preferable to omit \(\xi_8\) over \(\xi_5, \xi_6, \xi_7,\) or \(\xi_9\) as that will have the least impact on the spline approximation of the original functional form.
Start with the spline we have been using and add some noise to it.
set.seed(42)
x <- seq(0, 5.99999, length.out = 100)
bmat <- bsplines(x, iknots = c(1, 1.5, 2.3, 4, 4.5), bknots = c(0, 6))
theta <- matrix(c(1, 0, 3.5, 4.2, 3.7, -0.5, -0.7, 2, 1.5), ncol = 1)
DF <- data.frame(x = x, truth = as.numeric(bmat %*% theta))
DF$y <- as.numeric(bmat %*% theta + rnorm(nrow(bmat), sd = 0.3))
original_data_ggplot_layers <-
list(
ggplot2::geom_point(data = DF
, mapping = ggplot2::aes(x = x, y = y)
, inherit.aes = FALSE
, color = "#6F263D"
, alpha = 0.2)
,
ggplot2::geom_line(data = DF
, mapping = ggplot2::aes(x = x, y = truth)
, inherit.aes = FALSE
, color = "#6F263D"
, alpha = 0.6)
)
ggplot2::ggplot(DF) + ggplot2::theme_bw() + original_data_ggplot_layers
To fit a spline and control polygon to the noisy data use a formula
statement in the cp
call. In this example we will use the
known internal knots and add one extra.
initial_cp <-
cp(y ~ bsplines(x, iknots = c(1, 1.5, 2.3, 3.0, 4, 4.5), bknots = c(0, 6))
, data = DF
, keep_fit = TRUE # default is FALSE
)
plot(initial_cp, show_spline = TRUE) + original_data_ggplot_layers
The plot above shows the fitted spline is does well at approximating the true spline function. Just to make it perfectly clear, the regression coefficients are the estimates of the ordinates for the control polygon:
initial_cp$fit |> coef() |> unname()
## [1] 1.0032390 0.6730762 3.0035459 4.4136192 3.5386419 1.7539897
## [7] -0.4708520 -0.7312097 2.1631310 1.3575369
initial_cp$cp$theta
## [1] 1.0032390 0.6730762 3.0035459 4.4136192 3.5386419 1.7539897
## [7] -0.4708520 -0.7312097 2.1631310 1.3575369
Let’s now look at the influence of the internal knots on the fit
initial_cp |>
influence_of_iknots() |>
summary()
## j iknot influence influence_rank chisq chisq_rank p_value os_p_value
## 1 5 1.0 0.16235075 5 0.9100724 5 0.3400952 0.66205444
## 2 6 1.5 0.05785223 3 0.4331666 2 0.5104392 0.11947108
## 3 7 2.3 0.05128404 2 0.5061398 3 0.4768147 0.30134562
## 4 8 3.0 0.08264093 4 0.8363492 4 0.3604430 0.37410640
## 5 9 4.0 0.02585915 1 0.2230694 1 0.6367111 0.06662766
## 6 10 4.5 0.39575696 6 2.6821030 6 0.1014816 0.47378677
The least influential knot is \(\xi_8 = 3.0,\) the extra knot inserted. Good, we this is the expected result.
How would someone determine if the influence was significant? That is, how can we test the null hypothesis \[H_0: w_{j} = 0\]
Under the null hypothesis that the knot has zero influence and under standard ordinary least squares regression assumptions the regression coefficients, (the ordinates of the control polygon are the regression coefficients), are realizations of a Gaussian random variable. Then
\[ w_j = \left(\left(\boldsymbol{I} - \boldsymbol{H}\right)\hat{\boldsymbol{\theta}} \right)^{T} \left[ \left(\left(\boldsymbol{I} - \boldsymbol{H}\right)\hat{\boldsymbol{\theta}} \right) \boldsymbol{\Sigma} \left(\left(\boldsymbol{I} - \boldsymbol{H}\right)\hat{\boldsymbol{\theta}} \right)^{T} \right]^{+} \left(\left(\boldsymbol{I} - \boldsymbol{H}\right)\hat{\boldsymbol{\theta}} \right) \] where \(\boldsymbol{H} = W (W^{T}W)^{-1} W^{T},\) \(\boldsymbol{\Sigma}\) is the variance-covariance matrix for the regression coefficients, and \(+\) denotes the Moore-Penrose inverse of the matrix. By construction, \(\left(\boldsymbol{I} - \boldsymbol{H}\right) \boldsymbol{\Sigma} \left(\boldsymbol{I} - \boldsymbol{H}\right)^{T}\) is singular and thus the standard inverse does not exist and a generalized inverse is necessary. This yields the test statistic following an F distribution with 1, and \(\nu_2\) degrees of freedom, \[ w_j \sim F_{1,\nu_2}.\] The second degree of freedom is dependent on the sample size, the number of regression parameters, and the type of regression used to estimate the ordinates.
To simplify the work, and generalize the approach, we will use the fact that the limiting distribution of \(F_{1, \nu_2}\) as \(\nu_2 \rightarrow \infty\) is \(\chi_{1}^{2},\) that is, \[ w_j \sim \chi_{1}^{2}.\]
Now, if we are interested in removing the knot with the lowest influence we are interested in the minimum. So the hypothesis test we are actually interested in is \[ H_0: w_{(1)} = 0 \] which follows the distribution \[ \Pr \left[ w_{(1)} > w \right] = \sum_{j = 1}^{k+l} \binom{n+l}{j} \left(F_{\chi_{1}^{2}}\left(w\right)\right)^{j} \left( 1 - F_{\chi_{1}^{2}}\left(w\right) \right)^{k+l-j}\] where \(F_{\chi_{1}^{2}}\left(w\right)\) is the distribution function of the chi-square distribution with one degree of freedom.
The results generated by calling influence_of_iknots
.
report two sets of p-values. The first is the p-value is the probability
of observed chisq value greater than reported, and the second p-value is
the probability of the rank order statistic exceeding the observed
value.
j | iknot | influence | influence_rank | chisq | chisq_rank | p_value | os_p_value | |
---|---|---|---|---|---|---|---|---|
1 | 5 | 1.0 | 0.1623508 | 5 | 0.9100724 | 5 | 0.3400952 | 0.6620544 |
2 | 6 | 1.5 | 0.0578522 | 3 | 0.4331666 | 2 | 0.5104392 | 0.1194711 |
3 | 7 | 2.3 | 0.0512840 | 2 | 0.5061398 | 3 | 0.4768147 | 0.3013456 |
4 | 8 | 3.0 | 0.0826409 | 4 | 0.8363492 | 4 | 0.3604430 | 0.3741064 |
5 | 9 | 4.0 | 0.0258592 | 1 | 0.2230694 | 1 | 0.6367111 | 0.0666277 |
6 | 10 | 4.5 | 0.3957570 | 6 | 2.6821030 | 6 | 0.1014816 | 0.4737868 |
It is worth remembering how fraught binary classification of statistical (non)significance can be. Just because the p-value is low does not mean that the knot is influential, just as a high p-value dose not mean that the knot is not influential. Sample size, over-fitting, and other factors can/will lead to poor selection of a model if you only consider these p-values.
That said, consider \(\xi_9 = 4.0\) which has the lowest influence weight. Let’s omit that knot and refit the model.
first_reduction_cp <-
cp(y ~ bsplines(x, iknots = c(1, 1.5, 2.3, 3, 4.5), bknots = c(0, 6)), data = DF)
first_reduction_cp |>
influence_of_iknots() |>
summary() |>
knitr::kable(row.names = TRUE)
j | iknot | influence | influence_rank | chisq | chisq_rank | p_value | os_p_value | |
---|---|---|---|---|---|---|---|---|
1 | 5 | 1.0 | 0.1957262 | 4 | 1.1517249 | 4 | 0.2831884 | 0.4369320 |
2 | 6 | 1.5 | 0.0345902 | 2 | 0.2881794 | 1 | 0.5913896 | 0.0723383 |
3 | 7 | 2.3 | 0.0238548 | 1 | 0.2900566 | 2 | 0.5901843 | 0.3202087 |
4 | 8 | 3.0 | 0.0782231 | 3 | 0.8844555 | 3 | 0.3469842 | 0.2305056 |
5 | 9 | 4.5 | 5.9074903 | 5 | 46.2296402 | 5 | 0.0000000 | 0.0000000 |
After omitting one knot and refitting the model we see that \(\xi = 2.3\) is the least influential. Just for fun, let’s omit that knot, and refit. Let’s continue that process all the way down to zero knots.
second_reduction_cp <-
cp(y ~ bsplines(x, iknots = c(1, 1.5, 3, 4.5), bknots = c(0, 6)), data = DF)
second_reduction_cp |>
influence_of_iknots() |>
summary() |>
knitr::kable(row.names = TRUE)
j | iknot | influence | influence_rank | chisq | chisq_rank | p_value | os_p_value | |
---|---|---|---|---|---|---|---|---|
1 | 5 | 1.0 | 0.4805752 | 3 | 3.8031184 | 3 | 0.0511572 | 0.0146519 |
2 | 6 | 1.5 | 0.0019515 | 1 | 0.0186252 | 1 | 0.8914463 | 0.6315108 |
3 | 7 | 3.0 | 0.1381508 | 2 | 1.3834860 | 2 | 0.2395083 | 0.0450848 |
4 | 8 | 4.5 | 6.2382229 | 4 | 59.9559534 | 4 | 0.0000000 | 0.0000000 |
The least influential knot in the second reduction is \(\xi = 1.5\) and that will be omitted for the third reduction.
third_reduction_cp <-
cp(y ~ bsplines(x, iknots = c(1, 3, 4.5), bknots = c(0, 6)), data = DF)
third_reduction_cp |>
influence_of_iknots() |>
summary() |>
knitr::kable(row.names = TRUE)
j | iknot | influence | influence_rank | chisq | chisq_rank | p_value | os_p_value | |
---|---|---|---|---|---|---|---|---|
1 | 5 | 1.0 | 3.0669773 | 2 | 41.672311 | 2 | 0.0000000 | 0.00e+00 |
2 | 6 | 3.0 | 0.4304104 | 1 | 3.986697 | 1 | 0.0458609 | 9.65e-05 |
3 | 7 | 4.5 | 6.6872293 | 3 | 73.937519 | 3 | 0.0000000 | 0.00e+00 |
Within the third reduction, the least influential knot is \(\xi = 3.0\) and that knot will be omitted for the fourth reduction.
fourth_reduction_cp <-
cp(y ~ bsplines(x, iknots = c(1, 4.5), bknots = c(0, 6)), data = DF)
fourth_reduction_cp |>
influence_of_iknots() |>
summary() |>
knitr::kable(row.names = TRUE)
j | iknot | influence | influence_rank | chisq | chisq_rank | p_value | os_p_value | |
---|---|---|---|---|---|---|---|---|
1 | 5 | 1.0 | 5.952998 | 1 | 107.2956 | 1 | 0 | 0 |
2 | 6 | 4.5 | 9.517375 | 2 | 142.2148 | 2 | 0 | 0 |
Of the two remaining internal knots, \(\xi - 1.0\) is the least influential and will be omitted for the fifth reduction.
fifth_reduction_cp <-
cp(y ~ bsplines(x, iknots = 4.5, bknots = c(0, 6)), data = DF)
fifth_reduction_cp |>
influence_of_iknots() |>
summary() |>
knitr::kable(row.names = TRUE)
j | iknot | influence | influence_rank | chisq | chisq_rank | p_value | os_p_value | |
---|---|---|---|---|---|---|---|---|
1 | 5 | 4.5 | 4.103958 | 1 | 31.66441 | 1 | 0 | 0 |
Only one knot can be omitted from the fifth to the sixth reduction. Having the sixth reduction, where there are zero internal knots, lets us compare model fits to a model with just a \(k=4\) order polynomial.
sixth_reduction_cp <-
cp(y ~ bsplines(x, bknots = c(0, 6)), data = DF)
sixth_reduction_cp |>
influence_of_iknots() |>
summary()
## j iknot influence influence_rank chisq chisq_rank p_value os_p_value
## 1 4 NA NA NA NA NA NA NA
Let’s compare all the fits. We will start by looking at the control polygons and splines.
The control polygons, panel (a), we see that the sixth, fifth, and fourth reductions are different from the initial, first, second, and third reductions, and for the splines, the fifth and sixth reductions are easily identified as different.
Next, the graphic below will let us compare these models to the truth, and the observed data.
The dashed black line is the spline fitted to the data (light burgundy dots) and the true value of the target function is the burgundy line. In the fifth reduction there is an easily noticeable difference between the fitted spline and the target function. Between the initial control polygon and the first three reductions it is difficult to visually discern any meaningful difference between the fits.
Thus, I would argue that the third reduction is the preferable model as it has the fewest degrees of freedom while providing a good quality of fit. This conclusion is supported by looking at the residual standard error (rse) \[ rse = \sqrt{ \frac{1}{df} \sum_{i=1}^{n} \left(y_i - f\left(x_i\right)\right)^2 },\] where the degrees of freedom, \(df,\) is the sample size \(n\) minus the number of regression parameters.
list( initial_cp , first_reduction_cp , second_reduction_cp , third_reduction_cp
, fourth_reduction_cp , fifth_reduction_cp , sixth_reduction_cp) |>
rev() |>
lapply(summary) |>
do.call(what = rbind, args = _) |>
cbind(data.frame(reduction = seq(6, 0, by = -1))) |>
knitr::kable(row.names = TRUE)
dfs | n_iknots | iknots | loglik | rss | rse | wiggle | fdsc | reduction | |
---|---|---|---|---|---|---|---|---|---|
1 | 4 | 0 | -74.52286 | 25.991002 | 0.5203264 | 46.86602 | 2 | 6 | |
2 | 5 | 1 | 4.5 | -60.13965 | 19.493599 | 0.4529854 | 46.12997 | 2 | 5 |
3 | 6 | 2 | 1, 4.5 | -22.06566 | 9.103022 | 0.3111923 | 113.09010 | 4 | 4 |
4 | 7 | 3 | 1, 3, 4.5 | -19.96695 | 8.728836 | 0.3063633 | 99.30959 | 4 | 3 |
5 | 8 | 4 | 1, 1.5, …. | -19.95683 | 8.727070 | 0.3079926 | 95.81790 | 4 | 2 |
6 | 9 | 5 | 1, 1.5, …. | -19.79771 | 8.699341 | 0.3091879 | 92.04602 | 4 | 1 |
7 | 10 | 6 | 1, 1.5, …. | -19.67393 | 8.677833 | 0.3105163 | 87.38456 | 4 | 0 |
The wiggle
is one measure of wiggliness defined as \[
\int_{\min\left(\boldsymbol{\xi}\right)}^{\max\left(\boldsymbol{\xi}\right)}
\left( \frac{d^2}{dx^2} f\left(x\right) \right)^2 dx.
\] fdsc
reports the number of times the first
derivative has a sign change.
The exercise above of manually identifying and omitting the knot with
the smallest influence in each model would be tedious when working with
a large set of initial knots. Fortunately, the process has been
automated. Calling cpr
on a cpr_cp
object
defined by a function will automatically omit the internal knot with the
lowest influence.
Apply CPR to the initial_cp
from the above example.
cpr0
## A list of control polygons
## List of 7
## - attr(*, "ioik")=List of 7
## - attr(*, "class")= chr [1:2] "cpr_cpr" "list"
There are 7 control polygons within the cpr_cpr
object,
length(initial_cp$iknots) + 1
. The indexing is set such at
that the \(i^{th}\) element has \(i-1\) internal knots.
Before exploring the results, let’s just verify that the results of
the call to cpr
are the same as the manual results found
about. There are some differences in the metadata of the objects, but
the important parts, like the control polygons, are the same.
all.equal( cpr0[["cps"]][[7]][["cp"]], initial_cp[["cp"]])
## [1] "target is NULL, current is data.frame"
# some attributes are different with the last cp due to how the automation
# creates the call vs how the call was created manually.
call_idx <- which(names(cpr0[["cps"]][[6]]) == "call")
all.equal( cpr0[["cps"]][[6]][-call_idx], first_reduction_cp [-call_idx])
## [1] "target is NULL, current is list"
all.equal( cpr0[["cps"]][[5]][-call_idx], second_reduction_cp[-call_idx])
## [1] "target is NULL, current is list"
all.equal( cpr0[["cps"]][[4]][-call_idx], third_reduction_cp [-call_idx])
## [1] "target is NULL, current is list"
all.equal( cpr0[["cps"]][[3]][-call_idx], fourth_reduction_cp[-call_idx])
## [1] "target is NULL, current is list"
all.equal( cpr0[["cps"]][[2]][-call_idx], fifth_reduction_cp [-call_idx])
## [1] "target is NULL, current is list"
all.equal( cpr0[["cps"]][[1]][-call_idx], sixth_reduction_cp [-call_idx], check.attributes = FALSE)
## [1] "target is NULL, current is list"
In the manual process we identified third_reduction_cp
as the preferable model. For the cpr0
object we can quickly
see a similar result as we did for the manual process.
dfs | n_iknots | iknots | loglik | rss | rse | wiggle | fdsc | Pr(>w_(1)) | |
---|---|---|---|---|---|---|---|---|---|
1 | 4 | 0 | -74.52286 | 25.991002 | 0.5203264 | 46.86602 | 2 | NA | |
2 | 5 | 1 | 4.5 | -60.13965 | 19.493599 | 0.4529854 | 46.12997 | 2 | 0.0000000 |
3 | 6 | 2 | 1, 4.5 | -22.06566 | 9.103022 | 0.3111923 | 113.09010 | 4 | 0.0000000 |
4 | 7 | 3 | 1, 3, 4.5 | -19.96695 | 8.728836 | 0.3063633 | 99.30959 | 4 | 0.0000965 |
5 | 8 | 4 | 1, 1.5, …. | -19.95683 | 8.727070 | 0.3079926 | 95.81790 | 4 | 0.6315108 |
6 | 9 | 5 | 1, 1.5, …. | -19.79771 | 8.699341 | 0.3091879 | 92.04602 | 4 | 0.0723383 |
7 | 10 | 6 | 1, 1.5, …. | -19.67393 | 8.677833 | 0.3105163 | 87.38456 | 4 | 0.0666277 |
The additional columns in this summary, loglik_elbow
and
rse_elbow
, indicate a location in the plot for either the
loglik or rse by model index (degrees of freedom) where the trade-off
between additional degrees of freedom and improvement in the fix
statistic is negligible. See plot below. This is determined by finding
the breakpoint such that a continuous, but not differentiable at
breakpoint, quadratic spline fits the plot with minimal residual
standard error.
ggpubr::ggarrange(
ggpubr::ggarrange(
plot(cpr0, color = TRUE)
, plot(cpr0, show_cp = FALSE, show_spline = TRUE, color = TRUE)
, ncol = 1
, common.legend = TRUE
)
,
ggpubr::ggarrange(
plot(s0, type = "rse")
, plot(s0, type = "rss")
, plot(s0, type = "loglik")
, plot(s0, type = "wiggle")
, plot(s0, type = "fdsc")
, plot(s0, type = "Pr(>w_(1))")
, common.legend = TRUE
)
, widths = c(2, 3)
)
## Warning: Removed 1 rows containing missing values (`geom_point()`).
## Warning: Removed 1 row containing missing values (`geom_line()`).
## Warning: Removed 1 rows containing missing values (`geom_point()`).
## Warning: Removed 1 row containing missing values (`geom_line()`).
In practice it is be extremely unlikely to know where knots should be placed. Analytic solutions are difficult, if not impossible to derive (Jupp 1978). However, an optimal solution may not be necessary.
From de Boor (2001) (page 106)
“…a B-spline doesn’t change much if one changes its \(k+1\) knots a little bit. Therefore, if one
has multiple knots, then it is very easy to find B-spline almost like
with simple knots: Simply replace knot of multiplicity \(r > 1\) by \(r\) simple knots nearby.”
That is,
\[ \boldsymbol{B}_{k, \boldsymbol{\xi}}\left(x\right) \boldsymbol{\theta}_{\boldsymbol{\xi}} \approx \boldsymbol{B}_{k, \boldsymbol{\xi'}}\left(x\right) \boldsymbol{\theta}_{\boldsymbol{\xi'}} \] where \[ \left\lvert \xi_j - \xi_j' \right\rvert < \delta \quad \text{for small} \quad \delta > 0. \]
So, in the case when we are looking for a good set of knots (a good set of knots should be parsimonious, and provide a good model fit) we start with an initial knot sequence with a high cardinality, this is not without precedent Eilers and Marx (2010). We then apply the CPR algorithm to find a good set of knots.
For example we will use 50 internal knots. Not surprisingly we have a fit that is more “connect-the-dots” than a smooth fit.