Control Polygon Reduction

Peter E. DeWitt

library(cpr)
packageVersion("cpr")
## [1] '0.4.0'

The purpose of this vignette is to illustrate how the Control Polygon Reduction (CPR) method can be used to select a set of knots defining B-spline to get a low degree of freedom and smooth fit to data. We start with a primer on B-splines and control polygons then the development and use of CPR.

B-splines and Control Polygons

The term “spline” is likely derived from shipwright or draftsmen splines, thin wood strips, held in place by weights, used to define curves. These splines naturally minimize strain energy and the use of additional weights at strategic locations on the spline are needed to achieve specific curvatures. Cubic B-splines are not dissimilar.

Splines are piecewise polynomial curves that are differentiable up to a prescribed order. B-splines are based on a basis of polynomial functions.

Definitions and notation for uni-variable and multi-variable B-splines and the associated control polygons and control nets are presented in the following. Two very good references for splines are de Boor (2001) and Prautzsch, Boehm, and Paluszny (2002) if you wish to dig into the details.

B-splines

A curve defined by \(f\left(x\right)\) is a spline of order \(k\) (degree \(k -1\)), with knots \(\xi_1, \xi_2, \ldots, \xi_m\) where \(\xi_j \leq x_{j+1}\) and \(\xi_{j} < \xi_{j + k} \forall j\) if \(f\left(x\right)\) is \(k-r-1\) times differentiable at any \(r\)-fold knot, and \(f\left(x\right)\) is a polynomial of order \(\leq k\) over each interval \(x \in \left[\xi_j, \xi_{j+1}\right]\) for \(j = 0, \ldots, m - 1.\)

In particular, B-splines, are defined as an affine combination:

\[\begin{equation} f \left( x \right) = \sum_{j} \theta_j B_{j,k,\boldsymbol{\xi}}\left(x\right) = \boldsymbol{B}_{k, \boldsymbol{\xi}} \left(x\right) \boldsymbol{\theta}_{\boldsymbol{\xi}} \label{eq:f} \end{equation}\]

where \(B_{j,k,\boldsymbol{\xi}}\left(x\right)\) is the \(j^{th}\) basis spline function, \(\boldsymbol{\xi}\) is a sequence of \(l\) interior knots and a total of \(2k\) boundary knots, i.e., the cardinality of the knot sequence is \(\left\lvert \boldsymbol{\xi} \right\rvert = 2k + l.\) The value of the \(k\) boundary knots is arbitrary, but a common choice is to use \(k\)-fold knots on the boundary:

\[ \xi_1 = \xi_2 = \cdots = \xi_k < \xi_{k+1} \leq \cdots \leq \xi_{k + l} = \xi_{k + l + 1} = \cdots = \xi_{2k + l}.\]

Alternative boundary knots can be used so long as the sequence \(\boldsymbol{\xi}\) is non-decreasing. More on the implications of \(k\)-fold boundary knots follow in the next section.

The \(j^{th}\) B-spline is defined as:

\[\begin{equation} B_{j,k,\boldsymbol{\xi}}\left(x\right) = \omega_{j,k,\boldsymbol{\xi}} \left(x\right) B_{j,k-1,\boldsymbol{\xi}}\left(x\right) + \left(1 - \omega_{j+1,k,\boldsymbol{\xi}} \right) B_{j+1,k-1,\boldsymbol{\xi}}\left(x\right), \end{equation}\] where \[\begin{equation} B_{j,k,\boldsymbol{\xi}}\left(x\right) = 0 \quad \text{for} \quad x \notin \left[\xi_j, \xi_{j+k} \right), \quad \quad B_{j,1,\boldsymbol{\xi}}\left(x\right) = \begin{cases} 1 & x \in \left[\xi_j, \xi_{j+k}\right) \\ 0 & \text{otherwise}, \end{cases} \end{equation}\] and \[\begin{equation} \omega_{j,k,\boldsymbol{\xi}}\left(x\right) = \begin{cases} 0 & x \leq \xi_j \\ \frac{x-\xi_j}{\xi_{j+k-1} - \xi_{j}} & \xi_j < x < \xi_{j+k-1} \\ 1 & \xi_{j+k-1} \leq x. \end{cases} \end{equation}\]

For a set of observations, \(\boldsymbol{x} = x_1, \ldots, x_n,\) the basis functions defined above generalize to a matrix:

\[\begin{equation} \boldsymbol{B}_{k, \boldsymbol{\xi}} \left(\boldsymbol{x}\right) = \begin{pmatrix} B_{1, k, \boldsymbol{\xi}} \left(x_1\right) & B_{2, k, \boldsymbol{\xi}} \left(x_1\right) & \ldots & B_{k + l, k, \boldsymbol{\xi}} \left(x_1\right) \\ \vdots & \vdots & \ddots & \vdots \\ B_{1, k, \boldsymbol{\xi}} \left(x_n\right) & B_{2, k, \boldsymbol{\xi}} \left(x_n\right) & \ldots & B_{k + l, k, \boldsymbol{\xi}} \left(x_n\right). \end{pmatrix} \end{equation}\]

Within the cpr package we can generate a basis matrix thusly:

x <- seq(0, 5.9999, length.out = 5000)
bmat <- bsplines(x, iknots = c(1, 1.5, 2.3, 4, 4.5), bknots = c(0, 6))
bmat
## Basis matrix dims: [5,000 x 9]
## Order: 4
## Number of internal knots: 5
## 
## First 6 rows:
## 
##           [,1]        [,2]         [,3]         [,4] [,5] [,6] [,7] [,8] [,9]
## [1,] 1.0000000 0.000000000 0.000000e+00 0.000000e+00    0    0    0    0    0
## [2,] 0.9964037 0.003593461 2.878634e-06 5.011451e-10    0    0    0    0    0
## [3,] 0.9928160 0.007172539 1.150485e-05 4.009161e-09    0    0    0    0    0
## [4,] 0.9892369 0.010737255 2.586411e-05 1.353092e-08    0    0    0    0    0
## [5,] 0.9856664 0.014287632 4.594188e-05 3.207329e-08    0    0    0    0    0
## [6,] 0.9821045 0.017823691 7.172363e-05 6.264314e-08    0    0    0    0    0

Note: the default order for bsplines is 4, and the default for the boundary knots is the range of x. However, relying on the default boundary knots can lead to unexpected behavior as, by definition the splines on the \(k\)-fold upper boundary is 0.

We can quickly view the plot of each of these spline functions as well.

plot(bmat, show_xi = TRUE, show_x = TRUE)

A few of many important properties of the basis functions:

\[B_{j,k,\boldsymbol{xi}} \left(x\right) \in \left[0, 1\right], \forall x \in \mathbb{R};\]

all(bmat >= 0)
## [1] TRUE
all(bmat <= 1)
## [1] TRUE

\[B_{j,k,\boldsymbol{\xi}} \left(x\right) > 0 \quad \text{for} \quad x \in \left(\xi_{j}, \xi_{j+k}\right);\]

and

\[\sum_j B_{j,k,\boldsymbol{xi}} \left(x\right) = \begin{cases} 1 & x \in \left[\xi_1, \xi_{2k+l}\right) \\ 0 & otherwise. \end{cases}\]

all.equal(rowSums(bmat), rep(1, nrow(bmat)))
## [1] TRUE

cpr::bsplines vs splines::bs

Part of the base R distribution is the splines package which build B-splines by calling bs. There are three areas where the functions differ: 1. input arguments, 2. attributes of the returned matrix, and 3. behavior at the right boundary knot.

Input Arguments

args(bsplines)
## function (x, iknots = NULL, df = NULL, bknots = range(x), order = 4L) 
## NULL
args(splines::bs)
## function (x, df = NULL, knots = NULL, degree = 3, intercept = FALSE, 
##     Boundary.knots = range(x), warn.outside = TRUE) 
## NULL
cpr::bsplines splines::bs Notes
x x numeric vector; the predictor variable
iknots knots internal knots
bknots Boundary.knots boundary knots
order degree polynomial order = polynomial degree + 1
df df degrees of freedom
intercept
warn.outside

bsplines does not have the intercept nor the warn.outside arguments because the matrix generated by bsplines effectively always has intercept = TRUE and warn.outside = TRUE. How values of x at the right boundary, and outside the boundary are treated also differ between the two functions.

Attributes of the returned matrices

The default call for both B-spline functions returns a basis matrix for a order 4 (degree 3; cubic) B-spline with boundary knots placed at range(x). However, the returns are not the same.

bs_mat <- splines::bs(x, knots = attr(bmat, "iknots"), Boundary.knots = attr(bmat, "bknots"))
str(attributes(bmat))
## List of 10
##  $ dim        : int [1:2] 5000 9
##  $ order      : num 4
##  $ df         : num 9
##  $ iknots     : num [1:5] 1 1.5 2.3 4 4.5
##  $ bknots     : num [1:2] 0 6
##  $ xi         : num [1:13] 0 0 0 0 1 1.5 2.3 4 4.5 6 ...
##  $ xi_star    : num [1:9] 0 0.333 0.833 1.6 2.6 ...
##  $ class      : chr [1:2] "cpr_bs" "matrix"
##  $ call       : language bsplines(x = x, iknots = c(1, 1.5, 2.3, 4, 4.5), bknots = c(0, 6))
##  $ environment:<environment: R_GlobalEnv>
str(attributes(bs_mat))
## List of 7
##  $ dim           : int [1:2] 5000 8
##  $ dimnames      :List of 2
##   ..$ : NULL
##   ..$ : chr [1:8] "1" "2" "3" "4" ...
##  $ degree        : int 3
##  $ knots         : num [1:5] 1 1.5 2.3 4 4.5
##  $ Boundary.knots: num [1:2] 0 6
##  $ intercept     : logi FALSE
##  $ class         : chr [1:3] "bs" "basis" "matrix"

The bspline_mat has additional attributes related to the control polygons.

The major difference is the in the dimension of the matrices. By default splines::bs omits one column from the basis matrix such that when using using the function is a regression formula the resulting design matrix is not rank deficient. Using bsplines would suggest using a +0 or -1 in the regression formula to omit the intercept (is nuance is handled in calls to cp so the end user need not worry about it).

Right Continuity

By definition, the \(\boldsymbol{B}_{j,k,\boldsymbol{\xi}}\left(x\right)\) are non-negative right-continuous functions. bsplines adheres to the definition strictly, whereas splines::bs uses a pivoting method to allow for non-zero extrapolations outside the support.

Example: for the cpr::bsplines call, notice that the first, third, and fifth rows, corresponding to values outside the support are all zeros as are the row sums. Compare that to the splines::bs which returns negative values and in the matrix, and all rows sum to 1.

bspline_eg <- bsplines(c(0, 1, 2, 5, 6), bknots = c(1, 5))
## Warning in bsplines(c(0, 1, 2, 5, 6), bknots = c(1, 5)): At least one x value <
## min(bknots)
## Warning in bsplines(c(0, 1, 2, 5, 6), bknots = c(1, 5)): At least one x value
## >= max(bknots)
bs_eg      <- splines::bs(c(0, 1, 2, 5, 6), Boundary.knots = c(1, 5), intercept = TRUE )
## Warning in splines::bs(c(0, 1, 2, 5, 6), Boundary.knots = c(1, 5), intercept =
## TRUE): some 'x' values beyond boundary knots may cause ill-conditioned bases

head(bspline_eg)
##          [,1]     [,2]     [,3]     [,4]
## [1,] 0.000000 0.000000 0.000000 0.000000
## [2,] 1.000000 0.000000 0.000000 0.000000
## [3,] 0.421875 0.421875 0.140625 0.015625
## [4,] 0.000000 0.000000 0.000000 0.000000
## [5,] 0.000000 0.000000 0.000000 0.000000
rowSums(bspline_eg)
## [1] 0 1 1 0 0

head(bs_eg)
##              1         2         3         4
## [1,]  1.953125 -1.171875  0.234375 -0.015625
## [2,]  1.000000  0.000000  0.000000  0.000000
## [3,]  0.421875  0.421875  0.140625  0.015625
## [4,]  0.000000  0.000000  0.000000  1.000000
## [5,] -0.015625  0.234375 -1.171875  1.953125
rowSums(bs_eg)
## [1] 1 1 1 1 1

Control Polygons

The spline \(f\left(\boldsymbol{x}\right) = \boldsymbol{B}_{k,\boldsymbol{\xi}}\left(\boldsymbol{x}\right)\) is a convex sum of the coefficients \(\boldsymbol{\theta}_{\boldsymbol{\xi}}.\) A meaningful geometric relationship between \(\boldsymbol{\xi}\) and \(\boldsymbol{\theta}_{\boldsymbol{\xi}}\) exist in the form of a control polygon, \(CP_{k,\boldsymbol{\xi},\boldsymbol{\theta}_{\boldsymbol{\xi}}},\) a strong convex hull for \(\boldsymbol{B}_{k,\boldsymbol{\xi}}\left(\boldsymbol{x}\right) \boldsymbol{\theta}_{\boldsymbol{\xi}}.\)

\[\begin{equation} CP_{k,\boldsymbol{\xi},\boldsymbol{\theta}_{\xi}} = \left\{ \left( \xi_{j}^{*}, \theta_{j,\boldsymbol{\xi}} \right) \right \}_{j=1}^{\left\lvert\boldsymbol{\theta}_{\boldsymbol{\xi}}\right\rvert}; \quad \quad \xi_{j}^{*} = \frac{1}{k-1} \sum_{i = 1}^{k-1} \xi_{j + i}. \end{equation}\]

\(CP_{k,\boldsymbol{\xi},\boldsymbol{\theta}_{\boldsymbol{\xi}}}\) is a sequence of \(\left\lvert \boldsymbol{\theta}_{\boldsymbol{\xi}} \right\rvert = k + l\) control vertices. The control polygon can be thought of as a piecewise linear function approximating the spline function. Changes in convexity and other subtle characteristics of the spline function are exaggerated by the control polygon.

For example, using the basis matrix defined above and the following coefficients we can easily define a spline function and control polygon.

theta <- matrix(c(1, 0, 3.5, 4.2, 3.7, -0.5, -0.7, 2, 1.5), ncol = 1)
cp0 <- cp(bmat, theta)

Plotting the control polygon and the corresponding spline:

plot(cp0, show_spline = TRUE)

Knot Influence

Spline Spaces and Inserting a Knot

Consider two knot sequences \(\boldsymbol{\xi}\) and \(\boldsymbol{\xi} \cup \boldsymbol{\xi}'.\) Then, for a given polynomial order \(k,\) the spline space \(\mathbb{S}_{k,\boldsymbol{\xi}} \subset \mathbb{S}_{k,\boldsymbol{\xi} \cup \boldsymbol{\xi}'}\) (de Boor 2001, pg135). Given this relationship between spline spaces, and the convex sums generating spline functions, Boehm (1980) presented a method for inserting a knots into the knot sequence such that the spline function is unchanged. Specifically, for \[ \boldsymbol{\xi}' = \left\{\left. \xi_{j}' \right| \min\left(\boldsymbol{\xi}\right) < \xi_j < \max\left(\boldsymbol{\xi}\right) \quad \forall j\right\}, \] then there exist a \(\boldsymbol{\theta}_{\boldsymbol{\xi} \cup \boldsymbol{\xi}'}\) such that \[ \boldsymbol{B}_{k,\boldsymbol{\xi}} \left(x\right)\boldsymbol{\theta}_{\boldsymbol{\xi}} = \boldsymbol{B}_{k,\boldsymbol{\xi}\cup\boldsymbol{\xi}'}\left(x\right) \boldsymbol{\theta}_{\boldsymbol{\xi} \cup \boldsymbol{\xi}'} \quad \forall x \in \left[\min\left(\boldsymbol{\xi}\right), \max\left(\boldsymbol{\xi} \right) \right]. \]

When inserting a singleton \(\xi'\) into \(\boldsymbol{\xi},\) then \[ \boldsymbol{\theta}_{\boldsymbol{\xi} \cup \xi'} = \boldsymbol{W}_{k, \boldsymbol{\xi}}\left(\xi'\right) \boldsymbol{\theta}_{\boldsymbol{\xi}} \] where \(\boldsymbol{W}_{k,\boldsymbol{\xi}}\left(\xi'\right)\) is a \(\left( \left\lvert \boldsymbol{\theta} \right\rvert + 1\right) \times \left\lvert \boldsymbol{\theta} \right\rvert\) lower bi-diagonal matrix \[ \boldsymbol{W}_{k,\boldsymbol{\xi}}\left(\xi'\right) = \begin{pmatrix} 1 & 0 & \cdots & 0 \\ 1 - \omega_{1, k, \boldsymbol{\xi}}\left(\xi'\right) & \omega_{1, k, \boldsymbol{\xi}} \left(\xi'\right) & \cdots & 0 \\ 0 & 1 - \omega_{2, k, \boldsymbol{\xi}}\left(\xi'\right) & \cdots 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & 1 - \omega_{\left\lvert \boldsymbol{\theta} \right\rvert - 1, k, \boldsymbol{\xi}} \left(\xi'\right) & \omega_{\left\lvert \boldsymbol{\theta} \right\rvert - 1, k, \boldsymbol{\xi}} \left(\xi'\right) \\ 0 & 0 & 0 & 1 \end{pmatrix}, \] with \(\omega_{j,k,\boldsymbol{\xi}}\left(x\right)\) as defined above in the de~Boor recursive algorithm. Through recursion we can insert as many knots into \(\boldsymbol{\xi}\) without changing the value of the spline function.

For an example, insert a knot \(\xi' = 3\) into the control polygon defined above.

cp1 <- insert_a_knot(cp0, xi_prime = 3)
plot(cp0, cp1, color = TRUE, show_spline = TRUE)

Assessing influence of \(\xi_j\)

Here we derive a metric for assessing how much influence \(\xi_j \in \boldsymbol{\xi}\) has on \(\boldsymbol{B}_{k,\boldsymbol{\xi}} \left(x\right)\boldsymbol{\theta}_{\boldsymbol{\xi}}.\) Using the relationship defined by Boehm (1980), we can derive a metric for the influence of \(\xi_j\) on the spline function.

Start with an defined \(k, \boldsymbol{\xi},\) and \(\boldsymbol{\theta}_{k,\boldsymbol{\xi}}.\) The relationship \[ \boldsymbol{\theta}_{\boldsymbol{\xi}} = \boldsymbol{W}_{k,\boldsymbol{\xi}\backslash\xi_{j}}\left(\xi_{j}\right) \boldsymbol{\theta}_{\boldsymbol{\xi}\backslash\xi_{j}} \] holds if \(\xi_j\) has zero influence. However, in practice, we would expect that the relationship is \[ \boldsymbol{\theta}_{\boldsymbol{\xi}} = \boldsymbol{W}_{k,\boldsymbol{\xi}\backslash\xi_{j}}\left(\xi_{j}\right) \boldsymbol{\theta}_{\boldsymbol{\xi}\backslash\xi_{j}} + \boldsymbol{d} \] for a set of deviations \(\boldsymbol{d}\) which would be equal to \(\boldsymbol{0}\) if \(\xi_j\) has no influence on the spline.

We can estimate \(\boldsymbol{d}\) via least squares. To simplify the notation in the following we drop some of the subscripts and parenthesis, that is, let \(\boldsymbol{W} = \boldsymbol{W}_{k,\boldsymbol{\xi}\backslash\xi_{j}}\left(\xi_{j}\right).\)

\[ \begin{align} \boldsymbol{d} &= \boldsymbol{\theta}_{\boldsymbol{\xi}} - \boldsymbol{W} \boldsymbol{\theta}_{\boldsymbol{\xi}\backslash\xi_{j}} \\ &= \boldsymbol{\theta}_{\boldsymbol{\xi}} - \boldsymbol{W} \left(\boldsymbol{W}^{T}\boldsymbol{W}\right)^{-1}\boldsymbol{W}^{T} \boldsymbol{\theta}_{\boldsymbol{\xi}} \\ &= \left(\boldsymbol{I} - \boldsymbol{W} \left(\boldsymbol{W}^{T}\boldsymbol{W}\right)^{-1}\boldsymbol{W}^{T}\right) \boldsymbol{\theta}_{\boldsymbol{\xi}} \\ \end{align} \]

Finally, we define the influence of \(\xi_j\) on \(CP_{k,\boldsymbol{\xi},\boldsymbol{\theta}_{\boldsymbol{\xi}}}\) as \[ w_j = \left\lVert \boldsymbol{d} \right\rVert_{2}^{2}. \]

The influence of knots on the spline used in the above section.

x <- influence_of_iknots(cp0)
summary(x)
##   j iknot  influence influence_rank chisq chisq_rank p_value os_p_value
## 1 5   1.0 1.64661178              5    NA         NA      NA         NA
## 2 6   1.5 0.29066719              2    NA         NA      NA         NA
## 3 7   2.3 0.31205029              3    NA         NA      NA         NA
## 4 8   4.0 0.07702981              1    NA         NA      NA         NA
## 5 9   4.5 0.41987740              4    NA         NA      NA         NA

Let’s look at the following plots to explore the influence of \(\xi_{7}\) (the third interior knot in a \(k=4\) order spline) on the spline. In panel (a) we see the original control polygon and spline along with the coarsened control polygon and spline. Note that the there are fewer control points, and the spline deviates from the original. In panel (b) we see that the restored control polygon is the result of inserting \(\xi_7\) into the coarsened control polygon of panel (a). These plots are also a good example of the local support and strong convexity of the control polygons as there are only \(k + 1 = 5\) control points which are impacted by the removal and re-insertion of \(\xi_7.\) Lastly, in panel (c) we see all three control polygons plotted together.

ggpubr::ggarrange(
  ggpubr::ggarrange(
      plot(x, j = 3, coarsened = TRUE, restored = FALSE, color = TRUE, show_spline = TRUE) +
        ggplot2::theme(legend.position = "none")
    , plot(x, j = 3, coarsened = FALSE, restored = TRUE, color = TRUE, show_spline = TRUE) +
      ggplot2::theme(legend.position = "none")
    , labels = c("(a)", "(b)")
    , nrow = 1
  )
  , plot(x, j = 3, coarsened = TRUE, restored = TRUE, color = TRUE, show_spline = TRUE) +
    ggplot2::theme(legend.position = "bottom")
  , labels = c("", "(c)")
  , nrow = 2
  , ncol = 1
  , heights = c(1, 2)
)

Next, consider the influence of \(\xi_8,\) the fourth interior knot. By the influence metric defined above, this is the least influential knot in the sequence. This can be seen easily as the spline between the original and the coarsened spline are very similar, this is despite the apparent large difference in the magnitude of the control point ordinates between the original and coarsened control polygons. However, when re-inserting \(\xi_8\) the recovered control polygon is very similar to the original, hence the low influence of \(\xi_8.\)

ggpubr::ggarrange(
  ggpubr::ggarrange(
      plot(x, j = 4, coarsened = TRUE, restored = FALSE, color = TRUE, show_spline = TRUE) +
        ggplot2::theme(legend.position = "none")
    , plot(x, j = 4, coarsened = FALSE, restored = TRUE, color = TRUE, show_spline = TRUE) +
      ggplot2::theme(legend.position = "none")
    , labels = c("(a)", "(b)")
    , nrow = 1
  )
  , plot(x, j = 4, coarsened = TRUE, restored = TRUE, color = TRUE, show_spline = TRUE) +
    ggplot2::theme(legend.position = "bottom")
  , labels = c("", "(c)")
  , nrow = 2
  , ncol = 1
  , heights = c(1, 2)
)

If you were required to omit an internal knot, it would be preferable to omit \(\xi_8\) over \(\xi_5, \xi_6, \xi_7,\) or \(\xi_9\) as that will have the least impact on the spline approximation of the original functional form.

Fitting B-splines to noisy data

Start with the spline we have been using and add some noise to it.

set.seed(42)
x <- seq(0, 5.99999, length.out = 100)
bmat <- bsplines(x, iknots = c(1, 1.5, 2.3, 4, 4.5), bknots = c(0, 6))
theta <- matrix(c(1, 0, 3.5, 4.2, 3.7, -0.5, -0.7, 2, 1.5), ncol = 1)
DF <- data.frame(x = x, truth = as.numeric(bmat %*% theta))
DF$y <- as.numeric(bmat %*% theta + rnorm(nrow(bmat), sd = 0.3))

original_data_ggplot_layers <-
  list(
    ggplot2::geom_point(data = DF
                        , mapping = ggplot2::aes(x = x, y = y)
                        , inherit.aes = FALSE
                        , color = "#6F263D"
                        , alpha = 0.2)
    ,
    ggplot2::geom_line(data = DF
                       , mapping = ggplot2::aes(x = x, y = truth)
                       , inherit.aes = FALSE
                       , color = "#6F263D"
                       , alpha = 0.6)
  )

ggplot2::ggplot(DF) + ggplot2::theme_bw() + original_data_ggplot_layers

To fit a spline and control polygon to the noisy data use a formula statement in the cp call. In this example we will use the known internal knots and add one extra.

initial_cp <-
  cp(y ~ bsplines(x, iknots = c(1, 1.5, 2.3, 3.0, 4, 4.5), bknots = c(0, 6))
     , data = DF
     , keep_fit = TRUE # default is FALSE
  )

plot(initial_cp, show_spline = TRUE) + original_data_ggplot_layers

The plot above shows the fitted spline is does well at approximating the true spline function. Just to make it perfectly clear, the regression coefficients are the estimates of the ordinates for the control polygon:

initial_cp$fit |> coef() |> unname()
##  [1]  1.0032390  0.6730762  3.0035459  4.4136192  3.5386419  1.7539897
##  [7] -0.4708520 -0.7312097  2.1631310  1.3575369
initial_cp$cp$theta
##  [1]  1.0032390  0.6730762  3.0035459  4.4136192  3.5386419  1.7539897
##  [7] -0.4708520 -0.7312097  2.1631310  1.3575369

Let’s now look at the influence of the internal knots on the fit

initial_cp |>
  influence_of_iknots() |>
  summary()
##    j iknot  influence influence_rank     chisq chisq_rank   p_value os_p_value
## 1  5   1.0 0.16235075              5 0.9100724          5 0.3400952 0.66205444
## 2  6   1.5 0.05785223              3 0.4331666          2 0.5104392 0.11947108
## 3  7   2.3 0.05128404              2 0.5061398          3 0.4768147 0.30134562
## 4  8   3.0 0.08264093              4 0.8363492          4 0.3604430 0.37410640
## 5  9   4.0 0.02585915              1 0.2230694          1 0.6367111 0.06662766
## 6 10   4.5 0.39575696              6 2.6821030          6 0.1014816 0.47378677

The least influential knot is \(\xi_8 = 3.0,\) the extra knot inserted. Good, we this is the expected result.

How would someone determine if the influence was significant? That is, how can we test the null hypothesis \[H_0: w_{j} = 0\]

Under the null hypothesis that the knot has zero influence and under standard ordinary least squares regression assumptions the regression coefficients, (the ordinates of the control polygon are the regression coefficients), are realizations of a Gaussian random variable. Then

\[ w_j = \left(\left(\boldsymbol{I} - \boldsymbol{H}\right)\hat{\boldsymbol{\theta}} \right)^{T} \left[ \left(\left(\boldsymbol{I} - \boldsymbol{H}\right)\hat{\boldsymbol{\theta}} \right) \boldsymbol{\Sigma} \left(\left(\boldsymbol{I} - \boldsymbol{H}\right)\hat{\boldsymbol{\theta}} \right)^{T} \right]^{+} \left(\left(\boldsymbol{I} - \boldsymbol{H}\right)\hat{\boldsymbol{\theta}} \right) \] where \(\boldsymbol{H} = W (W^{T}W)^{-1} W^{T},\) \(\boldsymbol{\Sigma}\) is the variance-covariance matrix for the regression coefficients, and \(+\) denotes the Moore-Penrose inverse of the matrix. By construction, \(\left(\boldsymbol{I} - \boldsymbol{H}\right) \boldsymbol{\Sigma} \left(\boldsymbol{I} - \boldsymbol{H}\right)^{T}\) is singular and thus the standard inverse does not exist and a generalized inverse is necessary. This yields the test statistic following an F distribution with 1, and \(\nu_2\) degrees of freedom, \[ w_j \sim F_{1,\nu_2}.\] The second degree of freedom is dependent on the sample size, the number of regression parameters, and the type of regression used to estimate the ordinates.

To simplify the work, and generalize the approach, we will use the fact that the limiting distribution of \(F_{1, \nu_2}\) as \(\nu_2 \rightarrow \infty\) is \(\chi_{1}^{2},\) that is, \[ w_j \sim \chi_{1}^{2}.\]

Now, if we are interested in removing the knot with the lowest influence we are interested in the minimum. So the hypothesis test we are actually interested in is \[ H_0: w_{(1)} = 0 \] which follows the distribution \[ \Pr \left[ w_{(1)} > w \right] = \sum_{j = 1}^{k+l} \binom{n+l}{j} \left(F_{\chi_{1}^{2}}\left(w\right)\right)^{j} \left( 1 - F_{\chi_{1}^{2}}\left(w\right) \right)^{k+l-j}\] where \(F_{\chi_{1}^{2}}\left(w\right)\) is the distribution function of the chi-square distribution with one degree of freedom.

The results generated by calling influence_of_iknots. report two sets of p-values. The first is the p-value is the probability of observed chisq value greater than reported, and the second p-value is the probability of the rank order statistic exceeding the observed value.

initial_cp |>
  influence_of_iknots() |>
  summary() |>
  knitr::kable(row.names = TRUE)
j iknot influence influence_rank chisq chisq_rank p_value os_p_value
1 5 1.0 0.1623508 5 0.9100724 5 0.3400952 0.6620544
2 6 1.5 0.0578522 3 0.4331666 2 0.5104392 0.1194711
3 7 2.3 0.0512840 2 0.5061398 3 0.4768147 0.3013456
4 8 3.0 0.0826409 4 0.8363492 4 0.3604430 0.3741064
5 9 4.0 0.0258592 1 0.2230694 1 0.6367111 0.0666277
6 10 4.5 0.3957570 6 2.6821030 6 0.1014816 0.4737868

It is worth remembering how fraught binary classification of statistical (non)significance can be. Just because the p-value is low does not mean that the knot is influential, just as a high p-value dose not mean that the knot is not influential. Sample size, over-fitting, and other factors can/will lead to poor selection of a model if you only consider these p-values.

That said, consider \(\xi_9 = 4.0\) which has the lowest influence weight. Let’s omit that knot and refit the model.

first_reduction_cp <-
  cp(y ~ bsplines(x, iknots = c(1, 1.5, 2.3, 3, 4.5), bknots = c(0, 6)), data = DF)
first_reduction_cp |>
  influence_of_iknots() |>
  summary() |>
  knitr::kable(row.names = TRUE)
j iknot influence influence_rank chisq chisq_rank p_value os_p_value
1 5 1.0 0.1957262 4 1.1517249 4 0.2831884 0.4369320
2 6 1.5 0.0345902 2 0.2881794 1 0.5913896 0.0723383
3 7 2.3 0.0238548 1 0.2900566 2 0.5901843 0.3202087
4 8 3.0 0.0782231 3 0.8844555 3 0.3469842 0.2305056
5 9 4.5 5.9074903 5 46.2296402 5 0.0000000 0.0000000

After omitting one knot and refitting the model we see that \(\xi = 2.3\) is the least influential. Just for fun, let’s omit that knot, and refit. Let’s continue that process all the way down to zero knots.

second_reduction_cp <-
  cp(y ~ bsplines(x, iknots = c(1, 1.5, 3, 4.5), bknots = c(0, 6)), data = DF)
second_reduction_cp |>
  influence_of_iknots() |>
  summary() |>
  knitr::kable(row.names = TRUE)
j iknot influence influence_rank chisq chisq_rank p_value os_p_value
1 5 1.0 0.4805752 3 3.8031184 3 0.0511572 0.0146519
2 6 1.5 0.0019515 1 0.0186252 1 0.8914463 0.6315108
3 7 3.0 0.1381508 2 1.3834860 2 0.2395083 0.0450848
4 8 4.5 6.2382229 4 59.9559534 4 0.0000000 0.0000000

The least influential knot in the second reduction is \(\xi = 1.5\) and that will be omitted for the third reduction.

third_reduction_cp <-
  cp(y ~ bsplines(x, iknots = c(1, 3, 4.5), bknots = c(0, 6)), data = DF)
third_reduction_cp |>
  influence_of_iknots() |>
  summary() |>
  knitr::kable(row.names = TRUE)
j iknot influence influence_rank chisq chisq_rank p_value os_p_value
1 5 1.0 3.0669773 2 41.672311 2 0.0000000 0.00e+00
2 6 3.0 0.4304104 1 3.986697 1 0.0458609 9.65e-05
3 7 4.5 6.6872293 3 73.937519 3 0.0000000 0.00e+00

Within the third reduction, the least influential knot is \(\xi = 3.0\) and that knot will be omitted for the fourth reduction.

fourth_reduction_cp <-
  cp(y ~ bsplines(x, iknots = c(1, 4.5), bknots = c(0, 6)), data = DF)
fourth_reduction_cp |>
  influence_of_iknots() |>
  summary() |>
  knitr::kable(row.names = TRUE)
j iknot influence influence_rank chisq chisq_rank p_value os_p_value
1 5 1.0 5.952998 1 107.2956 1 0 0
2 6 4.5 9.517375 2 142.2148 2 0 0

Of the two remaining internal knots, \(\xi - 1.0\) is the least influential and will be omitted for the fifth reduction.

fifth_reduction_cp <-
  cp(y ~ bsplines(x, iknots = 4.5, bknots = c(0, 6)), data = DF)
fifth_reduction_cp |>
  influence_of_iknots() |>
  summary() |>
  knitr::kable(row.names = TRUE)
j iknot influence influence_rank chisq chisq_rank p_value os_p_value
1 5 4.5 4.103958 1 31.66441 1 0 0

Only one knot can be omitted from the fifth to the sixth reduction. Having the sixth reduction, where there are zero internal knots, lets us compare model fits to a model with just a \(k=4\) order polynomial.

sixth_reduction_cp <-
  cp(y ~ bsplines(x, bknots = c(0, 6)), data = DF)
sixth_reduction_cp |>
  influence_of_iknots() |>
  summary()
##   j iknot influence influence_rank chisq chisq_rank p_value os_p_value
## 1 4    NA        NA             NA    NA         NA      NA         NA

Let’s compare all the fits. We will start by looking at the control polygons and splines.

The control polygons, panel (a), we see that the sixth, fifth, and fourth reductions are different from the initial, first, second, and third reductions, and for the splines, the fifth and sixth reductions are easily identified as different.

Next, the graphic below will let us compare these models to the truth, and the observed data.

The dashed black line is the spline fitted to the data (light burgundy dots) and the true value of the target function is the burgundy line. In the fifth reduction there is an easily noticeable difference between the fitted spline and the target function. Between the initial control polygon and the first three reductions it is difficult to visually discern any meaningful difference between the fits.

Thus, I would argue that the third reduction is the preferable model as it has the fewest degrees of freedom while providing a good quality of fit. This conclusion is supported by looking at the residual standard error (rse) \[ rse = \sqrt{ \frac{1}{df} \sum_{i=1}^{n} \left(y_i - f\left(x_i\right)\right)^2 },\] where the degrees of freedom, \(df,\) is the sample size \(n\) minus the number of regression parameters.

list(  initial_cp , first_reduction_cp , second_reduction_cp , third_reduction_cp
     , fourth_reduction_cp , fifth_reduction_cp , sixth_reduction_cp) |>
  rev() |>
  lapply(summary) |>
  do.call(what = rbind, args = _) |>
  cbind(data.frame(reduction = seq(6, 0, by = -1))) |>
  knitr::kable(row.names = TRUE)
dfs n_iknots iknots loglik rss rse wiggle fdsc reduction
1 4 0 -74.52286 25.991002 0.5203264 46.86602 2 6
2 5 1 4.5 -60.13965 19.493599 0.4529854 46.12997 2 5
3 6 2 1, 4.5 -22.06566 9.103022 0.3111923 113.09010 4 4
4 7 3 1, 3, 4.5 -19.96695 8.728836 0.3063633 99.30959 4 3
5 8 4 1, 1.5, …. -19.95683 8.727070 0.3079926 95.81790 4 2
6 9 5 1, 1.5, …. -19.79771 8.699341 0.3091879 92.04602 4 1
7 10 6 1, 1.5, …. -19.67393 8.677833 0.3105163 87.38456 4 0

The wiggle is one measure of wiggliness defined as \[ \int_{\min\left(\boldsymbol{\xi}\right)}^{\max\left(\boldsymbol{\xi}\right)} \left( \frac{d^2}{dx^2} f\left(x\right) \right)^2 dx. \] fdsc reports the number of times the first derivative has a sign change.

Control Polygon Reduction

The exercise above of manually identifying and omitting the knot with the smallest influence in each model would be tedious when working with a large set of initial knots. Fortunately, the process has been automated. Calling cpr on a cpr_cp object defined by a function will automatically omit the internal knot with the lowest influence.

Example with known knots

Apply CPR to the initial_cp from the above example.

cpr0 <- cpr(initial_cp)
cpr0
## A list of control polygons
## List of 7
##  - attr(*, "ioik")=List of 7
##  - attr(*, "class")= chr [1:2] "cpr_cpr" "list"

There are 7 control polygons within the cpr_cpr object, length(initial_cp$iknots) + 1. The indexing is set such at that the \(i^{th}\) element has \(i-1\) internal knots.

Before exploring the results, let’s just verify that the results of the call to cpr are the same as the manual results found about. There are some differences in the metadata of the objects, but the important parts, like the control polygons, are the same.

all.equal( cpr0[["cps"]][[7]][["cp"]],  initial_cp[["cp"]])
## [1] "target is NULL, current is data.frame"

# some attributes are different with the last cp due to how the automation
# creates the call vs how the call was created manually.
call_idx <- which(names(cpr0[["cps"]][[6]]) == "call")
all.equal( cpr0[["cps"]][[6]][-call_idx], first_reduction_cp [-call_idx])
## [1] "target is NULL, current is list"
all.equal( cpr0[["cps"]][[5]][-call_idx], second_reduction_cp[-call_idx])
## [1] "target is NULL, current is list"
all.equal( cpr0[["cps"]][[4]][-call_idx], third_reduction_cp [-call_idx])
## [1] "target is NULL, current is list"
all.equal( cpr0[["cps"]][[3]][-call_idx], fourth_reduction_cp[-call_idx])
## [1] "target is NULL, current is list"
all.equal( cpr0[["cps"]][[2]][-call_idx], fifth_reduction_cp [-call_idx])
## [1] "target is NULL, current is list"
all.equal( cpr0[["cps"]][[1]][-call_idx], sixth_reduction_cp [-call_idx], check.attributes = FALSE)
## [1] "target is NULL, current is list"

In the manual process we identified third_reduction_cp as the preferable model. For the cpr0 object we can quickly see a similar result as we did for the manual process.

s0 <- summary(cpr0)
knitr::kable(s0, row.names = TRUE)
dfs n_iknots iknots loglik rss rse wiggle fdsc Pr(>w_(1))
1 4 0 -74.52286 25.991002 0.5203264 46.86602 2 NA
2 5 1 4.5 -60.13965 19.493599 0.4529854 46.12997 2 0.0000000
3 6 2 1, 4.5 -22.06566 9.103022 0.3111923 113.09010 4 0.0000000
4 7 3 1, 3, 4.5 -19.96695 8.728836 0.3063633 99.30959 4 0.0000965
5 8 4 1, 1.5, …. -19.95683 8.727070 0.3079926 95.81790 4 0.6315108
6 9 5 1, 1.5, …. -19.79771 8.699341 0.3091879 92.04602 4 0.0723383
7 10 6 1, 1.5, …. -19.67393 8.677833 0.3105163 87.38456 4 0.0666277

The additional columns in this summary, loglik_elbow and rse_elbow, indicate a location in the plot for either the loglik or rse by model index (degrees of freedom) where the trade-off between additional degrees of freedom and improvement in the fix statistic is negligible. See plot below. This is determined by finding the breakpoint such that a continuous, but not differentiable at breakpoint, quadratic spline fits the plot with minimal residual standard error.

ggpubr::ggarrange(
  ggpubr::ggarrange(
      plot(cpr0, color = TRUE)
    , plot(cpr0, show_cp = FALSE, show_spline = TRUE, color = TRUE)
    , ncol = 1
    , common.legend = TRUE
  )
  ,
  ggpubr::ggarrange(
      plot(s0, type = "rse")
    , plot(s0, type = "rss")
    , plot(s0, type = "loglik")
    , plot(s0, type = "wiggle")
    , plot(s0, type = "fdsc")
    , plot(s0, type = "Pr(>w_(1))")
  , common.legend = TRUE
  )
  , widths = c(2, 3)
)
## Warning: Removed 1 rows containing missing values (`geom_point()`).
## Warning: Removed 1 row containing missing values (`geom_line()`).
## Warning: Removed 1 rows containing missing values (`geom_point()`).
## Warning: Removed 1 row containing missing values (`geom_line()`).

Example when knots are unknown

In practice it is be extremely unlikely to know where knots should be placed. Analytic solutions are difficult, if not impossible to derive (Jupp 1978). However, an optimal solution may not be necessary.

From de Boor (2001) (page 106) “…a B-spline doesn’t change much if one changes its \(k+1\) knots a little bit. Therefore, if one has multiple knots, then it is very easy to find B-spline almost like with simple knots: Simply replace knot of multiplicity \(r > 1\) by \(r\) simple knots nearby.”

That is,

\[ \boldsymbol{B}_{k, \boldsymbol{\xi}}\left(x\right) \boldsymbol{\theta}_{\boldsymbol{\xi}} \approx \boldsymbol{B}_{k, \boldsymbol{\xi'}}\left(x\right) \boldsymbol{\theta}_{\boldsymbol{\xi'}} \] where \[ \left\lvert \xi_j - \xi_j' \right\rvert < \delta \quad \text{for small} \quad \delta > 0. \]

So, in the case when we are looking for a good set of knots (a good set of knots should be parsimonious, and provide a good model fit) we start with an initial knot sequence with a high cardinality, this is not without precedent Eilers and Marx (2010). We then apply the CPR algorithm to find a good set of knots.

For example we will use 50 internal knots. Not surprisingly we have a fit that is more “connect-the-dots” than a smooth fit.

initial_cp <- cp(y ~ bsplines(x, df = 54, bknots = c(0, 6)), data = DF)

ggpubr::ggarrange(
  plot(initial_cp, show_cp = TRUE, show_spline = FALSE) + original_data_ggplot_layers
  ,
  plot(initial_cp, show_cp = FALSE, show_spline = TRUE) + original_data_ggplot_layers
)