This one always gets me. I've learned and forgotten how splines work many times over the years, and when I need to relearn it, I read the _Moving Beyond Linearity_ chapter of [An Introduction to Statistical Learning](https://www.statlearning.com/).
**Basis Functions** are just a general approach for transforming a simple model like:
\[Y_i = \beta_0 + \beta_1X_i + \epsilon_i\] into a linear combination of transformations of the \(X_i\) of the form:
\[Y_i = \beta_0 + \beta_1b_1(X_i) + \beta_2b_2(X_i) + ... + \beta_Kb_K(X_i) + \epsilon_i\] where each \(b_j\) is a known function that transforms the predictor variable. _Note: \(\beta_1\) is not the same in both of these; each is just a placeholder for an arbitrary parameter._ For example, in the case of a piecewise regression where the model is of the form:
\[Y_i = \beta_0 + \beta_1I(X_i < c_1) + \beta_2I(c_1 < X_i < c_2) + ... + \beta_KI(c_{K-1} < X_i < c_K) + \epsilon_i\] the indicators are _basis functions_ such that:
\[b_j(X_i) = I(c_{j-1} < X_i < c_j) \hskip.1in \text{for } j=1,\dots,K\] or in a polynomial model, the basis functions are \(b_j(X_i) = X_i^j\).
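As a quick illustration (a minimal sketch on made-up data, with hypothetical cutoffs \(c_1 = 3\) and \(c_2 = 7\); none of this comes from the analysis later in the post), we can fit such a piecewise-constant model by passing the indicators directly to `lm`:

```{r}
# Hypothetical illustration: indicator (piecewise-constant) basis functions
set.seed(42)
x_demo <- runif(200, 0, 10)
y_demo <- ifelse(x_demo < 3, 1, ifelse(x_demo < 7, 4, 2)) + rnorm(200)

# One indicator per interval; the first interval is absorbed into the intercept
fit_pw <- lm(y_demo ~ I(x_demo >= 3 & x_demo < 7) + I(x_demo >= 7))
coef(fit_pw) # intercept ~ mean of the first interval; other terms are offsets
```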
**Knots** are points (cutoffs) along \(X_i\) at which a _local_ regression starts/ends. For example, we might fit a cubic model (e.g., with parameters \(\beta_1, \beta_2, \beta_3\)) where \(X_i < C\), and another model (with a _separate_ set of \(\beta_1, \beta_2, \beta_3\)) where \(X_i \geq C\). \(C\) is a _knot_. In this sense, the piecewise regression above was also a polynomial regression with degree 0 and knots at each value of \(c_j\).
The general problem with an unconstrained polynomial model is that there are no restrictions forcing a smooth function across \(X\), so there can be discontinuities at the knots. Thus, restrictions need to be put in place, such as (1) making the function continuous at the knot(s), and, even further, (2) making the first and second derivatives continuous at the knots. Each of these restrictions reduces the complexity of the model (i.e., the number of parameters we estimate).
# Cubic Splines
A cubic spline with \(K\) knots uses \(K+4\) parameters. A standard way to construct it is to use (1) the basis of a cubic polynomial (\(x, x^2, x^3\)) and (2) a truncated power basis _for each knot_:
\[h(x,\nu) = {(x-\nu)}_+^3\] where \(\nu\) is the knot location. Thus, a _one-knot_ model looks like:
\[Y_i = \beta_0 + \beta_1X_i + \beta_2X_i^2 + \beta_3X_i^3 + \beta_4h(X_i,\nu_1) + \epsilon_i\] We can add more knots as needed; each new knot simply adds another \(h(x,\nu)\) term (so one more parameter per knot). A function of this form is guaranteed to have continuous first and second derivatives.
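To make the construction concrete, here is a minimal sketch (on simulated data, with an arbitrary knot at 0; none of these names or values come from the fit below) of building the truncated power basis by hand and passing it to `lm`:

```{r}
# Hypothetical illustration: a one-knot cubic spline fit "by hand"
set.seed(1)
x_demo <- runif(200, -2, 2)
y_demo <- sin(2 * x_demo) + rnorm(200, sd = 0.3)
nu <- 0 # arbitrary knot location, for illustration only

# Truncated power basis term: (x - nu)_+^3
h <- function(x, nu) pmax(x - nu, 0)^3

# K = 1 knot -> K + 4 = 5 parameters (including the intercept)
fit_spline <- lm(y_demo ~ x_demo + I(x_demo^2) + I(x_demo^3) + h(x_demo, nu))
coef(fit_spline)
```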
**So how does this relate to what is produced in the `rcs` function from the `rms` package?**
Well, the package actually fits a _restricted_ cubic spline, which is a _natural spline_. This adds even more restrictions than the general cubic spline by forcing the function to be _linear_ where \(X\) is less than the smallest knot and where \(X\) is larger than the largest knot (i.e., beyond the boundaries). This adds an additional _two_ constraints at _each_ boundary. So if we have a regular cubic spline model as above with 3 knots (i.e., 7 parameters), then a _restricted_ cubic spline model with 3 knots should have only 3 parameters.
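Let's verify with some quick simulated data:

```{r}
set.seed(123)
x <- rnorm(100)
y <- .5 * x^2 - .75 * x + rnorm(100)
x_trans <- rms::rcs(x, 3)
head(x_trans)
summary(lm(y ~ x_trans))
```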
```
Call:
lm(formula = y ~ x_trans)

Residuals:
    Min      1Q  Median      3Q     Max 
-1.9888 -0.7341 -0.0803  0.6900  3.2215 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)  -0.3838     0.1842  -2.084   0.0398 *  
x_transx     -1.6552     0.2449  -6.758 1.04e-09 ***
x_transx'     1.2922     0.2925   4.417 2.61e-05 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.9823 on 97 degrees of freedom
Multiple R-squared:  0.3805,	Adjusted R-squared:  0.3677 
F-statistic: 29.79 on 2 and 97 DF,  p-value: 8.214e-11
```
We can see this model contains three estimated parameters, as expected. The actual transformation used by the restricted cubic spline to produce, in general, the \(K-2\) additional (nonlinear) predictors is more complex than that of the cubic spline (although similar). In this case, the 3 knot positions were selected to be:
```{r}
knots <- attr(x_trans, "parms")
knots
```

```
[1] -1.06822046  0.06175631  1.26449867
```
Note: these are just the \(10^{th}, 50^{th}, 90^{th}\) percentiles:
```{r}
quantile(x, c(.1, .5, .9))
```

```
        10%         50%         90% 
-1.06822046  0.06175631  1.26449867 
```
The following transformation is made (general solution [here](https://support.sas.com/resources/papers/proceedings16/5621-2016.pdf)):
\[X_{trans} = (x-\nu_1)_+^3 - (x-\nu_2)_+^3\frac{\nu_3-\nu_1}{\nu_3-\nu_2} + (x-\nu_3)_+^3\frac{\nu_2-\nu_1}{\nu_3-\nu_2}\] Let’s check out that transformation on our data:
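```{r}
tibble::tibble(
  x = as.numeric(x_trans[, "x"]),
  x_trans_actual = as.numeric(x_trans[, "x'"]),
  x_trans_calculated =
    pmax((x - knots[1])^3, 0) -
      pmax((x - knots[2])^3, 0) * ((knots[3] - knots[1]) / (knots[3] - knots[2])) +
      pmax((x - knots[3])^3, 0) * ((knots[2] - knots[1]) / (knots[3] - knots[2]))
)
```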
For some reason this is close, but off by a constant factor (close to 5)? Looking into the documentation/code, it is because of the `norm` argument in the `Hmisc::rcspline.eval` function. When we run it with `norm = 0`, we get the same result that we calculated (the original restricted cubic spline calculation):
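```{r}
head(Hmisc::rcspline.eval(x, nk = 3, norm = 0))
```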
By default, this function uses `norm = 2`, which _"normalizes by the square of the spacing between the first and last knots...has the advantage of making all nonlinear terms be on the x-scale"_:
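```{r}
head(Hmisc::rcspline.eval(x, nk = 3, norm = 2))
```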
---title: "The Evasive Spline"description: "Basis Functions & Splines"author: "Alex Zajichek"date: "1/13/2023"image: "feature.png"categories: - Estimation - Modelingformat: html: code-fold: true code-tools: true---This one always gets me. I've learned and forgot how splines work many times over the years, and when I need to relearn it, I read the _Moving Beyond Linearity_ chapter of [An Introduction to Statistical Learning](https://www.statlearning.com/). **Basis Functions** are just a general approach for transforming a simple model like:$$Y_i = \beta_0 + \beta_1X_i + \epsilon_i$$into a linear combination of transformations of the $X_i$ of the form:$$Y_i = \beta_0 + \beta_1b_1(X_i) + \beta_2b_2(X_i) + ... + \beta_Kb_k(X_i) + \epsilon_i$$where $b_i$ is a known function that transforms the predictor variable. _Note: $\beta_1$ is not the same in both of these, they are just placeholders for an arbitrary parameter._ For example, in the case of a piecewise regression where the model is of the form:$$Y_i = \beta_0 + \beta_1I(X_i < c_1) + \beta_2I(c_1 < X_i < c_2) + ... + \beta_kI(c_{k-1} < X_i < c_k) + \epsilon_i$$the indicators are _basis functions_ such that:$$b_j(X_i) = I(c_{j-1} < X_i < c_j) \hskip.1in \text{for j=1,..,k}$$or in a polynomial model, the basis functions are $b_j(X_i) = X_i^j$.**Knots** are points (cutoffs) along $X_i$ that a _local_ regression starts/ends. For example, we might fit a cubic model (e.g., with parameters $\beta_1, \beta_2, \beta_3$) where $X_i < C$, and another model (with a _separate_ set of $\beta_1, \beta_2, \beta_3$) where $X_i \geq C$. $C$ is a _knot_. In this sense, the piecewise regression above was also a polynomial regression with degree 0, and knots at each value of $c_j$.The general problem with an unconstrained polynomial model is that there are no restrictions that force a smooth function across $X$, so there are discontinuities. Thus, restrictions need to be put in place such as (1) making it continuous at the knot(s), and/or even further, (2) making the first and second derivatives continuous at the knots. These restrictions reduce the complexity of the model (i.e., the number of parameters we estimate). # Cubic SplinesA cubic spline with $K$ knots uses $K+4$ parameters. The best way to do this is to use (1) the basis of a cubic polynomial ($x, x^2, x^3$) and (2) a truncated power basis _for each knot_:$$h(x,\nu) = {(x-\nu)}_+^3$$where $\nu$ is the knot location. Thus, a _one-knot_ model looks like:$$Y_i = \beta_0 + \beta_1X_i + \beta_2X_i^2 + \beta_3X_i^3 + \beta_4h(X_i,\nu_1) + \epsilon_i$$We can add more knots as needed, and it simply adds $h(x,\nu)$ terms only (so 1 more parameter per knot). A function of this form is guaranteed to have continuous first and second derivatives.**So how does this relate to what is produced in the `rcs` function from the `rms` package?**Well, the package actually fits a _restricted_ cubic spline, which is a _natural spline_. This adds even more restrictions that the general cubic spline by forcing it to be _linear_ where $X$ is less than the smallest knot and where $X$ is larger than the largest not (i.e., the boundaries). These add an additional _two_ constraints at _each_ boundary. 
So if we have a regular cubic spline model above with 3 knots (i.e., 7 parameters), then a _restricted_ cubic spline model with 3 knots should have only 3 parameters.```{r}set.seed(123)x <-rnorm(100)y <- .5*x^2- .75*x +rnorm(100)x_trans <- rms::rcs(x, 3)head(x_trans)summary(lm(y~x_trans))```We can see this model contains three estimated parameters as expected. The actual transformation completed for the restricted cubic spline in producing, in general, the $K-2$ additional predictors is more complex than the cubic spline (although similar). In this case, the 3 knot positions were selected to be:```{r}knots <-attr(x_trans, "parms")knots```Note, these are just the $10^{th}, 50^{th}, 90^{th}$ percentiles:```{r}quantile(x, c(.1,.5,.9))```The following transformation is made (general solution [here](https://support.sas.com/resources/papers/proceedings16/5621-2016.pdf)):$$X_{trans} = (x-\nu_1)_+^3 - (x-\nu_2)_+^3\frac{\nu_3-\nu_1}{\nu_3-\nu_2} + (x-\nu_3)_+^3\frac{\nu_2-\nu_1}{\nu_3-\nu_2}$$Let's check out that transformation on our data:```{r}tibble::tibble(x =as.numeric(x_trans[,"x"]),x_trans_actual =as.numeric(x_trans[,"x'"]),x_trans_calculated =pmax((x-knots[1])^3, 0) -pmax((x-knots[2])^3, 0) * ((knots[3]-knots[1]) / (knots[3]-knots[2])) +pmax((x-knots[3])^3, 0) * ((knots[2]-knots[1])/(knots[3]-knots[2])))```For some reason this is close but off by a factor close to 5? Looking into the documentation/code, it is because of the `norm` argument in the `Hmisc::rcspline.eval` function. When we run this, we get the same result that we calculated (which is the original restricted cubic spline calculation):```{r}head(Hmisc::rcspline.eval(x,nk=3, norm =0))```By default, this function uses `norm=2`, which _"normalizes by the square of the spacing between the first and last knots...has the advantage of making all nonlinear terms be on the x-scale"_.```{r}head(Hmisc::rcspline.eval(x,nk=3, norm =2))```