This one always gets me. I've learned and forgotten how splines work many times over the years, and when I need to relearn it, I read the _Moving Beyond Linearity_ chapter of [An Introduction to Statistical Learning](https://www.statlearning.com/).
**Basis Functions** are just a general approach for transforming a simple model like:
\[Y_i = \beta_0 + \beta_1X_i + \epsilon_i\] into a linear combination of transformations of the \(X_i\) of the form:
\[Y_i = \beta_0 + \beta_1b_1(X_i) + \beta_2b_2(X_i) + ... + \beta_Kb_K(X_i) + \epsilon_i\] where each \(b_j\) is a known function that transforms the predictor variable. _Note: \(\beta_1\) is not the same in both of these; each is just a placeholder for an arbitrary parameter._ For example, in the case of a piecewise regression where the model is of the form:
\[Y_i = \beta_0 + \beta_1I(X_i < c_1) + \beta_2I(c_1 < X_i < c_2) + ... + \beta_KI(c_{K-1} < X_i < c_K) + \epsilon_i\] the indicators are _basis functions_ such that:
\[b_j(X_i) = I(c_{j-1} < X_i < c_j) \hskip.1in \text{for } j=1,\dots,K\] or in a polynomial model, the basis functions are \(b_j(X_i) = X_i^j\).
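As a quick illustration (a minimal sketch on made-up data, with hypothetical cutoffs \(c_1 = 3\) and \(c_2 = 7\); none of this comes from the analysis later in the post), we can fit such a piecewise-constant model by passing the indicators directly to `lm`:

```{r}
# Hypothetical illustration: indicator (piecewise-constant) basis functions
set.seed(42)
x_demo <- runif(200, 0, 10)
y_demo <- ifelse(x_demo < 3, 1, ifelse(x_demo < 7, 4, 2)) + rnorm(200)

# One indicator per interval; the first interval is absorbed into the intercept
fit_pw <- lm(y_demo ~ I(x_demo >= 3 & x_demo < 7) + I(x_demo >= 7))
coef(fit_pw) # intercept ~ mean of the first interval; other terms are offsets
```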
**Knots** are points (cutoffs) along \(X_i\) at which a _local_ regression starts/ends. For example, we might fit a cubic model (e.g., with parameters \(\beta_1, \beta_2, \beta_3\)) where \(X_i < C\), and another model (with a _separate_ set of \(\beta_1, \beta_2, \beta_3\)) where \(X_i \geq C\). \(C\) is a _knot_. In this sense, the piecewise regression above was also a polynomial regression with degree 0 and knots at each value of \(c_j\).
The general problem with an unconstrained polynomial model is that there are no restrictions forcing a smooth function across \(X\), so there can be discontinuities at the knots. Thus, restrictions need to be put in place, such as (1) making the function continuous at the knot(s), and, even further, (2) making the first and second derivatives continuous at the knots. Each of these restrictions reduces the complexity of the model (i.e., the number of parameters we estimate).
# Cubic Splines
A cubic spline with \(K\) knots uses \(K+4\) parameters. A standard way to construct it is to use (1) the basis of a cubic polynomial (\(x, x^2, x^3\)) and (2) a truncated power basis _for each knot_:
\[h(x,\nu) = {(x-\nu)}_+^3\] where \(\nu\) is the knot location. Thus, a _one-knot_ model looks like:
\[Y_i = \beta_0 + \beta_1X_i + \beta_2X_i^2 + \beta_3X_i^3 + \beta_4h(X_i,\nu_1) + \epsilon_i\] We can add more knots as needed; each new knot simply adds another \(h(x,\nu)\) term (so one more parameter per knot). A function of this form is guaranteed to have continuous first and second derivatives.
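To make the construction concrete, here is a minimal sketch (on simulated data, with an arbitrary knot at 0; none of these names or values come from the fit below) of building the truncated power basis by hand and passing it to `lm`:

```{r}
# Hypothetical illustration: a one-knot cubic spline fit "by hand"
set.seed(1)
x_demo <- runif(200, -2, 2)
y_demo <- sin(2 * x_demo) + rnorm(200, sd = 0.3)
nu <- 0 # arbitrary knot location, for illustration only

# Truncated power basis term: (x - nu)_+^3
h <- function(x, nu) pmax(x - nu, 0)^3

# K = 1 knot -> K + 4 = 5 parameters (including the intercept)
fit_spline <- lm(y_demo ~ x_demo + I(x_demo^2) + I(x_demo^3) + h(x_demo, nu))
coef(fit_spline)
```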
**So how does this relate to what is produced in the `rcs` function from the `rms` package?**
Well, the package actually fits a _restricted_ cubic spline, which is a _natural spline_. This adds even more restrictions than the general cubic spline by forcing the function to be _linear_ where \(X\) is less than the smallest knot and where \(X\) is larger than the largest knot (i.e., beyond the boundaries). This adds an additional _two_ constraints at _each_ boundary. So if we have a regular cubic spline model as above with 3 knots (i.e., 7 parameters), then a _restricted_ cubic spline model with 3 knots should have only 3 parameters.
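Let's verify with some quick simulated data:

```{r}
set.seed(123)
x <- rnorm(100)
y <- .5 * x^2 - .75 * x + rnorm(100)
x_trans <- rms::rcs(x, 3)
head(x_trans)
summary(lm(y ~ x_trans))
```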
```
Call:
lm(formula = y ~ x_trans)

Residuals:
    Min      1Q  Median      3Q     Max 
-1.9888 -0.7341 -0.0803  0.6900  3.2215 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)  -0.3838     0.1842  -2.084   0.0398 *  
x_transx     -1.6552     0.2449  -6.758 1.04e-09 ***
x_transx'     1.2922     0.2925   4.417 2.61e-05 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.9823 on 97 degrees of freedom
Multiple R-squared:  0.3805,	Adjusted R-squared:  0.3677 
F-statistic: 29.79 on 2 and 97 DF,  p-value: 8.214e-11
```
We can see this model contains three estimated parameters, as expected. The actual transformation used by the restricted cubic spline to produce, in general, the \(K-2\) additional (nonlinear) predictors is more complex than that of the cubic spline (although similar). In this case, the 3 knot positions were selected to be:
```{r}
knots <- attr(x_trans, "parms")
knots
```

```
[1] -1.06822046  0.06175631  1.26449867
```
Note: these are just the \(10^{th}, 50^{th}, 90^{th}\) percentiles:
```{r}
quantile(x, c(.1, .5, .9))
```

```
        10%         50%         90% 
-1.06822046  0.06175631  1.26449867 
```
The following transformation is made (general solution [here](https://support.sas.com/resources/papers/proceedings16/5621-2016.pdf)):
\[X_{trans} = (x-\nu_1)_+^3 - (x-\nu_2)_+^3\frac{\nu_3-\nu_1}{\nu_3-\nu_2} + (x-\nu_3)_+^3\frac{\nu_2-\nu_1}{\nu_3-\nu_2}\] Let’s check out that transformation on our data:
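```{r}
tibble::tibble(
  x = as.numeric(x_trans[, "x"]),
  x_trans_actual = as.numeric(x_trans[, "x'"]),
  x_trans_calculated =
    pmax((x - knots[1])^3, 0) -
      pmax((x - knots[2])^3, 0) * ((knots[3] - knots[1]) / (knots[3] - knots[2])) +
      pmax((x - knots[3])^3, 0) * ((knots[2] - knots[1]) / (knots[3] - knots[2]))
)
```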
For some reason this is close, but off by a constant factor (close to 5)? Looking into the documentation/code, it is because of the `norm` argument in the `Hmisc::rcspline.eval` function. When we run it with `norm = 0`, we get the same result that we calculated (the original restricted cubic spline calculation):
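```{r}
head(Hmisc::rcspline.eval(x, nk = 3, norm = 0))
```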
By default, this function uses `norm = 2`, which _"normalizes by the square of the spacing between the first and last knots...has the advantage of making all nonlinear terms be on the x-scale"_:
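```{r}
head(Hmisc::rcspline.eval(x, nk = 3, norm = 2))
```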
---title: "The Evasive Spline"description: "Basis Functions & Splines"author: "Alex Zajichek"date: "1/13/2023"image: "feature.png"categories: - Estimation - Modelingformat: html: code-fold: true code-tools: true---This one always gets me. I've learned and forgot how splines work many times over the years, and when I need to relearn it, I read the _Moving Beyond Linearity_ chapter of [An Introduction to Statistical Learning](https://www.statlearning.com/). **Basis Functions** are just a general approach for transforming a simple model like:$$Y_i = \beta_0 + \beta_1X_i + \epsilon_i$$into a linear combination of transformations of the $X_i$ of the form:$$Y_i = \beta_0 + \beta_1b_1(X_i) + \beta_2b_2(X_i) + ... + \beta_Kb_k(X_i) + \epsilon_i$$where $b_i$ is a known function that transforms the predictor variable. _Note: $\beta_1$ is not the same in both of these, they are just placeholders for an arbitrary parameter._ For example, in the case of a piecewise regression where the model is of the form:$$Y_i = \beta_0 + \beta_1I(X_i < c_1) + \beta_2I(c_1 < X_i < c_2) + ... + \beta_kI(c_{k-1} < X_i < c_k) + \epsilon_i$$the indicators are _basis functions_ such that:$$b_j(X_i) = I(c_{j-1} < X_i < c_j) \hskip.1in \text{for j=1,..,k}$$or in a polynomial model, the basis functions are $b_j(X_i) = X_i^j$.**Knots** are points (cutoffs) along $X_i$ that a _local_ regression starts/ends. For example, we might fit a cubic model (e.g., with parameters $\beta_1, \beta_2, \beta_3$) where $X_i < C$, and another model (with a _separate_ set of $\beta_1, \beta_2, \beta_3$) where $X_i \geq C$. $C$ is a _knot_. In this sense, the piecewise regression above was also a polynomial regression with degree 0, and knots at each value of $c_j$.The general problem with an unconstrained polynomial model is that there are no restrictions that force a smooth function across $X$, so there are discontinuities. Thus, restrictions need to be put in place such as (1) making it continuous at the knot(s), and/or even further, (2) making the first and second derivatives continuous at the knots. These restrictions reduce the complexity of the model (i.e., the number of parameters we estimate). # Cubic SplinesA cubic spline with $K$ knots uses $K+4$ parameters. The best way to do this is to use (1) the basis of a cubic polynomial ($x, x^2, x^3$) and (2) a truncated power basis _for each knot_:$$h(x,\nu) = {(x-\nu)}_+^3$$where $\nu$ is the knot location. Thus, a _one-knot_ model looks like:$$Y_i = \beta_0 + \beta_1X_i + \beta_2X_i^2 + \beta_3X_i^3 + \beta_4h(X_i,\nu_1) + \epsilon_i$$We can add more knots as needed, and it simply adds $h(x,\nu)$ terms only (so 1 more parameter per knot). A function of this form is guaranteed to have continuous first and second derivatives.**So how does this relate to what is produced in the `rcs` function from the `rms` package?**Well, the package actually fits a _restricted_ cubic spline, which is a _natural spline_. This adds even more restrictions that the general cubic spline by forcing it to be _linear_ where $X$ is less than the smallest knot and where $X$ is larger than the largest not (i.e., the boundaries). These add an additional _two_ constraints at _each_ boundary. 
So if we have a regular cubic spline model above with 3 knots (i.e., 7 parameters), then a _restricted_ cubic spline model with 3 knots should have only 3 parameters.```{r}set.seed(123)x <-rnorm(100)y <- .5*x^2- .75*x +rnorm(100)x_trans <- rms::rcs(x, 3)head(x_trans)summary(lm(y~x_trans))```We can see this model contains three estimated parameters as expected. The actual transformation completed for the restricted cubic spline in producing, in general, the $K-2$ additional predictors is more complex than the cubic spline (although similar). In this case, the 3 knot positions were selected to be:```{r}knots <-attr(x_trans, "parms")knots```Note, these are just the $10^{th}, 50^{th}, 90^{th}$ percentiles:```{r}quantile(x, c(.1,.5,.9))```The following transformation is made (general solution [here](https://support.sas.com/resources/papers/proceedings16/5621-2016.pdf)):$$X_{trans} = (x-\nu_1)_+^3 - (x-\nu_2)_+^3\frac{\nu_3-\nu_1}{\nu_3-\nu_2} + (x-\nu_3)_+^3\frac{\nu_2-\nu_1}{\nu_3-\nu_2}$$Let's check out that transformation on our data:```{r}tibble::tibble(x =as.numeric(x_trans[,"x"]),x_trans_actual =as.numeric(x_trans[,"x'"]),x_trans_calculated =pmax((x-knots[1])^3, 0) -pmax((x-knots[2])^3, 0) * ((knots[3]-knots[1]) / (knots[3]-knots[2])) +pmax((x-knots[3])^3, 0) * ((knots[2]-knots[1])/(knots[3]-knots[2])))```For some reason this is close but off by a factor close to 5? Looking into the documentation/code, it is because of the `norm` argument in the `Hmisc::rcspline.eval` function. When we run this, we get the same result that we calculated (which is the original restricted cubic spline calculation):```{r}head(Hmisc::rcspline.eval(x,nk=3, norm =0))```By default, this function uses `norm=2`, which _"normalizes by the square of the spacing between the first and last knots...has the advantage of making all nonlinear terms be on the x-scale"_.```{r}head(Hmisc::rcspline.eval(x,nk=3, norm =2))```