Quick_Start with the changeS Package

An S-Curve Method for Estimating Abrupt and Gradual Changepoints

Using This Document for a Quick Introduction to the Package

Reading through the first example (“Nile River Data”) is sufficient for a quick start.

What This Package Does

The code here implements the method of Jiang et al (2023), whch does changepoint detection/estimation for changes in mean, performed by using an S-curve (logistic function) to approximate a step function. This enables asymptotic standard errors, and associated confidence intervals and tests for changepoint locations and change magnitudes. Both abrupt and gradual changes may be modeled.

Also, changes in slope and intercept in piecewise linear models can be analyzed, again with the ability to form confidence intervals for the various quantities of interest..

Example: Nile River data

The Nile dataset, built-in to R, consists of yearly measurements of the height of the river during 1871-1970. Here is the code:

nile <- data.frame(t=1871:1970, ht=Nile)
nileOut <- fitS(nile,1,2,10)  # abrupt change model
nileOut
# [1] "point estimates of the alpha_i"
#  postMean   preMean  changePt 
#  849.9696 1097.9298 1898.3814 
# [1] "covariance matrix"
#             postMean     preMean    changePt
# postMean 228.8062041   0.7764724  -0.5270315
# preMean    0.7764724 609.8103449 -11.2918541
# changePt  -0.5270315 -11.2918541   6.1601174
# [1] "standard error of the difference between pre-changepoint and post-changepoint means"
# [1] 28.93205

Our model concluded that there is an abrupt changepoint in the middle of year 1898. This matches the fact that the British started the construction of the Aswan Low Dam in that year. We modeled an abrupt change here, by setting the S-curve slope to a large value, 10.

Here domain expertise would identify the location of the changepoint, so the main interest is the value of the change. A major drop of water flow is detected around this point, from 1097.75 down to 849.972. An approximate 95% confidence interval for the change in mean would then be

849.972 - 1097.75 ± 1.96 x 28.93205

We can also plot the fit against the raw data:

Example: Cancer Occurrence Rates

We next consider applying the S-Curve approach to data collected in on the rates of breast cancer among women in Sweden. There had been speculation that such rates rise with the onset of menopause; see Pawitan (2005; we are grateful to Prof. Pawitan for making his data available.

While that paper considers an abrupt model, the relationship, if one exists, may be gradual. For a given woman the transition to menopause is gradual. And even if it were abrupt, different women experience menopause at different ages, so that the data would follow a mixture of abrupt changes, hence gradual overall. Thus this is a good example use case for our S-curve approach.

data(cancerRates)
crOut <- fitS(cancerRates,1,2)
crOut
# [1] "point estimates of the alpha_i"
#  postMean   preMean     slope  changePt 
#  8.965952  3.425623  1.178302 43.262817 
# [1] "covariance matrix"
#             postMean    preMean      slope   changePt
# postMean  0.16734502 -0.1215118 -0.1318708 0.02302526
# preMean  -0.12151178  1.1340273  0.5080011 0.44335717
# slope    -0.13187077  0.5080011  0.4107214 0.16400967
# changePt  0.02302526  0.4433572  0.1640097 0.29987282
# [1] "standard error of the difference between pre-changepoint and post-changepoint means"
# [1] 1.242737

Here we did not specify an S-curve slope, asking fitS to estimate it for us. This models a gradual change.

We can plot the output:

plot(crOut)

Example: piecewise linear model

This dataset came from a study that investigated the impact of Medicare, the US medical insurance program for retired people. One nomimally qualifies at age 65, though this can occur earlier or later. Here we consider 90-day mortality in relation to age.

# get Medicare data of David Card et al
con <- url('https://github.com/108michael/ms_thesis/raw/master/medicare.Rdata')
load(con)
z <- fitS_linear(medicare[,c(4,3)],1,2)
summary(z)

# Formula: y ~ big_linear_guy(b1, h1, s1, c, b2, h2, s2, x = x)
# 
# Parameters:
#     Estimate Std. Error t value Pr(>|t|)    
# b1   0.60490    0.07135   8.478 9.87e-14 ***
# h1   0.78810    0.18587   4.240 4.59e-05 ***
# s1   9.59339  717.41168   0.013  0.98935    
# c   65.99895    0.10660 619.109  < 2e-16 ***
# b2 -23.93643    4.40366  -5.436 3.19e-07 ***
# h2 -37.23297   12.69091  -2.934  0.00406 ** 
# s2   2.89020    1.76829   1.634  0.10495    
# ---
# Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
# 
# Residual standard error: 0.6417 on 113 degrees of freedom
# 
# Number of iterations to convergence: 91 
# Achieved convergence tolerance: 1.49e-08

As we would hope, mortality did decline after people became eligible for Medicare. For instance, the intercept went down from -23.9364322 to -37.2329709. Interestingly, the changepoint is close to 66, indicating that many people opted to start the program a little later.

An approximate 95% confidence interval for the location of the changepoint is 65.99895 ± 1.96 x 0.10660.

Comparison to Other Packages

There are many R packages for determining changepoints. We will mentipn two here for comparison to changeS, the changepoints and mcp packages.

References

An S-Curve Method for Abrupt and Gradual Changepoint Analysis, Lan Jiang, Collin Kennedy, Norman Matloff; SDSS 2023

Encyclopedia of Biostatistics, Change-point Problem Yudi Pawitan, 2005, https://doi.org/10.1002/0470011815.b2a12011