Before estimating an SDR subspace, we need to estimate its dimension (d) and the tuning parameters sw2 and st2. Section 2.1 demonstrates the estimation of the dimension (d). Section 2.2.1 explains the estimation of the tuning parameter sw2, which is needed for both the central subspace (CS) and the central mean subspace (CMS). Section 2.2.2 explains the estimation of st2, which is needed only for the CS. Finally, Section 2.3 demonstrates the use of the itdr() function to estimate the central subspace.
2.1: Estimating the dimension (d) of sufficient dimension reduction (SDR) subspaces
A bootstrap procedure is used to estimate the unknown dimension (d) of an SDR subspace; see Zhu and Zeng (2006) for details. The d.boots() function computes this estimate. As an example, we estimate the dimension d of the central subspace for the automobile dataset, using the response variable y and the predictor variables x described in Zhu and Zeng (2006). We pass the following arguments to d.boots(): space="pdf" to target the CS, xdensity="normal" to assume a normal density for the predictors, and method="FM" to use the Fourier transformation method.
# Load the package
library(itdr)
# Use dataset available in itdr package
data(automobile)
head(automobile)
automobile.na=na.omit(automobile)
# prepare response and predictor variables
auto_y=log(automobile.na[,26])
auto_xx=automobile.na[,c(10,11,12,13,14,17,19,20,21,22,23,24,25)]
auto_x=scale(auto_xx) # Standardize the predictors
# call to the d.boots() function with required arguments
d_est=d.boots(auto_y,auto_x,plot=TRUE,space="pdf",xdensity = "normal",method="FM")
auto_d=d_est$d.hat
# Estimated d_hat=2
Here, the estimated dimension of the central subspace for the automobile data is 2, i.e., d_hat=2.
2.2: Estimating tuning parameters and bandwidth parameters for Gaussian kernel density estimation
Two tuning parameters must be estimated when estimating SDR subspaces with the Fourier method: sw2 and st2. The parameter sw2 is required for both the central mean subspace (CMS) and the central subspace (CS), whereas st2 is required only for the CS. Section 2.2.1 demonstrates the use of the wx() function to estimate sw2, and Section 2.2.2 describes the use of the wy() function to estimate st2.
2.2.1: Estimate sw2
To estimate the tuning parameter sw2, we use the wx() function with the subspace option set to either space="pdf" for the CS or space="mean" for the CMS. The other parameters are held fixed during the estimation. The following R code chunk demonstrates the estimation of sw2 for the central subspace.
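The code chunk referred to above is not reproduced in this text. A minimal sketch of what such a call might look like is given here. It assumes that wx() follows the same calling pattern as the wh() call in Section 2.2.3, that the candidate grid is passed as wx_seq, and that the selected value is returned as wx.hat; these names, the candidate grid, and the small bootstrap size B=50 are assumptions and should be checked against ?wx.

```r
# Candidate grid for sw2 (an assumed range, not a documented default)
candidate_list <- seq(0.05, 1, by = 0.01)

# Run only if the itdr package is installed
if (requireNamespace("itdr", quietly = TRUE)) {
  library(itdr)
  data(automobile)
  automobile.na <- na.omit(automobile)

  # Same response and predictors as in Section 2.1
  auto_y <- log(automobile.na[, 26])
  auto_x <- scale(automobile.na[, c(10, 11, 12, 13, 14, 17, 19, 20, 21, 22, 23, 24, 25)])
  auto_d <- 2  # estimated dimension from Section 2.1

  # Assumed argument name (wx_seq) and return element (wx.hat);
  # B = 50 keeps the run short and would normally be larger
  auto_sw2 <- wx(auto_y, auto_x, auto_d, wx_seq = candidate_list, B = 50,
                 space = "pdf", xdensity = "normal", method = "FM")
  print(auto_sw2$wx.hat)
}
```

For the central mean subspace, the same call would use space="mean" instead of space="pdf".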
2.2.2: Estimate st2
To estimate the tuning parameter st2, we use the wy() function, again holding the other parameters fixed. We do not need to specify the space argument, because st2 is required only for the central subspace (CS).
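As an illustration, a call to wy() might look like the following sketch. It assumes that the candidate grid is passed as wy_seq and that the selected value is returned as wy.hat, by analogy with the wh() call in Section 2.2.3; these names, the grid, and B=50 are assumptions to be verified against ?wy. The value wx=0.14 is the sw2 estimate quoted for Section 2.2.1.

```r
# Candidate grid for st2 (an assumed range, not a documented default)
candidate_list_y <- seq(0.1, 2, by = 0.1)

# Run only if the itdr package is installed
if (requireNamespace("itdr", quietly = TRUE)) {
  library(itdr)
  data(automobile)
  automobile.na <- na.omit(automobile)

  # Same response and predictors as in Section 2.1
  auto_y <- log(automobile.na[, 26])
  auto_x <- scale(automobile.na[, c(10, 11, 12, 13, 14, 17, 19, 20, 21, 22, 23, 24, 25)])
  auto_d <- 2  # estimated dimension from Section 2.1

  # No space argument: st2 is only used for the central subspace.
  # Assumed argument name (wy_seq) and return element (wy.hat).
  auto_st2 <- wy(auto_y, auto_x, auto_d, wx = 0.14, wy_seq = candidate_list_y,
                 B = 50, xdensity = "normal", method = "FM")
  print(auto_st2$wy.hat)
}
```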
2.2.3: Estimate the bandwidth (h) of the Gaussian kernel density function
If the distribution of the predictor variables is unknown, we approximate their density function by Gaussian kernel density estimation. In that case, i.e., when xdensity="kernel" is used, the bandwidth parameter must also be estimated. The wh() function uses a bootstrap estimator to select the bandwidth of the Gaussian kernel density estimate.
# Bootstrap estimate of the bandwidth for Gaussian kernel density estimation (central subspace)
h_hat=wh(auto_y,auto_x,auto_d,wx=5,wy=0.1,wh_seq=seq(0.1,2,by=.1),B=50,space="pdf",xdensity="kernel",method="FM")
h_hat$h.hat # we have the estimator as h_hat=1
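To make the role of the bandwidth h concrete, the following base R sketch evaluates a Gaussian product-kernel density estimate at a single point for several bandwidths. It is only an illustration on simulated data: the helper gauss_kde() is hypothetical and is not the internal implementation used by wh() or itdr().

```r
# Gaussian product-kernel density estimate evaluated at the rows of x0,
# built from the sample x with a common bandwidth h (hypothetical helper)
gauss_kde <- function(x0, x, h) {
  x0 <- as.matrix(x0)
  x <- as.matrix(x)
  p <- ncol(x)
  apply(x0, 1, function(pt) {
    z <- sweep(x, 2, pt) / h   # scaled differences, one row per observation
    mean(exp(-0.5 * rowSums(z^2))) / ((2 * pi)^(p / 2) * h^p)
  })
}

set.seed(1)
x_sim <- matrix(rnorm(400), ncol = 2)       # simulated standardized predictors
origin <- matrix(0, nrow = 1, ncol = 2)     # evaluation point

# Density estimate at the origin for a few candidate bandwidths
h_grid <- c(0.2, 0.5, 1, 1.5)
f_hat <- sapply(h_grid, function(h) gauss_kde(origin, x_sim, h))
print(round(f_hat, 3))
```

A small h follows the sample closely and gives a rough estimate, while a large h smooths the estimate toward a flat surface, which is why the bandwidth is selected from a grid such as wh_seq rather than fixed in advance.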
2.3: Estimate SDR subspaces
Sections 2.1-2.2 described how to estimate the dimension and the tuning parameters of the Fourier method. We are now ready to estimate the SDR subspaces. Zhu and Zeng (2006) used the Fourier method to estimate the SDR subspaces when the predictors follow a multivariate normal distribution. When the predictors follow an elliptical distribution or, more generally, when their distribution is unknown, the density function of the predictors is approximated by Gaussian kernel density estimation (Zeng and Zhu, 2010). The itdr() function estimates the SDR subspaces under the FM method as follows. Since method="FM" is the default setting of itdr(), specifying the method is optional.
library(itdr)
data(automobile)
head(automobile)
df=cbind(automobile[,c(26,10,11,12,13,14,17,19,20,21,22,23,24,25)])
dff=as.matrix(df)
automobi=dff[complete.cases(dff),]
d=2 # Estimated value from Section 2.1
wx=.14 # Estimated value from Section 2.2.1
wy=.9 # Estimated value from Section 2.2.2
wh=1.5 # Bandwidth for the kernel density estimate (see Section 2.2.3)
p=13 # Number of predictor variables
y=automobi[,1]
x=automobi[,c(2:14)]
xt=scale(x)
# space="pdf" targets the central subspace (CS)
# Predictors are assumed to follow a normal distribution
fit.F_CMS=itdr(y,xt,d,wx,wy,wh,space="pdf",xdensity = "normal",method="FM")
round(fit.F_CMS$eta_hat,2)
# Distribution of the predictors is unknown (kernel density estimation)
fit.F_CMS=itdr(y,xt,d,wx,wy,wh,space="pdf",xdensity = "kernel",method="FM")
round(fit.F_CMS$eta_hat,2)
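The two fits return basis matrices (eta_hat) that need not agree column by column even when they span nearly the same subspace, because a basis is not unique. One common way to compare them is the Frobenius distance between the corresponding projection matrices. The sketch below uses base R only; proj_matrix() and subspace_dist() are hypothetical helper names, and this distance is offered as an illustration rather than as the criterion used inside itdr.

```r
# Orthogonal projection onto the column space of B (hypothetical helper)
proj_matrix <- function(B) {
  B <- as.matrix(B)
  B %*% solve(crossprod(B), t(B))
}

# Frobenius distance between the projections onto span(B1) and span(B2)
subspace_dist <- function(B1, B2) {
  norm(proj_matrix(B1) - proj_matrix(B2), type = "F")
}

# Toy check in R^3
B1 <- cbind(c(1, 0, 0), c(0, 1, 0))
B2 <- B1 %*% matrix(c(2, 1, 0, 3), 2, 2)   # same span, different basis
B3 <- cbind(c(1, 0, 0), c(0, 0, 1))        # a different 2-dimensional subspace

print(subspace_dist(B1, B2))   # numerically zero: identical subspaces
print(subspace_dist(B1, B3))   # positive: the subspaces differ
```

Applied to the normal and kernel fits above, the two eta_hat matrices would first have to be stored under separate names, since the second call overwrites the first.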