Numerical Example - std_selected

Shu Fai Cheung and David Weng Ngai Vong

2022-04-21

Purpose

This document demonstrates how to use std_selected() to do mean centering and rescaling for selected variables in a regression model. A moderated regression model is used as an example.

Set Up the Environment

library(stdmod)
library(visreg) # For visualizing the moderation effect
library(lm.beta) # For generating the typical standardized solution

Load the Dataset

data(sleep_emo_con)
head(sleep_emo_con)
#>   case_id sleep_duration conscientiousness emotional_stability age gender
#> 1       1              6               3.6                 3.6  20 female
#> 2       2              4               3.8                 2.4  20 female
#> 3       3              7               4.3                 2.7  20 female
#> 4       4              8               2.9                 1.3  20 female
#> 5       5              8               3.3                 2.6  22 female
#> 6       6              8               2.4                 2.2  25 female

This data set has 500 cases. The variables are sleep duration, age, gender, and the scores on two personality scales from the IPIP Big Five markers: emotional stability and conscientiousness. Please refer to (citation to be included) for details of the data set.

Moderated Regression

Suppose we are interested in predicting sleep duration from emotional stability, after controlling for gender and age. However, we suspect that the effect of emotional stability, if any, may be moderated by conscientiousness. Therefore, we conduct a moderated regression as follows:

lm_raw <- lm(sleep_duration ~ age + gender + emotional_stability*conscientiousness, sleep_emo_con)
summary(lm_raw)
#> 
#> Call:
#> lm(formula = sleep_duration ~ age + gender + emotional_stability * 
#>     conscientiousness, data = sleep_emo_con)
#> 
#> Residuals:
#>     Min      1Q  Median      3Q     Max 
#> -6.0841 -0.7882  0.0089  0.9440  6.1189 
#> 
#> Coefficients:
#>                                       Estimate Std. Error t value Pr(>|t|)   
#> (Intercept)                            1.85154    1.35224   1.369  0.17155   
#> age                                    0.01789    0.02133   0.838  0.40221   
#> gendermale                            -0.26127    0.16579  -1.576  0.11570   
#> emotional_stability                    1.32151    0.45039   2.934  0.00350 **
#> conscientiousness                      1.20385    0.37062   3.248  0.00124 **
#> emotional_stability:conscientiousness -0.33140    0.13273  -2.497  0.01286 * 
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> 
#> Residual standard error: 1.384 on 494 degrees of freedom
#> Multiple R-squared:  0.0548, Adjusted R-squared:  0.04523 
#> F-statistic: 5.728 on 5 and 494 DF,  p-value: 3.768e-05
visreg(lm_raw, "emotional_stability", "conscientiousness",
            breaks = 2, overlay = TRUE)

The results show that conscientiousness significantly moderates the effect of emotional stability on sleep duration.

Mean Center Conscientiousness

To find the effect of emotional stability when conscientiousness is equal to its mean, we can center conscientiousness by its mean in the data and redo the moderated regression. Instead of creating the new variable and rerunning the regression, we can pass the previous output to std_selected() and specify the variables to be mean centered.

lm_w_centered <- std_selected(lm_raw, to_center = ~ conscientiousness)
summary(lm_w_centered)
#> 
#> Selected variable(s) are centered and/or scaled
#> - Variable(s) centered: conscientiousness
#> - Variable(s) scaled:
#>                     centered_by scaled_by
#> sleep_duration           0.0000         1
#> age                      0.0000         1
#> gender                   0.0000         1
#> emotional_stability      0.0000         1
#> conscientiousness        3.3432         1
#> 
#> Note:
#> - Centered by 0 or NA: No centering
#> - Scaled by 1 or NA: No scaling
#> 
#> Call:
#> lm(formula = sleep_duration ~ age + gender + emotional_stability * 
#>     conscientiousness, data = dat_mod)
#> 
#> Residuals:
#>     Min      1Q  Median      3Q     Max 
#> -6.0841 -0.7882  0.0089  0.9440  6.1189 
#> 
#> Coefficients:
#>                                       Estimate Std. Error t value Pr(>|t|)    
#> (Intercept)                            5.87626    0.51695  11.367  < 2e-16 ***
#> age                                    0.01789    0.02133   0.838  0.40221    
#> gendermale                            -0.26127    0.16579  -1.576  0.11570    
#> emotional_stability                    0.21358    0.08343   2.560  0.01076 *  
#> conscientiousness                      1.20385    0.37062   3.248  0.00124 ** 
#> emotional_stability:conscientiousness -0.33140    0.13273  -2.497  0.01286 *  
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> 
#> Residual standard error: 1.384 on 494 degrees of freedom
#> Multiple R-squared:  0.0548, Adjusted R-squared:  0.04523 
#> F-statistic: 5.728 on 5 and 494 DF,  p-value: 3.768e-05
visreg(lm_w_centered, "emotional_stability", "conscientiousness",
            breaks = 2, overlay = TRUE)

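The argument for mean centering is to_center. The variable is specified as a one-sided formula. Note that only the coefficient of emotional stability and the intercept change: the coefficient of emotional stability (0.21358) is now its effect when conscientiousness is at its mean.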
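
The manual approach that std_selected() replaces (creating a centered variable and rerunning the regression) can be sketched in base R. This is only an illustration: the data are simulated, and the names dat, dat_c, fit_raw, and fit_c are hypothetical and do not come from the package or the sleep_emo_con data set.

```r
# Simulated data for illustration only (not the sleep_emo_con data set)
set.seed(1234)
n <- 200
x <- rnorm(n, mean = 3, sd = 1)      # focal predictor
w <- rnorm(n, mean = 3, sd = 0.5)    # moderator
y <- 2 + 0.5 * x + 0.4 * w - 0.3 * x * w + rnorm(n)
dat <- data.frame(y, x, w)

# Moderated regression on the raw variables
fit_raw <- lm(y ~ x * w, data = dat)

# Manual approach: create the mean-centered moderator and refit
dat_c <- dat
dat_c$w <- dat$w - mean(dat$w)
fit_c <- lm(y ~ x * w, data = dat_c)

# The coefficient of x after centering w equals the effect of x
# at the mean of w implied by the raw model: b_x + b_xw * mean(w)
b_raw <- coef(fit_raw)
effect_at_mean <- unname(b_raw["x"] + b_raw["x:w"] * mean(dat$w))
c(manual_refit = unname(coef(fit_c)["x"]), from_raw = effect_at_mean)
```

The two values agree, and the coefficient of the product term is the same in both models, which mirrors the pattern in the std_selected() output above.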

Mean Center Conscientiousness and Emotional Stability

This example demonstrates centering more than one variable. In the following model, both emotional stability and conscientiousness are centered. They are placed after ~ and joined by +.

lm_xw_centered <- std_selected(lm_raw, to_center = ~ emotional_stability + conscientiousness)
summary(lm_xw_centered)
#> 
#> Selected variable(s) are centered and/or scaled
#> - Variable(s) centered: emotional_stability conscientiousness
#> - Variable(s) scaled:
#>                     centered_by scaled_by
#> sleep_duration           0.0000         1
#> age                      0.0000         1
#> gender                   0.0000         1
#> emotional_stability      2.7132         1
#> conscientiousness        3.3432         1
#> 
#> Note:
#> - Centered by 0 or NA: No centering
#> - Scaled by 1 or NA: No scaling
#> 
#> Call:
#> lm(formula = sleep_duration ~ age + gender + emotional_stability * 
#>     conscientiousness, data = dat_mod)
#> 
#> Residuals:
#>     Min      1Q  Median      3Q     Max 
#> -6.0841 -0.7882  0.0089  0.9440  6.1189 
#> 
#> Coefficients:
#>                                       Estimate Std. Error t value Pr(>|t|)    
#> (Intercept)                            6.45575    0.47828  13.498  < 2e-16 ***
#> age                                    0.01789    0.02133   0.838  0.40221    
#> gendermale                            -0.26127    0.16579  -1.576  0.11570    
#> emotional_stability                    0.21358    0.08343   2.560  0.01076 *  
#> conscientiousness                      0.30470    0.10546   2.889  0.00403 ** 
#> emotional_stability:conscientiousness -0.33140    0.13273  -2.497  0.01286 *  
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> 
#> Residual standard error: 1.384 on 494 degrees of freedom
#> Multiple R-squared:  0.0548, Adjusted R-squared:  0.04523 
#> F-statistic: 5.728 on 5 and 494 DF,  p-value: 3.768e-05
visreg(lm_xw_centered, "emotional_stability", "conscientiousness",
            breaks = 2, overlay = TRUE)

Standardize Both Conscientiousness and Emotional Stability

To standardize a variable, we first mean center it and then scale it by its standard deviation. Scaling is done by listing the variable in to_scale. The input format is identical to that of to_center.

lm_xw_std <- std_selected(lm_raw,
                to_center = ~ emotional_stability + conscientiousness,
                to_scale  = ~ emotional_stability + conscientiousness)
summary(lm_xw_std)
#> 
#> Selected variable(s) are centered and/or scaled
#> - Variable(s) centered: emotional_stability conscientiousness
#> - Variable(s) scaled: emotional_stability conscientiousness
#>                     centered_by scaled_by
#> sleep_duration           0.0000 1.0000000
#> age                      0.0000 1.0000000
#> gender                   0.0000 1.0000000
#> emotional_stability      2.7132 0.7629613
#> conscientiousness        3.3432 0.6068198
#> 
#> Note:
#> - Centered by 0 or NA: No centering
#> - Scaled by 1 or NA: No scaling
#> 
#> Call:
#> lm(formula = sleep_duration ~ age + gender + emotional_stability * 
#>     conscientiousness, data = dat_mod)
#> 
#> Residuals:
#>     Min      1Q  Median      3Q     Max 
#> -6.0841 -0.7882  0.0089  0.9440  6.1189 
#> 
#> Coefficients:
#>                                       Estimate Std. Error t value Pr(>|t|)    
#> (Intercept)                            6.45575    0.47828  13.498  < 2e-16 ***
#> age                                    0.01789    0.02133   0.838  0.40221    
#> gendermale                            -0.26127    0.16579  -1.576  0.11570    
#> emotional_stability                    0.16295    0.06365   2.560  0.01076 *  
#> conscientiousness                      0.18490    0.06399   2.889  0.00403 ** 
#> emotional_stability:conscientiousness -0.15343    0.06145  -2.497  0.01286 *  
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> 
#> Residual standard error: 1.384 on 494 degrees of freedom
#> Multiple R-squared:  0.0548, Adjusted R-squared:  0.04523 
#> F-statistic: 5.728 on 5 and 494 DF,  p-value: 3.768e-05
visreg(lm_xw_std, "emotional_stability", "conscientiousness",
            breaks = 2, overlay = TRUE)
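
As a quick check of what scaling does to the estimates, the coefficients after standardization can be reproduced from the centered-only estimates and the standard deviations reported above. The sketch below only does arithmetic on numbers copied from the outputs in this document; the object names are hypothetical, and small discrepancies in the last digit are due to rounding in the printed values.

```r
# Values copied from the outputs above (rounded as printed)
b_x_centered <- 0.21358    # emotional_stability, centered only
b_w_centered <- 0.30470    # conscientiousness, centered only
b_xw         <- -0.33140   # product term, centered only
sd_x <- 0.7629613          # SD of emotional_stability
sd_w <- 0.6068198          # SD of conscientiousness

# Dividing a predictor by its SD multiplies its coefficient by that SD;
# the product term is rescaled by both SDs
rescaled <- c(x  = b_x_centered * sd_x,
              w  = b_w_centered * sd_w,
              xw = b_xw * sd_x * sd_w)
round(rescaled, 4)
```

The results should match the estimates 0.16295, 0.18490, and -0.15343 in the summary of lm_xw_std, up to rounding.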

Standardize Conscientiousness, Emotional Stability, and Sleep Duration

Note that we can also mean center or standardize the dependent variable. We just add the variable to the right-hand side of ~ in to_center and to_scale as appropriate.

lm_xwy_std <- std_selected(lm_raw,
                to_center = ~ emotional_stability + conscientiousness + sleep_duration,
                to_scale  = ~ emotional_stability + conscientiousness + sleep_duration)
summary(lm_xwy_std)
#> 
#> Selected variable(s) are centered and/or scaled
#> - Variable(s) centered: emotional_stability conscientiousness sleep_duration
#> - Variable(s) scaled: emotional_stability conscientiousness sleep_duration
#>                     centered_by scaled_by
#> sleep_duration         6.776333 1.4168291
#> age                    0.000000 1.0000000
#> gender                 0.000000 1.0000000
#> emotional_stability    2.713200 0.7629613
#> conscientiousness      3.343200 0.6068198
#> 
#> Note:
#> - Centered by 0 or NA: No centering
#> - Scaled by 1 or NA: No scaling
#> 
#> Call:
#> lm(formula = sleep_duration ~ age + gender + emotional_stability * 
#>     conscientiousness, data = dat_mod)
#> 
#> Residuals:
#>     Min      1Q  Median      3Q     Max 
#> -4.2941 -0.5563  0.0063  0.6663  4.3187 
#> 
#> Coefficients:
#>                                       Estimate Std. Error t value Pr(>|t|)   
#> (Intercept)                           -0.22627    0.33757  -0.670  0.50298   
#> age                                    0.01262    0.01506   0.838  0.40221   
#> gendermale                            -0.18440    0.11702  -1.576  0.11570   
#> emotional_stability                    0.11501    0.04493   2.560  0.01076 * 
#> conscientiousness                      0.13050    0.04517   2.889  0.00403 **
#> emotional_stability:conscientiousness -0.10829    0.04337  -2.497  0.01286 * 
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> 
#> Residual standard error: 0.9771 on 494 degrees of freedom
#> Multiple R-squared:  0.0548, Adjusted R-squared:  0.04523 
#> F-statistic: 5.728 on 5 and 494 DF,  p-value: 3.768e-05
visreg(lm_xwy_std, "emotional_stability", "conscientiousness",
            breaks = 2, overlay = TRUE)
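
Scaling the dependent variable divides both the estimates and their standard errors by the standard deviation of sleep duration, which is why the t values and p-values are the same as before. A small arithmetic sketch using numbers copied from the outputs above (object names are hypothetical):

```r
# Values copied from the outputs above (rounded as printed)
sd_y       <- 1.4168291   # SD of sleep_duration
est_before <- 0.16295     # emotional_stability, before scaling sleep_duration
se_before  <- 0.06365     # its standard error, before scaling sleep_duration

# Scaling the outcome divides the estimate and the SE by the same constant
est_after <- est_before / sd_y
se_after  <- se_before / sd_y

round(c(estimate = est_after,
        se       = se_after,
        t_before = est_before / se_before,
        t_after  = est_after / se_after), 4)
```

The rescaled estimate and standard error should agree with 0.11501 and 0.04493 in the summary of lm_xwy_std up to rounding, and the t value stays at about 2.560.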

Standardize All Variables

If we want to standardize all variables, we can use ~ . as a shortcut. Note that std_selected() will skip categorical variables (i.e., factors or string variables in the regression model of lm()).

lm_all_std <- std_selected(lm_raw, to_center = ~ .,
                                   to_scale  = ~ .)
summary(lm_all_std)
#> 
#> Selected variable(s) are centered and/or scaled
#> - Variable(s) centered: sleep_duration age gender emotional_stability conscientiousness
#> - Variable(s) scaled: sleep_duration age gender emotional_stability conscientiousness
#>                     centered_by scaled_by
#> sleep_duration         6.776333 1.4168291
#> age                   22.274000 2.9407857
#> gender                       NA        NA
#> emotional_stability    2.713200 0.7629613
#> conscientiousness      3.343200 0.6068198
#> 
#> Note:
#> - Centered by 0 or NA: No centering
#> - Scaled by 1 or NA: No scaling
#> 
#> Call:
#> lm(formula = sleep_duration ~ age + gender + emotional_stability * 
#>     conscientiousness, data = dat_mod)
#> 
#> Residuals:
#>     Min      1Q  Median      3Q     Max 
#> -4.2941 -0.5563  0.0063  0.6663  4.3187 
#> 
#> Coefficients:
#>                                       Estimate Std. Error t value Pr(>|t|)   
#> (Intercept)                            0.05492    0.04883   1.125  0.26124   
#> age                                    0.03712    0.04428   0.838  0.40221   
#> gendermale                            -0.18440    0.11702  -1.576  0.11570   
#> emotional_stability                    0.11501    0.04493   2.560  0.01076 * 
#> conscientiousness                      0.13050    0.04517   2.889  0.00403 **
#> emotional_stability:conscientiousness -0.10829    0.04337  -2.497  0.01286 * 
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> 
#> Residual standard error: 0.9771 on 494 degrees of freedom
#> Multiple R-squared:  0.0548, Adjusted R-squared:  0.04523 
#> F-statistic: 5.728 on 5 and 494 DF,  p-value: 3.768e-05
visreg(lm_all_std, "emotional_stability", "conscientiousness",
            breaks = 2, overlay = TRUE)
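
The idea of standardizing every numeric variable while leaving categorical variables untouched can be sketched in base R as follows. This is not the internal code of std_selected(); the small data frame and the names dat, is_num, and dat_std are made up for illustration.

```r
# A small made-up data frame with two numeric columns and one factor
dat <- data.frame(
  y      = c(6, 4, 7, 8, 8, 8),
  age    = c(20, 20, 20, 20, 22, 25),
  gender = factor(c("female", "male", "female", "male", "female", "male"))
)

# Identify the numeric columns; factors and strings are skipped
is_num <- vapply(dat, is.numeric, logical(1))

# Standardize only the numeric columns
dat_std <- dat
dat_std[is_num] <- lapply(dat[is_num], function(v) (v - mean(v)) / sd(v))

# Means are (numerically) zero and SDs are one for the numeric columns
round(sapply(dat_std[is_num], mean), 10)
sapply(dat_std[is_num], sd)
```

The factor gender is left as is, so its coefficient remains the difference between the two groups on the (standardized) dependent variable.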

The Usual Standardized Solution

For comparison, these are the results of standardizing all variables, including the product term and the categorical variable. The standardized coefficients are in the column Standardized.

lm_usual_std <- lm.beta(lm_raw)
summary(lm_usual_std)
#> 
#> Call:
#> lm(formula = sleep_duration ~ age + gender + emotional_stability * 
#>     conscientiousness, data = sleep_emo_con)
#> 
#> Residuals:
#>     Min      1Q  Median      3Q     Max 
#> -6.0841 -0.7882  0.0089  0.9440  6.1189 
#> 
#> Coefficients:
#>                                       Estimate Standardized Std. Error t value Pr(>|t|)   
#> (Intercept)                            1.85154      0.00000    1.35224   1.369  0.17155   
#> age                                    0.01789      0.03712    0.02133   0.838  0.40221   
#> gendermale                            -0.26127     -0.06934    0.16579  -1.576  0.11570   
#> emotional_stability                    1.32151      0.71163    0.45039   2.934  0.00350 **
#> conscientiousness                      1.20385      0.51560    0.37062   3.248  0.00124 **
#> emotional_stability:conscientiousness -0.33140     -0.78201    0.13273  -2.497  0.01286 * 
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> 
#> Residual standard error: 1.384 on 494 degrees of freedom
#> Multiple R-squared:  0.0548, Adjusted R-squared:  0.04523 
#> F-statistic: 5.728 on 5 and 494 DF,  p-value: 3.768e-05
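
To see why the usual standardized solution differs for the product term, the sketch below compares two approaches on simulated data using base R only. It assumes that the usual standardized coefficient is the unstandardized coefficient multiplied by the SD of the predictor column and divided by the SD of the outcome, so the product term is standardized as a variable in its own right. The data and all object names are hypothetical.

```r
# Simulated data for illustration only
set.seed(5678)
n <- 200
x <- rnorm(n, mean = 3, sd = 1)
w <- rnorm(n, mean = 3, sd = 0.5)
y <- 2 + 0.5 * x + 0.4 * w - 0.3 * x * w + rnorm(n)

fit_raw <- lm(y ~ x * w)

# (a) Usual solution: b * SD(predictor column) / SD(outcome),
#     with the raw product x * w treated as one more column
X <- model.matrix(fit_raw)[, -1]
beta_usual <- coef(fit_raw)[-1] * apply(X, 2, sd) / sd(y)

# Equivalent refit: standardize y, x, w, and the raw product separately
zy  <- as.numeric(scale(y))
zx  <- as.numeric(scale(x))
zw  <- as.numeric(scale(w))
zxw <- as.numeric(scale(x * w))
fit_usual <- lm(zy ~ zx + zw + zxw)

# (b) Standardize x and w first, then form the product term
fit_z <- lm(zy ~ zx * zw)

round(rbind(usual            = beta_usual,
            usual_by_refit   = coef(fit_usual)[-1],
            standardize_then_multiply = coef(fit_z)[-1]), 4)
```

The first two rows agree, while the third row differs, most visibly for the product term, because the product of two standardized variables is not the same as the standardized product.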

Standardize Conscientiousness, Emotional Stability, and Sleep Duration: With Nonparametric Bootstrap Confidence Intervals

It has been shown (e.g., Yuan & Chan, 2011) that the standard errors of standardized regression coefficients computed simply by rescaling the variables are biased; consequently, confidence intervals based on them are also invalid. The function std_selected_boot() is a version of std_selected() that also reports nonparametric bootstrap confidence intervals for the regression coefficients when rescaling is conducted.

We use the same example above that standardizes emotional stability, conscientiousness, and sleep duration, to illustrate this function. The argument nboot specifies the number of nonparametric bootstrap samples. Currently, only the 95% confidence intervals will be reported.

lm_xwy_std_ci <- std_selected_boot(lm_raw, 
        to_center = ~ emotional_stability + conscientiousness + sleep_duration,
        to_scale  = ~ emotional_stability + conscientiousness + sleep_duration,
        nboot = 2000)
summary(lm_xwy_std_ci)
#> 
#> Selected variable(s) are centered and/or scaled
#> - Variable(s) centered: emotional_stability conscientiousness sleep_duration
#> - Variable(s) scaled: emotional_stability conscientiousness sleep_duration
#>                     centered_by scaled_by
#> sleep_duration         6.776333 1.4168291
#> age                    0.000000 1.0000000
#> gender                 0.000000 1.0000000
#> emotional_stability    2.713200 0.7629613
#> conscientiousness      3.343200 0.6068198
#> 
#> Note:
#> - Centered by 0 or NA: No centering
#> - Scaled by 1 or NA: No scaling
#> - Nonparametric bootstrapping 95% confidence intervals computed.
#> - The number of bootstrap samples is 2000
#> 
#> Call:
#> lm(formula = sleep_duration ~ age + gender + emotional_stability * 
#>     conscientiousness, data = dat_mod)
#> 
#> Residuals:
#>     Min      1Q  Median      3Q     Max 
#> -4.2941 -0.5563  0.0063  0.6663  4.3187 
#> 
#> Coefficients:
#>                                        Estimate  CI Lower  CI Upper Std. Error t value Pr(>|t|)   
#> (Intercept)                           -0.226270 -0.831491  0.350298   0.337569  -0.670  0.50298   
#> age                                    0.012624 -0.012787  0.039725   0.015057   0.838  0.40221   
#> gendermale                            -0.184402 -0.448399  0.072333   0.117016  -1.576  0.11570   
#> emotional_stability                    0.115014  0.025582  0.200143   0.044927   2.560  0.01076 * 
#> conscientiousness                      0.130502  0.028861  0.232320   0.045167   2.889  0.00403 **
#> emotional_stability:conscientiousness -0.108292 -0.200473 -0.007734   0.043374  -2.497  0.01286 * 
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> 
#> Residual standard error: 0.9771 on 494 degrees of freedom
#> Multiple R-squared:  0.0548, Adjusted R-squared:  0.04523 
#> F-statistic: 5.728 on 5 and 494 DF,  p-value: 3.768e-05
#> 
#> Note:
#> - [CI Lower, CI Upper] are bootstrap percentile confidence intervals.
#> - Std. Error are standard errors in the original analysis, not bootstrap SEs.

The standardized moderation effect is -0.1083, and the 95% nonparametric bootstrap confidence interval is -0.2005 to -0.0077.

Note: As a by-product, the nonparametric bootstrap confidence intervals of the other coefficients are also reported. They can be used for other variables that are standardized in the same model, whether or not they are involved in the moderation.
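
The general idea of a nonparametric bootstrap percentile confidence interval for standardized coefficients can be sketched in base R. This is a minimal illustration under stated assumptions, not necessarily how std_selected_boot() is implemented: the data are simulated, the helper std_coef() is hypothetical, and the key assumption is that the standardization is redone within each bootstrap sample.

```r
# Simulated data for illustration only
set.seed(91011)
n <- 200
x <- rnorm(n, mean = 3, sd = 1)
w <- rnorm(n, mean = 3, sd = 0.5)
y <- 2 + 0.5 * x + 0.4 * w - 0.3 * x * w + rnorm(n)
dat <- data.frame(y, x, w)

# Hypothetical helper: standardize y, x, and w in a data frame,
# fit the moderated regression, and return the coefficients
std_coef <- function(d) {
  d$y <- as.numeric(scale(d$y))
  d$x <- as.numeric(scale(d$x))
  d$w <- as.numeric(scale(d$w))
  coef(lm(y ~ x * w, data = d))
}

# Resample cases with replacement; standardize again in each sample
nboot <- 500
boot_est <- replicate(nboot, {
  i <- sample.int(n, replace = TRUE)
  std_coef(dat[i, ])
})

# Percentile 95% confidence intervals, one column per coefficient
ci <- apply(boot_est, 1, quantile, probs = c(0.025, 0.975))
round(rbind(estimate = std_coef(dat), ci), 3)
```

Because the means and standard deviations are recomputed in every bootstrap sample, their sampling variability is reflected in the intervals, which is not the case when the standard errors are obtained by simply rescaling the variables once.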

Reference

Yuan, K.-H., & Chan, W. (2011). Biases and standard errors of standardized regression coefficients. Psychometrika, 76(4), 670–690. https://doi.org/10.1007/s11336-011-9224-6