Weighted Statistics With table1

Introduction

Weighted descriptive statistics are required in some contexts, for instance, in the analysis of survey data. This can be accomplished by using the provided weighted wrapper class. Internally, this will cause the functions wtd.mean, wtd.var, wtd.quantile and wtd.table from the Hmisc package (which is optional in general but required to use this functionality) to be used in place of their standard non-weighted counterparts.

Example

We take an example from the survey package (note that this is for illustration purposes only, it is not meant to be a real application):

library(survey, quietly=TRUE)

## 
## Attaching package: 'survival'

## The following object is masked from 'package:boot':
## 
##     aml

## 
## Attaching package: 'survey'

## The following object is masked from 'package:graphics':
## 
##     dotchart

data(myco)

myco$Leprosy <- factor(myco$leprosy, levels=1:0, labels=c("Leprosy Cases", "Controls"))

myco$AgeCat <- factor(myco$Age,
    levels=c(7.5,      12.5,       17.5,       22.5,       27.5,       32.5      ),
    labels=c("5 to 9", "10 to 14", "15 to 19", "20 to 24", "25 to 29", "30 to 34")
)

myco$ScarL <- as.logical(myco$Scar)

label(myco$Age) <- "Age"
units(myco$Age) <- "years"
label(myco$AgeCat) <- "Age Group"
label(myco$ScarL) <- "BCG vaccination scar"

table1(~ ScarL + Age + AgeCat | Leprosy, data=weighted(myco, wt), big.mark=",")

	Leprosy Cases (N=258)	Controls (N=61,310)	Overall (N=61,568)
BCG vaccination scar
Yes	100 (38.8%)	34,847 (56.8%)	34,947 (56.8%)
No	158 (61.2%)	26,463 (43.2%)	26,621 (43.2%)
Age (years)
Mean (SD)	21.2 (8.32)	16.8 (8.36)	16.8 (8.37)
Median [Min, Max]	22.5 [7.50, 32.5]	17.5 [7.50, 32.5]	17.5 [7.50, 32.5]
Age Group
5 to 9	25 (9.7%)	17,327 (28.3%)	17,352 (28.2%)
10 to 14	50 (19.4%)	13,172 (21.5%)	13,222 (21.5%)
15 to 19	44 (17.1%)	10,325 (16.8%)	10,369 (16.8%)
20 to 24	39 (15.1%)	8,026 (13.1%)	8,065 (13.1%)
25 to 29	47 (18.2%)	5,981 (9.8%)	6,028 (9.8%)
30 to 34	53 (20.5%)	6,479 (10.6%)	6,532 (10.6%)

It also works in “transpose” mode:

table1(~ Age + ScarL | Leprosy, data=weighted(myco, wt), transpose=T, big.mark=",")

	Age (years)	BCG vaccination scar
Leprosy Cases (N=258)	Mean (SD): 21.2 (8.32) Median [Min, Max]: 22.5 [7.50, 32.5]	Yes: 100 (38.8%) No: 158 (61.2%)
Controls (N=61,310)	Mean (SD): 16.8 (8.36) Median [Min, Max]: 17.5 [7.50, 32.5]	Yes: 34,847 (56.8%) No: 26,463 (43.2%)
Overall (N=61,568)	Mean (SD): 16.8 (8.37) Median [Min, Max]: 17.5 [7.50, 32.5]	Yes: 34,947 (56.8%) No: 26,621 (43.2%)

For more flexibility, we may not want the weighting to be applied globally, but only to some of the variables. We can do this as well, by using weighted on individual variables:

table1(~ weighted(ScarL, wt) + Age + AgeCat | Leprosy, data=myco, big.mark=",")

	Leprosy Cases (N=258)	Controls (N=258)	Overall (N=516)
BCG vaccination scar
Yes	100 (38.8%)	34,847 (56.8%)	34,947 (56.8%)
No	158 (61.2%)	26,463 (43.2%)	26,621 (43.2%)
Age (years)
Mean (SD)	21.2 (8.32)	21.2 (8.32)	21.2 (8.31)
Median [Min, Max]	22.5 [7.50, 32.5]	22.5 [7.50, 32.5]	22.5 [7.50, 32.5]
Age Group
5 to 9	25 (9.7%)	25 (9.7%)	50 (9.7%)
10 to 14	50 (19.4%)	50 (19.4%)	100 (19.4%)
15 to 19	44 (17.1%)	44 (17.1%)	88 (17.1%)
20 to 24	39 (15.1%)	39 (15.1%)	78 (15.1%)
25 to 29	47 (18.2%)	47 (18.2%)	94 (18.2%)
30 to 34	53 (20.5%)	53 (20.5%)	106 (20.5%)

This implementation allows for simple weighted statistics, but does not currently support more complex designs from the survey package like stratified sampling or cluster sampling.

The `weighted` and `indexed` classes

The weighted class is just a wrapper around a vector or data.frame that adds a vector of weights as an attribute. These weights are carried along or subsetted appropriately during operations like slicing or subsetting. See ?weighted for some examples.

The indexed class is similar, but it simply maintains the indices of a vector (row indices for a data.frame) when a subset or slide is taken. This leads to some interesting possibilities when we want to do more complex things.

The following example also comes from the survey package:

data(api)

dclus1<-svydesign(id=~dnum, weights=~pw, data=apiclus1, fpc=~fpc)

svyby(~api99+api00, ~stype, dclus1, svymean)

##   stype    api99    api00 se.api99 se.api00
## E     E 607.7917 648.8681 22.81660 22.36241
## H     H 595.7143 618.5714 41.76400 38.02025
## M     M 608.6000 631.4400 32.56064 31.60947

svytable(~sch.wide+stype, dclus1)

##         stype
## sch.wide         E         H         M
##      No   406.1640  101.5410  270.7760
##      Yes 4467.8035  372.3170  575.3989

Using table1, the same results can be presented more beautifully:

myrender <- function(x, name, ...) {
    if (is.numeric(x)) {
        r <- svymean(as.formula(paste0("~", name)), subset(dclus1, (1:nrow(dclus1)) %in% indices(x)))
        r <- c(Mean=as.numeric(r), SE=sqrt(attr(r, "var", exact=T)))
        r <- unlist(stats.apply.rounding(as.list(r), big.mark=","))
    } else {
        r <- svytable(as.formula(paste0("~", name)), subset(dclus1, (1:nrow(dclus1)) %in% indices(x)))
        r <- unlist(stats.apply.rounding(as.list(r), big.mark=",", digits=1, rounding.fn=round_pad))
    }
    c("", r)
}

apiclus1$stype2 <- factor(apiclus1$stype, levels=c("E", "M", "H"),
    labels=c("Elementary", "Middle School", "High School"))

label(apiclus1$api99)    <- "API in 1999"
label(apiclus1$api00)    <- "API in 2000"
label(apiclus1$sch.wide) <- "Met school-wide growth target?"

table1(~ api99 + api00 + sch.wide | stype2, indexed(apiclus1), render=myrender,
    render.strat=names)

	Elementary	Middle School	High School	Overall
API in 1999
Mean	608	609	596	607
SE	22.8	32.6	41.8	24.2
API in 2000
Mean	649	631	619	644
SE	22.4	31.6	38.0	23.5
Met school-wide growth target?
No	406.2	270.8	101.5	778.5
Yes	4,467.8	575.4	372.3	5,415.5

Weighted Statistics With table1

Benjamin Rich

2025-09-19

Introduction

Example

The `weighted` and `indexed` classes

Weighted Statistics With table1

Benjamin Rich

2025-09-19

Introduction

Example

The weighted and indexed classes

The `weighted` and `indexed` classes