--- title: "Weighted Statistics With table1" author: "Benjamin Rich" date: "`r Sys.Date()`" output: rmarkdown::html_vignette: css: [style.css, vignette.css] toc: true vignette: > %\VignetteIndexEntry{Weighted Statistics With table1} %\VignetteEngine{knitr::rmarkdown} %\VignetteDepends{boot,MatchIt} %\VignetteEncoding{UTF-8} --- ```{r echo=FALSE, message=FALSE, warning=FALSE, results='hide'} library(table1, quietly=TRUE) try(detach('package:printr', unload = TRUE), silent=TRUE) # Make sure printr is not loaded ``` ## Introduction Weighted descriptive statistics are required in some contexts, for instance, in the analysis of survey data. This can be accomplished by using the provided `weighted` wrapper class. Internally, this will cause the functions `wtd.mean`, `wtd.var`, `wtd.quantile` and `wtd.table` from the `Hmisc` package (which is optional in general but required to use this functionality) to be used in place of their standard non-weighted counterparts. ## Example We take an example from the `survey` package (note that this is for illustration purposes only, it is not meant to be a real application): ```{r} library(survey, quietly=TRUE) data(myco) myco$Leprosy <- factor(myco$leprosy, levels=1:0, labels=c("Leprosy Cases", "Controls")) myco$AgeCat <- factor(myco$Age, levels=c(7.5, 12.5, 17.5, 22.5, 27.5, 32.5 ), labels=c("5 to 9", "10 to 14", "15 to 19", "20 to 24", "25 to 29", "30 to 34") ) myco$ScarL <- as.logical(myco$Scar) label(myco$Age) <- "Age" units(myco$Age) <- "years" label(myco$AgeCat) <- "Age Group" label(myco$ScarL) <- "BCG vaccination scar" table1(~ ScarL + Age + AgeCat | Leprosy, data=weighted(myco, wt), big.mark=",") ``` It also works in "transpose" mode: ```{r} table1(~ Age + ScarL | Leprosy, data=weighted(myco, wt), transpose=T, big.mark=",") ``` For more flexibility, we may not want the weighting to be applied globally, but only to some of the variables. We can do this as well, by using `weighted` on individual variables: ```{r} table1(~ weighted(ScarL, wt) + Age + AgeCat | Leprosy, data=myco, big.mark=",") ``` This implementation allows for simple weighted statistics, but does not currently support more complex designs from the `survey` package like stratified sampling or cluster sampling. ## The `weighted` and `indexed` classes The `weighted` class is just a wrapper around a vector or `data.frame` that adds a vector of weights as an attribute. These weights are carried along or subsetted appropriately during operations like slicing or subsetting. See `?weighted` for some examples. The `indexed` class is similar, but it simply maintains the indices of a vector (row indices for a `data.frame`) when a subset or slide is taken. This leads to some interesting possibilities when we want to do more complex things. The following example also comes from the `survey` package: ```{r} data(api) dclus1<-svydesign(id=~dnum, weights=~pw, data=apiclus1, fpc=~fpc) svyby(~api99+api00, ~stype, dclus1, svymean) svytable(~sch.wide+stype, dclus1) ``` Using `table1`, the same results can be presented more beautifully: ```{r} myrender <- function(x, name, ...) { if (is.numeric(x)) { r <- svymean(as.formula(paste0("~", name)), subset(dclus1, (1:nrow(dclus1)) %in% indices(x))) r <- c(Mean=as.numeric(r), SE=sqrt(attr(r, "var", exact=T))) r <- unlist(stats.apply.rounding(as.list(r), big.mark=",")) } else { r <- svytable(as.formula(paste0("~", name)), subset(dclus1, (1:nrow(dclus1)) %in% indices(x))) r <- unlist(stats.apply.rounding(as.list(r), big.mark=",", digits=1, rounding.fn=round_pad)) } c("", r) } apiclus1$stype2 <- factor(apiclus1$stype, levels=c("E", "M", "H"), labels=c("Elementary", "Middle School", "High School")) label(apiclus1$api99) <- "API in 1999" label(apiclus1$api00) <- "API in 2000" label(apiclus1$sch.wide) <- "Met school-wide growth target?" table1(~ api99 + api00 + sch.wide | stype2, indexed(apiclus1), render=myrender, render.strat=names) ```