Univariate Statistics Analysis

Introduction

The univar function is called to produce univariate-type summary statistics for numeric variables. A typical example of using the univar function is to create a tbl chunk as shown below for summarizing N, MEAN (SD), MEDIAN, RANGE, IQ Range for the Age variable in adsl.

tbl <- cdisc_adsl %>% 
  univar(colvar = "TRT01PN", 
         rowvar = "AGE", 
         statlist = statlist(c("N", "MEANSD", "MEDIAN", "RANGE", "IQRANGE")),
         decimal = 0,
         row_header = "Age (Years)")

knitr::kable(tbl)

label	0	54	81	row_type
Age (Years)				HEADER
N	5	5	5	N
Mean (SD)	69.6 (14.40)	75.6 (6.73)	72.2 (9.23)	VALUE
Median	64.0	74.0	75.0	VALUE
Range	(52; 85)	(68; 84)	(57; 81)	VALUE
IQ range	(63.0; 84.0)	(71.0; 81.0)	(71.0; 77.0)	VALUE

Customizing Univariate Statistics

Besides the 5 standard univariate statistics shown above that are often required in the demographic tables, you can pick any univariate statistics from the table below and arrange them in a character vector for passing to the statlist argument.

Statlist	Description
N	number of non-missing values
SUM	sum
MEAN	mean
GeoMEAN	geometric mean
SD	standard deviation
SE	standard error
CV	coefficient of variation
GSD	geometric standard deviation
GSE	geometric standard error
MEANSD	mean (standard deviation)
MEANSE	mean (standard error)
MEDIAN	median
MIN	minimum
MAX	maximum
RANGE	range
Q1	1st quartile
Q3	3rd quartile
IQRANGE	inter-quartile range
MEDRANGE	median (range)
MEDIQRANGE	median (inter-quartile range)
MEAN_CI	mean (95% C.I.)
GeoMEAN_CI	geometric mean (95% C.I.)

A customized example is shown below for displaying N, Mean (95% C.I.), and Geometric Mean (95% C.I.) for the Age variable in adsl.

tbl <- cdisc_adsl %>% 
  univar(colvar = "TRT01PN", 
         rowvar = "AGE", 
         statlist = statlist(c("N", "MEAN_CI", "GeoMEAN_CI")),
         decimal = 0,
         row_header = "Age (Years)")

knitr::kable(tbl)

label	0	54	81	row_type
Age (Years)				HEADER
N	5	5	5	N
Mean (95% C.I.)	69.6 (51.72; 87.48)	75.6 (67.24; 83.96)	72.2 (60.74; 83.66)	VALUE
Geometric Mean (95% C.I.)	68.4 (52.72; 88.73)	75.4 (67.51; 84.13)	71.7 (60.50; 84.94)	VALUE

Decimal Precision

The decimal precision to be used in display of univariate statistics is comprised of two pieces. The base decimal precision is what controls the base number of decimals to be used, this can be set using the decimal argument. The precision extra is what controls the difference between the precision used for different statistics, this is controlled using the option tidytlg.precision.extra. The precision extra is the amount precision will need to be adjusted from the base precision for each different statistic. The default of the precision extra is set by following our table and listing conventions: Range has a precision extra of 0, Mean and Median have a precision extra of 1, SD has a precision extra of 2. To see a full list of precision extra defaults, please type options("tidytlg.precision.extra") in your console. An example function call of univar is shown below for presenting the data using a base decimal value of 2.

tbl <- cdisc_adsl %>% 
  univar(colvar = "TRT01PN", 
         rowvar = "BMIBL", 
         decimal = 2,
         row_header = "Age (Years)")

knitr::kable(tbl)

label	0	54	81	row_type
Age (Years)				HEADER
N	5	5	5	N
Mean (SD)	27.080 (3.6424)	27.180 (3.4419)	27.760 (2.4795)	VALUE
Median	27.600	27.300	28.100	VALUE
Range	(21.90; 30.40)	(23.90; 32.00)	(24.90; 31.40)	VALUE
IQ range	(25.100; 30.400)	(23.900; 28.800)	(26.100; 28.300)	VALUE

Data Driven Precision

While static precision is useful in some cases, data driven precision is also available. This is controlled using the precisionby, precisionon, and decimal arguments. precisionby tells the function the variable(s) the user would like to compute the precision using. This could be variables such as PARAMCD if the precision is to be varied between parameter. precisionon is the variable that should be used when calculating how many base decimal places are present in the data. The last piece to data drive precision is the decimal argument which gives us a cap for base precision values. This can be used to help avoid unnecessarily long decimals in your final output.

A customized example is shown below for presenting the univariate summary of vital signs data using PARAMCD as the by variable. In addition, we would like the precision to be data driven and varied by parameter, which can be achieved by setting precisionby = "PARAMCD".

tbl <- cdisc_advs %>% 
  univar(colvar = "TRTAN", 
         rowvar = "AVAL", 
         rowbyvar = "PARAMCD",
         precisionby = "PARAMCD",
         decimal = 4)

knitr::kable(tbl)

PARAMCD	label	0	54	81	row_type
DIABP	DIABP				BY_HEADER1
DIABP	N	186	147	204	N
DIABP	Mean (SD)	71.9 (9.75)	71.6 (7.12)	68.8 (10.34)	VALUE
DIABP	Median	71.5	72.0	69.0	VALUE
DIABP	Range	(50; 92)	(50; 87)	(43; 101)	VALUE
DIABP	IQ range	(65.0; 78.0)	(68.0; 77.0)	(60.0; 76.0)	VALUE
HEIGHT	HEIGHT				BY_HEADER1
HEIGHT	N	5	5	5	N
HEIGHT	Mean (SD)	161.696 (14.0567)	172.364 (9.1494)	163.576 (13.0260)	VALUE
HEIGHT	Median	162.560	175.260	162.560	VALUE
HEIGHT	Range	(147.32; 180.34)	(158.24; 181.61)	(147.32; 177.80)	VALUE
HEIGHT	IQ range	(148.590; 169.670)	(168.910; 177.800)	(154.940; 175.260)	VALUE
PULSE	PULSE				BY_HEADER1
PULSE	N	186	147	204	N
PULSE	Mean (SD)	69.4 (9.15)	64.9 (10.18)	70.5 (9.87)	VALUE
PULSE	Median	70.0	64.0	70.0	VALUE
PULSE	Range	(52; 94)	(50; 98)	(50; 97)	VALUE
PULSE	IQ range	(61.0; 76.0)	(58.0; 70.0)	(62.0; 76.5)	VALUE
SYSBP	SYSBP				BY_HEADER1
SYSBP	N	186	147	204	N
SYSBP	Mean (SD)	132.0 (12.02)	127.5 (12.58)	135.8 (23.65)	VALUE
SYSBP	Median	131.0	130.0	132.5	VALUE
SYSBP	Range	(100; 167)	(95; 151)	(95; 198)	VALUE
SYSBP	IQ range	(123.0; 138.0)	(122.0; 137.0)	(116.0; 150.5)	VALUE
TEMP	TEMP				BY_HEADER1
TEMP	N	61	49	68	N
TEMP	Mean (SD)	36.481 (0.3491)	36.537 (0.4374)	36.660 (0.2837)	VALUE
TEMP	Median	36.440	36.560	36.585	VALUE
TEMP	Range	(35.61; 37.67)	(34.28; 37.17)	(35.89; 37.33)	VALUE
TEMP	IQ range	(36.220; 36.720)	(36.390; 36.720)	(36.470; 36.915)	VALUE
WEIGHT	WEIGHT				BY_HEADER1
WEIGHT	N	47	33	54	N
WEIGHT	Mean (SD)	69.170 (10.1753)	82.795 (11.2792)	78.472 (15.5886)	VALUE
WEIGHT	Median	71.670	79.380	75.070	VALUE
WEIGHT	Range	(53.07; 80.74)	(59.88; 102.51)	(53.75; 99.79)	VALUE
WEIGHT	IQ range	(54.430; 78.470)	(78.470; 88.450)	(63.960; 90.720)	VALUE

While data driven precision is usually done with a by variable it doesn’t always have to. The precisionon argument can be used to calculate data driven precision on a single variable. This might be useful if a table template is going to be used multiple times or if multiple parts of the table are using a similar call but need to have different data driven precision. The following example uses the variable CHG to calculate precision, similar to the above example we still use decimal = 4 to cap our decimal spaces at 4.

tbl <- cdisc_advs %>% 
  filter(PARAMCD == "SYSBP") %>% 
  univar(colvar = "TRTAN", 
         rowvar = "CHG", 
         precisionon = "CHG",
         decimal = 4)

knitr::kable(tbl)

label	0	54	81	row_type
N	186	147	204	N
Mean (SD)	0.3 (12.49)	1.5 (9.87)	-5.5 (11.86)	VALUE
Median	0.0	1.0	-7.0	VALUE
Range	(-44; 33)	(-31; 24)	(-32; 30)	VALUE
IQ range	(-7.0; 8.0)	(-6.0; 7.0)	(-14.0; 0.0)	VALUE

Another use case for the precisionon argument could be if you need to calculate the summary on one variable but use another for precision for table output formatting. The following example uses both precisionby and precisionon to show how they can be used together to make special tables. For this table, we are creating an element of the table that summarizes AVAL but uses CHG to calculate precision. This allows us to have consistent formatting throughout the table even though the two variables may have different precision. We also calculate precision by PARAMCD since the output table will be presented using that as a by variable.

tbl <- cdisc_advs %>% 
  filter(PARAMCD == "SYSBP") %>% 
  univar(colvar = "TRTAN", 
         rowvar = "AVAL", 
         rowbyvar = "PARAMCD",
         precisionby = "PARAMCD",
         precisionon = "CHG",
         decimal = 4)

knitr::kable(tbl)

PARAMCD	label	0	54	81	row_type
SYSBP	SYSBP				BY_HEADER1
SYSBP	N	186	147	204	N
SYSBP	Mean (SD)	132.0 (12.02)	127.5 (12.58)	135.8 (23.65)	VALUE
SYSBP	Median	131.0	130.0	132.5	VALUE
SYSBP	Range	(100; 167)	(95; 151)	(95; 198)	VALUE
SYSBP	IQ range	(123.0; 138.0)	(122.0; 137.0)	(116.0; 150.5)	VALUE