ONEST

Gang Han, Baihong Guo

2021-07-26

library(ONEST)

1 General Information

The Observers Needed to Evaluate Subjective Tests (ONEST) software implements the statistical method of Reisenbichler et al. (2020)¹ for determining the minimum number of evaluators needed to estimate agreement involving a large number of raters. This method could be used by regulatory agencies, such as the FDA, when evaluating the agreement levels of a newly proposed subjective laboratory test. Input to the program should be binary (1/0) pathology data, where “0” may stand for negative and “1” for positive. The example datasets in this software are from Rimm et al. (2017)² (the SP142 assay) and Reisenbichler et al. (2020). This program runs in R version 3.5.0 and above.

2 Model and Inference

We briefly introduce the statistical model and inference implemented by this program. Let $p^*$ denote the proportion of concordant (i.e., identical) reads among a group of raters, where the group size can be two or more. We let $p_+$ denote the proportion of tissue cases that will always be evaluated positive by all the raters, and $p_-$ the proportion that will always be evaluated negative. Among the remaining proportion $1 - p_+ - p_-$ of cases that could be rated either positive or negative, each case has probability $p$ of being rated positive by any pathologist. Then the proportion of consistent reads among $k$ pathologists can be written as $p^*(k) = p_+ + p_- + (1 - p_+ - p_-)[p^k + (1 - p)^k]$.
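The agreement formula above can be evaluated directly. The following is a minimal sketch, not the package's implementation: the helper name `p_star` and the parameter values are assumptions chosen only for illustration.

```r
# Sketch of p*(k) = p_plus + p_minus + (1 - p_plus - p_minus) * [p^k + (1 - p)^k].
# `p_star` is a hypothetical helper name, not an ONEST function.
p_star <- function(k, p, p_plus, p_minus) {
  stopifnot(all(k >= 2), p >= 0, p <= 1,
            p_plus >= 0, p_minus >= 0, p_plus + p_minus <= 1)
  p_c <- p_plus + p_minus                 # cases on which all raters always agree
  p_c + (1 - p_c) * (p^k + (1 - p)^k)     # add chance agreement on the other cases
}

# Illustrative parameter values (assumed, not taken from any dataset).
agreement <- p_star(k = 2:6, p = 0.5, p_plus = 0.3, p_minus = 0.1)
print(round(agreement, 4))
```

With these assumed values, the agreement proportion decreases as raters are added and approaches p_plus + p_minus, because the chance-agreement term shrinks with k.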

Let $I$ denote the minimal sufficient number of pathologists, in the sense that $I$ is the smallest integer $i$ satisfying $p^*(i) - p^*(i+1) < p_\delta$ with a large probability (e.g., 95%), where $p_\delta$ is a threshold on the change in the percentage agreement due to including one additional pathologist. Let $p_c = p_+ + p_-$.

The statistical inference is based on the joint likelihood function of the parameters $p_+$, $p_-$, and $p$. For $n$ cases and $k$ pathologists, we have the data $\{y_{ij};\ i = 1, \dots, n,\ j = 1, \dots, k\}$. Each observation $y_{ij}$ is binary, where $y_{ij} = 1$ if the read is positive and $y_{ij} = 0$ if the read is negative. The probabilities of $y_{ij} = 1$ and $y_{ij} = 0$ can be written as $P(y_{ij} = 1) = p_+ + p(1 - p_+ - p_-)$ and $P(y_{ij} = 0) = p_- + (1 - p)(1 - p_+ - p_-)$, respectively. We assume all $\{y_{ij}\}$ are independently and identically distributed. The likelihood function can be written as $L(p, p_+, p_- \mid \{y_{ij}\}) = [p_+ + p(1 - p_+ - p_-)]^{T} [p_- + (1 - p)(1 - p_+ - p_-)]^{nk - T}$, where $T$ is the total number of reads equal to 1 among all $nk$ reads. With $k$ pathologists, we let $n_c$ denote the number of consistent reads among $n$ cases, so $n_c \sim \mathrm{Bin}(n, p_c)$. Similarly, we have $n_+ \sim \mathrm{Bin}(n, p_+)$ and $n_- \sim \mathrm{Bin}(n, p_-)$, where $n_+$ and $n_-$ denote the numbers of cases that all pathologists read positive and negative, respectively.
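As an illustration of the likelihood above, the sketch below evaluates its logarithm for a small binary matrix. The function name `onest_loglik` and the toy data are assumptions made for this example; the package may compute this quantity differently.

```r
# Log of L(p, p_plus, p_minus | {y_ij}) from the text, for a binary n-by-k matrix y.
# `onest_loglik` is a hypothetical name used only for this sketch.
onest_loglik <- function(y, p, p_plus, p_minus) {
  stopifnot(all(y %in% c(0, 1)))
  total_reads <- length(y)                    # n * k
  total_pos <- sum(y)                         # T, the number of reads equal to 1
  undecided <- 1 - p_plus - p_minus           # share of cases that can go either way
  prob_one <- p_plus + p * undecided          # P(y_ij = 1)
  prob_zero <- p_minus + (1 - p) * undecided  # P(y_ij = 0)
  total_pos * log(prob_one) + (total_reads - total_pos) * log(prob_zero)
}

# Toy data (assumed): 4 cases in rows, 3 raters in columns.
y_toy <- matrix(c(1, 1, 1,
                  0, 0, 0,
                  1, 0, 1,
                  0, 1, 0), nrow = 4, byrow = TRUE)
ll <- onest_loglik(y_toy, p = 0.5, p_plus = 0.25, p_minus = 0.25)
print(ll)
```

Note that this likelihood depends on the three parameters only through P(y_ij = 1), which is why the counts of all-positive and all-negative cases are also needed to estimate them separately.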

Based on the binomial maximum likelihood estimation, the estimates are $\hat{p}_+ = n_+/n$, $\hat{p}_- = n_-/n$, $\hat{p}_+ + \hat{p}(1 - \hat{p}_+ - \hat{p}_-) = T/(nk)$, and therefore $\hat{p} = [T/(nk) - \hat{p}_+]/(1 - \hat{p}_+ - \hat{p}_-)$. We then estimate $p^*$ by plugging the estimates of $\{p_c, p\}$ into the equation $p^*(k) = p_c + (1 - p_+ - p_-)[p^k + (1 - p)^k]$. We define the objective function as $D(i) = p^*(i) - p^*(i+1) = (1 - p_+ - p_-)[p^i(1 - p) + p(1 - p)^i]$. The estimate of $p$ depends on the product of $n$ and $k$, and the estimate of $p_c$ is $n_c/n$. We use 95% as the probability threshold. Based on the central limit theorem, the asymptotic 95% lower bound of $p_c$ is $n_c/n - 1.645\,[n_c(n - n_c)/n^3]^{1/2}$. By plugging in this lower bound of $p_c$, we can compute the upper bound of $D(i)$ at the 95% confidence level. If the upper bound of $D(i)$ is less than $p_\delta$, we conclude that $i$ is the sufficient number of pathologists.
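The estimation steps above can be sketched as follows. This is a simplified illustration under stated assumptions, not the code used by ONEST_main: the function names, the demo matrix, the threshold value 0.01, and the choice to start the search at i = 2 are all assumptions.

```r
# Plug-in estimates and the 95% upper bound of D(i), following the formulas in the text.
# All function names here are hypothetical; they are not part of the ONEST API.
estimate_params <- function(y) {
  n <- nrow(y); k <- ncol(y)
  pos_per_case <- rowSums(y)
  p_plus <- sum(pos_per_case == k) / n     # n+ / n: cases read positive by everyone
  p_minus <- sum(pos_per_case == 0) / n    # n- / n: cases read negative by everyone
  stopifnot(p_plus + p_minus < 1)          # otherwise p is undefined
  p <- (sum(y) / (n * k) - p_plus) / (1 - p_plus - p_minus)
  list(n = n, k = k, p = p, p_plus = p_plus, p_minus = p_minus)
}

d_upper <- function(i, est) {
  p_c <- est$p_plus + est$p_minus
  n_c <- p_c * est$n
  # Asymptotic one-sided 95% lower bound of p_c.
  p_c_low <- p_c - 1.645 * sqrt(n_c * (est$n - n_c) / est$n^3)
  (1 - p_c_low) * (est$p^i * (1 - est$p) + est$p * (1 - est$p)^i)
}

sufficient_raters <- function(est, p_delta, max_i = 50) {
  candidates <- 2:max_i                    # assumed search range
  ok <- d_upper(candidates, est) < p_delta
  if (any(ok)) candidates[which(ok)[1]] else NA_integer_
}

# Deterministic demo data (assumed): 30 all-positive cases, 10 all-negative
# cases, and 60 cases on which the 4 raters split evenly.
y_demo <- rbind(
  matrix(1, nrow = 30, ncol = 4),
  matrix(0, nrow = 10, ncol = 4),
  matrix(rep(c(1, 0, 1, 0), times = 60), nrow = 60, byrow = TRUE)
)
est <- estimate_params(y_demo)
i_min <- sufficient_raters(est, p_delta = 0.01)  # 0.01 is an assumed threshold
print(unlist(est))
print(i_min)
```

The threshold is supplied by the analyst; a smaller value generally leads to a larger sufficient number of pathologists.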

3 Inputs and Outputs

3.1 Inputs

This software has one driver file, ONEST_main. Inputs to ONEST_main include

3.2 Outputs

Meanings of the output values are listed below.

All the outputs were saved in the following structure.

4 Example with dataset sp142_bin

The dataset “sp142_bin” is a pathology dataset on triple-negative breast cancer from Reisenbichler et al. (2020), stored as a 68 by 18 matrix (68 cases, 18 raters). A value of 0 in position (i, j) means that the j-th rater evaluated the i-th case as negative, and a value of 1 means a positive evaluation.
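To make the expected layout concrete, the sketch below builds a toy matrix in the same cases-by-raters orientation and checks that it is binary. The helper `is_binary_reads` and the toy data are hypothetical and not part of the package. Whether ONEST accepts missing values is not stated here, so the check rejects them as a conservative assumption.

```r
# Hypothetical input check; not part of the ONEST package.
is_binary_reads <- function(y) {
  y <- as.matrix(y)
  is.numeric(y) && !anyNA(y) && all(y %in% c(0, 1))
}

# Toy data (assumed): rows are cases, columns are raters.
toy_reads <- matrix(c(1, 1, 0,
                      0, 0, 0),
                    nrow = 2, byrow = TRUE,
                    dimnames = list(c("case1", "case2"),
                                    c("rater1", "rater2", "rater3")))

print(dim(toy_reads))                  # number of cases, number of raters
print(is_binary_reads(toy_reads))      # check that every entry is 0 or 1
print(toy_reads["case1", "rater3"])    # the read of rater 3 on case 1
```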

Details about other datasets in the package can be found in the reference manual.

4.1 Load data

library(ONEST)
data("sp142_bin")

4.2 Plot the data and get the outputs

The following code produces the same results as ONEST_main(sp142_bin), but it works only with the example dataset sp142_bin; it is used here to reduce the time needed to build the vignette. In practice, please use the ONEST_main function instead.

# figure(1): Plot of the agreement percentage in the order of columns in the inputs;
# figure(2): Plot of the 100 randomly chosen permutations;
# figure(3): Plot of the empirical confidence interval;
# figure(4): Barchart: the x axis is the case number and the Y axis is the number of pathologists that called that case positive, sorted from lowest to highest on the y axis;
# figure(5): Plot of the proportion of identical reads among a set of pathologists;
# figure(6): Plot of the difference between the proportion of identical reads among a set of pathologists;

# ONEST_main(sp142_bin)
data('empirical')
ONEST_vignettes(sp142_bin,empirical)

#> $consistency
#>       consist_p consist_low
#>  [1,] 0.6911795   0.6427088
#>  [2,] 0.5367693   0.4640632
#>  [3,] 0.4595634   0.3747395
#>  [4,] 0.4209597   0.3300768
#>  [5,] 0.4016573   0.3077448
#>  [6,] 0.3920057   0.2965783
#>  [7,] 0.3871797   0.2909948
#>  [8,] 0.3847665   0.2882029
#>  [9,] 0.3835598   0.2868068
#> [10,] 0.3829564   0.2861087
#> [11,] 0.3826547   0.2857597
#> [12,] 0.3825039   0.2855851
#> [13,] 0.3824284   0.2854978
#> [14,] 0.3823907   0.2854542
#> [15,] 0.3823718   0.2854324
#> [16,] 0.3823624   0.2854214
#> [17,] 0.3823577   0.2854160
#> 
#> $difference
#>        diff_consist    diff_high
#>  [1,] -1.544102e-01 1.786456e-01
#>  [2,] -7.720588e-02 8.932368e-02
#>  [3,] -3.860371e-02 4.466273e-02
#>  [4,] -1.930243e-02 2.233203e-02
#>  [5,] -9.651598e-03 1.116646e-02
#>  [6,] -4.826038e-03 5.583506e-03
#>  [7,] -2.413163e-03 2.791919e-03
#>  [8,] -1.206665e-03 1.396057e-03
#>  [9,] -6.033806e-04 6.980838e-04
#> [10,] -3.017172e-04 3.490731e-04
#> [11,] -1.508736e-04 1.745539e-04
#> [12,] -7.544503e-05 8.728646e-05
#> [13,] -3.772701e-05 4.364843e-05
#> [14,] -1.886594e-05 2.182703e-05
#> [15,] -9.434279e-06 1.091503e-05
#> [16,] -4.717841e-06 5.458327e-06
#> 
#> $estimates
#>      size_case size_rater         p    p_plus   p_minus
#> [1,]        68         18 0.4984245 0.2794118 0.1029412
#> 
#> $empirical
#>       lower_bound      mean upper_bound
#>  [1,]   0.6029412 0.7898235   0.9264706
#>  [2,]   0.5294118 0.6951176   0.8529412
#>  [3,]   0.4558824 0.6306912   0.7941176
#>  [4,]   0.4264706 0.5833529   0.7352941
#>  [5,]   0.3970588 0.5447941   0.6911765
#>  [6,]   0.3823529 0.5124412   0.6617647
#>  [7,]   0.3676471 0.4878088   0.6176471
#>  [8,]   0.3676471 0.4642941   0.5882353
#>  [9,]   0.3529412 0.4468235   0.5735294
#> [10,]   0.3529412 0.4298824   0.5441176
#> [11,]   0.3529412 0.4145588   0.5147059
#> [12,]   0.3529412 0.4013088   0.5000000
#> [13,]   0.3529412 0.3902059   0.4852941
#> [14,]   0.3529412 0.3786765   0.4705882
#> [15,]   0.3529412 0.3684853   0.4558824
#> [16,]   0.3529412 0.3608382   0.4411765
#> [17,]   0.3529412 0.3529412   0.3529412
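
The printed values can be related back to the formulas in Section 2. The sketch below recomputes the first rows from the numbers shown in $estimates. It assumes that the rows of $consistency correspond to k = 2, ..., 18 raters and the rows of $difference to i = 2, ..., 17; this is an inference from the numbers, not something the output states, and the code is not the package's implementation.

```r
# Estimates copied from the $estimates output above.
n <- 68
p <- 0.4984245
p_plus <- 0.2794118
p_minus <- 0.1029412

p_c <- p_plus + p_minus
n_c <- p_c * n
# Asymptotic one-sided 95% lower bound of p_c from Section 2.
p_c_low <- p_c - 1.645 * sqrt(n_c * (n - n_c) / n^3)

k <- 2:18   # assumed row-to-group-size mapping for $consistency
consist_p <- p_c + (1 - p_c) * (p^k + (1 - p)^k)
consist_low <- p_c_low + (1 - p_c_low) * (p^k + (1 - p)^k)

i <- 2:17   # assumed row-to-i mapping for $difference
d_point <- (1 - p_c) * (p^i * (1 - p) + p * (1 - p)^i)
d_high <- (1 - p_c_low) * (p^i * (1 - p) + p * (1 - p)^i)

print(round(cbind(k, consist_p, consist_low)[1:3, ], 4))
print(signif(cbind(i, d_point, d_high)[1:3, ], 4))
```

Under these assumptions, the recomputed values agree closely with the consist_p, consist_low, and diff_high columns. The printed diff_consist column has the same magnitude as d_point with a negative sign, which appears consistent with the difference being taken as p*(i+1) minus p*(i).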

4.3 The ONEST score test

A small p-value from this score test indicates significant evidence that the observers’ agreement will converge to a non-zero proportion.

data("sp142_bin")
ONEST_inflation_test(sp142_bin)
#> p_value 
#>       0

4.4 Code to run other examples

# (1) With example dataset sp263_bin:
# data("sp263_bin")
# ONEST_main(sp263_bin)
# ONEST_inflation_test(sp263_bin)

# (2) With example dataset NCCN_sp142:
# data("NCCN_sp142")
# ONEST_main(NCCN_sp142)
# ONEST_inflation_test(NCCN_sp142)

# (3) With example dataset NCCN_sp142_t:
# data("NCCN_sp142_t")
# ONEST_main(NCCN_sp142_t)
# ONEST_inflation_test(NCCN_sp142_t)

# (4) With example dataset NCCN_22c3_t:
# data("NCCN_22c3_t")
# ONEST_main(NCCN_22c3_t)
# ONEST_inflation_test(NCCN_22c3_t)

References


  1. Reisenbichler, E. S., Han, G., Bellizzi, A., Bossuyt, V., Brock, J., Cole, K., Fadare, O., Hameed, O., Hanley, K., Harrison, B. T., Kuba, M. G., Ly, A., Miller, D., Podoll, M., Roden, A. C., Singh, K., Sanders, M. A., Wei, S., Wen, H., Pelekanou, V., Yaghoobi, V., Ahmed, F., Pusztai, L., and Rimm, D. L. (2020) “Prospective multi-institutional evaluation of pathologist assessment of PD-L1 assays for patient selection in triple negative breast cancer,” Mod Pathol, DOI: 10.1038/s41379-020-0544-x, PMID: 32300181.

  2. Rimm, D. L., Han, G., Taube, J. M., Yi, E. S., Bridge, J. A., Flieder, D. B., Homer, R., West, W. W., Wu, H., Roden, A. C., Fujimoto, J., Yu, H., Anders, R., Kowalewski, A., Rivard, C., Rehman, J., Batenchuk, C., Burns, V., Hirsch, F. R., and Wistuba, I. I. (2017) “A Prospective, Multi-institutional, Pathologist-Based Assessment of 4 Immunohistochemistry Assays for PD-L1 Expression in Non-Small Cell Lung Cancer,” JAMA Oncol, 3(8), 1051-1058, DOI: 10.1001/jamaoncol.2017.0013, PMID: 28278348.