The author assumes that many users are radiologists or statisticians who want to compare imaging modalities such as MRI, CT, and PET.
Basic words | Truth = Positive | Truth = Negative |
---|---|---|
Reader’s positive | TP | FP |
Reader’s negative | FN | TN |
FROC analysis uses only TP and FP, which we call hits and false alarms, respectively. That is:
Basic words | Truth = Positive | Truth = Negative |
---|---|---|
Reader’s positive | Hit | False alarm |
Reader’s negative | FN | TN |
In the R console, the number of hits is denoted by h and the number of false alarms by f.
In TeX, the number of hits is denoted by \(H\) and the number of false alarms by \(F\).
Suppose that there are \(N_I\) images (e.g., radiographs) containing a total of \(N_L\) lesions that should be detected by a radiologist. An image may contain no lesions. For each image, the radiologist marks every location where he or she suspects a lesion and assigns each mark a confidence level, which is one of the numbers \(1,2, ..., c, ..., C\). Thus the radiologist can report multiple locations for a single image. This multiplicity distinguishes FROC analysis from ordinary ROC analysis, which allows each reader only a single dichotomous answer per image. Counting the true positives \(H_c\) and false positives (false alarms) \(F_c\) at each confidence level yields an FROC dataset \((F_c,H_c)\). We have now introduced the notations \(N_L\), \(N_I\), \(H_c\), \(F_c\), and \(C\). In the R console, these are represented by NL, NI, h, f, and C.
If \(C=5\), the dataset for FROC analysis is as follows:
Confidence Level | No. of Hits | No. of False alarms |
---|---|---|
5 = definitely present | \(H_{5}\) | \(F_{5}\) |
4 = probably present | \(H_{4}\) | \(F_{4}\) |
3 = equivocal | \(H_{3}\) | \(F_{3}\) |
2 = probably absent | \(H_{2}\) | \(F_{2}\) |
1 = questionable | \(H_{1}\) | \(F_{1}\) |
dat <- list(
#Confidence level.
c = c(3,2,1),
#Number of hits for each confidence level.
h = c(97,32,31),
#Number of false alarms for each confidence level.
f = c(1,14,74),
#Number of lesions
NL= 259,
#Number of images
NI= 57,
#Number of confidence level
C= 3
)
This code represents the following data:
Confidence Level | Number of Hits | Number of False alarms |
---|---|---|
3 = definitely present | \(H_{3}=97\) | \(F_{3}=1\) |
2 = equivocal | \(H_{2}=32\) | \(F_{2}=14\) |
1 = questionable | \(H_{1}=31\) | \(F_{1}=74\) |
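As a rough illustration of how such a table relates to an FROC plot, the following base R sketch computes empirical operating points by accumulating counts from the highest confidence level downwards. This is a descriptive summary only, not the Bayesian model fitted by the package, and the names tpf and fpf are introduced here for illustration.

```r
# Example data from the table above (ordered from confidence level 3 down to 1).
h  <- c(97, 32, 31)   # hits
f  <- c(1, 14, 74)    # false alarms
NL <- 259             # number of lesions
NI <- 57              # number of images

# Cumulative counts: marks rated at or above each confidence level.
tpf <- cumsum(h) / NL  # cumulative hits per lesion
fpf <- cumsum(f) / NI  # cumulative false alarms per image

# One empirical operating point per confidence threshold.
data.frame(threshold = 3:1, fpf = fpf, tpf = tpf)
```

Each row gives the fraction of lesions found and the mean number of false alarms per image when all marks at or above the given confidence level are counted.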
##### Minor remark
Note that the number of confidence levels, denoted by C, must be included; the confidence level vector c, however, need not be specified. If it is specified, it is ignored, because the program creates it internally by c <-c(rep(C:1)) and does not refer to the user's input, where C is the highest confidence level. Write your hits and false alarms vectors so that they are compatible with this automatically created vector c.
Note that the confidence level vector is not required in the above code; it is assumed to be the decreasing vector c(3,2,1).
Do not confuse it with c(1,2,3); that order is never permitted.
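The ordering convention can be sketched in base R as follows. The variable h_increasing is a hypothetical example of counts recorded from the lowest to the highest confidence level; it is not part of the package.

```r
C <- 3

# Confidence levels in the order described above: highest first.
conf <- C:1                     # 3, 2, 1

# Hits and false alarms must follow the same order.
h <- c(97, 32, 31)
f <- c(1, 14, 74)
data.frame(confidence = conf, hits = h, false_alarms = f)

# If counts were recorded from level 1 up to level C (hypothetical),
# reverse them before building the dataset.
h_increasing <- c(31, 32, 97)
h_fixed <- rev(h_increasing)    # now ordered for levels 3, 2, 1
```

Reversing once at data entry avoids silently pairing the counts with the wrong confidence levels.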
Note that the above example data is included in this package as the object BayesianFROC::dataList.Chakra.1.
Use BayesianFROC::create_dataset() to make your own dataset.
R console | Definitions |
---|---|
h | Positive integer vector representing the number of hits for each reader, confidence level, and modality. |
f | Positive integer vector representing the number of false alarms for each reader, confidence level, and modality. |
NL | Positive integer representing the number of lesions. |
NI | Positive integer representing the number of images. |
C | Natural number. The highest confidence level, representing the reader’s highest confidence, that is, lesions definitely exist. |
Fitting an FROC model to data is simple: run the function BayesianFROC::fit_Bayesian_FROC() as follows:
# On the author's machine an Rcpp function is not found unless the Rcpp package is attached, so it is attached here. This is a workaround, not the intended usage.
library(Rcpp)
# Prepare dataset
dat <- BayesianFROC::dataList.Chakra.1 # data shown in the above example.
#Fitting
fit <-BayesianFROC::fit_Bayesian_FROC(dat)
BayesianFROC::fit_Bayesian_FROC() fits the following model using rstan::stan():
\[H_{c} \sim \text{Binomial}(p_{c}, N_{L}),\]
\[F_{c} \sim \text{Poisson}((\lambda_{c} - \lambda_{c+1}) \times N_{I}),\]
\[\lambda_{c} = -\log \Phi(z_{c}),\]
\[p_{c} = \Phi\left(\frac{z_{c+1}-\mu}{\sigma}\right) - \Phi\left(\frac{z_{c}-\mu}{\sigma}\right).\]
Here \(\Phi\) denotes the standard normal cumulative distribution function. In this model, \(z_{c}\) (\(c=1,\cdots,C+1\)), \(\mu\), and \(\sigma\) are the parameters to be estimated.
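To make the formulas concrete, here is a minimal base R sketch that evaluates \(\lambda_c\), \(p_c\), and the resulting Binomial and Poisson means for given parameter values. The values of z, mu, and sigma are hypothetical, not estimates from the package, and setting the last threshold to +Inf is an assumption made only for this illustration.

```r
C  <- 3
NL <- 259   # number of lesions (example data)
NI <- 57    # number of images (example data)

# Hypothetical parameter values: z_1, ..., z_{C+1}, mu, sigma.
z     <- c(-0.5, 0.3, 1.1, Inf)  # last threshold set to +Inf (assumption)
mu    <- 1.0
sigma <- 1.5

# lambda_c = -log Phi(z_c)
lambda <- -log(pnorm(z))

# Poisson rate per image for level c: lambda_c - lambda_{c+1}
fa_rate <- lambda[1:C] - lambda[2:(C + 1)]

# Hit rate for level c: Phi((z_{c+1} - mu)/sigma) - Phi((z_c - mu)/sigma)
p <- pnorm((z[2:(C + 1)] - mu) / sigma) - pnorm((z[1:C] - mu) / sigma)

# Means of the Binomial and Poisson distributions in the model.
expected_hits         <- p * NL
expected_false_alarms <- fa_rate * NI

# Rows are indexed by c = 1, ..., C; use rev() to match the C:1 data order.
data.frame(c = 1:C, lambda = lambda[1:C], p = p,
           expected_hits, expected_false_alarms)
```

Because the thresholds are increasing, every rate difference and every hit rate is positive, which is what the Poisson and Binomial distributions require.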
For details, please see the author’s paper. Note that this model is used when the default value ModifiedPoisson = FALSE is retained.
A minor change.
If you pass ModifiedPoisson = TRUE to the function BayesianFROC::fit_Bayesian_FROC(), the model for false alarms is changed to
\[F_{c} \sim \text{Poisson}((\lambda_{c} - \lambda_{c+1}) \times N_{L}).\]
This changes the interpretation of the parameters \(\lambda_c\) from false alarm rates per image to false alarm rates per lesion.
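The effect of the two scalings can be sketched in base R with the example data. The quantities below are simple plug-in ratios (observed count divided by \(N_I\) or \(N_L\)), not posterior estimates from the package, and the variable names are introduced here for illustration.

```r
# Example data: false alarms for confidence levels 3, 2, 1.
f  <- c(1, 14, 74)
NI <- 57    # number of images
NL <- 259   # number of lesions

# Plug-in values for the rate differences lambda_c - lambda_{c+1}.
rate_per_image  <- f / NI   # scale of the model with ModifiedPoisson = FALSE
rate_per_lesion <- f / NL   # scale of the model with ModifiedPoisson = TRUE

# Both scalings give back the same Poisson means (the observed counts);
# only the unit of the rate differs.
round(rbind(per_image = rate_per_image, per_lesion = rate_per_lesion), 3)
```

Since this dataset has more lesions than images, the per-lesion rates are smaller than the per-image rates for the same observed counts.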
# On the author's machine some Rcpp functions are not found unless the Rcpp package is loaded, so it is loaded here. Users should not normally need to load Rcpp.
library(Rcpp)
# Prepare dataset
dat <- BayesianFROC::dataList.Chakra.1 # data
#Fitting
fit <-BayesianFROC::fit_Bayesian_FROC(dat)
# Interpretation of Outputs
The results of BayesianFROC::fit_Bayesian_FROC(dat) are as follows:
The correspondence of notation between the R console and the author’s paper:
R console | The author’s paper(*) (LaTeX) | Definition |
---|---|---|
A | \(A\) | AUC (the area under the AFROC curve) |
z[1] | \(z_1\) | Threshold of the bi-normal assumption for confidence level 1 |
z[2] | \(z_2\) | Threshold of the bi-normal assumption for confidence level 2 |
z[3] | \(z_3\) | Threshold of the bi-normal assumption for confidence level 3 |
z[4] | \(z_4\) | Threshold of the bi-normal assumption for confidence level 4 |
m | \(\mu\) | Mean of the latent Gaussian variable for signal |
v | \(\sigma\) | Standard deviation of the latent Gaussian variable for signal |
p[1] | \(p_1\) | Hit rate for confidence level 1 |
p[2] | \(p_2\) | Hit rate for confidence level 2 |
p[3] | \(p_3\) | Hit rate for confidence level 3 |
p[4] | \(p_4\) | Hit rate for confidence level 4 |
l[1] | \(\lambda_1\) | False alarm rate for confidence level 1 |
l[2] | \(\lambda_2\) | False alarm rate for confidence level 2 |
l[3] | \(\lambda_3\) | False alarm rate for confidence level 3 |
(*) The author’s paper: Bayesian Models for Free-response Receiver Operating Characteristic Analysis.
Note that v = \(\sigma = \sqrt{\sigma^2}\), the standard deviation, and not the variance \(\sigma^2\).
This vignette shows the case of a single reader and a single modality.
For the case of multiple readers and multiple modalities, please see the other vignette.
If you have any questions, please contact me.
If you find an error, please let me know. My background is mathematics, especially differential geometry, so mathematical comments are welcome.
tsunoda.issei1111 at gmail.com
Abbreviation | Word | Meaning | TeX | R console |
---|---|---|---|---|
Reader | Radiologist, doctor, etc. | One who tries to find lesions (nodules) | Subscript \(r\) for reader ID and \(R\) for the number of readers | qd for reader ID and Q for the number of readers |
SRSC | Single reader and single case | Case indicating reader | | srsc |
MRMC | Multiple reader and multiple case | Case indicating reader | | |
H, TP | Hit | Number of True Positives | \(H\) | h |
F, FP | False alarm | Number of False Positives | \(F\) | f |
AUC | Area under the curve | Curve indicating the AFROC curve | In the single reader and single modality case it is denoted by \(A\). In the MRMC case, \(A_m\) for the \(m\)-th modality or \(A_{m,r}\) for the \(m\)-th modality and \(r\)-th reader | A for a one-indexed array or AA for an array with two subscripts: \(A_m\) = A[md], \(A_{m,r}\) = AA[md,qd], where \(m\) = md indicates the modality ID and \(r\) = qd indicates the reader ID. |
Signal | Nodule or lesion | Non-healthy case | | |