Welcome to the ‘Get started’ page of the jfa package. In this vignette you can find detailed examples of how to incorporate the functions provided by the package into your statistical sampling workflow.
To illustrate jfa’s functionality concretely, we consider the BuildIt data set that is included in the package (for more information, see ?BuildIt). This data set contains a population of 3500 invoices paid to a fictional construction company. Each invoice has an identification number (ID), a recorded value (bookValue), and a corresponding audit (true) value (auditValue).
Note: The information in the auditValue column is added for illustrative purposes only; in practice, the auditor does not know these values before auditing the items in a sample.
First, we load the jfa package and the BuildIt data. The first 10 invoices from the data set are displayed below.
library(jfa)
data('BuildIt')
head(BuildIt, n = 10)
## ID bookValue auditValue
## 1 82884 242.61 242.61
## 2 25064 642.99 642.99
## 3 81235 628.53 628.53
## 4 71769 431.87 431.87
## 5 55080 620.88 620.88
## 6 93224 501.76 501.76
## 7 24331 466.01 466.01
## 8 81460 295.20 295.20
## 9 14608 216.48 216.48
## 10 79064 243.43 243.43
For a fully explained walkthrough of jfa’s workflow functionality using the BuildIt data set, see Workflow: Classical audit sampling. For a Bayesian version of the walkthrough, see Workflow: Bayesian audit sampling.
auditPrior(): The basics
The auditPrior() function allows you to create a prior distribution for audit sampling. More specifically, this function sets up the prior distribution for the misstatement parameter in the statistical model that is specified later on. One advantage of Bayesian inference for auditors is that the prior distribution can be used to incorporate existing information into the statistical procedure, which can reduce the required sample size and thereby increase efficiency. The type of audit information that can be incorporated depends on the information that is available to the auditor. See the vignette Planning: Prior distributions or the accompanying article for a detailed explanation of the types of audit information that jfa is able to incorporate into a prior distribution.
With the prior distribution in hand, Bayesian planning and evaluation can be performed by providing the object returned by the auditPrior() function as input for the prior argument in the planning() and evaluation() functions.
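To make the potential sample-size reduction concrete, the sketch below works through a conjugate update in base R rather than through jfa itself. It assumes a gamma(1, 1) prior on the misstatement combined with a Poisson likelihood, so that observing k errors in n units gives a gamma(1 + k, 1 + n) posterior. The prior parameters and the helper name bayes_min_n() are illustrative assumptions and do not necessarily match what auditPrior() returns.

```r
# Illustrative sketch, not jfa's implementation: smallest sample size for which
# the posterior upper bound falls below the materiality, assuming a
# gamma(alpha, beta) prior and a Poisson likelihood.
bayes_min_n <- function(materiality, expected = 0, conf.level = 0.95,
                        alpha = 1, beta = 1, max_n = 5000) {
  for (n in seq_len(max_n)) {
    # Conjugate update: the posterior is gamma(alpha + expected, beta + n)
    upper <- qgamma(conf.level, shape = alpha + expected, rate = beta + n)
    if (upper < materiality) {
      return(n)
    }
  }
  stop("No sample size up to max_n satisfies the materiality")
}

n_bayes <- bayes_min_n(materiality = 0.05)
n_bayes
```

Under these assumptions the prior acts as if one error-free unit had already been observed, so the required sample size is one unit smaller than in the corresponding calculation without a prior.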
planning(): The basics
Planning a sample requires that you know the objective of your sampling procedure. Generally, a sampling objective can be one (or both) of the following: to test the population misstatement against a performance materiality (i.e., the maximum tolerable misstatement in the population), or to estimate the population misstatement with a minimum precision. Furthermore, you should know the assumed distribution of the expected data (binomial, poisson, or hypergeometric) and the expected (or tolerable) errors in the sample. When planning an audit sample, it is strongly advised to set the expected errors conservatively. This minimizes the chance that the observed errors in the sample exceed the expected errors, which would imply that insufficient audit work has been done.
With the BuildIt data set, because we have access to the booked amounts (monetary values) of each invoice in the population, we are going to consider each monetary unit in the population as a possible unit of inference. In this case, the audit standards assume such data to be distributed according to the Poisson distribution, and so we will use this distribution in this example as well. For illustrative purposes, we will use a strict requirement where the population will only be approved when the sample contains no misstatements.
First, we take a look at how you can use the planning() function to construct a sample with the objective of testing the misstatement in the population against a performance materiality. In this example, we will set the performance materiality at 5% of the total value of the population.
Sampling objective: Calculate a minimal sample size such that, when no misstatements are found in the sample, you have obtained 95% assurance that the misstatement in the population is lower than 5% of the total value.
Planning a sample with this objective in mind can be done using the code below (specifically by specifying the materiality argument). Next, a summary of the results can be obtained using the summary() function. As you can see below, the minimal sample size to achieve 95% assurance with respect to the performance materiality of 5% is 60 monetary units.
stage1 <- planning(materiality = 0.05, expected = 0, likelihood = 'poisson', conf.level = 0.95)
summary(stage1)
##
## Classical Audit Sample Planning Summary
##
## Options:
## Confidence level: 0.95
## Materiality: 0.05
## Hypotheses: H0: T >= 0.05 vs. H1: T < 0.05
## Expected: 0
## Likelihood: poisson
##
## Results:
## Minimum sample size: 60
## Tolerable errors: 0
## Expected most likely error: 0
## Expected upper bound: 0.049929
## Expected precision: 0.049929
## Expected p-value: < 2.22e-16
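The sample size of 60 can be checked by hand in base R. The sketch below assumes that the classical Poisson upper bound for k errors in n units is the conf.level quantile of a gamma(k + 1, n) distribution, and it searches for the smallest n whose bound lies below the materiality. The helpers poisson_upper() and poisson_min_n() are hypothetical names introduced for this illustration, not jfa functions.

```r
# Assumed form of the classical Poisson upper bound for k errors in n units
poisson_upper <- function(k, n, conf.level = 0.95) {
  qgamma(conf.level, shape = k + 1, rate = n)
}

# Smallest n for which the expected upper bound is below the materiality
poisson_min_n <- function(materiality, expected = 0, conf.level = 0.95,
                          max_n = 5000) {
  for (n in seq_len(max_n)) {
    if (poisson_upper(expected, n, conf.level) < materiality) {
      return(n)
    }
  }
  stop("No sample size up to max_n satisfies the materiality")
}

n_test <- poisson_min_n(materiality = 0.05)
n_test                              # 60
round(poisson_upper(0, n_test), 6)  # 0.049929
```

Raising the expected argument in this sketch increases the required sample size, which is the cost of planning conservatively for errors.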
Next, we take a look at how you can use the planning() function to construct a sample with the objective of estimating the misstatement in the population with a minimum precision. The precision is defined as the difference between the most likely misstatement and the upper confidence bound on the misstatement and is an indication of the accuracy of your estimate. For this example, we will set the minimum precision to 2% of the population value.
Sampling objective: Calculate a minimal sample size such that, when zero misstatements are found in the sample, you have obtained 95% assurance that the misstatement in the population is at most 2% above the most likely misstatement.
Planning a sample with this objective can be done using the code below (specifically by specifying the min.precision argument). As you can see below, the minimal sample size to achieve a precision of at least 2% is 150 monetary units.
stage1 <- planning(min.precision = 0.02, expected = 0, likelihood = 'poisson', conf.level = 0.95)
summary(stage1)
##
## Classical Audit Sample Planning Summary
##
## Options:
## Confidence level: 0.95
## Min. precision: 0.02
## Expected: 0
## Likelihood: poisson
##
## Results:
## Minimum sample size: 150
## Tolerable errors: 0
## Expected most likely error: 0
## Expected upper bound: 0.019971
## Expected precision: 0.019971
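For zero expected errors this result has a closed form, which the base R sketch below uses. It assumes the same gamma-quantile form of the Poisson upper bound as a working hypothesis; with no errors the most likely misstatement is 0, so the precision equals the upper bound, which reduces to -log(1 - conf.level) / n. The variable names are illustrative only.

```r
conf.level <- 0.95
min.precision <- 0.02

# With zero errors: precision = qgamma(conf.level, 1, n) = -log(1 - conf.level) / n,
# so the smallest n that meets the minimum precision is
n_est <- ceiling(-log(1 - conf.level) / min.precision)
n_est                 # 150

# Expected precision at that sample size
precision <- qgamma(conf.level, shape = 1, rate = n_est)
round(precision, 6)   # 0.019972
```

The sketch's expected precision agrees with the summary above up to the last printed digit.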
selection(): The basics
Selecting a sample using the selection() function requires knowledge of the sampling units (i.e., units of inference) in the population. Items can be selected from the population using record sampling (also known as attribute sampling or item sampling) using units = 'items', or using monetary unit sampling (MUS) using units = 'values'. Selection also requires knowledge of the sampling algorithm. Sampling units can be selected with a random sampling scheme using method = 'random', with a cell sampling scheme using method = 'cell', or with a fixed interval sampling (also known as systematic sampling) scheme using method = 'interval'.
See the vignette Selection: Sampling methodology for a more detailed explanation of the selection algorithms implemented in jfa.
First, we take a look at how you can use the selection() function to perform random sampling from the items in the population. As an example, the code below samples 60 invoices from the BuildIt data set using a random record sampling scheme.
set.seed(1)
stage2 <- selection(data = BuildIt, size = 60, units = 'items', method = 'random')
summary(stage2)
##
## Audit Sample Selection Summary
##
## Options:
## Requested sample size: 60
## Sampling units: items
## Method: random sampling
##
## Data:
## Population size: 3500
##
## Results:
## Selected sampling units: 60
## Selected items: 60
## Proportion of size: 0.017143
Next, we take a look at how you can use the selection() function to perform fixed interval sampling using the monetary units in the population as sampling units. As an example, the code below samples 150 monetary units from the BuildIt data set using a fixed interval monetary unit sampling scheme.
stage2 <- selection(data = BuildIt, size = 150, units = 'values', method = 'interval', values = 'bookValue')
summary(stage2)
##
## Audit Sample Selection Summary
##
## Options:
## Requested sample size: 150
## Sampling units: monetary units
## Method: fixed interval sampling
## Starting point: 1
##
## Data:
## Population size: 3500
## Population value: 1403221
## Selection interval: 9354.8
##
## Results:
## Selected sampling units: 150
## Proportion of value: 0.0001069
## Selected items: 150
## Proportion of size: 0.042857
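The mechanics of fixed interval monetary unit sampling can be sketched in a few lines of base R. The example below uses a made-up vector of four book values rather than the BuildIt data, and fixed_interval_mus() is a hypothetical helper whose details (for example, how boundary cases are handled) may differ from what selection() does internally.

```r
# Illustrative sketch of fixed interval monetary unit sampling
fixed_interval_mus <- function(values, size, start = 1) {
  interval <- sum(values) / size                      # width of each interval
  targets <- start + (seq_len(size) - 1) * interval   # selected monetary units
  upper <- cumsum(values)                             # last monetary unit of each item
  # Each selected monetary unit belongs to the first item whose cumulative
  # value reaches it
  items <- vapply(targets, function(t) which(upper >= t)[1], integer(1))
  list(interval = interval, items = items)
}

book <- c(100, 200, 300, 400)  # made-up book values
mus <- fixed_interval_mus(book, size = 4)
mus$interval  # 250
mus$items     # 1 2 3 4
```

The same arithmetic gives the selection interval reported in the summary above: a population value of 1403221 divided by 150 sampling units is approximately 9354.8.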
The selected sample is stored in the object that is returned by the selection() function. It can be accessed or extracted by indexing it via $sample. The first 10 invoices in the previously selected sample of 60 invoices are displayed below. After this step it is up to the auditor to annotate the sample with their audit values.
set.seed(1)
stage2 <- selection(data = BuildIt, size = 60, units = 'items', method = 'random')
sample <- stage2$sample
head(sample, n = 10)
## row times ID bookValue auditValue
## 1 1017 1 50755 618.24 618.24
## 2 679 1 20237 669.75 669.75
## 3 2177 1 9517 454.02 454.02
## 4 930 1 85674 257.82 257.82
## 5 1533 1 31051 308.53 308.53
## 6 471 1 84375 824.66 824.66
## 7 2347 1 75616 623.70 623.70
## 8 270 1 82033 352.75 352.75
## 9 1211 1 12877 52.89 52.89
## 10 3379 1 85322 330.24 330.24
evaluation(): The basics
After saving the sample and annotating the invoices in the sample with their audit values, you can perform statistical inference about the misstatement in the population with the evaluation() function. Besides a data sample as input, this function can also be used when you only have access to summary statistics from a data sample (e.g., sample size and number of errors). For an elaborate explanation of how to use this function in the context of each sampling objective, see the package vignettes Evaluation: Testing misstatement and Evaluation: Estimating misstatement.
First, let’s take a look at how you can use the evaluation() function to evaluate the misstatement in the population using summary statistics from a sample. Suppose that in the previously selected sample of 60 invoices you have found that 1 invoice is missing. Using x = 1 and n = 60 you can provide these summary statistics of the sample to the evaluation() function. Don’t forget to also specify the sampling objectives using the materiality or min.precision arguments. In the following example, a performance materiality of 5% again applies. Also note that these data (\(n\), \(x\)) are best described using a binomial distribution, which is why we specify method = 'binomial'.
Sampling objective: Evaluate, on the basis of summary statistics of a sample, whether the misstatement in the population exceeds the performance materiality such that there is a 5% ‘chance’ of incorrectly concluding that the population is free of material misstatement.
stage4 <- evaluation(materiality = 0.05, method = 'binomial', conf.level = 0.95, x = 1, n = 60)
summary(stage4)
##
## Classical Audit Sample Evaluation Summary
##
## Options:
## Confidence level: 0.95
## Materiality: 0.05
## Hypotheses: H0: T >= 0.05 vs. H1: T < 0.05
## Method: binomial
##
## Data:
## Sample size: 60
## Number of errors: 1
## Sum of taints: 1
##
## Results:
## Most likely error: 0.016667
## 95 percent confidence interval: [0, 0.07664]
## Precision: 0.059973
## p-value: 0.19155
As you can see above, the 95% upper bound for the misstatement (7.664%) is higher than the performance materiality of 5%, and therefore the sample does not provide sufficient evidence to conclude that the misstatement is lower than 5%.
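The figures in this summary can be reproduced with base R. The sketch below assumes that the upper bound is the one-sided Clopper-Pearson bound and that the p-value is the binomial probability of observing at most x errors when the misstatement equals the materiality. These assumptions are consistent with the printed output, but they are not taken from jfa's source code.

```r
x <- 1               # errors found in the sample
n <- 60              # sample size
materiality <- 0.05
conf.level <- 0.95

mle <- x / n                                                # most likely error
upper <- qbeta(conf.level, shape1 = x + 1, shape2 = n - x)  # Clopper-Pearson upper bound
precision <- upper - mle
p_value <- pbinom(x, size = n, prob = materiality)          # P(X <= x | T = materiality)

round(c(mle = mle, upper = upper, precision = precision, p_value = p_value), 5)
```

Because the p-value exceeds 1 - conf.level = 0.05, the hypothesis of material misstatement cannot be rejected, which matches the conclusion drawn from the upper bound.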
Next, we take a look at how you can use the evaluation() function to evaluate the misstatement using an annotated sample. Returning to our annotated sample from the selection() function, suppose that you have audited these 60 invoices and have found that they contain 1 misstatement.
sample$auditValue <- sample$bookValue
sample$auditValue[1] <- sample$auditValue[1] - 100
You can evaluate the misstatement in the annotated sample using the data, values, values.audit, and times arguments. For example, the code below evaluates the misstatement in the population with respect to the performance materiality of 5% using the commonly used Stringer bound. You can find more information about which evaluation methods are implemented on the home page.
Sampling objective: Evaluate, on the basis of an annotated sample, whether the misstatement in the population exceeds the allocated performance materiality such that there is a 5% ‘chance’ of incorrectly concluding that the population is free of material misstatement.
stage4 <- evaluation(materiality = 0.05, method = 'stringer', conf.level = 0.95,
                     data = sample, values = 'bookValue', values.audit = 'auditValue',
                     times = 'times')
summary(stage4)
##
## Classical Audit Sample Evaluation Summary
##
## Options:
## Confidence level: 0.95
## Materiality: 0.05
## Method: stringer
##
## Data:
## Sample size: 60
## Number of errors: 1
## Sum of taints: 0.1617495
##
## Results:
## Most likely error: 0.0026958
## 95 percent confidence interval: [0, 0.053222]
## Precision: 0.050526
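To show where the upper bound of 0.053222 comes from, the sketch below implements a textbook version of the Stringer bound in base R. It assumes that the bound starts from the Clopper-Pearson upper bound for zero errors and adds, for each taint sorted from largest to smallest, the increment between successive upper bounds weighted by that taint. The helper stringer_bound() is hypothetical and is not jfa's internal function.

```r
# Textbook Stringer bound (illustrative sketch)
stringer_bound <- function(taints, n, conf.level = 0.95) {
  taints <- sort(taints[taints > 0], decreasing = TRUE)
  k <- length(taints)
  # Clopper-Pearson upper bounds for 0, 1, ..., k errors in n items
  p <- qbeta(conf.level, shape1 = 1 + 0:k, shape2 = n - 0:k)
  # Zero-error bound plus taint-weighted increments
  p[1] + sum(diff(p) * taints)
}

taint <- 100 / 618.24  # overstatement of 100 on a book value of 618.24
bound <- stringer_bound(taint, n = 60)
round(bound, 6)
round(taint / 60, 7)   # most likely error: sum of taints divided by sample size
```

Since this bound is above the performance materiality of 5%, the annotated sample does not support the conclusion that the population is free of material misstatement.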
report(): The basics
With the results from the evaluation() function in hand, you can use the report() function to automatically generate a report containing the data, the statistical results and their interpretation, and the conclusion of the sampling procedure with respect to the sampling objectives. The report can be generated by providing the object returned by the evaluation() function to the report() function.
stage4 <- evaluation(materiality = 0.05, method = 'stringer', conf.level = 0.95,
                     data = sample, values = 'bookValue', values.audit = 'auditValue',
                     times = 'times')
report(stage4, file = 'report.html', format = 'html_document') # Generates .html report