Welcome to the ‘Get started’ page of the jfa package. In this vignette you can find detailed examples of how to incorporate the functions provided by the package into your statistical sampling workflow.
To concretely illustrate jfa’s functionality, we consider the BuildIt data set that is included in the package (for more info, see ?BuildIt). This data set contains a population of 3500 invoices paid to a fictional construction company. Each invoice has an identification number (ID), a recorded value (bookValue), and a corresponding audit (true) value (auditValue).
Note: The auditValue column is included for illustrative purposes only. In practice, these values are unknown to the auditor until the items in a sample have been audited.
First, we load in the data set and display the first 10 invoices in the population.
library(jfa)
data('BuildIt')
head(BuildIt, n = 10)
## ID bookValue auditValue
## 1 82884 242.61 242.61
## 2 25064 642.99 642.99
## 3 81235 628.53 628.53
## 4 71769 431.87 431.87
## 5 55080 620.88 620.88
## 6 93224 501.76 501.76
## 7 24331 466.01 466.01
## 8 81460 295.20 295.20
## 9 14608 216.48 216.48
## 10 79064 243.43 243.43
For a walkthrough of jfa’s workflow functionality using the BuildIt data set, see The audit sampling workflow. For a Bayesian version of the walkthrough, see The Bayesian audit sampling workflow.
auditPrior(): The basics
The auditPrior() function allows you to create a prior distribution for the misstatement in the population that can be used to incorporate existing information into the sampling workflow. Prior distributions can be constructed based on various types of audit information. See the vignette Prior distributions for a detailed explanation of the types of audit information that jfa is able to incorporate into a prior distribution. Bayesian planning and evaluation using the prior distribution can be performed by specifying the returned object from the auditPrior() function as input for the prior argument in the planning() and evaluation() functions.
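As a rough illustration of what it means to combine a prior distribution with sample outcomes, the base R sketch below updates a beta prior with binomial data and reads a 95% upper bound from the posterior. This is not jfa code: the function name posterior_upper_bound and the prior parameters are hypothetical and chosen for illustration only, whereas auditPrior() constructs its priors from audit information as described in the Prior distributions vignette.

```r
# Conjugate beta-binomial updating: a Beta(prior_a, prior_b) prior combined
# with k errors in n sampled items gives a
# Beta(prior_a + k, prior_b + n - k) posterior.
# posterior_upper_bound() is a hypothetical helper, not part of jfa.
posterior_upper_bound <- function(n, k, prior_a = 1, prior_b = 1,
                                  confidence = 0.95) {
  qbeta(confidence, shape1 = prior_a + k, shape2 = prior_b + n - k)
}

# Uniform prior, Beta(1, 1): no prior information about the misstatement.
ub_uniform <- posterior_upper_bound(n = 60, k = 0)

# Hypothetical informative prior, Beta(1, 20): most mass on low misstatement.
ub_informed <- posterior_upper_bound(n = 60, k = 0, prior_a = 1, prior_b = 20)

print(c(uniform = ub_uniform, informed = ub_informed))
```

With the same sample, the informative prior yields a lower upper bound than the uniform prior, which is the sense in which existing information can reduce the amount of sampling work required.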
planning(): The basics
Planning a sample using the planning() function requires knowledge of the objective of the sampling procedure, the assumed distribution of the data (binomial, poisson, or hypergeometric), and the expected errors in the sample. It is advisable to set the expected errors in the sample conservatively, so as to minimize the probability that the observed errors in the sample exceed the expected errors, which would imply that insufficient audit work has been done.
Because we have access to the recorded values of the invoices in the population, we are going to consider each monetary unit in the population as a possible unit of inference. The audit standards assume that such data are distributed according to the Poisson distribution and so we will assume this distribution in this example as well. For illustrative purposes, we do not expect to find any errors in the sample.
First, we take a look at how you can use the planning() function to construct a sample with the objective of testing the misstatement in the population against a performance materiality (i.e., the maximum tolerable misstatement in the population). In this example, we will set the performance materiality at 5% of the total value of the population.
Sampling objective: Calculate a minimal sample size such that, when zero misstatements are found in the sample, there is a 95% ‘chance’ that the misstatement in the population is lower than 5% of the population value.
Constructing a sample with this objective in mind can be done using the code below (specifically by specifying the materiality argument). A summary of the output can be obtained from the summary() function. As you can see, the minimal sample size to achieve 95% assurance with respect to the performance materiality of 5% is 60 monetary units.
stage1 <- planning(materiality = 0.05, expectedError = 0, likelihood = 'poisson', confidence = 0.95, N = 3500)
summary(stage1)
## # ------------------------------------------------------------
## # jfa Planning Summary (Classical)
## # ------------------------------------------------------------
## # Input:
## #
## # Confidence: 95%
## # Materiality: 5%
## # Minimum precision: Not specified
## # Likelihood: poisson
## # Expected sample errors: 0
## # ------------------------------------------------------------
## # Output:
## #
## # Sample size: 60
## # ------------------------------------------------------------
## # Statistics:
## #
## # Expected most likely error: 0%
## # Expected upper bound: 4.993%
## # Expected precision: 4.993%
## # ------------------------------------------------------------
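To make the arithmetic behind this sample size visible, the base R sketch below searches for the smallest sample size whose Poisson upper bound falls below the materiality. It is not jfa's implementation: the function name poisson_sample_size and the max_n search limit are hypothetical, and the sketch assumes the classical Poisson upper bound, which is a gamma quantile with shape (errors + 1) and rate n.

```r
# Smallest n for which the Poisson upper bound, given the expected number of
# errors, falls below the performance materiality.
# poisson_sample_size() is a hypothetical helper, not part of jfa.
poisson_sample_size <- function(materiality, expected_errors = 0,
                                confidence = 0.95, max_n = 5000) {
  for (n in seq_len(max_n)) {
    # Upper bound on the misstatement rate: gamma quantile with
    # shape (errors + 1) and rate n.
    upper_bound <- qgamma(confidence, shape = expected_errors + 1, rate = n)
    if (upper_bound < materiality) {
      return(n)
    }
  }
  NA_integer_  # no sample size up to max_n is sufficient
}

n_required <- poisson_sample_size(materiality = 0.05)
print(n_required)  # 60

# Expected upper bound (in %) at that sample size
expected_upper <- 100 * qgamma(0.95, shape = 1, rate = n_required)
print(round(expected_upper, 3))  # 4.993
```

With zero expected errors the bound reduces to -log(0.05) / n, so the search simply finds the first n for which this quantity drops below 5%.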
Next, we take a look at how you can use the planning() function to construct a sample with the objective of obtaining a minimum precision. The precision is an indication of the accuracy of your estimate of the misstatement, as it is defined as the difference between the most likely misstatement and the upper confidence bound on the misstatement. For this example, we will set the minimum precision to 2% of the population value.
Sampling objective: Calculate a minimal sample size such that, when zero misstatements are found in the sample, there is a 95% ‘chance’ that the misstatement in the population is at most 2% above the most likely misstatement.
Planning a sample with this objective can be done using the code below (specifically by specifying the minPrecision argument). As you can see, the minimal sample size to achieve a precision of at least 2% is 150 monetary units.
stage1 <- planning(minPrecision = 0.02, expectedError = 0, likelihood = 'poisson', confidence = 0.95, N = 3500)
summary(stage1)
## # ------------------------------------------------------------
## # jfa Planning Summary (Classical)
## # ------------------------------------------------------------
## # Input:
## #
## # Confidence: 95%
## # Materiality: Not specified
## # Minimum precision: 2%
## # Likelihood: poisson
## # Expected sample errors: 0
## # ------------------------------------------------------------
## # Output:
## #
## # Sample size: 150
## # ------------------------------------------------------------
## # Statistics:
## #
## # Expected most likely error: 0%
## # Expected upper bound: 1.997%
## # Expected precision: 1.997%
## # ------------------------------------------------------------
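The precision objective can be sketched in base R along the same lines: increase the sample size until the difference between the Poisson upper bound and the most likely error drops below the minimum precision. This is an illustrative sketch rather than jfa's implementation, and the function name poisson_size_for_precision and the max_n search limit are hypothetical.

```r
# Smallest n for which (upper bound - most likely error) is below the
# required minimum precision, assuming a Poisson likelihood.
# poisson_size_for_precision() is a hypothetical helper, not part of jfa.
poisson_size_for_precision <- function(min_precision, expected_errors = 0,
                                       confidence = 0.95, max_n = 5000) {
  for (n in seq_len(max_n)) {
    most_likely <- expected_errors / n
    upper_bound <- qgamma(confidence, shape = expected_errors + 1, rate = n)
    if (upper_bound - most_likely < min_precision) {
      return(n)
    }
  }
  NA_integer_  # no sample size up to max_n is sufficient
}

n_precision <- poisson_size_for_precision(min_precision = 0.02)
print(n_precision)  # 150
```

Because no errors are expected, the most likely error is 0% and the precision equals the upper bound, which is why the expected upper bound and expected precision coincide in the summary above.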
selection(): The basics
Selecting a sample using the selection() function requires knowledge of the sampling units (i.e., units of inference) in the population. Items can be selected from the population using record sampling (also known as attribute sampling) using units = 'records', or using monetary unit sampling (MUS) using units = 'mus'. Selection also requires knowledge of the sampling algorithm. Sampling units can be selected with a random sampling scheme using algorithm = 'random', with a cell sampling scheme using algorithm = 'cell', or with a fixed interval sampling (also known as systematic sampling) scheme using algorithm = 'interval'.
First, we take a look at how you can use the selection() function to perform random sampling from the items in the population. As an example, the code below samples 60 invoices from the BuildIt data set using a random record sampling scheme.
stage2 <- selection(population = BuildIt, sampleSize = 60, units = 'records', algorithm = 'random')
summary(stage2)
## # ------------------------------------------------------------
## # jfa Selection Summary
## # ------------------------------------------------------------
## # Input:
## #
## # Population size: 3500
## # Requested sample size: 60
## # Sampling units: Records
## # Algorithm: Random sampling
## # ------------------------------------------------------------
## # Output:
## #
## # Obtained sampling units: 60
## # Obtained sample items: 60
## # ------------------------------------------------------------
## # Statistics:
## #
## # Proportion n/N: 0.017
## # ------------------------------------------------------------
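Conceptually, random record sampling draws row numbers without replacement, so that each item has the same probability of being selected. The base R sketch below illustrates this on a made-up population; the data frame, the seed, and the variable names are hypothetical, and the code is not what selection() runs internally.

```r
set.seed(1)  # arbitrary seed so that the sketch is reproducible

# Hypothetical population of 3500 invoices (stand-in for BuildIt)
population <- data.frame(
  ID = 1:3500,
  bookValue = round(runif(3500, min = 10, max = 1000), 2)
)

n <- 60
rows <- sample.int(nrow(population), size = n)  # drawn without replacement

# Keep track of the original row number; each record is selected once
selected <- data.frame(rowNumber = rows, count = 1, population[rows, ],
                       row.names = NULL)
head(selected, n = 5)

# Proportion of the population that is sampled (n/N)
print(round(n / nrow(population), 3))  # 0.017
```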
Next, we take a look at how you can use the selection() function to perform fixed interval sampling using the monetary units in the population as sampling units. As an example, the code below samples 150 monetary units from the BuildIt data set using a fixed interval monetary unit sampling scheme.
stage2 <- selection(population = BuildIt, sampleSize = 150, units = 'mus', algorithm = 'interval', bookValues = 'bookValue')
summary(stage2)
## # ------------------------------------------------------------
## # jfa Selection Summary
## # ------------------------------------------------------------
## # Input:
## #
## # Population size: 3500
## # Requested sample size: 150
## # Sampling units: Monetary units
## # Algorithm: Fixed interval sampling
## # Interval: 9354.805
## # Starting point: 1
## # ------------------------------------------------------------
## # Output:
## #
## # Obtained sampling units: 150
## # Obtained sample items: 150
## # ------------------------------------------------------------
## # Statistics:
## #
## # Proportion n/N: 0.043
## # Percentage of value: 5.294%
## # ------------------------------------------------------------
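The idea behind fixed interval monetary unit sampling can be sketched in a few lines of base R: divide the total value by the sample size to obtain the interval, step through the monetary units from a starting point, and look up which item contains each selected unit. The five book values below are made up, and the sketch only assumes the general scheme; it is not jfa's implementation.

```r
# Hypothetical book values of five items (total value: 1000)
book_values <- c(100, 250, 50, 400, 200)
n <- 4       # number of monetary units to select
start <- 1   # starting point within the first interval

interval <- sum(book_values) / n                       # 250
selected_units <- start + (seq_len(n) - 1) * interval  # 1, 251, 501, 751

# Each item covers the monetary units up to its cumulative book value
cumulative <- cumsum(book_values)                      # 100 350 400 800 1000

# The item containing a unit is the first one whose cumulative value
# reaches that unit
item_of_unit <- vapply(selected_units,
                       function(u) which(cumulative >= u)[1],
                       integer(1))
print(item_of_unit)         # 1 2 4 4
print(table(item_of_unit))  # item 4 is selected twice
```

Items with a larger book value cover more monetary units and are therefore more likely to be selected; an item larger than the interval can be hit more than once, in which case it counts multiple times in the sample.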
The selected sample is stored in the object that is returned by the selection() function. It can be accessed or extracted by indexing it via $sample. Let’s take a look at the first ten rows of the previously selected sample of 60 invoices.
stage2 <- selection(population = BuildIt, sampleSize = 60, units = 'records', algorithm = 'random')
sample <- stage2$sample
head(sample, n = 10)
## rowNumber count ID bookValue auditValue
## 1 1017 1 50755 618.24 618.24
## 2 679 1 20237 669.75 669.75
## 3 2177 1 9517 454.02 454.02
## 4 930 1 85674 257.82 257.82
## 5 1533 1 31051 308.53 308.53
## 6 471 1 84375 824.66 824.66
## 7 2347 1 75616 623.70 623.70
## 8 270 1 82033 352.75 352.75
## 9 1211 1 12877 52.89 52.89
## 10 3379 1 85322 330.24 330.24
evaluation(): The basics
After saving the sample and annotating the invoices in the sample with their audit values, you can evaluate the misstatement in the sample with the evaluation() function. The function can also be used with summary statistics from a data sample. For a more elaborate explanation of how to use this function, see the package vignettes Testing misstatement and Estimating misstatement.
First, let’s take a look at how you can use the evaluation() function to evaluate the misstatement in the population using summary statistics from a sample. Suppose that in the previously selected sample of 60 invoices you have found that 1 invoice is missing. Using nSumstats = 60 and kSumstats = 1 you can provide these outcomes of the sample to the evaluation() function. Don’t forget to specify your sampling objectives using the materiality or minPrecision arguments. In this example, a performance materiality of 5% applies.
Sampling objective: Evaluate, on the basis of summary statistics of a sample, whether the misstatement in the population exceeds the allocated performance materiality such that there is a 5% ‘chance’ of incorrectly concluding that the population is free of material misstatement.
stage4 <- evaluation(materiality = 0.05, method = 'binomial', confidence = 0.95, N = 3500, nSumstats = 60, kSumstats = 1)
summary(stage4)
## # ------------------------------------------------------------
## # jfa Evaluation Summary (Classical)
## # ------------------------------------------------------------
## # Input:
## #
## # Confidence: 95%
## # Materiality: 5%
## # Minimum precision: Not specified
## # Sample size: 60
## # Sample errors: 1
## # Sum of taints: 1
## # Method: binomial
## # ------------------------------------------------------------
## # Output:
## #
## # Most likely error: 1.667%
## # Upper bound: 7.664%
## # Precision: 5.997%
## # Conclusion: Do not approve population
## # ------------------------------------------------------------
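The figures in this summary can be checked by hand with base R. The sketch below assumes the classical (Clopper-Pearson) binomial upper bound, which is a beta quantile with parameters (k + 1, n - k); it is meant as an illustration of the calculation, not as jfa's source code.

```r
n <- 60            # sample size
k <- 1             # number of errors found in the sample
confidence <- 0.95

most_likely_error <- k / n
# One-sided Clopper-Pearson upper bound on the misstatement rate
upper_bound <- qbeta(confidence, shape1 = k + 1, shape2 = n - k)
precision <- upper_bound - most_likely_error

# Values in percent, for comparison with the summary output
print(round(100 * c(mle = most_likely_error,
                    upper = upper_bound,
                    precision = precision), 3))

# The population is approved only if the upper bound is below materiality
print(upper_bound < 0.05)  # FALSE
```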
As you can see, the upper bound on the misstatement is higher than 5% and thus the sample does not provide sufficient evidence to conclude that the misstatement is lower than 5%.
Next, we take a look at how you can use the evaluation() function to evaluate the misstatement using an annotated sample. Suppose that you have audited the 60 invoices in the sample and have found 1 misstatement.
sample$auditValue <- sample$bookValue
sample$auditValue[1] <- sample$auditValue[1] - 100
You can evaluate the misstatement in the annotated sample using the sample, bookValues, auditValues, and counts arguments. For example, the code below evaluates the misstatement in the population with respect to the performance materiality of 5% using the commonly used Stringer bound. You can find more information about which evaluation methods are implemented on the home page.
Sampling objective: Evaluate, on the basis of an annotated sample, whether the misstatement in the population exceeds the allocated performance materiality such that there is a 5% ‘chance’ of incorrectly concluding that the population is free of material misstatement.
stage4 <- evaluation(materiality = 0.05, method = 'stringer', confidence = 0.95,
                     sample = sample, bookValues = 'bookValue', auditValues = 'auditValue',
                     counts = sample$count, N = 3500)
summary(stage4)
## # ------------------------------------------------------------
## # jfa Evaluation Summary (Classical)
## # ------------------------------------------------------------
## # Input:
## #
## # Confidence: 95%
## # Materiality: 5%
## # Minimum precision: Not specified
## # Sample size: 60
## # Sample errors: 1
## # Sum of taints: 0.162
## # Method: stringer
## # ------------------------------------------------------------
## # Output:
## #
## # Most likely error: 0.27%
## # Upper bound: 5.322%
## # Precision: 5.053%
## # Conclusion: Do not approve population
## # ------------------------------------------------------------
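For readers who want to see how a bound of this kind is built up, the base R sketch below implements the textbook form of the Stringer bound: start from the binomial upper bound for zero errors and add, for each taint in decreasing order, the increase in the binomial bound weighted by that taint. The function name stringer_bound is hypothetical and the code is a sketch of the general method, not jfa's implementation. The single taint is the 100 overstatement divided by the book value of 618.24 shown in the first row of the sample above.

```r
# Textbook Stringer bound for overstatement taints.
# stringer_bound() is a hypothetical helper, not part of jfa.
stringer_bound <- function(taints, n, confidence = 0.95) {
  # Binomial upper bound on the error rate for j errors in n items
  p <- function(j) qbeta(confidence, shape1 = j + 1, shape2 = n - j)
  sorted <- sort(taints[taints > 0], decreasing = TRUE)
  bound <- p(0)
  for (j in seq_along(sorted)) {
    bound <- bound + (p(j) - p(j - 1)) * sorted[j]
  }
  bound
}

# 60 sampled items: one invoice overstated by 100 on a book value of 618.24
taints <- c(100 / 618.24, rep(0, 59))

most_likely_error <- sum(taints) / length(taints)
upper_bound <- stringer_bound(taints, n = length(taints))

print(round(sum(taints), 3))  # 0.162
print(round(100 * c(mle = most_likely_error, upper = upper_bound), 3))
```

Because the error is only a partial misstatement (a taint of about 16%), the resulting bound is considerably lower than the 7.664% obtained when the same sample is evaluated with one full error, although it still exceeds the materiality of 5%.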
report(): The basics
After obtaining the result from the evaluation() function you can generate a report containing the data, the statistical results and their interpretation, and the conclusion of the sampling procedure with respect to the input for materiality and minPrecision. The report can be automatically generated by providing the object returned by the evaluation() function to the report() function.
stage4 <- evaluation(materiality = 0.05, method = 'stringer', confidence = 0.95,
                     sample = sample, bookValues = 'bookValue', auditValues = 'auditValue',
                     counts = sample$count, N = 3500)
report(stage4, file = 'report.html', format = 'html_document') # Generates .html report