# Get started

Welcome to the ‘Get started’ page of the jfa package. This vignette provides detailed examples of how to incorporate the package's functions into your statistical sampling workflow.

## Example data

To concretely illustrate jfa’s functionality, we consider the BuildIt data set that is included in the package (for more info, see ?BuildIt). This data set contains a population of 3500 invoices paid to a fictional construction company. Each invoice has an identification number (ID), a recorded value (bookValue), and a corresponding audit (true) value (auditValue).

Note: The information in the auditValue column is added for illustrative purposes, as it is unknown to the auditor before having audited any items from a sample.

First, we load the jfa package and the BuildIt data. The first 10 invoices from the data set are displayed below.

library(jfa)

data('BuildIt')
head(BuildIt, n = 10)
##       ID bookValue auditValue
## 1  82884    242.61     242.61
## 2  25064    642.99     642.99
## 3  81235    628.53     628.53
## 4  71769    431.87     431.87
## 5  55080    620.88     620.88
## 6  93224    501.76     501.76
## 7  24331    466.01     466.01
## 8  81460    295.20     295.20
## 9  14608    216.48     216.48
## 10 79064    243.43     243.43

For a fully explained walkthrough of jfa’s workflow functionality using the BuildIt data set, see Workflow: Classical audit sampling. For a Bayesian version of the walkthrough, see Workflow: Bayesian audit sampling.

## (Optional) Using auditPrior(): The basics

The auditPrior() function allows you to create a prior distribution for audit sampling. More specifically, this function sets up the prior distribution for the misstatement parameter in the statistical model that is specified later on. One advantage of Bayesian inference for auditors is that the prior distribution can be used to incorporate existing information into the statistical procedure, possibly yielding a decrease in sample size, and an increase in efficiency. The type of audit information that can be incorporated depends on the information that is available to the auditor. See the vignette Planning: Prior distributions or the accompanying article for a detailed explanation of the types of audit information that jfa is able to incorporate into a prior distribution.

With the prior distribution in hand, Bayesian planning and evaluation can be performed by providing the returned object from the auditPrior() function as input for the prior argument in the planning() and evaluation() functions.
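To make the effect of a prior on the sample size concrete, here is a minimal base R sketch of a conjugate gamma-Poisson update. The gamma(1, 1) prior is an assumption chosen purely for illustration and is not necessarily what auditPrior() returns; all variable names are hypothetical.

```r
# Assumed prior on the misstatement rate: gamma(shape = 1, rate = 1).
# This is an illustrative choice, not taken from the jfa documentation.
prior_shape <- 1
prior_rate <- 1

materiality <- 0.05
conf_level <- 0.95

# With a Poisson likelihood, observing 0 errors in n sampling units updates
# the gamma prior to gamma(prior_shape + 0, prior_rate + n).
# Find the smallest n whose posterior puts at least 95% of its mass below
# the materiality.
n <- 0
while (pgamma(materiality, shape = prior_shape, rate = prior_rate + n) < conf_level) {
  n <- n + 1
}
n
```

Under these assumptions the loop stops at n = 59, one unit fewer than the 60 units reported for the classical Poisson plan in the next section.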

## Using planning(): The basics

Planning a sample requires that you know the objective of your sampling procedure. Generally, a sampling objective can be one (or both) of the following: to test the population misstatement against a performance materiality (i.e., the maximum tolerable misstatement in the population), or to estimate the population misstatement with a minimum precision. You should also know the assumed distribution of the data (binomial, Poisson, or hypergeometric) and the expected (or tolerable) errors in the sample. When planning an audit sample, it is strongly advised to set the expected errors conservatively. This minimizes the chance that the errors observed in the sample exceed the expected errors, which would imply that insufficient work has been done.

With the BuildIt data set, because we have access to the booked amounts (monetary values) of each invoice in the population, we are going to consider each monetary unit in the population as a possible unit of inference. In this case, the audit standards assume such data to be distributed according to the Poisson distribution, and so we will use this distribution in this example as well. For illustrative purposes, we will use a strict requirement where the population will only be approved when the sample contains no misstatements.
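The following base R sketch shows why the expected errors and the assumed distribution matter for the sample size. It is a simplified stand-in for what a planning routine does, not jfa's implementation; the helper `min_sample_size()` and its arguments are hypothetical.

```r
# Smallest n such that observing at most `tolerable` errors would still be
# unlikely (probability <= 1 - conf_level) if the true misstatement rate
# were equal to the materiality.
min_sample_size <- function(tolerable, materiality, conf_level = 0.95,
                            likelihood = c("poisson", "binomial")) {
  likelihood <- match.arg(likelihood)
  n <- tolerable + 1
  repeat {
    p <- if (likelihood == "poisson") {
      ppois(tolerable, lambda = n * materiality)
    } else {
      pbinom(tolerable, size = n, prob = materiality)
    }
    if (p <= 1 - conf_level) return(n)
    n <- n + 1
  }
}

# Sample sizes for 0, 1, and 2 tolerable errors at 5% materiality.
poisson_sizes <- sapply(0:2, min_sample_size, materiality = 0.05,
                        likelihood = "poisson")
binomial_sizes <- sapply(0:2, min_sample_size, materiality = 0.05,
                         likelihood = "binomial")
rbind(poisson = poisson_sizes, binomial = binomial_sizes)
```

In this sketch, allowing for a single error raises the Poisson sample size from 60 to 95, which is why a conservative choice for the expected errors has a real cost and should be made deliberately.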

### Testing against a performance materiality

First, we take a look at how you can use the planning() function to construct a sample with the objective of testing the misstatement in the population against a performance materiality. In this example, we will set the performance materiality at 5% of the total value of the population.

Sampling objective: Calculate a minimal sample size such that, when no misstatements are found in the sample, you have obtained 95% assurance that the misstatement in the population is lower than 5% of the total value.

Planning a sample with this objective in mind can be done using the code below (specifically by specifying the materiality argument). Next, a summary of the results can be obtained using the summary() function. As you can see below, the minimal sample size to achieve 95% assurance with respect to the performance materiality of 5% is 60 monetary units.

stage1 <- planning(materiality = 0.05, expected = 0, likelihood = 'poisson', conf.level = 0.95)
summary(stage1)
##
##  Classical Audit Sample Planning Summary
##
## Options:
##   Confidence level:              0.95
##   Materiality:                   0.05
##   Hypotheses:                    H0: T >= 0.05 vs. H1: T < 0.05
##   Expected:                      0
##   Likelihood:                    poisson
##
## Results:
##   Minimum sample size:           60
##   Tolerable errors:              0
##   Expected most likely error:    0
##   Expected upper bound:          0.049929
##   Expected precision:            0.049929
##   Expected p-value:              < 2.22e-16
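The sample size and the expected upper bound in this output can be checked by hand. The derivation below is a sketch that assumes zero observed errors and a Poisson likelihood; the symbols $\theta$ (performance materiality) and $\alpha$ (one minus the confidence level) are notation introduced here, not taken from the package.

```latex
\begin{aligned}
P(X = 0 \mid n, \theta) &= e^{-n\theta} \le \alpha \\
n &\ge \frac{-\ln \alpha}{\theta} = \frac{-\ln 0.05}{0.05} \approx 59.9
  \quad\Rightarrow\quad n = 60 \\
\theta_{\text{upper}} &= \frac{-\ln \alpha}{n} = \frac{2.9957}{60} \approx 0.049929
\end{aligned}
```

Rounding the sample size up to 60 is what pushes the expected upper bound just below the 5% materiality.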

### Obtaining a minimum precision

Next, we take a look at how you can use the planning() function to construct a sample with the objective of estimating the misstatement in the population with a minimum precision. The precision is defined as the difference between the most likely misstatement and the upper confidence bound on the misstatement and is an indication of the accuracy of your estimate. For this example, we will set the minimum precision to 2% of the population value.

Sampling objective: Calculate a minimal sample size such that, when zero misstatements are found in the sample, you have obtained 95% assurance that the misstatement in the population is at most 2% above the most likely misstatement.

Planning a sample with this objective can be done using the code below (specifically by specifying the min.precision argument). As you can see below, the minimal sample size to achieve a precision of at least 2% is 150 monetary units.

stage1 <- planning(min.precision = 0.02, expected = 0, likelihood = 'poisson', conf.level = 0.95)
summary(stage1)
##
##  Classical Audit Sample Planning Summary
##
## Options:
##   Confidence level:              0.95
##   Min. precision:                0.02
##   Expected:                      0
##   Likelihood:                    poisson
##
## Results:
##   Minimum sample size:           150
##   Tolerable errors:              0
##   Expected most likely error:    0
##   Expected upper bound:          0.019971
##   Expected precision:            0.019971
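As a rough cross-check of this result, the sketch below searches a grid of candidate sample sizes in base R. It assumes zero observed errors, so the most likely error is 0 and the precision equals the Poisson upper bound; the variable names are hypothetical and this is not how jfa performs the search.

```r
conf_level <- 0.95
min_precision <- 0.02

# Candidate sample sizes to search over.
n_grid <- 1:500

# With 0 errors and a Poisson likelihood, the upper bound on the
# misstatement rate is the conf_level quantile of a gamma(1, n) distribution.
upper_bound <- qgamma(conf_level, shape = 1, rate = n_grid)

# The most likely error is 0, so the precision equals the upper bound.
precision <- upper_bound - 0

# Smallest sample size that meets the minimum precision.
n_min <- min(n_grid[precision <= min_precision])
n_min
precision[n_min]
```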

## Using selection(): The basics

Selecting a sample using the selection() function requires knowledge of the sampling units (i.e., units of inference) in the population. Items can be selected from the population using record sampling (also known as attribute sampling or item sampling) using units = 'items', or using monetary unit sampling (MUS) using units = 'values'. Selection also requires knowledge of the sampling algorithm. Sampling units can be selected with a random sampling scheme using method = 'random', with a cell sampling scheme using method = 'cell', or with a fixed interval sampling (also known as systematic sampling) scheme using method = 'interval'.

See the vignette Selection: Sampling methodology for a more detailed explanation of the selection algorithms implemented in jfa.

### Record sampling

First, we take a look at how you can use the selection() function to perform random sampling from the items in the population. As an example, the code below samples 60 invoices from the BuildIt data set using a random record sampling scheme.

set.seed(1)
stage2 <- selection(data = BuildIt, size = 60, units = 'items', method = 'random')
summary(stage2)
##
##  Audit Sample Selection Summary
##
## Options:
##   Requested sample size:         60
##   Sampling units:                items
##   Method:                        random sampling
##
## Data:
##   Population size:               3500
##
## Results:
##   Selected sampling units:       60
##   Selected items:                60
##   Proportion of size:            0.017143
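Conceptually, random record sampling gives every item the same chance of selection. A minimal base R sketch of that idea is shown below, using a made-up population rather than the BuildIt data; the data frame and its values are hypothetical, and the rows drawn will not match those selected by selection().

```r
set.seed(1)

# Hypothetical population of 3500 items with made-up book values.
population <- data.frame(
  ID = seq_len(3500),
  bookValue = round(runif(3500, min = 10, max = 1000), 2)
)

# Draw 60 distinct row indices, each item being equally likely.
rows <- sample(nrow(population), size = 60)
record_sample <- population[rows, ]

# Fraction of the population that ends up in the sample.
prop_size <- nrow(record_sample) / nrow(population)
prop_size
```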

### Monetary unit sampling (MUS)

Next, we take a look at how you can use the selection() function to perform fixed interval sampling using the monetary units in the population as sampling units. As an example, the code below samples 150 monetary units from the BuildIt data set using a fixed interval monetary unit sampling scheme.

stage2 <- selection(data = BuildIt, size = 150, units = 'values', method = 'interval', values = 'bookValue')
summary(stage2)
##
##  Audit Sample Selection Summary
##
## Options:
##   Requested sample size:         150
##   Sampling units:                monetary units
##   Method:                        fixed interval sampling
##   Starting point:                1
##
## Data:
##   Population size:               3500
##   Population value:              1403221
##   Selection interval:            9354.8
##
## Results:
##   Selected sampling units:       150
##   Proportion of value:           0.0001069
##   Selected items:                150
##   Proportion of size:            0.042857
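The idea behind fixed interval monetary unit sampling can be sketched in a few lines of base R. The example uses five made-up book values, a starting point of 1, and an interval equal to the population value divided by the sample size, consistent with the summary above; it illustrates the technique and is not necessarily how jfa implements it.

```r
# Hypothetical book values of five items (total value: 1000).
book_values <- c(100, 250, 50, 400, 200)
size <- 4
start <- 1

# The selection interval is the population value divided by the sample size.
interval <- sum(book_values) / size

# Monetary units selected: the starting point, then every interval-th unit.
selected_units <- start + (0:(size - 1)) * interval

# Each item covers a consecutive range of monetary units, so an item is
# selected when a selected unit falls within its cumulative value range.
cumulative_value <- cumsum(book_values)
rows <- vapply(selected_units,
               function(u) which(cumulative_value >= u)[1],
               integer(1))

# Count how often each item is selected.
times <- table(rows)
times
```

Here the fourth item is larger than the interval and is hit twice. This is the kind of situation that a `times` column in the selected sample keeps track of.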

### Extracting the sample

The selected sample is stored in the object that is returned by the selection() function. It can be accessed or extracted by indexing it via $sample. The first 10 invoices in the previously selected sample of 60 invoices are displayed below. After this step it is up to the auditor to annotate the sample with their audit values.

set.seed(1)
stage2 <- selection(data = BuildIt, size = 60, units = 'items', method = 'random')
sample <- stage2$sample
head(sample, n = 10)
##     row times    ID bookValue auditValue
## 1  1017     1 50755    618.24     618.24
## 2   679     1 20237    669.75     669.75
## 3  2177     1  9517    454.02     454.02
## 4   930     1 85674    257.82     257.82
## 5  1533     1 31051    308.53     308.53
## 6   471     1 84375    824.66     824.66
## 7  2347     1 75616    623.70     623.70
## 8   270     1 82033    352.75     352.75
## 9  1211     1 12877     52.89      52.89
## 10 3379     1 85322    330.24     330.24
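One possible way to hand the sample to an auditor for annotation is a round trip through a CSV file. The sketch below uses a small made-up data frame and a temporary file; the object names and the file location are hypothetical, and the audit values are simply copied from the book values to stand in for the auditor's work.

```r
# Hypothetical three-row sample standing in for the selected invoices.
audit_sample <- data.frame(
  row = c(1017, 679, 2177),
  times = c(1, 1, 1),
  ID = c(50755, 20237, 9517),
  bookValue = c(618.24, 669.75, 454.02)
)

# Export the sample so the audit values can be filled in elsewhere.
path <- tempfile(fileext = ".csv")
write.csv(audit_sample, file = path, row.names = FALSE)

# Read the file back and add the audit values. They are copied from the
# book values here as a placeholder for the annotation step.
annotated <- read.csv(path)
annotated$auditValue <- annotated$bookValue
annotated
```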

## Using evaluation(): The basics

After saving the sample and annotating the invoices in the sample with their audit values, you can perform statistical inference about the misstatement in the population with the evaluation() function. Besides a data sample, this function also accepts summary statistics from a sample (e.g., sample size and number of errors). For a detailed explanation of how to use this function in the context of each sampling objective, see the package vignettes Evaluation: Testing misstatement and Evaluation: Estimating misstatement.

### Summary statistics from the sample

First, let’s take a look at how you can use the evaluation() function to evaluate the misstatement in the population using summary statistics from a sample. Suppose that in the previously selected sample of 60 invoices you have found that 1 invoice is missing. Using x = 1 and n = 60 you can provide these summary statistics of the sample to the evaluation() function. Don’t forget to also specify the sampling objectives using the materiality or min.precision arguments. In the following example, a performance materiality of 5% again applies. Also note that these data ($n$, $x$) are best described using a binomial distribution, which is why we specify method = 'binomial'.

Sampling objective: Evaluate, on the basis of summary statistics of a sample, whether the misstatement in the population exceeds the performance materiality such that there is a 5% ‘chance’ of incorrectly concluding that the population is free of material misstatement.

stage4 <- evaluation(materiality = 0.05, method = 'binomial', conf.level = 0.95, x = 1, n = 60)
summary(stage4)
##
##  Classical Audit Sample Evaluation Summary
##
## Options:
##   Confidence level:               0.95
##   Materiality:                    0.05
##   Hypotheses:                     H0: T >= 0.05 vs. H1: T < 0.05
##   Method:                         binomial
##
## Data:
##   Sample size:                    60
##   Number of errors:               1
##   Sum of taints:                  1
##
## Results:
##   Most likely error:              0.016667
##   95 percent confidence interval: [0, 0.07664]
##   Precision:                      0.059973
##   p-value:                        0.19155

As you can see above, the 95% upper bound for the misstatement is higher than 5% and therefore the sample does not provide sufficient evidence to conclude that the misstatement is lower than 5%.
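The figures in this summary can be reproduced in base R. The sketch below assumes that the binomial method corresponds to a one-sided Clopper-Pearson upper bound and a one-sided binomial test; that assumption matches the numbers above but is not stated in the text, and the variable names are hypothetical.

```r
n <- 60            # sample size
x <- 1             # number of errors
materiality <- 0.05
conf_level <- 0.95

# Most likely error: the observed error rate.
most_likely <- x / n

# One-sided Clopper-Pearson upper bound (assumed to be what the binomial
# method reports).
upper_bound <- qbeta(conf_level, shape1 = x + 1, shape2 = n - x)

# Precision: distance between the upper bound and the most likely error.
precision <- upper_bound - most_likely

# Probability of seeing x or fewer errors if the misstatement rate were
# exactly equal to the materiality.
p_value <- pbinom(x, size = n, prob = materiality)

round(c(mle = most_likely, ub = upper_bound, precision = precision,
        p.value = p_value), 5)
```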

### Annotated sample

Next, we take a look at how you can use the evaluation() function to evaluate the misstatement using an annotated sample. Returning to the sample obtained with the selection() function, suppose that you have audited these 60 invoices and have found that they contain 1 misstatement. The code below mimics this by copying the book values to the audit values and lowering the audit value of the first invoice by 100.

sample$auditValue <- sample$bookValue
sample$auditValue[1] <- sample$auditValue[1] - 100

You can evaluate the misstatement in the annotated sample using the data, values, values.audit, and times arguments. For example, the code below evaluates the misstatement in the population with respect to the performance materiality of 5% using the commonly used Stringer bound. You can find more information about which evaluation methods are implemented on the home page.

Sampling objective: Evaluate, on the basis of an annotated sample, whether the misstatement in the population exceeds the allocated performance materiality such that there is a 5% ‘chance’ of incorrectly concluding that the population is free of material misstatement.

stage4 <- evaluation(materiality = 0.05, method = 'stringer', conf.level = 0.95,
                     data = sample, values = 'bookValue', values.audit = 'auditValue',
                     times = 'times')
summary(stage4)
##
##  Classical Audit Sample Evaluation Summary
##
## Options:
##   Confidence level:               0.95
##   Materiality:                    0.05
##   Method:                         stringer
##
## Data:
##   Sample size:                    60
##   Number of errors:               1
##   Sum of taints:                  0.1617495
##
## Results:
##   Most likely error:              0.0026958
##   95 percent confidence interval: [0, 0.053222]
##   Precision:                      0.050526

## Using report(): The basics

With the results from the evaluation() function in hand, you can use the report() function to automatically generate a report containing the data, the statistical results and their interpretation, and the conclusion of the sampling procedure with respect to the sampling objectives. The report can be generated by providing the object returned by the evaluation() function to the report() function.

stage4 <- evaluation(materiality = 0.05, method = 'stringer', conf.level = 0.95,
                     data = sample, values = 'bookValue', values.audit = 'auditValue',
                     times = 'times')

report(stage4, file = 'report.html', format = 'html_document') # Generates .html report