Welcome to the ‘Get started’ vignette of the `jfa` package. This vignette provides a simple explanation of the functions in the package and how they facilitate the statistical audit sampling workflow. See the other vignettes for a more detailed explanation of the functionality of the package.

To concretely illustrate `jfa`’s functionality, we consider the `BuildIt` data set that is included in the package (for more info, see `?BuildIt`). This data set contains a population of 3500 invoices paid to the fictional ‘BuildIt’ construction company. Each invoice has an identification number (`ID`), a recorded value (`bookValue`), and a corresponding audit (true) value (`auditValue`).

**Note:** The `auditValue` column is included for illustrative purposes only. In practice, the auditor learns the audit values only after inspecting the invoices in the sample.

First, we load the `jfa` package and the `BuildIt` data set. The first 10 invoices from the data set are displayed below.

```
library(jfa)
data('BuildIt')
head(BuildIt, n = 10)
```

```
## ID bookValue auditValue
## 1 82884 242.61 242.61
## 2 25064 642.99 642.99
## 3 81235 628.53 628.53
## 4 71769 431.87 431.87
## 5 55080 620.88 620.88
## 6 93224 501.76 501.76
## 7 24331 466.01 466.01
## 8 81460 295.20 295.20
## 9 14608 216.48 216.48
## 10 79064 243.43 243.43
```

For a fully illustrated walkthrough of `jfa`’s workflow functionality using the `BuildIt` data set, see the vignette Workflow: Classical audit sampling. For a Bayesian version of this walkthrough, see Workflow: Bayesian audit sampling.

## `auditPrior()`: The basics

The `auditPrior()` function can be used to create a prior distribution for the misstatement parameter in a statistical audit sampling model. In an audit sampling context, an advantage of Bayesian inference is that the prior distribution can be used to incorporate existing information into the statistical procedure. Incorporating existing information can potentially yield a decrease in sample size and an increase in efficiency. The type of audit information that can be incorporated depends on the information that is available in the context of the audit. See the vignette Planning: Prior distributions or the accompanying article for a detailed explanation of the types of audit information that `jfa` is able to incorporate into a prior distribution.

With the prior distribution in hand, Bayesian audit sampling can be performed by providing the object returned by the `auditPrior()` function as input for the `prior` argument in subsequent calls to the `planning()` and `evaluation()` functions.
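The sketch below uses base R only to show why an informed prior can reduce the sample size. It does not call `auditPrior()`. Instead, it assumes a beta prior combined with a binomial likelihood and zero errors in the sample. It then searches for the smallest sample size at which the posterior probability that the misstatement is below a 5% performance materiality reaches 95%. The informed prior `beta(1, 21)` and the helper name `bayes_min_n()` are hypothetical choices for illustration.

```r
# Minimal sketch (base R, not the jfa API): Bayesian planning with a
# beta(alpha, beta) prior, a binomial likelihood and zero errors in the sample.
# With 0 errors in n items, the posterior is beta(alpha, beta + n).
bayes_min_n <- function(alpha, beta, materiality = 0.05,
                        conf.level = 0.95, max_n = 5000) {
  for (n in 0:max_n) {
    # Posterior probability that the misstatement is below the materiality
    if (pbeta(materiality, alpha, beta + n) >= conf.level) return(n)
  }
  NA_integer_
}

# Flat prior: beta(1, 1)
n_flat <- bayes_min_n(alpha = 1, beta = 1)
# Hypothetical informed prior: beta(1, 21), i.e., prior information
# comparable to 20 previously inspected error-free items
n_informed <- bayes_min_n(alpha = 1, beta = 21)
print(c(flat = n_flat, informed = n_informed))
```

Under these assumptions, the flat prior requires 58 items and the informed prior requires 38. The informed prior already carries information comparable to 20 error-free observations, which accounts for the difference.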

## `planning()`: The basics

Planning a minimum sample size requires knowledge of the conditions that lead to acceptance of the population (i.e., the sampling objectives). Generally, a sampling objective can be one (or both) of the following:

- **Hypothesis testing**: Obtain measures of evidence for the claim that the misstatement in the population is lower than a given performance materiality (i.e., the maximum tolerable misstatement).
- **Estimation**: Obtain measures of accuracy for the claim that the misstatement in the population is a certain value (with a minimum precision).

In addition to determining the sampling objective(s), it is also important to determine the statistical distribution linking the sample outcomes to the population misstatement (e.g., `poisson`, `binomial`, or `hypergeometric`). All three distributions are standard in an audit sampling context because they are the hypergeometric distribution or approximations of it. However, `poisson` is the default in `jfa` because it is the most conservative, meaning it yields the largest sample sizes.
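The base-R sketch below shows what "most conservative" means in practice. For each likelihood, it computes the smallest sample size at which finding zero errors is enough to reject, with 95% confidence, the hypothesis that the misstatement is at least 5%. The population size of 3500 comes from the `BuildIt` example. For the hypergeometric case, the sketch treats the materiality as `ceiling(0.05 * N)` misstated items; this is an assumption of the sketch, not necessarily how `jfa` handles it.

```r
# Minimal sketch (base R): smallest n such that observing zero errors rejects
# "misstatement >= materiality" at the chosen confidence level.
materiality <- 0.05
conf.level  <- 0.95
N <- 3500                       # population size (number of invoices)
K <- ceiling(materiality * N)   # assumed number of misstated items at the limit

min_n <- function(prob_zero_errors, max_n = N) {
  for (n in seq_len(max_n)) {
    # Probability of 0 errors if the misstatement equals the materiality
    if (prob_zero_errors(n) <= 1 - conf.level) return(n)
  }
  NA_integer_
}

sizes <- c(
  poisson        = min_n(function(n) dpois(0, lambda = n * materiality)),
  binomial       = min_n(function(n) dbinom(0, size = n, prob = materiality)),
  hypergeometric = min_n(function(n) dhyper(0, m = K, n = N - K, k = n))
)
print(sizes)
```

The Poisson sample size is the largest of the three (60 items). The binomial size (59) and the hypergeometric size are equal to it or smaller.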

Lastly, the auditor should specify the number of errors expected (or tolerated) in the sample. Set this value conservatively. If the sample turns out to contain more errors than expected, the planned sample size will be too small to meet the sampling objective, which implies that insufficient work has been done.

With the `BuildIt` data set, the booked amount (monetary value) of each invoice in the population is given, so an auditor may want to make a statement about the amount of misstatement in the population. For illustrative purposes, we will tolerate zero misstatements in the sample.

First, let’s take a look at how you can use the `planning()` function to calculate the minimum sample size for testing the hypothesis that the misstatement in the population is lower than the performance materiality. In this example, the performance materiality is set to 5% of the total population value, meaning that the population may not contain more than 5% misstatement.

**Sampling objective**: Calculate a minimum sample size such that, when no misstatements are found in the sample, there is a 95% chance that the misstatement in the population is lower than 5% of the population value.

A minimum sample size for this sampling objective can be calculated by specifying the `materiality` argument in the `planning()` function, as shown in the code below. A summary of the statistical results can then be obtained using the `summary()` function. The results show that, given zero tolerable errors, the minimum sample size is 60 units.

```
stage1 <- planning(materiality = 0.05, expected = 0, likelihood = 'poisson', conf.level = 0.95)
summary(stage1)
```

```
##
## Classical Audit Sample Planning Summary
##
## Options:
## Confidence level: 0.95
## Materiality: 0.05
## Hypotheses: H0: T >= 0.05 vs. H1: T < 0.05
## Expected: 0
## Likelihood: poisson
##
## Results:
## Minimum sample size: 60
## Tolerable errors: 0
## Expected most likely error: 0
## Expected upper bound: 0.049929
## Expected precision: 0.049929
## Expected p-value: < 2.22e-16
```

Next, let’s take a look at how you can use the `planning()` function to calculate the minimum sample size for estimating the misstatement in the population with a minimum precision. The precision is defined as the difference between the most likely misstatement and the upper confidence bound on the misstatement. For this example, the minimum precision is set to 2% of the population value.

**Sampling objective**: Calculate a minimum sample size such that, when zero misstatements are found in the sample, there is a 95% chance that the misstatement in the population is at most 2% above the most likely misstatement.

A minimum sample size for this sampling objective can be calculated by specifying the `min.precision` argument in the `planning()` function, as shown in the code below. The results show that, given zero tolerable errors, the minimum sample size is 150 units.

```
stage1 <- planning(min.precision = 0.02, expected = 0, likelihood = 'poisson', conf.level = 0.95)
summary(stage1)
```

```
##
## Classical Audit Sample Planning Summary
##
## Options:
## Confidence level: 0.95
## Min. precision: 0.02
## Expected: 0
## Likelihood: poisson
##
## Results:
## Minimum sample size: 150
## Tolerable errors: 0
## Expected most likely error: 0
## Expected upper bound: 0.019971
## Expected precision: 0.019971
```
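Both planning examples match the Poisson upper bound `qgamma(conf.level, x + 1, n)` for `x` errors in `n` items. This is consistent with the expected upper bounds printed above; for example, `qgamma(0.95, 1, 60)` is about 0.049929. Assuming that bound, the precision-based objective can be sketched in base R as a search for the smallest `n` whose precision (upper bound minus `x / n`) does not exceed the target. The helper name `precision_min_n()` is hypothetical.

```r
# Minimal sketch (base R): Poisson-based planning for a minimum precision,
# assuming x expected errors (here 0). The upper bound for x errors in n items
# is qgamma(conf.level, shape = x + 1, rate = n); precision = upper - x / n.
precision_min_n <- function(min.precision, x = 0, conf.level = 0.95,
                            max_n = 10000) {
  for (n in seq_len(max_n)) {
    upper <- qgamma(conf.level, shape = x + 1, rate = n)  # Poisson upper bound
    if (upper - x / n <= min.precision) return(n)          # precision check
  }
  NA_integer_
}

n <- precision_min_n(min.precision = 0.02)
print(n)
# Expected upper bound; with zero errors it also equals the precision
print(qgamma(0.95, shape = 1, rate = n))
```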

## `selection()`: The basics

Selecting a sample using the `selection()` function requires knowledge of the units in the population that are eligible for selection (i.e., the sampling units). Sampling units can be items or monetary units. Items can be selected from the population using record sampling (also known as attribute sampling or item sampling) with `units = 'items'`. Monetary units can be selected from the population using monetary unit sampling (MUS) with `units = 'values'`.

Once the sampling units are determined, the auditor must also decide how the units are selected (i.e., the selection method). The following methods are available:

- fixed interval sampling (also known as systematic sampling) with `method = 'interval'` (the default),
- cell sampling with `method = 'cell'`,
- random sampling with `method = 'random'`,
- modified sieve sampling with `method = 'sieve'`.

See the vignette Selection: Sampling methodology for a more detailed explanation of the selection methods implemented in `jfa`.
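The mechanics of the two schemes used below can be sketched in base R. The helpers `random_records()` and `mus_interval()` are hypothetical and are not part of `jfa`, and the book values are simulated. Random record sampling draws distinct item indices. Fixed interval MUS works in three steps:

- divide the total book value by the sample size to get the interval,
- select every interval-th monetary unit from a starting point,
- trace each selected unit back to the item that contains it.

```r
# Minimal sketch (base R, not the jfa implementation) of two selection schemes.

# Random record sampling: draw `size` distinct item (row) indices.
random_records <- function(n_items, size) {
  sort(sample.int(n_items, size))
}

# Fixed interval monetary unit sampling: select every `interval`-th monetary
# unit, starting at `start`, and return the index of the item containing it.
mus_interval <- function(book_values, size, start = 1) {
  interval <- sum(book_values) / size
  units <- start + (seq_len(size) - 1) * interval   # selected monetary units
  # Item i contains the monetary units in (cumsum[i - 1], cumsum[i]]
  findInterval(units, c(0, cumsum(book_values)), left.open = TRUE)
}

set.seed(1)
book <- round(runif(20, min = 10, max = 1000), 2)   # hypothetical book values
print(random_records(length(book), 5))
items <- mus_interval(book, size = 5)
print(table(items))   # number of times each item is selected
```

An item whose book value exceeds the interval can contain several selected units, so it can be selected more than once. `table()` counts these repeats, which is similar in spirit to the `times` column in the sample returned by `selection()`.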

First, let’s take a look at how the `selection()` function can be used to perform random record sampling. Random record sampling implies that the sampling units are set to `items` and the selection method is set to `random`. The code below selects the 60 planned invoices from the `BuildIt` data set using such a random record sampling scheme.

```
set.seed(1) # For a reproducible random selection
stage2 <- selection(data = BuildIt, size = 60, units = 'items', method = 'random')
summary(stage2)
```

```
##
## Audit Sample Selection Summary
##
## Options:
## Requested sample size: 60
## Sampling units: items
## Method: random sampling
##
## Data:
## Population size: 3500
##
## Results:
## Selected sampling units: 60
## Selected items: 60
## Proportion of size: 0.017143
```

Next, let’s take a look at how the `selection()` function can be used to perform fixed interval monetary unit sampling. Fixed interval monetary unit sampling implies that the sampling units are set to `values` and the selection method is set to `interval`. The code below selects 150 monetary units from the `BuildIt` data set using such a fixed interval monetary unit sampling scheme.

```
stage2 <- selection(data = BuildIt, size = 150, units = 'values', method = 'interval', values = 'bookValue')
summary(stage2)
```

```
##
## Audit Sample Selection Summary
##
## Options:
## Requested sample size: 150
## Sampling units: monetary units
## Method: fixed interval sampling
## Starting point: 1
##
## Data:
## Population size: 3500
## Population value: 1403221
## Selection interval: 9354.8
##
## Results:
## Selected sampling units: 150
## Proportion of value: 0.0001069
## Selected items: 150
## Proportion of size: 0.042857
```

The selected units and corresponding items are stored in the object returned by the `selection()` function. The sample can be extracted from this object via `$sample`, as shown in the code below. After this step, it is up to the auditor to annotate the sample with the audit values.

```
set.seed(1)
stage2 <- selection(data = BuildIt, size = 60, units = 'items', method = 'random')
sample <- stage2$sample # Extract the selected invoices
head(sample, n = 10)
```

```
## row times ID bookValue auditValue
## 1 1017 1 50755 618.24 618.24
## 2 679 1 20237 669.75 669.75
## 3 2177 1 9517 454.02 454.02
## 4 930 1 85674 257.82 257.82
## 5 1533 1 31051 308.53 308.53
## 6 471 1 84375 824.66 824.66
## 7 2347 1 75616 623.70 623.70
## 8 270 1 82033 352.75 352.75
## 9 1211 1 12877 52.89 52.89
## 10 3379 1 85322 330.24 330.24
```

## `evaluation()`: The basics

After annotating the items in the sample with their audit values, you can perform statistical inference about the misstatement in the population with the `evaluation()` function. In addition to a data sample, this function also accepts summary statistics from a sample (e.g., sample size and number of errors) when only those are available. For a more elaborate explanation of the output of this function for each sampling objective, see the package vignettes Evaluation: Testing misstatement and Evaluation: Estimating misstatement.

First, let’s take a look at how the `evaluation()` function can be combined with summary statistics from a sample. Suppose that, in the previously selected sample of 60 invoices, a single invoice is found to be missing a signature. These summary statistics can be provided to the `evaluation()` function with `x = 1` and `n = 60`. The function also requires that you specify the sampling objectives using the `materiality` or `min.precision` arguments. Again, a performance materiality of 5% applies.

```
stage4 <- evaluation(materiality = 0.05, method = 'poisson', conf.level = 0.95, x = 1, n = 60)
summary(stage4)
```

```
##
## Classical Audit Sample Evaluation Summary
##
## Options:
## Confidence level: 0.95
## Materiality: 0.05
## Hypotheses: H0: T >= 0.05 vs. H1: T < 0.05
## Method: poisson
##
## Data:
## Sample size: 60
## Number of errors: 1
## Sum of taints: 1
##
## Results:
## Most likely error: 0.016667
## 95 percent confidence interval: [0, 0.079064]
## Precision: 0.062398
## p-value: 0.19915
```

The results indicate that the most likely error in the population is 1.67%. The 95% one-sided confidence interval for the population misstatement ranges from 0% to 7.9% and contains the performance materiality. We therefore cannot reject the null hypothesis that the population misstatement is 5% or higher, so we cannot conclude that it is lower than 5%. The non-significant *p* value (*p* = 0.199) leads to the same conclusion.
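The numbers in this summary can be reproduced with base R by assuming that the classical Poisson evaluation uses a gamma upper bound and a Poisson *p* value. These formulas are a reconstruction chosen because they match the output above. They are not quoted from the `jfa` source.

```r
# Minimal sketch (base R): classical Poisson evaluation from summary statistics.
x <- 1               # number of errors found
n <- 60              # sample size
materiality <- 0.05
conf.level  <- 0.95

mle   <- x / n                                        # most likely error
upper <- qgamma(conf.level, shape = x + 1, rate = n)  # one-sided upper bound
p     <- ppois(x, lambda = n * materiality)           # P(<= x errors | T = materiality)

print(round(c(mle = mle, upper = upper, precision = upper - mle, p.value = p), 6))
```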

Next, let’s take a look at how the `evaluation()` function can be combined with a data sample. Returning to our annotated sample from the `selection()` function, suppose that in the previously selected sample of 60 invoices, a single invoice is found to have a true value that deviates from its booked value.

```
sample$auditValue <- sample$bookValue # Start from an error-free sample
sample$auditValue[1] <- sample$auditValue[1] - 100 # First invoice is overstated by 100
sample
```

These data can be provided to the `evaluation()` function using the `data`, `values`, `values.audit`, and `times` arguments. The `method` argument determines the method of inference. For example, the code below evaluates the misstatement in the population using the commonly used Stringer bound. You can find more information about which evaluation methods are implemented on the package home page.

```
stage4 <- evaluation(materiality = 0.05, method = 'stringer', conf.level = 0.95,
                     data = sample, values = 'bookValue', values.audit = 'auditValue',
                     times = 'times')
summary(stage4)
```

```
##
## Classical Audit Sample Evaluation Summary
##
## Options:
## Confidence level: 0.95
## Materiality: 0.05
## Method: stringer
##
## Data:
## Sample size: 60
## Number of errors: 1
## Sum of taints: 0.1617495
##
## Results:
## Most likely error: 0.0026958
## 95 percent confidence interval: [0, 0.053222]
## Precision: 0.050526
```

The results indicate that the most likely error in the population is 0.27%. Moreover, the 95% one-sided confidence interval for the population misstatement ranges from 0% to 5.32% and contains the performance materiality. The `stringer` method does not provide a *p* value for hypothesis testing.
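The sketch below implements the classical form of the Stringer bound in base R. It starts from the upper limit for zero errors. For each additional error, it adds the increase in the upper limit, weighted by the taints sorted from largest to smallest. It assumes binomial (Clopper-Pearson, via `qbeta()`) upper limits and uses the single taint from this example: 100 on a book value of 618.24. The helper `stringer_bound()` is hypothetical and is not the `jfa` implementation.

```r
# Minimal sketch (base R) of the classical Stringer bound, assuming
# binomial (Clopper-Pearson) upper limits.
stringer_bound <- function(taints, n, conf.level = 0.95) {
  ts <- sort(taints[taints > 0], decreasing = TRUE)  # positive taints, largest first
  k  <- seq_along(ts)
  p0 <- qbeta(conf.level, 1, n)               # upper limit for 0 errors
  p  <- qbeta(conf.level, k + 1, n - k)       # upper limits for 1, 2, ... errors
  p0 + sum(diff(c(p0, p)) * ts)               # increments weighted by the taints
}

# The single misstatement from the example: 100 off a book value of 618.24
book  <- 618.24
audit <- book - 100
taint <- (book - audit) / book                # proportional misstatement, ~0.16175

print(c(mle = taint / 60, upper = stringer_bound(taint, n = 60)))
```

With these inputs, the sketch gives a most likely error of about 0.0027 and an upper bound of about 0.0532, in line with the summary above.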

## `report()`: The basics

With the results from the `evaluation()` function in hand, a call to the `report()` function automatically generates a report. The report contains the data, the statistical results and their interpretation, and the conclusion of the sampling procedure with respect to the sampling objectives. The object returned by the `evaluation()` function can be supplied directly to the `report()` function, as shown in the code below.

```
stage4 <- evaluation(materiality = 0.05, method = 'stringer', conf.level = 0.95,
                     data = sample, values = 'bookValue', values.audit = 'auditValue',
                     times = 'times')
report(stage4, file = 'report.html', format = 'html_document') # Generates .html report
```