This vignette focuses on how to create in-text tables with the inTextSummaryTable package.

This vignette assumes that the data.frame(s) used to create the tables are already available. If you have doubts about the data format, please see the “data format” section of the introductory vignette.

We will use the example data available in the clinUtils package. Let’s load the packages and the data, and get started!

    library(inTextSummaryTable)
    library(pander)
    library(tools) # toTitleCase
    library(clinUtils)

    # load example data
    data(dataADaMCDISCP01)
    
    dataAll <- dataADaMCDISCP01
    labelVars <- attr(dataAll, "labelVars")

The getSummaryStatisticsTable function creates an in-text table of summary statistics for the variable(s) of interest.

The demographic data (ADSL dataset) is used as an example for the summary statistics table.

    dataSL <- dataAll$ADSL

1 Variable(s) to summarize

Variable(s) to summarize in the table are specified via the var parameter.

A different set of statistics is reported depending on the type of the variable: categorical or continuous.

See the documentation in section Base statistics for more details on the statistics included by default for each type, via:

    ? `inTextSummaryTable-stats`

1.1 Categorical variable

For a discrete/categorical variable, the in-text table can display the counts/percentages of the number of subjects or records for each category of the variable.

1.1.1 Counts of the entire dataset

If no variable is specified (via the var parameter), the counts are displayed for the entire dataset.

    getSummaryStatisticsTable(data = dataSL)

Please note that this is equivalent to setting var = 'all'.

1.1.2 Counts of categories

If a variable is specified (via the var parameter), the counts are displayed for each category.

    getSummaryStatisticsTable(data = dataSL, var = "SEX")

1.1.3 Sort categories

The categories of the variable are sorted alphabetically by default. To sort the categories in a specific order, the variable should be formatted as a factor whose levels list the categories in the desired order.

    # specify manually the order of the categories
    dataSL$SEX <- factor(dataSL$SEX, levels = c("M", "F"))
    getSummaryStatisticsTable(data = dataSL, var = "SEX")
    # order categories based on a numeric variable
    dataSL$SEXN <- ifelse(dataSL$SEX == "M", 2, 1)
    dataSL$SEX <- reorder(dataSL$SEX, dataSL$SEXN)
    getSummaryStatisticsTable(data = dataSL, var = "SEX")
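
The effect of the factor levels on the ordering can be checked with base R alone, independently of the package. The following is a minimal sketch, assuming a small hypothetical vector sex (not taken from the example data):

```r
# hypothetical data, for illustration only
sex <- c("M", "F", "F", "M", "F")

# character vector: categories are tabulated alphabetically
names(table(sex))

# factor with explicit levels: categories follow the order of the levels
sexManual <- factor(sex, levels = c("M", "F"))
levels(sexManual)

# reorder(): levels are sorted by a numeric summary
# (by default the mean of 'sexN' within each level)
sexN <- ifelse(sex == "M", 2, 1)
sexReordered <- reorder(sexManual, sexN)
levels(sexReordered)
```

Since the table follows the levels of the factor, inspecting levels() before calling getSummaryStatisticsTable is a quick way to anticipate the row order.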

1.1.4 Inclusion of categories not available in the data

By default, the table only includes the categories present in the input data, to ensure a compact table for CSR export.

    dataSLExample <- dataSL
    
    # 'SEX' formatted as character with only male
    dataSLExample$SEX <- "M" # only male
    getSummaryStatisticsTable(data = dataSLExample, var = "SEX")

If extra categories should be represented in the table, the categorical variable should be formatted as a factor, whose levels contain all categories to be displayed in the table.

Furthermore, the varInclude0 parameter should be set to TRUE, or to the specific variable (in case multiple variables are specified), to indicate that categories with 0 counts should be included.

    # 'SEX' formatted as factor, to include also female in the table
    # (even if not available in the data)
    dataSLExample$SEX <- factor("M", levels = c("F", "M"))
    getSummaryStatisticsTable(data = dataSLExample, var = "SEX", varInclude0 = TRUE)
    # or:
    getSummaryStatisticsTable(data = dataSLExample, var = "SEX", varInclude0 = "SEX")
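
The reason a factor is required can be seen with base R: only a factor carries categories that are not observed in the data. A minimal sketch, assuming a hypothetical vector sexObserved (this illustrates the principle, not the internal code of the package):

```r
# hypothetical data: only male subjects are observed
sexObserved <- c("M", "M", "M")

# as character: the unobserved category is simply unknown
table(sexObserved)

# as factor: 'F' is a declared level, so it is tabulated with a count of 0
sexFactor <- factor(sexObserved, levels = c("F", "M"))
countsAll <- table(sexFactor)
countsAll

# dropping the unused levels gives back the compact version
countsObserved <- table(droplevels(sexFactor))
countsObserved
```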

1.1.5 Count table for ‘flag’-variables

A specific type of categorical variable is a ‘flag variable’, which indicates if a record fulfills a specific criterion.

Such a variable is typically formatted in the data as:

  • ‘Y’ if the criterion is met for the specific record
  • ‘N’ if the criterion is not fulfilled for the specific record
  • ‘’ (empty string) if the criterion is missing for this record

The name of such a variable typically ends with ‘FL’ in a CDISC-compliant ADaM or SDTM dataset.

For example, the subject-level dataset contains the following flag variables:

    labelVars[grep("FL$", colnames(dataSL), value = TRUE)]
##                                    SAFFL                                    ITTFL                                    EFFFL                                  COMP8FL 
##                 "Safety Population Flag"        "Intent-to-Treat Population Flag"               "Efficacy Population Flag"   "Completers of Week 8 Population Flag" 
##                                 COMP16FL                                 COMP24FL                                 DISCONFL                                  DSRAEFL 
##  "Completers of Week 16 Population Flag"  "Completers of Week 24 Population Flag" "Did the Subject Discontinue the Study?"                "Discontinued due to AE?" 
##                                    DTHFL 
##                          "Subject Died?"
    # has the subject discontinued from the study?
    dataSL$DISCONFL
## [1] ""  ""  "Y" "Y" "Y" "Y" "Y"

If a flag variable is specified in var, the counts for each category are reported:

    getSummaryStatisticsTable(
        data = dataSL,
        var = "SAFFL"
    )

However, often only the counts for the records fulfilling the criterion (records with ‘Y’) are of interest. Only these counts are reported if the variable is also specified via the varFlag parameter.

    getSummaryStatisticsTable(
        data = dataSL,
        var = "SAFFL",
        varFlag = "SAFFL"
    )
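
The difference between the two calls can be sketched in base R. The data frame dataFlag below is hypothetical, and the code only illustrates the idea behind varFlag, not how the package implements it:

```r
# hypothetical subject-level data, for illustration only
dataFlag <- data.frame(
    USUBJID = c("01", "02", "03", "04"),
    SAFFL = c("Y", "Y", "N", ""),
    AGE = c(64, 71, 58, 80),
    stringsAsFactors = FALSE
)

# flag variables: column names ending with 'FL'
varsFlag <- grep("FL$", colnames(dataFlag), value = TRUE)

# counts for each category of the flag ('Y', 'N' and '')
countsFlag <- table(dataFlag$SAFFL)

# counts restricted to the records fulfilling the criterion
nFlagged <- sum(dataFlag$SAFFL == "Y")
```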

1.1.6 Inclusion of total across categories

To include the total counts across categories, the varTotalInclude parameter should be set to TRUE (or to the specific variable).

    getSummaryStatisticsTable(
        data = dataSL, 
        var = "SEX", 
        varTotalInclude = TRUE
    )
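
As a point of comparison, a total across categories can be obtained in base R with addmargins. This is only a sketch on a hypothetical vector; the label and layout of the total row in the in-text table are determined by the package:

```r
# hypothetical data, for illustration only
sex <- factor(c("M", "F", "F", "M", "F"), levels = c("F", "M"))

# counts per category
countsSex <- table(sex)

# counts per category, with the total appended as 'Sum'
countsWithTotal <- addmargins(countsSex)
countsWithTotal
```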

1.2 Continuous variable

For a continuous variable, the in-text table displays standard distribution statistics of the variable.

Please note that missing records (NA) for the variable are filtered, so the count statistics (number of subjects, records, percentage) are based only on the non-missing records.

For a continuous variable, the presence of different values for the same subject (and across row/column variables) is checked, and an appropriate error message is returned if multiple different values are available.

    getSummaryStatisticsTable(data = dataSL, var = "AGE")
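
Both rules can be checked on the data before creating the table. The sketch below uses base R on a hypothetical data frame dataCont; it mimics the filtering and the check described above, and is not the code used by the package:

```r
# hypothetical data: subject '02' has a missing value,
# subject '03' has two different values
dataCont <- data.frame(
    USUBJID = c("01", "02", "03", "03"),
    AGE = c(64, NA, 58, 59),
    stringsAsFactors = FALSE
)

# missing records are removed before the statistics are computed
dataNonMissing <- dataCont[!is.na(dataCont$AGE), ]
nSubjects <- length(unique(dataNonMissing$USUBJID))

# number of distinct values per subject
nValuesPerSubject <- tapply(
    dataNonMissing$AGE, dataNonMissing$USUBJID,
    function(x) length(unique(x))
)

# subjects with more than one distinct value
subjectsMultiple <- names(nValuesPerSubject)[nValuesPerSubject > 1]
subjectsMultiple
```

If subjectsMultiple is not empty, the data should be restricted to one value per subject (within each row/column combination) before summarizing.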

1.3 Continuous and categorical variables in the table

The table can contain a mix of categorical and continuous variables.

    getSummaryStatisticsTable(
        data = dataSL, 
        var = c("AGE", "SEX")
    )

2 Statistics of interest

Statistics of interest and their format are specified via the stats parameter.

If a single statistic expression is specified, the ‘Statistic’ column does not appear in the table.
If multiple statistics are specified, they are included as separate rows.

2.1 Standard statistic set

A standard set of statistics is specified via specific tags passed to the stats parameter.

The list of available statistics is mentioned in the section ‘Formatted statistics’ in:

    ? `inTextSummaryTable-stats` 

Please see below examples of commonly used statistics.

2.1.1 Categorical table

    # count: n, '%' and m
    getSummaryStatisticsTable(
        data = dataSL,
        var = "SEX",
        stats = "count"
    )
    # n (%)
    getSummaryStatisticsTable(
        data = dataSL,
        var = "SEX",
        stats = "n (%)"
    )
    # n/N (%)
    getSummaryStatisticsTable(
        data = dataSL,
        var = "SEX",
        stats = "n/N (%)"
    )
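
To make the notation concrete, the quantities behind ‘n (%)’ and ‘n/N (%)’ can be computed by hand in base R. This sketch assumes a hypothetical vector with one record per subject; the number of decimals is an arbitrary choice here, whereas the package applies its own formatting rules:

```r
# hypothetical data, one record per subject
sex <- c("M", "F", "F", "M", "F")

n <- table(sex)        # number of subjects per category
N <- length(sex)       # total number of subjects
perc <- 100 * n / N    # percentage of subjects per category

# 'n (%)'
statNPerc <- sprintf("%d (%.1f)", as.integer(n), as.numeric(perc))
# 'n/N (%)'
statNNPerc <- sprintf("%d/%d (%.1f)", as.integer(n), N, as.numeric(perc))
names(statNPerc) <- names(statNNPerc) <- names(n)

statNPerc
statNNPerc
```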

2.1.2 Continuous variable

    ## continuous variable
    
    # all summary stats
    getSummaryStatisticsTable(
        data = dataSL,
        var = "AGE",
        stats = "summary"
    )
    # median (range)
    getSummaryStatisticsTable(
        data = dataSL,
        var = "AGE",
        stats = "median (range)"
    )
    # median and (range) on separate lines:
    getSummaryStatisticsTable(
        data = dataSL,
        var = "AGE",
        stats = "median\n(range)"
    )
    # mean (se)
    getSummaryStatisticsTable(
        data = dataSL,
        var = "AGE",
        stats = "mean (se)"
    )
    # mean (sd)
    getSummaryStatisticsTable(
        data = dataSL,
        var = "AGE",
        stats = "mean (sd)"
    )
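
For reference, the statistics behind these tags can be reproduced with base R. The sketch below assumes a hypothetical vector age; the formatting (number of decimals, separators) is illustrative only and may differ from the one applied by the package:

```r
# hypothetical data, for illustration only
age <- c(64, 71, 58, 80, 77)

ageMean <- mean(age)
ageSD <- sd(age)
# standard error of the mean
ageSE <- ageSD / sqrt(length(age))

# 'mean (sd)' and 'mean (se)'
meanSD <- sprintf("%.1f (%.1f)", ageMean, ageSD)
meanSE <- sprintf("%.1f (%.1f)", ageMean, ageSE)

# 'median (range)'
medianRange <- sprintf("%g (%g, %g)", median(age), min(age), max(age))
medianRange
```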

2.2 Custom statistics formatting (Advanced)

To change the formatting of the statistics, the stats parameter should contain a language object (e.g. expression or call) of the default base set of statistics.

See the documentation in section ‘Base statistics’ for more details on the base statistics included by default, via:

    ? `inTextSummaryTable-stats`

For example, the following count table is restricted to the number of subjects per category:

    getSummaryStatisticsTable(
        data = dataSL,
        var = c("RACE", "SEX"),
        stats = list(N = expression(statN))
    )

The summary statistics table is restricted to the median and range:

    getSummaryStatisticsTable(
        data = dataSL,
        var = c("AGE", "HEIGHTBL", "WEIGHTBL", "BMIBL"),
        varGeneralLab = "Parameter", statsGeneralLab = "",
        colVar = "TRT01P",
        stats = list(
            `median` = expression(statMedian),
            `(min, max)` = expression(paste0("(", statMin, ",", statMax, ")"))
        )
    )
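
To get a feeling for how such expressions combine the base statistics, they can be evaluated manually against a data frame of precomputed values. This is a sketch under stated assumptions: the data frame statsBase and its values are hypothetical, and the evaluation with eval only illustrates the principle, not the internal implementation of the package:

```r
# hypothetical base statistics, one row per group
statsBase <- data.frame(
    statMedian = c(70, 68),
    statMin = c(51, 56),
    statMax = c(88, 84)
)

# same specification as passed to the 'stats' parameter
statsCustom <- list(
    `median` = expression(statMedian),
    `(min, max)` = expression(paste0("(", statMin, ",", statMax, ")"))
)

# evaluate each expression with the columns of 'statsBase' as variables
statsFormatted <- lapply(
    statsCustom,
    function(expr) eval(expr[[1]], envir = statsBase)
)
statsFormatted
```

Each element of the list yields one statistic, and the name of the element is used as its label, which is why the names in stats should be chosen with the final table in mind.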