# Introduction to dad

#### 2023-08-30

This vignette gives an overview of the data analysis methods provided by the dad package and describes the type of data it handles.

For more information on these elements, see: https://journal.r-project.org/archive/2021/RJ-2021-071/index.html

## Data under consideration

The dad package provides tools for analysing multi-group data. Such data consist of variables observed on individuals, these individuals being organised into groups (or occasions). Hence, there are three types of objects: groups, individuals and variables.
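
The three types of objects can be made concrete with a small base R sketch. It uses the built-in iris data rather than any dataset from dad, with the Species column assumed to play the role of the group identifier; the object names are introduced here for illustration only.

```r
# Sketch with the built-in iris data (an assumption, not the package's data):
# 'Species' plays the role of the group identifier.
multi_group <- iris

# Groups: the distinct values of the grouping column.
group_names <- levels(multi_group$Species)

# Individuals: the rows, counted here within each group.
individuals_per_group <- table(multi_group$Species)

# Variables: every column except the grouping column.
variable_names <- setdiff(names(multi_group), "Species")
```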

## Implemented methods

For the analysis of such data, a probability density function is associated with each group. The following methods operating on these functions are implemented:

• Multidimensional scaling (MDS) of probability density functions: function fmdsd (continuous data) or mdsdd (discrete data)
• Hierarchical cluster analysis (HCA) of probability density functions: fhclustd (continuous) or hclustdd (discrete)
• Discriminant analysis (DA) of probability density functions:
  • Computation of the misclassification ratio using the leave-one-out method: fdiscd.misclass (continuous) or discdd.misclass (discrete)
  • Assignment of groups of individuals whose class is unknown, one group after another: fdiscd.predict (continuous) or discdd.predict (discrete)
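
The idea of associating a density with each group and then analysing the densities can be sketched in base R, without calling any dad function. The sketch below is not the package's implementation. It assumes continuous data and Gaussian densities, uses the built-in iris data, and relies on the closed-form L2 inner product of two Gaussian densities. All object names (`params`, `l2_inner`, `dist_mat`, and so on) are hypothetical.

```r
# Sketch only: one Gaussian density per group, L2 distances between the
# densities, then classical MDS and hierarchical clustering (base R).
# Uses the built-in iris data (an assumption; not the roses data).
vars <- c("Sepal.Length", "Sepal.Width", "Petal.Length")
groups <- split(iris[, vars], iris$Species)

# Estimate the parameters (mean vector, covariance matrix) of each group.
params <- lapply(groups, function(g) list(mean = colMeans(g), var = var(g)))

# L2 inner product of two Gaussian densities (closed form):
# the N(0, S1 + S2) density evaluated at m1 - m2.
l2_inner <- function(p1, p2) {
  d <- p1$mean - p2$mean
  s <- p1$var + p2$var
  k <- length(d)
  exp(-0.5 * drop(t(d) %*% solve(s, d))) / sqrt((2 * pi)^k * det(s))
}

# Matrix of L2 distances between the group densities.
n <- length(params)
dist_mat <- matrix(0, n, n, dimnames = list(names(params), names(params)))
for (i in seq_len(n)) {
  for (j in seq_len(n)) {
    d2 <- l2_inner(params[[i]], params[[i]]) +
      l2_inner(params[[j]], params[[j]]) -
      2 * l2_inner(params[[i]], params[[j]])
    dist_mat[i, j] <- sqrt(max(d2, 0))  # guard against tiny negative values
  }
}

# Classical MDS and hierarchical clustering on these distances.
mds_scores <- cmdscale(as.dist(dist_mat), k = 2)
hca_tree <- hclust(as.dist(dist_mat), method = "complete")
```

Each row of `mds_scores` represents one group, not one individual, which is what distinguishes these methods from an analysis applied directly to the observations.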

## Data organisation

To make multi-group data easier to work with, the dad package uses objects of class "folder" or "folderh". These objects are lists of data frames with particular formats.

### Objects of class folder

Such objects are lists of data frames that all have the same column names. Each data frame corresponds to one occasion (a group of individuals).

An object of class "folder" is created by the functions folder or as.folder (see their help in R).

Example: Ten rosebushes $A$, $B$, $\dots$, $J$ were evaluated by 14 assessors, at three sessions, according to several descriptors including their shape Sha, their foliage thickness Den and their symmetry Sym.

library(dad)
data("roses")
x <- roses[, c("Sha", "Den", "Sym", "rose")]
head(x)
##   Sha Den Sym rose
## 1 7.0 6.7 6.7    A
## 2 7.1 7.8 8.1    A
## 3 7.0 6.8 7.4    A
## 4 6.7 4.3 8.1    A
## 5 4.5 7.2 7.8    A
## 6 6.0 7.2 6.1    A

Coerce these data into an object of class "folder":

rosesf <- as.folder(x, groups = "rose")
print(rosesf, max = 9)
## $A
##   Sha Den Sym
## 1 7.0 6.7 6.7
## 2 7.1 7.8 8.1
## 3 7.0 6.8 7.4
##  [ reached 'max' / getOption("max.print") -- omitted 39 rows ]
##
## $B
##    Sha Den Sym
## 43 8.1 7.7 3.0
## 44 8.6 5.9 6.7
## 45 7.7 6.7 7.4
##  [ reached 'max' / getOption("max.print") -- omitted 39 rows ]
##
## $C
##    Sha Den Sym
## 85 0.7 9.3 1.4
## 86 2.3 7.7 2.4
## 87 3.6 7.9 7.2
##  [ reached 'max' / getOption("max.print") -- omitted 39 rows ]
##
## $D
##     Sha Den Sym
## 127 9.2 1.8 9.0
## 128 9.0 2.3 9.2
## 129 6.9 2.6 7.6
##  [ reached 'max' / getOption("max.print") -- omitted 39 rows ]
##
## $E
##     Sha Den Sym
## 169 5.6 1.7 8.2
## 170 7.5 3.4 8.6
## 171 5.8 3.9 5.8
##  [ reached 'max' / getOption("max.print") -- omitted 39 rows ]
##
## $F
##     Sha Den Sym
## 211 8.3 8.0 6.5
## 212 8.4 7.8 3.3
## 213 9.2 8.2 7.6
##  [ reached 'max' / getOption("max.print") -- omitted 39 rows ]
##
## $G
##     Sha Den Sym
## 253 8.6 2.0 5.4
## 254 8.5 2.3 7.9
## 255 7.6 3.5 7.1
##  [ reached 'max' / getOption("max.print") -- omitted 39 rows ]
##
## $H
##     Sha Den Sym
## 295 6.5 4.3 2.6
## 296 6.6 2.9 2.9
## 297 8.4 5.1 6.4
##  [ reached 'max' / getOption("max.print") -- omitted 39 rows ]
##
## $I
##     Sha Den Sym
## 337 4.9 6.5 7.6
## 338 5.8 6.6 7.9
## 339 4.3 5.6 6.0
##  [ reached 'max' / getOption("max.print") -- omitted 39 rows ]
##
## $J
##     Sha Den Sym
## 379 4.9 5.2 8.9
## 380 4.6 8.1 8.6
## 381 3.5 7.8 7.4
##  [ reached 'max' / getOption("max.print") -- omitted 39 rows ]
##
## attr(,"class")
## [1] "folder"
## attr(,"same.rows")
## [1] FALSE
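
Since a folder is described as a list of data frames sharing the same column names, its structure can be mimicked with a plain base R list. The sketch below does not create an object of class "folder". It only rebuilds, by hand, the first three rows printed above for rosebushes A and B, and the names `first_rows`, `same_columns` and `group_means` are introduced here for illustration.

```r
# Plain list mimicking two elements of the folder, filled with the first
# three rows printed above for rosebushes A and B (not the full data).
first_rows <- list(
  A = data.frame(Sha = c(7.0, 7.1, 7.0),
                 Den = c(6.7, 7.8, 6.8),
                 Sym = c(6.7, 8.1, 7.4)),
  B = data.frame(Sha = c(8.1, 8.6, 7.7),
                 Den = c(7.7, 5.9, 6.7),
                 Sym = c(3.0, 6.7, 7.4))
)

# Check that all elements share the same column names.
same_columns <- all(sapply(
  first_rows,
  function(d) identical(names(d), names(first_rows[[1]]))
))

# Per-group column means, computed element by element.
group_means <- lapply(first_rows, colMeans)
```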

### Objects of class folderh

Objects of class "folderh" can be used to avoid redundancies in the data.

In the most useful case, such objects are hierarchical lists of two data frames, df1 and df2, related by a key that describes the "1 to N" relationship between them.

They are created by the function folderh (see its help in R for the case of three data frames or more).

Example: Data on 5 rosebushes (roseflowers$variety), with measurements on several flowers of each rosebush (roseflowers$flower).

library(dad)
data(roseflowers)
df1 <- roseflowers$variety
df2 <- roseflowers$flower

Build an object of class "folderh":

fh1 <- folderh(df1, "rose", df2)
print(fh1)
## $df1
##         place rose variety
## 34   outdoors   34      v1
## 40   outdoors   40      v4
## 60   outdoors   60      v3
## 66 glasshouse   66      v3
## 68 glasshouse   68      v4
##
## $df2
##    rose numflower diameter height nleaves
## 1    34         1     94.5   57.0       8
## 2    34         2     89.5   54.0      10
## 3    40         1     57.0   21.5       9
## 4    40         2     52.5   20.5       5
## 5    40         3     51.5   14.0       7
## 6    60         1     53.0   23.0       4
## 7    60         2     52.0   24.5       9
## 8    66         1     35.0    9.5       4
## 9    66         2     35.0   14.0       6
## 10   66         3     36.0   13.5       7
## 11   68         1     45.5   19.5      10
##
## attr(,"class")
## [1] "folderh"
## attr(,"keys")
## [1] "rose"
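
The "1 to N" relationship carried by the key, and the redundancy that a two-level structure avoids, can be illustrated in base R alone. The sketch below does not use folderh. It copies by hand part of the output above (rosebushes 34 and 40, three of the columns), and the names `variety`, `flower`, `flowers_per_rose` and `joined` are chosen here for illustration.

```r
# Two small data frames copied from part of the output above
# (rosebushes 34 and 40 only); 'rose' is the key.
variety <- data.frame(
  place = c("outdoors", "outdoors"),
  rose = c(34, 40),
  variety = c("v1", "v4")
)
flower <- data.frame(
  rose = c(34, 34, 40, 40, 40),
  numflower = c(1, 2, 1, 2, 3),
  diameter = c(94.5, 89.5, 57.0, 52.5, 51.5)
)

# Each row of 'variety' matches N rows of 'flower' through the key.
flowers_per_rose <- table(flower$rose)

# Joining on the key repeats the rosebush-level columns for every flower,
# which is the redundancy a two-level structure avoids.
joined <- merge(variety, flower, by = "rose")
```

In the joined table, the place and variety of each rosebush appear once per flower, whereas keeping the two data frames separate stores them only once.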