frscore

Functions for automating repeated CNA analyses of a data set with varying consistency and coverage settings, and for calculating the fit-robustness scores of the resulting models.

rean_cna() repeatedly analyzes a data set x with cna, using every combination of consistency and coverage thresholds that can be formed from the values given in the argument attempt. In other words, it performs a reanalysis series in which consistency and coverage are varied within a given range at a given granularity. Depending on the value of the argument output = c("csf", "asf"), either csfs or asfs are extracted from the results of each analysis. These are returned in a list in which each element is a data frame of all the csfs or asfs obtained with a given con/cov setting, with additional columns ‘cnacon’ and ‘cnacov’ that hold the con/cov values used in the respective cna run. ncsf controls how many csfs are calculated when output = "csf". rean_cna() is primarily a helper function for frscored_cna(). It accepts additional cna arguments, which are passed on to the cna calls; notably, the type of the input data should be specified with the type argument when x is not binary crisp-set data.

res <- rean_cna(selectCases("A+B+F*g<->R"), attempt = seq(1, 0.7, -0.1), output = "csf")
res

# [[1]]
# outcome         condition consistency coverage complexity inus cnacon cnacov
# 1 R       A + B + F*g <-> R           1        1          4 TRUE      1      1
# 
# [[2]]
# outcome         condition consistency coverage complexity inus cnacon cnacov
# 1 R       A + B + F*g <-> R           1        1          4 TRUE    0.9      1
# 
# [[3]]
# outcome       condition consistency coverage complexity inus cnacon cnacov
# 1 R       A + B + F <-> R   0.9285714        1          3 TRUE    0.8      1
# 2 R       A + B + g <-> R   0.9285714        1          3 TRUE    0.8      1
# .
# .
# .
#
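
To make "all combinations of consistency and coverage thresholds" concrete, the following base-R sketch builds the grid of settings implied by attempt = seq(1, 0.7, -0.1). It is only an illustration of the idea, not the package's implementation; the use of expand.grid() and the variable name grid are assumptions made for this example.

```r
# Threshold values to try, as in the example above
attempt <- seq(1, 0.7, -0.1)

# Every consistency/coverage pair that can be formed from these values
# (illustrative only; rean_cna() may construct its settings differently)
grid <- expand.grid(cnacon = attempt, cnacov = attempt)

nrow(grid)   # 4 values for con times 4 values for cov
head(grid)
```

Each row of grid corresponds to one cna run in the reanalysis series, and hence to one element of the list returned by rean_cna().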

frscore() takes as its argument sols a character vector of cna solutions and calculates for each solution a score based on how many submodels, supermodels, or both the solution has in the set of solutions sols. scoretype controls what the score is based on; the default, “full”, counts both sub- and supermodel relations. Only unique models are printed, hence the reference to model “type” in the output. For each model type, the number of tokens, i.e. the number of times the model appears in sols, is printed. verbose = TRUE prints a breakdown of the sub/supermodel relations that contribute to the score of each model. By default, results are printed for the 20 top-scoring model types; use print.all = TRUE to print all results.

The primary purpose of this function is to analyze the degree to which different models of the same data, obtained at different fit threshold settings, agree with each other in the causal ascriptions they make, using sub- and supermodel relations as a proxy for such agreement. Used this way, the scores represent fit-robustness: models that score low compared to the others inferred with different fit thresholds either make some idiosyncratic causal ascriptions, which indicates overfitting, or are uninformative compared to other equally non-idiosyncratic models of the same data.

The argument normalize controls whether the scores are normalized and, if so, how. It defaults to “truemax”, which normalizes the scores by the highest score obtained by any model, so that the top-scoring model type(s) always get a score of 1. normalize = "idealmax" normalizes by a theoretical maximum score, calculated by assuming that all solutions of equal complexity are identical and that, for every solution of a given complexity, all solutions of lower complexity are its submodels. The scores are normalized by default because they have no absolute interpretation; they are meaningful only in comparison to other models of the same data obtained at different consistency/coverage settings. frscored_cna() automates this process of analyzing data with varying con/cov and calculating the fit-robustness of the resulting models.

res <- rean_cna(selectCases("A+B+F*g<->R"), attempt = seq(1, 0.7, -0.1))
res <- do.call(rbind, res)

fr <- frscore(res[,2])
fr

# FRscore, score type: full 
# -----
#   
#   Model types: 
#   
#   model              score      tokens
# 1  A + B + F*g <-> R 1.00000000      2
# 2  A + B <-> R       0.73684211     12
# 3  A + B + F <-> R   0.63157895      2
# 4  A + B + g <-> R   0.63157895      2 
# .
# .
# .
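
The idea of scoring by sub- and supermodel relations and then normalizing by the top score can be sketched in base R as follows. This is a deliberately simplified toy, not the algorithm used by frscore(): the helpers parse_lhs() and is_sub_toy() are hypothetical, the submodel test only checks that every disjunct of one model is contained in some disjunct of the other, all models are assumed to share the same outcome, and each model type is counted once regardless of its number of tokens. The resulting numbers therefore do not reproduce the scores printed above.

```r
# Split the left-hand side of a model string into its disjuncts,
# each represented as a character vector of literals (hypothetical helper).
parse_lhs <- function(model) {
  lhs <- gsub("\\s", "", strsplit(model, "<->", fixed = TRUE)[[1]][1])
  strsplit(strsplit(lhs, "+", fixed = TRUE)[[1]], "*", fixed = TRUE)
}

# Simplified submodel test: every disjunct of x is a subset of some disjunct of y.
is_sub_toy <- function(x, y) {
  dx <- parse_lhs(x)
  dy <- parse_lhs(y)
  all(vapply(dx, function(cx) {
    any(vapply(dy, function(cy) all(cx %in% cy), logical(1)))
  }, logical(1)))
}

models <- c("A + B + F*g <-> R", "A + B <-> R", "A + B + F <-> R", "A + B + g <-> R")
n <- length(models)

# "Full" toy score: number of other model types that are a sub- or supermodel.
raw <- vapply(seq_len(n), function(i) {
  others <- setdiff(seq_len(n), i)
  sum(vapply(others, function(j) {
    is_sub_toy(models[i], models[j]) || is_sub_toy(models[j], models[i])
  }, logical(1)))
}, integer(1))

# Normalization in the spirit of "truemax": divide by the highest raw score.
normalized <- raw / max(raw)
data.frame(model = models, raw = raw, normalized = normalized)
```

In this toy, the top-scoring models receive a normalized score of 1 and the others are expressed relative to them, which is the sense in which the scores are only comparative.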

frscored_cna() performs a reanalysis series on a data set x by calling rean_cna() on x, calculates the fit-robustness of each model (type) returned by calling frscore() on the resulting models, and outputs the models and their scores. The arguments fit.range and granularity determine the reanalysis series. For example, the defaults fit.range = c(1, 0.7) and granularity = 0.1 mean that the input data set x is repeatedly analyzed with cna while consistency and coverage are varied between 1 and 0.7 in steps of 0.1, until all possible combinations of con/cov values within that range and at that granularity have been tried. The arguments normalize, verbose, and print.all are used as in frscore(), e.g. verbose = TRUE also prints a breakdown of each model’s fit-robustness score. The function outputs a data frame of the unique models returned in the reanalysis series, with their details presented as in the output of cna and with additional columns ‘score’ and ‘tokens’, which display the fit-robustness score of each model (type) and the number of times the model type appeared in the reanalysis results.

If a candidate model is provided as test.model, the result for that model is printed separately, provided the model is found in the reanalysis series. If it is not found, the function stops.

The return value is an object of class “frscored_cna”: a list that, in addition to what the print method displays, contains further elements that may be of interest.

frsc <- frscored_cna(ct2df(selectCases("A+B+F*g<->R")))
frsc

# FR-scored reanalysis series with fit range 1 to 0.7 with granularity 0.1 
# Score type: full 
# ----- 
#   
#   Model types: 
#   
#   outcome   condition consistency  coverage complexity inus      score  tokens
# 1  R       A+B+F*g<->R   1.0000000 1.0000000          4 TRUE 1.00000000      2
# 3  R         A+B+F<->R   0.9285714 1.0000000          3 TRUE 0.63157895      2
# 4  R         A+B+g<->R   0.9285714 1.0000000          3 TRUE 0.63157895      2
# .
# .
# .

frsc2 <- frscored_cna(ct2df(selectCases("A+B+F*g<->R")), test.model = "A+f<->R")
frsc2

# FR-scored reanalysis series with fit range 1 to 0.7 with granularity 0.1 
# Score type: full 
# ----- 
 
# Candidate model tested: A+f<->R 
 
#    outcome condition consistency  coverage complexity inus      score tokens
# 50 R         A+f<->R   0.8333333 0.7692308          2 TRUE 0.05263158      1
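
The test.model behavior described above (report the candidate if it occurs in the results, stop otherwise) can be illustrated with a small base-R sketch. The data frame below copies a few rows from the outputs shown above; find_model_toy() is a hypothetical helper that matches on the exact model string, which is an assumption of this example and not necessarily how frscored_cna() compares models.

```r
# A few rows copied from the printed results above
results <- data.frame(
  condition = c("A+B+F*g<->R", "A+B+F<->R", "A+B+g<->R", "A+f<->R"),
  score     = c(1.00000000, 0.63157895, 0.63157895, 0.05263158),
  tokens    = c(2, 2, 2, 1)
)

# Hypothetical helper: return the row for a candidate model, or stop if absent.
find_model_toy <- function(results, test.model) {
  hit <- results[results$condition == test.model, , drop = FALSE]
  if (nrow(hit) == 0) {
    stop("test.model not found in the reanalysis series: ", test.model)
  }
  hit
}

find_model_toy(results, "A+f<->R")
```

A candidate with a low score, such as the one above, is recovered at some threshold setting but stands in few sub- or supermodel relations to the other models of the series.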