This function computes the variation of information (VI) between the partitions given by new_classification and old_classification. It also computes the VI between old_classification and each of a set of random partitions, obtained by reshuffling the labels in old_classification, and builds the distribution of these VI values. Finally, the VI of the unsupervised cluster analysis is compared with this distribution to obtain an empirical p value.
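For reference, the VI between two partitions of the same n items is commonly defined from their entropies and mutual information. The notation below (C and C' for the two partitions, n_k and n'_{k'} for cluster sizes, n_{kk'} for the number of items shared by cluster k of C and cluster k' of C') is introduced here for illustration and is not taken from the package:

```latex
\mathrm{VI}(C, C') = H(C) + H(C') - 2\, I(C; C'),
\qquad
H(C) = -\sum_{k} \frac{n_k}{n} \log \frac{n_k}{n},
\qquad
I(C; C') = \sum_{k, k'} \frac{n_{kk'}}{n} \log \frac{n \, n_{kk'}}{n_k \, n'_{k'}}
```

VI is zero exactly when the two partitions coincide up to a renaming of the labels, and it grows as the partitions become more different.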

vi_comparison(old_classification, new_classification, number_iter)

Arguments

old_classification

Character vector. First column of the dataframe returned by function clustering_angular_distance (first element of the output).

new_classification

Character vector. Second column of the dataframe returned by function clustering_angular_distance (first element of the output).

number_iter

Integer value. Number of random partitions to generate (by reshuffling the labels in old_classification).

Value

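Empirical p value of the VI between new_classification and old_classification, relative to the distribution of VI values obtained from the random partitions.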
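The procedure described above can be illustrated with a minimal, self-contained Python sketch. It is not the package's implementation: the helper names variation_of_information and vi_empirical_p_value, the seed argument, and the use of natural logarithms are assumptions made for illustration. In particular, the sketch assumes that the p value is the fraction of random partitions whose VI to old_classification is less than or equal to the observed VI (a lower VI means a more similar partition); the package may use a different convention.

```python
import math
import random
from collections import Counter


def variation_of_information(labels_a, labels_b):
    """Return VI(A, B) = H(A|B) + H(B|A) in nats for two label vectors."""
    n = len(labels_a)
    if n != len(labels_b):
        raise ValueError("label vectors must have the same length")
    count_a = Counter(labels_a)
    count_b = Counter(labels_b)
    count_ab = Counter(zip(labels_a, labels_b))
    vi = 0.0
    for (a, b), n_ab in count_ab.items():
        p_ab = n_ab / n
        # Each joint cell contributes -p_ab * [log(p_ab/p_a) + log(p_ab/p_b)].
        vi -= p_ab * (math.log(n_ab / count_a[a]) + math.log(n_ab / count_b[b]))
    return vi


def vi_empirical_p_value(old_classification, new_classification,
                         number_iter, seed=None):
    """Hypothetical sketch of the permutation test described in the text."""
    rng = random.Random(seed)
    observed = variation_of_information(old_classification, new_classification)
    shuffled = list(old_classification)
    hits = 0
    for _ in range(number_iter):
        # Random partition: reshuffle the original labels.
        rng.shuffle(shuffled)
        random_vi = variation_of_information(old_classification, shuffled)
        # Assumed convention: count random partitions at least as close
        # to old_classification as the observed one.
        if random_vi <= observed:
            hits += 1
    return hits / number_iter
```

Under this assumed convention, a small p value indicates that the new partition is closer to old_classification than reshuffled labels typically are. Larger values of number_iter give a finer resolution for the p value, at a proportionally higher cost.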

Author

Gabriele Lubatti gabriele.lubatti@helmholtz-muenchen.de