The BayesCVI package computes the Bayesian cluster validity index (BCVI), introduced in Wiroonsri and Preedasawakul (2024), and generates plots of it with and without error bars. BCVI can be built on any of the underlying cluster validity indices (CVIs) listed below, and users may also supply any other CVI of their choice. The package is compatible with K-means, fuzzy C-means, EM clustering, and hierarchical clustering (single, average, and complete linkage).
BayesCVI requires four packages: e1071 and mclust for performing the fuzzy C-means (FCM) and EM algorithms, respectively; ggplot2 for plotting BCVI and existing CVIs; and UniversalCVI for some required datasets.
In addition to the evaluation tools, the BayesCVI package includes 7 simulated datasets initially used to test BCVI from several perspectives in Wiroonsri and Preedasawakul (2024).
The underlying CVIs available in this package are listed as follows:
Hard clustering:
Dunn’s index, Calinski–Harabasz index, Davies–Bouldin’s index, Point biserial correlation index, Chou-Su-Lai measure, Davies–Bouldin*’s index, Score function, Starczewski index, Pakhira–Bandyopadhyay–Maulik (for crisp clustering) index, and Wiroonsri index.
Fuzzy clustering:
Xie–Beni index, KWON index, KWON2 index, TANG index, HF index, Wu–Li index, Pakhira–Bandyopadhyay–Maulik (for fuzzy clustering) index, KPBM index, Correlation Cluster Validity index, Generalized C index, and Wiroonsri and Preedasawakul index.
Remark
Though BCVI is compatible with any existing underlying CVI, we recommend using either WI or WP as the underlying CVI. BCVI is effective only when the underlying index provides meaningful options for ranking local peaks as candidates for the final number of clusters, and this has only been tested with the WI and WP indices.
If you have not already installed mclust, e1071, ggplot2 and UniversalCVI on your local system, install these packages as follows:
install.packages(c('e1071','mclust','ggplot2','UniversalCVI'))
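To avoid reinstalling packages that are already present, the check can be scripted. The following is a minimal sketch in base R; `missing_pkgs` and `to.install` are hypothetical names introduced here for illustration and are not part of BayesCVI.

```r
pkgs <- c("e1071", "mclust", "ggplot2", "UniversalCVI")

# Hypothetical helper (not part of BayesCVI): return the names of the
# packages in `pkgs` whose namespace cannot be loaded.
missing_pkgs <- function(pkgs) {
  ok <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!ok]
}

to.install <- missing_pkgs(pkgs)

# Uncomment to install only what is missing:
# if (length(to.install) > 0) install.packages(to.install)
```

The install call is left commented out so that running the snippet only reports what is missing.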
Install the BayesCVI package:
install.packages('BayesCVI')
suppressPackageStartupMessages({
library(BayesCVI)
library(UniversalCVI)
library(e1071)
library(mclust)
library(ggplot2)
})
Use B_Wvalid to compute BCVI with WI as the underlying CVI for clustering results from 2 to 10 groups:
library(BayesCVI)
# The data included in this package.
data = B2_data[,1:2]
# alpha
aalpha = c(5,5,5,20,20,20,0.5,0.5,0.5)
B.WI = B_Wvalid(x = scale(data), kmax = 10, method = "kmeans", corr = "pearson", nstart = 100, sampling = 1, NCstart = TRUE, alpha = aalpha, mult.alpha = 1/2)
# plot the BCVI
pplot = plot_BCVI(B.WI)
pplot$plot_index
pplot$plot_BCVI
pplot$error_bar_plot
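In the examples of this README, the alpha vector always has kmax - 1 entries (9 entries for kmax = 10, 7 for kmax = 8), which suggests one prior weight per candidate number of clusters. The sketch below checks that length before calling a BCVI function. `make_alpha` is a hypothetical helper, not part of BayesCVI, and the assumption that the entries are ordered k = 2, ..., kmax is ours.

```r
# Hypothetical helper (not part of BayesCVI): validate a prior vector and
# label each entry with the candidate k it is assumed to refer to.
make_alpha <- function(kmax, weights) {
  n.k <- kmax - 1                      # candidates k = 2, ..., kmax
  if (length(weights) != n.k) {
    stop("alpha needs ", n.k, " entries for kmax = ", kmax,
         ", got ", length(weights))
  }
  if (any(weights <= 0)) stop("alpha entries must be positive")
  names(weights) <- paste0("k=", 2:kmax)
  weights
}

aalpha <- make_alpha(10, c(5, 5, 5, 20, 20, 20, 0.5, 0.5, 0.5))
aalpha
```

Under that ordering assumption, the vector used above places the largest prior weight on k = 5, 6 and 7. Pass `unname(aalpha)` if a function is sensitive to names.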
Use B_WP.IDX to compute BCVI with WP as the underlying CVI for clustering results from 2 to 10 groups:
library(BayesCVI)
# The data included in this package.
data = B7_data[,1:2]
# alpha
aalpha = c(20,20,20,5,5,5,0.5,0.5,0.5)
B.WP = B_WP.IDX(x = scale(data), kmax = 10, corr = "pearson", method = "FCM",
                fzm = 2, sampling = 1, iter = 100, nstart = 20, NCstart = TRUE,
                alpha = aalpha, mult.alpha = 1/2)
# plot the BCVI
pplot = plot_BCVI(B.WP)
pplot$plot_index
pplot$plot_BCVI
pplot$error_bar_plot
Use BayesCVIs to compute BCVI with any selected underlying CVI for clustering results from 2 to 10 groups:
library(UniversalCVI)
library(BayesCVI)
data = R1_data[,-3]
# Compute WP index by WP.IDX using default gamma
FCM.WP = WP.IDX(scale(data), cmax = 10, cmin = 2, corr = 'pearson', method = 'FCM', fzm = 2,
                iter = 100, nstart = 20, NCstart = TRUE)
# WP.IDX values
result = FCM.WP$WP$WPI
aalpha = c(20,20,20,5,5,5,0.5,0.5,0.5)
B.WP = BayesCVIs(CVI = result,
                 n = nrow(data),
                 kmax = 10,
                 opt.pt = "max",
                 alpha = aalpha,
                 mult.alpha = 1/2)
# plot the BCVI
pplot = plot_BCVI(B.WP)
pplot$plot_index
pplot$plot_BCVI
pplot$error_bar_plot
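To give a feel for how a vector of CVI values and a prior alpha might combine, here is a rough base R sketch. It is not the package's implementation. It assumes a standard Dirichlet posterior: CVI values are rescaled to sum to one, treated as n pseudo-observations, and added to the prior. It also assumes that larger CVI values are better (as with opt.pt = "max") and that mult.alpha is a power s such that alpha is multiplied by n^s. The function `bcvi_sketch` and the toy CVI values are hypothetical.

```r
# Hypothetical sketch (not part of BayesCVI): Dirichlet posterior mean and
# standard deviation from rescaled CVI values, larger CVI assumed better.
bcvi_sketch <- function(cvi, n, alpha, mult.alpha = 1/2) {
  stopifnot(length(cvi) == length(alpha), n > 0, diff(range(cvi)) > 0)
  a    <- alpha * n^mult.alpha                   # assumed prior scaling
  r    <- (cvi - min(cvi)) / sum(cvi - min(cvi)) # rescaled CVI, sums to 1
  a0   <- sum(a)
  post <- a + n * r                              # posterior Dirichlet parameters
  data.frame(
    k    = seq_along(cvi) + 1,                   # candidates start at k = 2
    BCVI = post / (a0 + n),                      # posterior mean
    SD   = sqrt(post * (a0 + n - post) / ((a0 + n)^2 * (a0 + n + 1)))
  )
}

# Toy CVI values for k = 2, ..., 10 (made up for illustration)
toy.cvi <- c(0.20, 0.55, 0.30, 0.60, 0.25, 0.10, 0.05, 0.02, 0.01)
aalpha  <- c(20, 20, 20, 5, 5, 5, 0.5, 0.5, 0.5)
out <- bcvi_sketch(toy.cvi, n = 300, alpha = aalpha)
out$k[which.max(out$BCVI)]                       # k with the largest posterior mean
```

With this construction the posterior means sum to one, so they can be read as relative support for each candidate k. For real analyses use BayesCVIs, whose details may differ from this sketch.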
Use B_WP.IDX to compute BCVI with WP as the underlying CVI for clustering the pixels of an MRI image into 2 to 8 groups:
library(UniversalCVI)
library(BayesCVI)
library(imager)
# Download MRI data from https://www.kaggle.com/datasets/navoneel/brain-mri-images-for-brain-tumor-detection
x = "https://storage.googleapis.com/kagglesdsdata/datasets/165566/377107/yes/Y164.JPG?X-Goog-Algorithm=GOOG4-RSA-SHA256&X-Goog-Credential=databundle-worker-v2%40kaggle-161607.iam.gserviceaccount.com%2F20240218%2Fauto%2Fstorage%2Fgoog4_request&X-Goog-Date=20240218T124934Z&X-Goog-Expires=345600&X-Goog-SignedHeaders=host&X-Goog-Signature=269c3888a6cdc0cb4e9ea127d1e7bef2ecd798260c164acaec727c9cfa19a77428ac3ef792f0267129f20be3a2b8c8ff782f12701a7bd34b1fe7c228f517875906c2e5589c026ed89f2d474e0c3929743a644cdcccbc9567e32c8ee872d03cd77d9d38f4309dd2e5341dc32b04eaae63471d0763e85c4dab7104d0729495c15cc7b983406c4708b65ffc1ffff67ada77bab961cce25ffb4de4a349c81d6dbb35a5e495f8fad105ea3a2478826a70568f09a1cffa8935e29f90ae3be451bc3a2f53f4ac46d6510fc829c5db15d37ba1cb654ec3ab1544e95e451d35689252ee84096bfbd92afdd1afe7243d4555894bfcf7e5f382323f7052a7a98e1548c07955"
download.file(x,'y.jpg', mode = 'wb')
IMG1 <- load.image("y.jpg")
IMG.dat = data.frame()
IMG.dat[1,"NAME"] = paste0("IMG",1)
IMG.dat[1,"DIM1"] = dim(IMG1)[1]
IMG.dat[1,"DIM2"] = dim(IMG1)[2]
IMG.dat[1,"DIM3"] = dim(IMG1)[3]
# convert to RGB
img.rgb = data.frame(
  x = rep(1:IMG.dat[1,"DIM2"], each = IMG.dat[1,"DIM1"]),
  y = rep(IMG.dat[1,"DIM1"]:1, IMG.dat[1,"DIM2"]),
  R = as.vector(get(paste0(IMG.dat[1,"NAME"]))[,,1]),
  G = as.vector(get(paste0(IMG.dat[1,"NAME"]))[,,2]),
  B = as.vector(get(paste0(IMG.dat[1,"NAME"]))[,,3]))
IMG1.RGB = img.rgb
aalpha = c(25,25,2,2,0.5,0.5,0.5)
# use sampling in function to reduce MRI image size
WP.MRI = B_WP.IDX(x = IMG1.RGB[, c("R", "G", "B")], kmax = 8, corr = "pearson", method = "FCM", fzm = 2, sampling = 0.3, iter = 100,
                  nstart = 20, NCstart = TRUE, alpha = aalpha, mult.alpha = 1/2)
pp = plot_BCVI(WP.MRI)
pp$plot_index
pp$plot_BCVI
pp$error_bar_plot
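The image example flattens each colour channel into one column, so that every pixel becomes one row of the clustering input. The sketch below shows that step on a small random array in base R, without imager. The array `toy` is a made-up stand-in for an image, and its (width, height, frame, channel) layout is an assumption chosen for illustration.

```r
# Toy stand-in for an image: 4 x 3 pixels, 1 frame, 3 colour channels.
set.seed(1)
toy <- array(runif(4 * 3 * 1 * 3), dim = c(4, 3, 1, 3))

# One row per pixel, one column per channel; as.vector() reads each
# channel slice in column-major order, so rows line up across channels.
rgb.df <- data.frame(
  R = as.vector(toy[, , 1, 1]),
  G = as.vector(toy[, , 1, 2]),
  B = as.vector(toy[, , 1, 3]))

dim(rgb.df)
```

A full-size image yields one row per pixel in the same way, which is why the example above passes sampling = 0.3 to reduce the amount of data that is clustered.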
The BayesCVI package as a whole is distributed under GPL (>= 3).