simi.sampler: a similarity-based sampling algorithm (moleculaR)

Resizes (under-samples) classification-oriented data sets. Returns the row numbers of the sampled observations.
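This page does not say how similarity is measured or how rows are chosen. Purely as an illustration, here is one way similarity-based under-sampling could work. It assumes numeric feature columns plus a column named `class`, defines similarity as 1 / (1 + Euclidean distance) to the centroid of the comparison class, and keeps rows spread evenly across the similarity ranking. `toy_simi_sampler` is a hypothetical name; this is not moleculaR's implementation.

```r
# Hypothetical sketch of similarity-based under-sampling; NOT moleculaR's code.
# Assumptions: numeric feature columns, a column named `class`,
# similarity = 1 / (1 + Euclidean distance to a class centroid).
toy_simi_sampler <- function(data, class, compare.with = 0,
                             sample.size = min(summary(as.factor(data$class)))) {
  features <- data[, setdiff(names(data), "class"), drop = FALSE]
  rows <- which(data$class == class)
  # compare.with = 0 means "compare each sample with its own class"
  ref_class <- if (compare.with == 0) class else compare.with
  centroid <- colMeans(features[data$class == ref_class, , drop = FALSE])
  # Euclidean distance of each row of the class of interest to the centroid
  centred <- sweep(as.matrix(features[rows, , drop = FALSE]), 2, centroid)
  simi <- 1 / (1 + sqrt(rowSums(centred^2)))
  # Rank rows from least to most similar, then take evenly spaced positions
  ordered <- rows[order(simi)]
  picks <- unique(round(seq(1, length(ordered), length.out = sample.size)))
  ordered[picks]
}

# Toy data: 10 rows of class 1, 4 rows of class 2
set.seed(1)
df <- data.frame(x1 = c(rnorm(10), rnorm(4, mean = 5)),
                 x2 = c(rnorm(10), rnorm(4, mean = 5)),
                 class = c(rep(1, 10), rep(2, 4)))
idx <- toy_simi_sampler(df, class = 1)                     # similarity to own class
idx2 <- toy_simi_sampler(df, class = 1, compare.with = 2)  # similarity to class 2
```

Spreading the picks across the ranking keeps both typical and atypical samples, rather than only the most similar ones; whether moleculaR makes the same choice is not stated here.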

Usage

simi.sampler(
  data,
  class,
  compare.with = 0,
  plot = F,
  sample.size = min(summary(as.factor(data$class)))
)

Arguments

data: a data frame containing a column named class.
class: the number identifying the class of interest.
compare.with: the class against which similarity is computed. The default, 0, computes the similarity of each sample to its own class; any other number computes similarity to the class represented by that number.
plot: if TRUE, plot the similarity before and after sampling.
sample.size: the number of observations to sample. Defaults to the number of samples in the smallest class.
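The default for sample.size is the expression `min(summary(as.factor(data$class)))`, which reads the column literally named `class` (not the `class` argument) and returns the size of the smallest class. A small worked example with made-up toy data:

```r
# Toy data (illustrative only): class 1 has 5 rows, class 2 has 3 rows
data <- data.frame(value = 1:8, class = c(1, 1, 1, 1, 1, 2, 2, 2))

# Same expression as the documented default for sample.size
counts <- summary(as.factor(data$class))  # named counts: "1" = 5, "2" = 3
default_size <- min(counts)               # 3, the size of the smallest class
```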

Value

A vector of the row numbers of the sampled observations.
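Because the return value is a vector of row numbers, it can be used directly to index the original data frame. The sketch below assembles a balanced data set by combining the sampled rows of class 1 with all rows of the other class. To keep it self-contained, the row numbers are hard-coded placeholders standing in for the output of `simi.sampler(data, class = 1)`; they are hypothetical, not real output.

```r
# Toy data (illustrative only): class 1 has 5 rows, class 2 has 3 rows
data <- data.frame(value = 1:8, class = c(1, 1, 1, 1, 1, 2, 2, 2))

# Hypothetical placeholder for: sampled_rows <- simi.sampler(data, class = 1)
sampled_rows <- c(1, 3, 5)

# Keep the sampled class-1 rows plus every row from the other class
balanced <- rbind(data[sampled_rows, ], data[data$class != 1, ])
class_counts <- table(balanced$class)  # 3 rows of each class
```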