Calculates the proportion of unrelated words (words that belong to different groups) that were combined into the same stem.

overstemming_index(words, stems)

Arguments

words

is a data.frame containing a column word and a column group, so the function can identify groups of words.

stems

is a character vector with the stemming result for each word.
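
Example

The following is a minimal sketch of the expected input shapes, not the package's implementation. The sample words, groups and stems are made up, and pairwise_overstemming is a hypothetical helper that uses one possible reading of "proportion of unrelated words that were combined": the share of word pairs from different groups that received the same stem. The real overstemming_index may define the proportion differently.

```r
# Made-up input in the shape described under Arguments:
# one row per word, with the group the word belongs to.
words <- data.frame(
  word  = c("general", "generally", "generate", "generation"),
  group = c("general", "general", "generate", "generate"),
  stringsAsFactors = FALSE
)

# Hypothetical stemmer output, one stem per row of `words`.
stems <- c("gener", "gener", "gener", "generat")

# Hypothetical helper (an assumption, not the package code):
# among all pairs of words from different groups, the fraction
# whose two words were reduced to the same stem.
pairwise_overstemming <- function(words, stems) {
  stopifnot(length(stems) == nrow(words))
  pairs <- utils::combn(nrow(words), 2)  # 2 x n_pairs matrix of row indices
  different_group <- words$group[pairs[1, ]] != words$group[pairs[2, ]]
  same_stem <- stems[pairs[1, ]] == stems[pairs[2, ]]
  if (!any(different_group)) return(NA_real_)  # no unrelated pairs to judge
  sum(different_group & same_stem) / sum(different_group)
}

result <- pairwise_overstemming(words, stems)
print(result)
```

With the package loaded, the same two objects can be passed directly as overstemming_index(words, stems).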