R/stem_modified_hunspell.R
stem_modified_hunspell.Rd
This function uses the Hunspell stemmer to stem a vector of words. It uses the Brazilian Portuguese dictionary by default and, unlike hunspell::hunspell_stem, returns only one stem per word.
stem_modified_hunspell(words, complete = TRUE)
Argument | Description |
---|---|
words | character vector of words to be stemmed |
complete | whether the stemmed words should be completed or not (default: TRUE) |
Because hunspell_stem can return a list of stems for each word, the function keeps, for each word, the stem that appears most often in the vector.
It then applies the rslp stemmer to the Hunspell-stemmed result.
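The "most frequent stem" selection can be illustrated with a minimal sketch. It is not the package's actual code: the `candidates` list and the `pick_one` helper are hypothetical, and the candidate stems are made-up values shaped like the list that hunspell::hunspell_stem returns, so the sketch runs without a dictionary installed.

```r
# Hypothetical candidate stems, one character vector per word, shaped like
# the list returned by hunspell::hunspell_stem(). The values are made up
# for illustration and are not real Hunspell output.
candidates <- list(
  gosto    = c("gosto", "gostar"),
  gostou   = c("gostar"),
  gostaram = c("gostar")
)

# Count how often each candidate stem occurs across the whole vector.
stem_counts <- table(unlist(candidates, use.names = FALSE))

# For each word, keep the single candidate with the highest overall count.
# which.max() returns the first maximum, so ties go to the earliest candidate.
# (Assumption: every word has at least one candidate stem.)
pick_one <- function(stems) {
  stems[which.max(as.integer(stem_counts[stems]))]
}

chosen <- vapply(candidates, pick_one, character(1))
print(chosen)
```

Here "gosto" has two candidates, and the one shared with the other words wins, so every word ends up with a single stem. How the real function breaks ties or handles words with no stem is not stated in this page.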
words <- c("balões", "aviões", "avião", "gostou", "gosto", "gostaram")
ptstem:::stem_modified_hunspell(words)
#> [1] "balões" "aviões" "aviões" "gostou" "gostou" "gostou"