Stem a character vector of words using the selected algorithm.

ptstem_words(words, algorithm = "rslp", complete = T, ...)

ptstem(texts, algorithm = "rslp", n_char = 3, complete = T,
  ignore = NULL, ...)

Arguments

words, texts

character vector of words (for ptstem_words) or of texts (for ptstem).

algorithm

string with the name of the algorithm to be used. One of "hunspell", "rslp", "porter" and "modified-hunspell".

complete

whether to complete words or not, i.e. replace every word that shares a stem with the word that appears most often for that stem.

...

other arguments passed to the algorithms.

n_char

minimum number of characters of words to be stemmed. Not used by ptstem_words.

ignore

vector of words and regexes to ignore. Words are wrapped in stringr::fixed() so that words like 'banana' don't get excluded when you ignore 'ana'. Elements are treated as a regex when they contain at least one punctuation symbol.
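
The rule for ignore can be illustrated with a small base R sketch. This is not the package's implementation: ignored_sketch is a hypothetical helper, plain words are assumed to match only identical words, and regexes are assumed to be matched against the whole word.

```r
# Hypothetical helper (not part of ptstem): flag the words that an `ignore`
# vector would exclude from stemming, under the assumptions stated above.
ignored_sketch <- function(words, ignore) {
  # Entries with at least one punctuation symbol are treated as regexes.
  is_regex <- grepl("[[:punct:]]", ignore)
  # Assumption: plain entries only match identical words.
  hit <- words %in% ignore[!is_regex]
  # Assumption: each regex has to match the whole word.
  for (pattern in ignore[is_regex]) {
    hit <- hit | grepl(paste0("^(?:", pattern, ")$"), words, perl = TRUE)
  }
  hit
}

sample_words <- c("banana", "ana", "aviao", "gostou")
ignored <- ignored_sketch(sample_words, c("ana", "av.*"))
print(ignored)
```

Here 'ana' is flagged but 'banana' is not, and 'aviao' is flagged through the regex "av.*".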

Details

You can choose whether to complete words or not using the complete argument. By default, all algorithms complete stems. For hunspell, it's better to always use complete = TRUE, since it completes words even when complete = FALSE.

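Completion picks, for each stem, the word that appears the most in the full corpus. That's why it should not be used when you are stemming in parallel: each worker sees only part of the corpus and may pick a different word for the same stem.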
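
A minimal base R sketch of the completion step may make this clearer. It is an illustration under stated assumptions, not the package's code: complete_sketch is a hypothetical helper, and the stems are supplied by hand instead of coming from one of the algorithms.

```r
# Hypothetical helper (not part of ptstem): replace each word with the most
# frequent word that shares its stem.
complete_sketch <- function(words, stems) {
  stopifnot(length(words) == length(stems))
  groups <- split(words, stems)
  # For every stem, keep the word with the highest count
  # (ties resolve to the alphabetically first word).
  representative <- vapply(
    groups,
    function(w) names(which.max(table(w))),
    character(1)
  )
  unname(representative[stems])
}

words <- c("gosto", "gosto", "gostou", "gostou", "gostou")
stems <- rep("gost", 5)  # assumed stems, written by hand

# Completing over the full corpus: "gostou" (3 occurrences) wins.
full <- complete_sketch(words, stems)

# Completing two chunks separately, as parallel workers would:
chunk1 <- complete_sketch(words[1:3], stems[1:3])  # "gosto" wins here
chunk2 <- complete_sketch(words[4:5], stems[4:5])  # "gostou" wins here
print(full)
print(c(chunk1, chunk2))
```

The chunked result mixes "gosto" and "gostou" for the same stem, which is the inconsistency that appears when completion runs on separate parts of a corpus.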

Examples

words <- c("balões", "aviões", "avião", "gostou", "gosto", "gostaram")
ptstem_words(words, "hunspell")
#> [1] "balões" "aviões" "aviões" "gostou" "gostou" "gostou"
ptstem_words(words)
#> [1] "balões" "aviões" "aviões" "gostou" "gostou" "gostou"
ptstem_words(words, algorithm = "porter", complete = FALSE)
#> [1] "balõ" "aviõ" "aviã" "gost" "gost" "gost"
texts <- c("coma frutas pois elas fazem bem para a saúde.",
           "não coma doces, eles fazem mal para os dentes.")
ptstem(texts, "hunspell")
#> [1] "coma frutas pois elas fazem bem para a saúde."
#> [2] "não coma doces, elas fazem mal para os dentes."
ptstem(texts, n_char = 5)
#> [1] "coma frutas pois elas fazem bem para a saúde."
#> [2] "não coma doces, eles fazem mal para os dentes."
ptstem(texts, "porter", n_char = 4, complete = FALSE)
#> [1] "com frut pois elas faz bem par a saúd."
#> [2] "não com doc, eles faz mal par os dent."
ptstem(words, ignore = "av.*") # words starting with "av" are not stemmed
#> [1] "balões" "aviões" "avião" "gostou" "gostou" "gostou"