Stem Words

Stem a character vector of words using the selected algorithm.

ptstem_words(words, algorithm = "rslp", complete = T, ...)

ptstem(texts, algorithm = "rslp", n_char = 3, complete = T,
  ignore = NULL, ...)

Arguments

words, texts	character vector of words (`ptstem_words`) or of texts (`ptstem`).
algorithm	string with the name of the algorithm to use. One of `"hunspell"`, `"rslp"`, `"porter"` or `"modified-hunspell"`.
complete	whether to complete stems, i.e. replace every word that shares a stem with the word that appears most often with that stem.
...	other arguments passed to the algorithms.
n_char	minimum number of characters a word must have to be stemmed; shorter words are left unchanged. Not used by `ptstem_words`.
ignore	vector of words and regular expressions to ignore. Plain words are wrapped in `stringr::fixed()` so that, for example, ignoring 'ana' does not also exclude 'banana'. An element is treated as a regex when it contains at least one punctuation character.
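To make the interplay of `n_char` and `ignore` concrete, here is a minimal base-R sketch of how the set of words to stem could be chosen. The helper `should_stem()` is hypothetical and not exported by ptstem. It assumes that plain words match only whole words (approximated with `%in%` rather than `stringr::fixed()`), that any element containing punctuation is a regex matched anywhere in the word with `grepl()`, and that words with at least `n_char` characters are stemmed. The package's actual matching may differ in detail.

```r
# Hypothetical helper, NOT part of ptstem: returns TRUE for the words that
# would be passed to the stemmer under the `n_char` / `ignore` rules above.
should_stem <- function(words, n_char = 3, ignore = NULL) {
  # words shorter than n_char are never stemmed
  keep <- nchar(words) >= n_char
  if (length(ignore) > 0) {
    # assumption: an element containing punctuation is treated as a regex
    is_regex <- grepl("[[:punct:]]", ignore)
    plain_words <- ignore[!is_regex]
    patterns <- ignore[is_regex]
    # plain words only exclude exact matches, so "ana" leaves "banana" alone
    ignored <- words %in% plain_words
    # regexes exclude any word they match anywhere
    for (p in patterns) {
      ignored <- ignored | grepl(p, words)
    }
    keep <- keep & !ignored
  }
  keep
}

words <- c("banana", "ana", "aviao", "gostou", "em")
should_stem(words, n_char = 3, ignore = c("ana", "av.*"))
#> [1]  TRUE FALSE FALSE  TRUE FALSE
```

Here the plain word "ana" excludes only itself, so "banana" is still stemmed, while the regex "av.*" excludes "aviao" and "em" is skipped for being shorter than `n_char`.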

Details

You can choose whether to complete stems with the `complete` argument. By default, all algorithms complete stems. For hunspell, prefer `complete = TRUE`, because it completes words even when `complete = FALSE` is set.

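Completion picks, for each stem, the word that appears most often with that stem in the full corpus. Because the result depends on the whole corpus, completion should not be used when stemming in parallel: separate chunks may complete the same stem to different words.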
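The completion step can be sketched in base R as below, which also shows why it depends on the whole corpus. `complete_stems()` is a hypothetical helper, not ptstem's internal code, and the stems are assumed for illustration rather than produced by any of the algorithms. Ties are broken by taking the first word in sorted order, which may not match what the package does.

```r
# Hypothetical sketch, NOT ptstem's implementation: replace every word by the
# word that appears most often with the same stem.
complete_stems <- function(words, stems) {
  # counts[s, w] = how often word w occurs with stem s
  counts <- table(stems, words)
  # for each stem, the most frequent word (which.max keeps the first on ties)
  best <- apply(counts, 1, function(row) names(row)[which.max(row)])
  # look up the completed word for each input stem
  unname(best[stems])
}

# stems below are assumed for illustration
words <- c("gostou", "gosto", "gostou", "gostaram", "doces", "doce", "doces")
stems <- c("gost", "gost", "gost", "gost", "doc", "doc", "doc")
complete_stems(words, stems)
# four times "gostou", then three times "doces"

# Why completion and parallel stemming do not mix:
chunk_a <- c("gosto", "gosto", "gostou")
chunk_b <- c("gostou", "gostou", "gostaram")
stems_a <- rep("gost", length(chunk_a))
stems_b <- rep("gost", length(chunk_b))
complete_stems(chunk_a, stems_a)                             # all "gosto"
complete_stems(chunk_b, stems_b)                             # all "gostou"
complete_stems(c(chunk_a, chunk_b), c(stems_a, stems_b))     # all "gostou"
```

Chunk A on its own completes "gost" to "gosto", while the full corpus completes it to "gostou", so stemming the chunks independently would give inconsistent results.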

Examples

library(ptstem)

words <- c("balões", "aviões", "avião", "gostou", "gosto", "gostaram")
ptstem_words(words, "hunspell")
#> [1] "balões" "aviões" "aviões" "gostou" "gostou" "gostou"
ptstem_words(words)
#> [1] "balões" "aviões" "aviões" "gostou" "gostou" "gostou"
ptstem_words(words, algorithm = "porter", complete = FALSE)
#> [1] "balõ" "aviõ" "aviã" "gost" "gost" "gost"

texts <- c("coma frutas pois elas fazem bem para a saúde.",
           "não coma doces, eles fazem mal para os dentes.")
ptstem(texts, "hunspell")
#> [1] "coma frutas pois elas fazem bem para a saúde." 
#> [2] "não coma doces, elas fazem mal para os dentes."
ptstem(texts, n_char = 5)
#> [1] "coma frutas pois elas fazem bem para a saúde." 
#> [2] "não coma doces, eles fazem mal para os dentes."
ptstem(texts, "porter", n_char = 4, complete = FALSE)
#> [1] "com frut pois elas faz bem par a saúd."
#> [2] "não com doc, eles faz mal par os dent."
ptstem(words, ignore = "av.*") # words starting with "av" are not stemmed
#> [1] "balões" "aviões" "avião"  "gostou" "gostou" "gostou"
