Apply the Stemming Algorithm for the Portuguese Language to a vector of documents. Words are extracted using the regex "\b[:alpha:]\b".
rslp_doc(
  docs,
  steprules = readRDS(system.file("steprules.rds", package = "rslp"))
)
Argument | Description |
---|---|
docs | Character vector of documents. |
steprules | Step rules, as obtained from the function extract_rules. Only define this if you are certain about it. The default is the parsed version of the rules installed with the package. |
V. Orengo, C. Huyck, "A Stemming Algorithm for the Portuguese Language", String Processing and Information Retrieval, International Symposium on (SPIRE), 2001, pp. 0186, doi:10.1109/SPIRE.2001.10024
docs <- c("coma frutas pois elas fazem bem para a saúde.",
          "não coma doces, eles fazem mal para os dentes.")
rslp_doc(docs)
#> [1] "com frut poi ela faz bem par a saud."
#> [2] "nao com doc, ele faz mal par os dent."
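The description says that words are pulled out of each document with a regex and stemmed, while the output above shows punctuation and spacing left in place. The following is a minimal base-R sketch of that general pattern, not the package's implementation. The names `toy_stem` and `stem_docs` are hypothetical, `toy_stem` only strips a trailing "s" and does not apply the Orengo and Huyck rules, and the pattern `[[:alpha:]]+` is an assumption chosen for the sketch rather than the regex quoted in the description.

```r
# Hypothetical stand-in for a per-word stemmer: drops a trailing "s".
# This is NOT the Portuguese stemming rule set used by rslp.
toy_stem <- function(words) sub("s$", "", words)

# Apply a word-level function to every alphabetic token of each document,
# leaving punctuation and whitespace untouched.
stem_docs <- function(docs, stem_fun = toy_stem) {
  # Locate all runs of alphabetic characters in each document
  matches <- gregexpr("[[:alpha:]]+", docs)
  # Replace each located word with its stemmed form, in place
  regmatches(docs, matches) <- lapply(regmatches(docs, matches), stem_fun)
  docs
}

docs <- c("coma frutas pois elas fazem bem.",
          "coma doces, eles fazem mal.")
stem_docs(docs)
```

Because the replacement goes through `regmatches<-`, only the matched words change, which is why commas and full stops survive in the result. Swapping `stem_fun` for a real stemmer would keep the same structure.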