This dataset was used in the article "Exploring Supervised Learning Models for Multi-Label Text Classification in Brazilian Restaurant Reviews". It contains 4,000 manually annotated sentences extracted from user reviews of Brazilian restaurants posted on platforms such as Google Reviews, Facebook, and TripAdvisor.
Each sentence is labeled with one or more thematic categories from the following set: food, service, ambience, general, price, drink, location, and other. The labels are presented as comma-separated strings in the categoria column.
File format: CSV with two columns:
- sentenca: the sentence from the user review.
- categoria: one or more comma-separated thematic labels.

Use cases: multi-label classification, thematic categorization of user reviews, NLP model evaluation in Portuguese.
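The two-column layout described above can be converted into the binary indicator matrix that most multi-label classifiers expect. The following is a minimal sketch, not the article's pipeline: the sample rows are made up for illustration, and the presence of a header row named `sentenca,categoria` is an assumption about the file.

```python
import io
import pandas as pd

# Made-up rows for illustration only; they are NOT taken from the dataset.
# Assumption: the real file has a header row with these two column names.
sample_csv = io.StringIO(
    'sentenca,categoria\n'
    '"A comida estava ótima e o atendimento foi rápido.","food, service"\n'
    '"Preço justo.","price"\n'
)

df = pd.read_csv(sample_csv)

# Normalize optional whitespace around the commas, then build one 0/1
# column per label found in the categoria strings.
labels = df["categoria"].str.replace(r"\s*,\s*", ",", regex=True).str.strip()
indicators = labels.str.get_dummies(sep=",")

print(indicators)
```

To run this on the real data, replace `sample_csv` with the path of the downloaded file.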
Download: [download]
Article link: https://sol.sbc.org.br/index.php/eniac/article/view/25698
The Computer-BR dataset was used in the article "Comparing Approaches to Subjectivity Classification: A Study on Portuguese Tweets". It is shared online [download].
Article link: https://link.springer.com/chapter/10.1007/978-3-319-41552-9_8
This dataset is based on the article "The role of text pre-processing in opinion mining on a social media language dataset" and is shared online [download].
Article link: https://ieeexplore.ieee.org/abstract/document/6984806
Sentences from the Google Play Store in Portuguese, with negative and positive labels.
This dataset contains sentiment lexicons for the Portuguese language with 56,755 terms in the restaurant domain [download].
Each entry has the following form:
"excelente","0.9919043535940205","0.008095646405979519","positivo"
where the fields are:
| term | p_pos | p_neg | class |
|---|---|---|---|
| excelente | 0.991904 | 0.008096 | positivo |
| agradável | 0.971788 | 0.028212 | positivo |
| ruim | 0.326821 | 0.673179 | negativo |
import pandas as pd

path = 'lexicons-webmedia21.csv'
df = pd.read_csv(path)  # pass header=None if the file has no header row
df.head()
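Once loaded, the lexicon can serve as a simple lookup table for scoring text. The sketch below is only an illustration and is not the method from the cited paper: the averaging rule, the whitespace tokenization, the helper `score_sentence`, and the column names `term`, `p_pos`, `p_neg`, `class` are all assumptions, as is the header-less file layout. The three rows are the examples shown above.

```python
import io
import pandas as pd

# The three example entries shown above, in an assumed header-less layout.
sample_csv = io.StringIO(
    '"excelente","0.9919043535940205","0.008095646405979519","positivo"\n'
    '"agradável","0.971788","0.028212","positivo"\n'
    '"ruim","0.3268206840537858","0.6731793159462143","negativo"\n'
)

# Column names are assumed from the table above.
columns = ["term", "p_pos", "p_neg", "class"]
lexicon = pd.read_csv(
    sample_csv,
    header=None,
    names=columns,
    dtype={"p_pos": float, "p_neg": float},
)

# Map each term to a polarity in [-1, 1]: positive minus negative probability.
polarity = dict(zip(lexicon["term"], lexicon["p_pos"] - lexicon["p_neg"]))


def score_sentence(sentence, polarity):
    """Average polarity of the lexicon terms found in the sentence (hypothetical helper)."""
    tokens = sentence.lower().split()
    scores = [polarity[token] for token in tokens if token in polarity]
    return sum(scores) / len(scores) if scores else 0.0


print(score_sentence("comida excelente", polarity))
print(score_sentence("atendimento ruim", polarity))
```

A sentence with no lexicon terms scores 0.0 under this rule, so a real application would likely need better tokenization and handling of negation.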
Please cite the following if you use the data:
Tiago de Melo. Building a Restaurant-Specific Sentiment Lexicon via Probability Theory. In: Proceedings of the Brazilian Symposium on Multimedia and the Web (WebMedia). 2021. p. 129-132. [link]
This dataset contains sentiment lexicons for the Portuguese language with 32,009 terms in 10 product domains [download].
Each entry has the following form:
"laptops", "fácil", "0.9801980198019802", "0.0198019801980198", "positive"
where the fields are:
| domain | term | p_pos | p_neg | class |
|---|---|---|---|---|
| laptops | fácil | 0.9801980198019802 | 0.0198019801980198 | positive |
| pets | molegngo | 0.0 | 1.0 | negative |
| food | saboroso | 0.9769021739130436 | 0.02309782608695652 | positive |
import pandas as pd

path = 'sentiprodbr.csv'
df = pd.read_csv(path)  # pass header=None if the file has no header row
df.head()
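Because this lexicon spans several domains, it is often convenient to split it into one lookup table per domain. The sketch below assumes the file has no header row and that fields are separated by a comma followed by a space, as in the example entry above; in that case `skipinitialspace=True` is needed so that pandas still recognizes the quoted fields. The column names and the `by_domain` structure are illustrative assumptions.

```python
import io
import pandas as pd

# The three example entries shown above, with a space after each comma.
sample_csv = io.StringIO(
    '"laptops", "fácil", "0.9801980198019802", "0.0198019801980198", "positive"\n'
    '"pets", "molegngo", "0.0", "1.0", "negative"\n'
    '"food", "saboroso", "0.9769021739130436", "0.02309782608695652", "positive"\n'
)

# Column names are assumed from the table above.
columns = ["domain", "term", "p_pos", "p_neg", "class"]
lexicon = pd.read_csv(
    sample_csv,
    header=None,
    names=columns,
    skipinitialspace=True,  # drop the space that follows each comma
    dtype={"p_pos": float, "p_neg": float},
)

# One {term: p_pos} dictionary per product domain.
by_domain = {
    domain: dict(zip(group["term"], group["p_pos"]))
    for domain, group in lexicon.groupby("domain")
}

print(by_domain["food"])
```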
Please cite the following if you use the data:
Tiago de Melo. SentiProdBR: Building Domain-Specific Sentiment Lexicons for the Portuguese Language. In: Anais do XXXVI Simpósio Brasileiro de Bancos de Dados. 2021. p. XXX-YYY.