Description

Polarity sentences (Portuguese language)

Computer-BR dataset, used in the article "Comparing Approaches to Subjectivity Classification: A Study on Portuguese Tweets" and shared online [download].

Article link: https://link.springer.com/chapter/10.1007/978-3-319-41552-9_8

Dataset based on the article "The role of text pre-processing in opinion mining on a social media language dataset", shared online [download].

Article link: https://ieeexplore.ieee.org/abstract/document/6984806

Sentences from the Google Play Store in Portuguese, with negative and positive labels.

Sentiment Lexicons (Portuguese language)

This dataset contains a restaurant-specific sentiment lexicon for the Portuguese language with 56,755 terms [download].

Sample sentiment lexicon

"excelente","0.9919043535940205","0.008095646405979519","positivo"

where the fields are, in order: term, p_pos, p_neg, and class.

Example

term       p_pos     p_neg     class
excelente  0.991904  0.008096  positivo
agradável  0.971788  0.028212  positivo
ruim       0.326821  0.673179  negativo

Code

import pandas as pd

# Load the restaurant lexicon and preview the first rows
path = 'lexicons-webmedia21.csv'
df = pd.read_csv(path)
df.head()
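
As a rough illustration of how the lexicon might be used once loaded, the sketch below scores a tokenized sentence by averaging p_pos - p_neg over the tokens found in the lexicon. This averaging heuristic is an assumption made here for illustration, not the method of the cited paper. The column names are taken from the example table, the file is assumed to have no header row (as the sample record suggests), and the `score` helper is hypothetical.

```python
import io
import pandas as pd

# Rows taken from the example table above. The real file is assumed
# (not confirmed) to use the same four fields with no header row.
sample = io.StringIO(
    '"excelente","0.9919043535940205","0.008095646405979519","positivo"\n'
    '"agradável","0.971788","0.028212","positivo"\n'
    '"ruim","0.3268206840537858","0.6731793159462143","negativo"\n'
)

columns = ["term", "p_pos", "p_neg", "class"]
lexicon = pd.read_csv(
    sample,
    names=columns,                               # supply names: no header assumed
    dtype={"p_pos": float, "p_neg": float},      # force numeric scores
)


def score(tokens, lexicon):
    """Hypothetical helper: mean of (p_pos - p_neg) over known tokens.

    Returns 0.0 when no token is in the lexicon.
    """
    table = lexicon.set_index("term")
    hits = [t for t in tokens if t in table.index]
    if not hits:
        return 0.0
    diff = table.loc[hits, "p_pos"] - table.loc[hits, "p_neg"]
    return float(diff.mean())


# A positive result leans positive, a negative result leans negative.
print(score(["comida", "excelente"], lexicon))
print(score(["atendimento", "ruim"], lexicon))
```

To run this against the real file, replace `sample` with the path to the CSV, and drop `names=columns` if the file turns out to have a header row.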

Citation

Please cite the following if you use the data:

Tiago de Melo. Building a Restaurant-Specific Sentiment Lexicon via Probability Theory. In: Proceedings of the Brazilian Symposium on Multimedia and the Web (WebMedia). 2021. p. 129-132. [link]

SentiProdBR (Portuguese Language)

This dataset contains sentiment lexicons for the Portuguese language with 32,009 terms in 10 product domains [download].

Sample sentiment lexicon

"laptops", "fácil", "0.9801980198019802", "0.0198019801980198", "positive"

where the fields are, in order: domain, term, p_pos, p_neg, and class.

Example

domain   term      p_pos               p_neg                class
laptops  fácil     0.9801980198019802  0.0198019801980198   positive
pets     molegngo  0.0                 1.0                  negative
food     saboroso  0.9769021739130436  0.02309782608695652  positive

Code

import pandas as pd

# Load the product-domain lexicons and preview the first rows
path = 'sentiprodbr.csv'
df = pd.read_csv(path)
df.head()
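
Because each entry is tied to a product domain, a common first step is to select the entries for a single domain. The sketch below shows one way to do that, under some assumptions: the column names come from the example table, the file is assumed to have no header row, and `skipinitialspace=True` is used only because the sample record is printed with a space after each comma (the real file may not need it). The `domain_lexicon` helper is hypothetical.

```python
import io
import pandas as pd

# Rows taken from the example table above, formatted like the sample record.
sample = io.StringIO(
    '"laptops", "fácil", "0.9801980198019802", "0.0198019801980198", "positive"\n'
    '"pets", "molegngo", "0.0", "1.0", "negative"\n'
    '"food", "saboroso", "0.9769021739130436", "0.02309782608695652", "positive"\n'
)

columns = ["domain", "term", "p_pos", "p_neg", "class"]
lexicon = pd.read_csv(
    sample,
    names=columns,                               # supply names: no header assumed
    skipinitialspace=True,                       # tolerate ", " between fields
    dtype={"p_pos": float, "p_neg": float},      # force numeric scores
)


def domain_lexicon(lexicon, domain):
    """Hypothetical helper: map term -> class for one product domain."""
    subset = lexicon[lexicon["domain"] == domain]
    return dict(zip(subset["term"], subset["class"]))


print(domain_lexicon(lexicon, "food"))
```

Keeping one dictionary per domain avoids mixing polarities when the same term appears in several domains.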

Citation

Please cite the following if you use the data:

Tiago de Melo. SentiProdBR: Building Domain-Specific Sentiment Lexicons for the Portuguese Language. In: Anais do XXXVI Simpósio Brasileiro de Bancos de Dados. 2021. p. XXX-YYY.