Description

Polarity sentences (Portuguese language)

Computer-BR dataset, used in the article "Comparing Approaches to Subjectivity Classification: A Study on Portuguese Tweets" and shared online [download].

Article link: https://link.springer.com/chapter/10.1007/978-3-319-41552-9_8

Dataset from the article "The role of text pre-processing in opinion mining on a social media language dataset", shared online [download].

Article link: https://ieeexplore.ieee.org/abstract/document/6984806

Sentences from the Google Play Store in Portuguese, with negative and positive labels.

Sentiment Lexicons (Portuguese language)

This dataset contains sentiment lexicons for the Portuguese language with 56,755 terms in the restaurant domain [download].

Sample sentiment lexicon

"excelente","0.9919043535940205","0.008095646405979519","positivo"

where

excelente - the lexicon term.

0.9919043535940205 - probability that the term ("excelente") is positive.

0.008095646405979519 - probability that the term ("excelente") is negative.

positivo - class of the term "excelente".

Example

term	p_pos	p_neg	class
excelente	0.991904	0.008096	positivo
agradável	0.971788	0.028212	positivo
ruim	0.3268206840537858	0.6731793159462143	negativo

Code

import pandas as pd

# Load the restaurant lexicon and preview the first rows
path = 'lexicons-webmedia21.csv'
df = pd.read_csv(path)
df.head()
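As a minimal sketch of working with the lexicon once loaded, the snippet below parses rows in the sample format and looks up a term. It assumes the file has no header row, as the sample line suggests; the column names (`term`, `p_pos`, `p_neg`, `class`) and the helper `load_lexicon` are hypothetical, and the inline text stands in for the real file. The second row reuses the values for "ruim" from the Example table.

```python
import io

import pandas as pd

# Assumption: each row is (term, p_pos, p_neg, class) and there is no header row.
# The inline text below stands in for the real CSV file.
SAMPLE_CSV = '''"excelente","0.9919043535940205","0.008095646405979519","positivo"
"ruim","0.3268206840537858","0.6731793159462143","negativo"
'''

COLUMNS = ["term", "p_pos", "p_neg", "class"]  # hypothetical column names


def load_lexicon(source):
    """Read a header-less lexicon CSV into a DataFrame indexed by term."""
    df = pd.read_csv(
        source,
        header=None,
        names=COLUMNS,
        dtype={"p_pos": float, "p_neg": float},
    )
    return df.set_index("term")


lexicon = load_lexicon(io.StringIO(SAMPLE_CSV))

# In the sample rows the two probabilities are complementary.
prob_sums = lexicon["p_pos"] + lexicon["p_neg"]

# Look up a single term and print its class and positive probability.
excelente = lexicon.loc["excelente"]
print(excelente["class"], round(excelente["p_pos"], 4))
```

If the real file does include a header row, drop `header=None` and `names=COLUMNS` and use the file's own column names instead.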

Citation

Please cite the following if you use the data:

Tiago de Melo. Building a Restaurant-Specific Sentiment Lexicon via Probability Theory. In: Proceedings of the Brazilian Symposium on Multimedia and the Web (WebMedia). 2021. p. 129-132. [link]

SentiProdBR (Portuguese Language)

This dataset contains sentiment lexicons for the Portuguese language with 32,009 terms in 10 product domains [download].

Sample sentiment lexicon

"laptops", "fácil", "0.9801980198019802", "0.0198019801980198", "positive"

where

laptops - the product domain.

fácil - the lexicon term.

0.9801980198019802 - probability that the term ("fácil") is positive.

0.0198019801980198 - probability that the term ("fácil") is negative.

positive - class of the term "fácil".

Example

domain	term	p_pos	p_neg	class
laptops	fácil	0.9801980198019802	0.0198019801980198	positive
pets	molegngo	0.0	1.0	negative
food	saboroso	0.9769021739130436	0.02309782608695652	positive

Code

import pandas as pd

# Load the product-domain lexicons and preview the first rows
path = 'sentiprodbr.csv'
df = pd.read_csv(path)
df.head()
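Because the same term can appear in several product domains, a lookup usually needs both the domain and the term. The sketch below illustrates this with the three rows from the Example table. It assumes the file has no header row and a space after each comma, as in the sample line; the column names and the helper `lookup` are hypothetical.

```python
import io

import pandas as pd

# Assumption: each row is (domain, term, p_pos, p_neg, class) with no header row.
# The inline text below stands in for the real CSV file.
SAMPLE_CSV = '''"laptops", "fácil", "0.9801980198019802", "0.0198019801980198", "positive"
"pets", "molegngo", "0.0", "1.0", "negative"
"food", "saboroso", "0.9769021739130436", "0.02309782608695652", "positive"
'''

COLUMNS = ["domain", "term", "p_pos", "p_neg", "class"]  # hypothetical names

lexicon = pd.read_csv(
    io.StringIO(SAMPLE_CSV),
    header=None,
    names=COLUMNS,
    skipinitialspace=True,  # tolerate the space after each comma
    dtype={"p_pos": float, "p_neg": float},
)


def lookup(lexicon, domain, term):
    """Return (p_pos, p_neg, class) for a term in one domain, or None if absent."""
    match = lexicon[(lexicon["domain"] == domain) & (lexicon["term"] == term)]
    if match.empty:
        return None
    row = match.iloc[0]
    return float(row["p_pos"]), float(row["p_neg"]), row["class"]


# Restrict the lexicon to a single product domain.
laptop_terms = lexicon[lexicon["domain"] == "laptops"]

print(lookup(lexicon, "food", "saboroso"))
print(lookup(lexicon, "laptops", "saboroso"))  # term not listed for this domain
```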

Citation

Please cite the following if you use the data:

Tiago de Melo. SentiProdBR: Building Domain-Specific Sentiment Lexicons for the Portuguese Language. In: Anais do XXXVI Simpósio Brasileiro de Bancos de Dados. 2021. p. XXX-YYY.