Publications - INESC TEC

Publications

Publications by Nuno Ricardo Guimarães

2017

Detecting Journalistic Relevance on Social Media: A two-case study using automatic surrogate features

Authors
Figueira, A; Guimarães, N;

Publication
Proceedings of the 2017 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining 2017, Sydney, Australia, July 31 - August 03, 2017

Abstract
The expansion of social networks has contributed to the propagation of information relevant to general audiences. However, this is a small percentage compared to all the data shared on such online platforms, which also includes private/personal information, simple chat messages and what has recently been called ‘fake news’. In this paper, we conduct an exploratory analysis of two social networks to extract features that are indicators of relevant information in social network messages. Our goal is to build accurate machine learning models that are capable of detecting what is journalistically relevant. We conducted two experiments on CrowdFlower to build a solid ground truth for the models, by comparing the number of evaluations per post against the number of posts classified. The results show evidence that increasing the number of samples will result in better performance on the relevancy classification task, even when relaxing the number of evaluations per post. In addition, results show that there are significant correlations between the relevance of a post and its interest and whether it is meaningful for the majority of people. Finally, we achieve approximately 80% accuracy in the task of relevance detection using a small set of learning algorithms. © 2017 Copyright is held by the owner/author(s).

2016

Lexicon Expansion System for Domain and Time Oriented Sentiment Analysis

Authors
Guimaraes, N; Torgo, L; Figueira, A;

Publication
KDIR: Proceedings of the 8th International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management - Vol. 1

Abstract
In sentiment analysis, the polarity of a text is often assessed by resorting to sentiment lexicons, which usually consist of verbs and adjectives with an associated positive or negative value. However, in short informal texts like tweets or web comments, the absence of such words does not necessarily indicate that the text lacks opinion. Tweets like "First Paris, now Brussels... What can we do?" imply opinion in spite of not using words present in sentiment lexicons, but rather due to the general sentiment or public opinion associated with terms in a specific time and domain. In order to complement general sentiment dictionaries with those domain- and time-specific terms, we propose a novel system for lexicon expansion that automatically extracts the most relevant and up-to-date terms in several different domains and then assesses their sentiment through Twitter. Experimental results on our system show an 82% accuracy on extracting domain- and time-specific terms and 80% on correct polarity assessment. The achieved results provide evidence that our lexicon expansion system can extract and determine the sentiment of terms for domain- and time-specific corpora in a fully automatic way.
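The gap described in this abstract can be illustrated with a minimal sketch of lexicon-based polarity scoring. This is not the authors' system: the lexicon entries, the expansion terms, their values, and the `polarity` helper are all hypothetical and chosen only to show how a general lexicon can miss a tweet that an expanded lexicon scores.

```python
import re

# Hypothetical general-purpose lexicon (adjectives with a polarity value).
GENERAL_LEXICON = {"good": 1.0, "great": 1.0, "bad": -1.0, "terrible": -1.0}

# Hypothetical domain- and time-specific expansion; the values are
# illustrative assumptions, not results from the paper.
EXPANSION = {"paris": -0.8, "brussels": -0.8}


def polarity(text, lexicon):
    """Sum the lexicon values of the tokens found in the text."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return sum(lexicon.get(token, 0.0) for token in tokens)


tweet = "First Paris, now Brussels... What can we do?"

# No token matches the general lexicon, so the score stays at zero.
general_score = polarity(tweet, GENERAL_LEXICON)

# With the expansion merged in, the event-related terms contribute a score.
expanded_score = polarity(tweet, {**GENERAL_LEXICON, **EXPANSION})

print(general_score, expanded_score)
```

With only the general lexicon the example tweet gets no score at all, whereas the merged lexicon yields a negative one, which is the kind of complement the abstract argues for.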

2015

AFINA-te - A Healthy Lifestyle Information Website, Online Food Diary and Exercise Log Directly Towards Children

Authors
Guimarães, N; Lebres, VF; Ribeiro, J;

Publication
CSEDU 2015 - Proceedings of the 7th International Conference on Computer Supported Education, Volume 1, Lisbon, Portugal, 23-25 May, 2015.

Abstract
Childhood obesity is, according to the World Health Organization, one of the most concerning problems today. Educating children toward a healthier lifestyle is a difficult task due to the lack of interest or concern that they demonstrate. The interest that children have in technology and the time they spend online playing games or simply surfing the web may be seen as an opportunity to instill knowledge about healthy eating and a healthy lifestyle. There are already several online health counseling websites, but there seems to be a lack of such platforms directed towards children. The Afina-te website is an online platform that aims to monitor and educate children toward a healthier lifestyle through the presentation of information, interactive applications and educational games. It is also capable of providing feedback about what users eat and the exercise they practice. This paper describes the development and the resulting health counseling website.

2018

Human vs. Automatic Annotation Regarding the Task of Relevance Detection in Social Networks

Authors
Guimaraes, N; Miranda, F; Figueira, A;

Publication
Advances in Internet, Data & Web Technologies

Abstract
The rise of social networks and the possibility of being continuously connected have provided a fast way for information diffusion. More specifically, real-time posting has allowed news and events to be reported more quickly through social networks than through traditional news media. However, the massive amount of data available daily makes newsworthy information a needle in a haystack. Therefore, our goal is to build models that can detect journalistic relevance automatically in social networks. To do so, it is essential to establish a ground truth with a large number of entries that can provide a suitable basis for the learning algorithms, due to the difficulty inherent to the ambiguity and wide scope associated with the concept of relevance. In this paper, we propose and compare two different methodologies to annotate posts regarding their relevance: automatic and human annotation. Preliminary results show that supervised models trained with the automatic annotation methodology tend to perform better than those trained with human annotation on a test dataset labeled by experts.

2017

Building a Semi-Supervised Dataset to Train Journalistic Relevance Detection Models

Authors
Guimaraes, N; Figueira, A;

Publication
2017 IEEE 15th Intl Conf on Dependable, Autonomic and Secure Computing, 15th Intl Conf on Pervasive Intelligence and Computing, 3rd Intl Conf on Big Data Intelligence and Computing and Cyber Science and Technology Congress(DASC/PICOM/DATACOM/CYBERSCI

Abstract
Annotated data is one of the most important components for supervised learning tasks. To ensure the reliability of the models, this data is usually labeled by several human annotators through volunteering or using crowdsourcing platforms. However, such approaches are infeasible (regarding time and cost) in datasets with an enormous number of entries, which, in the specific case of journalistic relevance detection in social media posts, is necessary due to the wide scope of topics that can be considered relevant. Therefore, with the goal of building a relevance detection model, we propose an architecture to build a large-scale annotated dataset regarding the journalistic relevance of Twitter posts (i.e. tweets). This methodology is based on the predictability of the content in Twitter accounts. Next, we used the retrieved dataset to build relevance detection models, combining text, entities, and sentiment features. Finally, we validated the best model through a smaller manually annotated dataset with posts from Facebook and Twitter. The F1-measure achieved in the validation dataset was 63%, which is still far from excellent. However, given the characteristics of the validation data, these results are encouraging since 1) our model is not affected by content from other social networks and 2) our validation dataset was restricted to a specific time interval and specific keywords (which can affect the performance of the model). © 2017 IEEE.
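For readers unfamiliar with the F1-measure reported above, the following sketch computes it as the harmonic mean of precision and recall. The function name and the confusion counts are hypothetical and purely illustrative; they are not taken from the paper's validation data.

```python
def f1_measure(true_pos, false_pos, false_neg):
    """Return the F1-measure: the harmonic mean of precision and recall."""
    predicted_pos = true_pos + false_pos
    actual_pos = true_pos + false_neg
    precision = true_pos / predicted_pos if predicted_pos else 0.0
    recall = true_pos / actual_pos if actual_pos else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Illustrative counts only (assumed, not from the paper):
# 6 relevant posts found, 2 false alarms, 4 relevant posts missed.
example_f1 = f1_measure(true_pos=6, false_pos=2, false_neg=4)
print(round(example_f1, 3))
```

Because F1 combines both error types, a model can have reasonable accuracy and still score low if it misses many relevant posts or flags many irrelevant ones.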

2018

Twitter as a Source for Time- and Domain-Dependent Sentiment Lexicons

Authors
Guimaraes, N; Torgo, L; Figueira, A;

Publication
Social Network Based Big Data Analysis and Applications

Abstract
Sentiment lexicons are an essential component of most state-of-the-art sentiment analysis methods. However, the terms included are usually restricted to verbs and adjectives because they (1) usually have similar meanings across different domains and (2) are the main indicators of subjectivity in the text. This can lead to a problem in the classification of short informal texts, since sometimes the absence of these types of parts of speech does not mean an absence of sentiment. Therefore, our hypothesis states that knowledge of terms regarding certain events and their respective sentiment (public opinion) can improve the task of sentiment analysis. Consequently, to complement traditional sentiment dictionaries, we present a system for lexicon expansion that extracts the most relevant terms from news and assesses their positive or negative score through Twitter. Preliminary results on a labelled dataset show that our complementary lexicons increase the performance of three state-of-the-art sentiment systems, therefore proving the effectiveness of our approach.
