2020
Authors
Guimaraes, N; Miranda, F; Figueira, A;
Publication
INTERNATIONAL JOURNAL OF GRID AND UTILITY COMPUTING
Abstract
Social networks have provided the means for constant connectivity and fast information dissemination. In addition, real-time posting allows a new form of citizen journalism, in which users can report events from a witness perspective. As a result, information propagates through the network faster than traditional media can report it. However, relevant information is only a small percentage of all the content shared. Our goal is to develop and evaluate models that can automatically detect journalistic relevance. To do so, we need solid and reliable ground-truth data with a sufficiently large quantity of annotated posts, so that the models can learn to detect relevance across the whole spectrum. In this article, we present and compare two different methodologies: an automatic approach and a human approach. Results on a test data set labelled by experts show that the models trained with the automatic methodology tend to perform better than the ones trained on human-annotated data.
2020
Authors
Guimaraes, N; Figueira, A; Torgo, L;
Publication
Communications in Computer and Information Science
Abstract
The emergence of online social networks provided users with an easy way to publish and disseminate content, reaching broader audiences than previous platforms (such as blogs or personal websites) allowed. However, malicious users started to take advantage of these features to disseminate unreliable content through the network, such as false information, extremely biased opinions, or hate speech. Consequently, it is crucial to detect these users at an early stage to prevent the propagation of unreliable content in social network ecosystems. In this work, we introduce a methodology to extract a large corpus of unreliable posts using Twitter and two databases of unreliable websites (OpenSources and Media Bias Fact Check). In addition, we present an analysis of the content and of the users that publish and share several types of unreliable content. Finally, we develop supervised models to classify a Twitter account according to its reliability. Experiments conducted on two different data sets show performance above 94% using Decision Trees as the learning algorithm. Although these experiments have some limitations, they provide encouraging results for future research on detecting unreliable accounts on social networks. © 2020, Springer Nature Switzerland AG.
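The corpus-extraction step can be illustrated with a minimal sketch, assuming (hypothetically) that a post is labelled unreliable when it links to a domain listed in one of the two databases. The domain set and the helper names `domain_of` and `is_unreliable_post` are invented for illustration and are not the OpenSources or Media Bias Fact Check data or the authors' code. Only exact host matches (after stripping a leading "www.") are counted, so subdomains of a flagged site would be missed.

```python
from urllib.parse import urlparse

# Hypothetical sample of flagged domains; in the paper these come from
# OpenSources and Media Bias Fact Check, not from this invented list.
UNRELIABLE_DOMAINS = {"unreliable-news.example", "biased-site.example"}


def domain_of(url: str) -> str:
    """Return the lower-cased host of a URL without a leading 'www.'."""
    host = (urlparse(url).hostname or "").lower()
    return host[4:] if host.startswith("www.") else host


def is_unreliable_post(urls: list[str]) -> bool:
    """Label a post unreliable if any linked URL points to a flagged domain."""
    return any(domain_of(u) in UNRELIABLE_DOMAINS for u in urls)


# Hypothetical posts, keyed by id, with the URLs each one links to.
posts = {
    "t1": ["https://www.unreliable-news.example/story/1"],
    "t2": ["https://news.example.org/article"],
    "t3": [],
}
labels = {pid: is_unreliable_post(urls) for pid, urls in posts.items()}
print(labels)  # {'t1': True, 't2': False, 't3': False}
```

Labelling posts by their linked domain rather than by manual annotation is what allows a corpus to be built at scale, at the cost of inheriting any errors in the domain lists.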
2020
Authors
Guimaraes, N; Figueira, A; Torgo, L;
Publication
PROCEEDINGS OF THE 16TH INTERNATIONAL CONFERENCE ON WEB INFORMATION SYSTEMS AND TECHNOLOGIES (WEBIST)
Abstract
The growth of social media as an information medium, without restrictive measures on the creation of new accounts, led to the rise of malicious agents with the intent to diffuse unreliable information in the network, ultimately affecting users' perception of important topics such as political and health issues. Although the problem is being tackled within the domain of bot detection, the impact of studies in this area is still limited because 1) not all accounts that spread unreliable content are bots, 2) human-operated accounts are also responsible for the diffusion of unreliable information, and 3) bot accounts are not always malicious (e.g. news aggregators). Also, most of these methods are based on supervised models that require annotated data and updates to maintain their performance over time. In this work, we build a framework and develop knowledge-based metrics to complement current research in bot detection and to characterize the impact and behavior of a Twitter account, independently of the way it is operated (human or bot). We then analyze a sample of the accounts using the proposed metrics and evaluate the necessity of these metrics by comparing them with the scores from a bot detection system. The results show that the metrics can characterize different degrees of unreliable accounts, from unreliable bot accounts with a high number of followers to human-operated accounts that also spread unreliable content (but with less impact on the network). Furthermore, evaluating a sample of the accounts with a bot detection system showed that bots make up around 11% of the sample of unreliable accounts extracted and that the bot score is not correlated with the proposed metrics. In addition, the accounts that achieve the highest values in our metrics present different characteristics from the ones that achieve the highest bot score. This provides evidence of the usefulness of our metrics in the evaluation of unreliable accounts in social networks.
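The comparison between bot scores and the proposed metrics can be sketched as a correlation check. This is a minimal illustration, assuming a Pearson coefficient and hypothetical per-account values (`bot_scores`, `impact_metric`); the abstract does not state which correlation measure or data the authors used, and a rank-based measure such as Spearman's may be more appropriate for skewed scores.

```python
import math


def pearson(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation coefficient between two equal-length samples."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two samples of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of co-deviations; the 1/n factors cancel in the ratio below.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if sd_x == 0 or sd_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (sd_x * sd_y)


# Hypothetical per-account values (not from the paper).
bot_scores = [0.10, 0.85, 0.30, 0.55, 0.20]
impact_metric = [120.0, 15.0, 300.0, 40.0, 90.0]
r = pearson(bot_scores, impact_metric)
print(f"Pearson r = {r:.3f}")  # values near 0 suggest no linear relationship
```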
2020
Authors
Torres, A; Miranda, C;
Publication
EXPLORING SERVICE SCIENCE (IESS 2020)
Abstract
Service Design (SD) and Design Thinking (DT) have evolved over the last decade and have become popular in the research field of service science. However, the application of SD and DT research outcomes in practice is still scarce. To help understand the differences between research and practice, we conducted 20 semi-structured interviews with professionals and trainees from four organizations involved in service innovation projects. The results reveal several similarities and complementarities, (dis)advantages, requests, and obstacles that hinder companies from implementing and using structured SD and DT approaches. The findings present some challenges for both researchers and practitioners regarding actions they could take to overcome these barriers and foster SD and DT practice within organizations.
2020
Authors
Pereira, FSF; Andrade, T; de Carvalho, ACPLF;
Publication
MACHINE LEARNING AND KNOWLEDGE DISCOVERY IN DATABASES, ECML PKDD 2019, PT II
Abstract
We present a solution submitted to the Social Media and Harassment Competition held in collaboration with the ECML PKDD 2019 Conference. The dataset used is a set of tweets, and the first task was the detection of harassment tweets. To deal with this problem, we proposed a solution based on a gradient tree-boosting algorithm. The second task was to categorize harassment tweets according to the type of harassment, a multiclass classification problem. For this problem, we proposed an LSTM network model. The solutions proposed for these tasks presented good predictive accuracy.
2020
Authors
Silva, PR;
Publication
PROCEEDINGS OF THE 35TH ANNUAL ACM SYMPOSIUM ON APPLIED COMPUTING (SAC'20)
Abstract
With the advances of the big data era in biology, deep learning has been incorporated into analysis pipelines to transform biological information into valuable knowledge. Deep learning has demonstrated its power in advancing the bioinformatics field, including sequence analysis, biomolecular property and function prediction, automatic medical diagnosis, and the analysis of cell imaging data. The ambition of this work is to create an approach that can fully explore the relationships across modalities and subjects by mining and fusing features from multimodal data for cell state classification. The system should be able to classify cell state through multimodal deep learning techniques using heterogeneous data such as biological images, genomics, and clinical annotations. Our pilot study addresses the data acquisition process and a framework capable of extracting biological parameters from cell images.