Publications

Publications by Mário João Antunes

2013

Customized crowds and active learning to improve classification

Authors
Costa, J; Silva, C; Antunes, M; Ribeiro, B;

Publication
EXPERT SYSTEMS WITH APPLICATIONS

Abstract
Traditional classification algorithms can be limited in their performance when a specific user is targeted. User preferences, e.g. in recommendation systems, constitute a challenge for learning algorithms. Additionally, in recent years user interaction through crowdsourcing has drawn significant interest, although it is still underused in learning settings. In this work we focus on an active strategy that uses crowd-based non-expert information to appropriately tackle the problem of capturing the drift between user preferences in a recommendation system. The proposed method combines two main ideas: applying active strategies for adaptation to each user, and implementing crowdsourcing to avoid excessive user feedback. A similitude technique is put forward to optimize the choice of the most appropriate similitude-wise crowd, under the guidance of basic user feedback. The proposed active learning framework allows non-expert classification performed by crowds to be used to define the user profile, mitigating the labeling effort normally requested from the user. The framework is designed to be generic and applicable to different scenarios, whilst customizable for each specific user. A case study on a humor classification scenario is used to demonstrate experimentally that the approach can improve baseline active results.

2013

Defining Semantic Meta-hashtags for Twitter Classification

Authors
Costa, J; Silva, C; Antunes, M; Ribeiro, B;

Publication
ADAPTIVE AND NATURAL COMPUTING ALGORITHMS, ICANNGA 2013

Abstract
Given the wide spread of social networks, research efforts to retrieve information using tagging from social network communications have increased. In particular, in the Twitter social network, hashtags are widely used to define a shared context for events or topics. While this is common practice, the hashtags freely introduced by users often become biased. In this paper, we propose to deal with this bias by defining semantic meta-hashtags, clustering similar messages to improve the classification. First, we use the user-defined hashtags as the Twitter message class labels. Then, we apply the meta-hashtag approach to boost the performance of the message classification. The meta-hashtag approach is tested on a Twitter-based dataset constructed by requesting public tweets from the Twitter API. The experimental results, obtained by comparing a baseline model based on user-defined hashtags with the clustered meta-hashtag approach, show that the overall classification is improved. We conclude that incorporating semantics in the meta-hashtag model can have an impact on different applications, e.g. recommendation systems, event detection or crowdsourcing.

2016

Information System for Automation of Counterfeited Documents Images Correlation

Authors
Vieira, R; Silva, C; Antunes, M; Assis, A;

Publication
INTERNATIONAL CONFERENCE ON ENTERPRISE INFORMATION SYSTEMS/INTERNATIONAL CONFERENCE ON PROJECT MANAGEMENT/INTERNATIONAL CONFERENCE ON HEALTH AND SOCIAL CARE INFORMATION SYSTEMS AND TECHNOLOGIES, CENTERIS/PROJMAN / HCIST 2016

Abstract
Forgery detection of official documents is a continuous challenge encountered by document forensic experts. Among the most common counterfeited documents are citizen cards, passports and driving licenses. Forgers are increasingly resorting to more sophisticated techniques to produce fake documents, trying to deceive criminal police forces and hamper their work. An up-to-date catalogue of images of past counterfeited documents enables forensic experts to determine whether a similar technique or material has already been used to forge a document. Thus, characterizing the modus operandi makes it possible to obtain more information about the source of the counterfeited document. In this paper we present an information system to manage counterfeited document images that includes a two-fold approach: (i) the storage, in a structured database, of images of past counterfeited documents seized by questioned-document forensic experts of the Portuguese Scientific Laboratory; and (ii) the automation of counterfeit identification by comparing a given fraudulent document image with the database images of previously catalogued counterfeited documents. In general, the proposed information system aims to smooth counterfeit identification and to overcome the error-prone, manual and time-consuming tasks carried out by forensic experts. To this end, we used a scalable algorithm under the OpenCV framework to compare images, match patterns and analyse textures and colours. The algorithm was tested on a subset of counterfeited Portuguese citizen cards, presenting very promising results.

2017

Adaptive learning for dynamic environments: A comparative approach

Authors
Costa, J; Silva, C; Antunes, M; Ribeiro, B;

Publication
ENGINEERING APPLICATIONS OF ARTIFICIAL INTELLIGENCE

Abstract
Nowadays most learning problems demand adaptive solutions. Current challenges include temporal data streams, drift and non-stationary scenarios, often with text data, whether in social networks or in business systems. Various efforts have been pursued in machine learning settings to learn in such environments, especially because of their non-trivial nature, since changes occur between the data distribution used to define the model and the current environment. In this work we present the Drift Adaptive Retain Knowledge (DARK) framework to tackle adaptive learning in dynamic environments based on recent and retained knowledge. DARK handles an ensemble of multiple Support Vector Machine (SVM) models that are dynamically weighted and have distinct training window sizes. A comparative study with benchmark solutions in the field, namely the Learn++.NSE algorithm, is also presented. Experimental results revealed that DARK outperforms Learn++.NSE with two different base classifiers, an SVM and a Classification and Regression Tree (CART).
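
The idea of a dynamically weighted ensemble whose members are trained on different window sizes can be illustrated with a minimal sketch. This is not the DARK implementation: the abstract does not state how the weights are computed, so weighting each member by its accuracy on the most recent batch is an assumption, and the base learner is a trivial majority-label stand-in rather than an SVM. All function names are hypothetical.

```python
from collections import Counter, defaultdict

def train_window_learner(labels, window):
    """Hypothetical stand-in for an SVM: the 'model' is simply the
    majority label among the last `window` training labels."""
    return Counter(labels[-window:]).most_common(1)[0][0]

def recent_accuracy(predicted_label, recent_labels):
    """Assumed weighting rule: accuracy of a learner on the latest batch."""
    hits = sum(1 for y in recent_labels if y == predicted_label)
    return hits / len(recent_labels)

def weighted_vote(predictions, weights):
    """Sum the weights behind each predicted label and return the heaviest."""
    scores = defaultdict(float)
    for label, weight in zip(predictions, weights):
        scores[label] += weight
    return max(scores, key=scores.get)

# A label stream that drifts from class 0 to class 1.
history = [0] * 8 + [1] * 4
recent_batch = [1, 1, 1, 0]

# One learner per window size: a short window tracks recent data,
# a long window retains older knowledge.
learners = [train_window_learner(history, w) for w in (2, 12)]
weights = [recent_accuracy(p, recent_batch) for p in learners]
ensemble_prediction = weighted_vote(learners, weights)
```

In this toy stream the short-window learner has adapted to the drift and therefore receives the larger weight, which is the behaviour a dynamic weighting scheme is meant to produce.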

2018

Adaptive Learning Models Evaluation in Twitter's Timelines

Authors
Costa, J; Silva, C; Antunes, M; Ribeiro, B;

Publication
Proceedings of the International Joint Conference on Neural Networks

Abstract
Current challenges in machine learning include dealing with temporal data streams, drift and non-stationary scenarios, often with text data, whether in social networks or in business systems. This dynamic nature tends to limit the performance of traditional static learning models, and dynamic learning strategies must be put forward. However, assessing the performance of those strategies is not straightforward, as sample dependency undermines the use of validation techniques such as cross-validation. In this paper we propose to use McNemar's test to compare two distinct approaches that tackle adaptive learning in dynamic environments, namely DARK (Drift Adaptive Retain Knowledge) and Learn++.NSE (Learn++ for Non-Stationary Environments). The validation is based on a Twitter case study benchmark constructed using the DOTS (Drift Oriented Tool System) dataset generator. The results obtained demonstrate the usefulness and adequacy of using McNemar's statistical test in dynamic environments where time is crucial for the learning algorithm. © 2018 IEEE.
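
As a rough illustration of how McNemar's test compares two classifiers on the same test samples, the sketch below counts the discordant pairs (samples where exactly one classifier is correct) and derives both the continuity-corrected chi-squared statistic and an exact two-sided binomial p-value. The function names and the toy predictions are assumptions made for this example and are not taken from the paper.

```python
from math import comb

def mcnemar_counts(y_true, pred_a, pred_b):
    """Return (b, c): b = A correct and B wrong, c = A wrong and B correct."""
    b = c = 0
    for truth, a, other in zip(y_true, pred_a, pred_b):
        a_ok, b_ok = (a == truth), (other == truth)
        if a_ok and not b_ok:
            b += 1
        elif b_ok and not a_ok:
            c += 1
    return b, c

def mcnemar_statistic(b, c):
    """Chi-squared statistic with continuity correction (1 degree of freedom)."""
    if b + c == 0:
        return 0.0
    return (abs(b - c) - 1) ** 2 / (b + c)

def mcnemar_exact_p(b, c):
    """Exact two-sided p-value: under H0 discordant pairs split 50/50."""
    n = b + c
    if n == 0:
        return 1.0
    k = min(b, c)
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)

# Toy predictions from two hypothetical classifiers on the same samples.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
pred_a = [1, 1, 1, 1, 0, 0, 0, 1]
pred_b = [1, 0, 0, 1, 1, 0, 0, 1]

b, c = mcnemar_counts(y_true, pred_a, pred_b)
statistic = mcnemar_statistic(b, c)
p_value = mcnemar_exact_p(b, c)
```

Only the discordant pairs enter the test, so it needs a single pass over one shared test sequence and no resampling; the exact binomial form is generally preferred when the number of discordant pairs is small.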

2019

A Review on Relations Extraction in Police Reports

Authors
Carnaz, G; Quaresma, P; Nogueira, VB; Antunes, M; Fonseca Ferreira, NM;

Publication
New Knowledge in Information Systems and Technologies - Volume 1, World Conference on Information Systems and Technologies, WorldCIST 2019, Galicia, Spain, 16-19 April, 2019

Abstract
Relation Extraction (RE) is part of Information Extraction (IE) and aims to obtain instances of semantic relations in textual documents. The countless possible relations, the myriad of subjects, the difficulty in identifying emotions and the amount of unstructured and heterogeneous data have challenged researchers to define innovative and ever more accurate methodologies. This paper presents the evaluation results obtained with a set of RE systems on identifying semantic relations in criminal police reports. We have evaluated different applications with documents in English and Portuguese. The results obtained give us useful insights to continue the research work and to design a relation extraction system for the related domain. © 2019, Springer Nature Switzerland AG.
