Publications - INESC TEC

Publications by João Gama

2017

Improving Incremental Recommenders with Online Bagging

Authors
Vinagre, J; Jorge, AM; Gama, J

Publication
PROGRESS IN ARTIFICIAL INTELLIGENCE (EPIA 2017)

Abstract
Online recommender systems often deal with continuous, potentially fast and unbounded flows of data. Ensemble methods for recommender systems have been used in the past with batch algorithms; however, they have never been studied with incremental algorithms that learn from data streams. We evaluate online bagging with an incremental matrix factorization algorithm for top-N recommendation with positive-only user feedback, often known as binary ratings. Our results show that online bagging is able to improve accuracy by up to 35% over the baseline, with small computational overhead.
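Online bagging is commonly formulated (following Oza and Russell) by replacing bootstrap resampling with a weight k ~ Poisson(1) drawn for every incoming example and every ensemble member. The Python sketch below illustrates that formulation around a hypothetical, simplified incremental SGD matrix factorization for positive-only feedback, where each observed user-item pair is treated as a target of 1. The class names, factor count, learning rate, regularization, and the choice of averaging member scores for top-N ranking are all illustrative assumptions, not the settings or aggregation rule used in the paper.

```python
import math
import random


def poisson1(rng: random.Random) -> int:
    """Draw k ~ Poisson(1) using Knuth's multiplication method."""
    limit = math.exp(-1.0)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1


class IncrementalMF:
    """Hypothetical minimal incremental SGD matrix factorization.

    Each observed (user, item) pair is a positive example with target 1;
    there are no explicit negatives (positive-only feedback).
    """

    def __init__(self, factors=10, lr=0.05, reg=0.01, seed=0):
        self.factors, self.lr, self.reg = factors, lr, reg
        self.rng = random.Random(seed)
        self.users, self.items = {}, {}

    def _vector(self, table, key):
        # Lazily create a small random latent vector for unseen users/items.
        if key not in table:
            table[key] = [self.rng.gauss(0.0, 0.1) for _ in range(self.factors)]
        return table[key]

    def update(self, user, item):
        p = self._vector(self.users, user)
        q = self._vector(self.items, item)
        err = 1.0 - sum(a * b for a, b in zip(p, q))
        for f in range(self.factors):
            pf, qf = p[f], q[f]  # use pre-update values in both gradients
            p[f] += self.lr * (err * qf - self.reg * pf)
            q[f] += self.lr * (err * pf - self.reg * qf)

    def score(self, user, item):
        if user not in self.users or item not in self.items:
            return 0.0
        return sum(a * b for a, b in zip(self.users[user], self.items[item]))


class OnlineBaggingRecommender:
    """Ensemble of IncrementalMF members trained with Poisson(1) online bagging."""

    def __init__(self, n_models=8, seed=0):
        self.rng = random.Random(seed)
        self.models = [IncrementalMF(seed=seed + m + 1) for m in range(n_models)]
        self.catalog = set()

    def update(self, user, item):
        self.catalog.add(item)
        for model in self.models:
            # Each member learns from this event k ~ Poisson(1) times (possibly zero).
            for _ in range(poisson1(self.rng)):
                model.update(user, item)

    def score(self, user, item):
        # Assumed aggregation: plain average of member scores.
        return sum(m.score(user, item) for m in self.models) / len(self.models)

    def recommend(self, user, n=10):
        # Top-N items seen so far, ranked by the averaged ensemble score.
        ranked = sorted(self.catalog, key=lambda i: self.score(user, i), reverse=True)
        return ranked[:n]


if __name__ == "__main__":
    rec = OnlineBaggingRecommender(n_models=4, seed=42)
    for _ in range(300):
        rec.update("u1", "a")
        rec.update("u2", "b")
    print(rec.recommend("u1", n=2))
```

Because a member may receive an event zero, one, or several times, members drift apart much as bootstrap replicas do in batch bagging, while each event is still processed only once by the ensemble.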

2016

Recognizing Family, Genus, and Species of Anuran Using a Hierarchical Classification Approach

Authors
Colonna, JG; Gama, J; Nakamura, EF

Publication
DISCOVERY SCIENCE (DS 2016)

Abstract
In bioacoustic recognition approaches, a "flat" classifier is usually trained to recognize several species of anurans, where the number of classes equals the number of species. Consequently, the complexity of the classification function increases in proportion to the number of species. To avoid this issue, we propose a "hierarchical" approach that decomposes the problem into three taxonomic levels: family, genus, and species. To accomplish this, we transform the original single-label problem into a multi-dimensional (multi-label and multi-class) problem based on the Linnaean taxonomy. We then develop a top-down method using a set of classifiers organized as a hierarchical tree. This makes it possible to predict the same set of species as a flat classifier while also obtaining new information about the samples and their taxonomic relationships. This helps us understand the problem better and draw additional conclusions by inspecting the confusion matrices at the three levels of classification. In addition, we carry out our experiments using cross-validation performed by individual. This form of CV avoids mixing syllables that belong to the same specimen in the training and testing sets, preventing an overestimate of the accuracy and giving a more realistic measure of the system's ability to generalize. We tested our system on a dataset of sixty individual frogs from ten different species, eight genera, and four families, achieving a final Micro- and Average-accuracy of 86% and 62%, respectively.
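Cross-validation "by individual" amounts to a group-wise k-fold split in which the group is the specimen. A minimal Python sketch, assuming each syllable is stored as a hypothetical `(individual_id, features, label)` tuple and that whole individuals are dealt round-robin into folds; a real protocol might additionally balance species across folds, which this sketch does not attempt.

```python
from collections import defaultdict


def individual_folds(samples, n_folds):
    """Split sample indices into folds so that every syllable of one
    individual lands in the same fold (a group-wise k-fold).

    `samples` is a list of (individual_id, features, label) tuples;
    only individual_id is used here.
    """
    by_individual = defaultdict(list)
    for idx, (individual, _features, _label) in enumerate(samples):
        by_individual[individual].append(idx)
    folds = [[] for _ in range(n_folds)]
    # Deal whole individuals round-robin, in order of first appearance.
    for position, indices in enumerate(by_individual.values()):
        folds[position % n_folds].extend(indices)
    return folds


def train_test_splits(samples, n_folds):
    """Yield (train_indices, test_indices) with no individual in both."""
    folds = individual_folds(samples, n_folds)
    for k, test in enumerate(folds):
        train = [i for j, fold in enumerate(folds) if j != k for i in fold]
        yield train, test
```

Splitting by syllable instead would let near-identical calls from the same frog appear on both sides, which is exactly the leakage the abstract warns inflates accuracy.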

2016

Sequential anomalies: a study in the Railway Industry

Authors
Ribeiro, RP; Pereira, P; Gama, J

Publication
MACHINE LEARNING

Abstract
Concerned with predicting equipment failures, predictive maintenance has a high impact at both the technical and the financial level. Most modern equipment has logging systems that allow us to collect a diversity of data regarding its operation and health. Using data mining models for anomaly and novelty detection enables us to explore those datasets, building predictive systems that can detect a failure as it starts evolving and issue an alert, instead of letting it develop unnoticed until breakdown. In this study, we use a failure detection system to predict train door breakdowns before they happen, using data from the doors' logging system. We use sensor data from the pneumatic valves that control the opening and closing cycles of a door. Still, the failure of a cycle does not necessarily indicate a breakdown: a cycle might fail due to user interaction. The goal of this study is to detect structural failures in the automatic train door system, not when there is a single cycle failure, but when there are sequences of cycle failures. We study three methods for such structural failure detection: outlier detection, anomaly detection, and novelty detection, using different windowing strategies. We propose a two-stage approach, where the output of a point-anomaly algorithm is post-processed by a low-pass filter to obtain a subsequence-anomaly detection. The main result of this two-stage architecture is a strong impact on the false alarm rate.
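The two-stage idea can be illustrated with a minimal Python sketch. The abstract does not specify the filter, so this assumes a first-order low-pass filter (an exponential moving average) applied to per-cycle point-anomaly flags, with an illustrative smoothing factor `alpha=0.2` and alarm threshold `0.5`; the function name is hypothetical and the first-stage detector is left abstract.

```python
def low_pass_alarms(point_flags, alpha=0.2, threshold=0.5):
    """Second stage: smooth first-stage point-anomaly output with a
    first-order low-pass filter (exponential moving average) and raise
    an alarm at every step where the smoothed signal reaches `threshold`.

    `point_flags` holds one value per door cycle (1 = anomalous cycle,
    0 = normal); anomaly scores in [0, 1] are handled the same way.
    Returns the smoothed signal and the indices of alarmed cycles.
    """
    smoothed, level = [], 0.0
    for x in point_flags:
        level = alpha * x + (1.0 - alpha) * level
        smoothed.append(level)
    alarms = [i for i, s in enumerate(smoothed) if s >= threshold]
    return smoothed, alarms


if __name__ == "__main__":
    print(low_pass_alarms([0, 0, 1, 0, 0, 0, 1, 0, 0])[1])
    print(low_pass_alarms([0, 0, 1, 1, 1, 1, 1, 0])[1])
```

With these values, a single anomalous cycle after a quiet period lifts the filtered signal only to 0.2, so no alarm is raised, whereas four consecutive anomalous cycles push it to about 0.59 and trigger one. Raising `alpha` makes the detector react faster at the cost of letting shorter bursts, and eventually isolated cycles, raise alarms.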

2016

Adaptive Model Rules From High-Speed Data Streams

Authors
Duarte, J; Gama, J; Bifet, A

Publication
ACM TRANSACTIONS ON KNOWLEDGE DISCOVERY FROM DATA

Abstract
Decision rules are one of the most expressive and interpretable models for machine learning. In this article, we present Adaptive Model Rules (AMRules), the first stream rule learning algorithm for regression problems. In AMRules, the antecedent of a rule is a conjunction of conditions on the attribute values, and the consequent is a linear combination of the attributes. In order to maintain a regression model compatible with the most recent state of the process generating the data, each rule uses a Page-Hinkley test to detect changes in this process, and the algorithm reacts to such changes by pruning the rule set. Online learning can be strongly affected by outliers, so AMRules is also equipped with outlier detection mechanisms to avoid adapting the model to anomalous examples. In the experimental section, we report the results of AMRules on benchmark regression problems and compare its performance with that of other streaming regression algorithms.
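The Page-Hinkley test mentioned above can be sketched in a few lines of Python. This is the textbook form for detecting an increase in the mean of a monitored signal (for example, a rule's absolute error); the `delta` and `threshold` (lambda) values are illustrative assumptions, and how AMRules feeds the test and prunes rules on an alarm is not shown.

```python
class PageHinkley:
    """Page-Hinkley test for an increase in the mean of a signal."""

    def __init__(self, delta=0.005, threshold=5.0):
        self.delta = delta          # magnitude of change tolerated
        self.threshold = threshold  # lambda: alarm threshold
        self.reset()

    def reset(self):
        self.n = 0
        self.mean = 0.0
        self.cumulative = 0.0  # m_T = sum over t of (x_t - mean_t - delta)
        self.minimum = 0.0     # M_T = min over t of m_t

    def update(self, x):
        """Add one observation; return True if a change is signalled."""
        self.n += 1
        self.mean += (x - self.mean) / self.n  # running mean
        self.cumulative += x - self.mean - self.delta
        self.minimum = min(self.minimum, self.cumulative)
        return self.cumulative - self.minimum > self.threshold


if __name__ == "__main__":
    ph = PageHinkley()
    stream = [0.0] * 100 + [1.0] * 50  # mean shifts upward at index 100
    print(next(i for i, x in enumerate(stream) if ph.update(x)))
```

While the signal is stable, `m_T` drifts down by `delta` per step and tracks its own minimum, so the gap stays near zero; after the shift the gap grows by nearly the size of the jump per step, and with these settings the alarm fires a handful of observations after index 100.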

2015

An overview on the exploitation of time in collaborative filtering

Authors
Vinagre, J; Jorge, AM; Gama, J

Publication
WILEY INTERDISCIPLINARY REVIEWS-DATA MINING AND KNOWLEDGE DISCOVERY

Abstract
Classic Collaborative Filtering (CF) algorithms rely on the assumption that data are static, and the temporal effects in natural user-generated data are usually disregarded. These temporal effects include user preference drifts and shifts, seasonal effects, the inclusion of new users, items entering the system and old ones leaving, user and item activity rate fluctuations, and other similar time-related phenomena. These phenomena continuously change the underlying relations between users and items that recommendation algorithms essentially try to capture. In the past few years, a new generation of CF algorithms has emerged that uses the time dimension as a key factor to improve recommendation models. In this overview, we present a comprehensive analysis of these algorithms and identify important challenges to be faced in the near future. (C) 2015 John Wiley & Sons, Ltd.

2014

Collaborative Wind Power Forecast

Authors
Almeida, V; Gama, J

Publication
ADAPTIVE AND INTELLIGENT SYSTEMS, ICAIS 2014

Abstract
Several new environments are emerging that generate spatially distributed, interrelated data. These applications reinforce the importance of developing analytical systems capable of sensing the environment and receiving data from different locations. In this study, we explore collaborative methodologies in a real-world problem: wind power prediction. Wind power is considered one of the most rapidly growing sources of electricity generation all over the world. The problem consists of monitoring a network of wind farms that collaborate by sharing information in a very short-term forecasting problem. We use an auto-regressive integrated moving average (ARIMA) model, and the Symbolic Aggregate Approximation (SAX) is used to select the set of neighbours. We propose two collaborative methods. The first, based on centralized management, exchanges data points between nodes. In the second, correlated wind farms share their own ARIMA models. In the experimental work, we use one year of data from 16 wind farms. The goal is to predict the energy produced at each farm every hour over the next 6 hours. We compare the proposed methods against ARIMA models trained on each farm's own data and against the persistence model at each farm. We observe a small but consistent reduction in the root mean square error (RMSE) of the predictions.
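The abstract does not detail how SAX drives neighbour selection. The Python sketch below assumes one plausible reading: each farm's recent power series is converted to a SAX word (z-normalisation, piecewise aggregate approximation, Gaussian breakpoints), and candidate neighbours are ranked by the standard MINDIST lower-bounding distance between words. The function names, segment count, and alphabet size are illustrative, and the ARIMA forecasting step is not shown.

```python
import bisect
import math
from statistics import NormalDist, fmean, pstdev


def breakpoints(alphabet_size):
    """Cut points that split N(0, 1) into equiprobable regions."""
    return [NormalDist().inv_cdf(k / alphabet_size) for k in range(1, alphabet_size)]


def sax_word(series, n_segments, alphabet_size):
    """z-normalise, reduce with PAA, then map each segment mean to a letter.
    Assumes len(series) >= n_segments."""
    mu, sigma = fmean(series), pstdev(series)
    z = [(x - mu) / sigma if sigma > 0 else 0.0 for x in series]
    n = len(z)
    paa = [fmean(z[s * n // n_segments:(s + 1) * n // n_segments])
           for s in range(n_segments)]
    cuts = breakpoints(alphabet_size)
    return "".join(chr(ord("a") + bisect.bisect_right(cuts, v)) for v in paa)


def mindist(word_a, word_b, series_length, alphabet_size):
    """Lower-bounding SAX distance (MINDIST) between equal-length words."""
    cuts = breakpoints(alphabet_size)
    total = 0.0
    for ca, cb in zip(word_a, word_b):
        r, c = ord(ca) - ord("a"), ord(cb) - ord("a")
        if abs(r - c) > 1:  # adjacent symbols contribute zero
            total += (cuts[max(r, c) - 1] - cuts[min(r, c)]) ** 2
    return math.sqrt(series_length / len(word_a)) * math.sqrt(total)


def nearest_farms(target, others, k, n_segments=4, alphabet_size=4):
    """Hypothetical neighbour selection: rank farms in `others` (name -> series
    of the same length as `target`) by MINDIST to the target's SAX word."""
    w_t = sax_word(target, n_segments, alphabet_size)
    dists = {name: mindist(w_t, sax_word(s, n_segments, alphabet_size),
                           len(target), alphabet_size)
             for name, s in others.items()}
    return sorted(dists, key=dists.get)[:k]


if __name__ == "__main__":
    target = [1, 2, 3, 4, 5, 6, 7, 8]
    others = {"B": [2, 4, 6, 8, 10, 12, 14, 16], "C": [8, 7, 6, 5, 4, 3, 2, 1]}
    print(sax_word(target, 4, 4), nearest_farms(target, others, k=1))
```

Because SAX z-normalises each series first, a farm whose output has the same shape at a different scale (farm "B" in the usage example) maps to the same word as the target and is ranked as the closest neighbour.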
