Publications

Publications by LIAAD

2016

PAMPO: using pattern matching and pos-tagging for effective Named Entities recognition in Portuguese

Authors
Rocha, Conceicao; Jorge, Alipio; Sionara, Roberta; Brito, Paula; Pimenta, Carlos; Rezende, Solange O.;

Publication
CoRR

Abstract

2016

Detection of Fraud Symptoms in the Retail Industry

Authors
Ribeiro, RP; Oliveira, R; Gama, J;

Publication
ADVANCES IN ARTIFICIAL INTELLIGENCE - IBERAMIA 2016

Abstract
Data mining is one of the most effective methods for fraud detection. This is highlighted by 25% of organizations that have suffered from economic crimes [1]. This paper presents a case study using real-world data from a large retail company. We identify symptoms of fraud by looking for outliers. To identify the outliers and the context where outliers appear, we learn a regression tree. For a given node, we identify the outliers using the set of examples covered at that node, and the context as the conjunction of the conditions in the path from the root to the node. Surprisingly, at different nodes of the tree, we observe that some outliers disappear and new ones appear. From the business point of view, the outliers that are detected near the leaves of the tree are the most suspicious ones. These are cases of difficult detection, being observed only in a given context, defined by a set of rules associated with the node.
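The idea of context-dependent outliers can be illustrated with a minimal sketch. It is not the authors' implementation: the tree is not learned here, the path conditions are written by hand, the toy data are hypothetical, and the outlier rule (more than 3 median absolute deviations from the median) is an assumption, since the abstract does not name one.

```python
from statistics import median

def mad_outliers(values, k=3.0):
    """Return values further than k median absolute deviations from the median."""
    center = median(values)
    mad = median([abs(v - center) for v in values])
    return [v for v in values if abs(v - center) > k * mad]

def covered(records, path):
    """Keep the records that satisfy every condition on the root-to-node path."""
    return [r for r in records if all(cond(r) for cond in path)]

# Hypothetical toy data: transaction amounts by store size.
records = (
    [{"store": "small", "amount": a} for a in (10, 11, 12, 13, 30)]
    + [{"store": "large", "amount": a} for a in (100, 105, 110, 115, 400)]
)

root_path = []                                  # the root covers every example
small_path = [lambda r: r["store"] == "small"]  # a node one condition deep

root_outliers = mad_outliers([r["amount"] for r in covered(records, root_path)])
node_outliers = mad_outliers([r["amount"] for r in covered(records, small_path)])

print("root:", root_outliers)
print("store == small:", node_outliers)
```

In this toy case the amount 30 is unremarkable over the whole dataset but stands out among small stores, which mirrors the observation that new outliers appear deeper in the tree.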

2016

Hierarchical time series forecast in electrical grids

Authors
Almeida, V; Ribeiro, R; Gama, J;

Publication
Lecture Notes in Electrical Engineering

Abstract
Hierarchical time series are a topic of first-order importance. There are several applications where time series can be naturally disaggregated in a hierarchical structure using attributes such as geographical location, product type, etc. Power networks face interesting problems related to their transition to computer-aided grids. Data can be naturally disaggregated in a hierarchical structure, and it is possible to look at both single and aggregated points along the grid. In this work, we applied different hierarchical forecasting methods to such data. Three approaches are compared: two common ones, bottom-up and top-down, and a third based on the hierarchical structure of the data, the optimal regression combination. The evaluation considers short-term forecasting (24 h ahead). Additionally, we discussed the importance of the degree of correlation among series for improving forecasting accuracy. Our results demonstrated that the hierarchical approach outperforms the bottom-up approach at intermediate and high levels. At lower levels, it performs better in less homogeneous substations, i.e., substations linked to different types of customers. Additionally, its performance is comparable to that of the top-down approach at top levels. This approach proved to be an interesting tool for hierarchical data analysis. It achieves good performance at top levels, like the top-down approach, while capturing series dynamics at bottom levels, like the bottom-up approach. © Springer Science+Business Media Singapore 2016.
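The two common baselines named in this abstract can be sketched for a two-level hierarchy as follows. This is a minimal illustration, not the paper's code: the substation names and numbers are hypothetical, the top-down split by historical shares is one common variant chosen here as an assumption, and the optimal regression combination is not shown.

```python
def bottom_up(bottom_forecasts):
    """Forecast the total as the sum of the bottom-level forecasts."""
    return sum(bottom_forecasts.values())

def top_down(total_forecast, history):
    """Split a total forecast by each series' share of the historical total."""
    totals = {name: sum(values) for name, values in history.items()}
    grand_total = sum(totals.values())
    return {name: total_forecast * t / grand_total for name, t in totals.items()}

# Hypothetical load history for two substations feeding one aggregate point.
history = {"substation_A": [10, 30], "substation_B": [20, 40]}

# Bottom-up: forecast each substation separately, then add the forecasts.
total_from_bottom = bottom_up({"substation_A": 45, "substation_B": 55})

# Top-down: forecast the aggregate, then distribute it downwards.
split_from_top = top_down(200, history)

print(total_from_bottom)
print(split_from_top)
```

Bottom-up keeps the dynamics of each individual series, whereas top-down relies on the usually smoother aggregate; this is the trade-off the abstract reports the combination approach as balancing.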

2016

Sequential anomalies: a study in the Railway Industry

Authors
Ribeiro, RP; Pereira, P; Gama, J;

Publication
MACHINE LEARNING

Abstract
Concerned with predicting equipment failures, predictive maintenance has a high impact both at a technical and at a financial level. Most modern equipment has logging systems that allow us to collect a diversity of data regarding its operation and health. Using data mining models for anomaly and novelty detection enables us to explore those datasets, building predictive systems that can detect and issue an alert when a failure starts evolving, avoiding the unknown development up to breakdown. In the present case, we use a failure detection system to predict train door breakdowns before they happen using data from their logging system. We use sensor data from pneumatic valves that control the open and close cycles of a door. Still, the failure of a cycle does not necessarily indicate a breakdown. A cycle might fail due to user interaction. The goal of this study is to detect structural failures in the automatic train door system, not when there is a cycle failure, but when there are sequences of cycle failures. We study three methods for such structural failure detection: outlier detection, anomaly detection and novelty detection, using different windowing strategies. We propose a two-stage approach, where the output of a point-anomaly algorithm is post-processed by a low-pass filter to obtain a subsequence-anomaly detection. The main result of the two-level architecture is a strong impact on the false alarm rate.
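The two-stage idea can be sketched as follows, assuming the first stage has already produced one 0/1 flag per door cycle. The filter used here (a causal moving average), the window length, and the threshold are illustrative assumptions; the abstract does not state which low-pass filter or parameters the authors used.

```python
def moving_average(flags, window):
    """Causal moving average over the last `window` flags (zero-padded at the start)."""
    smoothed = []
    running = 0
    for i, flag in enumerate(flags):
        running += flag
        if i >= window:
            running -= flags[i - window]  # drop the flag that left the window
        smoothed.append(running / window)
    return smoothed

def subsequence_alarms(flags, window=5, threshold=0.5):
    """Raise an alarm only where the smoothed failure rate reaches the threshold."""
    return [s >= threshold for s in moving_average(flags, window)]

# Hypothetical point-anomaly output: one isolated failed cycle, then a run of four.
flags = [0, 0, 1, 0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0]
alarms = subsequence_alarms(flags, window=5, threshold=0.5)
alarm_cycles = [i for i, alarm in enumerate(alarms) if alarm]

print(alarm_cycles)
```

The isolated failure at cycle 2 never pushes the smoothed rate above the threshold, so it raises no alarm; only the sustained run does. This is how smoothing can reduce false alarms caused by one-off events such as user interaction.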

2016

A Survey of Predictive Modeling on Imbalanced Domains

Authors
Branco, P; Torgo, L; Ribeiro, RP;

Publication
ACM COMPUTING SURVEYS

Abstract
Many real-world data-mining applications involve obtaining predictive models using datasets with strongly imbalanced distributions of the target variable. Frequently, the least-common values of this target variable are associated with events that are highly relevant for end users (e.g., fraud detection, unusual returns on stock markets, anticipation of catastrophes, etc.). Moreover, the events may have different costs and benefits, which, when associated with the rarity of some of them in the available training data, creates serious problems for predictive modeling techniques. This article presents a survey of existing techniques for handling these important applications of predictive analytics. Although most of the existing work addresses classification tasks (nominal target variables), we also describe methods designed to handle similar problems within regression tasks (numeric target variables). In this survey, we discuss the main challenges raised by imbalanced domains, propose a definition of the problem, describe the main approaches to these tasks, propose a taxonomy of the methods, summarize the conclusions of existing comparative studies as well as some theoretical analyses of some methods, and refer to some related problems within predictive modeling.

2016

UBL: an R package for Utility-based Learning

Authors
Branco, P; Ribeiro, RP; Torgo, L;

Publication
CoRR

Abstract
