Publications - INESC TEC

Publications by João Gama

2013

Contextual Anomalies in Medical Data

Authors
Vasco, D; Rodrigues, PP; Gama, J;

Publication
2013 IEEE 26TH INTERNATIONAL SYMPOSIUM ON COMPUTER-BASED MEDICAL SYSTEMS (CBMS)

Abstract
Anomalies in data can cause many problems in data analysis. It is therefore necessary to improve data quality by detecting and eliminating errors and inconsistencies in the data, a task known as data cleaning [1]. Because detecting and correcting anomalies requires detailed domain knowledge, the involvement of domain experts is essential to the success of the cleaning process. However, given the volume of data to be processed, the process should be as automatic as possible to minimize the time spent [1]. © 2013 IEEE.


2016

Evolving Centralities in Temporal Graphs: A Twitter Network Analysis

Authors
Pereira, FSF; Amo, Sd; Gama, J;

Publication
MDM (Workshops)

Abstract

2015

Improving Mass Transit Operations by Using AVL-Based Systems: A Survey

Authors
Moreira Matias, L; Mendes Moreira, J; de Sousa, JF; Gama, J;

Publication
IEEE TRANSACTIONS ON INTELLIGENT TRANSPORTATION SYSTEMS

Abstract
Intelligent transportation systems based on automated data collection frameworks are widely used by the major transit companies around the globe. This paper describes the current state of the art in improving both planning and control in public road transportation companies using automatic vehicle location (AVL) data. By surveying this topic, the expectation is to help develop a better understanding of the nature, approaches, challenges, and opportunities with regard to these problems. The paper starts with a brief review of work on improving the network definition based on historical location-based data. Second, it presents a comprehensive review of AVL-based techniques for evaluating schedule plan (SP) reliability, discussing the existing metrics. Then, the different dimensions of improving SP reliability are presented in detail, together with the works addressing this problem. Finally, automatic control strategies are reviewed, along with the research that applies them to location-based data. A comprehensive discussion of the techniques employed is provided to encourage those who are starting research on this topic. It is important to highlight that there are still gaps in the AVL-based literature, such as the following: 1) long-term travel time prediction; 2) finding the optimal slack time; and 3) choosing the best control strategy to apply in each situation in the event of schedule instability. Hence, this paper includes introductory model formulations, reference surveys, formal definitions, and an overview of a promising area, which is of interest to any researcher, regardless of the level of expertise.


2013

Random rules from data streams

Authors
Almeida, E; Kosina, P; Gama, J;

Publication
SAC

Abstract
Existing works suggest that random inputs and random features produce good results in classification. In this paper we study the problem of generating random rule sets from data streams. One of the most interpretable and flexible models for data stream mining prediction tasks is the Very Fast Decision Rules learner (VFDR). In this work we extend the VFDR algorithm using random rules from data streams. The proposed algorithm generates several sets of rules. Each rule set is associated with a set of Natt attributes. The proposed algorithm maintains all properties required when learning from stationary data streams: online and any-time classification, processing each example once. Copyright 2013 ACM.
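The idea of tying each rule set to its own set of Natt attributes can be illustrated with a minimal sketch. It is not the VFDR extension itself: the function names, the seed, and the attribute names are hypothetical, and the sketch only assumes that each rule set is given a uniformly random subset of attributes and sees examples restricted to that subset.

```python
import random


def assign_attribute_subsets(attributes, n_rule_sets, n_att, seed=0):
    """Draw one random subset of n_att attributes for each rule set."""
    rng = random.Random(seed)  # fixed seed only to make the sketch repeatable
    return [sorted(rng.sample(attributes, n_att)) for _ in range(n_rule_sets)]


def project(example, subset):
    """Keep only the attributes that a given rule set is allowed to test."""
    return {attr: example[attr] for attr in subset}


# Hypothetical attribute names; each rule set gets its own view of an example.
attributes = ["a1", "a2", "a3", "a4"]
subsets = assign_attribute_subsets(attributes, n_rule_sets=3, n_att=2)
example = {"a1": 0.3, "a2": "x", "a3": 7, "a4": "y"}
views = [project(example, subset) for subset in subsets]
```

Each entry of `views` is what one rule learner would receive for that example, so every example is still processed once per rule set.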


2017

Acute Kidney Injury Detection: An Alarm System to Improve Early Treatment

Authors
Nogueira, AR; Ferreira, CA; Gama, J;

Publication
ISMIS

Abstract
This work aims to support the correct and early diagnosis of acute kidney injury through the application of data mining techniques. The main goal is for these techniques to be deployed in Intensive Care Units (ICUs) as an alarm system that assists health professionals in diagnosing this condition. The techniques predict the future state of a patient based on their current medical state and the type of ICU. Comparing three approaches (Markov Chain Model, Markov Chain Model ICU Specialists, and Random Forest), we conclude that the best method is the Markov Chain Model ICU Specialists.
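Predicting a patient's next state from the current state and the ICU type can be sketched with a first-order Markov chain. This is a minimal illustration, not the models evaluated in the paper: the state labels, the ICU types, and the toy histories are invented, and fitting one chain per ICU type is only one plausible reading of the "ICU Specialists" variant.

```python
from collections import Counter, defaultdict


def fit_transitions(sequences):
    """Estimate first-order transition probabilities from state sequences."""
    counts = defaultdict(Counter)
    for seq in sequences:
        for current, nxt in zip(seq, seq[1:]):
            counts[current][nxt] += 1
    return {
        state: {nxt: c / sum(following.values()) for nxt, c in following.items()}
        for state, following in counts.items()
    }


def predict_next(transitions, state):
    """Return the most probable next state, or None for an unseen state."""
    following = transitions.get(state)
    if not following:
        return None
    return max(following, key=following.get)


# Hypothetical histories of patient states, grouped by ICU type.
histories = {
    "cardiac": [["none", "none", "stage1"], ["none", "stage1", "stage2"]],
    "surgical": [["none", "none", "none", "stage1"]],
}
# One chain per ICU type (assumed interpretation of the specialist variant).
models = {icu: fit_transitions(seqs) for icu, seqs in histories.items()}
```

With these toy histories, `predict_next(models["cardiac"], "none")` returns `"stage1"` while `predict_next(models["surgical"], "none")` returns `"none"`, which shows how the same current state can lead to a different alarm decision depending on the ICU type.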


2013

Avoiding Anomalies in Data Stream Learning

Authors
Gama, J; Kosina, P; Almeida, E;

Publication
DISCOVERY SCIENCE

Abstract
The presence of anomalies in data compromises data quality and can reduce the effectiveness of learning algorithms. Standard data mining methodologies treat data cleaning as a pre-processing step before the learning task. The problem of data cleaning is exacerbated when learning in the computational model of data streams. In this paper we present a streaming algorithm for learning classification rules that is able to detect contextual anomalies in the data. Contextual anomalies are surprising attribute values in the context defined by the conditional part of the rule. For each example we compute the degree of anomaliness based on the probability of the attribute values given the conditional part of the rule covering the example. Examples with a high degree of anomaliness are signaled to the user and are not used to train the classifier. The experimental evaluation on real-world data sets shows the ability to discover anomalous examples in the data. The main advantage of the proposed method is the ability to report the context and explain why the anomaly occurs.
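The scoring step described in this abstract can be sketched as follows. The sketch is an assumption-laden illustration rather than the authors' formula: it handles a single rule and nominal attributes only, estimates P(value | rule) from Laplace-smoothed counts, and defines the degree of anomaliness as the mean of 1 - P over the attributes. The class name, the threshold, and the warm-up length are all hypothetical.

```python
from collections import Counter, defaultdict


class RuleContext:
    """Online value counts for the examples covered by one rule (hypothetical helper)."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha                  # Laplace smoothing constant (assumption)
        self.n = 0                          # number of examples used for training
        self.counts = defaultdict(Counter)  # attribute -> value -> count

    def prob(self, attr, value):
        """Smoothed estimate of P(attr = value | the rule covers the example)."""
        seen = self.counts.get(attr, Counter())
        k = len(seen) + 1                   # seen values plus one slot for unseen ones
        return (seen[value] + self.alpha) / (self.n + self.alpha * k)

    def anomaly_degree(self, example):
        """Mean of 1 - P(value | rule) over the example's attributes (assumed score)."""
        if not example:
            return 0.0
        return sum(1.0 - self.prob(a, v) for a, v in example.items()) / len(example)

    def learn(self, example):
        """Update the counts with an example accepted for training."""
        self.n += 1
        for attr, value in example.items():
            self.counts[attr][value] += 1


def process(stream, ctx, threshold=0.4, warmup=5):
    """Flag examples with a high anomaly degree and train only on the rest."""
    flagged = []
    for example in stream:
        if ctx.n >= warmup and ctx.anomaly_degree(example) > threshold:
            flagged.append(example)         # signaled to the user, not learned
        else:
            ctx.learn(example)
    return flagged
```

Because the counts are kept per rule, a flagged example can be reported together with the rule's condition and the attribute whose value had low probability, which is the kind of contextual explanation the abstract highlights.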
