Publications - INESC TEC

Publications by LIAAD

2015

Eigenspace method for spatiotemporal hotspot detection

Authors
Fanaee T, H; Gama, J;

Publication
EXPERT SYSTEMS

Abstract
Hotspot detection aims at identifying sub-groups of observations that are unexpected with respect to some baseline information. For instance, in disease surveillance, the purpose is to detect sub-regions of the spatiotemporal space where the count of reported cases of a disease (e.g. cancer) is higher than expected given the population. The state-of-the-art method for this kind of problem is space-time scan statistics. It exhaustively searches the whole space with a sliding window, looking for significant spatiotemporal clusters. Space-time scan statistics makes some restrictive assumptions about the distribution of the data, the shape of the hotspots and the quality of the data, which can be unrealistic for some non-traditional data sources. A novel methodology called EigenSpot is proposed that, instead of performing an exhaustive search over the space, tracks the changes in a space-time occurrence structure. The new approach is not only much more computationally efficient but also makes no assumptions about the data distribution, the hotspot shape or the data quality. The principal idea is that, by jointly combining the abnormal elements of the principal spatial and temporal singular vectors, the location of hotspots in the spatiotemporal space can be approximated. The experimental evaluation, on both simulated and real data sets, shows the effectiveness of the proposed method.
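The core idea summarized above, crossing abnormal entries of the principal spatial and temporal singular vectors, can be illustrated with a minimal sketch. Everything beyond that summary is an assumption:

- The input is taken to be a regions x time-steps residual matrix (observed minus expected counts).
- The principal singular vectors are approximated by plain power iteration.
- An entry is called abnormal when its magnitude exceeds the mean plus k standard deviations of all magnitudes.

The function names are hypothetical. This is not the authors' EigenSpot implementation.

```python
import math
from itertools import product


def _normalize(vec):
    """Scale vec to unit Euclidean length; return None for an all-zero vector."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        return None
    return [x / norm for x in vec]


def leading_singular_vectors(matrix, iterations=100):
    """Approximate the principal left (spatial) and right (temporal) singular
    vectors of a regions x time-steps matrix by power iteration."""
    n_rows, n_cols = len(matrix), len(matrix[0])
    v = [1.0] * n_cols  # deterministic start vector (assumption)
    u = None
    for _ in range(iterations):
        u = _normalize([sum(matrix[i][j] * v[j] for j in range(n_cols)) for i in range(n_rows)])
        if u is None:  # start vector in the null space, e.g. an all-zero matrix
            return None, None
        v = _normalize([sum(matrix[i][j] * u[i] for i in range(n_rows)) for j in range(n_cols)])
        if v is None:
            return None, None
    return u, v


def abnormal_indices(vec, k=1.0):
    """Indices whose magnitude exceeds mean + k * std of all magnitudes (assumed rule)."""
    mags = [abs(x) for x in vec]  # sign of a singular vector is arbitrary
    mean = sum(mags) / len(mags)
    std = math.sqrt(sum((m - mean) ** 2 for m in mags) / len(mags))
    return [i for i, m in enumerate(mags) if m > mean + k * std]


def approximate_hotspots(residual, k=1.0):
    """Cross abnormal spatial indices with abnormal temporal indices to get
    candidate (region, time step) hotspot cells."""
    u, v = leading_singular_vectors(residual)
    if u is None or v is None:
        return set()
    return set(product(abnormal_indices(u, k), abnormal_indices(v, k)))
```

The sketch only needs a few matrix-vector products per iteration. This contrasts with scanning every candidate window, which is the efficiency argument made in the abstract.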

2015

Exploring multi-relational temporal databases with a propositional sequence miner

Authors
Ferreira, CA; Gama, J; Costa, VS;

Publication
PROGRESS IN ARTIFICIAL INTELLIGENCE

Abstract
In this work, we introduce MuSer, a propositional framework that explores the temporal information available in multi-relational databases. At the core of this system is an encoding technique that translates the temporal information into a propositional sequence of events. Using this technique, we are able to explore the temporal information with a propositional sequence miner. With this framework, we mine each class partition individually and do not use classical aggregation strategies such as window aggregation. Moreover, the system combines feature selection and propositionalization techniques to cast a multi-relational classification problem into a propositional one. We empirically evaluate the MuSer framework on two real databases. The results show that mining each partition individually is a time- and memory-efficient strategy that generates a large number of highly discriminative patterns.
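This minimal sketch illustrates two ideas named above: encoding the timestamped rows of a related table as an ordered sequence of events per entity, and mining each class partition separately. The following are illustrative assumptions, not MuSer's actual encoding or sequence miner:

- the row layout (entity_id, timestamp, attribute, value);
- the "attribute=value" item format;
- the restriction to two-event patterns;
- the function names.

```python
from collections import defaultdict
from itertools import combinations


def encode_sequences(events):
    """events: iterable of (entity_id, timestamp, attribute, value) rows from a
    related table (assumed layout). Returns entity_id -> list of itemsets
    (frozensets of "attribute=value" items) ordered by timestamp."""
    by_entity = defaultdict(lambda: defaultdict(set))
    for entity, ts, attr, value in events:
        by_entity[entity][ts].add(f"{attr}={value}")
    return {
        entity: [frozenset(items) for _, items in sorted(ts_map.items(), key=lambda kv: kv[0])]
        for entity, ts_map in by_entity.items()
    }


def ordered_pairs(sequence):
    """Distinct 2-event patterns (a, b): item a occurs in an earlier itemset than item b."""
    found = set()
    for i, j in combinations(range(len(sequence)), 2):
        for a in sequence[i]:
            for b in sequence[j]:
                found.add((a, b))
    return found


def mine_per_class(sequences, labels, min_support):
    """Mine each class partition on its own: support is the fraction of that
    class's sequences containing the pattern, with no window aggregation."""
    partitions = defaultdict(list)
    for entity, seq in sequences.items():
        partitions[labels[entity]].append(seq)
    patterns = {}
    for cls, seqs in partitions.items():
        counts = defaultdict(int)
        for seq in seqs:
            for pattern in ordered_pairs(seq):
                counts[pattern] += 1
        patterns[cls] = {p: c / len(seqs) for p, c in counts.items() if c / len(seqs) >= min_support}
    return patterns
```

One assumed way to propositionalize the result is to turn each mined pattern into a binary present/absent column of a single flat table.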

2015

Multi-aspect-streaming tensor analysis

Authors
Fanaee T, H; Gama, J;

Publication
KNOWLEDGE-BASED SYSTEMS

Abstract
Tensor analysis is a powerful tool for multiway problems in data mining, signal processing, pattern recognition and many other areas. Nowadays, the most important challenges in tensor analysis are efficiency and adaptability, yet the majority of techniques are either not scalable or not applicable in streaming settings. One promising framework that addresses both issues simultaneously is Incremental Tensor Analysis (ITA). It includes three variants: Dynamic Tensor Analysis (DTA), Streaming Tensor Analysis (STA) and Window-based Tensor Analysis (WTA). However, ITA restricts the tensor's growth to the time mode only, which severely constrains the scalability and adaptability of the other modes. We propose a new approach called multi-aspect-streaming tensor analysis (MASTA) that relaxes this constraint and allows the tensor to evolve concurrently through all modes. The new approach is developed for analysis-only purposes. Instead of relying on expensive linear algebra techniques, it is founded on the histogram approximation concept. This brings simplicity, adaptability, efficiency and flexibility to the tensor analysis task. The empirical evaluation on various data sets from several domains shows that MASTA is a promising technique, competitive with ITA algorithms.
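The abstract does not spell out the histogram approximation. The sketch below therefore illustrates only one plausible reading, under stated assumptions:

- Each mode keeps a running histogram of the mass seen at each of its indices.
- A previously unseen index in any mode simply opens a new bin, so every mode can grow, not only time.
- A cell is estimated from the normalized histograms as a rank-one (independence) product.

The class name ModeHistograms is hypothetical. This is not the MASTA algorithm.

```python
from collections import defaultdict


class ModeHistograms:
    """Per-mode running histograms over a stream of non-negative tensor entries.
    Any mode may grow: an unseen index in any mode just opens a new bin."""

    def __init__(self, n_modes):
        self.hists = [defaultdict(float) for _ in range(n_modes)]
        self.total = 0.0

    def update(self, index, value):
        # index has one position per mode, e.g. (user, item, day)
        if len(index) != len(self.hists):
            raise ValueError("index arity must match the number of modes")
        for mode, i in enumerate(index):
            self.hists[mode][i] += value
        self.total += value

    def profile(self, mode):
        # normalized histogram: share of the total mass carried by each index
        if self.total == 0.0:
            return {}
        return {i: mass / self.total for i, mass in self.hists[mode].items()}

    def estimate(self, index):
        # rank-one (independence) reconstruction from the per-mode profiles
        if self.total == 0.0:
            return 0.0
        est = self.total
        for mode, i in enumerate(index):
            est *= self.hists[mode].get(i, 0.0) / self.total
        return est
```

Each update costs one dictionary increment per mode. No decomposition is recomputed, which is the kind of saving over linear-algebra-based updates that the abstract points to.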

2015

Improving Mass Transit Operations by Using AVL-Based Systems: A Survey

Authors
Moreira Matias, L; Mendes Moreira, J; de Sousa, JF; Gama, J;

Publication
IEEE TRANSACTIONS ON INTELLIGENT TRANSPORTATION SYSTEMS

Abstract
Intelligent transportation systems based on automated data collection frameworks are widely used by the major transit companies around the globe. This paper describes the current state of the art in improving both planning and control in public road transportation companies using automatic vehicle location (AVL) data. By surveying this topic, the aim is to help develop a better understanding of the nature, approaches, challenges, and opportunities related to these problems. The paper starts with a brief review of how historical location-based data can improve the network definition. Second, it presents a comprehensive review of AVL-based techniques for evaluating the reliability of the schedule plan (SP), discussing the existing metrics. Then, the different dimensions of improving SP reliability are presented in detail, together with the works addressing this problem. Finally, automatic control strategies are reviewed, along with the research that employs location-based data for them. A comprehensive discussion of the techniques employed is provided to encourage those who are starting research on this topic. Gaps remain in the AVL-based literature, such as: 1) long-term travel time prediction; 2) finding the optimal slack time; and 3) choosing the best control strategy to apply in each situation in the event of schedule instability. Hence, this paper includes introductory model formulations, reference surveys, formal definitions, and an overview of a promising area, which is of interest to any researcher, regardless of their level of expertise.
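As an illustration of the kind of AVL-based schedule reliability metrics discussed above, the sketch below computes two commonly used ones: on-time performance and the coefficient of variation of headways. The tolerance window (60 s early, 300 s late) and the function names are assumptions chosen for illustration. The survey's own metric definitions may differ.

```python
import statistics


def on_time_performance(scheduled, actual, early_tol=60, late_tol=300):
    """Share of stop events whose deviation (actual - scheduled, in seconds)
    lies within [-early_tol, late_tol]. Tolerances are assumed values."""
    if not scheduled or len(scheduled) != len(actual):
        raise ValueError("need equally long, non-empty scheduled/actual lists")
    on_time = sum(1 for s, a in zip(scheduled, actual) if -early_tol <= a - s <= late_tol)
    return on_time / len(scheduled)


def headway_cv(departures):
    """Coefficient of variation (std / mean) of observed headways at one stop:
    0 means perfectly regular service; larger values indicate bunching or gaps."""
    times = sorted(departures)
    headways = [later - earlier for earlier, later in zip(times, times[1:])]
    if len(headways) < 2:
        raise ValueError("need at least three departures")
    mean = statistics.mean(headways)
    if mean == 0:
        raise ValueError("all departures share one timestamp")
    return statistics.pstdev(headways) / mean
```

On-time performance suits low-frequency, timetable-driven lines. Headway regularity is usually the more meaningful measure for high-frequency lines, where passengers do not consult the timetable.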

2015

Visualization for streaming telecommunications networks

Authors
Sarmento, R; Cordeiro, M; Gama, J;

Publication
Lecture Notes in Artificial Intelligence (Subseries of Lecture Notes in Computer Science)

Abstract
Regular services in telecommunications produce massive volumes of relational data. In this work, the data produced in telecommunications is seen as a streaming network, where clients are the nodes and phone calls are the edges. Visualization techniques are required for exploratory data analysis and event detection. In social network visualization and analysis, the goal is to extract more information from the data by taking into account actors at the individual level. Previous methods relied on community aggregation, k-core decompositions and matrix feature representations to visualize and analyse massive network data. Our contribution is a technique for visualizing and analysing groups of influential actors in the network, which samples the full network with a top-k representation of the network data stream. © Springer International Publishing 2015.
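The abstract does not name the top-k algorithm used to sample the stream. As a stand-in, the sketch below uses the well-known Space-Saving counter (Metwally et al.) to keep at most k monitored callers and callees. It retains only call edges whose endpoints are both currently monitored. The class and function names are hypothetical, and this is not the authors' method.

```python
class SpaceSaving:
    """Space-Saving top-k counter: monitors at most k keys; an unseen key
    replaces the current minimum and inherits its count plus one."""

    def __init__(self, k):
        if k < 1:
            raise ValueError("k must be positive")
        self.k = k
        self.counts = {}

    def add(self, key):
        """Count one occurrence of key; return the evicted key, if any."""
        if key in self.counts:
            self.counts[key] += 1
            return None
        if len(self.counts) < self.k:
            self.counts[key] = 1
            return None
        victim = min(self.counts, key=self.counts.get)
        self.counts[key] = self.counts.pop(victim) + 1
        return victim


def top_k_network(calls, k):
    """One pass over (caller, callee) pairs; returns the monitored nodes
    (most frequent first) and directed call counts among monitored nodes."""
    counter = SpaceSaving(k)
    edges = {}
    for caller, callee in calls:
        for node in (caller, callee):
            evicted = counter.add(node)
            if evicted is not None:
                # drop edges touching a node that is no longer monitored
                edges = {edge: w for edge, w in edges.items() if evicted not in edge}
        if caller in counter.counts and callee in counter.counts:
            edges[(caller, callee)] = edges.get((caller, callee), 0) + 1
    nodes = sorted(counter.counts, key=lambda n: (-counter.counts[n], str(n)))
    return nodes, edges
```

Memory stays bounded by k nodes and the edges among them, whatever the stream length. Space-Saving may overestimate the counts of recently inserted nodes.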

2015

Validating the coverage of bus schedules: A Machine Learning approach

Authors
Mendes Moreira, J; Moreira Matias, L; Gama, J; de Sousa, JF;

Publication
INFORMATION SCIENCES

Abstract
Nowadays, every public transportation company uses Automatic Vehicle Location (AVL) systems to track the services provided by each vehicle. Such information can be used to improve operational planning. This paper describes an AVL-based evaluation framework to test whether the current Schedule Plan fits the network's operational conditions in terms of the days covered by each schedule. First, clustering is employed to group days with similar travel-time profiles (this is done separately for each route). Second, consensus clustering is used to obtain a single set of clusters for all routes. Finally, a set of rules about the content of each group is drawn based on appropriate decision variables. Each group corresponds to a different schedule, and the rules identify the days covered by each schedule. This methodology is simultaneously an evaluator of the schedules offered by the company (regarding their coverage) and an advisor on possible changes to that offer. It was tested using one year of data collected from a company operating in Porto, Portugal. The results are sound. The main contribution of this paper is a way to combine Machine Learning techniques to add a novel dimension to Schedule Plan evaluation methods: day coverage. Such an approach has no parallel in the current literature.
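The consensus step described above can be sketched with a co-association (evidence accumulation) approach, assuming the per-route clusterings of days are already available. Two days are linked when they share a cluster in more than a threshold fraction of routes, and linked days are merged into groups. The majority threshold, the union-find merging and the function name are assumptions for illustration. The paper's consensus method and its rule induction step are not reproduced here.

```python
from itertools import combinations


def consensus_groups(labelings, threshold=0.5):
    """labelings[r][d] is the cluster label of day d for route r (labels are
    only compared within a route). Days placed together in more than
    `threshold` of the routes are linked; linked days form one group."""
    if not labelings:
        raise ValueError("need at least one route clustering")
    n_days = len(labelings[0])
    if any(len(labels) != n_days for labels in labelings):
        raise ValueError("every route must label the same days")
    parent = list(range(n_days))

    def find(x):
        # union-find root lookup with path halving
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for i, j in combinations(range(n_days), 2):
        # co-association: fraction of routes that put days i and j together
        agree = sum(1 for labels in labelings if labels[i] == labels[j])
        if agree / len(labelings) > threshold:
            parent[find(i)] = find(j)

    groups = {}
    for day in range(n_days):
        groups.setdefault(find(day), []).append(day)
    return sorted(groups.values())
```

Linking is transitive (single-linkage), so one ambiguous day can chain two otherwise distinct groups together. A stricter threshold reduces that risk at the cost of more, smaller groups.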