2016
Authors
Moreira Matias, L; Cats, O; Gama, J; Mendes Moreira, J; de Sousa, JF;
Publication
APPLIED SOFT COMPUTING
Abstract
Recent advances in telecommunications have created new opportunities for monitoring public transport operations in real time. This paper presents an automatic control framework to mitigate the Bus Bunching phenomenon in real time. The framework combines distinct Machine Learning principles and methods to extract valuable information from raw location-based data. State-of-the-art tools and methodologies such as Regression Analysis, Probabilistic Reasoning and Perceptron learning with Stochastic Gradient Descent constitute the building blocks of this predictive methodology. The prediction's output is then used to select and deploy a corrective action to automatically prevent Bus Bunching. The performance of the proposed method is evaluated using data collected from 18 bus routes in Porto, Portugal over a period of one year. Simulation results demonstrate that the proposed method can potentially reduce bunching by 68% and decrease average passenger waiting times by 4.5%, without prolonging in-vehicle times. The proposed system could be embedded in a decision support system to improve control room operations. (C) 2016 Published by Elsevier B.V.
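One of the building blocks named above is Perceptron learning with Stochastic Gradient Descent. As a rough illustration only, the sketch below trains a single linear neuron online, one observation at a time, by moving its weights against the gradient of the squared error 0.5·(y - ŷ)². The class name, learning rate and toy target are hypothetical; in a bus-bunching setting one might instead feed features such as recent link travel times and predict a continuous quantity such as a headway, but the paper's actual features and model configuration are not reproduced here.

```python
from typing import Sequence


class OnlinePerceptronRegressor:
    """Hypothetical sketch: a single linear neuron trained online with SGD.

    Each call to `update` takes one observation (x, y) and performs one
    gradient step on the squared error 0.5 * (y - y_hat) ** 2.
    """

    def __init__(self, n_features: int, learning_rate: float = 0.1) -> None:
        self.weights = [0.0] * n_features
        self.bias = 0.0
        self.learning_rate = learning_rate

    def predict(self, x: Sequence[float]) -> float:
        # Linear activation: weighted sum of the features plus a bias term.
        return self.bias + sum(w * xi for w, xi in zip(self.weights, x))

    def update(self, x: Sequence[float], y: float) -> float:
        """Apply one SGD step and return the residual observed before the step."""
        residual = y - self.predict(x)
        # The gradient of 0.5 * residual**2 w.r.t. w_i is -residual * x_i.
        for i, xi in enumerate(x):
            self.weights[i] += self.learning_rate * residual * xi
        self.bias += self.learning_rate * residual
        return residual


if __name__ == "__main__":
    # Synthetic toy stream following y = 2x + 1 (not data from the paper).
    model = OnlinePerceptronRegressor(n_features=1)
    for _ in range(2000):
        for i in range(11):
            x = i / 10
            model.update([x], 2 * x + 1)
    print(round(model.predict([0.5]), 3))
```

Because each step touches only one observation, the model can keep learning as new location records arrive, which is what makes SGD a natural fit for a real-time setting.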
2016
Authors
Pinage, FA; dos Santos, EM; Portela da Gama, JMP;
Publication
WILEY INTERDISCIPLINARY REVIEWS-DATA MINING AND KNOWLEDGE DISCOVERY
Abstract
Data mining and machine learning algorithms can be employed to perform a variety of tasks. However, since most of these problems may depend on environments that change over time, performing classification tasks in dynamic environments has been a challenge in the data mining research domain over the last decades. Currently, the most common strategies in the literature for detecting changes are based on accuracy monitoring, which relies on knowing the true labels of the data in order to determine whether the classifications provided are correct. However, such feedback can be infeasible to obtain in practical problems. In this work, we present a comprehensive overview of current machine learning/data mining approaches proposed to deal with problems in dynamic environments. The objective is to highlight the main drawbacks and open issues, as well as future directions and problems worthy of investigation. In addition, we provide the definitions of the main terms used to represent this problem in the literature, such as concept drift and novelty detection. WIREs Data Mining Knowl Discov 2016, 6:156-166. doi: 10.1002/widm.1184
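The accuracy-monitoring strategy mentioned in the abstract can be illustrated with a detector in the spirit of the Drift Detection Method (DDM). It tracks the online error rate p and its standard deviation s = sqrt(p(1 - p)/n), records the minimum of p + s, and signals a warning or a drift when p + s exceeds that minimum by more than 2 or 3 recorded standard deviations. This is a minimal sketch under those assumptions; the class name, the 30-sample burn-in and the string status values are illustrative choices. Note that every call to `update` must know whether the prediction was wrong, i.e. it needs the true label, which is exactly the feedback the review points out may be unavailable in practice.

```python
import math


class DDM:
    """DDM-style monitor of a classifier's online error rate (illustrative sketch)."""

    def __init__(self, min_instances: int = 30, warning_level: float = 2.0,
                 drift_level: float = 3.0) -> None:
        self.min_instances = min_instances  # burn-in before any decision
        self.warning_level = warning_level
        self.drift_level = drift_level
        self.reset()

    def reset(self) -> None:
        self.n = 0
        self.p = 0.0          # running error rate
        self.s = 0.0          # standard deviation of the error rate
        self.p_min = math.inf
        self.s_min = math.inf

    def update(self, error: bool) -> str:
        """Feed one prediction outcome (True = misclassified) and return the status."""
        self.n += 1
        self.p += (float(error) - self.p) / self.n
        self.s = math.sqrt(self.p * (1.0 - self.p) / self.n)
        if self.n < self.min_instances:
            return "stable"
        # Remember the best (lowest) error level seen so far.
        if self.p + self.s < self.p_min + self.s_min:
            self.p_min, self.s_min = self.p, self.s
        if self.p + self.s > self.p_min + self.drift_level * self.s_min:
            self.reset()  # start learning the new concept from scratch
            return "drift"
        if self.p + self.s > self.p_min + self.warning_level * self.s_min:
            return "warning"
        return "stable"
```

For example, feeding it a stream where every 10th prediction is wrong and then one where every 2nd prediction is wrong first produces "warning" statuses and then a "drift" shortly after the change.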
2016
Authors
Moreira Matias, L; Gama, J; Mendes Moreira, J;
Publication
MACHINE LEARNING AND KNOWLEDGE DISCOVERY IN DATABASES, ECML PKDD 2016, PT III
Abstract
Learning from data streams is a challenge faced by data science professionals across multiple industries. Many of them resort to traditional Machine Learning algorithms because these are readily available in ready-to-use software libraries for big data technologies (e.g. SparkML). Nevertheless, most of these algorithms cannot cope with the key characteristics of this type of data, such as high arrival rates and/or non-stationary distributions. In this paper, we introduce a generic yet simple framework, named Concept Neurons, to fill this gap. It leverages a combination of continuous inspection schemes and residual-based updates over the model parameters and/or the model output. This framework can make most inductive learning algorithms more resistant to concept drift. Two distinct yet closely related flavors are introduced to handle different drift types. Experimental results from successful applications in distinct domains of the transportation industry are presented to show the potential of this methodology.
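The abstract does not spell out the Concept Neurons update rules, so the following is only a hypothetical sketch of the two ingredients it names. A continuous inspection scheme (here the Page-Hinkley test, a classical sequential change detector) monitors the absolute residuals of a model, and a residual-based correction adjusts the model output (here, an additive offset estimated as the running mean of residuals since the last alarm). The class names, the `delta`/`threshold` values and the offset rule are assumptions for illustration, not the authors' method.

```python
from typing import Callable


class PageHinkley:
    """Page-Hinkley test: signals an increase in the mean of a monitored signal."""

    def __init__(self, delta: float = 0.005, threshold: float = 5.0) -> None:
        self.delta = delta          # tolerated magnitude of change
        self.threshold = threshold  # alarm threshold (often called lambda)
        self.reset()

    def reset(self) -> None:
        self.n = 0
        self.mean = 0.0
        self.cum = 0.0      # m_T: cumulative sum of (x_t - mean - delta)
        self.cum_min = 0.0  # M_T: minimum of m_t seen so far

    def update(self, x: float) -> bool:
        self.n += 1
        self.mean += (x - self.mean) / self.n
        self.cum += x - self.mean - self.delta
        self.cum_min = min(self.cum_min, self.cum)
        if self.cum - self.cum_min > self.threshold:
            self.reset()
            return True
        return False


class ResidualCorrectedModel:
    """Hypothetical wrapper: base prediction plus an additive, residual-based offset."""

    def __init__(self, base_predict: Callable[[float], float],
                 detector: PageHinkley) -> None:
        self.base_predict = base_predict
        self.detector = detector
        self.offset = 0.0
        self._count = 0  # residuals averaged into the current offset

    def predict(self, x: float) -> float:
        return self.base_predict(x) + self.offset

    def update(self, x: float, y: float) -> bool:
        """Monitor the corrected error, then refresh the offset; True on drift."""
        raw_residual = y - self.base_predict(x)
        drift = self.detector.update(abs(y - self.predict(x)))
        if drift:
            # Forget the old correction; re-estimate it from post-drift residuals.
            self._count = 0
            self.offset = 0.0
        self._count += 1
        self.offset += (raw_residual - self.offset) / self._count
        return drift
```

The base model itself is never retrained here; only its output is shifted, which keeps the wrapper independent of the underlying learning algorithm.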
2016
Authors
Cordeiro, M; Sarmento, RP; Gama, J;
Publication
SOCIAL NETWORK ANALYSIS AND MINING
Abstract
The amount and variety of data generated by today's online social and telecommunication network services are changing the way researchers analyze social networks. Dealing with fast-evolving networks with millions of nodes and edges is, among other factors, the main challenge. Community detection algorithms must also be updated or improved to work under these conditions. Previous state-of-the-art algorithms based on modularity optimization (e.g. the Louvain algorithm) provide fast, efficient and robust community detection on large static networks. Nonetheless, due to the high computational complexity of these algorithms, using batch techniques on dynamic networks requires performing community detection on the whole network at each evolution step. This proves computationally expensive and unstable in terms of tracking communities. Our contribution is a novel technique that keeps the community structure up to date following the addition or removal of nodes and edges. The proposed algorithm performs a local modularity optimization that maximizes the modularity gain function only for the communities where nodes and edges were edited, keeping the rest of the network unchanged. The effectiveness of our algorithm is demonstrated by comparison with other state-of-the-art community detection algorithms with respect to Newman's Modularity, Modularity with Split Penalty, Modularity Density, number of detected communities and running time.
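Local modularity optimization works because the modularity gain of moving a node depends only on local quantities. For an unweighted undirected graph with m edges, moving a node i that sits alone in its own community into community C changes Newman's modularity by ΔQ = k_i,in/m - Σ_tot·k_i/(2m²), where k_i is the degree of i, k_i,in is the number of edges from i into C, and Σ_tot is the total degree of C (the simplified Louvain gain). The sketch below checks this formula against a full modularity recomputation on a toy graph. The function names are hypothetical, and m and Σ_tot are recomputed by a scan for brevity, whereas an incremental implementation would keep them as counters updated on each node or edge edit.

```python
from collections import defaultdict
from typing import Dict, Hashable, Set

Graph = Dict[Hashable, Set[Hashable]]   # adjacency sets, no self-loops
Partition = Dict[Hashable, int]         # node -> community id


def modularity(adj: Graph, part: Partition) -> float:
    """Newman modularity Q = sum_c [L_c / m - (d_c / 2m)^2]."""
    m = sum(len(nbrs) for nbrs in adj.values()) / 2
    internal: Dict[int, float] = defaultdict(float)  # L_c: edges inside c
    degree: Dict[int, float] = defaultdict(float)    # d_c: total degree of c
    for u, nbrs in adj.items():
        degree[part[u]] += len(nbrs)
        for v in nbrs:
            if part[u] == part[v]:
                internal[part[u]] += 0.5  # each internal edge is seen twice
    return sum(internal[c] / m - (degree[c] / (2 * m)) ** 2 for c in degree)


def insertion_gain(adj: Graph, part: Partition, node: Hashable, community: int) -> float:
    """Delta Q of moving `node`, currently alone in its community, into `community`."""
    m = sum(len(nbrs) for nbrs in adj.values()) / 2
    k_i = len(adj[node])
    k_i_in = sum(1 for v in adj[node] if part[v] == community)
    sigma_tot = sum(len(adj[u]) for u in adj if part[u] == community)
    return k_i_in / m - sigma_tot * k_i / (2 * m * m)


if __name__ == "__main__":
    # Two triangles joined by the edge 2-3, plus node 6 attached to nodes 0 and 1.
    adj = {0: {1, 2, 6}, 1: {0, 2, 6}, 2: {0, 1, 3}, 3: {2, 4, 5},
           4: {3, 5}, 5: {3, 4}, 6: {0, 1}}
    before = {0: 0, 1: 0, 2: 0, 3: 1, 4: 1, 5: 1, 6: 2}
    after = {**before, 6: 0}
    print(insertion_gain(adj, before, 6, 0))
    print(modularity(adj, after) - modularity(adj, before))
```

Both printed values agree (up to floating-point rounding), which is why an incremental algorithm can evaluate candidate moves around an edited node without recomputing Q for the whole network.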
2016
Authors
Fanaee T, H; Gama, J;
Publication
NEUROCOMPUTING
Abstract
A traffic tensor, or simply origin × destination × time, is a new data model for conventional origin/destination (O/D) matrices. Tensor models are traffic data analysis techniques that use this new data model to improve performance. Tensors outperform other models because they take both temporal and spatial fluctuations of traffic patterns into account simultaneously, yielding results that follow a more natural pattern. Three major types of fluctuations can occur in traffic tensors: mutations in the overall traffic flows, alterations to the network topology, and chaotic behaviors. How can we detect events in a system that faces all these types of fluctuations during its life cycle? Our initial studies reveal that the current design of tensor models faces some difficulties in dealing with such a realistic scenario. We propose a new hybrid tensor model, called HTM, that enhances the detection ability of tensor models by using a parallel tracking technique on the traffic's topology. However, tensor decomposition techniques such as Tucker, a key step in tensor models, require a parameter that is not only difficult to choose but also affects the model's quality. We address this problem by examining a recent technique called adjustable core size Tucker decomposition (ACS-Tucker). Experiments on simulated and real-world data sets from different domains, against several techniques, indicate that the proposed model is effective and robust; it therefore constitutes a viable alternative for the analysis of traffic tensors.
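Tucker-based tensor models start from a tensor X[o][d][t] and typically operate on its mode-n unfoldings (for instance, the higher-order SVD computes the leading left singular vectors of each unfolding to obtain the factor matrices). The stdlib-only sketch below covers just these two data-model steps: counting trips into an origin × destination × time tensor and unfolding it along one mode, using the column ordering of Kolda and Bader. The trip tuple layout, the discretization into integer time slots and the function names are hypothetical; the decomposition itself (Tucker, ACS-Tucker) and HTM's topology tracking are not shown.

```python
from typing import Iterable, List, Tuple

Tensor = List[List[List[float]]]  # indexed as X[origin][destination][time_slot]


def build_traffic_tensor(trips: Iterable[Tuple[int, int, int]],
                         n_zones: int, n_slots: int) -> Tensor:
    """Count (origin, destination, time_slot) trip records into a 3-way tensor."""
    x = [[[0.0] * n_slots for _ in range(n_zones)] for _ in range(n_zones)]
    for origin, destination, slot in trips:
        x[origin][destination][slot] += 1
    return x


def unfold(x: Tensor, mode: int) -> List[List[float]]:
    """Mode-n matricization of a 3-way tensor (Kolda-Bader column order).

    Rows are indexed by the chosen mode; of the two remaining modes,
    the lower-numbered one varies fastest along the columns.
    """
    dims = (len(x), len(x[0]), len(x[0][0]))
    other = [k for k in range(3) if k != mode]
    out = [[0.0] * (dims[other[0]] * dims[other[1]]) for _ in range(dims[mode])]
    for i in range(dims[0]):
        for j in range(dims[1]):
            for k in range(dims[2]):
                idx = (i, j, k)
                col = idx[other[0]] + idx[other[1]] * dims[other[0]]
                out[idx[mode]][col] = x[i][j][k]
    return out
```

For example, the mode-2 (time) unfolding turns each time slot into a row holding the whole flattened O/D matrix for that slot, so temporal patterns can be analyzed jointly with the spatial ones.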
2016
Authors
Morales, GDF; Bifet, A; Khan, L; Gama, J; Fan, W;
Publication
Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Francisco, CA, USA, August 13-17, 2016
Abstract
The challenge of deriving insights from the Internet of Things (IoT) has been recognized as one of the most exciting and key opportunities for both academia and industry. Advanced analysis of big data streams from sensors and devices is bound to become a key area of data mining research as the number of applications requiring such processing increases. Dealing with the evolution over time of such data streams, i.e., with concepts that drift or change completely, is one of the core issues in IoT stream mining. This tutorial is a gentle introduction to mining IoT big data streams. The first part introduces data stream learners for classification, regression, clustering, and frequent pattern mining. The second part deals with scalability issues inherent in IoT applications, and discusses how to mine data streams on distributed engines such as Spark, Flink, Storm, and Samza. © 2016 Copyright held by the owner/author(s).
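Many classification learners for data streams, notably Hoeffding trees (VFDT), decide when enough examples have been seen to split a leaf by using the Hoeffding bound: with probability 1 - δ, the true mean of a variable with range R differs from its mean over n observations by at most ε = sqrt(R² ln(1/δ) / (2n)). The small sketch below shows that split test. The function names and the tie threshold default of 0.05 are assumptions chosen for illustration; for information gain over c classes, R is commonly taken as log2(c).

```python
import math


def hoeffding_bound(value_range: float, delta: float, n: int) -> float:
    """Epsilon such that, with probability 1 - delta, the true mean lies within
    epsilon of the mean of n i.i.d. observations of a variable with this range."""
    return math.sqrt(value_range ** 2 * math.log(1.0 / delta) / (2.0 * n))


def should_split(best_gain: float, second_gain: float, value_range: float,
                 delta: float, n: int, tie_threshold: float = 0.05) -> bool:
    """Hoeffding-tree style split test (tie_threshold is an assumed default)."""
    eps = hoeffding_bound(value_range, delta, n)
    # Split if the best attribute is confidently better than the runner-up,
    # or if eps is so small that the two are effectively tied.
    return best_gain - second_gain > eps or eps < tie_threshold
```

With R = 1 and δ = 1e-7, a gain difference of 0.05 is not enough to split after 1,000 examples (ε ≈ 0.09) but is after 100,000 (ε ≈ 0.009), which is how such learners trade a little waiting for statistically justified decisions on an unbounded stream.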