Publications

Publications by João Gama

2012

A survey on learning from data streams: current and future trends

Authors
Gama, J;

Publication
Progress in AI

Abstract
Nowadays, there are applications in which the data are best modeled not as persistent tables but as transient data streams. In this article, we discuss the limitations of current machine learning and data mining algorithms. We discuss the fundamental issues in learning in dynamic environments, such as continuously maintaining learning models that evolve over time, learning and forgetting, concept drift, and change detection. Data streams produce a huge amount of data that introduces new constraints in the design of learning algorithms: limited computational resources in terms of memory, CPU power, and communication bandwidth. We present some illustrative algorithms, designed to take these constraints into account, for decision-tree learning, hierarchical clustering, and frequent pattern mining. We identify the main issues and current challenges that emerge in learning from data streams and that open research lines for further developments. © 2011 Springer-Verlag.

2011

Ubiquitous Knowledge Discovery Introduction

Authors
Gama, J; May, M;

Publication
INTELLIGENT DATA ANALYSIS

Abstract

2010

Validation of both number and coverage of bus schedules using AVL data

Authors
Matias, L; Gama, J; Moreira, JM; de Sousa, JF;

Publication
13th International IEEE Conference on Intelligent Transportation Systems, Funchal, Madeira, Portugal, 19-22 September 2010

Abstract
It is well known that the definition of bus schedules is critical for the service reliability of public transport. Several proposals have been suggested, using data from Automatic Vehicle Location (AVL) systems, to enhance the reliability of public transport. In this paper we study the optimum number of schedules and the days covered by each one of them, in order to increase reliability. We use the Dynamic Time Warping distance to calculate the similarity between two irregularly spaced data sequences of different lengths before applying data clustering techniques. The application of this methodology with K-Means to a specific bus route demonstrated that a new schedule for weekends in non-school periods could be considered, because their profile is distinct from that of the remaining days. For future work, we propose to apply this methodology to larger data sets, covering longer periods and more bus routes, in order to find a consensus cluster across all the routes. ©2010 IEEE.
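The Dynamic Time Warping distance mentioned in this abstract can be illustrated with a minimal sketch. This is the textbook dynamic-programming formulation, not the authors' implementation; the function name dtw_distance and the use of absolute difference as the local cost are assumptions made for illustration.

```python
from math import inf


def dtw_distance(a, b):
    """Textbook DTW between two numeric sequences of possibly different lengths.

    Illustrative sketch only: the local cost (absolute difference) and the
    unconstrained warping path are assumptions, not taken from the paper.
    """
    n, m = len(a), len(b)
    # cost[i][j] holds the minimal cumulative cost of aligning a[:i] with b[:j].
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of the three admissible predecessor cells.
            cost[i][j] = local + min(
                cost[i - 1][j],      # advance in a only
                cost[i][j - 1],      # advance in b only
                cost[i - 1][j - 1],  # advance in both
            )
    return cost[n][m]
```

Because the alignment may repeat elements, two sequences of different lengths can still have distance zero, for example dtw_distance([1, 2, 3], [1, 2, 2, 3]). A matrix of such pairwise distances is the kind of input a clustering step could then consume.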

2008

Knowledge discovery from sensor data

Authors
Ganguly, AR; Gama, J; Omitaomu, OA; Gaber, MM; Vatsavai, RR;

Publication
Knowledge Discovery from Sensor Data

Abstract
As sensors become ubiquitous, a set of broad requirements is beginning to emerge across high-priority applications including disaster preparedness and management, adaptability to climate change, national or homeland security, and the management of critical infrastructures. This book presents innovative solutions in offline data mining and real-time analysis of sensor or geographically distributed data. It discusses the challenges and requirements for sensor-data-based knowledge discovery solutions in high-priority applications, illustrated with case studies. It explores the fusion between heterogeneous data streams from multiple sensor types and applications in science, engineering, and security. © 2009 by Taylor & Francis Group, LLC.

2008

Introduction

Authors
Ganguly, AR; Gama, J; Omitaomu, OA; Gaber, MM; Vatsavai, RR;

Publication
Knowledge Discovery from Sensor Data

Abstract

2007

OLINDDA: A cluster-based approach for detecting novelty and concept drift in data streams

Authors
Spinosa, EJ; de Carvalho, APDF; Gama, J;

Publication
APPLIED COMPUTING 2007, VOL 1 AND 2

Abstract
A machine learning approach that is capable of treating data streams presents new challenges and enables the analysis of a variety of real problems in which concepts change over time. In this scenario, the ability to identify novel concepts as well as to deal with concept drift are two important attributes. This paper presents a technique based on the k-means clustering algorithm aimed at considering those two situations in a single learning strategy. Experiments performed with data from various domains provide insight into how clustering algorithms can be used for the discovery of new concepts in streams of data.
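The general idea of using clusters to flag unfamiliar examples in a stream can be sketched as follows. This is not the OLINDDA algorithm itself: it is a generic distance-based check, assuming the centroids were obtained beforehand (for instance by k-means on known data) and that each cluster is summarised by a hypersphere. The names fit_radii and is_unknown are hypothetical.

```python
from math import dist  # Euclidean distance, Python 3.8+


def fit_radii(centroids, examples):
    """Give each centroid a radius: the largest distance to an example assigned to it.

    Assumption for illustration: centroids come from an earlier clustering
    step (e.g. k-means) over examples of the known concept.
    """
    radii = [0.0] * len(centroids)
    for x in examples:
        # Assign the example to its nearest centroid.
        k = min(range(len(centroids)), key=lambda i: dist(centroids[i], x))
        radii[k] = max(radii[k], dist(centroids[k], x))
    return radii


def is_unknown(x, centroids, radii):
    """Return True when x lies outside every cluster hypersphere."""
    return all(dist(c, x) > r for c, r in zip(centroids, radii))
```

In a streaming setting, examples flagged as unknown could be buffered and clustered later to decide whether they form a new concept or an extension of an existing one; that second stage is left out of this sketch.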
