2014
Authors
Gaber, MM; Gama, J; Krishnaswamy, S; Gomes, JB; Stahl, F;
Publication
WILEY INTERDISCIPLINARY REVIEWS-DATA MINING AND KNOWLEDGE DISCOVERY
Abstract
In this article, we review the state-of-the-art techniques in mining data streams for mobile and ubiquitous environments. We start the review with a concise background of data stream processing, presenting the building blocks for mining data streams. In a wide range of applications, data streams must be processed on small ubiquitous devices such as smartphones and sensor nodes. Mobile and ubiquitous data mining targets these applications with tailored techniques and approaches that address scarcity of resources and mobility issues. Two categories can be identified for mobile and ubiquitous mining of streaming data: single-node and distributed; this survey covers both. Mining mobile and ubiquitous data requires algorithms that can monitor and adapt their working conditions to the available computational resources. We identify the key characteristics of these algorithms and present illustrative applications. Distributed data stream mining in the mobile environment is then discussed, presenting the Pocket Data Mining framework. Mobility of users stimulates the adoption of context-awareness in this area of research. Context-awareness and collaboration are discussed in the context of Collaborative Data Stream Mining, where agents share knowledge to learn adaptive, accurate models. Conflict of interest: The authors have declared no conflicts of interest for this article.
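As an illustration of the resource-aware behaviour the survey describes, the sketch below shows a hypothetical stream miner that monitors a free-memory estimate and coarsens its own model (here, by merging micro-clusters) when resources run low. All class names, thresholds, and the clustering scheme are illustrative assumptions, not the specific algorithms surveyed in the article.

```python
# Hypothetical sketch of a resource-aware stream miner: it periodically checks
# a free-memory estimate and reduces its own granularity (number of clusters)
# when memory is scarce. Names and thresholds are illustrative only.
import random


class ResourceAwareStreamMiner:
    def __init__(self, max_clusters=100, min_clusters=10):
        self.max_clusters = max_clusters
        self.min_clusters = min_clusters
        self.clusters = []                      # list of (centroid, count) pairs

    def process(self, value, free_memory_ratio):
        # Adapt granularity to the resources currently available:
        # shrink the model when memory is scarce, let it grow back otherwise.
        if free_memory_ratio < 0.2 and len(self.clusters) > self.min_clusters:
            self._merge_closest_clusters()
        self._assign_or_create(value)

    def _assign_or_create(self, value, radius=0.5):
        for i, (centroid, count) in enumerate(self.clusters):
            if abs(value - centroid) <= radius:
                new_count = count + 1
                new_centroid = centroid + (value - centroid) / new_count
                self.clusters[i] = (new_centroid, new_count)
                return
        if len(self.clusters) < self.max_clusters:
            self.clusters.append((value, 1))

    def _merge_closest_clusters(self):
        # Merge the two closest centroids to free memory.
        self.clusters.sort(key=lambda c: c[0])
        best_i = min(range(len(self.clusters) - 1),
                     key=lambda i: self.clusters[i + 1][0] - self.clusters[i][0])
        (c1, n1), (c2, n2) = self.clusters[best_i], self.clusters[best_i + 1]
        merged = ((c1 * n1 + c2 * n2) / (n1 + n2), n1 + n2)
        self.clusters[best_i:best_i + 2] = [merged]


if __name__ == "__main__":
    miner = ResourceAwareStreamMiner()
    for _ in range(1000):
        # Simulated sensor reading and simulated memory report.
        miner.process(random.gauss(0, 1), free_memory_ratio=random.random())
    print(len(miner.clusters), "clusters retained")
```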
2014
Authors
Gama, J;
Publication
ICT Innovations 2014 - World of Data, Ohrid, Macedonia, 1-4 October, 2014
Abstract
Machine learning studies automatic methods for the acquisition of domain knowledge with the goal of improving system performance as a result of experience. In the past two decades, machine learning research and practice have focused on batch learning, usually with small data sets. The rationale behind this practice is that examples are generated at random according to some stationary probability distribution. Most learners use a greedy, hill-climbing search in the space of models; they are prone to overfitting, local maxima, etc. Data are scarce and statistical estimates have high variance. A paradigmatic example is the TDIDT algorithm for learning decision trees [14]: as the tree grows, fewer and fewer examples are available to compute the sufficient statistics, and the increasing variance leads to model instability. Moreover, the growing process re-uses the same data, exacerbating the overfitting problem. Regularization and pruning mechanisms are therefore mandatory.
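The variance argument above can be made concrete with a toy computation: assuming roughly balanced binary splits over a fixed training set, the number of examples reaching a node shrinks geometrically with depth, so the sufficient statistics at deep nodes are estimated from very few examples. The numbers below are purely illustrative.

```python
# Toy illustration: examples available per node at each depth of a decision
# tree, assuming a fixed training set and roughly balanced binary splits.
n_examples = 10_000
for depth in range(11):
    per_node = n_examples / (2 ** depth)
    print(f"depth {depth:2d}: ~{per_node:8.1f} examples per node")
```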
2014
Authors
Duarte, J; Gama, J;
Publication
Proceedings of the 3rd International Workshop on Big Data, Streams and Heterogeneous Source Mining: Algorithms, Systems, Programming Models and Applications, BigMine 2014, New York City, USA, August 24, 2014
Abstract
The volume and velocity of data are increasing at astonishing rates. In order to extract knowledge from this huge amount of information, efficient on-line learning algorithms are needed. Rule-based algorithms produce models that are easy to understand and can be used almost directly. Ensemble methods combine several predictive models to improve the quality of prediction. In this paper, a new on-line ensemble method that combines a set of rule-based models is proposed to solve regression problems from data streams. Experimental results using synthetic and real time-evolving data streams show that the proposed method significantly improves the performance of the single rule-based learner and outperforms two state-of-the-art regression algorithms for data streams.
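A minimal sketch of the general idea, error-weighted combination of on-line base regressors, is given below. The base learner here is a trivial running-mean predictor standing in for a rule-based model, and all names and the weighting scheme are illustrative assumptions rather than the exact method proposed in the paper.

```python
# Illustrative sketch: an on-line ensemble that weights each base regressor by
# the inverse of its faded absolute error on the stream. RunningMeanModel is a
# trivial stand-in for a rule-based learner; this is not the paper's method.

class RunningMeanModel:
    """Trivial base learner: always predicts the running mean of the targets."""
    def __init__(self):
        self.n, self.mean = 0, 0.0

    def predict(self, x):
        return self.mean

    def update(self, x, y):
        self.n += 1
        self.mean += (y - self.mean) / self.n


class OnlineEnsembleRegressor:
    def __init__(self, base_models, fading=0.99):
        self.models = base_models
        self.errors = [1.0] * len(base_models)   # faded absolute errors
        self.fading = fading

    def predict(self, x):
        weights = [1.0 / (e + 1e-8) for e in self.errors]
        preds = [m.predict(x) for m in self.models]
        return sum(w * p for w, p in zip(weights, preds)) / sum(weights)

    def update(self, x, y):
        for i, m in enumerate(self.models):
            err = abs(m.predict(x) - y)
            self.errors[i] = self.fading * self.errors[i] + (1 - self.fading) * err
            m.update(x, y)


if __name__ == "__main__":
    import random
    ensemble = OnlineEnsembleRegressor([RunningMeanModel() for _ in range(3)])
    for t in range(1000):
        x, y = t, 2.0 * t + random.gauss(0, 1)   # simple evolving target
        y_hat = ensemble.predict(x)              # predict before updating
        ensemble.update(x, y)
    print("last prediction:", round(y_hat, 2))
```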
2014
Authors
Gama, J;
Publication
IEEE Symposium on Computers and Communications, ISCC 2014, Funchal, Madeira, Portugal, June 23-26, 2014
Abstract
2014
Authors
Schmitt, BM; Rudolph, KLM; Karagianni, P; Fonseca, NA; White, RJ; Talianidis, I; Odom, DT; Marioni, JC; Kutter, C;
Publication
GENOME RESEARCH
Abstract
The genetic code is an abstraction of how mRNA codons and tRNA anticodons molecularly interact during protein synthesis; the stability and regulation of this interaction remain largely unexplored. Here, we characterized the expression of mRNA and tRNA genes quantitatively at multiple time points in two developing mouse tissues. We discovered that mRNA codon pools are highly stable over development and simply reflect the genomic background; in contrast, precise regulation of tRNA gene families is required to create the corresponding tRNA transcriptomes. tRNA genes are dynamically regulated during development to generate an anticodon pool that closely corresponds to messenger RNAs. Thus, across development, the pools of mRNA codons and tRNA anticodons are invariant and highly correlated, revealing a stable molecular interaction interlocking transcription and translation.
2014
Authors
Camacho, R; Ramos, R; Fonseca, NA;
Publication
INDUCTIVE LOGIC PROGRAMMING: 23RD INTERNATIONAL CONFERENCE
Abstract
Inductive Logic Programming (ILP) is a well-known approach to Multi-Relational Data Mining. ILP systems may take a long time to analyze the data, mainly because the search (hypothesis) spaces are often very large and the evaluation of each hypothesis, which involves theorem proving, may be quite time-consuming in some domains. To address these efficiency issues of ILP systems, we propose the APIS (And ParallelISm for ILP) system, which uses results from Logic Programming AND-parallelism. The approach enables the partition of the search space into sub-spaces of two kinds: sub-spaces where clause evaluation requires theorem proving, and sub-spaces where clause evaluation is performed quite efficiently without resorting to a theorem prover. We have also defined a new type of redundancy (coverage-equivalent redundancy) that enables pruning of significant parts of the search space. The new type of pruning, together with the partition of the hypothesis space, considerably improved the performance of the APIS system. An empirical evaluation of the APIS system on standard ILP data sets shows considerable speedups without a loss of accuracy of the constructed models.
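The coverage-equivalent redundancy idea can be illustrated with a small, hypothetical sketch: if two candidate clauses cover exactly the same set of examples, only one of them needs to be kept in the search space. In the sketch, coverage sets are given directly; in a real ILP system they would be obtained through (expensive) theorem proving. All names are illustrative.

```python
# Sketch of coverage-equivalent redundancy pruning: candidate clauses with
# identical coverage sets are redundant, so only one per coverage set is kept.
# Coverage sets are precomputed here; a real system derives them by proving
# each clause against the examples.

def prune_coverage_equivalent(candidates):
    """candidates: dict clause_name -> set of covered example ids."""
    seen = {}
    kept = []
    for clause, coverage in candidates.items():
        key = frozenset(coverage)
        if key in seen:
            continue            # redundant: same coverage as a kept clause
        seen[key] = clause
        kept.append(clause)
    return kept


if __name__ == "__main__":
    candidates = {
        "c1": {1, 2, 3},
        "c2": {1, 2, 3},   # coverage-equivalent to c1 -> pruned
        "c3": {1, 2},
    }
    print(prune_coverage_equivalent(candidates))   # ['c1', 'c3']
```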