Publications

Publications by LIAAD

2005

Introduction

Authors
Gama, J; Pires, JM; Cardoso, M; Marques, NC; Cavique, L;

Publication
Progress in Artificial Intelligence, 12th Portuguese Conference on Artificial Intelligence, EPIA 2005, Covilhã, Portugal, December 5-8, 2005, Proceedings

Abstract

2005

Lecture Notes in Artificial Intelligence: Introduction

Authors
Gama, J; Moura Pires, J; Cardoso, M; Marques, NC; Cavique, L;

Publication
Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)

Abstract

2005

Extracting knowledge from databases and warehouses (EKDB&W 2005) - Introduction

Authors
Gama, J; Moura Pires, J; Cardoso, M; Marques, NC; Cavique, L;

Publication
PROGRESS IN ARTIFICIAL INTELLIGENCE, PROCEEDINGS

Abstract

2005

Learning decision trees from dynamic data streams

Authors
Gama, J; Medas, P;

Publication
JOURNAL OF UNIVERSAL COMPUTER SCIENCE

Abstract
This paper presents a system for inducing a forest of functional trees from data streams that is able to detect concept drift. The Ultra Fast Forest of Trees (UFFT) is an incremental algorithm that works online, processing each example in constant time and performing a single scan over the training examples. It uses analytical techniques to choose the splitting criteria, and information gain to estimate the merit of each possible splitting test. For multi-class problems, the algorithm builds a binary tree for each possible pair of classes, leading to a forest of trees. Decision nodes and leaves contain naive-Bayes classifiers that play different roles during the induction process. Naive-Bayes classifiers in leaves are used to classify test examples. Naive-Bayes classifiers in inner nodes play two roles: they can be used as multivariate splitting tests if chosen by the splitting criteria, and they are used to detect changes in the class distribution of the examples that traverse the node. When a change in the class distribution is detected, the entire sub-tree rooted at that node is pruned. The naive-Bayes classifiers at leaves, the splitting tests based on the outcome of naive-Bayes, and the naive-Bayes classifiers used at decision nodes for change detection are all obtained directly from the sufficient statistics required to compute the splitting criteria, with no additional computation. This is a main advantage in the context of high-speed data streams. The methodology was tested on artificial and real-world data sets. The experimental results show very good performance in comparison to a batch decision tree learner, and a high capacity to detect drift in the distribution of the examples.
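The abstract's central efficiency claim is that a leaf's naive-Bayes classifier comes "for free" from the statistics already kept for the splitting criterion, and that multi-class problems are split into one binary tree per pair of classes. The following is a minimal sketch of those two ideas, not the authors' implementation. It assumes continuous, Gaussian-distributed attributes and keeps per-class counts plus a running mean and sum of squared deviations (Welford's update); the names `LeafStats` and `class_pairs` are hypothetical.

```python
import math
from collections import defaultdict
from itertools import combinations


class LeafStats:
    """Hypothetical per-node sufficient statistics (assumption: Gaussian
    attributes): per-class counts plus, for each attribute, a running mean
    and sum of squared deviations maintained with Welford's update."""

    def __init__(self, n_attrs):
        self.n_attrs = n_attrs
        self.count = defaultdict(int)                     # class -> n examples
        self.mean = defaultdict(lambda: [0.0] * n_attrs)  # class -> attribute means
        self.m2 = defaultdict(lambda: [0.0] * n_attrs)    # class -> sums of sq. deviations

    def update(self, x, y):
        # Constant-time update per example; no training data is stored.
        self.count[y] += 1
        n = self.count[y]
        mean, m2 = self.mean[y], self.m2[y]
        for i, v in enumerate(x):
            delta = v - mean[i]
            mean[i] += delta / n
            m2[i] += delta * (v - mean[i])

    def variance(self, y, i):
        n = self.count[y]
        # Sample variance with a small floor to avoid division by zero.
        return max(self.m2[y][i] / (n - 1), 1e-9) if n > 1 else 1.0

    def predict(self, x):
        # Gaussian naive Bayes computed only from the stored statistics.
        total = sum(self.count.values())
        best, best_score = None, -math.inf
        for y, n in self.count.items():
            score = math.log(n / total)  # log prior
            for i, v in enumerate(x):
                var = self.variance(y, i)
                score += -0.5 * math.log(2 * math.pi * var) - (v - self.mean[y][i]) ** 2 / (2 * var)
            if score > best_score:
                best, best_score = y, score
        return best


def class_pairs(classes):
    # One binary problem (one tree in the forest) per unordered pair of classes.
    return list(combinations(sorted(classes), 2))


stats = LeafStats(n_attrs=2)
for x, y in [([1.0, 2.0], "a"), ([1.2, 1.8], "a"), ([5.0, 6.0], "b"), ([5.2, 6.1], "b")]:
    stats.update(x, y)
print(stats.predict([1.1, 2.1]))      # prints a
print(class_pairs(["a", "b", "c"]))   # prints [('a', 'b'), ('a', 'c'), ('b', 'c')]
```

Because the prediction reads only `count`, `mean` and `m2`, which a Gaussian-based split evaluation would also need, no extra pass over the stream is required. For k classes the pairwise decomposition yields k(k-1)/2 binary trees.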


2005

Data streams - J.UCS special issue

Authors
Aguilar Ruiz, JS; Gama, J;

Publication
JOURNAL OF UNIVERSAL COMPUTER SCIENCE

Abstract

2005

On predicting protein secondary structure from their aminoacid sequences using Inductive Logic Programming

Authors
Magalhaes, A; Fonseca, NA;

Publication
2005 PORTUGUESE CONFERENCE ON ARTIFICIAL INTELLIGENCE, PROCEEDINGS

Abstract
We address the problem of predicting the stability of secondary structure motifs of proteins given their linear sequence of residues. Our study is restricted to the prediction of helix structures. We applied an Inductive Logic Programming (ILP) system to automatically synthesise the predictive rules. ILP systems are well known for their ability to induce comprehensible models from data. Furthermore, the model's components are definitions provided by a domain expert, which makes the model more likely to help in understanding the underlying process that produced the data. Our methodology has two stages. First, the system induces a model (a set of rules) using only structural information and groupings of the residues, to avoid bias from the domain expert. In the second stage, the residues' properties are used to make the induced rules chemically/biologically appealing. We claim that this methodology is also valuable for general Structure-Activity Relationship (SAR) problems.
