Publications

Publications by LIAAD

2009

UbiDis: a Flexible and General top-level Middleware to Manage Applications in Grids and Clusters

Authors
Fonseca, NA; Dutra, I;

Publication
IBERGRID: 3RD IBERIAN GRID INFRASTRUCTURE CONFERENCE PROCEEDINGS

Abstract
From an application point of view, the Grid computing with its powerful processing power and large amounts of data storage offers the possibility to process large quantities of data, to run computationally-intensive operations, or both. For instance, in computational biological pipelines, one often has to process large quantities of data in individually computationally-intensive operations. To process this data in the Grid, hundreds, or even thousands of jobs need to be submitted and their results processed. Obviously, performing these tasks manually is unfeasible. On the other hand, developing software to this end, specifically for a single application, is unproductive because if the application changes, or the Grid submission engine changes, then the code needs to be rewritten. In this paper we present a middleware that facilitates the submission of jobs to grids (or clusters) and helps handling their results. The middleware, that we call UbiDis (Ubiquitous Distribution), copies all files necessary for running the program to the UI or front-end host (in a Grid or cluster), compiles programs on the UI or front-end (if necessary), generates and submits the jobs, and copies the outputs to the local machine. Furthermore, UbiDis transparently generates jobs to different job managers, allowing the user to easily and quickly change the location to where the jobs are submitted. Finally, we illustrate the usefulness of UbiDis using two applications.

CloseRead Abstract

2009

Comparative Study of Classification Algorithms Using Molecular Descriptors in Toxicological DataBases

Authors
Pereira, M; Costa, VS; Camacho, R; Fonseca, NA; Simoes, C; Brito, RMM;

Publication
ADVANCES IN BIOINFORMATICS AND COMPUTATIONAL BIOLOGY, PROCEEDINGS

Abstract
The rational development of new drugs is a complex and expensive process, comprising several steps. Typically, it starts by screening databases of small organic molecules for chemical structures with potential of binding to a target receptor and prioritizing the most promising ones. Only a few of these will be selected for biological evaluation and further refinement through chemical synthesis. Despite the accumulated knowledge by pharmaceutical companies that continually improve the process of finding new drugs, a myriad of factors affect the activity of putative candidate molecules in vivo and the propensity for causing adverse and toxic effects is recognized as the major hurdle behind the current "target-rich, lead-poor" scenario. In this study we evaluate the use of several Machine Learning algorithms to find useful rules to the elucidation and prediction of toxicity using ID and 2D molecular descriptors. The results indicate that: i) Machine Learning algorithms can effectively use ID molecular descriptors to construct accurate and simple models; ii) extending the set of descriptors to include 2D descriptors improve the accuracy of the models.

CloseRead Abstract

2009

Visually Guiding and Controlling the Search While Mining Chemical Structures

Authors
Pereira, M; Costa, VS; Camacho, R; Fonseca, NA;

Publication
DISTRIBUTED COMPUTING, ARTIFICIAL INTELLIGENCE, BIOINFORMATICS, SOFT COMPUTING, AND AMBIENT ASSISTED LIVING, PT II, PROCEEDINGS

Abstract
In this paper we present the work in progress on LogCHEM, an ILP based tool for discriminative interactive mining of chemical fragments. In particular, we describe the integration with a molecule visualisation software that allows the chemist to graphically control the search for interesting patterns in chemical fragments. Furthermore, we show how structured information, such as rings, functional groups like carboxyl, amine, methyl, ester, etc are integrated and exploited in LogCHEM.

CloseRead Abstract

2009

Partitional Clustering of Protein Sequences - An Inductive Logic Programming Approach

Authors
Fonseca, NA; Costa, VS; Camacho, R; Vieira, C; Vieira, J;

Publication
DISTRIBUTED COMPUTING, ARTIFICIAL INTELLIGENCE, BIOINFORMATICS, SOFT COMPUTING, AND AMBIENT ASSISTED LIVING, PT II, PROCEEDINGS

Abstract
We present a novel approach to cluster sets of protein sequences, based on Inductive Logic Programming (ILP). Preliminary results show that; the method proposed Produces understand able descriptions/explanations of the clusters. Furthermore, it can be used as a knowledge elicitation tool to explain clusters proposed by other clustering approaches, such as standard phylogenetic programs.

CloseRead Abstract

2009

BIORED - A Genetic Algorithm for Pattern Detection in Biosequences

Authors
Pereira, P; Silva, F; Fonseca, NA;

Publication
2ND INTERNATIONAL WORKSHOP ON PRACTICAL APPLICATIONS OF COMPUTATIONAL BIOLOGY AND BIOINFORMATICS (IWPACBB 2008)

Abstract
We present a new, efficient and scalable tool, named BIORED, for pattern discovery in proteomic and genomic sequences. It uses a genetic algorithm to find interesting patterns in the form of regular expressions, and a new efficient pattern matching procedure to count pattern occurrences. We studied the performance, scalability and usefulness of BIORED using several databases of biosequences. The results show that BIORED was successful in finding previously known patterns, thus an excellent indicator for its potential. BIORED is available for download under the GNU Public License at http://www.dcc.fc.up.pt/bi-ored/. An online demo is available at the same address.

CloseRead Abstract

2009

Improving the efficiency of inductive logic programming systems

Authors
Fonseca, NA; Costa, VS; Rocha, R; Camacho, R; Silva, F;

Publication
SOFTWARE-PRACTICE & EXPERIENCE

Abstract
Inductive logic programming (ILP) is a sub-field of machine learning that provides an excellent framework for multi-relational data mining applications. The advantages of ILP have been successfully demonstrated in complex and relevant industrial and scientific problems. However, to produce valuable models, ILP systems often require long running times and large amounts of memory. In this paper we address fundamental issues that have direct impact on the efficiency of ILP systems. Namely, we discuss how improvements in the indexing mechanisms of an underlying logic programming system benefit ILP performance. Furthermore, we propose novel data structures to reduce memory requirements and we suggest a new lazy evaluation technique to search the hypothesis space more efficiently. These proposals have been implemented in the April ILP system and evaluated using several well-known data sets. The results observed show significant improvements in running time without compromising the accuracy of the models generated. Indeed, the combined techniques achieve several order of magnitudes speedup in some data sets. Moreover, memory requirements are reduced in nearly half of the data sets. Copyright (C) 2008 John Wiley & Sons, Ltd.

CloseRead Abstract