Publications

Publications by Rui Camacho

2012

From Networks to Trees

Authors
Alves, M; Alves, J; Camacho, R; Soares, P; Pereira, L;

Publication
6TH INTERNATIONAL CONFERENCE ON PRACTICAL APPLICATIONS OF COMPUTATIONAL BIOLOGY & BIOINFORMATICS

Abstract
Phylogenetic networks are a useful way of displaying relationships between nucleotide or protein sequences. They differ from phylogenetic trees in that networks contain cycles, representing several possible evolutionary histories of the sequences analysed, while a tree presents a single evolutionary relationship. Networks are especially useful in studying markers with a high level of homoplasy (the same mutation happening more than once during evolution), like the control region of mitochondrial DNA (mtDNA), where the researcher does not need to commit to a single explanation for the evolution suggested by the data. However, in many instances, trees are required. One case where this happens is in the founder analysis methodology, which aims at estimating migration times of human populations throughout history and prehistory. Currently, the founder analysis methodology involves the creation of networks, from which a probable tree is extracted by hand by the researcher, a time-consuming process that is prone to errors and to ambiguous decisions by the researcher. In order to automate the founder analysis methodology, an algorithm that extracts a single probable tree from a network in a fast, systematic way is presented here.


1999

Numerical reasoning with an ILP system capable of lazy evaluation and customised search

Authors
Srinivasan, A; Camacho, R;

Publication
JOURNAL OF LOGIC PROGRAMMING

Abstract
Using problem-specific background knowledge, computer programs developed within the framework of Inductive Logic Programming (ILP) have been used to construct restricted first-order logic solutions to scientific problems. However, their approach to the analysis of data with substantial numerical content has been largely limited to constructing clauses that: (a) provide qualitative descriptions ("high", "low" etc.) of the values of response variables; and (b) contain simple inequalities restricting the ranges of predictor variables. This has precluded the application of such techniques to scientific and engineering problems requiring a more sophisticated approach. A number of specialised methods have been suggested to remedy this. In contrast, we have chosen to take advantage of the fact that the existing theoretical framework for ILP places very few restrictions on the nature of the background knowledge. We describe two issues of implementation that make it possible to use background predicates that implement well-established statistical and numerical analysis procedures. Any improvements in analytical sophistication that result are evaluated empirically using artificial and real-life data. Experiments utilising artificial data are concerned with extracting constraints for response variables in the textbook problem of balancing a pole on a cart. They illustrate the use of clausal definitions of arithmetic and trigonometric functions, inequalities, multiple linear regression, and numerical derivatives. A non-trivial problem concerning the prediction of mutagenic activity of nitroaromatic molecules is also examined. In this case, expert chemists have been unable to devise a model for explaining the data. The result demonstrates the combined use by an ILP program of logical and numerical capabilities to achieve an analysis that includes linear modelling, clustering and classification. In all experiments, the predictions obtained compare favourably against benchmarks set by more traditional quantitative methods, namely regression and neural networks.


2006

Quantitative pharmacophore models with inductive logic programming

Authors
Srinivasan, A; Page, D; Camacho, R; King, R;

Publication
MACHINE LEARNING

Abstract
Three-dimensional models, or pharmacophores, describing Euclidean constraints on the location of functional groups (like hydrophobic groups, hydrogen acceptors and donors, etc.) on small molecules, are often used in drug design to describe the medicinal activity of potential drugs (or 'ligands'). This medicinal activity is produced by interaction of the functional groups on the ligand with a binding site on a target protein. In identifying structure-activity relations of this kind there are three principal issues: (1) It is often difficult to "align" the ligands in order to identify common structural properties that may be responsible for activity; (2) Ligands in solution can adopt different shapes (or 'conformations') arising from torsional rotations about bonds. The 3-D molecular substructure is typically sought on one or more low-energy conformers; and (3) Pharmacophore models must, ideally, predict medicinal activity on some quantitative scale. It has been shown that the logical representation adopted by Inductive Logic Programming (ILP) naturally resolves many of the difficulties associated with the alignment and multi-conformation issues. However, the predictions of models constructed by ILP have hitherto only been nominal, predicting medicinal activity to be present or absent. In this paper, we investigate the construction of two kinds of quantitative pharmacophoric models with ILP: (a) Models that predict the probability that a ligand is "active"; and (b) Models that predict the actual medicinal activity of a ligand. Quantitative predictions are obtained by utilising the following statistical procedures as background knowledge: logistic regression and naive Bayes, for probability prediction; linear and kernel regression, for activity prediction. The multi-conformation issue and, more generally, the relational representation used by ILP result in some special difficulties in the use of any statistical procedure. We present the principal issues and some solutions. Specifically, using data on the inhibition of the protease Thermolysin, we demonstrate that it is possible for an ILP program to construct good quantitative structure-activity models. We also comment on the relationship of this work to other recent developments in statistical relational learning.


2006

Guest editorial

Authors
Camacho, R; King, RD; Srinivasan, A;

Publication
Machine Learning


2006

A commodity platform for Distributed Data Mining - the HARVARD System

Authors
Camacho, R;

Publication
6th Industrial Conference on Data Mining, Poster Proceedings, ICDM 2006, Leipzig, Germany, July 14-15, 2006


1991

A multi-agent environment in robotics

Authors
Oliveira, EC; Camacho, R; Ramos, C;

Publication
Robotica

Abstract
The use of Multi-Agent Systems as a Distributed AI paradigm for Robotics is the principal aim of our present work. In this paper we consider the concepts needed and a suitable architecture for a set of Agents in order to make it possible for them to cooperate in solving non-trivial tasks. Agents are sets of different software modules, each one implementing a function required for cooperation. A Monitor, Acquaintance and Self-knowledge Modules, an Agenda and an Input queue, on top of each Intelligent System, are fundamental modules that guarantee the process of cooperation, while the overall aim is devoted to the community of cooperative Agents. The Agents that our testbed concerns include Vision, Planner, World Model and the Robot itself.
