Publicacoes - INESC TEC

Publicações

Publicações por CRACS

2007

An integrated approach to feature invention and model construction for drug activity prediction

Autores
Davis, J; Costa, VS; Ray, S; Page, D;

Publicação
ACM International Conference Proceeding Series

Abstract
We present a new machine learning approach for 3D-QSAR, the task of predicting binding affinities of molecules to target proteins based on 3D structure. Our approach predicts binding affinity by using regression on substructures discovered by relational learning. We make two contributions to the state-of-the-art. First, we use multiple-instance (MI) regression, which represents a molecule as a set of 3D conformations, to model activity. Second, the relational learning component employs the "Score As You Use" (SAYU) method to select substructures for their ability to improve the regression model. This is the first application of SAYU to multiple-instance, real-valued prediction. We evaluate our approach on three tasks and demonstrate that (i) SAYU outperforms standard coverage measures when selecting features for regression, (ii) the MI representation improves accuracy over standard single feature-vector encodings and (iii) combining SAYU with MI regression is more accurate for 3D-QSAR than either approach by itself.

FecharLer Abstract

2007

Change of Representation for Statistical Relational Learning

Autores
Davis, J; Ong, I; Struyf, J; Page, EBD; Costa, VS;

Publicação
20TH INTERNATIONAL JOINT CONFERENCE ON ARTIFICIAL INTELLIGENCE

Abstract
Statistical relational learning (SRL) algorithms learn statistical models from relational data, such as that stored in a relational database. We previously introduced view learning for SRL, in which the view of a relational database can be automatically modified, yielding more accurate statistical models. The present paper presents SAYU-VISTA, an algorithm which advances beyond the initial view learning approach in three ways. First, it learns views that introduce new relational tables, rather than merely new fields for an existing table of the database. Second, new tables or new fields are not limited to being approximations to some target concept; instead, the new approach performs a type of predicate invention. The new approach avoids the classical problem with predicate invention, of learning many useless predicates, by keeping only new fields or tables (i.e., new predicates) that immediately improve the performance of the statistical model. Third, retained fields or tables can then be used in the definitions of further new fields or tables. We evaluate the new view learning approach on three relational classification tasks.

FecharLer Abstract

2007

Automatic constraint partitioning to speed up CLP execution

Autores
Pereira, MR; Vargas, PK; Stelling de Castro, MCS; Franca, FMG; Dutra, ID;

Publicação
19TH INTERNATIONAL SYMPOSIUM ON COMPUTER ARCHITECTURE AND HIGH PERFORMANCE COMPUTING, PROCEEDINGS

Abstract
Speedup in distributed executions of Constraint Logic Programming (CLP) applications are directed related to a good constraint partitioning algorithm. In this work we study different mechanisms to distribute constraints to processors based on straightforward mechanisms such as Round-Robin and Block distribution, and on a more sophisticated automatic distribution method, Grouping-Sink, that takes into account the connectivity of the constraint network graph. This aims at reducing the communication overhead in distributed environments. Our results show that Grouping-Sink is, in general, the best alternative for partitioning constraints as it produces results as good or better than Round-Robin or Blocks with low communication rate.

FecharLer Abstract

2007

GRAND: toward scalability in a grid environment

Autores
Vargas, PK; Dutra, IC; do Nascimento, VD; Santos, LAS; da Silva, LC; Geyer, CFR; Schulze, B;

Publicação
CONCURRENCY AND COMPUTATION-PRACTICE & EXPERIENCE

Abstract
One of the challenges in Grid computing research is to provide a means to automatically submit, manage, and monitor applications whose main characteristic is to be composed of a large number of tasks. The large number of explicit tasks, generally placed on a centralized job queue, can cause several problems: (1) they can quickly exhaust the memory of the submission machine; (2) they can deteriorate the response time of the submission machine due to these demanding too many open ports to manage remote execution of each of the tasks; (3) they may cause network traffic congestion if all tasks try to transfer input and/or output files across the network at the same time; (4) they make it impossible for the user to follow execution progress without an automatic tool or interface; (5) they may depend on fault-tolerance mechanisms implemented at application level to ensure that all tasks terminate successfully. In this work we present and validate a novel architectural model, GRAND (Grid Robust ApplicatioN Deployment), whose main objective is to deal with the submission of a large numbers of tasks. Copyright (c) 2006 John Wiley & Sons, Ltd.

FecharLer Abstract

2007

Grid applications in EELA

Autores
Abarca, R; Acero, A; Aparicio, G; Baeza, C; Barbera, R; Blanco, F; Blanquer, I; Carrillo, M; Luis Chaves, JL; Cofino, A; Cruz, J; Diniz, M; Domingues, G; Teresa Dova, MT; Dutra, I; Echeverria, F; Enriquez, L; Fernandez Lima, F; Fernandez Nodarse, F; Fernandez, M; Fernandez, V; Franca, F; Manuel Gutierrez, JM; Hernandez, A; Hernandez, V; Isea, R; Lima, P; Lopez, D; Mayo, R; Miguel, R; Montes, E; Ricardo Mora, HR; Moreveli Espinoza, M; Nellen, L; Pereira, G; Pezoa, R; Porto, A; Salinas, L; Silva, E; Tolla, C;

Publicação
IBERGRID: 1ST IBERIAN GRID INFRASTRUCTURE CONFERENCE PROCEEDINGS

Abstract
Several international Projects and Collaborations have emerged in the last years due to the increasing demand for Grid resources. One important aspect of these initiatives deals with the gridification of computing intensive scientific applications otherwise difficult to run efficiently. The EELA Project (E-Infrastructure shared between Europe and Latin America) is a collaboration of Latin America and Europe Institutions which has developed a performance e-Infrastructure for e-Science applications in the fields of Biomedicine, High Energy Physics, e-Learning and Climate. Nowadays many groups have already ported their applications on the EELA Grid and are obtaining first results. This paper describes the first year of EELA and the progress achieved so far.

FecharLer Abstract

2007

Storage and retrieval on P2P networks: A DHT based protocol

Autores
Bessa, S; Correia, ME; Brandao, P;

Publicação
2007 IEEE SYMPOSIUM ON COMPUTERS AND COMMUNICATIONS, VOLS 1-3

Abstract
In this paper we present the development, implementation and simulation of a simple Distributed Hash Table (DHT) protocol for a Peer to peer (P2P) overlay network inspired by small world [3, 2] concepts. Our simulation and implementation, done on the Peersim [10] java network simulator, showed results consistent with other state of the art DHT implementations with a more simple and pragmatic approach for the graph construction algorithm. We present the results Of Simulating this protocol on large P2P networks and compare them with the results obtained in Symphony [14], another small world inspired DHT.

FecharLer Abstract