2005
Authors
Burnside, ElizabethS.; Davis, Jesse; Costa, VitorSantos; Dutra, InesdeCastro; Jr., CharlesE.Kahn; Fine, Jason; Page, David;
Publication
AMIA 2005, American Medical Informatics Association Annual Symposium, Washington, DC, USA, October 22-26, 2005
Abstract
The development of large mammography databases provides an opportunity for knowledge discovery and data mining techniques to recognize patterns not previously appreciated. Using a database from a breast imaging practice containing patient risk factors, imaging findings, and biopsy results, we tested whether inductive logic programming (ILP) could discover interesting hypotheses that could subsequently be tested and validated. The ILP algorithm discovered two hypotheses from the data that were 1) judged as interesting by a subspecialty trained mammographer and 2) validated by analysis of the data itself.
1997
Authors
Costa, VS; Bianchini, R; de, CDI;
Publication
International Symposium on Parallel Symbolic Computation, Proceedings, PASCO
Abstract
Parallel logic programming systems are sophisticated examples of symbolic computing systems. They address problems such as dynamic memory allocation, scheduling irregular execution patterns, and managing different types of implicit parallelism. Most parallel logic programming systems have been developed for bus-based shared-memory architectures. The complexity of parallel logic programming systems and the large amount of data they process raises the question of whether logic programming systems can still obtain good performance on scalable architectures, such as distributed shared-memory systems. In this work we use execution-driven simulation to investigate the access patterns and caching behaviour exhibited by a parallel logic programming system, Andorra-I. We show that the system obtains reasonable performance, but that it does not scale well. By studying the behaviour of the major data structures in Andorra-I in detail, we conclude that this result is largely a consequence of the scheduling and work manipulation implementation used in the system. We also show that the Andorra-I's data structures exhibit widely-varying memory access patterns and caching behaviour, which not only depend on the number of processors, but also on the amount and type of parallelism available in the application program. Some of these data structures clearly favour invalidate-based cache coherence protocols, while others favour update-based protocols. Since most of Andorra-I's data structures are common to other parallel logic programming systems, we believe that these systems can greatly benefit from flexible coherence schemes where either the compiler can specify the protocol to be used for each data structure or the protocol can adapt to varying memory access patterns.
1999
Authors
Silva, MG; Dutra, IC; Bianchini, R; Costa, VS;
Publication
PRACTICAL ASPECTS OF DECLARATIVE LANGUAGES
Abstract
In this work we investigate how different machine settings for a hardware Distributed Shared Memory (DSM) architecture affect the performance of parallel logic programming (PLP) systems. We use execution-driven simulation of a DASH-like multiprocessor to study the impact of the cache block size, the cache size, the network bandwidth, the write buffer size, and the coherence protocol on the performance of Andorra-I, a PLP system capable of exploiting implicit parallelism in Prolog programs. Among several other observations, we find that PLP systems favour small cache blocks regardless of the coherence protocol, while they favour large cache sizes only in the case of invalidate-based coherence. We conclude that the cache block size, the cache size, the network bandwidth, and the coherence protocol have a significant impact on the performance, while the size of the write buffer is somewhat irrelevant.
2000
Authors
De Castro Dutra, I; Costa, VS; Bianchini, R;
Publication
Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Abstract
In this paper we use execution-driven simulation of a scalable multiprocessor to evaluate the performance of the Andorra-I parallel logic programming system under invalidate and update-based protocols. We use two versions of Andorra-I. One of them was originally designed for bus-based multiprocessors, while the other is optimised for scalable architectures. We study a well-known invalidate protocol and two different update-based protocols. Our results show that for our sample logic programs the update-based protocols outperform their invalidate-based counterpart for the original version of Andorra-I. In contrast, the optimised version of Andorra-I benefits the most from the invalidate-based protocol, but a hybrid update-based protocol performs as well as the invalidate protocol in most cases. We conclude that parallel logic programming systems can consistently benefit from hybrid update-based protocols. © Springer-Verlag Berlin Heidelberg 2000.
2003
Authors
Dutra, I; Page, D; Costa, VS; Shavlik, J; Waddell, M;
Publication
EURO-PAR 2003 PARALLEL PROCESSING, PROCEEDINGS
Abstract
Large-scale applications that require executing very large numbers of tasks are only feasible through parallelism. In this work we present a system that automatically handles large numbers of experiments and data in the context of machine learning. Our system controls all experiments, including re-submission of failed jobs and relies on available resource managers to spawn jobs through pools of machines. Our results show that we can manage a very large number of experiments, using a reasonable amount of idle CPU cycles, with very little user intervention.
2003
Authors
De Dutra, IC; Page, D; Costa, VS; Shavlik, J;
Publication
Lecture Notes in Artificial Intelligence (Subseries of Lecture Notes in Computer Science)
Abstract
Ensembles have proven useful for a variety of applications, with a variety of machine learning approaches. While Quinlan has applied boosting to FOIL, the widely-used approach of bagging has never been employed in ILP. Bagging has the advantage over boosting that the different members of the ensemble can be learned and used in parallel. This advantage is especially important for ILP where run-times often are high. We evaluate bagging on three different application domains using the complete-search ILP system, Aleph. We contrast bagging with an approach where we take advantage of the non-determinism in ILP search, by simply allowing Aleph to run multiple times, each time choosing "seed" examples at random.
The access to the final selection minute is only available to applicants.
Please check the confirmation e-mail of your application to obtain the access code.