1997
Authors
Lopes, LMB; Silva, FMA;
Publication
SOFTWARE-PRACTICE & EXPERIENCE
Abstract
Run-time work distribution in parallel programming systems is usually accomplished through the use of dynamic scheduling heuristics. Their sensitivity to run-time information, such as global workload, task granularity, data dependencies, and locality of information, is essential when trying to optimize performance. Adaptive schedulers that base their decisions on feedback from the system are therefore of special importance. We have developed and used a general-purpose parallel programming system, the pSystem, which has also served as a test-bed environment on which we have experimented with and studied the performance of distinct scheduling heuristics. Currently, we have two versions of the system: one based on Unix processes and the other on Solaris threads. Threads (particularly user-level threads) are usually associated with low execution overheads, since they require minimal interaction with the operating system kernel. This suggests that finer-grained parallelism may be exploited more effectively with a thread-based parallel programming system. Performance analysis of both implementations over a set of well-known benchmarks, with various schedulers, shows that threads scale better under higher system loads and/or when the granularity of the tasks being executed is below a given threshold value. This paper starts with a description of the design and implementation of the pSystem computational model, followed by a detailed description of several experiments and the analysis of their results. (C) 1997 John Wiley & Sons, Ltd.
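As a rough illustration of the granularity argument above, the following C sketch runs small tasks inline and only hands larger ones to a worker thread. It uses POSIX threads rather than the Solaris threads of the actual pSystem, and the GRAIN_THRESHOLD constant and task structure are invented for illustration; this is a minimal sketch of the idea, not pSystem code.

    /* Minimal sketch: run a task inline when its estimated granularity is
     * below a threshold, otherwise hand it to a worker thread.  Loosely in
     * the spirit of a thread-based task system; not the pSystem itself. */
    #include <pthread.h>
    #include <stdio.h>

    #define GRAIN_THRESHOLD 1000          /* hypothetical cut-off, in "work units" */

    typedef struct {
        long work_units;                  /* crude granularity estimate */
        long result;
    } task_t;

    static void run_task(task_t *t)
    {
        long acc = 0;
        for (long i = 0; i < t->work_units; i++)
            acc += i;                     /* stand-in for real work */
        t->result = acc;
    }

    static void *worker(void *arg)
    {
        run_task((task_t *)arg);
        return NULL;
    }

    /* Spawn a thread only when the task is "large enough"; small tasks are
     * cheaper to run inline than to schedule. */
    static void schedule(task_t *t, pthread_t *tid, int *spawned)
    {
        if (t->work_units < GRAIN_THRESHOLD) {
            run_task(t);
            *spawned = 0;
        } else {
            pthread_create(tid, NULL, worker, t);
            *spawned = 1;
        }
    }

    int main(void)
    {
        task_t small = { .work_units = 100 };
        task_t large = { .work_units = 1000000 };
        pthread_t tid;
        int spawned;

        schedule(&small, &tid, &spawned);     /* runs inline */
        schedule(&large, &tid, &spawned);     /* runs on a worker thread */
        if (spawned)
            pthread_join(tid, NULL);

        printf("small=%ld large=%ld\n", small.result, large.result);
        return 0;
    }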
1997
Authors
Correia, ME; Silva, F; Costa, VS;
Publication
LOGIC PROGRAMMING - PROCEEDINGS OF THE 1997 INTERNATIONAL SYMPOSIUM
Abstract
One of the advantages of logic programming is the fact that it offers many sources of implicit parallelism, such as and-parallelism and or-parallelism. Recently, research has concentrated on integrating the different forms of parallelism into a single combined system. In this work we concentrate on the problem of integrating or-parallelism and independent and-parallelism in parallel Prolog systems. We contend that previous data structures require pure recomputation and therefore do not allow for orthogonality between and-parallelism and or-parallelism. In contrast, we submit that a simpler solution, the sparse binding array, does guarantee this goal, and we explain in detail how independent and-parallelism and or-parallelism can thus be efficiently combined.
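To make the binding-array idea concrete, the C sketch below shows how each worker can keep a private array of bindings for shared conditional variables, so that different or-branches may bind the same variable to different values. The names (cond_var_t, cv_bind, and so on) are invented; this illustrates the general binding-array technique only, not the sparse binding array data structures described in the paper.

    /* Conceptual sketch of a binding array: each worker holds a private
     * array of bindings for conditional variables, indexed by an offset
     * stored in the variable itself. */
    #include <stdio.h>

    #define BA_SIZE 1024
    #define UNBOUND 0

    typedef struct {
        int ba_index;                       /* offset into a worker's binding array */
    } cond_var_t;

    typedef struct {
        long binding_array[BA_SIZE];        /* private conditional bindings */
        int  next_free;
    } worker_t;

    /* Allocate a binding-array slot for a new conditional variable. */
    static void cv_init(cond_var_t *v, worker_t *w)
    {
        v->ba_index = w->next_free++;
        w->binding_array[v->ba_index] = UNBOUND;
    }

    /* Binding and dereferencing go through the worker's own array, so
     * different or-branches can bind the same variable differently. */
    static void cv_bind(worker_t *w, cond_var_t *v, long value)
    {
        w->binding_array[v->ba_index] = value;
    }

    static long cv_deref(worker_t *w, cond_var_t *v)
    {
        return w->binding_array[v->ba_index];
    }

    int main(void)
    {
        worker_t w1 = { .next_free = 0 }, w2 = { .next_free = 0 };
        cond_var_t x;

        cv_init(&x, &w1);                   /* same slot layout in both workers */
        cv_bind(&w1, &x, 1);                /* branch explored by worker 1 */
        cv_bind(&w2, &x, 2);                /* alternative branch, worker 2 */

        printf("w1: X=%ld, w2: X=%ld\n", cv_deref(&w1, &x), cv_deref(&w2, &x));
        return 0;
    }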
1997
Authors
Costa, VS; Bianchini, R; Dutra, IdC;
Publication
International Symposium on Parallel Symbolic Computation, Proceedings, PASCO
Abstract
Parallel logic programming systems are sophisticated examples of symbolic computing systems. They address problems such as dynamic memory allocation, scheduling irregular execution patterns, and managing different types of implicit parallelism. Most parallel logic programming systems have been developed for bus-based shared-memory architectures. The complexity of parallel logic programming systems and the large amount of data they process raise the question of whether logic programming systems can still obtain good performance on scalable architectures, such as distributed shared-memory systems. In this work we use execution-driven simulation to investigate the access patterns and caching behaviour exhibited by a parallel logic programming system, Andorra-I. We show that the system obtains reasonable performance, but that it does not scale well. By studying the behaviour of the major data structures in Andorra-I in detail, we conclude that this result is largely a consequence of the scheduling and work-manipulation implementation used in the system. We also show that Andorra-I's data structures exhibit widely varying memory access patterns and caching behaviour, which depend not only on the number of processors, but also on the amount and type of parallelism available in the application program. Some of these data structures clearly favour invalidate-based cache coherence protocols, while others favour update-based protocols. Since most of Andorra-I's data structures are common to other parallel logic programming systems, we believe that these systems can greatly benefit from flexible coherence schemes in which either the compiler can specify the protocol to be used for each data structure or the protocol can adapt to varying memory access patterns.
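One concrete way in which a data structure's layout affects coherence traffic is false sharing: when per-worker fields share a cache line, every update by one processor invalidates that line for all others under an invalidate-based protocol. The C sketch below pads per-worker counters to separate cache lines; the 64-byte line size and the counter example are assumptions for illustration only, and bear no relation to the simulated architecture or the Andorra-I data structures studied in the paper.

    /* Hypothetical illustration of padding per-worker data to cache-line
     * boundaries so that updates by one worker do not invalidate the
     * cache lines of the others (false sharing). */
    #include <pthread.h>
    #include <stdalign.h>
    #include <stdio.h>

    #define NWORKERS  4
    #define LINE_SIZE 64                    /* assumed cache-line size */

    struct padded_counter {
        alignas(LINE_SIZE) long count;      /* one counter per cache line */
    };

    static struct padded_counter counters[NWORKERS];

    static void *worker(void *arg)
    {
        long id = (long)arg;
        for (long i = 0; i < 1000000; i++)
            counters[id].count++;           /* touches only this worker's line */
        return NULL;
    }

    int main(void)
    {
        pthread_t tid[NWORKERS];

        for (long i = 0; i < NWORKERS; i++)
            pthread_create(&tid[i], NULL, worker, (void *)i);
        for (long i = 0; i < NWORKERS; i++)
            pthread_join(tid[i], NULL);

        for (int i = 0; i < NWORKERS; i++)
            printf("worker %d: %ld\n", i, counters[i].count);
        return 0;
    }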
1997
Authors
Costa, VS; Bianchini, R; Dutra, IdC;
Publication
Fifth Euromicro Workshop on Parallel and Distributed Processing (PDP '97), January 22-24, 1997, University of Westminster, London, UK
Abstract
1996
Authors
Costa, VS; Correia, ME; Silva, F;
Publication
Anais do VIII International Symposium on Computer Architecture and High Performance Computing (SBAC-PAD 1996)
Abstract
1996
Authors
Costa, VS; Warren, DHD; Yang, R;
Publication
New Generation Computing
Abstract
Andorra-I is an experimental parallel Prolog system which transparently exploits both dependent and-parallelism and or-parallelism. One of the main components of Andorra-I is its preprocessor. In order to obtain efficient execution of programs in Andorra-I, the preprocessor includes a compiler for Andorra-I. The compiler includes a determinacy analyser and a clause compiler, and generates code for a specialised abstract machine. In this paper we discuss the main issues in the Andorra-I compiler, presenting its abstract instruction set and describing the algorithms used in its implementation.
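For readers unfamiliar with the term, an "abstract machine" in this setting is a software-defined instruction set executed by an emulator loop, for which the compiler generates code. The C sketch below shows a generic fetch-decode-execute loop over a tiny invented instruction set (PUSH, ADD, HALT); it conveys the general flavour only and is not the Andorra-I abstract instruction set described in the paper.

    /* Minimal fetch-decode-execute loop for a toy abstract machine.
     * The instruction set is invented for illustration. */
    #include <stdio.h>

    enum opcode { OP_PUSH, OP_ADD, OP_HALT };

    struct instr {
        enum opcode op;
        long arg;
    };

    static long execute(const struct instr *code)
    {
        long stack[64];
        int sp = 0;

        for (int pc = 0; ; pc++) {
            switch (code[pc].op) {
            case OP_PUSH:                   /* push an immediate operand */
                stack[sp++] = code[pc].arg;
                break;
            case OP_ADD:                    /* add the two topmost values */
                sp--;
                stack[sp - 1] += stack[sp];
                break;
            case OP_HALT:                   /* stop and return the result */
                return stack[sp - 1];
            }
        }
    }

    int main(void)
    {
        /* Compiled form of the expression 1 + 2. */
        struct instr program[] = {
            { OP_PUSH, 1 }, { OP_PUSH, 2 }, { OP_ADD, 0 }, { OP_HALT, 0 }
        };
        printf("%ld\n", execute(program));
        return 0;
    }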