Publicacoes - INESC TEC

Publicações

Publicações por HASLab

2009

Search Optimizations in Structured Peer-to-peer Systems

Autores
Lopes, N; Baquero, C;

Publicação
2009 18TH IEEE INTERNATIONAL WORKSHOP ON ENABLING TECHNOLOGIES: INFRASTRUCTURES FOR COLLABORATIVE ENTERPRISES

Abstract
DHT systems are structured overlay networks capable of using P2P resources as a scalable platform for very large data storage applications. However, their efficiency expects a level of uniformity in the association of data to index keys that is often not present in inverted indexes. Index data tends to follow non-uniform distributions, often power law distributions, creating intense local storage hotspots and network bottlenecks on specific hosts. Current techniques like caching cannot, alone, cope with this issue. We propose a distributed data structure based on a decentralized balanced tree to balance storage data and network load more uniformly across hosts. The results show that the data structure is capable of balancing resources, in particular when performing multiple keyword searches.

FecharLer Abstract

2009

Probabilistic Estimation of Network Size and Diameter

Autores
Cardoso, JCS; Baquero, C; Almeida, PS;

Publicação
LADC: 2009 4TH LATIN-AMERICAN SYMPOSIUM ON DEPENDABLE COMPUTING

Abstract
Determining the size of a network and its diameter are important functions in distributed systems, as there are a number of algorithms which rely on such parameters, or at least on estimates of those values. The Extrema Propagation technique allows the estimation of the size of a network in a fast, distributed and fault tolerant manner. The technique was previously studied in a simulation setting where rounds advance synchronously and where there is no message loss. This work presents two main contributions. The first, is the study of the Extrema Propagation technique under asynchronous rounds and integrated in the Network Friendly Epidemic Multicast (NeEM) framework. The second, is the evaluation of a diameter estimation technique associated with the Extrema Propagation. This study also presents a small enhancement to the Extrema Propagation in terms of communication cost and points out some other possible enhancements. Results show that there is a clear trade-off between time and communication that must be considered when configuring the protocol-a faster convergence time implies a higher communication cost Results also show that its possible to reduce the total communication cost by more than 18% using a simple approach. The diameter estimation technique is shown to have a relative error of less than 10% even when using a small sample of nodes.

FecharLer Abstract

2009

Fast Estimation of Aggregates in Unstructured Networks

Autores
Baquero, C; Almeida, PS; Menezes, R;

Publicação
ICAS: 2009 FIFTH INTERNATIONAL CONFERENCE ON AUTONOMIC AND AUTONOMOUS SYSTEMS

Abstract
Aggregation of data values plays an important role on distributed computations, in particular over peer-to-peer and sensor networks, as it can provide a summary of some global system property and direct the actions of self-adaptive distributed algorithms. Examples include using estimates of the network Size to dimension distributed hash tables or estimates of the average system load to direct load-balancing. Distributed aggregation using non-idempotent functions, like sums, is not trivial as it is not easy to prevent a given value from being accounted for multiple times; this is especially the case if no centralized algorithms or global identifiers can be used. This paper introduces Extrema Propagation, a probabilistic technique for distributed estimation of the sum of positive real numbers. The technique relies on the exchange of duplicate insensitive messages and can be applied in flood and/or epidemic settings, where multi-path routing occurs; it is tolerant of message loss; it is fast, as the number of message exchange steps equals the diameter; and it is fully, distributed, with no single point of failure and the result produced at every node.

FecharLer Abstract

2009

Forby: Providing Groupware Features Relying on Distributed File System Event Dissemination

Autores
Sousa, P; Preguica, N; Baquero, C;

Publicação
GROUPWARE-DESIGN: IMPLEMENTATION, AND USE, PROCEEDINGS

Abstract
Intensive research and development has been conducted in the design and creation of groupware systems for distributed users. While for some activities, these groupware tools are widely used, for other activities the impact in the groupware community has been smaller and can be improved. One reason for this fact is that the mostly common used applications do not support collaborative features and users are reluctant to change to a different application. In this paper we discuss how available file system mechanisms can help to address this problem. In this context, we present Forby, a system that allows to provide groupware features to distributed users by combining filesystem monitoring and distributed event dissemination. To demonstrate our solution, we present three systems that rely on Forby for providing groupware features to users running unmodified applications.

FecharLer Abstract

2009

Fault-Tolerant Aggregation by Flow Updating

Autores
Jesus, P; Baquero, C; Almeida, PS;

Publicação
DISTRIBUTED APPLICATIONS AND INTEROPERABLE SYSTEMS, PROCESSINGS

Abstract
Data aggregation plays an important role in the design of scalable systems, allowing the determination of meaningful system-wide properties to direct the execution of distributed applications. In the particular case of wireless sensor networks, data collection is often only practicable if aggregation is performed. Several aggregation algorithms have been proposed in the last few years, exhibiting different properties in terms of accuracy, speed and communication tradeoffs. Nonetheless, existing approaches are found lacking in terms of fault tolerance. In this paper, we introduce a novel fault-tolerant averaging based data aggregation algorithm. It tolerates substantial message loss (link failures), while competing algorithms in the same class can be affected by a Single lost message. The algorithm is based on manipulating flows (in the graph theoretical sense), that are updated using idempotent messages, providing it with unique robustness capabilities. Furthermore, evaluation results obtained by comparing it with other averaging approaches have revealed that it outperforms them in terms of time and message complexity.

FecharLer Abstract

2009

Shortcut fusion rules for the derivation of circular and higher-order monadic programs

Autores
Pardo, A; Fernandes, JP; Saraiva, J;

Publicação
Proceedings of the 2009 ACM SIGPLAN Symposium on Partial Evaluation and Semantics-based Program Manipulation, PEPM 2009, Savannah, GA, USA, January 19-20, 2009

Abstract
Functional programs often combine separate parts using intermediate data structures for communicating results. These programs are modular, easier to understand and maintain, but suffer from inefficiencies due to the generation of those gluing data structures. To eliminate such redundant data structures, some program transformation techniques have been proposed. One such technique is shortcut fusion, and has been studied in the context of both pure and monadic functional programs. Recently, we have extended standard shortcut fusion: in addition to intermediate structures, the program parts may now communicate context information, and it still is possible to eliminate those structures. This is achieved by transforming the original function composition into a circular program. This new technique, however, has been studied in the context of purely functional programs only. In this paper, we propose an extension to this new form of fusion,but in the context of monadic programming: we derive monadic circular p ograms from strict ones, maintaining the global effects. Later, the circularities in the derived programs are traded by highorder definitions, using a well-known program transformation technique. We finally obtain very efficient deforested programs. An important feature of our extensions is that they can beuniformly defined for a wide class of data types and monads, using generic calculation rules. ©2009 ACM.

FecharLer Abstract