2013
Authors
Oliveira, JN;
Publication
INTERNATIONAL JOURNAL OF FOUNDATIONS OF COMPUTER SCIENCE
Abstract
The evolution from non-deterministic to weighted automata represents a shift from qualitative to quantitative methods in computer science. This trend calls for a language able to reconcile quantitative reasoning with formal logic and set theory, which have for so many years supported qualitative reasoning. Such a lingua franca should be typed, polymorphic, diagrammatic, calculational, and easy to blend with conventional notation. This paper puts forward typed linear algebra as a candidate notation for such a unifying role. This notation, which emerges from regarding matrices as morphisms of suitable categories, is put to work in describing weighted automata as coalgebras in such categories. Some attention is paid to the interface between the index-free (categorial) language of matrix algebra and the corresponding index-wise, set-theoretic notation.
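As a quick illustration of the quantitative reasoning this notation organizes (a minimal sketch, not taken from the paper; the two-state automaton, its weights, and the names i, M and f are invented for the example), a weighted automaton can be given by an initial row vector, one transition matrix per input letter, and a final column vector, so that the weight of a word is computed purely by matrix multiplication:

```python
import numpy as np

# Weighted automaton over the (+, x) semiring with two states.
# i : initial weights (row vector), M : one matrix per letter,
# f : final weights (column vector). All values are illustrative.
i = np.array([[1.0, 0.0]])              # start in state 0
M = {
    "a": np.array([[0.5, 0.5],
                   [0.0, 1.0]]),
    "b": np.array([[1.0, 0.0],
                   [0.3, 0.7]]),
}
f = np.array([[0.0],
              [1.0]])                   # only state 1 is final

def weight(word):
    """Weight of a word: i . M[c1] . ... . M[ck] . f."""
    v = i
    for c in word:
        v = v @ M[c]
    return (v @ f).item()

print(weight("ab"))                     # 0.35
```

Typed linear algebra assigns dimension-indexed types to exactly these compositions, so that ill-formed multiplications are rejected at the type level rather than failing index-wise.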
2013
Authors
Macedo, HD; Oliveira, JN;
Publication
SCIENCE OF COMPUTER PROGRAMMING
Abstract
Interested in formalizing the generation of fast-running code for linear algebra applications, the authors show how an index-free, calculational approach to matrix algebra can be developed by regarding matrices as morphisms of a category with biproducts. This shifts the traditional view of matrices as indexed structures to a type-level perspective analogous to that of the pointfree algebra of programming. The derivation of fusion, cancellation and abide laws from the biproduct equations makes it easy to calculate algorithms implementing matrix multiplication, the central operation of matrix algebra, ranging from its divide-and-conquer version to its vectorized implementation. From errant attempts to learn how particular products and coproducts emerge from biproducts, not only is blocked matrix algebra rediscovered, but a way of extending other operations (e.g. Gaussian elimination) blockwise, in a calculational style, is also found. The prospect of building biproduct-based type checkers for computer algebra systems such as MATLAB (TM) is also considered.
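The divide-and-conquer multiplication mentioned above follows from the 2x2 block decomposition that biproducts justify. A minimal sketch in Python (illustrative only; the paper works calculationally rather than in code, and the power-of-two size restriction is an assumption of this sketch):

```python
import numpy as np

def block_mult(A, B):
    """Divide-and-conquer matrix multiplication from the 2x2 block
    decomposition [[A11, A12], [A21, A22]]; square matrices with
    power-of-two sizes assumed, to keep the sketch short."""
    n = A.shape[0]
    if n == 1:
        return A * B
    h = n // 2
    A11, A12, A21, A22 = A[:h, :h], A[:h, h:], A[h:, :h], A[h:, h:]
    B11, B12, B21, B22 = B[:h, :h], B[:h, h:], B[h:, :h], B[h:, h:]
    C11 = block_mult(A11, B11) + block_mult(A12, B21)
    C12 = block_mult(A11, B12) + block_mult(A12, B22)
    C21 = block_mult(A21, B11) + block_mult(A22, B21)
    C22 = block_mult(A21, B12) + block_mult(A22, B22)
    return np.block([[C11, C12], [C21, C22]])

A, B = np.random.rand(4, 4), np.random.rand(4, 4)
assert np.allclose(block_mult(A, B), A @ B)
```

The four-summand recursion is precisely the blocked form that, in the paper, is calculated from the fusion and cancellation laws of the biproduct rather than postulated.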
2013
Authors
Oliveira, JN; Ferreira, MA;
Publication
IEEE TRANSACTIONS ON SOFTWARE ENGINEERING
Abstract
Relational algebra offers software engineering the same degree of conciseness and calculational power that linear algebra offers other engineering disciplines. Binary relations play the role of matrices, with similar emphasis on multiplication and transposition. This matches Alloy's motto "everything is a relation" and the relational basis of the Algebra of Programming (AoP). Altogether, it provides a simple and coherent approach to checking and calculating programs from abstract models. In this paper, we put Alloy and the Algebra of Programming together in a case study originating from the Verifiable File System mini-challenge put forward by Joshi and Holzmann: verifying the refinement of an abstract file store model into a journaled (FLASH) data model catering to wear leveling and recovery from power loss. Our approach relies on diagrams to express typed assertions graphically. It interweaves model checking (in Alloy) with calculational proofs in a way that offers the best of both worlds. This provides ample evidence of the positive impact on software verification of Alloy's focus on relations, complemented by induction-free proofs about data structures such as stores and lists.
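To make the "binary relations as matrices" analogy concrete (an illustrative sketch, not from the paper; the parent/grandparent relation is invented for the example), relations over finite carriers can be encoded as Boolean matrices, with composition as Boolean matrix multiplication and converse as transposition:

```python
import numpy as np

# Binary relations on finite carriers as Boolean matrices:
# R[a, b] holds iff a is related to b.
def compose(R, S):
    """(R ; S)[a, c] holds iff there is some b with R[a, b] and S[b, c]."""
    return (R.astype(int) @ S.astype(int)) > 0

def converse(R):
    """Relational converse is matrix transposition."""
    return R.T

parent = np.array([[0, 1, 0],           # 0 is a parent of 1,
                   [0, 0, 1],           # 1 is a parent of 2
                   [0, 0, 0]], dtype=bool)
grandparent = compose(parent, parent)   # relates 0 to 2
child = converse(parent)
print(grandparent[0, 2], child[1, 0])   # True True
```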
2013
Authors
Alonso, AN; Pereira, J;
Publication
SAC
Abstract
Although optimistic concurrency control protocols have increasingly been used in distributed database management systems, they imply a trade-off between the number of transactions that can be executed concurrently, and hence peak throughput, and the number of transactions aborted due to conflicts. We propose a novel optimistic concurrency control mechanism that controls the transaction abort rate by minimizing the time during which transactions are vulnerable to abort, without compromising throughput. Briefly, we throttle transaction execution with an adaptive mechanism based on the state of the transaction queues, while allowing out-of-order execution based on expected transaction latency. Preliminary evaluation shows that this provides a substantial improvement in committed transaction throughput.
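The abstract does not spell out the throttling policy, so the following is a hypothetical sketch of the general idea only (the class name, watermark threshold and proportional-delay policy are all assumptions of this illustration, not the paper's mechanism): admission of new transactions is delayed when the certification queue grows, shortening the window during which a transaction is vulnerable to abort:

```python
import time
from collections import deque

class AdaptiveThrottle:
    """Hypothetical illustration: delay admission when the
    certification backlog exceeds a watermark, proportionally
    to the excess. Thresholds are invented for the sketch."""

    def __init__(self, high_watermark=64, base_delay=0.001):
        self.queue = deque()
        self.high_watermark = high_watermark
        self.base_delay = base_delay

    def admit(self, txn):
        backlog = len(self.queue)
        if backlog > self.high_watermark:
            # Throttle proportionally to the backlog above the watermark.
            time.sleep(self.base_delay * (backlog - self.high_watermark))
        self.queue.append(txn)

    def certify_next(self):
        return self.queue.popleft() if self.queue else None

throttle = AdaptiveThrottle()
for t in range(100):
    throttle.admit(t)
while throttle.certify_next() is not None:
    pass
```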
2013
Authors
Paulo, J; Pereira, J;
Publication
SoCC
Abstract
Deduplication is now widely accepted as an efficient technique for reducing storage costs at the expense of some processing overhead, and is increasingly sought in primary storage systems [7, 8] and in cloud computing infrastructures holding Virtual Machine (VM) volumes [2, 1, 5]. Besides the large number of duplicates found across static VM images [3], dynamic general-purpose data from VM volumes allows space savings from 58% up to 80% if deduplicated in a cluster-wide fashion [1, 4]. However, some of these volumes persist latency-sensitive data, which limits the overhead that can be incurred in I/O operations. This problem must therefore be addressed by a cluster-wide distributed deduplication system for such primary storage volumes.
2013
Authors
Paulo, J; Reis, P; Pereira, J; Sousa, A;
Publication
COMPUTER SYSTEMS SCIENCE AND ENGINEERING
Abstract
Deduplication has proven to be a valuable technique for eliminating duplicate data in backup and archival systems and is now being applied to new storage environments with distinct requirements and performance trade-offs. Namely, deduplication systems are now targeting large-scale cloud computing storage infrastructures holding unprecedented data volumes with a significant share of duplicate content. It is, however, hard to assess the usefulness of deduplication in particular settings and to determine which techniques provide the best results. In fact, existing disk I/O benchmarks follow simplistic approaches for generating data content, leading to unrealistic amounts of duplicates that do not evaluate deduplication systems accurately. Moreover, deduplication systems are now targeting heterogeneous storage environments, with specific duplication ratios, which benchmarks must also simulate. We address these issues with DEDISbench, a novel micro-benchmark for evaluating the disk I/O performance of block-based deduplication systems. As the main contribution, DEDISbench generates content by following realistic duplicate content distributions extracted from real datasets. As a second contribution, we analyze and extract the duplicate distributions found in three real storage systems, showing that DEDISbench can easily simulate several workloads. The usefulness of DEDISbench is shown by comparing it with the Bonnie++ and IOzone open-source disk I/O micro-benchmarks in assessing two open-source deduplication systems, Opendedup and Lessfs, using Ext4 as a baseline. Our results lead to novel insights into the performance of these file systems.
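A content generator in the spirit described above might look as follows (an illustrative sketch only, not DEDISbench's actual generator; the duplicate ratio, pool size and Zipf-like skew are assumptions standing in for the distributions extracted from real datasets):

```python
import os
import random

def generate_blocks(n_blocks, block_size=4096, dup_ratio=0.58, pool_size=128):
    """Illustrative sketch: a fraction dup_ratio of writes repeats
    content drawn from a pool of 'hot' blocks under a skewed
    (Zipf-like) distribution; the remaining writes carry unique
    random content. All parameters are invented for the example."""
    pool = [os.urandom(block_size) for _ in range(pool_size)]
    weights = [1.0 / (rank + 1) for rank in range(pool_size)]  # Zipf-like skew
    for _ in range(n_blocks):
        if random.random() < dup_ratio:
            yield random.choices(pool, weights=weights)[0]   # duplicate content
        else:
            yield os.urandom(block_size)                     # unique content

blocks = list(generate_blocks(1000))
```

Writing such blocks through the file system under test exercises a deduplication engine with a controllable duplicate ratio, which is the property the benchmark varies across workloads.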