Publications

Publications by Francisco Miguel Cruz

2014

Workload-aware table splitting for NoSQL

Authors
Cruz, F; Maia, F; Oliveira, R; Vilaça, R;

Publication
Proceedings of the ACM Symposium on Applied Computing

Abstract
Massive-scale data stores, which exhibit highly desirable scalability and availability properties, are becoming pivotal systems in today's infrastructures. The scalability achieved by these data stores is anchored on data independence: there is no clear relationship between data, and atomic inter-node operations are not a concern. This assumption about the data allows aggressive data partitioning. In particular, data tables are horizontally partitioned and spread across nodes for load balancing. However, in current versions of these data stores, partitioning is either a manual process or automated but based simply on table size. We argue that size-based partitioning does not lead to acceptable load balancing because it ignores data access patterns, namely data hotspots. Moreover, manual data partitioning is cumbersome and typically infeasible in large-scale scenarios. In this paper we propose an automated table splitting mechanism that takes the system workload into account. We evaluate this mechanism, showing that it is simple, non-intrusive and effective. Copyright 2014 ACM.

2010

On the Expressiveness and Trade-Offs of Large Scale Tuple Stores

Authors
Vilaça, R; Cruz, F; Oliveira, RC;

Publication
On the Move to Meaningful Internet Systems, OTM 2010 - Confederated International Conferences: CoopIS, IS, DOA and ODBASE, Hersonissos, Crete, Greece, October 25-29, 2010, Proceedings, Part II

2011

Assessing NoSQL Databases for Telecom Applications

Authors
Cruz, F; Gomes, P; Oliveira, R; Pereira, J;

Publication
13th IEEE International Conference on Commerce and Enterprise Computing (CEC 2011)

Abstract
The constant evolution of access technologies is making Internet access more ubiquitous, faster, better and cheaper. In connection with the proliferation of Internet access, Cloud Computing is changing the way users look at data, moving from local applications and installations to remote services accessible from any device. This new paradigm presents numerous opportunities that even traditional businesses like telecoms cannot ignore, in particular by enabling new and more cost-effective solutions to old problems. The work presented in this paper provides a detailed description of how a telecom application can be migrated to a NoSQL database. In particular, it points out the necessary change in how we reason about data, as well as in the data structures that support it, in order to take full advantage of Cloud Computing. In addition, we present a preliminary evaluation of different data persistency paradigms based on a fully tunable simulation platform that mimics the operation of a telecom business.

2010

On the Expressiveness and Trade-Offs of Large Scale Tuple Stores

Authors
Vilaça, R; Cruz, F; Oliveira, R;

Publication
On the Move to Meaningful Internet Systems: OTM 2010, Part II

Abstract
Massive-scale distributed computing is a challenge at our doorstep. The current exponential growth of data calls for massive-scale storage and processing capabilities. This is being acknowledged by several major Internet players, who are embracing the cloud computing model and offering first-generation distributed tuple stores. Having all started from similar requirements, these systems ended up providing a similar service: a simple tuple store interface that allows applications to insert, query, and remove individual elements. Furthermore, while availability is commonly assumed to be sustained by the massive scale itself, data consistency and freshness are usually severely hindered. By doing so, these services focus on a specific, narrow trade-off between consistency, availability, performance, scale, and migration cost that is much less attractive to common business needs. In this paper we introduce Data Droplets, a novel tuple store that shifts the current trade-off towards the needs of common business users, providing additional consistency guarantees and higher-level data processing primitives that smooth the migration path for existing applications. We present a detailed comparison between Data Droplets and existing systems regarding their data model, architecture and trade-offs. Preliminary results of the system's performance under a realistic workload are also presented.
