2022
Authors
Faria, N; Costa, D; Pereira, J; Vilaça, R; Ferreira, L; Coelho, F;
Publication
19th IEEE Annual Consumer Communications & Networking Conference, CCNC 2022, Las Vegas, NV, USA, January 8-11, 2022
Abstract
There is an increasing demand for stateful edge computing for both complex Virtual Network Functions (VNFs) and application services in emerging 5G networks. Managing a mutable persistent state in the edge does however bring new architectural, performance, and dependability challenges. Not only it has to be integrated with existing cloud-based systems, but also cope with both operational and analytical workloads and be compatible with a variety of SQL and NoSQL database management systems. We address these challenges with AIDA-DB, a polyglot data management architecture for the edge and cloud continuum. It leverages recent development in distributed transaction processing for a reliable mutable state in operational workloads, with a flexible synchronization mechanism for efficient data collection in cloud-based analytical workloads. © 2022 IEEE.
2022
Authors
Costa, D; Pereira, J; Vilaca, R; Faria, N;
Publication
37TH ANNUAL ACM SYMPOSIUM ON APPLIED COMPUTING
Abstract
Wide availability of edge computing platforms, as expected in emerging 5G networks, enables a computing continuum between centralized cloud services and the edge of the network, close to end-user devices. This is particularly appealing for online analytics as data collected by devices is made available for decisionmaking. However, cloud-based parallel-distributed data processing platforms are not able to directly access data on the edge. This can be circumvented, at the expense of freshness, with data synchronization that periodically uploads data to the cloud for processing. In this work, we propose an adaptive database synchronization system that makes distributed data in edge nodes available dynamically to the cloud by balancing between reducing the amount of data that needs to be transmitted and the computational effort needed to do so at the edge. This adapts to the availability of CPU and network resources as well as to the application workload.
2022
Authors
Moreno, M; Vilaca, R; Ferreira, PG;
Publication
BMC BIOINFORMATICS
Abstract
Background: Gene expression studies are an important tool in biological and biomedical research. The signal carried in expression profiles helps derive signatures for the prediction, diagnosis and prognosis of different diseases. Data science and specifically machine learning have many applications in gene expression analysis. However, as the dimensionality of genomics datasets grows, scalable solutions become necessary. Methods: In this paper we review the main steps and bottlenecks in machine learning pipelines, as well as the main concepts behind scalable data science including those of concurrent and parallel programming. We discuss the benefits of the Dask framework and how it can be integrated with the Python scientific environment to perform data analysis in computational biology and bioinformatics. Results: This review illustrates the role of Dask for boosting data science applications in different case studies. Detailed documentation and code on these procedures is made available at https:// github. com/martaccmoreno/gexp-ml-dask. Conclusion: By showing when and how Dask can be used in transcriptomics analysis, this review will serve as an entry point to help genomic data scientists develop more scalable data analysis procedures.
2013
Authors
Matos, Miguel; Gomes, Pedro; Vilaça, Ricardo; Beernaert, Leander; Oliveira, Rui Carlos Mendes de;
Publication
Abstract
All companies developing their business on the Web, not only giants like Google or Facebook but also small com- panies focused on niche markets, face scalability issues in data management. The case study of this paper is the content management systems for classified or commercial advertise-ments on the Web. The data involved has a very significant growth rate and a read-intensive access pattern with a reduced update rate. Typically, data is stored in traditional file systems hosted on dedicated servers or Storage Area Network devices due to the generalization and ease of use of file systems. However, this ease in implementation and usage has a disadvantage: the centralized nature of these systems leads to availability, elasticity and scalability problems. The scenario under study, undemanding in terms of the system's consistency and with a simple interaction model, is suitable to a distributed database, such as Cassandra, conceived precisely to dynamically handle large volumes of data. In this paper, we analyze the suitability of Cassandra as a substitute for file systems in content management systems. The evaluation, conducted using real data from a produc- tion system, shows that using Cassandra, one can easily get horizontal scalability of storage, redundancy across multiple independent nodes, and load distribution imposed by the periodic activities of safeguarding data, while ensuring a comparable performance to that of a file system.
2024
Authors
Silva, CA; Vilaça, R; Pereira, A; Bessa, RJ;
Publication
RENEWABLE & SUSTAINABLE ENERGY REVIEWS
Abstract
High-performance computing relies on performance-oriented infrastructures with access to powerful computing resources to complete tasks that contribute to solve complex problems in society. The intensive use of resources and the increase in service demand due to emerging fields of science, combined with the exascale paradigm, climate change concerns, and rising energy costs, ultimately means that the decarbonization of these centers is key to improve their environmental and financial performance. Therefore, a review on the main opportunities and challenges for the decarbonization of high-performance computing centers is essential to help decision-makers, operators and users contribute to a more sustainable computing ecosystem. It was found that state-of-the-art supercomputers are growing in computing power, but are combining different measures to meet sustainability concerns, namely going beyond energy efficiency measures and evolving simultaneously in terms of energy and information technology infrastructure. It was also shown that policy and multiple entities are now targeting specifically HPC, and that identifying synergies with the energy sector can reveal new revenue streams, but also enable a smoother integration of these centers in energy systems. Computing-intensive users can continue to pursue their scientific research, but participating more actively in the decarbonization process, in cooperation with computing service providers. Overall, many opportunities, but also challenges, were identified, to decrease carbon emissions in a sector mostly concerned with improving hardware performance.
2023
Authors
Faria, N; Pereira, J; Alonso, AN; Vilaca, R; Koning, Y; Nes, N;
Publication
PROCEEDINGS OF THE VLDB ENDOWMENT
Abstract
Transactions have been a key issue in database management for a long time and there are a plethora of architectures and algorithms to support and implement them. The current state-of-the-art is focused on storage management and is tightly coupled with its design, leading, for instance, to the need for completely new engines to support new features such as Hybrid Transactional Analytical Processing (HTAP). We address this challenge with a proposal to implement transactional logic in a query language such as SQL. This means that our approach can be layered on existing analytical systems but that the retrieval of a transactional snapshot and the validation of update transactions runs in the server and can take advantage of advanced query execution capabilities of an optimizing query engine. We demonstrate our proposal, TiQuE, on MonetDB and obtain an average 500x improvement in transactional throughput while retaining good performance on analytical queries, making it competitive with the state-of-the-art HTAP systems.
The access to the final selection minute is only available to applicants.
Please check the confirmation e-mail of your application to obtain the access code.