Publications - INESC TEC

Publications

2020

I2B+tree: Interval B plus tree variant towards fast indexing of time-dependent data

Authors
Carneiro, E; de Carvalho, AV; Oliveira, MA;

Publication
2020 15TH IBERIAN CONFERENCE ON INFORMATION SYSTEMS AND TECHNOLOGIES (CISTI'2020)

Abstract
Index structures are fast-access methods. In the past, they were often used to minimise fetch operations to external storage devices (secondary memory). Nowadays, this also holds for increasingly large amounts of data residing in main memory (primary memory). Examples of software that deals with this fact are in-memory databases and mobile device applications. Within this scope, this paper focuses on index structures to store, access and delete interval-based time-dependent (temporal) data from very large datasets in the most efficient way. Index structures for this domain have specific characteristics, given the nature of time and the requirement to index time intervals. This work presents an open-source, time-efficiency-focused variant of the original Interval B+ tree. We designate this variant the Improved Interval B+ tree (I2B+ tree). Our contribution improves the performance of the delete operation by reducing the number of nodes traversed to access siblings. We performed an extensive analysis of insertion, range query and deletion operations, using multiple datasets with growing volumes of data, distinct temporal distributions and tree parameters (time-split and node order). Results of the experiments validate the logarithmic performance of these operations and suggest the best-observed tree parameter ranges.


2020

Characterizing the hypergraph-of-entity and the structural impact of its extensions

Authors
Devezas, J; Nunes, S;

Publication
APPLIED NETWORK SCIENCE

Abstract
The hypergraph-of-entity is a joint representation model for terms, entities and their relations, used as an indexing approach in entity-oriented search. In this work, we characterize the structure of the hypergraph at both microscopic and macroscopic scales, as well as over time with an increasing number of documents. We use a random-walk-based approach to estimate shortest distances and node sampling to estimate clustering coefficients. We also propose the calculation of a general mixed hypergraph density measure based on the corresponding bipartite mixed graph. We analyze these statistics for the hypergraph-of-entity, finding that hyperedge-based node degrees are distributed as a power law, while node-based node degrees and hyperedge cardinalities are log-normally distributed. We also find that most statistics tend to converge after an initial period of accentuated growth in the number of documents. We then repeat the analysis over three extensions (materialized through synonym, context, and tf_bin hyperedges) in order to assess their structural impact on the hypergraph. Finally, we focus on the application-specific aspects of the hypergraph-of-entity in the domain of information retrieval. We analyze the correlation between retrieval effectiveness and the structural features of the representation model, proposing ranking and anomaly indicators as useful guides for modifying or extending the hypergraph-of-entity.


2020

Do technological factors impact differently on rural and urban new venture performance? Empirical evidence from the Portuguese case

Authors
Pato, L; Teixeira, AAC;

Publication
Rural Entrepreneurship and Innovation in the Digital Era

Abstract
Research on the relationship between entrepreneurship and context has gained considerable attention in recent years. However, this stream of literature has yet to adequately address the topic of entrepreneurship in rural areas. This chapter intends to fill this gap by investigating the extent to which technology-related factors affect the performance of new ventures located in rural and urban areas. Based on a sample of 408 newly created ventures located in Portuguese business incubators (BIs) and science parks (SPs), and employing logistic estimations, two main conclusions were derived: 1) support from BIs/SPs matters the most to the export and global innovation performance of new ventures located in rural areas; and 2) support from universities and other higher education institutions, and the regularity of research and development (R&D) collaborations between new ventures and R&D institutions, are more relevant to the turnover and innovation performance of new ventures located in urban areas than to those in rural areas. © 2021, IGI Global.


2020

Chikungunya Virus Inhibitor Study based on Molecular Docking Experiments

Authors
Saraiva, AA; Jeferson, S; Miranda, C; Sousa, JVM; Ferreira, NMF; Neto, JESB; Soares, S; Valente, A;

Publication
PROCEEDINGS OF THE 13TH INTERNATIONAL JOINT CONFERENCE ON BIOMEDICAL ENGINEERING SYSTEMS AND TECHNOLOGIES, VOL 3: BIOINFORMATICS

Abstract
Chikungunya virus disease, transmitted by the bite of the mosquito Aedes aegypti, is epidemic in some regions. To support early diagnosis and the best treatment technique, this work studies inhibitors, by means of molecular docking, for the laboratory development of a drug. The results indicate that Suramin has the best chance of being used, followed by Silibin.


2020

Simulating Tariff Impact in Electrical Energy Consumption Profiles With Conditional Variational Autoencoders

Authors
Bregere, M; Bessa, RJ;

Publication
IEEE ACCESS

Abstract
The implementation of efficient demand response (DR) programs for household electricity consumption would benefit from data-driven methods capable of simulating the impact of different tariff schemes. This paper proposes a novel method based on conditional variational autoencoders (CVAE) to generate, from an electricity tariff profile combined with weather and calendar variables, daily consumption profiles of consumers segmented into different clusters. First, a large set of consumers is grouped into clusters according to their consumption behavior and price-responsiveness. The clustering method is based on a causality model that measures the effect of a specific tariff on the consumption level. Then, daily electrical energy consumption profiles are generated for each cluster with CVAE. This non-parametric approach is compared to a semi-parametric data generator based on generalized additive models. Experiments on a publicly available data set show that the proposed method presents comparable performance to the semi-parametric one when it comes to generating the average value of the original data (13% difference in root mean square error). The main contribution of this new method is the capacity to reproduce rebound and side effects in the generated consumption profiles. Indeed, the application of a special electricity tariff over a time window may also affect consumption outside this time window. Another contribution is that the proposed clustering approach captures the reaction to a tariff change. When compared to a clustering method with classical features (min, max and average consumption), the improvement in the Calinski-Harabasz index was 128% for consumers associated with tariff changes.
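For readers unfamiliar with the Calinski-Harabasz index mentioned above, the following is a minimal sketch of its standard definition (ratio of between-cluster to within-cluster dispersion, each normalised by its degrees of freedom; higher values indicate better-separated clusters). It is not taken from the paper: the function name `calinski_harabasz` and the toy data are illustrative assumptions.

```python
from collections import defaultdict


def calinski_harabasz(points, labels):
    """Calinski-Harabasz index for labelled points (higher is better)."""
    n = len(points)
    dim = len(points[0])

    # Group the points by cluster label.
    clusters = defaultdict(list)
    for point, label in zip(points, labels):
        clusters[label].append(point)
    k = len(clusters)
    if k < 2 or k >= n:
        raise ValueError("need 2 <= number of clusters < number of points")

    def mean(rows):
        return [sum(r[d] for r in rows) / len(rows) for d in range(dim)]

    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    overall = mean(points)
    between = 0.0  # dispersion of cluster centroids around the global centroid
    within = 0.0   # dispersion of points around their own cluster centroid
    for rows in clusters.values():
        centroid = mean(rows)
        between += len(rows) * sq_dist(centroid, overall)
        within += sum(sq_dist(r, centroid) for r in rows)

    if within == 0:
        return float("inf")
    return (between / (k - 1)) / (within / (n - k))


# Hypothetical toy features, not data from the paper.
toy_points = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)]
good_score = calinski_harabasz(toy_points, [0, 0, 1, 1])
bad_score = calinski_harabasz(toy_points, [0, 1, 0, 1])
```

A labelling that groups nearby points together (`good_score`) yields a much larger index than one that mixes them (`bad_score`), which is the sense in which a higher value reflects a better clustering.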


2020

Failure Detection of an Air Production Unit in Operational Context

Authors
Barros, M; Veloso, B; Pereira, PM; Ribeiro, RP; Gama, J;

Publication
IoT Streams/ITEM@PKDD/ECML

Abstract
The transformation of industrial manufacturing through computers and automation with smart systems leads us to monitor and log industrial equipment events. It is possible to apply analytic approaches and to find interpretable results for strategic decision making, providing advantages such as failure detection and predictive maintenance. In recent years, many researchers have been studying the application of machine learning techniques to improve such tasks. In this context, we develop a system capable of detecting anomalies in an Air Production Unit (APU), taking into consideration the peak frequency of each sensor. The study started with the analysis of the sensors installed on the APU, defining its normal behavior and its failure mode. Using that information, we define rules to monitor the APU, to detect anomalies in its components, and to predict possible failures. The definition of the rules was based on peak frequency analysis, which allowed the setting of boundaries of normality for the APU working modes and, thus, the identification of anomalies.
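The abstract does not say how peak frequency is computed or how the normality boundaries are chosen. As a rough illustration only, the sketch below assumes one common reading: the peak frequency is the dominant spectral frequency of a sensor window, and a rule flags the window when that frequency falls outside a band. The names `peak_frequency` and `is_anomalous`, the band limits, and the synthetic signal are all hypothetical.

```python
import cmath
import math


def peak_frequency(samples, sample_rate_hz):
    """Dominant frequency (Hz) of a window, via a naive discrete Fourier transform."""
    n = len(samples)
    mean = sum(samples) / n
    centered = [s - mean for s in samples]  # remove the constant (DC) component

    best_bin, best_mag = 0, 0.0
    for k in range(1, n // 2 + 1):  # positive-frequency bins only
        coeff = sum(
            x * cmath.exp(-2j * math.pi * k * t / n)
            for t, x in enumerate(centered)
        )
        if abs(coeff) > best_mag:
            best_bin, best_mag = k, abs(coeff)
    return best_bin * sample_rate_hz / n


def is_anomalous(samples, sample_rate_hz, low_hz, high_hz):
    """Assumed rule: flag a window whose peak frequency leaves [low_hz, high_hz]."""
    freq = peak_frequency(samples, sample_rate_hz)
    return not (low_hz <= freq <= high_hz)


# Synthetic 5 Hz sine sampled at 64 Hz for one second (illustrative only).
signal = [math.sin(2 * math.pi * 5 * t / 64) for t in range(64)]
dominant = peak_frequency(signal, 64)
inside_band = is_anomalous(signal, 64, low_hz=4.0, high_hz=6.0)
outside_band = is_anomalous(signal, 64, low_hz=10.0, high_hz=20.0)
```

The naive transform is quadratic in the window length, which keeps the sketch dependency-free; a real monitoring pipeline would more likely use an FFT routine and boundaries derived from observed normal operation.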
