Publications

2025

Parametric models for distributional data

Authors
Brito, P; Silva, APD

Publication
ADVANCES IN DATA ANALYSIS AND CLASSIFICATION

Abstract
We present parametric probabilistic models for numerical distributional variables. The proposed models are based on the representation of each distribution by a location measure and inter-quantile ranges, for given quantiles, thereby characterizing the underlying empirical distributions in a flexible way. Multivariate Normal distributions are assumed for the whole set of indicators, considering alternative structures of the variance-covariance matrix. For all cases, maximum likelihood estimators of the corresponding parameters are derived. This modelling allows for hypothesis testing and multivariate parametric analysis. The proposed framework is applied to Analysis of Variance and parametric Discriminant Analysis of distributional data. A simulation study examines the performance of the proposed models in classification problems under different data conditions. Applications to Internet traffic data and Portuguese official data illustrate the relevance of the proposed approach.
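
As a rough illustration of the representation described in this abstract, the sketch below summarises one empirical distribution by a location measure and successive inter-quantile ranges. The choice of the median as location measure, quartiles as cut points, the inclusion of the minimum and maximum, and the function name `distribution_indicators` are all assumptions made for the example, not details taken from the paper.

```python
from statistics import median, quantiles


def distribution_indicators(values, n_parts=4):
    """Summarise one empirical distribution (hypothetical helper).

    Returns the median and the widths of the n_parts successive
    inter-quantile intervals between the minimum and the maximum.
    """
    # n_parts - 1 interior cut points (quartiles when n_parts == 4).
    cuts = quantiles(values, n=n_parts, method="inclusive")
    points = [min(values)] + cuts + [max(values)]
    # Width of each interval between consecutive quantile points.
    ranges = [hi - lo for lo, hi in zip(points, points[1:])]
    return median(values), ranges


location, ranges = distribution_indicators([1, 2, 3, 4, 5])
print(location, ranges)
```

Each distributional observation then becomes a fixed-length vector of indicators, which is the kind of object a multivariate parametric model could be fitted to.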

2025

Paraconsistency for the Working Software Engineer (Extended Abstract)

Authors
Barbosa, LS

Publication
SOFTWARE ENGINEERING AND FORMAL METHODS, SEFM 2024

Abstract
Modelling complex information systems often entails the need for dealing with scenarios of inconsistency in which several requirements either reinforce or contradict each other. This lecture summarises recent joint work with Juliana Cunha, Alexandre Madeira and Ana Cruz on a variant of transition systems endowed with positive and negative accessibility relations, and a metric space over the lattice of truth values. Such structures are called paraconsistent transition systems, the qualifier stressing a connection to paraconsistent logic, a logic taking inconsistent information as potentially informative. A coalgebraic perspective on this family of structures is also discussed.

2025

LLM-Based Framework for Synthetic Data Generation in Portuguese Clinical NER

Authors
Henriques, L; Guimarães, N; Jorge, A

Publication
Progress in Artificial Intelligence - 24th EPIA Conference on Artificial Intelligence, EPIA 2025, Faro, Portugal, October 1-3, 2025, Proceedings, Part I

Abstract
The ever-increasing volume of data produced in healthcare demands solutions capable of automatically extracting the relevant elements of clinical narratives. However, privacy regulations, bureaucratic procedures, and annotation effort make training data scarce, which hinders the development of such solutions through Natural Language Processing (NLP) systems. This scarcity is more severe for languages and language varieties with fewer resources, such as European and Brazilian Portuguese. To address this problem, we propose a Large Language Model (LLM)-based Synthetic Data Generation (SDG) framework to generate and annotate synthetic clinical texts for medical Named-Entity Recognition (NER). The SDG framework consists of a system/user prompt augmented with real examples, powered by GPT-4o. Our results show that, by feeding the framework a few real annotated clinical texts, we can generate synthetic data that improves the performance of NER models relative to their non-augmented counterparts. In addition, the reduction of the BLEU scores in the generated texts indicates a decrease in the risk of privacy disclosure while ensuring greater lexical diversity. These results highlight the potential of synthetic data as a solution to overcome human annotation bottlenecks and privacy concerns, laying the groundwork for future research in clinical NLP across tasks, domains, and low-resource languages.

2025

Annual Hourly E-Mobility Modelling and Assessment in Climate Neutral Positive Energy Districts

Authors
Schneider, S; Baptista, J

Publication
2025 IEEE International Conference on Environment and Electrical Engineering and 2025 IEEE Industrial and Commercial Power Systems Europe (EEEIC / I&CPS Europe)

Abstract
This paper presents a full-year hourly district e-mobility model and its integration into a Positive Energy District simulation and assessment model that includes building operation, use, and embodied energy and emissions. The aim of this work is to model the operation and energy flexibility potential of an EV fleet in a district through mono- and bi-directional charging, and to enable its assessment in terms of self-utilization of local and volatile regional RES surpluses. Results for example residential, office, school and supermarket use cases show an increase in self-utilization of local PV of up to 30% due to EV inclusion, even if PV installation size exceeds legal building code requirements by a factor of two to four. Bi-directional charging can cut annual grid electricity by up to 30% but requires a 20% increase in battery full equivalent cycles.

2025

A New Proposal of Layer Insertion in Stacked Autoencoder Neural Networks

Authors
Santos Viana, Fd; Pereira, BVL; Santos, M; Soares, C; Almeida Neto, Ad

Publication
Progress in Artificial Intelligence - 24th EPIA Conference on Artificial Intelligence, EPIA 2025, Faro, Portugal, October 1-3, 2025, Proceedings, Part I

Abstract
One strategy for constructing an artificial neural network with multiple hidden layers is to insert layers incrementally in stages. However, for this approach to be effective, each newly added layer must be properly aligned with the previous layers to avoid degradation of the network output and preserve the already learned knowledge. Ideally, inserting new layers should expand the network’s search space, enabling it to explore more complex representations and ultimately improve overall performance. In this work, we present a novel method for layer insertion in stacked autoencoder networks. The method developed maintains the learning obtained before the layer insertion and allows the acquisition of new knowledge; therefore, it is denoted collaborative. This approach allows this kind of neural network to evolve and learn effectively, while significantly reducing the design time. Unlike traditional methods, it addresses the common challenges associated with manually defining the number of layers and the number of neurons in each layer. By automating this aspect of network design, the proposed method promotes scalability and adaptability between tasks. The effectiveness of the approach was validated on multiple binary classification datasets using neural networks initialized with various architectures. The experimental results demonstrate that the method maintains performance while streamlining the architectural design process.

2025

Reusing ML Models in Dynamic Data Environments: Data Similarity-Based Approach for Efficient MLOps

Authors
Peixoto, E; Torres, D; Carneiro, D; Silva, B; Marques, R

Publication
BIG DATA AND COGNITIVE COMPUTING

Abstract
The rapid integration of Machine Learning (ML) in organizational practices has driven demand for substantial computational resources, incurring both high economic costs and environmental impact, particularly from energy consumption. This challenge is amplified in dynamic data environments, where ML models must be frequently retrained to adapt to evolving data patterns. To address this, more sustainable Machine Learning Operations (MLOps) pipelines are needed to reduce environmental impact while maintaining model accuracy. In this paper, we propose a model reuse approach based on data similarity metrics, which allows organizations to leverage previously trained models where applicable. We introduce a tailored set of meta-features to characterize data windows, enabling efficient similarity assessment between historical and new data. The proposed method is validated across multiple ML tasks using the cosine and Bray-Curtis distance functions, evaluating both model reuse rates and the performance of reused models relative to newly trained alternatives. The results indicate that the proposed approach can reduce the frequency of model retraining by as much as 70% to 90% while maintaining or even improving predictive performance, contributing to more resource-efficient and sustainable MLOps practices.
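
The reuse idea summarised in this abstract can be sketched as follows: describe each data window by a meta-feature vector, compare the new window against stored ones with the cosine or Bray-Curtis distance, and reuse a stored model only when the closest match is near enough. The meta-features used here (mean, standard deviation, minimum, maximum), the threshold value, and the names `meta_features` and `find_reusable_model` are hypothetical choices for illustration; the paper's own meta-feature set and decision rule are not given in the abstract.

```python
import math


def meta_features(window):
    """Hypothetical meta-features of a numeric window: mean, std, min, max."""
    n = len(window)
    mean = sum(window) / n
    std = math.sqrt(sum((x - mean) ** 2 for x in window) / n)
    return [mean, std, min(window), max(window)]


def cosine_distance(u, v):
    """1 - cosine similarity; assumes neither vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norms = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / norms


def bray_curtis_distance(u, v):
    """sum|u_i - v_i| / sum|u_i + v_i|; assumes the denominator is nonzero."""
    num = sum(abs(a - b) for a, b in zip(u, v))
    den = sum(abs(a + b) for a, b in zip(u, v))
    return num / den


def find_reusable_model(new_window, history, distance, threshold):
    """Return the id of the closest stored model, or None if none is close enough.

    `history` maps a model id to the meta-features of its training window.
    """
    target = meta_features(new_window)
    best_id, best_dist = None, float("inf")
    for model_id, feats in history.items():
        d = distance(target, feats)
        if d < best_dist:
            best_id, best_dist = model_id, d
    return best_id if best_dist <= threshold else None


history = {
    "m1": meta_features([1, 2, 3, 4]),
    "m2": meta_features([100, 200, 300, 400]),
}
print(find_reusable_model([1, 2, 3, 5], history, bray_curtis_distance, 0.1))
```

A result of `None` would signal that no stored model is similar enough and that retraining is needed; the threshold controls the trade-off between reuse rate and the risk of reusing a poorly matched model.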
