Publications

Publications by LIAAD

2026

ImageSeek: A Hybrid Text-to-Image Image Retrieval System for Domain-Specific Collections

Authors
Duarte, R; Silva, R; Branco, A; Proença, H; Campos, R;

Publication
ECIR (4)

Abstract
Large image collections are typically organized around basic metadata and keyword tags, making content discovery challenging for users seeking specific visual information. Although images may be accompanied by descriptive text, traditional retrieval systems often struggle to bridge the semantic gap between textual descriptions and visual content. In this demo, we present ImageSeek, a hybrid text-to-image retrieval system designed to enhance search effectiveness by combining text and image-based retrieval methods through an asymmetric score adjustment mechanism. The system leverages multilingual CLIP models to encode both visual and textual information, creating unified representations for cross-modal retrieval. Users can search through natural language queries in any supported language, with results ranked using a hybrid approach that treats image-based retrieval as a reliable baseline while harmonizing text-based scores through position-dependent adjustments. The demonstration system operates on a dataset of 42,333 images from the Portuguese Presidency website, providing an appropriate testbed for multimodal retrieval performance. The web application enables direct comparison between conventional CLIP-based retrieval and our hybrid approach. It also supports running the same image searches on external platforms, including Google Images and the Arquivo.pt image search system, for comparative analysis of the results. ImageSeek thus lets users experience the differences between retrieval modes while exploring domain-specific visual content.

2026

A Comparative Study of Deep Learning Approaches for Leishmania Detection in Microscopic Images

Authors
Monteiro, E; Nogueira, DM; Gomes, EF;

Publication
BIOSTEC (1)

Abstract

2026

Turning web data into official statistics: Classifying Portuguese retail products with NLP models

Authors
Machado, JDU; Veloso, B;

Publication
STATISTICAL JOURNAL OF THE IAOS

Abstract
The growing availability of online data creates new opportunities to improve the timeliness and detail of official statistics, particularly in domains such as price monitoring and inflation measurement. However, leveraging web-scraped data for official use requires alignment with standardized classification frameworks such as the European Classification of Individual Consumption According to Purpose (ECOICOP). We train two natural-language models, a lightweight convolutional neural network (CNN) and a fine-tuned BERTimbau transformer, to classify Portuguese food and beverage items into ECOICOP categories. Using 100,000 product titles scraped from six national supermarket sites and labeled via a human-in-the-loop workflow, the CNN reaches a macro-F1 of 92.19% with minimal computing cost, while the transformer attains 94.00%, the first such result for Portuguese. Both models are published on Hugging Face, enabling reproducible inference at scale while the source data remain confidential. The study delivers the first open-source Portuguese ECOICOP classifiers for food and beverage products, a replicable low-resource labeling workflow, and a benchmark of accuracy-speed trade-offs to guide researchers in similar tasks.
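For readers unfamiliar with the macro-F1 metric reported above, the following is a minimal sketch of how it is commonly computed: an unweighted mean of per-class F1 scores. The function name `macro_f1` and the labels `"A"` and `"B"` are illustrative assumptions; this is not the authors' evaluation code.

```python
def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores (illustrative helper)."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        # F1 = 2*TP / (2*TP + FP + FN); defined as 0 when the denominator is 0.
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Hypothetical toy labels, not taken from the study's data.
y_true = ["A", "A", "B", "B"]
y_pred = ["A", "B", "B", "B"]
score = macro_f1(y_true, y_pred)
```

Because every class contributes equally, macro-F1 penalizes poor performance on rare categories more than plain accuracy does.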

2026

A Parametric Information-gain to Improve Online Tree-based Machine Learning Models

Authors
Costa, VV; Costa, D; Veloso, B; Rocha, EM;

Publication

Abstract
Decision trees are a cornerstone of interpretable machine learning and are widely used for their simplicity and effectiveness in classification tasks. To address the growing need for models that can operate on continuous, unbounded data, decision trees have been reinvented for the data stream setting, where they must learn incrementally under constraints such as limited memory, evolving distributions, and delayed supervision. A critical component of these tree-based models, particularly those based on Hoeffding trees, is the split criterion, which determines how the input space is partitioned. This study introduces a new split criterion for stream-based Hoeffding trees, based on a unified five-parameter entropic formulation that generalizes several well-known measures, including Shannon, Gini, Tsallis, and Rényi entropies. While such formulations have been explored in batch learning, they have not previously been applied to streaming scenarios. By incorporating this criterion into a variety of established streaming classifiers and evaluating performance on standard benchmark datasets, we demonstrate consistent and statistically significant improvements over existing methods, including those implemented in the River library. Notably, we report gains of up to 40% in immediate evaluation metrics, along with consistent wins and some draws on the prequential Macro-F1, with no observed losses against baseline criteria. The generality of the approach introduces additional computational overhead, but it also enables greater expressiveness and adaptability in handling uncertainty and nonstationary data. This work advances the integration of information-theoretic principles into online learning and highlights the importance of efficient hyperparameter tuning and adaptive entropy selection in streaming environments.
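As background for the measures named above, the sketch below gives the standard textbook definitions of the Shannon, Gini, Tsallis, and Rényi measures and a generic information-gain split score built on any of them. It does not reproduce the paper's five-parameter formulation, which is not given in the abstract; all function names are illustrative assumptions.

```python
import math


def shannon(p):
    """Shannon entropy in nats."""
    return -sum(x * math.log(x) for x in p if x > 0)


def gini(p):
    """Gini impurity."""
    return 1.0 - sum(x * x for x in p)


def tsallis(p, q):
    """Tsallis entropy; tends to Shannon as q -> 1 and equals Gini at q = 2."""
    if q == 1:
        return shannon(p)
    return (1.0 - sum(x ** q for x in p if x > 0)) / (q - 1.0)


def renyi(p, alpha):
    """Renyi entropy in nats; tends to Shannon as alpha -> 1."""
    if alpha == 1:
        return shannon(p)
    return math.log(sum(x ** alpha for x in p if x > 0)) / (1.0 - alpha)


def information_gain(parent_counts, children_counts, impurity):
    """Parent impurity minus the size-weighted impurity of the children."""
    def probs(counts):
        total = sum(counts)
        return [c / total for c in counts]

    n = sum(parent_counts)
    weighted = sum(
        sum(c) / n * impurity(probs(c)) for c in children_counts if sum(c) > 0
    )
    return impurity(probs(parent_counts)) - weighted


# A perfectly separating split of a balanced two-class node.
gain = information_gain([5, 5], [[5, 0], [0, 5]], shannon)
```

Passing the impurity as a function makes it easy to swap measures, for example `lambda p: tsallis(p, 1.5)`, which is the kind of flexibility a parametric criterion is meant to provide.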

2026

Deep neural networks in medical microbiology for bacterial colonies classification

Authors
José Duarte Pereira; Bruno Veloso; João Gama;

Publication
Scientific Reports

Abstract
While automation has transformed many areas inside clinical laboratories, microbiology still relies heavily on manual tasks, particularly the culture of samples on agar plates and their subsequent manual review for microorganism identification and antibiotic susceptibility profiling. Bacterial colony detection and classification require trained professionals, making the process time-consuming and prone to human error. Developing deep learning models to automate these tasks could improve microbiology workflows and accelerate clinical decision-making. In this study we trained and evaluated five object detection architectures (Faster R-CNN and RetinaNet with ResNet-50 and ResNet-101 backbones, and YOLOv8) on the Annotated Germs for Automated Recognition (AGAR) dataset for bacterial colony classification. Transfer learning, cross-subset generalization, and Weighted Box Fusion (WBF) ensemble methods were applied to enhance and characterize performance. Additionally, we created and publicly released a curated dataset of 165 agar plate images containing colonies of S. aureus, P. aeruginosa, and E. coli cultured across four distinct culture media. YOLOv8m achieved a mean Average Precision (mAP) of 69.0% on the AGAR dataset, outperforming the best Detectron2 model (Faster R-CNN ResNet-101, 63.1%) by 5.9 percentage points. A four-model WBF ensemble combining both architectures reached 70.5% mAP (95% CI: 68.4–71.7). Cross-subset evaluation showed that a single model trained on the full dataset generalizes well to individual imaging conditions, making subset-specific fine-tuning largely unnecessary. On the curated dataset, a mixed ensemble reached 58.7% mAP (95% CI: 57.1–63.7). These results demonstrate that architecture choice and training data diversity are the primary drivers of performance for colony detection on agar plates.
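To illustrate the idea behind Weighted Box Fusion mentioned above, here is a simplified sketch of its core step: overlapping boxes from several detectors are merged into one box whose coordinates are a confidence-weighted average. The helper names `iou` and `fuse_cluster`, the `(x1, y1, x2, y2)` box layout, and the example values are assumptions for illustration; the full method also clusters boxes by an IoU threshold and rescales confidence by the number of contributing models, and the authors' actual configuration is not described in the abstract.

```python
def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def fuse_cluster(boxes, scores):
    """Merge boxes assumed to cover the same object.

    Coordinates are averaged with confidence scores as weights;
    the fused score is the mean confidence of the cluster.
    """
    total = sum(scores)
    fused = tuple(
        sum(s * b[i] for b, s in zip(boxes, scores)) / total for i in range(4)
    )
    return fused, total / len(scores)


# Hypothetical predictions from two detectors for the same colony.
boxes = [(0.0, 0.0, 10.0, 10.0), (2.0, 2.0, 12.0, 12.0)]
scores = [0.75, 0.25]
fused_box, fused_score = fuse_cluster(boxes, scores)
```

Unlike non-maximum suppression, which keeps one box and discards the rest, this averaging uses every prediction in the cluster, so a confident detector pulls the fused box toward its own estimate without fully overriding the others.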

2026

A machine learning analysis to identify biomarkers on Holter data of white matter lesions in Fabry disease patients

Authors
Araújo, B; Moura, AR; Veloso, B; Azevedo, O; Gago, MF; Erlhagen, W; Bicho, E; Ferreira, F;

Publication
INTERNATIONAL JOURNAL OF DATA SCIENCE AND ANALYTICS

Abstract
Fabry disease (FD) is a rare genetic disorder associated with cardiac abnormalities and often overlooked brain white matter lesions (WMLs). Despite the importance of early WML detection, diagnosis is frequently delayed. This study aims to identify electrocardiographic biomarkers linked to WMLs in middle-aged FD patients using machine learning, assessing their potential as non-invasive diagnostic tools. This retrospective study analyzed electrocardiographic data from FD patients aged 40-59. A feature selection process based on variance inflation factor analysis identified nine relevant features, including heart rate variability and QT interval parameters. Machine learning classifiers (logistic regression, support vector machines, random forest, and k-nearest neighbors) were trained and evaluated using accuracy, sensitivity, specificity, and AUC. SHAP (SHapley Additive exPlanations) analysis was used to interpret model predictions. The random forest model achieved the highest accuracy (0.81) using all nine features. A subset consisting of SDANN 5 and QTc Min also performed well (accuracy 0.75) in other models. SHAP analysis highlighted SDANN 5 as a key predictor. Machine learning applied to ECG data shows promise for early WML detection in FD, supporting the integration of computational methods into diagnostics for complex genetic diseases.
