2026
Authors
Duarte, R; Silva, R; Branco, A; Proença, H; Campos, R;
Publication
ECIR (4)
Abstract
Large image collections are typically organized around basic metadata and keyword tags, making content discovery challenging for users seeking specific visual information. Although images may be accompanied by descriptive text, traditional retrieval systems often struggle to bridge the semantic gap between textual descriptions and visual content. In this demo, we present ImageSeek, a hybrid text-to-image retrieval system designed to enhance search effectiveness by combining text and image-based retrieval methods through an asymmetric score adjustment mechanism. The system leverages multilingual CLIP models to encode both visual and textual information, creating unified representations for cross-modal retrieval. Users can search through natural language queries in any supported language, with results ranked using a hybrid approach that treats image-based retrieval as a reliable baseline while harmonizing text-based scores through position-dependent adjustments. The demonstration system operates on a dataset of 42,333 images from the Portuguese Presidency website, providing an appropriate testbed for multimodal retrieval performance. The web application enables direct comparison between conventional CLIP-based retrieval and our hybrid approach, supporting image searches under the same conditions on external platforms, including Google Images and the Arquivo.pt image search system, enabling comparative analysis of the results. To evaluate its effectiveness, ImageSeek allows users to experience differences between retrieval modes while exploring domain-specific visual content.
2026
Authors
Monteiro, E; Nogueira, DM; Gomes, EF;
Publication
BIOSTEC (1)
Abstract
2026
Authors
Machado, JDU; Veloso, B;
Publication
STATISTICAL JOURNAL OF THE IAOS
Abstract
The growing availability of online data creates new opportunities to improve the timeliness and detail of official statistics, particularly in domains such as price monitoring and inflation measurement. However, leveraging web-scraped data for official use requires alignment with standardized classification frameworks such as the European Classification of Individual Consumption According to Purpose (ECOICOP). We train two natural-language models, a lightweight convolutional neural network (CNN) and a fine-tuned BERTimbau transformer, to classify Portuguese food and beverage items into ECOICOP categories. Using 100,000 product titles scraped from six national supermarket sites and labeled via a human-in-the-loop workflow, the CNN reaches a macro-F1 of 92.19% with minimal computing cost, while the transformer attains 94.00%, the first such result for Portuguese. Both models are published on Hugging Face, enabling reproducible inference at scale while the source data remain confidential. The study delivers the first open-source Portuguese ECOICOP classifiers for food and beverage products, a replicable low-resource labeling workflow, and a benchmark of accuracy-speed trade-offs to guide researchers in similar tasks.
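For readers unfamiliar with the macro-F1 metric reported in this abstract, the following is a minimal sketch of how it is usually computed: the F1 score is calculated separately for each class and then averaged without weighting by class size. The function name `macro_f1` and the category labels below are illustrative assumptions, not the authors' code or data.

```python
from collections import Counter


def macro_f1(y_true, y_pred):
    """Return the unweighted mean of per-class F1 scores."""
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1   # correct prediction for this class
        else:
            fp[pred] += 1    # predicted class gets a false positive
            fn[truth] += 1   # true class gets a false negative
    scores = []
    for label in labels:
        # F1 = 2*TP / (2*TP + FP + FN); defined as 0 when the denominator is 0
        denom = 2 * tp[label] + fp[label] + fn[label]
        scores.append(2 * tp[label] / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Hypothetical category codes, for illustration only (not the study's data)
y_true = ["01.1.1", "01.1.1", "01.1.2", "01.2.1"]
y_pred = ["01.1.1", "01.1.2", "01.1.2", "01.2.1"]
score = macro_f1(y_true, y_pred)
```

Because every class contributes equally to the average, macro-F1 penalizes poor performance on rare categories more than overall accuracy would.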
2026
Authors
Costa, VV; Costa, D; Veloso, B; Rocha, EM;
Publication
Abstract
2026
Authors
José Duarte Pereira; Bruno Veloso; João Gama;
Publication
Scientific Reports
Abstract
2026
Authors
Araújo, B; Moura, AR; Veloso, B; Azevedo, O; Gago, MF; Erlhagen, W; Bicho, E; Ferreira, F;
Publication
INTERNATIONAL JOURNAL OF DATA SCIENCE AND ANALYTICS
Abstract
Fabry disease (FD) is a rare genetic disorder associated with cardiac abnormalities and often overlooked brain white matter lesions (WMLs). Despite the importance of early WML detection, diagnosis is frequently delayed. The aim is to identify electrocardiographic biomarkers linked to WMLs in middle-aged FD patients using machine learning, assessing their potential as non-invasive diagnostic tools. This retrospective study analyzed electrocardiographic data from FD patients aged 40-59. A feature selection process based on variance inflation factor analysis identified nine relevant features, including heart rate variability and QT interval parameters. Machine learning classifiers (logistic regression, support vector machines, random forest, and k-nearest neighbors) were trained and evaluated using accuracy, sensitivity, specificity, and AUC. SHAP (SHapley Additive exPlanations) analysis was used to interpret model predictions. The random forest model achieved the highest accuracy (0.81) using all nine features. A subset consisting of SDANN 5 and QTc Min also performed well (accuracy 0.75) in other models. SHAP analysis highlighted SDANN 5 as a key predictor. Machine learning applied to ECG data shows promise for early WML detection in FD, supporting the integration of computational methods into diagnostics for complex genetic diseases.