2026
Autores
Dutra, I; Pechenizkiy, M; Cortez, P; Pashami, S; Pasquali, A; Moniz, N; Jorge, AM; Soares, C; Abreu, PH; Gama, J;
Publicação
ECML/PKDD (10)
Abstract
2026
Autores
Mendes Neves, T; Meireles, L; Mendes Moreira, JC;
Publicação
Lecture Notes in Computer Science
Abstract
Large Events Models (LEMs) are a class of models designed to predict and analyze the sequence of events in soccer matches, capturing the complex dynamics of the game. The original LEM framework, based on a chain of classifiers, faced challenges such as synchronization, scalability issues, and limited context utilization. This paper proposes a unified and scalable approach to model soccer events using a tabular autoregressive model. Our models demonstrate significant improvements over the original LEM, achieving higher accuracy in event prediction and better simulation quality, while also offering greater flexibility and scalability. The unified LEM framework enables a wide range of applications in soccer analytics that we display in this paper, including real-time match outcome prediction, player performance analysis, and game simulation, serving as a general solution for many problems in the field. © 2025 Elsevier B.V., All rights reserved.
2025
Autores
Mazarei, A; Sousa, R; Mendes Moreira, J; Molchanov, S; Ferreira, HM;
Publicação
INTERNATIONAL JOURNAL OF DATA SCIENCE AND ANALYTICS
Abstract
Outlier detection is a widely used technique for identifying anomalous or exceptional events across various contexts. It has proven to be valuable in applications like fault detection, fraud detection, and real-time monitoring systems. Detecting outliers in real time is crucial in several industries, such as financial fraud detection and quality control in manufacturing processes. In the context of big data, the amount of data generated is enormous, and traditional batch mode methods are not practical since the entire dataset is not available. The limited computational resources further compound this issue. Boxplot is a widely used batch mode algorithm for outlier detection that involves several derivations. However, the lack of an incremental closed form for statistical calculations during boxplot construction poses considerable challenges for its application within the realm of big data. We propose an incremental/online version of the boxplot algorithm to address these challenges. Our proposed algorithm is based on an approximation approach that involves numerical integration of the histogram and calculation of the cumulative distribution function. This approach is independent of the dataset's distribution, making it effective for all types of distributions, whether skewed or not. To assess the efficacy of the proposed algorithm, we conducted tests using simulated datasets featuring varying degrees of skewness. Additionally, we applied the algorithm to a real-world dataset concerning software fault detection, which posed a considerable challenge. The experimental results underscored the robust performance of our proposed algorithm, highlighting its efficacy comparable to batch mode methods that access the entire dataset. Our online boxplot method, leveraging dataset distribution to define whiskers, consistently achieved exceptional outlier detection results. Notably, our algorithm demonstrated computational efficiency, maintaining constant memory usage with minimal hyperparameter tuning.
2025
Autores
da Silva, JP; Nogueira, AR; Pinto, J; Curral, M; Alves, AC; Sousa, R;
Publicação
EXPERT SYSTEMS
Abstract
Integrating Industry 4.0 and Quality 4.0 optimises manufacturing through IoT and ML, improving processes and product quality. The primary challenge involves identifying patterns in computer numerical control (CNC) machining time-series data to boost manufacturing quality control. The proposed solution involves an experimental study comparing one-class and binary classification algorithms. This study aims to classify time-series data from CNC turning machines, offering insight into monitoring and adjusting tool wear to maintain product quality. The methodology entails extracting spectral features from time-series data to train both one-class and binary classification algorithms, assessing their effectiveness and computational efficiency. Although certain models consistently outperform others, determining the best performing is not possible, as a trade-off between classification and computational performance is observed, with gradient boosting standing out for effectively balancing both aspects. Thus, the choice between one-class and binary classification ultimately relies on dataset's features and task objectives.
2025
Autores
Cunha, LF; Guimarães, N; Mendes, A; Campos, R; Jorge, A;
Publicação
Advances in Information Retrieval - 47th European Conference on Information Retrieval, ECIR 2025, Lucca, Italy, April 6-10, 2025, Proceedings, Part V
Abstract
In healthcare, diagnoses usually rely on physician expertise. However, complex cases may benefit from consulting similar past clinical reports cases. In this paper, we present MedLink (http://medlink.inesctec.pt), a tool that given a free-text medical report, retrieves and ranks relevant clinical case reports published in health conferences and journals, aiming to support clinical decision-making, particularly in challenging or complex diagnoses. To this regard, we trained two BERT models on the sentence similarity task: a bi-encoder for retrieval and a cross-encoder for reranking. To evaluate our approach, we used 10 medical reports and asked a physician to rank the top 10 most relevant published case reports for each one. Our results show that MedLink’s ranking model achieved NDCG@10 of 0.747. Our demo also includes the visualization of clinical entities (using a NER model) and the production of a textual explanation (using a LLM) to ease comparison and contrasting between reports. © The Author(s), under exclusive license to Springer Nature Switzerland AG 2025.
2025
Autores
Campos, R; Jorge, M; Jatowt, A; Bhatia, S; Litvak, M;
Publicação
CEUR Workshop Proceedings
Abstract
[No abstract available]
The access to the final selection minute is only available to applicants.
Please check the confirmation e-mail of your application to obtain the access code.