2009
Authors
Mendes Moreira, J; Jorge, AM; Soares, C; de Sousa, JF;
Publication
MACHINE LEARNING AND DATA MINING IN PATTERN RECOGNITION
Abstract
Integration methods for ensemble learning can use two different approaches: combination or selection. The combination approach (also called fusion) consists of combining the predictions obtained by the different models in the ensemble to obtain the final ensemble prediction. The selection approach selects one (or more) models from the ensemble according to the prediction performance of these models on similar data from the validation set. Usually, the method used to select similar data is k-nearest neighbors with the Euclidean distance. In this paper we discuss other approaches to obtain similar data for the regression problem. We show that using similarity measures according to the target values improves results. We also show that dynamically selecting several models for the prediction task increases prediction accuracy compared to selecting just one model.
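The selection approach described in this abstract can be illustrated with a minimal sketch. It covers only the baseline the abstract calls usual (k-nearest neighbors with the Euclidean distance on a validation set), not the alternative similarity measures the paper proposes. The function names, the toy models, and the toy data are all hypothetical and chosen for illustration.

```python
import math

def k_nearest(x, X_val, k):
    """Indices of the k validation points closest to x (Euclidean distance)."""
    order = sorted(range(len(X_val)), key=lambda i: math.dist(x, X_val[i]))
    return order[:k]

def dynamic_selection_predict(x, models, X_val, y_val, k=3, n_select=1):
    """Average the predictions of the n_select models that have the lowest
    mean squared error on the k validation points most similar to x."""
    neighbours = k_nearest(x, X_val, k)

    def local_error(model):
        return sum((model(X_val[i]) - y_val[i]) ** 2 for i in neighbours) / len(neighbours)

    chosen = sorted(models, key=local_error)[:n_select]
    return sum(model(x) for model in chosen) / len(chosen)

# Toy ensemble (assumption): one model fits small inputs, the other large ones.
models = [lambda x: 2.0 * x[0], lambda x: 10.0]
X_val = [(0.0,), (1.0,), (2.0,), (10.0,), (11.0,), (12.0,)]
y_val = [0.0, 2.0, 4.0, 10.0, 10.0, 10.0]

single = dynamic_selection_predict((1.5,), models, X_val, y_val, k=3, n_select=1)
both = dynamic_selection_predict((1.5,), models, X_val, y_val, k=3, n_select=2)
far = dynamic_selection_predict((11.0,), models, X_val, y_val, k=3, n_select=1)
```

With `n_select=1` the sketch picks a single locally best model; with a larger `n_select` it averages several, which corresponds to the "several models" variant the abstract reports as more accurate.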
2006
Authors
Moreira, JM; Jorge, AM; Soares, C; de Sousa, JF;
Publication
FOUNDATIONS OF INTELLIGENT SYSTEMS, PROCEEDINGS
Abstract
This paper describes a study on example selection in regression problems using linear mu-SVM (Support Vector Machine) as the prediction algorithm. The motivating case is a study on real data for a bus trip time prediction problem. In this study we use three different training sets: all the examples, examples from past days similar to the day for which the prediction is needed, and examples selected by a CART regression tree. Then, we verify whether the CART-based example selection approach is appropriate on different regression data sets. The experimental results obtained are promising.
2012
Authors
Jorge, AM; Mendes Moreira, J; De Sousa, JF; Soares, C; Azevedo, PJ;
Publication
Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Abstract
In this paper we study the deviation of bus trip duration and its causes. Deviations are obtained by comparing scheduled times against actual trip duration and are either delays or early arrivals. We use distribution rules, a kind of association rules that may have continuous distributions on the consequent. Distribution rules allow the systematic identification of particular conditions, which we call contexts, under which the distribution of trip time deviations differs significantly from the overall deviation distribution. After identifying specific causes of delay, the bus company's operational managers can adjust the timetables, increasing punctuality without disrupting the service. © Springer-Verlag Berlin Heidelberg 2012.
2009
Authors
Moreira, JM; Soares, C; Jorge, AM; de Sousa, JF;
Publication
ADVANCES IN KNOWLEDGE DISCOVERY AND DATA MINING, PROCEEDINGS
Abstract
Travel time prediction is an important tool for the planning tasks of mass transit and logistics companies. In this paper we investigate the use of regression methods for the problem of predicting the travel time of buses in a Portuguese public transportation company. More specifically, we empirically evaluate the impact of varying parameters on the performance of different regression algorithms, such as support vector machines (SVM), random forests (RF) and projection pursuit regression (PPR). We also evaluate the impact of the focusing tasks (example selection, domain value definition and feature selection) on the accuracy of those algorithms. Concerning the algorithms, we observe that 1) RF is quite robust to the choice of parameters and focusing methods; 2) the choice of parameters for SVM can be made independently of focusing methods; while 3) for PPR they should be selected simultaneously. For the focusing methods, we observe that a stronger effect is obtained using example selection, particularly in combination with SVM.
2024
Authors
Mendes Neves, T; Seca, D; Sousa, R; Ribeiro, C; Mendes Moreira, J;
Publication
COMPUTATIONAL ECONOMICS
Abstract
As many automated algorithms find their way into the IT systems of the banking sector, having a way to validate and interpret the results from these algorithms can lead to a substantial reduction in the risks associated with automation. Usually, validating these pricing mechanisms requires human resources to manually analyze and validate large quantities of data. There is a lack of effective methods that analyze the time series and understand if what is currently happening is plausible based on previous data, without information about the variables used to calculate the price of the asset. This paper describes an implementation of a process that allows us to validate many data points automatically. We explore the K-Nearest Neighbors algorithm to find coincident patterns in financial time series, allowing us to detect anomalies, outliers, and data points that do not follow normal behavior. This system allows quicker detection of defective calculations that would otherwise result in the incorrect pricing of financial assets. Furthermore, our method does not require knowledge about the variables used to calculate the time series being analyzed. Our proposal uses pattern matching and can validate more than 58% of instances, substantially improving human risk analysts' efficiency. The proposal is completely transparent, allowing analysts to understand how the algorithm made its decision, increasing the trustworthiness of the method.
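One possible reading of the K-Nearest Neighbors pattern matching mentioned in this abstract is sketched below. The abstract does not give the windowing scheme, the distance, or the decision rule, so all of these are assumptions: fixed-width windows, Euclidean distance, and a plain tolerance on the mean of the values that followed the nearest historical windows. The names `windows` and `is_plausible` are hypothetical.

```python
import math

def windows(series, width):
    """All (window, next_value) pairs of the given width found in the series."""
    return [(series[i:i + width], series[i + width])
            for i in range(len(series) - width)]

def is_plausible(history, recent, observed, k=3, tolerance=1.0):
    """Check whether `observed` is close to what followed the k historical
    windows most similar to `recent`. Returns (plausible, expected_value)."""
    pairs = windows(history, len(recent))
    pairs.sort(key=lambda pair: math.dist(pair[0], recent))
    followers = [next_value for _, next_value in pairs[:k]]
    expected = sum(followers) / len(followers)
    return abs(observed - expected) <= tolerance, expected

# Toy periodic series (assumption): the pattern 1, 2 is always followed by 3.
history = [1.0, 2.0, 3.0] * 4
normal = is_plausible(history, [1.0, 2.0], 3.0)
anomaly = is_plausible(history, [1.0, 2.0], 9.0)
```

Because the decision rests on a handful of retrieved historical windows, an analyst can inspect exactly which past patterns led to the verdict, which is in the spirit of the transparency the abstract emphasizes.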
2022
Authors
Couceiro, M; Lima, IR; Ulisses, A; Neves, TM; Moreira, JM;
Publication
Proceedings of the 10th International Conference on Sport Sciences Research and Technology Support, icSPORTS 2022, Valletta, Malta, October 27-28, 2022.
Abstract
The broadcast of audio-video sports content is a field with increasingly larger audiences demanding higher quality content and involvement. This growth creates the necessity to develop more content to engage the users and keep this trend. Otherwise, it may stall or even diminish. Therefore, enhancing the user experience, engagement, and involvement during live sports event broadcasts is of utmost importance. This paper proposes a solution to extract event’s information from video, resorting to Computer Vision techniques and Deep Learning algorithms. More specifically, the project encompassed the definition and implementation of field registration, object detection and tracking tasks. Focusing on football sports events, a novel dataset combining several video sources was created and used for analysis and metadata extraction. In particular, the proposed solution can detect and track players with acceptable precision using state-of-the-art methods, like YOLOv5 and DeepSORT. Furthermore, resorting to unsupervised learning techniques, the system provides team segmentation based on the colour of the players’ kits. A series of visual representations regarding the players’ movements on the field enables broadcast enrichment and increased user experience. The presented solution is framed in the H2020 DataCloud project and will be deployed in a cloud environment simplifying its access and utilisation. Copyright © 2022 by SCITEPRESS – Science and Technology Publications, Lda. All rights reserved.
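The team segmentation step in this abstract is described only as using unsupervised learning on kit colour. As a hedged illustration, the sketch below assumes a basic k-means with two clusters over one mean RGB colour per detected player. The choice of k-means, the function name `two_means`, and the sample colours are all assumptions, not details taken from the paper.

```python
import math

def two_means(colours, iterations=10):
    """Split RGB colours into two clusters with a basic k-means (k = 2)."""
    # Deterministic start: the first colour and the colour farthest from it.
    first = colours[0]
    second = max(colours, key=lambda c: math.dist(c, first))
    centres = [first, second]
    labels = [0] * len(colours)
    for _ in range(iterations):
        # Assignment step: each colour joins its nearest centre.
        labels = [min((0, 1), key=lambda j: math.dist(c, centres[j]))
                  for c in colours]
        # Update step: each centre moves to the mean of its members.
        for j in (0, 1):
            members = [c for c, lab in zip(colours, labels) if lab == j]
            if members:
                centres[j] = tuple(sum(ch) / len(members) for ch in zip(*members))
    return labels, centres

# Hypothetical mean kit colours for six tracked players (reddish vs. bluish).
kits = [(200, 20, 20), (210, 30, 25), (20, 20, 200),
        (25, 30, 210), (190, 25, 30), (30, 25, 190)]
labels, centres = two_means(kits)
```

In a real pipeline the input colours would come from the player bounding boxes produced by the detector and tracker, and referees or goalkeepers would likely need extra clusters or separate handling.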