2025
Authors
Zhao, RR; You, YQ; Sun, JB; Gama, J; Jiang, J;
Publication
INFORMATION PROCESSING & MANAGEMENT
Abstract
Capricious data streams, marked by the random emergence and disappearance of features, are common in practical scenarios such as sensor networks. Existing research handles them mainly with linear classifiers, feature correlation, or ensembles of trees. These approaches suffer from deficiencies such as limited learning capacity and high time cost. More importantly, concept drift in such streams has received little attention. This paper therefore focuses on drifting capricious data streams and proposes a new algorithm, DCFHT (online learning from Drifting Capricious data streams with Flexible Hoeffding Tree), based on a single Hoeffding tree. DCFHT achieves non-linear modeling and adapts to drifts. First, DCFHT dynamically reuses and restructures the tree. The reusable information includes the tree structure and the information stored in each node. The restructuring process ensures that the Hoeffding tree stays aligned with the latest universal feature space. Second, DCFHT adapts to drifts in an informed way. When a drift is detected, DCFHT starts training a backup learner and continues until the backup is able to replace the primary learner. Various experiments on 22 public and 15 synthetic datasets show that DCFHT is not only more accurate, but also maintains relatively low runtime on capricious data streams.
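The backup-learner idea described in the abstract above can be illustrated with a minimal sketch. This is not the DCFHT implementation: the toy `MajorityClassLearner`, the `BackupSwapper` class, the externally supplied `drift_detected` flag, and the replacement rule (the backup wins once it is more accurate than the primary over a sliding window) are all assumptions made for illustration. The abstract does not state the actual drift detector or replacement criterion.

```python
from collections import deque


class MajorityClassLearner:
    """Toy stand-in for a Hoeffding tree (hypothetical, for illustration only)."""

    def __init__(self):
        self.counts = {}

    def predict(self, x):
        # Predict the most frequent label seen so far, or None before training.
        if not self.counts:
            return None
        return max(self.counts, key=self.counts.get)

    def learn(self, x, y):
        self.counts[y] = self.counts.get(y, 0) + 1


class BackupSwapper:
    """Train a backup learner after a drift signal and promote it when it wins.

    Assumed rule: promote the backup once its hit window is full and it has
    more correct predictions in that window than the primary learner.
    """

    def __init__(self, make_learner, window=50):
        self.make_learner = make_learner
        self.window = window
        self.primary = make_learner()
        self.backup = None
        self.primary_hits = deque(maxlen=window)
        self.backup_hits = deque(maxlen=window)
        self.swaps = 0

    def update(self, x, y, drift_detected=False):
        # Test-then-train on the primary learner.
        self.primary_hits.append(self.primary.predict(x) == y)
        self.primary.learn(x, y)

        # A drift signal starts a fresh backup learner if none is running.
        if drift_detected and self.backup is None:
            self.backup = self.make_learner()
            self.backup_hits = deque(maxlen=self.window)

        if self.backup is not None:
            self.backup_hits.append(self.backup.predict(x) == y)
            self.backup.learn(x, y)
            full = len(self.backup_hits) == self.window
            if full and sum(self.backup_hits) > sum(self.primary_hits):
                # Promote the backup and discard the outdated primary.
                self.primary, self.backup = self.backup, None
                self.primary_hits = self.backup_hits
                self.backup_hits = deque(maxlen=self.window)
                self.swaps += 1


swapper = BackupSwapper(MajorityClassLearner, window=50)
for i in range(100):
    swapper.update((i,), 0)                           # stable concept: label 0
for i in range(200):
    swapper.update((i,), 1, drift_detected=(i == 0))  # concept changes to label 1
```

In this toy run the primary learner keeps predicting the old label after the change, so the backup overtakes it as soon as its window fills and is promoted once.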
2025
Authors
Zhao, RR; Sun, JB; Gama, J; Jiang, J;
Publication
40TH ANNUAL ACM SYMPOSIUM ON APPLIED COMPUTING
Abstract
Capricious data streams make no assumptions on feature space dynamics and are mainly handled with feature correlation, linear classifiers, or ensembles of trees. These approaches suffer from deficiencies such as limited learning capacity, high time cost, and low interpretability. To enhance effectiveness and efficiency, this paper handles capricious data streams with a single tree; the proposed algorithm is named OCFHT (Online learning from Capricious data streams with Flexible Hoeffding Tree). OCFHT does not rely on the correlation pattern among features and can achieve non-linear modeling. Its performance is verified by various experiments on 18 public datasets, which show that it is not only more accurate than state-of-the-art algorithms, but also runs faster.
2025
Authors
Reis, P; Serra, AP; Gama, J;
Publication
CoRR
Abstract
2025
Authors
Paim, AM; Gama, J; Veloso, B; Enembreck, F; Ribeiro, RP;
Publication
40TH ANNUAL ACM SYMPOSIUM ON APPLIED COMPUTING
Abstract
Learning from continuous data streams is a relevant area within machine learning, focusing on the creation and updating of predictive models in real time as new data becomes available for training and prediction. Among the most widely used methods for this type of task, Hoeffding Trees are highly valued for their simplicity and robustness across a variety of applications and are considered the primary choice for generating decision trees in data stream contexts. However, Hoeffding Trees tend to expand continuously as new data is incorporated, resulting in increased processing time and memory consumption, often without significant gains in accuracy. In this study, we propose an instance selection scheme that combines different strategies to regularize Hoeffding Trees and their variants, mitigating excessive growth without compromising model accuracy. The method selects misclassified instances and a fraction of correctly classified instances during the training phase. In an extensive experimental evaluation on both real and synthetic data stream datasets, the instance selection scheme demonstrates superior predictive performance compared to the original models (without selection) while using a reduced subset of examples. Additionally, the method achieves relevant improvements in processing time, model complexity, and memory consumption, highlighting the effectiveness of the proposed instance selection scheme.
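The selection rule stated in the abstract above (train on every misclassified instance plus a fraction of the correctly classified ones) can be sketched as follows. This is an illustrative reading, not the authors' code: `MajorityClassLearner` is a toy stand-in for a Hoeffding Tree, and `train_with_selection`, `keep_prob`, and the use of uniform random sampling for the correctly classified fraction are assumptions.

```python
import random


class MajorityClassLearner:
    """Toy stand-in for a Hoeffding Tree (hypothetical, for illustration only)."""

    def __init__(self):
        self.counts = {}
        self.n_trained = 0

    def predict(self, x):
        # Predict the most frequent label seen so far, or None before training.
        if not self.counts:
            return None
        return max(self.counts, key=self.counts.get)

    def learn(self, x, y):
        self.counts[y] = self.counts.get(y, 0) + 1
        self.n_trained += 1


def train_with_selection(learner, stream, keep_prob=0.1, seed=0):
    """Train only on selected instances and return how many were used.

    Misclassified instances are always used. Correctly classified ones are
    used with probability keep_prob (assumed sampling strategy).
    """
    rng = random.Random(seed)
    used = 0
    for x, y in stream:
        misclassified = learner.predict(x) != y
        if misclassified or rng.random() < keep_prob:
            learner.learn(x, y)
            used += 1
    return used


# A stream with a single label: only the first instance is misclassified.
stream = [((i,), 0) for i in range(100)]
print(train_with_selection(MajorityClassLearner(), stream, keep_prob=0.0))  # 1
print(train_with_selection(MajorityClassLearner(), stream, keep_prob=1.0))  # 100
```

Setting `keep_prob` between these two extremes trades training cost against how much of the correctly handled data still reaches the model.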
2025
Authors
Shaji, N; Tabassum, S; Ribeiro, RP; Gama, J; Santana, P; Garcia, A;
Publication
COMPLEX NETWORKS & THEIR APPLICATIONS XIII, COMPLEX NETWORKS 2024, VOL 1
Abstract
Waste transport management is a critical sector where maintaining accurate records and preventing fraudulent or illegal activities is essential for regulatory compliance, environmental protection, and public safety. However, monitoring and analyzing large-scale waste transport records to identify suspicious patterns or anomalies is a complex task. These records often involve multiple entities and exhibit variability in waste flows between them. Traditional anomaly detection methods that rely solely on individual transaction data may struggle to capture the deeper, network-level anomalies that emerge from the interactions between entities. To address this complexity, we propose a hybrid approach that integrates network-based measures with machine learning techniques for anomaly detection in waste transport data. Our method leverages advanced graph analysis techniques, such as sub-graph detection, community structure analysis, and centrality measures, to extract meaningful features that describe the network's topology. We also introduce novel metrics for edge weight disparities. Further, advanced machine learning techniques, including clustering, neural network, density-based, and ensemble methods, are applied to these structural features to enhance and refine the identification of anomalous behaviors.
2025
Authors
Alves, B; Almeida, A; Silva, C; Pais, D; Ribeiro, RP; Gama, J; Fernandes, JM; Brás, S; Sebastiao, R;
Publication
HUMAN AND ARTIFICIAL RATIONALITIES. ADVANCES IN COGNITION, COMPUTATION, AND CONSCIOUSNESS, HAR 2024
Abstract
Pain is a highly subjective phenomenon that depends on multiple factors. The common methods used to evaluate pain require the person to be awake and cooperative, which may not always be possible. Moreover, such methods are subject to non-quantifiable influences, namely the impact of an individual's emotional state on how pain is perceived: negative emotions may exacerbate pain perception, while positive emotions may attenuate it. The goal of this study was to conduct a novel protocol for pain induction with emotional elicitation and assess its feasibility. In this protocol, the physiological responses were monitored and collected through Electrocardiogram, Electrodermal Activity, and surface Electromyogram signals. Throughout the protocol, pain perception was evaluated using a 0-10 numerical rating scale and by registering the time from the beginning of the pain stimulus to the Pain and Tolerance Thresholds. This study comprised three emotional sessions (negative, positive, and neutral), elicited through video excerpts of horror, comedy, and documentary films, respectively, followed by pain induction using the Cold Pressor Task (CPT). A total of 56 participants completed the study, with a mean CPT time of about 91.70 ± 39.64 s across all sessions. The conducted protocol was considered feasible and safe, as it allowed the collection of physiological data, pain reports, and questionnaire reports from 56 participants without any harm to them. Moreover, the collected data can be further used to assess how emotional conditions influence pain perception and to provide better emotion-calibrated pain recognition systems based on physiological signals.