Publications by João Gama

2025

Online learning from drifting capricious data streams with flexible Hoeffding tree

Authors
Zhao, RR; You, YQ; Sun, JB; Gama, J; Jiang, J;

Publication
INFORMATION PROCESSING & MANAGEMENT

Abstract
Capricious data streams, marked by the random emergence and disappearance of features, are common in practical scenarios such as sensor networks. Existing research handles them mainly with linear classifiers, feature correlation, or ensembles of trees. These approaches suffer from limited learning capacity and high time cost. More importantly, concept drift in such streams has received little attention. This paper therefore focuses on drifting capricious data streams and proposes a new algorithm, DCFHT (online learning from Drifting Capricious data streams with Flexible Hoeffding Tree), based on a single Hoeffding tree. DCFHT achieves non-linear modeling and adapts to drifts. First, DCFHT dynamically reuses and restructures the tree. The reusable information includes the tree structure and the information stored in each node. The restructuring process ensures that the Hoeffding tree stays aligned with the latest universal feature space. Second, DCFHT adapts to drifts in an informed way: when a drift is detected, it starts training a backup learner and continues until the backup is able to replace the primary learner. Experiments on 22 public and 15 synthetic datasets show that DCFHT is not only more accurate but also maintains relatively low runtime on capricious data streams.
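
The drift-adaptation step described in the abstract (train a backup learner after a drift is detected, then let it replace the primary learner once it is able to) can be illustrated with a minimal sketch. This is not the DCFHT implementation. The class names, the windowed error-rate drift check, the swap rule, and the toy majority-class learner standing in for a Hoeffding tree are all assumptions made for illustration.

```python
from collections import Counter, deque


class MajorityClassLearner:
    """Toy incremental learner; a hypothetical stand-in for a Hoeffding tree."""

    def __init__(self):
        self.counts = Counter()

    def learn_one(self, x, y):
        self.counts[y] += 1

    def predict_one(self, x):
        if not self.counts:
            return None
        return self.counts.most_common(1)[0][0]


class PrimaryBackupLearner:
    """Primary learner plus a backup that is trained only after a drift signal.

    Assumed drift rule: the primary's error rate over the last `window`
    instances exceeds `drift_error`. Assumed swap rule: the backup has been
    scored on a full window and is more accurate than the primary.
    """

    def __init__(self, make_learner, window=30, drift_error=0.5):
        self.make_learner = make_learner
        self.window = window
        self.drift_error = drift_error
        self.primary = make_learner()
        self.primary_hits = deque(maxlen=window)
        self.backup = None
        self.backup_hits = None
        self.swaps = 0

    def predict_one(self, x):
        return self.primary.predict_one(x)

    def learn_one(self, x, y):
        # Test-then-train: score each learner before it sees the label.
        self.primary_hits.append(self.primary.predict_one(x) == y)
        if self.backup is not None:
            self.backup_hits.append(self.backup.predict_one(x) == y)

        self.primary.learn_one(x, y)
        if self.backup is not None:
            self.backup.learn_one(x, y)

        primary_acc = sum(self.primary_hits) / len(self.primary_hits)
        if self.backup is None:
            window_full = len(self.primary_hits) == self.window
            if window_full and 1 - primary_acc > self.drift_error:
                # Drift signalled: start training a fresh backup learner.
                self.backup = self.make_learner()
                self.backup_hits = deque(maxlen=self.window)
        elif len(self.backup_hits) == self.window:
            backup_acc = sum(self.backup_hits) / self.window
            if backup_acc > primary_acc:
                # The backup is ready: promote it and drop the old primary.
                self.primary, self.primary_hits = self.backup, self.backup_hits
                self.backup, self.backup_hits = None, None
                self.swaps += 1
```

Feeding the wrapper a stream whose label changes abruptly (for example, many instances of one class followed by many of another) triggers one backup phase and one swap. Until the swap happens, the primary learner keeps serving predictions, which is the point of training the backup on the side rather than resetting the model at the moment of detection.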

2025

Online Learning from Capricious Data streams with Flexible Hoeffding Tree

Authors
Zhao, RR; Sun, JB; Gama, J; Jiang, J;

Publication
40TH ANNUAL ACM SYMPOSIUM ON APPLIED COMPUTING

Abstract
Capricious data streams make no assumptions about feature space dynamics and are mainly handled with feature correlation, linear classifiers, or ensembles of trees. These approaches suffer from limited learning capacity, high time cost, and low interpretability. To improve effectiveness and efficiency, this paper handles capricious data streams with a single tree; the proposed algorithm is named OCFHT (Online learning from Capricious data streams with Flexible Hoeffding Tree). OCFHT does not rely on correlation patterns among features and achieves non-linear modeling. Experiments on 18 public datasets show that it is not only more accurate than state-of-the-art algorithms but also runs faster.
