Cookies
O website necessita de alguns cookies e outros recursos semelhantes para funcionar. Caso o permita, o INESC TEC irá utilizar cookies para recolher dados sobre as suas visitas, contribuindo, assim, para estatísticas agregadas que permitem melhorar o nosso serviço. Ver mais
Aceitar Rejeitar
  • Menu
Publicações

Publicações por LIAAD

2025

A Systematic Literature Review on Multi-label Data Stream Classification

Autores
Oliveira, HF; de Faria, ER; Gama, J; Khan, L; Cerri, R;

Publicação
CoRR

Abstract

2025

Salvador Urban Network Transportation (SUNT): A Landmark Spatiotemporal Dataset for Public Transportation

Autores
Ferreira, MV; Souza, M; Rios, TN; Fernandes, IFC; Nery, J; Gama, J; Bifet, A; Rios, RA;

Publicação
SCIENTIFIC DATA

Abstract
Efficient public transportation management is essential for the development of large urban centers, providing several benefits such as comprehensive coverage of population mobility, reduction of transport costs, better control of traffic congestion, and significant reduction of environmental impact limiting gas emissions and pollution. Realizing these benefits requires a deeply understanding the population and transit patterns and the adoption of approaches to model multiple relations and characteristics efficiently. This work addresses these challenges by providing a novel dataset that includes various public transportation components from three different systems: regular buses, subway, and BRT (Bus Rapid Transit). Our dataset comprises daily information from about 700,000 passengers in Salvador, one of Brazil's largest cities, and local public transportation data with approximately 2,000 vehicles operating across nearly 400 lines, connecting almost 3,000 stops and stations. With data collected from March 2024 to March 2025 at a frequency lower than one minute, SUNT stands as one of the largest, most comprehensive, and openly available urban datasets in the literature.

2025

Data Science: Foundations and Applications - 29th Pacific-Asia Conference on Knowledge Discovery and Data Mining, PAKDD 2025, Sydney, Australia, June 10-13, 2025, Proceedings, Part VII

Autores
Wu, X; Spiliopoulou, M; Wang, C; Kumar, V; Cao, L; Zhou, X; Pang, G; Gama, J;

Publicação
PAKDD (7)

Abstract

2025

Data Science: Foundations and Applications - 29th Pacific-Asia Conference on Knowledge Discovery and Data Mining, PAKDD 2025, Sydney, NSW, Australia, June 10-13, 2025, Proceedings, Part VI

Autores
Wu, X; Spiliopoulou, M; Wang, C; Kumar, V; Cao, L; Zhou, X; Pang, G; Gama, J;

Publicação
PAKDD (6)

Abstract

2025

OBD-Finder: Explainable Coarse-to-Fine Text-Centric Oracle Bone Duplicates Discovery

Autores
Zhang, C; Wu, S; Chen, Y; Aßenmacher, M; Heumann, C; Men, Y; Fan, G; Gama, J;

Publicação
CoRR

Abstract

2025

Online learning from drifting capricious data streams with flexible Hoeffding tree

Autores
Zhao, RR; You, YQ; Sun, JB; Gama, J; Jiang, J;

Publicação
INFORMATION PROCESSING & MANAGEMENT

Abstract
Capricious data streams, marked by random emergence and disappearance of features, are common in practical scenarios such as sensor networks. In existing research, they are mainly handled based on linear classifiers, feature correlation or ensemble of trees. There exist deficiencies such as limited learning capacity and high time cost. More importantly, the concept drift problem in them receives little attention. Therefore, drifting capricious data streams are focused on in this paper, and a new algorithm DCFHT (online learning from Drifting Capricious data streams with Flexible Hoeffding Tree) is proposed based on a single Hoeffding tree. DCFHT can achieve non-linear modeling and adaptation to drifts. First, DCFHT dynamically reuses and restructures the tree. The reusable information includes the tree structure and the information stored in each node. The restructuring process ensures that the Hoeffding tree dynamically aligns with the latest universal feature space. Second, DCFHT adapts to drifts in an informed way. When a drift is detected, DCFHT starts training a backup learner until it reaches the ability to replace the primary learner. Various experiments on 22 public and 15 synthetic datasets show that it is not only more accurate, but also maintains relatively low runtime on capricious data streams.

  • 23
  • 529