Publicacoes - INESC TEC

Publicações

Publicações por LIAAD

2023

Data Stream Analytics

Autores
Aguilar Ruiz, S; Bifet, A; Gama, J;

Publicação
Analytics

Abstract
[No abstract available]

FecharLer Abstract

2023

Estimating Instantaneous Vehicle Emissions

Autores
Andrade, T; Gama, J;

Publicação
Proceedings of the 38th ACM/SIGAPP Symposium on Applied Computing, SAC 2023, Tallinn, Estonia, March 27-31, 2023

Abstract

2023

A DTW Approach for Complex Data A Case Study with Network Data Streams

Autores
Silva, PR; Vinagre, J; Gama, J;

Publicação
38TH ANNUAL ACM SYMPOSIUM ON APPLIED COMPUTING, SAC 2023

Abstract
Dynamic Time Warping (DTW) is a robust method to measure the similarity between two sequences. This paper proposes a method based on DTW to analyse high-speed data streams. The central idea is to decompose the network traffic into sequences of histograms of packet sizes and then calculate the distance between pairs of such sequences using DTW with Kullback-Leibler (KL) distance. As a baseline, we also compute the Euclidean Distance between the sequences of histograms. Since our preliminary experiments indicate that the distance between two sequences falls within a different range of values for distinct types of streams, we then exploit this distance information for stream classification using a Random Forest. The approach was investigated using recent internet traffic data from a telecommunications company. To illustrate the application of our approach, we conducted a case study with encrypted Internet Protocol Television (IPTV) network traffic data. The goal was to use our DTW-based approach to detect the video codec used in the streams, as well as the IPTV channel. Results strongly suggest that the DTW distance value between the data streams is highly informative for such classification tasks.

FecharLer Abstract

2023

Identification of morphologically cryptic species with computer vision models: wall lizards (Squamata: Lacertidae: Podarcis) as a case study

Autores
Pinho, C; Kaliontzopoulou, A; Ferreira, CA; Gama, J;

Publicação
ZOOLOGICAL JOURNAL OF THE LINNEAN SOCIETY

Abstract
Automated image classification is a thriving field of machine learning, and various successful applications dealing with biological images have recently emerged. In this work, we address the ability of these methods to identify species that are difficult to tell apart by humans due to their morphological similarity. We focus on distinguishing species of wall lizards, namely those belonging to the Podarcis hispanicus species complex, which constitutes a well-known example of cryptic morphological variation. We address two classification experiments: (1) assignment of images of the morphologically relatively distinct P. bocagei and P. lusitanicus; and (2) distinction between the overall more cryptic nine taxa that compose this complex. We used four datasets (two image perspectives and individuals of the two sexes) and three deep-learning models to address each problem. Our results suggest a high ability of the models to identify the correct species, especially when combining predictions from different perspectives and models (accuracy of 95.9% and 97.1% for females and males, respectively, in the two-class case; and of 91.2% to 93.5% for females and males, respectively, in the nine-class case). Overall, these results establish deep-learning models as an important tool for field identification and monitoring of cryptic species complexes, alleviating the burden of expert or genetic identification.

FecharLer Abstract

2023

Data-driven predictive maintenance framework for railway systems

Autores
Meira, J; Veloso, B; Bolon Canedo, V; Marreiros, G; Alonso Betanzos, A; Gama, J;

Publicação
INTELLIGENT DATA ANALYSIS

Abstract
The emergence of the Industry 4.0 trend brings automation and data exchange to industrial manufacturing. Using computational systems and IoT devices allows businesses to collect and deal with vast volumes of sensorial and business process data. The growing and proliferation of big data and machine learning technologies enable strategic decisions based on the analyzed data. This study suggests a data-driven predictive maintenance framework for the air production unit (APU) system of a train of Metro do Porto. The proposed method assists in detecting failures and errors in machinery before they reach critical stages. We present an anomaly detection model following an unsupervised approach, combining the Half-Space-trees method with One Class K Nearest Neighbor, adapted to deal with data streams. We evaluate and compare our approach with the Half-Space-Trees method applied without the One Class K Nearest Neighbor combination. Our model produced few type-I errors, significantly increasing the value of precision when compared to the Half-Space-Trees model. Our proposal achieved high anomaly detection performance, predicting most of the catastrophic failures of the APU train system.

FecharLer Abstract

2023

Towards federated learning: An overview of methods and applications

Autores
Silva, PR; Vinagre, J; Gama, J;

Publicação
WILEY INTERDISCIPLINARY REVIEWS-DATA MINING AND KNOWLEDGE DISCOVERY

Abstract
Federated learning (FL) is a collaborative, decentralized privacy-preserving method to attach the challenges of storing data and data privacy. Artificial intelligence, machine learning, smart devices, and deep learning have strongly marked the last years. Two challenges arose in data science as a result. First, the regulation protected the data by creating the General Data Protection Regulation, in which organizations are not allowed to keep or transfer data without the owner's authorization. Another challenge is the large volume of data generated in the era of big data, and keeping that data in one only server becomes increasingly tricky. Therefore, the data is allocated into different locations or generated by devices, creating the need to build models or perform calculations without transferring data to a single location. The new term FL emerged as a sub-area of machine learning that aims to solve the challenge of making distributed models with privacy considerations. This survey starts by describing relevant concepts, definitions, and methods, followed by an in-depth investigation of federated model evaluation. Finally, we discuss three promising applications for further research: anomaly detection, distributed data streams, and graph representation.This article is categorized under:Technologies > Machine LearningTechnologies > Artificial Intelligence

FecharLer Abstract