
Publications by LIAAD

2020

ECML PKDD 2020 Workshops - Workshops of the European Conference on Machine Learning and Knowledge Discovery in Databases (ECML PKDD 2020): SoGood 2020, PDFL 2020, MLCS 2020, NFMCP 2020, DINA 2020, EDML 2020, XKDD 2020 and INRA 2020, Ghent, Belgium, September 14-18, 2020, Proceedings

Authors
Koprinska, I; Kamp, M; Appice, A; Loglisci, C; Antonie, L; Zimmermann, A; Guidotti, R; Özgöbek, O; Ribeiro, RP; Gavaldà, R; Gama, J; Adilova, L; Krishnamurthy, Y; Ferreira, PM; Malerba, D; Medeiros, I; Ceci, M; Manco, G; Masciari, E; Ras, ZW; Christen, P; Ntoutsi, E; Schubert, E; Zimek, A; Monreale, A; Biecek, P; Rinzivillo, S; Kille, B; Lommatzsch, A; Gulla, JA;

Publication
PKDD/ECML Workshops


2020

Clustering genomic words in human DNA using peaks and trends of distributions

Authors
Tavares, AH; Raymaekers, J; Rousseeuw, PJ; Brito, P; Afreixo, V;

Publication
ADVANCES IN DATA ANALYSIS AND CLASSIFICATION

Abstract
In this work we seek clusters of genomic words in human DNA by studying their inter-word lag distributions. Due to the particularly spiked nature of these histograms, a clustering procedure is proposed that first decomposes each distribution into a baseline and a peak distribution. An outlier-robust fitting method is used to estimate the baseline distribution (the 'trend'), and a sparse vector of detrended data captures the peak structure. A simulation study demonstrates the effectiveness of the clustering procedure in grouping distributions with similar peak behavior and/or baseline features. The procedure is applied to investigate similarities between the distribution patterns of genomic words of lengths 3 and 5 in the human genome. These experiments demonstrate the potential of the new method for identifying words with similar distance patterns.

2020

New contributions for the comparison of community detection algorithms in attributed networks

Authors
Vieira, AR; Campos, P; Brito, P;

Publication
JOURNAL OF COMPLEX NETWORKS

Abstract
Community detection techniques use only the information about the network topology to find communities in networks. Similarly, classic clustering techniques for vector data consider only the values of the attributes describing the objects to find clusters. In real-world networks, however, in addition to the network topology, there is usually information about the attributes describing the vertices that can also be used to find communities. Using both the network topology and the vertex attributes can improve the algorithms' results. Researchers have therefore begun investigating methods for community detection in attributed networks. In recent years, several methods have been proposed to address this task, partitioning a graph into sub-graphs of vertices that are densely connected and similar in terms of their descriptions. This article focuses on the analysis and comparison of some of the proposed methods for community detection in attributed networks. For that purpose, several applications to both synthetic and real networks are conducted. Experiments are performed on both weighted and unweighted graphs. The objective is to establish which methods generally perform better according to the validation measures and to investigate their sensitivity to changes in the networks' structure and homogeneity.

2020

Building Robust Prediction Models for Defective Sensor Data Using Artificial Neural Networks

Authors
de Sa, CR; Shekar, AK; Ferreira, H; Soares, C;

Publication
14TH INTERNATIONAL CONFERENCE ON SOFT COMPUTING MODELS IN INDUSTRIAL AND ENVIRONMENTAL APPLICATIONS (SOCO 2019)

Abstract
Sensors are susceptible to failure when exposed to extreme conditions over long periods of time. Besides, they can be affected by noise or electrical interference. Models (Machine Learning or others) obtained from these faulty and noisy sensors may be less reliable. In this paper, we propose a data augmentation approach for making neural networks more robust to missing and faulty sensor data. This approach is shown to be effective in a real-life industrial application that uses data from various sensors to predict the wear of an automotive fuel-system component. Empirical results show that, in this particular application, the proposed approach leads to a more robust neural network than existing methods.

2020

Process discovery on geolocation data

Authors
Ribeiro, J; Fontes, T; Soares, C; Borges, JL;

Publication
Transportation Research Procedia

Abstract
Fleet tracking technology collects real-time information about the geolocation of vehicles as well as driving-related data. This information is typically used for location monitoring as well as for the analysis of routes, vehicles and drivers. From an operational point of view, the geolocation simply identifies the state of a vehicle in terms of positioning and navigation. From a management point of view, the geolocation may be used to infer the state of a vehicle in terms of process (e.g., driving, fueling, maintenance, or lunch break). Meaningful information may be extracted from these inferred states using process mining. This paper proposes an innovative methodology for inferring process states from geolocation data. It also presents the potential of applying process mining techniques to geolocation data for process discovery. © 2020 The Authors. Published by Elsevier B.V.

2020

Factual Question Generation for the Portuguese Language

Authors
Leite, B; Cardoso, HL; Reis, LP; Soares, C;

Publication
International Conference on INnovations in Intelligent SysTems and Applications, INISTA 2020, Novi Sad, Serbia, August 24-26, 2020

Abstract
Artificial Intelligence (AI) has seen numerous applications in the area of Education. Through the use of educational technologies such as Intelligent Tutoring Systems (ITS), learning possibilities have increased significantly. One of the main challenges for the widespread use of ITS is the ability to automatically generate questions. Bearing in mind that the act of questioning has been shown to improve students' learning outcomes, Automatic Question Generation (AQG) has proven to be one of the most important applications for optimizing this process. We present a tool for generating factual questions in Portuguese by proposing three distinct approaches. The first performs a syntax-based analysis of a given text using the information obtained from Part-of-speech tagging (PoS) and Named Entity Recognition (NER). The second approach carries out a semantic analysis of the sentences through Semantic Role Labeling (SRL). The last method extracts the inherent dependencies within sentences using Dependency Parsing. All of these methods rely on Natural Language Processing (NLP) techniques. For evaluation, we designed a pilot test that was answered by Portuguese teachers. The results confirm the potential of these different approaches, opening up the possibility of using them in a teaching environment. © 2020 IEEE.
