Publications

Publications by LIAAD

2021

Dynamic Topic Modeling Using Social Network Analytics

Authors
Tabassum, S; Gama, J; Azevedo, P; Teixeira, L; Martins, C; Martins, A;

Publication
PROGRESS IN ARTIFICIAL INTELLIGENCE (EPIA 2021)

Abstract
Topic modeling or inference has been one of the well-known problems in the area of text mining. It deals with the automatic categorisation of words or documents into similarity groups, also known as topics. On most social media platforms, such as Twitter, Instagram, and Facebook, hashtags are used to define the content of posts. Therefore, modelling hashtags helps in categorising posts as well as analysing user preferences. In this work, we address this problem for hashtags that stream in real time. Our approach encompasses a graph of hashtags, dynamic sampling, and modularity-based community detection over data from a popular social media engagement application. Further, we analysed the structure and quality of the topic clusters using empirical experiments. The results unveil latent semantic relations between hashtags and also show the frequent hashtags in a cluster. Moreover, in this approach, words in different languages are treated synonymously. We also observed top trending topics and correlated clusters.
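
As a rough illustration of the graph-of-hashtags idea, the sketch below builds a weighted co-occurrence graph from posts and scores a candidate grouping with Newman's modularity, the quantity that modularity-based community detection tries to maximise. The function names, the toy posts and the fixed partition are assumptions made for this example; the abstract does not specify the paper's sampling scheme or detection algorithm, and neither is reproduced here.

```python
from collections import defaultdict
from itertools import combinations


def build_hashtag_graph(posts):
    """Weighted co-occurrence graph: edge weight = number of posts sharing both hashtags."""
    weights = defaultdict(int)
    for tags in posts:
        for a, b in combinations(sorted(set(tags)), 2):
            weights[(a, b)] += 1
    return dict(weights)


def modularity(edge_weights, community_of):
    """Newman modularity Q of a partition of an undirected weighted graph."""
    m = sum(edge_weights.values())  # total edge weight
    if m == 0:
        raise ValueError("graph has no edges")
    internal = defaultdict(float)    # edge weight inside each community
    degree_sum = defaultdict(float)  # summed weighted degree of each community
    for (a, b), w in edge_weights.items():
        degree_sum[community_of[a]] += w
        degree_sum[community_of[b]] += w
        if community_of[a] == community_of[b]:
            internal[community_of[a]] += w
    return sum(internal[c] / m - (degree_sum[c] / (2 * m)) ** 2 for c in degree_sum)


# Hypothetical toy data: two pairs of hashtags that only co-occur within their pair.
posts = [["#ai", "#ml"], ["#ai", "#ml"], ["#food", "#pizza"], ["#food", "#pizza"]]
graph = build_hashtag_graph(posts)
partition = {"#ai": 0, "#ml": 0, "#food": 1, "#pizza": 1}
q = modularity(graph, partition)
```

A detection algorithm would search over partitions for a high value of `q`; here the partition is fixed by hand to keep the example short.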

2021

Spatiotemporal Road Traffic Anomaly Detection: A Tensor-Based Approach

Authors
Tisljaric, L; Fernandes, S; Caric, T; Gama, J;

Publication
APPLIED SCIENCES-BASEL

Abstract
The increased development of urban areas results in a larger number of vehicles on the road network, leading to traffic congestion, which often leads to potentially dangerous situations that can be described as anomalies. Tensor-based methods have emerged only recently in applications related to traffic anomaly detection. They outperform other models at simultaneously capturing spatial and temporal components, which are of immense importance in traffic dataset analysis. This paper presents a tensor-based method for extracting the spatiotemporal road traffic patterns represented with speed transition matrices, with the goal of anomaly detection. A novel anomaly detection approach is presented, which relies on computing the center of mass of the observed traffic patterns. The method was evaluated on a large road traffic dataset and was able to detect the most anomalous parts of the urban road network. By analyzing spatial and temporal components of the most anomalous traffic patterns, sources of anomalies can be identified. Results were validated using domain knowledge extracted from the Highway Capacity Manual. The anomaly detection model achieved a precision score of 92.88%. Therefore, this method is useful to safety experts detecting potentially dangerous road segments, to urban traffic planners, and to routing applications.
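
To make the center-of-mass idea concrete, here is a minimal sketch that computes the center of mass of a small matrix of transition counts. The matrix layout (rows as origin speed bins, columns as destination speed bins), the function name and the toy values are assumptions for illustration; the abstract does not define how the paper turns this quantity into an anomaly score.

```python
def center_of_mass(matrix):
    """Return the (row, column) center of mass of a non-negative 2-D matrix."""
    total = sum(sum(row) for row in matrix)
    if total == 0:
        raise ValueError("matrix has no mass")
    # Weighted mean of the row indices and of the column indices.
    row_com = sum(i * sum(row) for i, row in enumerate(matrix)) / total
    col_com = sum(j * v for row in matrix for j, v in enumerate(row)) / total
    return row_com, col_com


# Hypothetical 3x3 speed transition matrix: entry [i][j] counts vehicles
# moving from speed bin i on one segment to speed bin j on the next.
stm = [
    [0, 0, 0],
    [0, 2, 0],
    [0, 0, 2],
]
com = center_of_mass(stm)
```

Under these assumptions, mass concentrated in low-index bins would pull the center toward the slow-speed corner of the matrix, which is one way such a summary could separate congested patterns from free-flowing ones.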

2021

A new self-organizing map based algorithm for multi-label stream classification

Authors
Cerri, R; Costa Júnior, JD; Faria, ER; Gama, J;

Publication
SAC

Abstract
Several algorithms have been proposed for offline multi-label classification. However, applications in areas such as traffic monitoring, social networks, and sensors produce data continuously, as so-called data streams, posing challenges to batch multi-label learning. Given the lack of stationarity in the distribution of data streams, new algorithms are needed that adapt online to such changes (concept drift). Also, in realistic applications, changes occur in scenarios with infinitely delayed labels, where the true classes of the arriving instances are never available. We propose an online unsupervised incremental method based on self-organizing maps for multi-label stream classification in scenarios with infinitely delayed labels. We consider the existence of an initial set of labeled instances to train a self-organizing map for each label. The learned models are then used and adapted in an evolving stream to classify new instances, considering that their classes will never be available. We adapt to incremental concept drifts by updating online the weight vectors of winner neurons and the dataset label cardinality. Predictions are obtained using the Bayes rule and the outputs of each neuron, adapting the prior probabilities and conditional probabilities of the classes in the stream. Experiments using synthetic and real datasets show that our method is highly competitive with several others from the literature, in both stationary and concept drift scenarios.
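
The online update of a winner neuron can be sketched with the standard self-organizing map rule, in which the weight vector closest to the incoming instance is moved a fraction of the way toward it. The function names, the learning rate and the toy map are assumptions for this example; the paper's per-label maps, neighbourhood handling and Bayes-rule prediction step are not reproduced.

```python
import math


def winner_index(weights, x):
    """Index of the neuron whose weight vector is closest to x (Euclidean distance)."""
    return min(range(len(weights)), key=lambda k: math.dist(weights[k], x))


def update_winner(weights, x, learning_rate=0.1):
    """Move the winner neuron toward x in place and return its index."""
    k = winner_index(weights, x)
    weights[k] = [w + learning_rate * (xi - w) for w, xi in zip(weights[k], x)]
    return k


# Hypothetical map with two neurons in a 2-D feature space.
som = [[0.0, 0.0], [1.0, 1.0]]
k = update_winner(som, [0.5, 1.0], learning_rate=0.5)
```

Applying this update to each unlabeled instance as it arrives lets the map follow gradual drift without ever seeing true labels, which matches the infinitely delayed label setting described above.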

2021

Tensor decomposition for analysing time-evolving social networks: an overview

Authors
Fernandes, S; Fanaee T, H; Gama, J;

Publication
ARTIFICIAL INTELLIGENCE REVIEW

Abstract
Social networks are becoming larger and more complex as new ways of collecting social interaction data arise (namely from online social networks, mobile device sensors, ...). These networks are often large-scale and of high dimensionality. Therefore, dealing with such networks has become a challenging task. An intuitive way to deal with this complexity is to resort to tensors. In this context, the application of tensor decomposition has proven its usefulness in modelling and mining these networks: it has not only been applied for exploratory analysis (thus allowing the discovery of interaction patterns), but also for more demanding and elaborate tasks such as community detection and link prediction. In this work, we provide an overview of the methods based on tensor decomposition for the purpose of analysing time-evolving social networks from various perspectives: from community detection, link prediction and anomaly/event detection to network summarization and visualization. In more detail, we discuss the ideas exploited to carry out each social network analysis task as well as their limitations, in order to give complete coverage of the topic.

2021

Using network features for credit scoring in microfinance

Authors
Paraiso, P; Ruiz, S; Gomes, P; Rodrigues, L; Gama, J;

Publication
INTERNATIONAL JOURNAL OF DATA SCIENCE AND ANALYTICS

Abstract
The usage of non-traditional data in credit scoring by microfinance institutions is very useful when trying to address a problem that is very common in emerging markets: the lack of a verifiable customer credit history. In this context, this paper relies on data acquired from smartphones in a loan classification problem. We conduct a set of experiments concerning feature selection, strategies to deal with imbalanced datasets and algorithm choice, to define a baseline model. This model is then compared to others that add network features to the original ones. For that comparison, we generate a network that links a given user to their phone book contacts who are users of a given mobile application, taking into account the ethics and privacy concerns involved, and use some feature extraction techniques, such as the introduction of centrality measures and the definition of node embeddings, in order to capture certain aspects of the network's topology. Several node embedding algorithms are tested, but only Node2Vec proves to be significantly better than the baseline model, applying Friedman's post hoc tests. This node embedding algorithm outperforms all the others, representing a relative improvement, in comparison with the baseline model, of 5.74% on the mean accuracy, 7.13% on the area under the Receiver Operating Characteristic curve and 30.83% on the Kolmogorov-Smirnov statistic scores. This method, therefore, proves to be very promising when trying to discriminate between "good" and "bad" customers in credit scoring classification problems.
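
For readers unfamiliar with the Kolmogorov-Smirnov statistic as a credit scoring metric, the sketch below computes it as the largest gap between the empirical distribution functions of model scores for "good" and "bad" customers. The function name and the toy scores are assumptions for illustration and are unrelated to the paper's data or results.

```python
def ks_statistic(good_scores, bad_scores):
    """Two-sample KS statistic: maximum gap between the two empirical CDFs."""
    if not good_scores or not bad_scores:
        raise ValueError("both samples must be non-empty")

    def ecdf(sample, t):
        # Fraction of the sample with a score at or below threshold t.
        return sum(1 for s in sample if s <= t) / len(sample)

    thresholds = sorted(set(good_scores) | set(bad_scores))
    return max(abs(ecdf(good_scores, t) - ecdf(bad_scores, t)) for t in thresholds)


# Hypothetical model scores (higher means more likely to repay).
good = [0.7, 0.8, 0.9]
bad = [0.1, 0.2, 0.8]
ks = ks_statistic(good, bad)
```

A value near 1 means the scores separate the two groups almost perfectly, while a value near 0 means the two score distributions are nearly indistinguishable.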

2021

Artificial intelligence, cyber-threats and Industry 4.0: challenges and opportunities

Authors
Becue, A; Praca, I; Gama, J;

Publication
ARTIFICIAL INTELLIGENCE REVIEW

Abstract
This survey paper discusses opportunities and threats of using artificial intelligence (AI) technology in the manufacturing sector, with consideration for offensive and defensive uses of such technology. It starts with an introduction to the Industry 4.0 concept and an understanding of AI use in this context. It then provides elements of security principles and detection techniques applied to operational technology (OT), which forms the main attack surface of manufacturing systems. As some intrusion detection systems (IDS) already involve AI-based techniques, we focus on existing machine-learning and data-mining based techniques in use for intrusion detection. This article presents the major strengths and weaknesses of the main techniques in use. We also discuss an assessment of their relevance for application to OT, from the manufacturer's point of view. Another part of the paper introduces the essential drivers and principles of Industry 4.0, providing insights on the advent of AI in manufacturing systems as well as an understanding of the new set of challenges it implies. AI-based techniques for production monitoring, optimisation and control are proposed with insights on several application cases. The related technical, operational and security challenges are discussed, and an understanding of the impact of such a transition on current security practices is then provided in more detail. The final part of the report further develops a vision of security challenges for Industry 4.0. It addresses aspects of orchestration of distributed detection techniques, introduces an approach to adversarial/robust AI development and concludes with human-machine behaviour monitoring requirements.
