Publications - INESC TEC

Publications

2023

Predicting Model Training Time to Optimize Distributed Machine Learning Applications

Authors
Guimaraes, M; Carneiro, D; Palumbo, G; Oliveira, F; Oliveira, O; Alves, V; Novais, P;

Publication
ELECTRONICS

Abstract
Despite major advances in recent years, the field of Machine Learning continues to face research and technical challenges. Mostly, these stem from big data and streaming data, which require models to be frequently updated or re-trained, at the expense of significant computational resources. One solution is the use of distributed learning algorithms, which can learn in a distributed manner, from distributed datasets. In this paper, we describe CEDEs, a distributed learning system in which models are heterogeneous distributed Ensembles, i.e., complex models constituted by different base models, trained with different and distributed subsets of data. Specifically, we address the issue of predicting the training time of a given model, given its characteristics and the characteristics of the data. Given that the creation of an Ensemble may imply the training of hundreds of base models, information about the predicted duration of each of these individual tasks is paramount for efficient management of the cluster's computational resources and for minimizing makespan, i.e., the time it takes to train the whole Ensemble. Results show that the proposed approach is able to predict the training time of Decision Trees with an average error of 0.103 s, and the training time of Neural Networks with an average error of 21.263 s. We also show how results depend significantly on the hyperparameters of the model and on the characteristics of the input data.

2023

Multitask learning approach for lung nodule segmentation and classification in CT images

Authors
Fernandes, L; Oliveira, HP;

Publication
IEEE International Conference on Bioinformatics and Biomedicine, BIBM 2023, Istanbul, Turkiye, December 5-8, 2023

Abstract
Amongst the different types of cancer, lung cancer is the one with the highest mortality rate and, consequently, there is an urgent need to develop early detection methods to improve the survival probabilities of patients. Due to the millions of deaths caused annually by cancer, there is great interest in the scientific community in developing deep learning models that can be employed in computer-aided diagnostic tools. Currently, in the literature, there are several works in the Radiomics field that try to develop new solutions by employing learning models for lung nodule classification. However, in these types of application, it is usually required to extract the lung nodule from the input images using a segmentation mask made by a radiologist. This means that, in a clinical scenario, the lung nodule must first be segmented manually before the developed learning models can be employed. Considering that several patients are seen daily in the hospital with suspicion of lung cancer, the segmentation of each lung nodule would become a tiresome task. Furthermore, the available algorithms for automatic lung nodule segmentation are not efficient enough to be used in a real application. In response to the current limitations of the state of the art, the proposed work attempts to evaluate a multitasking approach where both the segmentation and the classification tasks are executed in parallel. As a baseline, we also study a sequential approach where we first employ DL models to segment the lung nodule, crop the lung nodule from the input image, and then finally classify the cropped nodule. Our results show that the multitasking approach is better than sequentially executing the segmentation and classification tasks for lung nodule classification. For instance, while the multitasking approach was able to achieve an AUC of 84.49% in the classification task, the sequential approach was only able to achieve an AUC of 72.43%. These results show that the proposed multitasking approach can become a viable alternative for the classification and segmentation of lung nodules.

2023

From Real-Time Marketing to Corporate Social Responsibility

Authors
Carvalho, CL; Barbosa, B; Santos, CA;

Publication
Advances in Business Strategy and Competitive Advantage

Abstract
Social media strategies are commonly adopted by large companies and SMEs due to the expected impacts on customer engagement, branding, sales, and overall company performance. One particularly interesting strategy conducted on social media is real-time marketing (RTM), which enables the company to get involved in the discussion of trending topics. The main aim of this chapter is to analyze RTM impacts on user engagement in the case of socially relevant topics, particularly Women's Day. It provides an analysis of publications by the 25 most valuable brands in Brazil (comprising both large companies and SMEs) and explores the interconnections between RTM publications and CSR policies. One main conclusion is that companies should approach socially relevant dates in accordance with their CSR policies, and that successful RTM initiatives can comprise alternative approaches: promotional actions, tributes, and CSR. The findings of this chapter are particularly relevant for SMEs, considering the democratic nature of RTM and overall social media strategies.

2023

Symmetry-based regularization in deep breast cancer screening

Authors
Castro, E; Pereira, JC; Cardoso, JS;

Publication
MEDICAL IMAGE ANALYSIS

Abstract
Breast cancer is the most common and lethal form of cancer in women. Recent efforts have focused on developing accurate neural network-based computer-aided diagnosis systems for screening to help anticipate this disease. The ultimate goal is to reduce mortality and improve quality of life after treatment. Due to the difficulty in collecting and annotating data in this domain, data scarcity is, and will continue to be, a limiting factor. In this work, we present a unified view of different regularization methods that incorporate domain-known symmetries in the model. Three general strategies were followed: (i) data augmentation, (ii) invariance promotion in the loss function, and (iii) the use of equivariant architectures. Each of these strategies encodes different priors on the functions learned by the model and can be readily introduced in most settings. Empirically, we show that the proposed symmetry-based regularization procedures improve generalization to unseen examples. This advantage is verified in different scenarios, datasets, and model architectures. We hope that both the principle of symmetry-based regularization and the concrete methods presented can guide development towards more data-efficient methods for breast cancer screening as well as other medical imaging domains.

2023

Public News Archive: A Searchable Sub-archive to Portuguese Past News Articles

Authors
Campos, R; Correia, D; Jatowt, A;

Publication
ADVANCES IN INFORMATION RETRIEVAL, ECIR 2023, PT III

Abstract
Over the past few decades, the amount of information generated has turned the Web into the largest knowledge infrastructure existing to date. Web archives have been at the forefront of data preservation, preventing the loss of data significant to humankind. Different snapshots of the web are saved every day, enabling users to surf the past web and to travel through it over time. Despite these efforts, many people are not aware that the web is being preserved, often finding these infrastructures to be unattractive or difficult to use when compared to common search engines. In this paper, we take a step towards making use of this preserved information by developing Public Archive, an intuitive interface that enables end-users to search and analyze a large-scale collection of 67,242 past preserved news articles belonging to a Portuguese reference newspaper (Jornal Publico). This collection was obtained by scraping 10,976 versions of the homepage of the Jornal Publico preserved by the Portuguese web archive infrastructure (Arquivo.pt) during the period from 2010 to 2021. By doing this, we aim not only to take a stand on making use of this preserved information, but also to come up with an easy-to-follow solution, the Public Archive python package, which lays the groundwork to be used (with minor adaptations) by other news source providers interested in offering their readers access to past news articles.

2023

Confronting security and privacy challenges in digital marketing

Authors
Pires, PB; Santos, JD; Pereira, IV; Torres, AI;

Publication
Confronting Security and Privacy Challenges in Digital Marketing

Abstract
Marketing, and specifically its digital marketing component, is being challenged by disruptive innovations, which are creating new, unique, and unusual opportunities, and by the emergence of new paradigms and models. Other areas of knowledge have embraced these innovations swiftly, adapting promptly and using them as leverage to create new paradigms, models, and realities. Marketing, in clear contrast, has been somewhat dismissive, ignoring the potential of these emerging contexts, some of which are already unavoidable. Confronting Security and Privacy Challenges in Digital Marketing identifies the most relevant issues in the current context of digital marketing and explores the implications, opportunities, and challenges of leveraging marketing strategies with digital innovations. This book explores the impact that these disruptive innovations are having on digital marketing, pointing out guidelines for organizations to leverage their strategy on the opportunities created by them. Covering topics such as blockchain technology, artificial intelligence, and virtual reality, this book is ideal for academicians, marketing professionals, researchers, and more. © 2023 by IGI Global. All rights reserved.
