Publications - INESC TEC

Publications by Luís Cavique

2023

Feature engineering: techniques and applications

Authors
Teixeira, Mariana; Cavique, Luís

Publication
Revista de Ciências da Computação

Abstract
Machine Learning (ML) is a rising concept in today's society. In the past decade, ML-based systems have become part of people's daily routines, and their use has spread across diverse sectors. This evolution is supported by the exponential increase in the amount of data created worldwide. Feature Engineering is a critical process focused on transforming data into suitable inputs for Machine Learning algorithms. This work explores the Feature Engineering process with the aim of developing a baseline for its implementation. To this end, a pipeline of Feature Engineering techniques and their taxonomy is proposed, along with a set of R scripts that implement them. The validity of the code is then demonstrated by applying it to a real-world dataset.

2023

A Data Science Maturity Model Applied to Students' Modeling

Authors
Cavique, L; Pombinho, P; Correia, L

Publication
Emerging Science Journal

Abstract
Maturity models define a series of levels, each representing an increase in the complexity of information systems. Data Science appears in the Business Intelligence (BI) and Business Analytics (BA) literature. This work applies the _IABE maturity model, which includes two additional levels: Data Engineering (DE) at the bottom and Business Experimentation (BE) at the top. The study uses the _IABE model for students' modeling in the ModEst project. For this purpose, the Public Administration body involved is the Directorate-General for Statistics of Education and Science (DGEEC) of the Portuguese Education Ministry. DGEEC provided extensive data on two million students per year in the Portuguese school system, from preschool to doctoral programs. This work presents the comprehensible _IABE maturity model to extract new knowledge from the DGEEC dataset. The method applied is _IABE: after the DE level, wh-questions are formulated and answered with the most appropriate techniques at each maturity level. The novelty of this work lies in applying the _IABE maturity model to a unique dataset for the first time. Wh-questions are stated at the BI level using data summarization; at the BA level, predictive models are built; and at the BE level, counterfactual approaches are presented. © 2023 by the authors. Licensee ESJ, Italy.

2023

Causal machine learning in social impact assessment

Authors
Lopes, NC; Cavique, L

Publication
Philosophy of Artificial Intelligence and Its Place in Society

Abstract
Social impact assessment is a fundamental process for verifying whether interventions achieve their objectives and, consequently, for validating investments in the social area. This process is generally based on analyzing the average effects of an intervention, which does not reveal how those effects vary from one individual to another. Causal machine learning methods mark an evolution in causal inference, as they allow the heterogeneity of intervention effects to be assessed. Applying these methods to evaluate the impact of social projects and programs has the advantage of improving the selection of target audiences and of optimizing and personalizing future interventions. In this chapter, the authors take a non-technical approach to explore classical causal inference methods for estimating average effects and new causal machine learning methods for evaluating heterogeneous effects. They address how the Uplift Modeling method can be adapted to assess social interventions, and they discuss the advantages, limitations, and research needs involved in using these new techniques in social intervention. © 2023, IGI Global. All rights reserved.
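The gap between an average effect and heterogeneous effects can be illustrated with a small sketch. It assumes a randomized intervention, so that a simple difference in means estimates the average treatment effect; the function name `average_treatment_effect` and the toy data are hypothetical and not taken from the chapter. The data contain two subgroups that react in opposite directions to the same intervention.

```python
from statistics import mean


def average_treatment_effect(outcomes, treated):
    """Difference in mean outcome between treated and control units.

    Under random assignment (an assumption here), this difference in means
    estimates the average treatment effect (ATE).
    """
    y_treated = [y for y, t in zip(outcomes, treated) if t]
    y_control = [y for y, t in zip(outcomes, treated) if not t]
    return mean(y_treated) - mean(y_control)


# Hypothetical toy data: the intervention helps group "A" and harms group "B".
group = ["A", "A", "A", "A", "B", "B", "B", "B"]
treated = [1, 1, 0, 0, 1, 1, 0, 0]
outcome = [1, 1, 0, 0, 0, 0, 1, 1]

print(f"average effect: {average_treatment_effect(outcome, treated):+.2f}")
for g in ("A", "B"):
    idx = [i for i, gi in enumerate(group) if gi == g]
    effect = average_treatment_effect([outcome[i] for i in idx],
                                      [treated[i] for i in idx])
    print(f"effect in group {g}: {effect:+.2f}")
```

Running it prints an average effect of +0.00 alongside group effects of +1.00 and -1.00: the intervention matters a great deal for every subgroup, yet an analysis of average effects alone would report no effect.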

2023

Impact of artificial intelligence in Industry 4.0 and 5.0

Authors
Motinho, L; Cavique, L

Publication
Philosophy of Artificial Intelligence and Its Place in Society

Abstract
Industry 4.0 uses the network concept to establish an interconnected manufacturing system. It integrates the most recent digital concepts, such as artificial intelligence (AI), the internet of things (IoT), big data, cloud computing, and 3D printing. The next maturity level, Industry 5.0, aims to shift the focus back to human-centric production by creating a sustainable and collaborative environment for humans and machines. Every manufacturer seeks new ways to increase profits, reduce risks, and improve production efficiency. AI tools can process and interpret vast volumes of data from the production floor to spot patterns, analyze and predict consumer behavior, and detect anomalies in production processes in real time. This work studies and demonstrates the impact of AI in Industries 4.0 and 5.0. In Industry 4.0, AI can help with classic tasks such as predictive maintenance, production optimization, and customer personalization. Industry 5.0 enables sustainable manufacturing development and human-AI interaction. © 2023, IGI Global. All rights reserved.

2013

A feature selection approach in the study of Azorean proverbs

Authors
Cavique, L; Mendes, AB; Funk, M; Santos, JMA

Publication
Exploring Innovative and Successful Applications of Soft Computing

Abstract
A paremiological (study of proverbs) case study is presented as part of a wider project based on data collected among the Azorean population. Given the considerable distance between the Azores islands, the authors hypothesize that there are significant differences in the proverbs from each island, making it possible to identify an interviewee's native island from his or her knowledge of proverbs. In this chapter, a feature selection algorithm that combines Rough Sets and the Logical Analysis of Data (LAD) is presented. The algorithm, named LAID (Logical Analysis of Inconsistent Data), deals with noisy data, and the authors believe it establishes an important link between two different schools with similar approaches. The algorithm was applied to a real-world dataset collected through thousands of interviews with Azoreans, involving an initial set of twenty-two thousand Portuguese proverbs. © 2015, IGI Global. All rights reserved.
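LAID itself is not reproduced here. As a hedged illustration of the LAD side of the idea, a support set can be framed as a set-covering problem: every pair of observations from different classes (here, interviewees from different islands) must differ on at least one selected feature (here, knowledge of a given proverb). The sketch below solves this greedily, which is a heuristic rather than an exact method, and skips pairs of identical observations with different labels as a crude stand-in for LAID's handling of inconsistent data. The function name `greedy_support_set` and the toy data are hypothetical.

```python
from itertools import combinations


def greedy_support_set(rows, labels):
    """Greedy set-covering heuristic for an LAD-style support set.

    rows   : equal-length tuples of discrete (e.g. binary) feature values
    labels : class label of each row
    Returns sorted indices of features such that every separable pair of
    rows with different labels differs on at least one selected feature.
    """
    n_features = len(rows[0])
    to_cover = []
    for i, j in combinations(range(len(rows)), 2):
        if labels[i] == labels[j]:
            continue
        differing = {f for f in range(n_features) if rows[i][f] != rows[j][f]}
        # Identical rows with different labels are inconsistent and cannot be
        # separated by any feature; skipping them is a simplifying assumption.
        if differing:
            to_cover.append(differing)

    selected = []
    while to_cover:
        # Choose the feature that separates the most still-uncovered pairs.
        best = max(range(n_features), key=lambda f: sum(f in d for d in to_cover))
        selected.append(best)
        to_cover = [d for d in to_cover if best not in d]
    return sorted(selected)


# Hypothetical data: 1 = interviewee knows proverb k, 0 = does not.
rows = [(1, 0, 1, 0), (1, 1, 1, 0), (0, 0, 1, 1), (0, 1, 0, 1), (1, 0, 1, 0)]
labels = ["X", "X", "Y", "Y", "Y"]  # island of each interviewee
print(greedy_support_set(rows, labels))
```

On this toy data the result is `[0, 1]`: proverb 0 separates most cross-island pairs, and proverb 1 is added only to separate the second interviewee from the last one. The first and last interviewees give identical answers but come from different islands, so no proverb can separate them and that pair is ignored.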

2025

A machine learning framework for uplift modeling through customer segmentation

Authors
Pinheiro, P; Cavique, L

Publication
Decision Analytics Journal

Abstract
In uplift modeling, the goal is to identify high-value customers by focusing on persuadable customers: those who make a purchase only if contacted. To achieve this, uplift modeling combines machine learning techniques with causal inference, allowing businesses to refine their customer targeting strategies and focus their efforts where they are most profitable. This study proposes a practical and reproducible two-phase procedure for identifying high-value customers. In the first phase, customers are segmented using decision trees, which offer a transparent and data-driven way to group individuals with similar characteristics. This segmentation lays the groundwork for a meaningful interpretation of customer behavior. In the second phase, uplift is calculated for each customer segment by comparing the outcomes of the treatment and control groups, which makes it possible to identify the customer groups with the highest uplift. A real-world use case illustrates the value and applicability of the proposed method. To validate model performance, the procedure employs established metrics such as the Qini index and Cohen's kappa, which provide insight into both the effectiveness and the reliability of the uplift estimates. This work presents a decoupled procedure for uplift modeling that leverages well-established libraries, fostering transparency and a clear understanding of the analytical process. A key contribution to uplift modeling and causal inference is the use of decision trees for stratification, which enables the creation of meaningful segments and their evaluation through the average treatment effect. By integrating theory with practical implementation, this work offers a comprehensive framework for uplift modeling that bridges academic rigor and business usability. © 2025 Elsevier B.V., All rights reserved.
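The second phase of the procedure (uplift per segment, obtained by comparing treatment and control outcomes) can be sketched in plain Python. This is a minimal sketch, not the authors' implementation: it assumes the first phase has already assigned each customer a segment id (for instance, the decision-tree leaf the customer falls into) and that contact and purchase are coded as 0/1. The function name `segment_uplift`, the parameter `min_group_size`, and the data are hypothetical.

```python
from collections import defaultdict


def segment_uplift(segments, treated, converted, min_group_size=1):
    """Uplift per segment: purchase rate if contacted minus purchase rate if not.

    segments       : segment id of each customer (e.g. a decision-tree leaf)
    treated        : 1 if the customer was contacted (treatment), else 0
    converted      : 1 if the customer purchased, else 0
    min_group_size : minimum size (>= 1) required for both the treatment and
                     control groups of a segment; smaller segments are skipped
    Returns (segment, uplift) pairs sorted from highest to lowest uplift.
    """
    # stats[segment][t] = [number of customers, number of purchases]
    stats = defaultdict(lambda: {0: [0, 0], 1: [0, 0]})
    for s, t, y in zip(segments, treated, converted):
        stats[s][t][0] += 1
        stats[s][t][1] += y

    uplifts = []
    for s, groups in stats.items():
        (n_control, k_control), (n_treated, k_treated) = groups[0], groups[1]
        if n_control < min_group_size or n_treated < min_group_size:
            continue
        uplifts.append((s, k_treated / n_treated - k_control / n_control))
    return sorted(uplifts, key=lambda pair: pair[1], reverse=True)


# Hypothetical data: segment ids stand in for decision-tree leaves.
segments = [1, 1, 1, 1, 2, 2, 2, 2]
treated = [1, 1, 0, 0, 1, 1, 0, 0]
converted = [1, 1, 0, 1, 1, 0, 1, 0]
print(segment_uplift(segments, treated, converted))
```

This prints `[(1, 0.5), (2, 0.0)]`: contacting customers in segment 1 raises their purchase rate by 50 percentage points, while in segment 2 contact makes no difference, so segment 1 is the one worth targeting. With scikit-learn, one option for the first phase is `DecisionTreeClassifier.apply`, which returns the leaf index of each sample.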
