Publications - INESC TEC

Publications

Publications by Luís Cavique

2013

Superefficiency and multiplier adjustment in data envelopment analysis

Authors
Santos, J; Santos, LC; Mendes, AB;

Publication
Efficiency Measures in the Agricultural Sector: With Applications

Abstract
Superefficiency is an important extension of DEA that overcomes some limitations of the traditional models, specifically by allowing efficient units to be ranked and by yielding a unique set of weights for those units. Weight restriction is a well-known technique in the DEA field. When weight restrictions are applied, the weights cluster around their new limits, making the evaluation dependent on the levels chosen for those limits. This chapter introduces a new approach to weight adjustment based on goal programming techniques, avoiding the imposition of hard restrictions that can even lead to infeasibility. The resulting models are more flexible.


2013

Introduction to data envelopment analysis

Authors
Santos, J; Negas, ER; Santos, LC;

Publication
Efficiency Measures in the Agricultural Sector: With Applications

Abstract
This chapter introduces the basics of data envelopment analysis (DEA) techniques, with a short historical introduction and examples of the constant returns to scale (CRS) model and the variable returns to scale (VRS) model. The ratio models are linearized, and primal and dual models are presented for both orientations.
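
For orientation, the input-oriented CRS model is commonly written in the textbook form below, first as a ratio and then linearized by normalizing the weighted inputs of the evaluated unit. The notation is an assumption made for this sketch and may differ from the chapter's: x_{ij} and y_{rj} are the inputs and outputs of unit j, v_i and u_r are the weights, and o is the unit under evaluation.

```latex
% Ratio form for the unit under evaluation, o
\max_{u,v}\; \theta_o = \frac{\sum_{r} u_r y_{ro}}{\sum_{i} v_i x_{io}}
\quad \text{s.t.} \quad
\frac{\sum_{r} u_r y_{rj}}{\sum_{i} v_i x_{ij}} \le 1 \;\; \forall j,
\qquad u_r \ge 0,\; v_i \ge 0.

% Linearized (multiplier) form: fix the weighted input of unit o to 1
\max_{u,v}\; \sum_{r} u_r y_{ro}
\quad \text{s.t.} \quad
\sum_{i} v_i x_{io} = 1, \qquad
\sum_{r} u_r y_{rj} - \sum_{i} v_i x_{ij} \le 0 \;\; \forall j,
\qquad u_r \ge 0,\; v_i \ge 0.
```

The VRS variant is usually obtained by adding one free scalar variable to the weighted outputs; sign conventions for that variable vary between texts.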


2025

Mitigating false negatives in imbalanced datasets: An ensemble approach

Authors
Vasconcelos, M; Cavique, L;

Publication
Expert Systems with Applications

Abstract
Imbalanced datasets present a challenge in machine learning, especially in binary classification scenarios where one class significantly outweighs the other. This imbalance often leads to models favoring the majority class, resulting in inadequate predictions for the minority class, specifically in false negatives. In response to this issue, this work introduces the MinFNR ensemble algorithm, designed to minimize False Negative Rates (FNR) in imbalanced datasets. The new approach strategically combines data-level, algorithmic-level, and hybrid-level approaches to enhance overall predictive capabilities while minimizing computational resources using the Set Covering Problem (SCP) formulation. Through a comprehensive evaluation of diverse datasets, MinFNR consistently outperforms individual algorithms, showing its potential for applications where the cost of false negatives is substantial, such as fraud detection and medical diagnosis. This work also contributes to ongoing efforts to improve the reliability and effectiveness of machine learning algorithms in real imbalanced scenarios.
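
As a rough illustration of the two ideas named in the abstract, the sketch below computes a false negative rate and uses a greedy set-covering step to pick a small group of models whose combined detections cover the positive cases. It is not the MinFNR algorithm itself: the function names, the toy predictions, and the union-vote rule are all assumptions made for this example.

```python
from typing import Dict, List, Set


def false_negative_rate(y_true: List[int], y_pred: List[int]) -> float:
    """FNR = FN / (FN + TP), with 1 marking the positive (minority) class."""
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    return fn / (fn + tp) if (fn + tp) else 0.0


def greedy_cover(detected: Dict[str, Set[int]], positives: Set[int]) -> List[str]:
    """Greedy set covering: keep picking the model that detects the most uncovered positives."""
    uncovered = set(positives)
    chosen: List[str] = []
    while uncovered:
        # sorted() makes tie-breaking deterministic (alphabetical order).
        best = max(sorted(detected), key=lambda m: len(detected[m] & uncovered))
        gain = detected[best] & uncovered
        if not gain:
            break  # the remaining positives are missed by every model
        chosen.append(best)
        uncovered -= gain
    return chosen


# Hypothetical toy data: three models evaluated on eight cases, four of them positive.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
preds = {
    "model_a": [1, 1, 0, 0, 0, 0, 0, 0],
    "model_b": [0, 0, 1, 0, 0, 1, 0, 0],
    "model_c": [0, 0, 1, 1, 0, 0, 0, 0],
}
positives = {i for i, t in enumerate(y_true) if t == 1}
detected = {m: {i for i in positives if p[i] == 1} for m, p in preds.items()}
selected = greedy_cover(detected, positives)

# Assumed combination rule: predict positive when any selected model does.
ensemble = [int(any(preds[m][i] for m in selected)) for i in range(len(y_true))]
```

On this toy data the covering step keeps two of the three models, and their union misses no positive case, whereas each single model misses at least half of them.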


2025

Large Language Model for Querying Databases in Portuguese

Authors
Figueiredo, L; Pinheiro, P; Cavique, L; Marques, N;

Publication
Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)

Abstract
This study introduces a system that helps non-expert users find information easily without knowing database languages or asking technicians for help. A specific domain is explored, focusing on a subscription-based sports facility, which serves as an open-source version of a real case study. Utilizing the star schema, the available data in the database is structured to provide accessibility through Portuguese Natural Language queries. Using a Large Language Model (LLM), SQL queries are generated based on the question and the provided star schema. We created a dataset with 115 highly challenging questions drawn from real-world usage scenarios to validate the correctness of the system. Challenges found during testing, like attribute value interpretation, out-of-scope questions, and temporal interval adequacy issues, highlight the insufficiency of the star schema alone in providing the needed context for generating accurate SQL queries by the LLM. Addressing these challenges through enhanced contextual information shows significant improvement in query correctness, with validation results increasing from 57.76% to 88.79%. This study shows the potential and limitations of LLMs in generating SQL queries from Portuguese Natural Language queries. © The Author(s), under exclusive license to Springer Nature Switzerland AG 2025.
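
The general idea of pairing a star schema with extra contextual notes can be sketched as plain prompt assembly. Everything below is hypothetical and only illustrates the shape of such a prompt: the table and column names, the notes, and the `build_prompt` helper are assumptions, not the system described in the paper, and no LLM is called.

```python
from typing import List


def build_prompt(question: str, star_schema: str, context_notes: List[str]) -> str:
    """Assemble a text-to-SQL prompt from the schema, extra context notes, and the user question."""
    notes = "\n".join(f"- {note}" for note in context_notes)
    return (
        "Generate one SQL query that answers the question.\n"
        "If the question cannot be answered from the schema, reply OUT_OF_SCOPE.\n\n"
        f"Star schema:\n{star_schema}\n\n"
        f"Context notes:\n{notes}\n\n"
        f"Question (Portuguese): {question}\n"
    )


# Hypothetical schema for a subscription-based sports facility.
schema = (
    "fact_visit(member_id, date_id, duration_min)\n"
    "dim_member(member_id, name, status)\n"
    "dim_date(date_id, day, month, year)"
)
# Hypothetical notes aimed at the three kinds of difficulty listed in the abstract.
notes = [
    "dim_member.status uses 'A' for active and 'I' for inactive.",
    "Only visits and memberships are stored; payment questions are out of scope.",
    "Data is available from 2020 onwards; filter dates through dim_date.",
]
prompt = build_prompt("Quantos sócios ativos visitaram o ginásio em janeiro?", schema, notes)
```

The resulting string would then be sent to whichever model is in use, and the returned SQL checked against the expected answer for each test question.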


2018

A data reduction approach using hypergraphs to visualize communities and brokers in social networks

Authors
Cavique, L; Marques, NC; Goncalves, A;

Publication
Social Network Analysis and Mining

Abstract
The comprehension of social network phenomena is closely related to data visualization. However, even with only hundreds of nodes, the visualization of dense networks is usually difficult. The strategy adopted in this work is data reduction using communities. Community detection, and in particular the detection of overlapping communities, is a very important issue in social network analysis. In this approach, the information extracted from social networks transcends cohesive groups, enabling the discovery of brokers that interact among communities. To find admissible solutions to hard problems, relaxed approaches are used. Quasi-cliques are generated, and a partition is found using a partial set-covering heuristic. The proposed method allows the identification of communities and of actors that link two or more groups. In the visualization process, the user can choose different dimension reduction approaches for the condensed graph. For each condensed structure, a hypergraph can be drawn, identifying communities and brokers.
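
One simple way to read "actors that link two or more groups" is as nodes that belong to more than one community. The sketch below takes that reading as an assumption; the `find_brokers` helper and the toy communities are hypothetical and do not reproduce the paper's quasi-clique generation or covering heuristic.

```python
from collections import defaultdict
from typing import Dict, Hashable, List, Set


def find_brokers(communities: List[Set[Hashable]]) -> Dict[Hashable, List[int]]:
    """Map each node found in two or more communities to the indices of those communities."""
    membership = defaultdict(list)
    for idx, members in enumerate(communities):
        for node in members:
            membership[node].append(idx)
    return {node: idxs for node, idxs in membership.items() if len(idxs) >= 2}


# Hypothetical overlapping communities, each playing the role of one hyperedge.
communities = [
    {"ana", "bruno", "carla"},
    {"carla", "duarte", "eva"},
    {"eva", "filipe"},
]
brokers = find_brokers(communities)
```

Here `brokers` maps "carla" to communities 0 and 1 and "eva" to communities 1 and 2, which is the information a condensed hypergraph drawing would need to place shared actors between groups.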


2018

A biobjective feature selection algorithm for large omics datasets

Authors
Cavique, L; Mendes, AB; Martiniano, HFMC; Correia, L;

Publication
Expert Systems

Abstract
Feature selection is one of the most important concepts in data mining when dimensionality reduction is needed. The performance measures of feature selection encompass predictive accuracy and result comprehensibility. Consistency-based methods are a significant category of feature selection research that substantially improves the comprehensibility of the result using the parsimony principle. In this work, the biobjective version of the logical analysis of inconsistent data algorithm is applied to large volumes of data. To deal with hundreds of thousands of attributes, heuristic decomposition uses parallel processing to solve a set covering problem and apply a cross-validation technique. The biobjective solutions contain the number of reduced features and the accuracy. The algorithm is applied to omics datasets with genome-like characteristics of patients with rare diseases.
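
The notion of consistency behind consistency-based feature selection can be shown with a small check: a feature subset is consistent when no two observations agree on every selected feature yet carry different class labels. This is a minimal sketch under that common definition; the `is_consistent` helper and the toy data are assumptions and not the paper's algorithm.

```python
from typing import Dict, Sequence, Tuple


def is_consistent(
    rows: Sequence[Sequence[int]], labels: Sequence[int], features: Sequence[int]
) -> bool:
    """True if no two rows share all selected feature values while having different labels."""
    seen: Dict[Tuple[int, ...], int] = {}
    for row, label in zip(rows, labels):
        key = tuple(row[f] for f in features)
        # setdefault stores the first label seen for this key and returns it afterwards.
        if seen.setdefault(key, label) != label:
            return False
    return True


# Hypothetical toy dataset: three binary features, class equal to feature 1.
rows = [[0, 0, 1], [0, 1, 1], [1, 0, 0], [1, 1, 0]]
labels = [0, 1, 0, 1]

small_ok = is_consistent(rows, labels, [1])       # a single feature separates the classes
larger_bad = is_consistent(rows, labels, [0, 2])  # two features that still confuse the classes
```

Under the parsimony principle the single-feature subset would be preferred, since it is both smaller and consistent.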
