Publications

Publications by Luís Torgo

2019

Explaining the Performance of Black Box Regression Models

Authors
Areosa, I; Torgo, L;

Publication
2019 IEEE INTERNATIONAL CONFERENCE ON DATA SCIENCE AND ADVANCED ANALYTICS (DSAA 2019)

Abstract
The widespread usage of Machine Learning and Data Mining models in several key areas of our societies has raised serious concerns in terms of accountability and ability to justify and interpret the decisions of these models. This is even more relevant when models are too complex and often regarded as black boxes. In this paper we present several tools designed to help in understanding and explaining the reasons for the observed predictive performance of black box regression models. We describe, evaluate and propose several variants of Error Dependence Plots. These plots provide a visual display of the expected relationship between the prediction error of any model and the values of a predictor variable. They allow the end user to understand what to expect from the models given some concrete values of the predictor variables. These tools allow more accurate explanations on the conditions that may lead to some failures of the models. Moreover, our proposed extensions also provide a multivariate perspective of this analysis, and the ability to compare the behaviour of multiple models under different conditions. This comparative analysis empowers the end user with the ability to have a case-based analysis of the risks associated with different models, and thus select the model with lower expected risk for each test case, or even decide not to use any model because the expected error is unacceptable.

2020

Understanding the Response of Nitrifying Communities to Disturbance in the McMurdo Dry Valleys, Antarctica

Authors
Monteiro, M; Baptista, MS; Seneca, J; Torgo, L; Lee, CK; Cary, SC; Magalhaes, C;

Publication
MICROORGANISMS

Abstract
Polar ecosystems are generally limited in nitrogen (N) nutrients, and the patchy availability of N is partly determined by biological pathways, such as nitrification, which are carried out by distinctive prokaryotic functional groups. The activity and diversity of microorganisms are generally strongly influenced by environmental conditions. However, we know little of the attributes that control the distribution and activity of specific microbial functional groups, such as nitrifiers, in extreme cold environments and how they may respond to change. To ascertain relationships between soil geochemistry and the ecology of nitrifying microbial communities, we carried out a laboratory-based manipulative experiment to test the selective effect of key geochemical variables on the activity and abundance of ammonia-oxidizing communities in soils from the McMurdo Dry Valleys of Antarctica. We hypothesized that nitrifying communities, adapted to different environmental conditions within the Dry Valleys, will have distinct responses when submitted to similar geochemical disturbances. In order to test this hypothesis, soils from two geographically distant and geochemically divergent locations, Miers and Beacon Valleys, were incubated over 2 months under increased conductivity, ammonia concentration, copper concentration, and organic matter content. Amplicon sequencing of the 16S rRNA gene and transcripts allowed comparison of the response of ammonia-oxidizing Archaea (AOA) and ammonia-oxidizing Bacteria (AOB) to each treatment over time. This approach was combined with measurements of (NH4+)-N-15 oxidation rates using N-15 isotopic additions. Our results showed a higher potential for nitrification in Miers Valley, where environmental conditions are milder relative to Beacon Valley. AOA exhibited better adaptability to geochemical changes compared to AOB, particularly to the increase in copper and conductivity. AOA were also the only nitrifying group found in Beacon Valley soils. 
This laboratory-based manipulative experiment provided new knowledge on how nitrifying groups respond to changes in key geochemical variables of Antarctic desert soils, and we believe these results offer new insights into the dynamics of N cycling in these ecosystems.

2018

Analysis and Detection of Unreliable Users in Twitter: Two Case Studies

Authors
Guimarães, N; Figueira, A; Torgo, L;

Publication
Knowledge Discovery, Knowledge Engineering and Knowledge Management - 10th International Joint Conference, IC3K 2018, Seville, Spain, September 18-20, 2018, Revised Selected Papers

Abstract

2019

The CURE for Class Imbalance

Authors
Bellinger, C; Branco, P; Torgo, L;

Publication
Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)

Abstract
Addressing the class imbalance problem is critical for several real world applications. The application of pre-processing methods is a popular way of dealing with this problem. These solutions increase the rare class examples and/or decrease the normal class cases. However, these procedures typically only take into account the characteristics of each individual class. This segmented view of the data can have a negative impact. We propose a new method that uses an integrated view of the data classes to generate new examples and remove cases. ClUstered REsampling (CURE) is a method based on a holistic view of the data that uses hierarchical clustering and a new distance measure to guide the sampling procedure. Clusters generated in this way take into account the structure of the data. This enables CURE to avoid common mistakes made by other resampling methods. In particular, CURE prevents the generation of synthetic examples in dangerous regions and undersamples safe, non-borderline, regions of the majority class. We show the effectiveness of CURE in an extensive set of experiments with benchmark domains. We also show that CURE is a user-friendly method that does not require extensive fine-tuning of hyper-parameters. © Springer Nature Switzerland AG 2019.

2016

Data Mining with R

Authors
Torgo, L;

Publication

Abstract

2016

Data mining with R: Learning with case studies, second edition

Authors
Torgo, L;

Publication
Data Mining with R: Learning with Case Studies, Second Edition

Abstract
Data Mining with R: Learning with Case Studies, Second Edition uses practical examples to illustrate the power of R and data mining. Providing an extensive update to the best-selling first edition, this new edition is divided into two parts. The first part will feature introductory material, including a new chapter that provides an introduction to data mining, to complement the already existing introduction to R. The second part includes case studies, and the new edition strongly revises the R code of the case studies making it more up-to-date with recent packages that have emerged in R. The book does not assume any prior knowledge about R. Readers who are new to R and data mining should be able to follow the case studies, and they are designed to be self-contained so the reader can start anywhere in the document. The book is accompanied by a set of freely available R source files that can be obtained at the book's web site. These files include all the code used in the case studies, and they facilitate the "do-it-yourself" approach followed in the book. Designed for users of data analysis tools, as well as researchers and developers, the book should be useful for anyone interested in entering the "world" of R and data mining.
