Publications - INESC TEC

Publications

Publications by Luís Torgo

2017

Regression Trees

Authors
Torgo, L;

Publication
Encyclopedia of Machine Learning and Data Mining

Abstract

2018

Potential of dissimilatory nitrate reduction pathways in polycyclic aromatic hydrocarbon degradation

Authors
Ribeiro, H; de Sousa, T; Santos, JP; Sousa, AGG; Teixeira, C; Monteiro, MR; Salgado, P; Mucha, AP; Almeida, CMR; Torgo, L; Magalhaes, C;

Publication
Chemosphere

Abstract
This study investigates the potential of an indigenous estuarine microbial consortium to degrade two polycyclic aromatic hydrocarbons (PAHs), naphthalene and fluoranthene, under nitrate-reducing conditions. Two physicochemically diverse sediment samples from the Lima Estuary (Portugal) were individually spiked with 25 mg L⁻¹ of each PAH in laboratory-designed microcosms. Sediments without PAHs and autoclaved sediments spiked with PAHs were run in parallel. Destructive sampling was performed at the beginning of the experiment and after 3, 6, 12, 30 and 63 weeks of incubation. Naphthalene and fluoranthene levels decreased over time, with distinct degradation dynamics depending on sediment type. Next-generation sequencing (NGS) of 16S rRNA gene amplicons revealed that sediment type and incubation time, rather than PAH amendments, were the main drivers of microbial community structure. Predicted microbial functional analyses revealed clear shifts and interrelationships between genes involved in the anaerobic and aerobic degradation of PAHs and in the dissimilatory nitrate-reducing pathways (denitrification and dissimilatory nitrate reduction to ammonium, DNRA). These findings, reinforced by clear biogeochemical denitrification signals (NO₃⁻ consumption and NH₄⁺ increase during the incubation period), suggest that naphthalene and fluoranthene degradation may be coupled with denitrification and DNRA metabolism. The results of this study contribute to the understanding of the dissimilatory nitrate-reducing pathways and help uncover their involvement in PAH degradation, which will be crucial for directing remediation strategies for PAH-contaminated anoxic sediments.

2017

A comparative study of approaches to forecast the correct trading actions

Authors
Baia, L; Torgo, L;

Publication
Expert Systems

Abstract
This paper addresses the problem of decision making in the context of financial markets, more specifically the problem of forecasting the correct trading action for a certain future horizon. We study and compare two alternative ways of addressing these forecasting tasks: (a) using standard numeric prediction models to forecast the variation in the prices of the target asset and, in a second stage, transforming these numeric predictions into a decision according to some predefined decision rules; and (b) using models that directly forecast the right decision, thus skipping the intermediate numeric forecasting task. The objective of our study is to determine whether both strategies provide identical results or whether either offers a particular advantage worth considering in the context of financial markets.
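
As a rough illustration of strategy (a), the sketch below maps a numeric forecast of the price variation to a buy/hold/sell action with a simple threshold rule. The function names and the ±1% thresholds are hypothetical assumptions; the abstract does not specify the decision rules used in the paper. Applying the same rule to the observed future variation yields the class labels that a model following strategy (b) would be trained to predict directly.

```python
from typing import List

# Hypothetical thresholds: the paper's actual decision rules are not given in the abstract.
BUY_THRESHOLD = 0.01
SELL_THRESHOLD = -0.01


def to_action(variation: float,
              buy_threshold: float = BUY_THRESHOLD,
              sell_threshold: float = SELL_THRESHOLD) -> str:
    """Map a (predicted or observed) price variation to a trading action."""
    if variation > buy_threshold:
        return "buy"
    if variation < sell_threshold:
        return "sell"
    return "hold"


def two_stage_actions(predicted_variations: List[float]) -> List[str]:
    """Strategy (a): numeric forecasts first, then the decision rule."""
    return [to_action(v) for v in predicted_variations]


def action_labels(future_variations: List[float]) -> List[str]:
    """Targets for strategy (b): a classifier would learn these labels directly."""
    return [to_action(v) for v in future_variations]


def agreement(predicted: List[str], actual: List[str]) -> float:
    """Fraction of periods where the chosen action matches the correct one."""
    return sum(p == a for p, a in zip(predicted, actual)) / len(actual)


if __name__ == "__main__":
    forecasts = [0.02, -0.005, -0.03]   # toy numeric predictions
    realized = [0.015, 0.02, -0.01]     # toy observed variations
    print(agreement(two_stage_actions(forecasts), action_labels(realized)))
```

Because both strategies are scored against the same action labels, their decisions can be compared on equal terms, which is the kind of comparison the study describes.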

2013

OpenML: A collaborative science platform

Authors
Van Rijn, JN; Bischl, B; Torgo, L; Gao, B; Umaashankar, V; Fischer, S; Winter, P; Wiswedel, B; Berthold, MR; Vanschoren, J;

Publication
Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)

Abstract
We present OpenML, a novel open science platform that provides easy access to machine learning data, software and results to encourage further study and application. It organizes all submitted results online so they can be easily found and reused, and features a web API which is being integrated in popular machine learning tools such as Weka, KNIME, RapidMiner and R packages, so that experiments can be shared easily. © 2013 Springer-Verlag.

2018

How to evaluate sentiment classifiers for Twitter time-ordered data?

Authors
Mozetic, I; Torgo, L; Cerqueira, V; Smailovic, J;

Publication
PLOS ONE

Abstract
Social media are becoming an increasingly important source of information about the public mood regarding issues such as elections, Brexit, the stock market, etc. In this paper we focus on sentiment classification of Twitter data. Constructing sentiment classifiers is a standard text mining task, but here we address the question of how to properly evaluate them, as there is no settled way to do so. Sentiment classes are ordered and unbalanced, and Twitter produces a stream of time-ordered data. The problem we address concerns the procedures used to obtain reliable estimates of performance measures, and whether the temporal ordering of the training and test data matters. We collected a large set of 1.5 million tweets in 13 European languages. We created 138 sentiment models and out-of-sample datasets, which are used as a gold standard for evaluations. The corresponding 138 in-sample datasets are used to empirically compare six different estimation procedures: three variants of cross-validation and three variants of sequential validation (where the test set always follows the training set). We find no significant difference between the best cross-validation and sequential validation. However, we observe that all cross-validation variants tend to overestimate the performance, while the sequential methods tend to underestimate it. Standard cross-validation with random selection of examples is significantly worse than blocked cross-validation and should not be used to evaluate classifiers in time-ordered data scenarios.
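
The estimation procedures compared above differ mainly in how train/test indices are drawn from time-ordered data. The sketch below generates index splits for standard (random) cross-validation, blocked cross-validation and sequential validation with an expanding training window. It is a generic illustration with hypothetical function names, not a reproduction of the six specific variants evaluated in the paper.

```python
import random
from typing import List, Tuple

# A split is (training indices, test indices) over examples ordered by time.
Split = Tuple[List[int], List[int]]


def random_cv(n: int, k: int, seed: int = 0) -> List[Split]:
    """Standard k-fold CV: examples are shuffled, ignoring temporal order."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    return [(sorted(set(idx) - set(f)), sorted(f)) for f in folds]


def blocked_cv(n: int, k: int) -> List[Split]:
    """Blocked CV: k contiguous folds; each block is the test set once."""
    bounds = [round(i * n / k) for i in range(k + 1)]
    splits = []
    for i in range(k):
        test = list(range(bounds[i], bounds[i + 1]))
        # Training data may lie both before and after the test block.
        train = list(range(0, bounds[i])) + list(range(bounds[i + 1], n))
        splits.append((train, test))
    return splits


def sequential_validation(n: int, k: int) -> List[Split]:
    """Sequential validation: each test block follows all of its training data."""
    bounds = [round(i * n / (k + 1)) for i in range(k + 2)]
    return [
        (list(range(0, bounds[i])), list(range(bounds[i], bounds[i + 1])))
        for i in range(1, k + 1)
    ]


if __name__ == "__main__":
    for name, splits in [("random", random_cv(10, 5)),
                         ("blocked", blocked_cv(10, 5)),
                         ("sequential", sequential_validation(10, 4))]:
        print(name, [test for _, test in splits])
```

Blocked CV keeps neighbouring (and thus temporally correlated) examples together, but it can still train on data that comes after the test block; only the sequential scheme guarantees that the test set always follows the training set.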

2018

Twitter as a Source for Time- and Domain-Dependent Sentiment Lexicons

Authors
Guimaraes, N; Torgo, L; Figueira, A;

Publication
Social Network Based Big Data Analysis and Applications

Abstract
Sentiment lexicons are an essential component of most state-of-the-art sentiment analysis methods. However, the terms included are usually restricted to verbs and adjectives because they (1) usually have similar meanings across different domains and (2) are the main indicators of subjectivity in text. This can cause problems when classifying short informal texts, since the absence of these parts of speech does not necessarily mean an absence of sentiment. We therefore hypothesize that knowledge of terms related to certain events and their associated sentiment (public opinion) can improve the task of sentiment analysis. To complement traditional sentiment dictionaries, we present a system for lexicon expansion that extracts the most relevant terms from news and assesses their positive or negative score through Twitter. Preliminary results on a labelled dataset show that our complementary lexicons increase the performance of three state-of-the-art sentiment systems, demonstrating the effectiveness of our approach.
