Publicacoes - INESC TEC

Publicações

Publicações por LIAAD

2012

Comparing state-of-the-art regression methods for long term travel time prediction

Autores
Mendes Moreira, J; Jorge, AM; de Sousa, JF; Soares, C;

Publicação
INTELLIGENT DATA ANALYSIS

Abstract
Long-term travel time prediction (TTP) can be an important planning tool for both freight transport and public transport companies. In both cases it is expected that the use of long-term TTP can improve the quality of the planned services by reducing the error between the actual and the planned travel times. However, for reasons that we try to stretch out along this paper, long-term TTP is almost not mentioned in the scientific literature. In this paper we discuss the relevance of this study and compare three non-parametric state-of-the-art regression methods: Projection Pursuit Regression (PPR), Support Vector Machine (SVM) and Random Forests (RF). For each one of these methods we study the best combination of input parameters. We also study the impact of different methods for the pre-processing tasks (feature selection, example selection and domain values definition) in the accuracy of those algorithms. We use bus travel time's data from a bus dispatch system. From an off-the-shelf point-of-view, our experiments show that RF is the most promising approach from the three we have tested. However, it is possible to obtain more accurate results using PPR but with extra pre-processing work, namely on example selection and domain values definition.

FecharLer Abstract

2012

Optimal leverage association rules with numerical interval conditions

Autores
Jorge, AM; Azevedo, PJ;

Publicação
INTELLIGENT DATA ANALYSIS

Abstract
In this paper we propose a framework for defining and discovering optimal association rules involving a numerical attribute A in the consequent. The consequent has the form of interval conditions (A < x, A >= x or A is an element of I where I is an interval or a set of intervals of the form [x(l), x(u))). The optimality is with respect to leverage, one well known association rule interest measure. The generated rules are called Maximal Leverage Rules (MLR) and are generated from Distribution Rules. The principle for finding the MLR is related to the Kolmogorov-Smirnov goodness of fit statistical test. We propose different methods for MLR generation, taking into account leverage optimallity and readability. We theoretically demonstrate the optimality of the main exact methods, and measure the leverage loss of approximate methods. We show empirically that the discovery process is scalable.

FecharLer Abstract

2012

Disambiguating Implicit Temporal Queries by Clustering Top Relevant Dates in Web Snippets

Autores
Campos, R; Jorge, AM; Dias, G; Nunes, C;

Publicação
2012 IEEE/WIC/ACM INTERNATIONAL CONFERENCE ON WEB INTELLIGENCE AND INTELLIGENT AGENT TECHNOLOGY (WI-IAT 2012), VOL 1

Abstract
With the growing popularity of research in Temporal Information Retrieval (T-IR), a large amount of temporal data is ready to be exploited. The ability to exploit this information can be potentially useful for several tasks. For example, when querying "Football World Cup Germany", it would be interesting to have two separate clusters {1974,2006} corresponding to each of the two temporal instances. However, clustering of search results by time is a non-trivial task that involves determining the most relevant dates associated to a query. In this paper, we propose a first approach to flat temporal clustering of search results. We rely on a second order co-occurrence similarity measure approach which first identifies top relevant dates. Documents are grouped at the year level, forming the temporal instances of the query. Experimental tests were performed using real-world text queries. We used several measures for evaluating the performance of the system and compared our approach with Carrot Web-snippet clustering engine. Both experiments were complemented with a user survey.

FecharLer Abstract

2012

A Multi-agent Recommender System

Autores
Jorge Morais, AJ; Oliveira, E; Jorge, AM;

Publicação
DISTRIBUTED COMPUTING AND ARTIFICIAL INTELLIGENCE

Abstract
The large amount of pages in Websites is a problem for users who waste time looking for the information they really want. Knowledge about users' previous visits may provide patterns that allow the customization of the Website. This concept is known as Adaptive Website: a Website that adapts itself for the purpose of improving the user's experience. Some Web Mining algorithms have been proposed for adapting a Website. In this paper, a recommender system using agents with two different algorithms (associative rules and collaborative filtering) is described. Both algorithms are incremental and work with binary data. Results show that this multi-agent approach combining different algorithms is capable of improving user's satisfaction.

FecharLer Abstract

2012

GTE: a distributional second-order co-occurrence approach to improve the identification of top relevant dates in web snippets

Autores
Campos, R; Dias, G; Jorge, A; Nunes, C;

Publicação
21st ACM International Conference on Information and Knowledge Management, CIKM'12, Maui, HI, USA, October 29 - November 02, 2012

Abstract
In this paper, we present an approach to identify top relevant dates in Web snippets with respect to a given implicit temporal query. Our approach is two-fold. First, we propose a generic temporal similarity measure called GTE, which evaluates the temporal similarity between a query and a date. Second, we propose a classification model to accurately relate relevant dates to their corresponding query terms and withdraw irrelevant ones. We suggest two different solutions: a threshold-based classification strategy and a supervised classifier based on a combination of multiple similarity measures. We evaluate both strategies over a set of real-world text queries and compare the performance of our Web snippet approach with a query log approach over the same set of queries. Experiments show that determining the most relevant dates of any given implicit temporal query can be improved with GTE combined with the second order similarity measure InfoSimba, the Dice coefficient and the threshold-based strategy compared to (1) first-order similarity measures and (2) the query log based approach. © 2012 ACM.

FecharLer Abstract

2012

Ensemble Approaches for Regression: A Survey

Autores
Mendes Moreira, J; Soares, C; Jorge, AM; De Sousa, JF;

Publicação
ACM COMPUTING SURVEYS

Abstract
The goal of ensemble regression is to combine several models in order to improve the prediction accuracy in learning problems with a numerical target variable. The process of ensemble learning can be divided into three phases: the generation phase, the pruning phase, and the integration phase. We discuss different approaches to each of these phases that are able to deal with the regression problem, categorizing them in terms of their relevant characteristics and linking them to contributions from different fields. Furthermore, this work makes it possible to identify interesting areas for future research.

FecharLer Abstract