2012
Authors
Campos, R; Dias, G; Jorge, AM; Nunes, C;
Publication
ACM International Conference Proceeding Series
Abstract
In general, search engines fail to understand the user's temporal intent when it is expressed as an implicit temporal query. This causes the retrieval of less relevant information and prevents users from being aware of the possible temporal dimension of the query results. In this paper, we aim to develop a language-independent model that tackles the temporal dimensions of a query and identifies its most relevant time periods. For this purpose, we propose a temporal similarity measure capable of associating one or more relevant dates with a given query and filtering out irrelevant ones. Our approach is based on the exploitation of temporal information from web content, particularly within the set of top-k web snippets retrieved in response to a query. We focus on extracting years, a kind of temporal information that often appears in this type of collection. We evaluate our methodology using a set of real-world text temporal queries that are clear concepts (i.e. queries that are non-ambiguous in concept and temporal in their purpose). Experiments show that, compared to baseline methods, a new temporal similarity measure improves the identification of the most relevant dates for a given implicit temporal query. © 2012 ACM.
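The abstract above describes extracting years from the top-k web snippets returned for a query. The following is a minimal sketch of a frequency baseline only, not the temporal similarity measure proposed in the paper: it counts the share of snippets that mention each four-digit year and drops years below a threshold. The function names, the year pattern (1000 to 2099) and the `min_share` parameter are illustrative assumptions.

```python
import re
from collections import Counter

# Assumption: a candidate year is any standalone four-digit number in 1000-2099.
YEAR_PATTERN = re.compile(r"\b(1[0-9]{3}|20[0-9]{2})\b")


def candidate_years(snippets):
    """Count, for each year, the number of snippets that mention it."""
    counts = Counter()
    for snippet in snippets:
        # A year is counted once per snippet, however often it is repeated.
        counts.update(set(YEAR_PATTERN.findall(snippet)))
    return counts


def rank_years(snippets, min_share=0.2):
    """Return (year, share of snippets) pairs, most frequent first.

    Years mentioned in fewer than `min_share` of the snippets are dropped.
    This is a plain frequency baseline, not the paper's similarity measure.
    """
    total = len(snippets)
    if total == 0:
        return []
    counts = candidate_years(snippets)
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [(year, n / total) for year, n in ranked if n / total >= min_share]
```

A frequency baseline like this cannot tell a year that is relevant to the query from one that merely co-occurs in the snippets, which is the gap a dedicated similarity measure is meant to close.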
2012
Authors
Domingues, MA; Gouyon, F; Jorge, AM; Leal, JP; Vinagre, J; Lemos, L; Sordo, M;
Publication
WWW'12 - Proceedings of the 21st Annual Conference on World Wide Web Companion
Abstract
In this paper we propose a hybrid music recommender system, which combines usage and content data. We describe an online evaluation experiment performed in real time on a commercial music web site that specialises in content from the very long tail of music. We compare the hybrid system against two stand-alone recommenders, the first based on usage data and the second based on content data. The results show that the proposed hybrid recommender has advantages over the usage- and content-based systems, namely a higher absolute user acceptance rate, a higher user activity rate and higher user loyalty. Copyright is held by the International World Wide Web Conference Committee (IW3C2).
2012
Authors
Nogueira, BM; Jorge, AM; Rezende, SO;
Publication
Proceedings of the ACM Symposium on Applied Computing
Abstract
In this paper, we address the problem of semi-supervised hierarchical clustering by using an active clustering solution with cluster-level constraints. This active learning approach is based on a concept of merge confidence in agglomerative clustering. The proposed method was compared with an unsupervised algorithm (average-link) and a semi-supervised algorithm based on pairwise constraints. The results show that our algorithm tends to be better than the pairwise constrained algorithm and can achieve a significant improvement when compared to the unsupervised algorithm. © 2012 Authors.
2012
Authors
Vinagre, J; Jorge, AM;
Publication
Journal of the Brazilian Computer Society
Abstract
Collaborative filtering (CF) has been an important subject of research in the past few years. Many achievements have been made in this field; however, many challenges still need to be faced, mainly related to scalability and predictive ability. One important issue is how to deal with old and potentially obsolete data in order to avoid unnecessary memory usage and processing time. Our proposal is to use forgetting mechanisms. In this paper, we present and evaluate the impact of two forgetting mechanisms (sliding windows and fading factors) in user-based and item-based CF algorithms with implicit binary ratings under a scenario of abrupt change. Our results suggest that forgetting mechanisms reduce time and space requirements, improving scalability, while not significantly affecting the predictive ability of the algorithms. © 2012 The Brazilian Computer Society.
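The two forgetting mechanisms named in the abstract can be illustrated with a minimal sketch. It assumes a stream of implicit binary (user, item) events; the class names, the window size and the fading factor `alpha` are hypothetical, and the code does not reproduce the user-based or item-based CF algorithms evaluated in the paper.

```python
from collections import defaultdict, deque


class SlidingWindow:
    """Keep only the most recent `size` (user, item) events."""

    def __init__(self, size):
        # A bounded deque discards the oldest event when a new one arrives.
        self.events = deque(maxlen=size)

    def add(self, user, item):
        self.events.append((user, item))

    def user_items(self):
        """Rebuild binary user profiles from the events still in the window."""
        profiles = defaultdict(set)
        for user, item in self.events:
            profiles[user].add(item)
        return dict(profiles)


class FadingCounts:
    """Item weights where older observations decay by `alpha` at each update."""

    def __init__(self, alpha):
        self.alpha = alpha  # assumed to lie in (0, 1]; 1 means no forgetting
        self.weights = defaultdict(float)

    def add(self, item):
        # Fade every accumulated weight, then credit the new observation.
        for key in self.weights:
            self.weights[key] *= self.alpha
        self.weights[item] += 1.0
```

The sliding window bounds memory by dropping old events outright, while the fading factor keeps every item but gradually reduces the influence of old observations.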
2012
Authors
Escudeiro, NF; Jorge, AM;
Publication
Journal of the Brazilian Computer Society
Abstract
In some classification tasks, such as those related to the automatic building and maintenance of text corpora, it is expensive to obtain labeled instances to train a classifier. In such circumstances it is common to have massive corpora where a few instances are labeled (typically a minority) while others are not. Semi-supervised learning techniques try to leverage the intrinsic information in unlabeled instances to improve classification models. However, these techniques assume that the labeled instances cover all the classes to learn, which might not be the case. Moreover, in the presence of an imbalanced class distribution, getting labeled instances from minority classes might be very costly, requiring extensive labeling, if queries are randomly selected. Active learning allows asking an oracle to label new instances, which are selected by criteria aiming to reduce the labeling effort. D-Confidence is an active learning approach that is effective in the presence of imbalanced training sets. In this paper we evaluate the performance of d-Confidence in comparison to its baseline criteria over tabular and text datasets. We provide empirical evidence that d-Confidence reduces label disclosure complexity (which we have defined as the number of queries required to identify instances from all classes to learn) in the presence of imbalanced data. © 2012 The Brazilian Computer Society.
2012
Authors
Jorge, AM; Mendes Moreira, J; De Sousa, JF; Soares, C; Azevedo, PJ;
Publication
Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Abstract
In this paper we study the deviation of bus trip duration and its causes. Deviations are obtained by comparing scheduled times against actual trip duration and are either delays or early arrivals. We use distribution rules, a kind of association rule that may have continuous distributions in the consequent. Distribution rules allow the systematic identification of particular conditions, which we call contexts, under which the distribution of trip time deviations differs significantly from the overall deviation distribution. After identifying specific causes of delay, the bus company's operational managers can adjust the timetables to increase punctuality without disrupting the service. © Springer-Verlag Berlin Heidelberg 2012.
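The idea of comparing the deviation distribution of one context against the overall distribution can be sketched as follows. This is an illustration under stated assumptions, not the distribution rule algorithm itself: each trip is a hypothetical (context, scheduled minutes, actual minutes) tuple, and the gap between distributions is measured with a two-sample Kolmogorov-Smirnov statistic, which is a choice made here for the example. No significance threshold is applied.

```python
from bisect import bisect_right


def deviations(trips):
    """Return (context, deviation) pairs; positive is a delay, negative is early."""
    return [(context, actual - scheduled) for context, scheduled, actual in trips]


def ks_statistic(sample_a, sample_b):
    """Largest absolute gap between the two empirical CDFs."""
    a, b = sorted(sample_a), sorted(sample_b)
    points = sorted(set(a) | set(b))
    return max(
        abs(bisect_right(a, x) / len(a) - bisect_right(b, x) / len(b))
        for x in points
    )


def context_gaps(trips):
    """Compare each context's deviations against the overall deviations."""
    pairs = deviations(trips)
    overall = [dev for _, dev in pairs]
    by_context = {}
    for context, dev in pairs:
        by_context.setdefault(context, []).append(dev)
    return {ctx: ks_statistic(devs, overall) for ctx, devs in by_context.items()}
```

Contexts with a large gap are candidates for closer inspection; deciding whether a gap is significant would require a proper statistical test on enough trips.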