Publications

Publications by LIAAD

2009

Item-Based and User-Based Incremental Collaborative Filtering for Web Recommendations

Authors
Miranda, C; Jorge, AM;

Publication
PROGRESS IN ARTIFICIAL INTELLIGENCE, PROCEEDINGS

Abstract
In this paper we propose an incremental item-based collaborative filtering algorithm. It works with binary ratings (sometimes also called implicit ratings), as is typically the case in a Web environment. Our method is capable of incorporating new information in parallel with performing recommendation. New sessions and new users are used to update the similarity matrix as they appear. The proposed algorithm is compared with a non-incremental one, as well as with an incremental user-based approach based on an existing explicit-rating recommender. The use of techniques for working with sparse matrices in these algorithms is also evaluated. All versions, implemented in R, are evaluated on 5 datasets with varying numbers of users and/or items. We observed that recall tends to improve when we continuously add information to the recommender model; the time spent on recommendation does not degrade; and the time for updating the similarity matrix (necessary for the recommendation) is relatively low, which motivates the use of the item-based incremental approach. Moreover, we study how the number of items and users affects the user-based and the item-based approaches.
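The incremental item-based idea can be sketched in Python. This is a minimal illustration, not the authors' R implementation: it assumes cosine similarity between binary item columns and keeps raw item and co-occurrence counts, so each new session updates the model without recomputing the whole similarity matrix. The class `IncrementalItemCF` and its methods are hypothetical names.

```python
from collections import defaultdict
from math import sqrt


class IncrementalItemCF:
    """Item-based CF over binary (implicit) ratings, updated one session at a time.

    Assumption: similarity is the cosine between binary item columns, derived from
    stored counts; this is an illustrative choice, not the paper's R code.
    """

    def __init__(self):
        self.user_items = defaultdict(set)  # user -> items seen so far
        self.item_count = defaultdict(int)  # item -> number of users who saw it
        self.co_count = defaultdict(int)    # (item_a, item_b), a < b -> users who saw both

    @staticmethod
    def _key(a, b):
        return (a, b) if a < b else (b, a)

    def update(self, user, items):
        """Fold a new session into the counts; items the user already has change nothing."""
        seen = self.user_items[user]
        for item in items:
            if item in seen:
                continue
            # The new item now co-occurs with every item this user already has.
            for other in seen:
                self.co_count[self._key(item, other)] += 1
            self.item_count[item] += 1
            seen.add(item)

    def similarity(self, a, b):
        """Cosine similarity of two binary item columns: co(a, b) / sqrt(n_a * n_b)."""
        denom = sqrt(self.item_count.get(a, 0) * self.item_count.get(b, 0))
        return self.co_count.get(self._key(a, b), 0) / denom if denom else 0.0

    def recommend(self, session, n=3):
        """Rank items not in the active session by summed similarity to its items."""
        scores = {
            cand: sum(self.similarity(cand, s) for s in session)
            for cand in list(self.item_count)
            if cand not in session
        }
        ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
        return [item for item, score in ranked[:n] if score > 0]


cf = IncrementalItemCF()
cf.update("u1", ["a", "b", "c"])
cf.update("u2", ["a", "b"])
cf.update("u3", ["b", "d"])
print(cf.recommend({"a"}))  # ['b', 'c']
cf.update("u4", ["a", "c"])  # a new user arrives; no full recomputation
print(cf.recommend({"a"}))  # ['c', 'b']
```

Storing counts rather than similarities keeps each update proportional to the number of new items times the items the user already has; similarities are derived on demand at recommendation time.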

2009

Efficient Coverage of Case Space with Active Learning

Authors
Escudeiro, NF; Jorge, AM;

Publication
PROGRESS IN ARTIFICIAL INTELLIGENCE, PROCEEDINGS

Abstract
Collecting and annotating exemplary cases is a costly and critical task that is required in the early stages of any classification process. Reducing labeling cost without degrading accuracy calls for a compromise solution, which may be achieved with active learning. Common active learning approaches focus on accuracy and assume the availability of a pre-labeled set of exemplary cases covering all classes to learn. This assumption does not necessarily hold. In this paper we study how well a new active learning approach, d-Confidence, rapidly covers the case space when the representativeness assumption is not met, compared with the traditional active learning confidence criterion. Experimental results show that d-Confidence reduces the number of queries required to achieve complete class coverage and tends to improve or maintain classification error.
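To contrast the two query criteria, here is a minimal Python/NumPy sketch. It rests on an assumption not stated in the abstract: that d-Confidence divides the classifier's confidence in each class by the case's median Euclidean distance to the labeled cases of that class, and queries the case whose best such score is lowest. The function names are hypothetical and the paper's exact definition may differ.

```python
import numpy as np


def d_confidence_query(X_unlabeled, X_labeled, y_labeled, class_probs):
    """Index of the unlabeled case with the lowest d-Confidence (assumed definition).

    class_probs: (n_unlabeled, n_classes) classifier confidences, with columns
    ordered like np.unique(y_labeled).
    """
    classes = np.unique(y_labeled)
    scores = np.empty((len(X_unlabeled), len(classes)))
    for k, c in enumerate(classes):
        members = X_labeled[y_labeled == c]
        # Euclidean distance from each unlabeled case to every labeled case of class c
        dists = np.linalg.norm(X_unlabeled[:, None, :] - members[None, :, :], axis=2)
        # Assumption: confidence is divided by the median distance to that class
        scores[:, k] = class_probs[:, k] / (np.median(dists, axis=1) + 1e-12)
    return int(np.argmin(scores.max(axis=1)))


def confidence_query(class_probs):
    """Traditional criterion: query the case whose top class confidence is lowest."""
    return int(np.argmin(class_probs.max(axis=1)))


X_lab = np.array([[0.0, 0.0], [1.0, 0.0]])
y_lab = np.array([0, 1])
X_unl = np.array([[0.5, 0.0], [10.0, 10.0]])
probs = np.array([[0.5, 0.5], [0.1, 0.9]])
print(confidence_query(probs))                        # 0
print(d_confidence_query(X_unl, X_lab, y_lab, probs))  # 1
```

In this toy setup the confidence criterion queries the ambiguous case between the two known classes, while the distance term makes the sketch query the far-away case the classifier is confident about, which is how a criterion of this kind can reach unexplored regions of the case space sooner.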

2009

Analysis and Forecast of Team Formation in the Simulated Robotic Soccer Domain

Authors
Almeida, R; Reis, LP; Jorge, AM;

Publication
PROGRESS IN ARTIFICIAL INTELLIGENCE, PROCEEDINGS

Abstract
This paper proposes a classification approach to identify a team's formation (i.e., the strategic layout of the players on the field) in the two-dimensional (2D) simulation league of the robotic soccer domain. It is a decision-support tool that allows the coach to understand the opponent's strategy. To reach that goal we employ Data Mining classification techniques. To introduce the simulated robotic soccer domain, we briefly describe the simulation system, some related work, and the use of Data Mining techniques for the detection of formations. In order to play robotic soccer matches with different formations, we develop a way to configure the formations of a base training team (FC Portugal), together with a data preparation process. The paper describes the base team and the test teams used, and the respective configuration process. After the matches between test teams, the data is subjected to a reduction process that takes into account the players' positions on the field with respect to the team as a whole. In the modeling stage, appropriate learning algorithms were selected. For the analysis of the solution, the error rate (% of incorrectly classified instances) and Student's t-test for paired samples were selected as the evaluation measures. Experimental results show that it is possible to automatically identify the formations used by the base team (FC Portugal) in distinct matches against different opponents, using Data Mining techniques. The experimental results also show that the SMO (Sequential Minimal Optimization) learning algorithm has the best performance.
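The paired comparison of error rates can be illustrated with a short Python function. It assumes the two learners are evaluated on the same matches or folds, so their error rates pair up; it computes only the t statistic (with n - 1 degrees of freedom), whereas `scipy.stats.ttest_rel` returns the same statistic together with a p-value. The function name is hypothetical.

```python
from math import sqrt
from statistics import mean, stdev


def paired_t_statistic(errors_a, errors_b):
    """Student's t statistic for paired samples, e.g. per-match error rates of two learners.

    t = mean(d) / (s_d / sqrt(n)), where d are the pairwise differences and s_d their
    sample standard deviation; the statistic has n - 1 degrees of freedom.
    Undefined (division by zero) when all differences are identical.
    """
    if len(errors_a) != len(errors_b) or len(errors_a) < 2:
        raise ValueError("need two equal-length samples with at least two pairs")
    diffs = [a - b for a, b in zip(errors_a, errors_b)]
    return mean(diffs) / (stdev(diffs) / sqrt(len(diffs)))
```

A negative statistic means the first learner had the lower error rates on average; whether the difference is significant is then read from the t distribution with n - 1 degrees of freedom.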

2009

The Effect of Varying Parameters and Focusing on Bus Travel Time Prediction

Authors
Moreira, JM; Soares, C; Jorge, AM; de Sousa, JF;

Publication
ADVANCES IN KNOWLEDGE DISCOVERY AND DATA MINING, PROCEEDINGS

Abstract
Travel time prediction is an important tool for the planning tasks of mass transit and logistics companies. In this paper we investigate the use of regression methods for predicting the travel time of buses in a Portuguese public transportation company. More specifically, we empirically evaluate the impact of varying parameters on the performance of different regression algorithms, such as support vector machines (SVM), random forests (RF) and projection pursuit regression (PPR). We also evaluate the impact of the focusing tasks (example selection, domain value definition and feature selection) on the accuracy of those algorithms. Concerning the algorithms, we observe that 1) RF is quite robust to the choice of parameters and focusing methods; 2) the choice of parameters for SVM can be made independently of focusing methods; while 3) for PPR they should be selected simultaneously. For the focusing methods, we observe that a stronger effect is obtained using example selection, particularly in combination with SVM.

2009

A Knowledge Discovery Method for the Characterization of Protein Unfolding Processes

Authors
Fernandes, E; Jorge, AM; Silva, CG; Brito, RMM;

Publication
2ND INTERNATIONAL WORKSHOP ON PRACTICAL APPLICATIONS OF COMPUTATIONAL BIOLOGY AND BIOINFORMATICS (IWPACBB 2008)

Abstract
This work presents a method of knowledge discovery in data obtained from Molecular Dynamics Protein Unfolding Simulations. The data under study was obtained from simulations of the unfolding process of the protein Transthyretin (TTR), responsible for amyloid diseases such as Familial Amyloid Polyneuropathy (FAP). Protein unfolding and misfolding are at the source of many amyloidogenic diseases. Thus, the molecular characterization of protein unfolding processes through experimental and simulation methods may be essential in the development of effective treatments. Here, we analyzed the variation in distance from the Cα (alpha carbon) atom of each of TTR's 127 amino acids to the centre of mass of the protein, along 10 different unfolding simulations: five simulations of WT-TTR and five simulations of L55P-TTR, a highly amyloidogenic TTR variant. Using data mining techniques, and considering all the information from the 10 runs, we identified several clusters of amino acids. For each cluster we selected the representative element and identified events, which were used as features. With Association Rules we found patterns that characterize the type of TTR variant under study. These results may help discriminate between amyloidogenic and non-amyloidogenic behaviour among different TTR variants and contribute to the understanding of the molecular mechanisms of FAP.
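The per-residue distance series that feed the clustering step can be computed with a few lines of NumPy. This sketch assumes the trajectory is already loaded as coordinate arrays (file parsing is out of scope) and uses a mass-weighted centre of mass; which atoms define that centre is left to the caller. The function name is hypothetical.

```python
import numpy as np


def ca_distance_profiles(ca_coords, all_coords, masses):
    """Distance of each alpha-carbon to the protein's centre of mass, per frame.

    ca_coords:  (n_frames, n_residues, 3) C-alpha positions
    all_coords: (n_frames, n_atoms, 3) positions of the atoms defining the centre of mass
    masses:     (n_atoms,) atomic masses
    Returns an (n_frames, n_residues) array; column r is residue r's distance time series.
    """
    weights = masses / masses.sum()
    com = np.einsum("fai,a->fi", all_coords, weights)  # (n_frames, 3) centre of mass
    return np.linalg.norm(ca_coords - com[:, None, :], axis=2)
```

For TTR the result would have 127 columns per simulation run; stacking the runs gives one distance series per amino acid, which is the kind of input a clustering step can group.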

2009

Discovery Science, 12th International Conference, DS 2009, Porto, Portugal, October 3-5, 2009

Authors
Gama, J; Costa, VS; Jorge, AM; Brazdil, P;

Publication
Discovery Science

Abstract