2006
Authors
Rebelo, C; Brito, PQ; Soares, C; Jorge, A;
Publication
2006 IEEE/WIC/ACM International Conference on Web Intelligence, (WI 2006 Main Conference Proceedings)
Abstract
Clusterings based on many variables are difficult to visualize and interpret. We present a methodology based on Factor Analysis (FA) which can be used for that purpose. FA generates a small set of variables which encode most of the information in the original variables. We apply the methodology to segment the users of a web portal, using access log data. It not only makes it simpler to visualize and understand the clusters which are obtained on the original variables but it also helps the analyst in selecting some of the original variables for further analysis of those clusters.
2006
Authors
Carvalho, C; Jorge, AM; Soares, C;
Publication
2006 IEEE/WIC/ACM International Conference on Web Intelligence, (WI 2006 Main Conference Proceedings)
Abstract
We present a methodology for the personalization of e-newsletters based on the analysis of user access logs. To approach the problem we have used clustering on the set of users, described by their web access patterns. Our work is evaluated using a case study with real data from e-newsletters sent by mail to users of a web portal, and can be adapted to similar situations. Positive results were obtained, indicating that the methodology is able to automatically select contents for a personalized e-newsletter.
2006
Authors
Domingues, MA; Soares, C; Jorge, AM;
Publication
2006 IEEE/WIC/ACM International Conference on Web Intelligence and Intelligent Agent Technology, Workshops Proceedings
Abstract
We present a web-based system to monitor the quality of the meta-data used to describe content in web portals. The system implements meta-data analysis using statistical, visualization and data mining tools. The web-based system enables the site's editor to detect and correct problems in the description of contents, thus improving the quality of the web portal and the satisfaction of its users. We have developed this system and tested it on a Portuguese portal for management executives.
2006
Authors
Costa e Silva, A; Jorge, AM; Torgo, L;
Publication
International Journal on Document Analysis and Recognition
Abstract
This paper plans an end-to-end method for extracting information from tables embedded in documents; input format is ASCII, to which any richer format can be converted, preserving all textual and much of the layout information. We start by defining table. Then we describe the steps involved in extracting information from tables and analyse table-related research to place the contribution of different authors, find the paths research is following, and identify issues that are still unsolved. We then analyse current approaches to evaluating table processing algorithms and propose two new metrics for the task of segmenting cells/columns/rows. We proceed to design our own end-to-end method, where there is a higher interaction between different steps; we indicate how back loops in the usual order of the steps can reduce the possibility of errors and contribute to solving previously unsolved problems. Finally, we explore how the actual interpretation of the table not only allows inferring the accuracy of the overall extraction process but also contributes to actually improving its quality. In order to do so, we believe interpretation has to consider context-specific knowledge; we explore how the addition of this knowledge can be made in a plug-in/out manner, such that the overall method will maintain its operability in different contexts.
2006
Authors
Escudeiro, NF; Jorge, AM;
Publication
Semantics, Web and Mining
Abstract
In this paper we propose a methodology for automatically retrieving document collections from the web on specific topics and for organizing them and keeping them up-to-date over time, according to user specific persistent information needs. The documents collected are organized according to user specifications and are classified partly by the user and partly automatically. A presentation layer enables the exploration of large sets of documents and, simultaneously, monitors and records user interaction with these document collections. The quality of the system is permanently monitored; the system periodically measures and stores the values of its quality parameters. Using this quality log it is possible to maintain the quality of the resources by triggering procedures aimed at correcting or preventing quality degradation.
2006
Authors
Ribeiro, R; Torgo, L;
Publication
Discovery Science, Proceedings
Abstract
This paper describes a rule learning method that obtains models biased towards a particular class of regression tasks. These tasks have as main distinguishing feature the fact that the main goal is to be accurate at predicting rare extreme values of the continuous target variable. Many real-world applications from scientific areas like ecology, meteorology, finance, etc., share this objective. Most existing approaches to regression problems search for the model parameters that optimize a given average error estimator (e.g. mean squared error). This means that they are biased towards achieving a good performance on the most common cases. The motivation for our work is the claim that being accurate at a small set of rare cases requires different error metrics. Moreover, given the nature and relevance of this type of applications an interpretable model is usually of key importance to domain experts, as predicting these rare events is normally associated with costly decisions. Our proposed system (R-PREV) obtains a set of interpretable regression rules derived from a set of bagged regression trees using evaluation metrics that bias the resulting models to predict accurately rare extreme values. We provide an experimental evaluation of our method confirming the advantages of our proposal in terms of accuracy in predicting rare extreme values.