Publications - INESC TEC

Publications

Publications by LIAAD

2006

A web-based system to monitor the quality of meta-data in web portals

Authors
Domingues, MA; Soares, C; Jorge, AM;

Publication
2006 IEEE/WIC/ACM International Conference on Web Intelligence and Intelligent Agent Technology, Workshops Proceedings

Abstract
We present a web-based system to monitor the quality of the meta-data used to describe content in web portals. The system implements meta-data analysis using statistical, visualization and data mining tools. The web-based system enables the site's editor to detect and correct problems in the description of contents, thus improving the quality of the web portal and the satisfaction of its users. We have developed this system and tested it on a Portuguese portal for management executives.


2006

Design of an end-to-end method to extract information from tables

Authors
Costa e Silva, A; Jorge, AM; Torgo, L;

Publication
INTERNATIONAL JOURNAL ON DOCUMENT ANALYSIS AND RECOGNITION

Abstract
This paper plans an end-to-end method for extracting information from tables embedded in documents. The input format is ASCII, to which any richer format can be converted while preserving all textual and much of the layout information. We start by defining what a table is. Then we describe the steps involved in extracting information from tables and analyse table-related research to place the contributions of different authors, find the paths research is following, and identify issues that are still unsolved. We then analyse current approaches to evaluating table processing algorithms and propose two new metrics for the task of segmenting cells, columns, and rows. We proceed to design our own end-to-end method, in which there is more interaction between the different steps; we indicate how back loops in the usual order of the steps can reduce the possibility of errors and contribute to solving previously unsolved problems. Finally, we explore how the actual interpretation of the table not only allows inferring the accuracy of the overall extraction process but also contributes to improving its quality. To do so, we believe interpretation has to consider context-specific knowledge; we explore how this knowledge can be added in a plug-in/out manner, so that the overall method remains operable in different contexts.


2006

Semi-automatic creation and maintenance of web resources with webTopic

Authors
Escudeiro, NF; Jorge, AM;

Publication
Semantics, Web and Mining

Abstract
In this paper we propose a methodology for automatically retrieving document collections from the web on specific topics, and for organizing them and keeping them up to date over time, according to user-specific persistent information needs. The collected documents are organized according to user specifications and are classified partly by the user and partly automatically. A presentation layer enables the exploration of large sets of documents and, simultaneously, monitors and records user interaction with these document collections. The quality of the system is continuously monitored: the system periodically measures and stores the values of its quality parameters. Using this quality log, it is possible to maintain the quality of the resources by triggering procedures aimed at correcting or preventing quality degradation.


2006

Rule-based prediction of rare extreme values

Authors
Ribeiro, R; Torgo, L;

Publication
DISCOVERY SCIENCE, PROCEEDINGS

Abstract
This paper describes a rule learning method that obtains models biased towards a particular class of regression tasks. The main distinguishing feature of these tasks is that the goal is to be accurate at predicting rare extreme values of the continuous target variable. Many real-world applications from scientific areas such as ecology, meteorology, and finance share this objective. Most existing approaches to regression problems search for the model parameters that optimize a given average error estimator (e.g. mean squared error). This means that they are biased towards achieving good performance on the most common cases. The motivation for our work is the claim that being accurate on a small set of rare cases requires different error metrics. Moreover, given the nature and relevance of this type of application, an interpretable model is usually of key importance to domain experts, as predicting these rare events is normally associated with costly decisions. Our proposed system (R-PREV) obtains a set of interpretable regression rules derived from a set of bagged regression trees, using evaluation metrics that bias the resulting models towards accurately predicting rare extreme values. We provide an experimental evaluation of our method confirming the advantages of our proposal in terms of accuracy in predicting rare extreme values.
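
The point about average error estimators favouring common cases can be illustrated with a toy calculation. The sketch below is not the R-PREV system or its evaluation metrics; the data, the two hypothetical models, and the threshold-restricted error (`mse_on_extremes`) are assumptions made purely for illustration.

```python
def mse(y_true, y_pred):
    """Mean squared error over all cases."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def mse_on_extremes(y_true, y_pred, threshold):
    """Mean squared error restricted to cases whose true value is extreme.

    Hypothetical helper: 'extreme' is defined here simply as |y| >= threshold.
    """
    pairs = [(t, p) for t, p in zip(y_true, y_pred) if abs(t) >= threshold]
    return sum((t - p) ** 2 for t, p in pairs) / len(pairs)


# Toy target: 98 common values at 0 and 2 rare extreme values at 10.
y_true = [0.0] * 98 + [10.0, 10.0]

# Hypothetical model A: exact on common cases, misses both extremes.
pred_a = [0.0] * 100
# Hypothetical model B: off by 1.5 on common cases, exact on the extremes.
pred_b = [1.5] * 98 + [10.0, 10.0]

mse_a, mse_b = mse(y_true, pred_a), mse(y_true, pred_b)
ext_a = mse_on_extremes(y_true, pred_a, threshold=5.0)
ext_b = mse_on_extremes(y_true, pred_b, threshold=5.0)

print(f"overall MSE   A={mse_a:.3f}  B={mse_b:.3f}")
print(f"extremes only A={ext_a:.3f}  B={ext_b:.3f}")
```

On this toy data the overall mean squared error ranks model A ahead of model B, even though A misses every extreme value, while the error restricted to the extreme cases ranks them the other way round.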


2006

Predicting rare extreme values

Authors
Torgo, L; Ribeiro, R;

Publication
ADVANCES IN KNOWLEDGE DISCOVERY AND DATA MINING, PROCEEDINGS

Abstract
Modelling extreme data is very important in several application domains, such as finance, meteorology, and ecology. This paper addresses the problem of predicting extreme values of a continuous variable. The main distinguishing feature of our target applications lies in the fact that these values are rare. Any prediction model is obtained by some sort of search process guided by a pre-specified evaluation criterion. In this work we argue against the use of standard criteria for evaluating regression models in the context of our target applications. We propose a new predictive performance metric for this class of problems, which our experiments show to be better at distinguishing models that are more accurate at rare extreme values. This new evaluation metric could be used as the basis for developing better models for predicting rare extreme values.


2006

Organizational survival in cooperation networks: The case of automobile manufacturing

Authors
Campos, P; Brazdil, P; Brito, P;

Publication
Network-Centric Collaboration and Supporting Frameworks

Abstract
We propose a multi-agent framework to analyze the dynamics of organizational survival in cooperation networks. Firms can decide to cooperate horizontally (in the same market) or vertically, with other firms that belong to the supply chain. Cooperation decisions are based on economic variables. We have defined a variant of the density dependence model to set up the survival dynamics in the simulation. To validate our model, we have used empirical outputs obtained in previous studies of the automobile manufacturing sector. We have observed that firms and networks proliferate in the regions with lower marginal costs, but new networks keep appearing and disappearing in regions with higher marginal costs.
