Publicacoes - INESC TEC

Publicações

Publicações por Luís Torgo

2003

Automatic selection of table areas in documents for information extraction

Autores
Silva, ACE; Jorge, A; Torgo, L;

Publicação
PROGRESS IN ARTIFICIAL INTELLIGENCE

Abstract
The information contained in companies' financial statements is valuable to several users. Much of the relevant information in such documents is contained in tables and is currently mainly extracted by hand. We propose a method that accomplishes a prior step of the task of automatically extracting information from tables in documents: selecting the lines that are likely to belong to tables. Our method has been developed by empirically analyzing a set of Portuguese companies' financial statements using statistical and data mining techniques. Empirical evaluation indicates that more than 99% of table lines are selected after discarding at least 50% of all lines. The method can cope with the complexity of styles used in assembling information on paper and adapt its performance accordingly, thus maximizing its results.

FecharLer Abstract

2006

Design of an end-to-end method to extract information from tables

Autores
Costa e Silva, A; Jorge, AM; Torgo, L;

Publicação
INTERNATIONAL JOURNAL ON DOCUMENT ANALYSIS AND RECOGNITION

Abstract
This paper plans an end-to-end method for extracting information from tables embedded in documents; input format is ASCII, to which any richer fort-nat can be converted, preserving all textual and much of the layout information. We start by defining table. Then we describe the steps involved in extracting information from tables and analyse table-related research to place the contribution of different authors, find the paths research is following, and identify issues that are still unsolved. We then analyse current approaches to evaluating table processing algorithms and propose two new metrics for the task of segmenting cells/columns/rows. We proceed to design our own end-to-end method, where there is a higher interaction between different steps; we indicate how back loops in the usual order of the steps can reduce the possibility of errors and contribute to solving previously unsolved problems. Finally, we explore how the actual interpretation of the table not only allows inferring the accuracy of the overall extraction process but also contributes to actually improving its quality. In order to do so, we believe interpretation has to consider context-specific knowledge; we explore how the addition of this knowledge can be made in a plug-in/out manner, such that the overall method will maintain its operability in different contexts.

FecharLer Abstract

2005

Knowledge Discovery in Databases: PKDD 2005, 9th European Conference on Principles and Practice of Knowledge Discovery in Databases, Porto, Portugal, October 3-7, 2005, Proceedings

Autores
Jorge, A; Torgo, L; Brazdil, P; Camacho, R; Gama, J;

Publicação
PKDD

Abstract

2005

Machine Learning: ECML 2005, 16th European Conference on Machine Learning, Porto, Portugal, October 3-7, 2005, Proceedings

Autores
Gama, J; Camacho, R; Brazdil, P; Jorge, A; Torgo, L;

Publicação
ECML

Abstract

2005

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics): Preface

Autores
Jorge, A; Torgo, L; Brazdil, P; Camacho, R; Gama, J;

Publicação
Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)

Abstract

2005

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics): Preface

Autores
Gama, J; Camacho, R; Brazdil, P; Jorge, A; Torgo, L;

Publicação
Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)

Abstract