2003
Authors
Silva, ACE; Jorge, A; Torgo, L;
Publication
PROGRESS IN ARTIFICIAL INTELLIGENCE
Abstract
The information contained in companies' financial statements is valuable to several users. Much of the relevant information in such documents is contained in tables and is currently mainly extracted by hand. We propose a method that accomplishes a prior step of the task of automatically extracting information from tables in documents: selecting the lines that are likely to belong to tables. Our method has been developed by empirically analyzing a set of Portuguese companies' financial statements using statistical and data mining techniques. Empirical evaluation indicates that more than 99% of table lines are selected after discarding at least 50% of all lines. The method can cope with the complexity of styles used in assembling information on paper and adapt its performance accordingly, thus maximizing its results.
2006
Authors
Costa e Silva, A; Jorge, AM; Torgo, L;
Publication
INTERNATIONAL JOURNAL ON DOCUMENT ANALYSIS AND RECOGNITION
Abstract
This paper plans an end-to-end method for extracting information from tables embedded in documents; input format is ASCII, to which any richer format can be converted, preserving all textual and much of the layout information. We start by defining table. Then we describe the steps involved in extracting information from tables and analyse table-related research to place the contribution of different authors, find the paths research is following, and identify issues that are still unsolved. We then analyse current approaches to evaluating table processing algorithms and propose two new metrics for the task of segmenting cells/columns/rows. We proceed to design our own end-to-end method, where there is a higher interaction between different steps; we indicate how back loops in the usual order of the steps can reduce the possibility of errors and contribute to solving previously unsolved problems. Finally, we explore how the actual interpretation of the table not only allows inferring the accuracy of the overall extraction process but also contributes to actually improving its quality. In order to do so, we believe interpretation has to consider context-specific knowledge; we explore how the addition of this knowledge can be made in a plug-in/out manner, such that the overall method will maintain its operability in different contexts.
2005
Authors
Jorge, A; Torgo, L; Brazdil, P; Camacho, R; Gama, J;
Publication
PKDD
Abstract
2005
Authors
Gama, J; Camacho, R; Brazdil, P; Jorge, A; Torgo, L;
Publication
ECML
Abstract
2005
Authors
Jorge, A; Torgo, L; Brazdil, P; Camacho, R; Gama, J;
Publication
Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Abstract
2005
Authors
Gama, J; Camacho, R; Brazdil, P; Jorge, A; Torgo, L;
Publication
Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Abstract