Publications

Publications by Vitor Rocio

2007

Document retrieval for question answering: a quantitative evaluation of text preprocessing

Authors
Carvalho, G; de Matos, DM; Rocio, V;

Publication
Proceedings of the First Ph.D. Workshop in CIKM, PIKM 2007, Sixteenth ACM Conference on Information and Knowledge Management, CIKM 2007, Lisbon, Portugal, November 9, 2007

Abstract
Question Answering (QA) has been an area of interest for researchers, in part motivated by the international QA evaluation forums, namely the Text REtrieval Conference (TREC) and, more recently, the Cross Language Evaluation Forum (CLEF) through QA@CLEF, which has included the Portuguese language since 2004. In these forums, a collection of written documents is provided, together with a set of questions to be answered by the participating systems. Each system is evaluated as a whole by its capacity to answer the questions, and relatively few published results focus on the performance of its individual components and their influence on overall system performance. That is the case for the Information Retrieval (IR) component, which is widely used in QA systems. Our work concentrates on the different options for preprocessing Portuguese text before feeding it to the IR component, evaluating their impact on IR performance in the specific context of QA, so that we can make a well-founded choice among them. From this work we conclude that the basic preprocessing techniques, case folding and removal of punctuation marks, bring a clear advantage. Of the other techniques considered, stop word removal improved the performance of the IR system, whereas stemming and lemmatization did not. © 2007 ACM.

2005

Introduction

Authors
Lopes, GP; da Silva, JF; Rocio, V; Quaresma, P;

Publication
Progress in Artificial Intelligence - Lecture Notes in Computer Science

2005

Lecture Notes in Artificial Intelligence: Introduction

Authors
Lopes, GP; Ferreira Da Silva, J; Rocio, V; Quaresma, P;

Publication
Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)

2005

TEMA'05: Workshop on Text Mining and Applications

Authors
Lopes, G; da Silva, J; Rocio, V; Quaresma, P;

Publication
2005 Portuguese Conference on Artificial Intelligence, Proceedings

2005

Text Mining and Applications (TEMA 2005) - Introduction

Authors
Lopes, GP; da Silva, JF; Rocio, V; Quaresma, P;

Publication
PROGRESS IN ARTIFICIAL INTELLIGENCE, PROCEEDINGS

2007

Detection of strange and wrong automatic part-of-speech tagging

Authors
Rocio, V; Silva, J; Lopes, G;

Publication
PROGRESS IN ARTIFICIAL INTELLIGENCE, PROCEEDINGS

Abstract
Automatic morphosyntactic tagging of corpora is usually imperfect. Wrong or strange tagging may be automatically repeated following some patterns. It is usually hard to detect all these errors manually, as corpora may contain millions of tags. This paper presents an approach to detect sequences of part-of-speech tags that have an internal cohesiveness in corpora. Some sequences correspond to syntactic chunks or correct sequences, but others are strange or incorrect, usually because of systematically wrong tagging. The amount of time spent separating incorrect bigrams and trigrams from correct ones is very small, yet it allows us to detect 70% of all tagging errors in the corpus.
