2007
Authors
Trigo, A; Varajao, J; Figueiredo, N; Barroso, J;
Publication
NOVAS PERSPECTIVAS EM SISTEMAS E TECNOLOGIAS DE INFORMACAO, VOL I
Abstract
2007
Authors
Varajao, JE; Ribeiro, AT; Figueiredo, NP; Barroso, JM;
Publication
CISCI 2007: 6TA CONFERENCIA IBEROAMERICANA EN SISTEMAS, CIBERNETICA E INFORMATICA, MEMORIAS, VOL I
Abstract
2007
Authors
Carvalho, G; de Matos, DM; Rocio, V;
Publication
Proceedings of the First Ph.D. Workshop in CIKM, PIKM 2007, Sixteenth ACM Conference on Information and Knowledge Management, CIKM 2007, Lisbon, Portugal, November 9, 2007
Abstract
Question Answering (QA) has been an area of interest for researchers, motivated in part by the international QA evaluation forums, namely the Text REtrieval Conference (TREC) and, more recently, the Cross Language Evaluation Forum (CLEF) through QA@CLEF, which has included the Portuguese language since 2004. In these forums, a collection of written documents is provided, along with a set of questions to be answered by the participating systems. Each system is evaluated as a whole by its capacity to answer the questions, and relatively few published results focus on the performance of its individual components and their influence on overall system performance. That is the case for the Information Retrieval (IR) component, which is widely used in QA systems. Our work concentrates on the different options for preprocessing Portuguese text before feeding it to the IR component, evaluating their impact on IR performance in the specific context of QA, so that we can make a well-founded choice among them. From this work we conclude that the basic preprocessing techniques, case folding and removal of punctuation marks, offer a clear advantage. Among the other techniques considered, stop word removal enhanced the performance of the IR system, but stemming and lemmatization did not. © 2007 ACM.
2007
Authors
Rocio, V; Silva, J; Lopes, G;
Publication
PROGRESS IN ARTIFICIAL INTELLIGENCE, PROCEEDINGS
Abstract
Automatic morphosyntactic tagging of corpora is usually imperfect. Wrong or strange tagging may be automatically repeated following some patterns. It is usually hard to detect all these errors manually, as corpora may contain millions of tags. This paper presents an approach to detect sequences of part-of-speech tags that have an internal cohesiveness in corpora. Some sequences correspond to syntactic chunks or other correct sequences, but some are strange or incorrect, usually due to systematically wrong tagging. The time spent separating incorrect bigrams and trigrams from correct ones is very small, but it allows us to detect 70% of all tagging errors in the corpus.
2007
Authors
Santos, V; Mamede, HS;
Publication
Encyclopedia of Internet Technologies and Applications
Abstract
2007
Authors
Mamede, HS; Santos, V; Lopes Costa, JAL;
Publication
NOVAS PERSPECTIVAS EM SISTEMAS E TECNOLOGIAS DE INFORMACAO, VOL I
Abstract