Publications

Publications by HumanISE

2011

Term Weighting Based on Document Revision History

Authors
Nunes, S; Ribeiro, C; David, G;

Publication
JOURNAL OF THE AMERICAN SOCIETY FOR INFORMATION SCIENCE AND TECHNOLOGY

Abstract
In real-world information retrieval systems, the underlying document collection is rarely stable or definitive. This work is focused on the study of signals extracted from the content of documents at different points in time for the purpose of weighting individual terms in a document. The basic idea behind our proposals is that terms that have existed for a longer time in a document should have a greater weight. We propose 4 term weighting functions that use each document's history to estimate a current term score. To evaluate this thesis, we conduct 3 independent experiments using a collection of documents sampled from Wikipedia. In the first experiment, we use data from Wikipedia to judge each set of terms. In the second experiment, we use an external collection of tags from a popular social bookmarking service as a gold standard. In the third experiment, we crowdsource user judgments to collect feedback on term preference. Across all experiments, results consistently support our thesis. We show that temporally aware measures, specifically the proposed revision term frequency and revision term frequency span, outperform a term-weighting measure based on raw term frequency alone.
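The intuition in this abstract, that terms present across more of a document's history deserve more weight, can be illustrated with a minimal sketch. The abstract does not give the definitions of the four proposed functions, so the scoring below (summing a term's frequency over every stored revision, restricted to terms still present in the latest one) is only an assumption made for illustration, and the names `tokenize`, `raw_term_frequency` and `history_term_scores` are hypothetical.

```python
from collections import Counter


def tokenize(text):
    # Naive whitespace tokenizer; a real system would normalize further.
    return text.lower().split()


def raw_term_frequency(revisions):
    # Baseline: term counts in the latest revision only.
    return Counter(tokenize(revisions[-1]))


def history_term_scores(revisions):
    # Assumed illustration, not the paper's definition: sum each term's
    # frequency over all revisions (oldest first), keeping only terms
    # that still occur in the latest revision.
    per_revision = [Counter(tokenize(r)) for r in revisions]
    current_terms = per_revision[-1]
    return {
        term: sum(counts[term] for counts in per_revision)
        for term in current_terms
    }


revisions = [
    "porto wine",
    "porto wine douro",
    "porto wine douro douro festival festival",
]
raw = raw_term_frequency(revisions)
scores = history_term_scores(revisions)
```

In this toy history, "festival" beats "porto" on raw frequency in the latest revision, while the history-based score ranks the long-lived "porto" higher.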

2011

UPData - A Data Curation Experiment at U.Porto using DSpace

Authors
da Silva, JR; Lopes, JC; Ribeiro, C;

Publication
Proceedings of the 8th International Conference on Digital Preservation, iPRES 2011, Singapore, November 1-4, 2011

Abstract

2011

Comparative evaluation of web search engines in health information retrieval

Authors
Lopes, CT; Ribeiro, C;

Publication
ONLINE INFORMATION REVIEW

Abstract
Purpose - The intent of this work is to evaluate several generalist and health-specific search engines for retrieval of health information by consumers: to compare the retrieval effectiveness of these engines for different types of clinical queries, medical specialties and condition severity; and to compare the use of evaluation metrics for binary relevance scales and for graded ones. Design/methodology/approach - The authors conducted a study in which users evaluated the relevance of documents retrieved by four search engines for two different health information needs. Users could choose between generalist (Bing, Google, Sapo and Yahoo!) and health-specific (MedlinePlus, SapoSaúde and WebMD) search engines. The authors then analysed the differences between search engines and groups of information needs with six different measures: graded average precision (gap), average precision (ap), gap@5, gap@10, ap@5 and ap@10. Findings - The results show that generalist web search engines surpass the precision of health-specific engines. Google has the best performance, mainly in the top ten results. It was found that information needs associated with severe conditions are associated with higher precision, as are overview and psychiatry questions. Originality/value - The study is one of the first to use a recently proposed measure to evaluate the effectiveness of retrieval systems with graded relevance scales. It includes tasks from several medical specialties, types of clinical questions and different levels of severity which, to the best of the authors' knowledge, has not been done before. Moreover, users have considerable involvement in the experiment. The results help in understanding how search engines differ in their responses to health information needs, what types of online health information are more common on the web and how to improve this type of search.
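For readers unfamiliar with the binary-relevance measures named in this abstract, here is a minimal sketch of average precision with an optional rank cutoff (ap, ap@5, ap@10). It assumes normalization by the number of relevant documents found within the cutoff; other conventions divide by the total number of relevant documents known for the query, and the abstract does not say which one the authors used. The function name `average_precision` is hypothetical, and the graded variant (gap) is not covered.

```python
def average_precision(relevance, k=None):
    # relevance: list of 0/1 judgments in rank order (best-ranked first).
    # k: optional cutoff; only the top-k results are considered.
    ranked = relevance if k is None else relevance[:k]
    hits = 0
    precision_sum = 0.0
    for rank, is_relevant in enumerate(ranked, start=1):
        if is_relevant:
            hits += 1
            # Precision at this rank, counted only at relevant positions.
            precision_sum += hits / rank
    # Assumption: normalize by relevant documents retrieved within the cutoff.
    return precision_sum / hits if hits else 0.0


judgments = [1, 0, 1, 0, 0]
ap = average_precision(judgments)
ap_at_2 = average_precision([0, 1, 1], k=2)
```

With the judgments above, relevant results sit at ranks 1 and 3, so the score is the mean of 1/1 and 2/3.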

2011

Data Curation at U.Porto: Identifying current practices across disciplinary domains

Authors
Ribeiro, C; Fernandes, EM;

Publication
IASSIST 2011 - Data Science Professionals: A Global Community of Sharing, Vancouver, BC, Canada, May 31 - June 3, 2011

Abstract

2011

dpikt - Automatic illustration system for media content

Authors
Coelho, F; Ribeiro, C;

Publication
Proceedings - International Workshop on Content-Based Multimedia Indexing

Abstract
Journalists and bloggers need to find useful images to illustrate news stories and blog entries with high quality photos. The dpikt text illustration system uses multimedia information retrieval to assist this content enrichment task. Users query the system with text fragments and get collections of candidate photos. Images in the results can be visually sorted according to a selected photo, or be used as a seed for interactive searches over the entire collection. dpikt incorporates a recent visual descriptor, the Joint Composite Descriptor, and an approximate indexing scheme designed for large-scale image collections, the Permutation-Prefix Index. We have used the SAPO-Labs large-scale news stories photo collection, containing almost two million high quality photos with short descriptions, as the resource for the illustration task. © 2011 IEEE.

2011

Automatic illustration with cross-media retrieval in large-scale collections

Authors
Coelho, F; Ribeiro, C;

Publication
Proceedings - International Workshop on Content-Based Multimedia Indexing

Abstract
In this paper, we approach the task of finding suitable images to illustrate text, from specific news stories to more generic blog entries. We have developed an automatic illustration system supported by multimedia information retrieval, that analyzes text and presents a list of candidate images to illustrate it. The system was tested on the SAPO-Labs media collection, containing almost two million images with short descriptions, and the MIRFlickr-25000 collection, with photos and user tags from Flickr. Visual content is described by the Joint Composite Descriptor and indexed by a Permutation-Prefix Index. Illustration is a three-stage process using textual search, score filtering and visual clustering. A preliminary evaluation using exhaustive and approximate visual searches demonstrates the capabilities of the visual descriptor and approximate indexing scheme used. © 2011 IEEE.
