Publications

Publications by Sérgio Nunes

2011

Term Weighting Based on Document Revision History

Authors
Nunes, S; Ribeiro, C; David, G;

Publication
JOURNAL OF THE AMERICAN SOCIETY FOR INFORMATION SCIENCE AND TECHNOLOGY

Abstract
In real-world information retrieval systems, the underlying document collection is rarely stable or definitive. This work is focused on the study of signals extracted from the content of documents at different points in time for the purpose of weighting individual terms in a document. The basic idea behind our proposals is that terms that have existed for a longer time in a document should have a greater weight. We propose four term-weighting functions that use each document's history to estimate a current term score. To evaluate this thesis, we conduct three independent experiments using a collection of documents sampled from Wikipedia. In the first experiment, we use data from Wikipedia to judge each set of terms. In the second experiment, we use an external collection of tags from a popular social bookmarking service as a gold standard. In the third experiment, we crowdsource user judgments to collect feedback on term preference. Across all experiments, results consistently support our thesis. We show that temporally aware measures, specifically the proposed revision term frequency and revision term frequency span, outperform a term-weighting measure based on raw term frequency alone.

2008

FEUP at TREC 2008 blog track: Using temporal evidence for ranking and feed distillation

Authors
Nunes, S; Ribeiro, C; David, G;

Publication
NIST Special Publication

Abstract
This paper presents the participation of FEUP, from the University of Porto, in the TREC 2008 Blog Track. FEUP participated in two tasks, the baseline adhoc retrieval task and the blog finding distillation task. Our approach was focused on the use of the temporal information available in the TREC Blog06 collection. For the baseline adhoc retrieval task, a simple temporal sort was evaluated. In the blog finding distillation task, we tested three alternative scoring functions based on temporal evidence. All features were combined with a BM25 baseline run using a standard rank aggregation approach. We observed small, but statistically significant, improvements in several evaluation measures when temporal information is used.

2009

FEUP at TREC 2009 Blog Track: Temporal evidence in the faceted blog distillation task

Authors
Nunes, S; Ribeiro, C; David, G;

Publication
NIST Special Publication

Abstract
This paper describes the participation of FEUP, from the University of Porto, in the TREC 2009 Blog Track. FEUP participated in the faceted blog distillation task with work focused on the use of temporal features available in the new TREC Blogs08 collection. The approach presented in this paper uses the temporal information available in most individual posts to amplify (or reduce) each post's score. Blog scores, and subsequent ranks, are obtained by combining individual posts' scores. While preparing the runs, no endeavors were made to identify a priori any temporal differences between the three distinct facets.

2023

NewsLines: Narrative Visualization of News Stories

Authors
Costa, M; Nunes, S;

Publication
Proceedings of Text2Story - Sixth Workshop on Narrative Extraction From Texts held in conjunction with the 45th European Conference on Information Retrieval (ECIR 2023), Dublin, Ireland, April 2, 2023.

Abstract
Visual representations have the potential to improve information understanding. We explore this idea in the development of NewsLines, an open-source web-based prototype that focuses on narrative visualizations of news content. Taking structured data as input, the prototype produces a storyline which showcases the narrative's events and participants, allowing the user to interact with the visualization in a number of ways. We built an information hub around the storyline to allow for multiple levels of exploration, specifically the main visualization, the event information module, and the sidebar. The visualization depicts the sequence of events that make up a news story, as well as the interactions between the involved parties in each event. The event information module presents additional information on a particular event. The sidebar is the “control center” of the visualization, unlocking a number of interactions and configurations. The prototype was evaluated with a user study with journalists and also with an online survey which gathered feedback from 178 potential end users. From these, 106 participants (60.6%) provided a rating of four or above (one to five scale) when asked to quantify their interest in using the application. Moreover, participants were asked to rank the importance of the visualization elements used. The results highlight that two elements stand out as the most important: the events and the entities. Overall, the participants found the application to be useful, but in need of some work in order for it to be made available to a broader public. © 2023 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).

2021

Hypergraph-of-Entity: A General Model for Entity-Oriented Search

Authors
Devezas, JL; Nunes, S;

Publication
CoRR

2021

Fatigued PageRank

Authors
Devezas, JL; Nunes, S;

Publication
CoRR
