2015
Authors
Kar, M; Nunes, S; Ribeiro, C;
Publication
INFORMATION PROCESSING & MANAGEMENT
Abstract
In the area of Information Retrieval, the task of automatic text summarization usually assumes a static underlying collection of documents, disregarding the temporal dimension of each document. However, in real-world settings, collections and individual documents rarely stay unchanged over time. The World Wide Web is a prime example of a collection where information changes both frequently and significantly over time, with documents being added, modified or deleted at different times. In this context, previous work addressing the summarization of web documents has simply disregarded the dynamic nature of the web, considering only the latest published version of each individual document. This paper proposes and addresses a new challenge: the automatic summarization of changes in dynamic text collections. In standard text summarization, retrieval techniques present a summary to the user by capturing, in condensed form, the major points expressed in the most recent version of an entire document. In this new task, the goal is to obtain a summary that describes the most significant changes made to a document during a given period. In other words, the idea is to have a summary of the revisions made to a document over a specific period of time. This paper proposes different approaches to generate summaries using extractive summarization techniques. First, individual terms are scored, and then this information is used to rank and select sentences to produce the final summary. A system based on the Latent Dirichlet Allocation (LDA) model is used to find the hidden topic structures of changes. The purpose of using the LDA model is to identify separate topics where the changed terms from each topic are likely to carry at least one significant change. The different approaches are then compared with previous work in this area. A collection of articles from Wikipedia, including their revision history, is used to evaluate the proposed system.
For each article, a temporal interval and a reference summary from the article's content are selected manually. The articles and intervals in which a significant event occurred are carefully selected. The summaries produced by each of the approaches are compared against the manual summaries using ROUGE metrics. It is observed that the approach using the LDA model outperforms all the other approaches. Statistical tests reveal that the differences in ROUGE scores for the LDA-based approach are statistically significant at the 99% level over the baseline.
2015
Authors
Devezas, T; Nunes, S; Rodríguez, MT;
Publication
HIC@HT
Abstract
In this paper, we present the tools of the MediaViz project, a work-in-progress platform that aims to provide researchers, academics and professionals from the media field with a set of analytical and exploratory resources to answer high-level and complex questions about the online media panorama, in an efficient, visual and interactive way. Our approach consists of aggregating and processing news data from multiple online sources, and providing programmatic access to it through an Application Programming Interface (API). The visualization tools leverage the data provided by the API, allowing users to interact with, explore and interrogate that information. Through the use of data visualization techniques, we aim to characterize the publication patterns of multiple online news sources by analyzing and comparing distinct dimensions. Dimensions of interest include the frequency and flow of publications and social shares over time, and the geographic coverage of online news outlets. We present some of the developed visualization tools and describe how they can offer meaningful insights by providing a bird's-eye view of distinct characteristics of the online mediascape.
2015
Authors
Rodríguez, MT; Nunes, S; Devezas, T;
Publication
NHT@HT
Abstract
In this article we survey the historical background and development of information and data visualization, and give an overview of the intersection of data visualization with storytelling applied to the field of data journalism, where it finds its most widespread use in narrative visualizations. We start by explaining why the mere act of visualization can be highly useful to readers, helping them discover patterns and comprehend information. Backed by historical references, we describe how some of the first data visualizations were used to explain facts, understand certain events, and determine courses of action. We then outline how storytelling and narrative techniques are currently being used with data visualization to leverage the power of visual expression. Our goal is to characterize storytelling with data as a vibrant and interesting field that current journalism practices employ to help readers understand and form opinions on complex facts. By presenting concepts like storytelling with data and data stories, we aim to spark interest in further research in the applications of data visualization and narrative.
2015
Authors
Oroszlanyova, M; Ribeiro, C; Nunes, S; Lopes, CT;
Publication
CONFERENCE ON ENTERPRISE INFORMATION SYSTEMS/INTERNATIONAL CONFERENCE ON PROJECT MANAGEMENT/CONFERENCE ON HEALTH AND SOCIAL CARE INFORMATION SYSTEMS AND TECHNOLOGIES, CENTERIS/PROJMAN / HCIST 2015
Abstract
Search engines typically estimate relevance using features of the documents. We believe that several features of the user and the task can also contribute to this process. In the health domain there are specific characteristics of web documents that can also add value to this estimation. In the present work, using a dataset composed of a set of annotated web pages and their assessment by a set of users regarding their relevance and comprehension, we analyse which characteristics affect documents' relevance and which characteristics influence how well users comprehend them. We have conducted a bivariate analysis using characteristics of the above data collection. The strongest relations we have found are linked to the task features, suggesting a direct association between the tasks' clarity and ease and both the relevance and the comprehension of the content. The language of the document, its medical certification, its update status, its content on pathology definitions, and its content on prevention, prognosis and treatment information are other characteristics valued by consumers in terms of relevance. Users' previous experience with health searches and, particularly, with the topic being searched, their gender, and the language and terminology of their queries were shown to be related to their success in the search tasks. We have also found that lay terminology, knowledge of medico-scientific terms and the language of the documents are good indicators of comprehension. Documents containing links and testimonies, and those recently updated, were observed to be better understood by users, as were blog posts and comments. (C) 2015 The Authors. Published by Elsevier B.V.
2015
Authors
Oroszlányová, M; Ribeiro, C; Nunes, S; Lopes, CT;
Publication
CENTERIS/ProjMAN/HCist
Abstract
2015
Authors
Bozorgzadeh, E; Cardoso, JMP; Abreu, R; Memik, SO;
Publication
EUC
Abstract