2012
Authors
Cunha, E; Figueira, A;
Publication
15TH IEEE INTERNATIONAL CONFERENCE ON COMPUTATIONAL SCIENCE AND ENGINEERING (CSE 2012) / 10TH IEEE/IFIP INTERNATIONAL CONFERENCE ON EMBEDDED AND UBIQUITOUS COMPUTING (EUC 2012)
Abstract
Assessing the quality of the clustering process is fundamental in unsupervised clustering. In the literature we can find three different clustering validity techniques: external criteria, internal criteria, and relative criteria. In this paper, we focus on external criteria and present an algorithm that allows the implementation of external measures to assess clustering quality when the structure of the data set is unknown. To obtain an automatic partition of a data set that reflects how documents should be grouped according to human intuition, we use internal information present in the data, such as descriptions provided by the users as tags and the distance between documents. The results show an evident correlation between manual and automatic classes, indicating that it is acceptable to use an automatic partition. In addition to presenting an alternative for finding the structure of the data set using metadata such as tags, we also wanted to test the impact of their integration into the k-means++ algorithm and verify how it influences the quality of the formed clusters, suggesting a model of integration based on the occurrence of tags in document content. The experimental results indicate a positive impact when external measures are calculated, although there was no apparent correlation between the weight assigned to the tags and the quality of the obtained clusters.
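A minimal sketch of how such a tag-based integration could look, assuming a simple TF-IDF boost for terms that also occur as user tags before running k-means++; the boost factor tag_weight and the toy documents are illustrative assumptions, not the paper's exact model:

    # Sketch: boost TF-IDF weights of terms that also appear as user tags,
    # then cluster with k-means++ (scikit-learn's default initialization).
    import numpy as np
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.cluster import KMeans

    docs = ["stock markets fall after rate hike",
            "central bank raises interest rates",
            "team wins championship final match"]
    tags = [{"economy", "markets"}, {"economy", "rates"}, {"sports"}]

    vectorizer = TfidfVectorizer()
    X = vectorizer.fit_transform(docs).toarray()
    vocab = vectorizer.vocabulary_

    tag_weight = 2.0  # assumed boost for tags that occur in the document content
    for i, doc_tags in enumerate(tags):
        for t in doc_tags:
            j = vocab.get(t)
            if j is not None and X[i, j] > 0:  # tag occurs in the content
                X[i, j] *= tag_weight

    km = KMeans(n_clusters=2, init="k-means++", n_init=10, random_state=0)
    labels = km.fit_predict(X)
    print(labels)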
2012
Authors
Revilla, LF; Figueira, A;
Publication
23rd ACM Conference on Hypertext and Social Media, HT '12, Milwaukee, WI, USA, June 25-28, 2012
Abstract
Computational journalism allows journalists to collect large collections of information chunks from separate sources. The analysis of these collections can reveal hidden relationships between the information chunks, but due to their size, diversity, and varying nuances it is necessary to use both computational and human analysis. Breadcrumbs PDL is an adaptive spatial hypermedia system that brings together human cognition and machine computation in order to analyze a collection of user-generated news clips. The project demonstrates the effectiveness of spatial hypermedia in the domain of computational journalism.
2012
Authors
Cravino, N; Devezas, JL; Figueira, A;
Publication
23rd ACM Conference on Hypertext and Social Media, HT '12, Milwaukee, WI, USA, June 25-28, 2012
Abstract
Breadcrumbs is a folksonomy of news clips, where users can aggregate fragments of text taken from online news. Besides the textual content, each news clip contains a set of metadata fields associated with it. User-defined tags are one of the most important of those information fields. Based on a small data set of news clips, we build a network of co-occurrence of tags in news clips, and use it to improve text clustering. We do this by defining a weighted cosine similarity proximity measure that takes into account both the clip vectors and the tag vectors. The tag weight is computed using the related tags that are present in the discovered community. We then use the resulting vectors together with the new distance metric, which allows us to identify socially biased document clusters. Our study indicates that using the structural features of the network of tags leads to a positive impact on the clustering process. Copyright 2012 ACM.
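A minimal sketch of a combined proximity measure of this kind, assuming a simple convex combination of content similarity and tag similarity; the mixing weight alpha and the toy vectors are illustrative assumptions, not the paper's exact formula:

    # Sketch: blend clip-vector and tag-vector cosine similarities.
    import numpy as np

    def cosine(u, v):
        denom = np.linalg.norm(u) * np.linalg.norm(v)
        return float(u @ v / denom) if denom else 0.0

    def weighted_similarity(clip_u, clip_v, tags_u, tags_v, alpha=0.7):
        """Convex combination of content and tag similarity."""
        return alpha * cosine(clip_u, clip_v) + (1 - alpha) * cosine(tags_u, tags_v)

    # Toy TF-IDF-like clip vectors and binary tag-incidence vectors
    clip_a, clip_b = np.array([0.2, 0.8, 0.0]), np.array([0.1, 0.7, 0.3])
    tags_a, tags_b = np.array([1, 0, 1, 0]), np.array([1, 1, 1, 0])
    print(weighted_similarity(clip_a, clip_b, tags_a, tags_b))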
2012
Authors
Figueira, A;
Publication
ICIMTR 2012 - 2012 International Conference on Innovation, Management and Technology Research
Abstract
In this article we describe a system that is capable of self-organizing news clips collected by readers into a personal digital library. The system then uses this information to provide producers with a rich set of inferred relations between the clips and clusters of clips. The inferred information takes the form of 'hot' news topics, relations between clip contents, and the interests of their readers. We describe the Breadcrumbs system, which features an online news collecting tool, an inference engine, and a social graph. We discuss the outcomes of our system, which allow for a better understanding of news consumption and trends. Finally, we describe how we can use these outcomes to create a business model. © 2012 IEEE.
2012
Authors
Silva, A; Figueira, A;
Publication
12th IEEE International Conference on Advanced Learning Technologies, ICALT 2012, Rome, Italy, July 4-6, 2012
Abstract
In this article we present a system capable of graphically representing the interactions between students and teachers in hierarchical online forums. By defining the 'reply-to' relation between users, the system builds a graph. While mining forum posts, the system computes metrics taken from social network analysis, which are then applied to the graph drawing process. This system brings new possibilities to e-learning as a tool capable of helping the teacher assess and illustrate the degree of participation of students, identify key students in information passing, and find the implicit relations between forum participants. Preliminary tests lead to the conclusion that the system can rapidly help identify situations such as outliers and sources and sinks of information. It also rapidly depicts sub-communities formed by forum participants. © 2012 IEEE.
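A minimal sketch of this kind of pipeline, assuming the forum posts have already been reduced to (author, replied-to author) pairs; the networkx metrics shown (in-degree, betweenness centrality) are common social network analysis measures used here for illustration, not necessarily the exact metrics the system computes:

    # Sketch: build a weighted reply-to graph and compute SNA metrics
    # that could drive node size or colour in the graph drawing.
    import networkx as nx

    replies = [("alice", "bob"), ("carol", "bob"), ("bob", "alice"),
               ("dave", "carol"), ("eve", "bob")]

    G = nx.DiGraph()
    for author, replied_to in replies:
        if G.has_edge(author, replied_to):
            G[author][replied_to]["weight"] += 1
        else:
            G.add_edge(author, replied_to, weight=1)

    in_degree = dict(G.in_degree())             # how often a user is replied to
    betweenness = nx.betweenness_centrality(G)  # key users in information passing
    print(in_degree)
    print(betweenness)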
2012
Authors
Devezas, J; Figueira, A;
Publication
KDIR 2012 - Proceedings of the International Conference on Knowledge Discovery and Information Retrieval
Abstract
Interactive visualization systems are powerful tools in the task of exploring and understanding data. We describe two implementations of this approach, where a multidimensional network of news clips is depicted by taking advantage of its community structure. The first implementation is a multiresolution map of news clips that uses topic detection both at the clip level and at the community level, in order to assign labels to the nodes in each resolution. The second implementation is a traditional force-directed network visualization with several additional interactive aspects that provide a rich user experience for knowledge discovery. We describe a common use case for the visualization systems as a journalistic research and knowledge discovery tool. Both systems illustrate the links between news clips, induced by the co-occurrence of named entities, as well as several metadata fields based on the information contained within each node. Copyright © 2012 SciTePress - Science and Technology Publications.
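A minimal sketch of how news clips could be linked by named-entity co-occurrence and grouped by community structure, assuming entities have already been extracted for each clip; the greedy modularity heuristic from networkx is one possible choice of community detection, not necessarily the one used by these systems:

    # Sketch: link clips that share named entities, then find communities.
    from itertools import combinations
    import networkx as nx

    clips = {
        "clip1": {"NATO", "Brussels"},
        "clip2": {"NATO", "Ukraine"},
        "clip3": {"Ukraine", "Kyiv"},
        "clip4": {"Wimbledon", "London"},
    }

    G = nx.Graph()
    G.add_nodes_from(clips)
    for a, b in combinations(clips, 2):
        shared = clips[a] & clips[b]
        if shared:
            G.add_edge(a, b, weight=len(shared))

    communities = nx.algorithms.community.greedy_modularity_communities(G, weight="weight")
    print([sorted(c) for c in communities])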