About

Areas of Interest:

  • Information Retrieval
  • Network Science
  • Information Science
  • Machine Learning
  • Web Technologies

Details

  • Name

    José Luís Devezas
  • Cluster

    Computer Science
  • Role

    Research Assistant
  • Since

    8th November 2011
Publications

2019

Hypergraph-of-entity

Authors
Devezas, J; Nunes, S;

Publication
Open Computer Science

Abstract
Modern search is heavily powered by knowledge bases, but users still query using keywords or natural language. As search becomes increasingly dependent on the integration of text and knowledge, novel approaches for a unified representation of combined data present the opportunity to unlock new ranking strategies. We have previously proposed the graph-of-entity as a purely graph-based representation and retrieval model; however, this model would scale poorly. We tackle the scalability issue by adapting the model so that it can be represented as a hypergraph. This enables a significant reduction of the number of (hyper)edges, in regard to the number of nodes, while nearly capturing the same amount of information. Moreover, such a higher-order data structure presents the ability to capture richer types of relations, including n-ary connections such as synonymy or subsumption. We present the hypergraph-of-entity as the next step in the graph-of-entity model, where we explore a ranking approach based on biased random walks. We evaluate the approaches using a subset of the INEX 2009 Wikipedia Collection. While performance is still below the state of the art, we were, in part, able to achieve a MAP score similar to TF-IDF and greatly improve indexing efficiency over the graph-of-entity.

2019

Graph-of-entity: A model for combined data representation and retrieval

Authors
Devezas, JL; Lopes, CT; Nunes, S;

Publication
OpenAccess Series in Informatics

Abstract
Managing large volumes of digital documents along with the information they contain, or are associated with, can be challenging. As systems become more intelligent, it increasingly makes sense to power retrieval through all available data, where every lead makes it easier to reach relevant documents or entities. Modern search is heavily powered by structured knowledge, but users still query using keywords or, at the very best, telegraphic natural language. As search becomes increasingly dependent on the integration of text and knowledge, novel approaches for a unified representation of combined data present the opportunity to unlock new ranking strategies. We tackle entity-oriented search using graph-based approaches for representation and retrieval. In particular, we propose the graph-of-entity, a novel approach for indexing combined data, where terms, entities and their relations are jointly represented. We compare the graph-of-entity with the graph-of-word, a text-only model, verifying that, overall, it does not yet achieve a better performance, despite obtaining a higher precision. Our assessment was based on a small subset of the INEX 2009 Wikipedia Collection, created from a sample of 10 topics and respectively judged documents. The offline evaluation we do here is complementary to its counterpart from TREC 2017 OpenSearch track, where, during our participation, we had assessed graph-of-entity in an online setting, through team-draft interleaving. © José Devezas, Carla Lopes, and Sérgio Nunes.

2018

Social Media and Information Consumption Diversity

Authors
Devezas, JL; Nunes, S;

Publication
Proceedings of the Second International Workshop on Recent Trends in News Information Retrieval co-located with 40th European Conference on Information Retrieval (ECIR 2018), Grenoble, France, March 26, 2018.

Abstract
Social media platforms are having a profound impact on the so-called information ecosystem, specifically on how information is produced, distributed and consumed. Social media in particular has contributed to the rise of user generated content and consequently to a greater diversity in online content. On the other hand, social media networks, such as Twitter or Facebook, have become information management tools that allow users to set up and configure information sources to their particular interests. A Twitter user can handpick the sources he wishes to follow, thus creating a custom information channel. However, does this opportunity to create personalized information channels effectively result in different consumption profiles? Is the information consumed by users through social media networks distinct from the information consumed through traditional mainstream media? In this work, we set out to investigate this question using Twitter as a case study. We prepare two samples of users, one based on a uniform random selection of user IDs, and another one based on a selection of mainstream media followers. We analyze the home timelines of the users in each sample, focusing on characterizing information consumption habits. We find that information consumption volume is higher, while diversity is consistently lower, for mainstream media followers when compared to random users. When analyzing daily behavior, however, the samples slightly approximate, while clearly maintaining a lower diversity for mainstream media followers and a higher diversity for random users. Copyright © 2018 for the individual papers by the papers’ authors.

2017

Information Extraction for Event Ranking

Authors
Devezas, JL; Nunes, S;

Publication
6th Symposium on Languages, Applications and Technologies, SLATE 2017, June 26-27, 2017, Vila do Conde, Portugal

Abstract
Search engines are evolving towards richer and stronger semantic approaches, focusing on entity-oriented tasks where knowledge bases have become fundamental. In order to support semantic search, search engines are increasingly reliant on robust information extraction systems. In fact, most modern search engines are already highly dependent on a well-curated knowledge base. Nevertheless, they still lack the ability to effectively and automatically take advantage of multiple heterogeneous data sources. Central tasks include harnessing the information locked within textual content by linking mentioned entities to a knowledge base, or the integration of multiple knowledge bases to answer natural language questions. Combining text and knowledge bases is frequently used to improve search results, but it can also be used for the query-independent ranking of entities like events. In this work, we present a complete information extraction pipeline for the Portuguese language, covering all stages from data acquisition to knowledge base population. We also describe a practical application of the automatically extracted information, to support the ranking of upcoming events displayed in the landing page of an institutional search engine, where space is limited to only three relevant events. We manually annotate a dataset of news, covering event announcements from multiple faculties and organic units of the institution. We then use it to train and evaluate the named entity recognition module of the pipeline. We rank events by taking advantage of identified entities, as well as partOf relations, in order to compute an entity popularity score, as well as an entity click score based on implicit feedback from clicks from the institutional search engine. We then combine these two scores with the number of days to the event, obtaining a final ranking for the three most relevant upcoming events. © José Devezas and Sérgio Nunes

2017

Graph-Based Entity-Oriented Search: Imitating the Human Process of Seeking and Cross Referencing Information

Authors
Devezas, J; Nunes, S;

Publication
ERCIM News

Abstract

Supervised theses

2017

Exploring the Sea: Heterogenous Geo-Referenced Data Repository

Author
Inês Davim Lopes Garganta Silva

Institution
UP-FEUP

2017

Named entity extraction from Portuguese web text

Author
André Ricardo Oliveira Pires

Institution
UP-FEUP