2022
Authors
Guedes, C; Giesteira, B; Nunes, S;
Publication
ACM JOURNAL ON COMPUTING AND CULTURAL HERITAGE
Abstract
In this article, we present solutions to visualize and interact with linked data in historical archives considering three different scenarios: search, individual record view, and creation of relationships. The solutions were designed using nonfunctional mockups and were based on the CIDOC-CRM model, a model created and applied in the museum community that can be extended to other cultural heritage institutions; our solutions apply this model to archives. A sample of 20 archival professionals was selected to evaluate the elements included in the proposed solutions. The evaluation sessions consisted of structured interviews supported by an introductory video and a survey. The think-aloud protocol was applied during the sessions. We conducted both quantitative and qualitative analyses of the collected answers. From these analyses, we conclude that the majority of the participants were highly receptive to the solutions presented and recognized many benefits in the application of linked data. Our contributions also include an exploratory study of some existing linked data systems, giving particular attention to their visualization and interaction modes.
2022
Authors
Nunes, S; Silva, T; Martins, C; Peixoto, R;
Publication
Proceedings of the 26th International Conference on Theory and Practice of Digital Libraries - Workshops and Doctoral Consortium, Padua, Italy, September 20, 2022.
Abstract
In this paper we describe the EPISA Platform, a technical infrastructure designed and developed to support archival records management and access using linked data technologies. The EPISA Platform follows a client-server paradigm, with a central component, the EPISA Server, responsible for storage, reasoning, authorization, and search; and a frontend component, the EPISA ArchClient, responsible for user interaction. The EPISA Server uses Apache Jena Fuseki for storage and reasoning, and Apache Solr for search. The EPISA ArchClient is a web application implemented using PHP Laravel and standard web technologies. The platform follows a modular architecture, based on Docker containers. We describe the technical details of the platform and the main user interaction workflows, highlighting the abstractions developed to integrate linked data in the archival management process. The EPISA Platform has been successfully used to support research and development of linked data use in the archival domain in the context of the EPISA project. © 2022 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0)
2022
Authors
Damas, J; Devezas, J; Nunes, S;
Publication
PROGRESS IN ARTIFICIAL INTELLIGENCE, EPIA 2022
Abstract
In this work, we targeted the search engine of a sports-related website that presented an opportunity for search result quality improvement. We reframed the engine as a Federated Search instance, where each collection represented a searchable entity type within the system, using Apache Solr for querying each resource and a Python Flask server to merge results. We extend previous work on individual search term weighting, making use of past search terms as a relevance indicator for user-selected documents. To incorporate term weights, we define four strategies combining two binary variables: integration with default relevance (linear scaling or linear combination) and search term frequency (raw value or log-smoothed). To evaluate our solution, we extracted two query sets from search logs: one with frequently submitted queries, and another with ambiguous result access patterns. We used click-through information as a relevance proxy and tried to mitigate its limitations by evaluating under distinct IR metrics, including MRR, MAP and NDCG. Moreover, we also measured Spearman rank correlation coefficients to test similarities between produced rankings and reference orderings according to user access patterns. Results show consistency across all metrics in both sets. Previous search terms were key to obtaining higher effectiveness, with runs that used pure search term frequency performing best. Compared to the baseline, our best strategies were able to maintain quality on frequent queries and improve retrieval effectiveness on ambiguous queries, with up to six percentage points better performance on most metrics.
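The four strategies named in this abstract (two integration modes crossed with two frequency treatments) can be illustrated with a minimal sketch. The abstract does not give the formulas, so everything below is an assumption made for illustration: the function names, the `1 + w` scaling factor, the `alpha` mixing parameter, and the use of `log(1 + f)` for smoothing are hypothetical and may differ from the paper's definitions.

```python
import math


def term_weight(freq, log_smoothed):
    """Weight derived from how often past searches led to a document.

    Assumption: log smoothing is log(1 + freq); the paper may differ.
    """
    return math.log1p(freq) if log_smoothed else float(freq)


def combine(base_score, freq, mode, log_smoothed, alpha=0.5):
    """Merge the engine's default relevance score with the term weight.

    mode="scaling":     base score multiplied by (1 + weight)  (assumed form)
    mode="combination": alpha * base + (1 - alpha) * weight    (assumed form)
    """
    w = term_weight(freq, log_smoothed)
    if mode == "scaling":
        return base_score * (1.0 + w)
    if mode == "combination":
        return alpha * base_score + (1.0 - alpha) * w
    raise ValueError(f"unknown mode: {mode}")


def rerank(docs, mode, log_smoothed, alpha=0.5):
    """Sort documents by the combined score, highest first."""
    return sorted(
        docs,
        key=lambda d: combine(d["score"], d["freq"], mode, log_smoothed, alpha),
        reverse=True,
    )


# Hypothetical documents: "score" is the default relevance, "freq" is how
# often the query terms previously led users to this document.
docs = [
    {"id": "a", "score": 2.0, "freq": 0},
    {"id": "b", "score": 1.5, "freq": 10},
]
for mode in ("scaling", "combination"):
    for log_smoothed in (False, True):
        order = [d["id"] for d in rerank(docs, mode, log_smoothed)]
        print(mode, "log" if log_smoothed else "raw", order)
```

In this toy input, document `b` has a lower default score but a history of past selections, so each of the four variants ranks it first; real data would separate the variants more clearly.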
2026
Authors
Araújo, A; de Jesus, G; Nunes, S;
Publication
Lecture Notes in Computer Science
Abstract
Developing information retrieval (IR) systems that enable access across multiple languages is crucial in multilingual contexts. In Timor-Leste, where Tetun, Portuguese, English, and Indonesian are official and working languages, no cross-lingual information retrieval (CLIR) solutions currently exist to support information access across these languages. This study addresses that gap by investigating CLIR approaches tailored to the linguistic landscape of Timor-Leste. Leveraging an existing monolingual Tetun document collection and ad-hoc text retrieval baselines, we explore the feasibility of CLIR for Tetun. Queries were manually translated into Portuguese, English, and Indonesian to create a multilingual query set. These were then automatically translated back into Tetun using Google Translate and several large language models, and used to retrieve documents in Tetun. Results show that Google Translate is the most reliable tool for Tetun CLIR overall, and the Hiemstra LM consistently outperforms BM25 and DFR BM25 in cross-lingual retrieval performance. However, overall effectiveness remains up to 26.95 percentage points lower than that of the monolingual baseline, underscoring the limitations of current translation tools and the challenges of developing an effective CLIR system for Tetun. Despite these challenges, this work establishes the first CLIR baseline for Tetun ad-hoc text retrieval, providing a foundation for future research in this under-resourced setting. © The Author(s), under exclusive license to Springer Nature Switzerland AG 2026.
2026
Authors
Damas, J; Nunes, S;
Publication
Lecture Notes in Computer Science
Abstract
Understanding user behavior in search systems is essential for improving retrieval effectiveness and user satisfaction. While prior research has extensively examined general-purpose web search engines, domain-specific contexts—such as sports information—remain comparatively underexplored. In this study, we analyze over 400,000 interaction log entries from a sports-oriented search engine collected over a two-week period. Our analysis combines classic query-level metrics (e.g., frequency distributions, query lengths) with a detailed examination of click behavior, including entropy-based intent variability and a custom query quality scoring model. Compared to established baselines from general and specialized search environments, we observe a high proportion of new and single-term queries, as well as a notable lack of representativeness among top queries. These findings reveal patterns shaped by the event-driven and entity-centric nature of sports content, offering actionable insights for the design of domain-specific retrieval systems. © The Author(s), under exclusive license to Springer Nature Switzerland AG 2026.
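The entropy-based measure of intent variability mentioned in this abstract can be illustrated with a short sketch. This assumes the common definition of click entropy (Shannon entropy of the distribution of clicked documents for one query); the abstract does not state the exact formulation, and the function name and sample identifiers are hypothetical.

```python
import math
from collections import Counter


def click_entropy(clicked_doc_ids):
    """Shannon entropy (in bits) of the clicks observed for a single query.

    Low values mean users converge on the same result; high values
    suggest the query serves several distinct intents.
    """
    counts = Counter(clicked_doc_ids)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    # Sum p * log2(1/p) over the distinct clicked documents.
    return sum((c / total) * math.log2(total / c) for c in counts.values())


# Hypothetical click logs for two queries.
focused = ["player_1", "player_1", "player_1"]
ambiguous = ["team_a", "team_b", "player_1", "match_9"]
print(click_entropy(focused))
print(click_entropy(ambiguous))
```

A query whose clicks all land on one document scores zero, while clicks spread evenly over four documents score two bits, which is the kind of contrast such a measure is meant to surface.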
2025
Authors
Catarina Pires; Sérgio Nunes; Luís Filipe Teixeira;
Publication
Information Retrieval Research
Abstract