Publications by LIAAD

2023

ydata-profiling: Accelerating data-centric AI with high-quality data

Authors
Clemente, F; Ribeiro, GM; Quemy, A; Santos, MS; Pereira, RC; Barros, A;

Publication
NEUROCOMPUTING

Abstract
ydata-profiling is an open-source Python package for advanced exploratory data analysis that enables users to generate data profiling reports in a simple, fast, and efficient manner, fostering a standardized and visual understanding of the data. Beyond traditional descriptive properties and statistics, ydata-profiling follows a Data-Centric AI approach to exploratory analysis, as it focuses on the automatic detection and highlighting of complex data characteristics often associated with potential data quality issues, such as high ratios of missing or imbalanced data, infinite, unique, or constant values, skewness, high correlation, high cardinality, non-stationarity, seasonality, duplicate records, and other inconsistencies. The source code, documentation, and examples are available in the GitHub repository: https://github.com/ydataai/ydata-profiling.
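The kinds of checks listed in this abstract can be illustrated with a minimal, standard-library sketch. It is not ydata-profiling's implementation or API: the function name `profile_alerts`, the alert labels, and the `missing_threshold` default are all hypothetical choices made for illustration, and only a few of the listed characteristics (missing, infinite, constant, unique values, and duplicate records) are covered.

```python
import math
from collections import Counter


def profile_alerts(rows, missing_threshold=0.5):
    """Return simple data-quality alerts for a list of dict records.

    Hypothetical helper for illustration only; not part of ydata-profiling.
    Missing values are assumed to be represented as None.
    """
    alerts = {}
    columns = sorted({key for row in rows for key in row})
    n = len(rows)
    for col in columns:
        values = [row.get(col) for row in rows]
        present = [v for v in values if v is not None]
        col_alerts = []
        # High ratio of missing values (threshold is an assumed default).
        if n and (n - len(present)) / n > missing_threshold:
            col_alerts.append("high_missing")
        # Infinite values among floats.
        if any(isinstance(v, float) and math.isinf(v) for v in present):
            col_alerts.append("infinite")
        # Constant column (one distinct value) or fully unique column.
        distinct = set(present)
        if present and len(distinct) == 1:
            col_alerts.append("constant")
        elif len(present) > 1 and len(distinct) == len(present):
            col_alerts.append("unique")
        if col_alerts:
            alerts[col] = col_alerts
    # Duplicate records: identical rows that appear more than once.
    row_counts = Counter(tuple(sorted(row.items())) for row in rows)
    duplicates = sum(count - 1 for count in row_counts.values() if count > 1)
    if duplicates:
        alerts["__dataset__"] = [f"duplicate_rows={duplicates}"]
    return alerts


# Made-up example records, not data from the paper.
rows = [
    {"id": 1, "country": "PT", "score": 0.5, "note": None},
    {"id": 2, "country": "PT", "score": float("inf"), "note": None},
    {"id": 3, "country": "PT", "score": 0.7, "note": None},
    {"id": 3, "country": "PT", "score": 0.7, "note": None},
]
alerts = profile_alerts(rows)
```

For these made-up records, `alerts` flags `country` as constant, `note` as mostly missing, `score` as containing an infinite value, and the dataset as holding one duplicate row. For the package's actual report generation, see the repository linked in the abstract.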

2023

DRIPPS: a Corpus with Discourse Relations in Perfect Participial Sentences

Authors
Silvano, P; Cordeiro, J; Leal, A; Pais, S;

Publication
LDK

Abstract
The main objective of this paper is to introduce a new language resource for some varieties of Portuguese - European, Brazilian, Mozambican, and Angolan - and for British English, called DRIPPS (Discourse Relations In Perfect Participial Sentences). The corpus DRIPPS comprises, at the moment, 993 adverbial perfect participial sentences annotated with Discourse Relations and with the following Discourse Relational Devices: connectors, ordering of the clauses, temporal relations, tenses, and aspectual types. Additionally, an application with a Graphical User Interface (GUI) has been developed not only to browse and manipulate the corpus but also to allow the activation of specific Discourse Relation constraints, thereby selecting specific cases from the data set that can be analyzed separately. Besides calculating simple counts and percentages, insightful statistical graphs can be generated and visualized on the fly from the combination of the user-selected constraints and the loaded corpora. The application is pre-loaded with Portuguese and English cases and allows users to import or load further cases from different languages/varieties.

2023

ISO-DR-core Plugs into ISO-dialogue Acts for a Cross-linguistic Taxonomy of Discourse Markers

Authors
Silvano, P; Damova, M;

Publication
LDK

Abstract

2023

Validation of Language Agnostic Models for Discourse Marker Detection

Authors
Damova, M; Mishev, K; Oleskeviciene, GV; Liebeskind, C; Silvano, P; Trajanov, D; Truica, CO; Apostol, ES; Chiarcos, C; Baczkowska, A;

Publication
LDK

Abstract

2023

Portal infoCosméticos: a digital tool to empower consumers and health professionals

Authors
Torres, Ana; Fonseca, Leonor; Ferreira, Marta; Silvano, Maria da Purificação; Lobo, José Manuel Sousa; Almeida, Isabel F.;

Publication

Abstract
Nowadays, the information regarding cosmetic products available to the community is vast, although not always trustworthy. The Pharmaceutical Technology Laboratory of the Faculty of Pharmacy of the University of Porto (FFUP) launched the Portal infoCosméticos aiming to provide professionals involved in cosmetic advice with reliable information, supported by up-to-date scientific evidence, while empowering Portuguese-speaking consumers to make better informed choices. Pre- and post-graduates of the master's degree in Pharmaceutical Sciences are responsible for developing contents, which are submitted to a linguistic review by students of the Faculty of Arts and Humanities. Firstly, a relevant question is identified, followed by a comprehensive search on the topic and the creation of an infographic. The scientific validation is carried out by national and international scholars, and the national regulatory authority, INFARMED. Since it was released in 2017, the website has hit more than 170,000 visualizations, covering topics related to regulatory affairs, safety and efficacy, cosmetic ingredients and cosmetic products. The most accessed topics by digital users were disclosed by monitoring the visualizations of each question with Google Analytics, considering the publication date. According to the records, consumers seem to be more concerned about the safety of cosmetics and interested in knowing more about their composition.

2023

BUILDING AN OWL-ONTOLOGY FOR REPRESENTING, LINKING AND QUERYING SEMAF DISCOURSE ANNOTATIONS

Authors
Chiarcos, C; Silvano, P; Damova, M; Oleskeviciene, GV; Liebeskind, C; Trajanov, D; Truica, CO; Apostol, ES; Baczkowska, A;

Publication
RASPRAVE

Abstract
Linguistic Linked Open Data (LLOD) are technologies that provide a powerful instrument for representing and interpreting language phenomena on a web-scale. The main objective of this paper is to demonstrate how LLOD technologies can be applied to represent and annotate a corpus composed of multiword discourse markers, and what the effects of this are. In particular, it is our aim to apply semantic web standards such as RDF and OWL for publishing and integrating data. We present a novel scheme for discourse annotation that combines ISO standards describing discourse relations and dialogue acts - ISO DR-Core (ISO 24617-8) and ISO-Dialogue Acts (ISO 24617-2) - in 9 languages (cf. Silvano and Damova 2022; Silvano et al. 2022). We develop an OWL ontology to formalize that scheme, provide a newly annotated dataset and link its RDF edition with the ontology. Consequently, we describe the conjoint querying of the ontology and the annotations by means of SPARQL, the standard query language for the web of data. The ultimate result is that we are able to perform queries over multiple, interlinked datasets with complex internal structure. This is a first, but essential, step in developing novel, powerful, and groundbreaking means for the corpus-based study of multilingual discourse, communication analysis, or attitudes discovery.
