Publications

Publications by LIAAD

2021

Data stream analysis: Foundations, major tasks and tools

Authors
Bahri, M; Bifet, A; Gama, J; Gomes, HM; Maniu, S;

Publication
WILEY INTERDISCIPLINARY REVIEWS-DATA MINING AND KNOWLEDGE DISCOVERY

Abstract
The significant growth of interconnected Internet-of-Things (IoT) devices, the use of social networks, along with the evolution of technology in different domains, lead to a rise in the volume of data generated continuously from multiple systems. Valuable information can be derived from these evolving data streams by applying machine learning. In practice, several critical issues emerge when extracting useful knowledge from these potentially infinite data, mainly because of their evolving nature and high arrival rate which implies an inability to store them entirely. In this work, we provide a comprehensive survey that discusses the research constraints and the current state-of-the-art in this vibrant framework. Moreover, we present an updated overview of the latest contributions proposed in different stream mining tasks, particularly classification, regression, clustering, and frequent patterns. This article is categorized under: Fundamental Concepts of Data and Knowledge > Key Design Issues in Data Mining; Fundamental Concepts of Data and Knowledge > Motivation and Emergence of Data Mining

2021

How can I choose an explainer?: An Application-grounded Evaluation of Post-hoc Explanations

Authors
Jesus, SM; Belém, C; Balayan, V; Bento, J; Saleiro, P; Bizarro, P; Gama, J;

Publication
FAccT '21: 2021 ACM Conference on Fairness, Accountability, and Transparency, Virtual Event / Toronto, Canada, March 3-10, 2021

Abstract
There have been several research works proposing new Explainable AI (XAI) methods designed to generate model explanations having specific properties, or desiderata, such as fidelity, robustness, or human-interpretability. However, explanations are seldom evaluated based on their true practical impact on decision-making tasks. Without that assessment, explanations might be chosen that, in fact, hurt the overall performance of the combined system of ML model + end-users. This study aims to bridge this gap by proposing XAI Test, an application-grounded evaluation methodology tailored to isolate the impact of providing the end-user with different levels of information. We conducted an experiment following XAI Test to evaluate three popular XAI methods - LIME, SHAP, and TreeInterpreter - on a real-world fraud detection task, with real data, a deployed ML model, and fraud analysts. During the experiment, we gradually increased the information provided to the fraud analysts in three stages: Data Only, i.e., just transaction data without access to the model score or explanations, Data + ML Model Score, and Data + ML Model Score + Explanations. Using strong statistical analysis, we show that, in general, these popular explainers have a worse impact than desired. Some of the conclusion highlights include: i) showing Data Only results in the highest decision accuracy and the slowest decision time among all variants tested; ii) all the explainers improve accuracy over the Data + ML Model Score variant but still result in lower accuracy when compared with Data Only; iii) LIME was the least preferred by users, probably due to its substantially lower variability of explanations from case to case. © 2021 ACM.

2021

Shedding Light on the African Enigma: In Vitro Testing of Homo sapiens-Helicobacter pylori Coevolution

Authors
Cavadas, B; Leite, M; Pedro, N; Magalhaes, AC; Melo, J; Correia, M; Maximo, V; Camacho, R; Fonseca, NA; Figueiredo, C; Pereira, L;

Publication
MICROORGANISMS

Abstract
The continuous characterization of genome-wide diversity in population and case-cohort samples, allied to the development of new algorithms, are shedding light on host ancestry impact and selection events on various infectious diseases. Especially interesting are the long-standing associations between humans and certain bacteria, such as the case of Helicobacter pylori, which could have been strong drivers of adaptation leading to coevolution. Some evidence on admixed gastric cancer cohorts have been suggested as supporting Homo-Helicobacter coevolution, but reliable experimental data that control both the bacterium and the host ancestries are lacking. Here, we conducted the first in vitro coinfection assays with dual human- and bacterium-matched and -mismatched ancestries, in African and European backgrounds, to evaluate the genome wide gene expression host response to H. pylori. Our results showed that: (1) the host response to H. pylori infection was greatly shaped by the human ancestry, with variability on innate immune system and metabolism; (2) African human ancestry showed signs of coevolution with H. pylori while European ancestry appeared to be maladapted; and (3) mismatched ancestry did not seem to be an important differentiator of gene expression at the initial stages of infection as assayed here.

2021

Metabarcoding with MinION: Speeding up the detection of invasive aquatic species using environmental DNA and nanopore sequencing

Authors
Egeter, B; Veríssimo, J; Lopes-Lima, M; Chaves, C; Pinto, J; Riccardi, N; Beja, P; Fonseca, NA;

Publication
ARPHA Conference Abstracts

Abstract
Traditional detection of aquatic invasive species via morphological identification is often time-consuming and can require a high level of taxonomic expertise, leading to delayed mitigation responses. Environmental DNA (eDNA) detection approaches of multiple species using Illumina-based sequencing technology have been used to overcome these hindrances, but sample processing is often lengthy. More recently, portable nanopore sequencing technology has become available, which has the potential to make molecular detection of invasive species more widely accessible and to substantially decrease sample turnaround times. However, nanopore-sequenced reads have a much higher error rate than those produced by Illumina platforms, which has so far hindered the adoption of this technology. We provide a detailed laboratory protocol and bioinformatic tools to increase the reliability of nanopore sequencing to detect invasive species, and we test its application using invasive bivalves. We sampled water from sites with pre-existing bivalve occurrence and abundance data, and contrasting bivalve communities, in Italy and Portugal. We extracted, amplified and sequenced eDNA with a turnaround of 3.5 days. The majority of processed reads were ≥ 99% identical to reference sequences. There were no taxa detected other than those known to occur. The lack of detections of some species at some sites could be explained by their known low abundances. The approach is now being tested on other target taxa such as fish and other vertebrates.

2021

Tumour gene expression signature in primary melanoma predicts long-term outcomes

Authors
Garg, M; Couturier, DL; Nsengimana, J; Fonseca, NA; Wongchenko, M; Yan, YB; Lauss, M; Jonsson, GB; Newton Bishop, J; Parkinson, C; Middleton, MR; Bishop, DT; McDonald, S; Stefanos, N; Tadross, J; Vergara, IA; Lo, S; Newell, F; Wilmott, JS; Thompson, JF; Long, GV; Scolyer, RA; Corrie, P; Adams, DJ; Brazma, A; Rabbie, R;

Publication
NATURE COMMUNICATIONS

Abstract
Adjuvant systemic therapies are now routinely used following resection of stage III melanoma; however, accurate prognostic information is needed to better stratify patients. We use differential expression analyses of primary tumours from 204 RNA-sequenced melanomas within a large adjuvant trial, identifying a 121 metastasis-associated gene signature. This signature strongly associated with progression-free (HR = 1.63, p = 5.24 × 10⁻⁵) and overall survival (HR = 1.61, p = 1.67 × 10⁻⁴), was validated in 175 regional lymph nodes metastasis as well as two externally ascertained datasets. The machine learning classification models trained using the signature genes performed significantly better in predicting metastases than models trained with clinical covariates (p(AUROC) = 7.03 × 10⁻⁴), or published prognostic signatures (p(AUROC) < 0.05). The signature score negatively correlated with measures of immune cell infiltration (ρ = -0.75, p < 2.2 × 10⁻¹⁶), with a higher score representing reduced lymphocyte infiltration and a higher 5-year risk of death in stage II melanoma. Our expression signature identifies melanoma patients at higher risk of metastases and warrants further evaluation in adjuvant clinical trials. The identification of prognostic biomarkers can help stratify cancer patients. Here, the authors apply deep RNA sequencing from primary melanomas coupled with long-term clinical outcome data from a prospective multicentre phase III trial, to develop and validate a 121 metastasis-associated gene signature identifying early-stage melanoma patients at higher risk of metastasis and worse survival.

2021

Comparative Genomics of Xanthomonas euroxanthea and Xanthomonas arboricola pv. juglandis Strains Isolated from a Single Walnut Host Tree

Authors
Fernandes, C; Martins, L; Teixeira, M; Blom, J; Pothier, JE; Fonseca, NA; Tavares, F;

Publication
MICROORGANISMS

Abstract
The recent report of distinct Xanthomonas lineages of Xanthomonas arboricola pv. juglandis and Xanthomonas euroxanthea within the same walnut tree revealed that this consortium of walnut-associated Xanthomonas includes both pathogenic and nonpathogenic strains. As the implications of this co-colonization are still poorly understood, in order to unveil niche-specific adaptations, the genomes of three X. euroxanthea strains (CPBF 367, CPBF 424(T), and CPBF 426) and of an X. arboricola pv. juglandis strain (CPBF 427) isolated from a single walnut tree in Loures (Portugal) were sequenced with two different technologies, Illumina and Nanopore, to provide consistent single scaffold chromosomal sequences. General genomic features showed that CPBF 427 has a genome similar to other X. arboricola pv. juglandis strains, regarding its size, number, and content of CDSs, while X. euroxanthea strains show a reduction regarding these features comparatively to X. arboricola pv. juglandis strains. Whole genome comparisons revealed remarkable genomic differences between X. arboricola pv. juglandis and X. euroxanthea strains, which translates into different pathogenicity and virulence features, namely regarding type 3 secretion system and its effectors and other secretory systems, chemotaxis-related proteins, and extracellular enzymes. Altogether, the distinct genomic repertoire of X. euroxanthea may be particularly useful to address pathogenicity emergence and evolution in walnut-associated Xanthomonas.
