Publications

Publications by LIAAD

2007

New Functions for Unsupervised Asymmetrical Paraphrase Detection

Authors
Cordeiro, J; Dias, G; Brazdil, P;

Publication
JSW

Abstract

2007

An iterative process for building learning curves and predicting relative performance of classifiers

Authors
Leite, R; Brazdil, P;

Publication
PROGRESS IN ARTIFICIAL INTELLIGENCE, PROCEEDINGS

Abstract
This paper concerns the problem of predicting the relative performance of classification algorithms. Our approach requires that experiments are conducted on small samples. The information gathered is used to identify the nearest learning curve for which the sampling procedure was fully carried out. This allows the generation of a prediction regarding the relative performance of the algorithms. The method automatically establishes how many samples are needed and their sizes. This is done iteratively by taking into account the results of all previous experiments - both on other datasets and on the new dataset obtained so far. Experimental evaluation has shown that the method achieves better performance than previous approaches.

2007

Does SVM really scale up to large bag of words feature spaces?

Authors
Colas, F; Paclik, P; Kok, JN; Brazdil, P;

Publication
ADVANCES IN INTELLIGENT DATA ANALYSIS VII, PROCEEDINGS

Abstract
We are concerned with the problem of learning classification rules in text categorization, where many authors have presented Support Vector Machines (SVM) as the leading classification method. A number of studies, however, have repeatedly pointed out that in some situations SVM is outperformed by simpler methods such as naive Bayes or the nearest-neighbor rule. In this paper, we aim to develop a better understanding of SVM behaviour in typical text categorization problems represented by sparse bag of words feature spaces. We study in detail the performance and the number of support vectors when varying the training set size, the number of features and, unlike existing studies, also the SVM free parameter C, which is the upper bound on the Lagrange multipliers in the SVM dual. We show that SVM solutions with small C are high performers. However, most training documents are then bounded support vectors sharing the same weight C. Thus, SVM reduces to a nearest mean classifier; this raises an interesting question about the merits of SVM in sparse bag of words feature spaces. Additionally, SVM suffers from performance deterioration for particular combinations of training set size and number of features.
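The link between bounded support vectors and a nearest mean classifier can be illustrated with a small sketch. It assumes the idealized case in which every training document is a bounded support vector (all dual coefficients equal C) and the two classes have the same size; the abstract only says "most" documents are bounded. The toy data and the function names are hypothetical and are not taken from the paper. Under these assumptions the linear SVM weight vector w = sum_i alpha_i y_i x_i is proportional to the difference of the class means, which is the direction a nearest mean classifier uses. The bias term is left out of the sketch.

```python
# Illustration only: toy data and helper names are hypothetical, not from the paper.

def class_mean(rows):
    """Component-wise mean of a list of equal-length vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]

def svm_weights_all_bounded(X, y, C):
    """Linear SVM weights w = sum_i alpha_i * y_i * x_i, assuming every alpha_i == C."""
    dim = len(X[0])
    w = [0.0] * dim
    for xi, yi in zip(X, y):
        for j in range(dim):
            w[j] += C * yi * xi[j]
    return w

# Four toy "documents" as term-count vectors, two per class (balanced classes).
X = [[1.0, 0.0, 2.0], [0.0, 1.0, 1.0], [3.0, 1.0, 0.0], [2.0, 2.0, 0.0]]
y = [1, 1, -1, -1]
C = 0.01  # small C, so (by assumption) all multipliers sit at the upper bound

w = svm_weights_all_bounded(X, y, C)

mu_pos = class_mean([x for x, t in zip(X, y) if t == 1])
mu_neg = class_mean([x for x, t in zip(X, y) if t == -1])
n_per_class = 2

# With balanced classes, w = C * n_per_class * (mu_pos - mu_neg).
w_from_means = [C * n_per_class * (p - q) for p, q in zip(mu_pos, mu_neg)]

print(w)
print(w_from_means)
```

The two printed vectors agree up to floating-point rounding. With unbalanced classes the same sum gives a size-weighted difference of means rather than the plain difference.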

2007

Cost-sensitive decision trees applied to medical data

Authors
Freitas, A; Costa Pereira, A; Brazdil, P;

Publication
DATA WAREHOUSING AND KNOWLEDGE DISCOVERY, PROCEEDINGS

Abstract
Classification plays an important role in medicine, especially for medical diagnosis. Health applications often require classifiers that minimize the total cost, including misclassification costs and test costs. In fact, there are many reasons for considering costs in medicine, as diagnostic tests are not free and health budgets are limited. Our aim in this work was to define, implement and test a strategy for cost-sensitive learning. We defined an algorithm for decision tree induction that considers costs, including test costs, delayed costs and costs associated with risk. We then applied our strategy to train and evaluate cost-sensitive decision trees on medical data. The built trees can be tested following several strategies, including group costs, common costs, and individual costs. Using the "risk" factor, it is possible to penalize invasive or delayed tests and obtain patient-friendly decision trees.
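The idea of a total cost that combines test costs and misclassification costs can be sketched as follows. This is a generic illustration of cost-sensitive decision making at a tree leaf, not the authors' induction algorithm; the class names, probabilities, cost values and the function name are all assumed for the example.

```python
# Illustration only: every number and name here is hypothetical.

def min_cost_label(class_probs, cost_matrix):
    """Pick the label with the lowest expected misclassification cost.

    cost_matrix[predicted][actual] is the cost of predicting `predicted`
    when the true class is `actual`.
    """
    best = None
    for predicted, row in cost_matrix.items():
        expected = sum(class_probs[actual] * row[actual] for actual in class_probs)
        if best is None or expected < best[1]:
            best = (predicted, expected)
    return best

# Class distribution at a leaf, e.g. estimated from training cases reaching it.
probs = {"sick": 0.2, "healthy": 0.8}

# Missing a sick patient is assumed to be far more costly than a false alarm.
costs = {
    "sick": {"sick": 0.0, "healthy": 50.0},
    "healthy": {"sick": 500.0, "healthy": 0.0},
}

label, misclassification_cost = min_cost_label(probs, costs)

# Costs of the diagnostic tests performed on the path from the root to this leaf.
test_costs_on_path = [10.0, 25.0]
total_cost = sum(test_costs_on_path) + misclassification_cost

print(label, misclassification_cost, total_cost)
```

In this toy setting the leaf is labelled "sick" even though "healthy" is the more probable class, because the assumed cost of a missed diagnosis dominates the expected cost.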

2007

A putative gene located at the MHC class I region around the D6S105 marker contributes to the setting of CD8+T-lymphocyte numbers in humans

Authors
Vieira, J; Cardoso, CS; Pinto, J; Patil, K; Brazdil, P; Cruz, E; Mascarenhas, C; Lacerda, R; Gartner, A; Almeida, S; Alves, H; Porto, G;

Publication
INTERNATIONAL JOURNAL OF IMMUNOGENETICS

Abstract
Significant associations between human leucocyte antigen (HLA)-A and -B alleles and CD8+ T-lymphocyte numbers have been reported in the literature, both in healthy populations and in HFE-haemochromatosis patients. In order to address whether HLA alleles themselves or alleles at linked genes are responsible for these associations, several genetic markers at the MHC class I region were typed in a population of 147 apparently healthy unrelated subjects phenotypically characterized for their CD8+ and CD4+ T-lymphocyte numbers. By using a machine learning approach, a set of rules was generated that predicts CD8+ T-lymphocyte numbers on the basis of the D6S105 microsatellite alleles only. We demonstrate that the previously reported associations with HLA-A and -B alleles are due to the presence of common long haplotypes (up to 4 megabases long) that increased in frequency recently due to positive selection and that encompass a region where a putative gene contributing to the setting of CD8+ T lymphocytes is located, in the neighbourhood of microsatellite locus D6S105, in the 6p21.3 region.

2007

Location of a putative gene contributing to the setting of CD8+T lymphocytes: A modifier of hereditary hemochromatosis expression?

Authors
Vieira, J; Cardoso, CS; Pinto, J; Patil, K; Brazdil, P; Cruz, E; Mascarenhas, C; Lacerda, R; Gartner, A; Almeida, S; Alves, H; Porto, G;

Publication
AMERICAN JOURNAL OF HEMATOLOGY

Abstract
