Publications

Publications by LIAAD

2006

On the behavior of SVM and some older algorithms in binary text classification tasks

Authors
Colas, F; Brazdil, P;

Publication
TEXT, SPEECH AND DIALOGUE, PROCEEDINGS

Abstract
Document classification has already been widely studied. In fact, some studies compared feature selection techniques or feature space transformation whereas some others compared the performance of different algorithms. Recently, following the rising interest towards the Support Vector Machine, various studies showed that the SVM outperforms other classification algorithms. So should we just not bother about other classification algorithms and opt always for SVM? We have decided to investigate this issue and compared SVM to kNN and naive Bayes on binary classification tasks. An important issue is to compare optimized versions of these algorithms, which is what we have done. Our results show all the classifiers achieved comparable performance on most problems. One surprising result is that SVM was not a clear winner, despite quite good overall performance. If a suitable preprocessing is used with kNN, this algorithm continues to achieve very good results and scales up well with the number of documents, which is not the case for SVM. As for naive Bayes, it also achieved good performance.

2006

Quantitative pharmacophore models with inductive logic programming

Authors
Srinivasan, A; Page, D; Camacho, R; King, R;

Publication
MACHINE LEARNING

Abstract
Three-dimensional models, or pharmacophores, describing Euclidean constraints on the location on small molecules of functional groups (like hydrophobic groups, hydrogen acceptors and donors, etc.), are often used in drug design to describe the medicinal activity of potential drugs (or 'ligands'). This medicinal activity is produced by interaction of the functional groups on the ligand with a binding site on a target protein. In identifying structure-activity relations of this kind there are three principal issues: (1) It is often difficult to "align" the ligands in order to identify common structural properties that may be responsible for activity; (2) Ligands in solution can adopt different shapes (or 'conformations') arising from torsional rotations about bonds. The 3-D molecular substructure is typically sought on one or more low-energy conformers; and (3) Pharmacophore models must, ideally, predict medicinal activity on some quantitative scale. It has been shown that the logical representation adopted by Inductive Logic Programming (ILP) naturally resolves many of the difficulties associated with the alignment and multi-conformation issues. However, the predictions of models constructed by ILP have hitherto only been nominal, predicting medicinal activity to be present or absent. In this paper, we investigate the construction of two kinds of quantitative pharmacophoric models with ILP: (a) Models that predict the probability that a ligand is "active"; and (b) Models that predict the actual medicinal activity of a ligand. Quantitative predictions are obtained by utilising the following statistical procedures as background knowledge: logistic regression and naive Bayes, for probability prediction; linear and kernel regression, for activity prediction. The multi-conformation issue and, more generally, the relational representation used by ILP result in some special difficulties in the use of any statistical procedure. We present the principal issues and some solutions. Specifically, using data on the inhibition of the protease Thermolysin, we demonstrate that it is possible for an ILP program to construct good quantitative structure-activity models. We also comment on the relationship of this work to other recent developments in statistical relational learning.

2006

Guest editorial

Authors
Camacho, R; King, RD; Srinivasan, A;

Publication
Machine Learning

Abstract

2006

A commodity platform for Distributed Data Mining - the HARVARD System

Authors
Camacho, R;

Publication
6th Industrial Conference on Data Mining, Poster Proceedings, ICDM 2006, Leipzig, Germany, July 14-15, 2006

Abstract

2006

CT-guided percutaneous transthoracic biopsy in the evaluation of undetermined pulmonary lesions [Biópsia percutânea transtorácica guiada por TC na avaliação de lesões pulmonares de natureza indeterminada]

Authors
Lourenco, R; Camacho, R; Barata, MJ; Canario, D; Gaspar, A; Cyrne, C;

Publication
Revista Portuguesa de Pneumologia

Abstract
CT-guided Percutaneous Transthoracic Biopsies (PTB) performed in the Radiology Department of Garcia de Orta Hospital between 2002 and 2004 to evaluate undetermined pulmonary lesions were retrospectively analysed. 89 fine needle aspiration biopsies (FNAB) and 13 core needle biopsies (CNB) were performed on 92 patients (67 men, mean age: 64.4 years). 82 lesions (89%) were nodular lesions (mean diameter: 3.8±1.7 cm, 65 peripheral). We did not observe complications among patients who underwent CNB; minor complications and pneumothorax requiring drainage occurred in 11 FNAB. 72 FNAB were considered adequate for cytology diagnosis; 72% of them were positive for malignancy. All CNB were adequate and conclusive. Of the 7 CNB performed on patients with previous FNAB, 3 allowed a better histological characterization and in 3 cases of inadequate FNAB, CNB was conclusive. All malignant lesions were nodules: 20 adenocarcinomas, 13 non-small cell lung cancers (NSCLC), 10 epidermoid tumours, 5 small-cell lung cancers, 2 carcinoids, 1 bronchioloalveolar carcinoma, 1 malignant mesothelioma and 8 metastases. Unspecific/inflammatory lesions (n=5) were the most frequent benign lesions. Malignant lesions were more prevalent in older patients (p=0.007) and were larger (p=0.006). Spiculated and lobulated contours (p=0.05) were more prevalent in malignant lesions while regular contour was more frequent among benign lesions (p=0.0001). Gender, smoking, location, pleural tag, homogeneous attenuation, cavitation, calcification, necrosis and air bronchogram did not differ significantly between benign and malignant nodules. This study shows that CT-guided PTB is a safe and effective procedure in the evaluation of undetermined pulmonary lesions.

2006

Multi-strategy learning made easy

Authors
Reinaldo, F; Siqueira, M; Camacho, R; Reis, LP;

Publication
WSEAS Transactions on Systems

Abstract
This paper presents the AFRANCI tool for the development of Multi-Strategy learning systems. Designing a Multi-Strategy system using AFRANCI is a two-step process. The user interactively designs the structure of the system and then chooses the learning strategies for each module. After the datasets are provided, all modules are automatically trained. The system is aware of, and takes into consideration, the inter-dependency of the modules. The tool has built-in learning algorithms but can also use external programs implementing the learning algorithms. The tool has the following facilities. It allows any user to design in an interactive and easy fashion the structure of the target system. The structure of the target system is a collection of interconnected modules. The user may then choose the different learning algorithms to construct each module. The tool has several built-in Machine Learning algorithms and has interfaces that enable it to use external learning tools like WEKA and CN2. AFRANCI uses the interdependency of the modules to determine the sequence of training. For each module the system uses a wrapper to tune automatically the parameters of the learning algorithm. In the final step of the design sequence AFRANCI generates compact and legible ready-to-use ANSI C open-source code for the final system.
