Publications

Publications by CRACS

2011

The importance of precision in humour classification

Authors
Costa, J; Silva, C; Antunes, M; Ribeiro, B;

Publication
Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)

Abstract
Humour classification is one of the most interesting and difficult tasks in text classification. Humour is subjective by nature, yet humans are able to promptly define their preferences. Nowadays people often search for humour as a relaxing proxy to overcome stressful and demanding situations, having little or no time to search for content for such activities. Hence, we propose to aid the definition of personal models that allow the user to access humour with more confidence in the precision of their preferences. In this paper we focus on a Support Vector Machine (SVM) active learning strategy that uses the most informative examples to improve baseline performance. Experiments were carried out using the widely available Jester jokes dataset, with encouraging results on the proposed framework. © 2011 Springer-Verlag.
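As a rough illustration of the active learning idea mentioned in this abstract, the sketch below picks the unlabeled examples that lie closest to a linear decision boundary, which is one common reading of "most informative examples". The weights, bias, function names, and toy pool are all hypothetical; the abstract does not describe the paper's actual selection strategy.

```python
from typing import List, Sequence


def decision_value(weights: Sequence[float], bias: float, x: Sequence[float]) -> float:
    """Signed score of a linear classifier: w . x + b."""
    return sum(w * xi for w, xi in zip(weights, x)) + bias


def most_informative(weights: Sequence[float], bias: float,
                     pool: List[Sequence[float]], k: int) -> List[int]:
    """Return indices of the k pool examples closest to the decision boundary."""
    ranked = sorted(
        range(len(pool)),
        key=lambda i: abs(decision_value(weights, bias, pool[i])),
    )
    return ranked[:k]


# Hypothetical linear model and unlabeled pool (not taken from the paper).
weights = [1.0, -1.0]
bias = 0.0
pool = [[3.0, 0.0], [1.0, 0.9], [0.0, 2.5], [0.5, 0.5]]

# Indices of the two examples a user would be asked to rate next.
queries = most_informative(weights, bias, pool, k=2)
```

In an active learning loop, the selected examples would be labelled by the user, added to the training set, and the model retrained before the next selection round.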

2011

Tunable immune detectors for behaviour-based network intrusion detection

Authors
Antunes, M; Correia, ME;

Publication
Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)

Abstract
Computer networks are highly dynamic environments in which the meaning of normal and anomalous behaviours can drift considerably over time. Behaviour-based Network Intrusion Detection Systems (NIDS) thus have to cope with the temporal normality drift intrinsic to computer networks by adaptively tuning their level of response, in order to distinguish harmful from harmless network traffic flows. In this paper we put forward the intrinsic ability of the Tunable Activation Threshold (TAT) theory to adaptively tolerate normal drifting network traffic flows. This is embodied in TAT-NIDS, a TAT-based Artificial Immune System (AIS) we have developed for network intrusion detection. We describe the generic AIS framework we have developed to assemble TAT-NIDS and present the results obtained thus far on processing real network traffic data sets. We also compare the performance obtained by TAT-NIDS with the well-known and widely deployed signature-based Snort network intrusion detection system. © 2011 Springer-Verlag.

2011

A Hybrid AIS-SVM Ensemble Approach for Text Classification

Authors
Antunes, M; Silva, C; Ribeiro, B; Correia, M;

Publication
ADAPTIVE AND NATURAL COMPUTING ALGORITHMS, PT II

Abstract
In this paper we propose and analyse methods for expanding state-of-the-art performance on text classification. We put forward an ensemble-based structure that includes Support Vector Machines (SVM) and Artificial Immune Systems (AIS). The underpinning idea is that SVM-like approaches can be enhanced with AIS approaches, which can capture dynamics in models. While having radically different genesis, and probably because of that, SVM and AIS can cooperate in a committee setting, using a heterogeneous ensemble to improve overall performance, with the confidence in each system's classification as the differentiating factor. Results on the well-known Reuters-21578 benchmark are presented, showing promising classification performance gains, resulting in a classification that improves upon all baseline contributors of the ensemble committee.
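To make the committee idea in this abstract concrete, here is a minimal sketch in which each member reports a label together with a confidence and the most confident member decides. This is only one plausible combination rule; the abstract does not state how the paper combines SVM and AIS outputs, and the function name, labels, and confidence values below are hypothetical.

```python
from typing import List, Tuple

Prediction = Tuple[str, float]  # (label, confidence in [0, 1])


def committee_decision(predictions: List[Prediction]) -> str:
    """Return the label proposed by the most confident committee member."""
    if not predictions:
        raise ValueError("committee needs at least one prediction")
    label, _ = max(predictions, key=lambda p: p[1])
    return label


# Hypothetical outputs of two committee members for a single document.
svm_prediction = ("earn", 0.55)
ais_prediction = ("acq", 0.80)

# The AIS member is more confident here, so its label is chosen.
decision = committee_decision([svm_prediction, ais_prediction])
```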

2011

T-SPPA: Trended Statistical PreProcessing Algorithm

Authors
Silva, T; Dutra, I;

Publication
DIGITAL INFORMATION PROCESSING AND COMMUNICATIONS, PT 1

Abstract
Traditional machine learning systems learn from non-relational data, but in fact most real-world data is relational. Normally the learning task is done using a single flat file, which prevents the discovery of effective relations among records. Inductive logic programming and statistical relational learning partially solve this problem. In this work, we resort to another method to overcome this problem and propose T-SPPA (Trended Statistical PreProcessing Algorithm), a preprocessing method that translates related records into one single record before learning. Using different kinds of data, we compare the results of learning from the transformed data with the results of learning from the original data to demonstrate the efficacy of our method.
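The general idea of translating related records into a single record can be sketched as follows. This is not the T-SPPA algorithm itself: the abstract does not list which statistics or trend measures it computes, so the mean, minimum, maximum, and "last minus first" trend used here, along with the field names and data, are assumptions made purely for illustration.

```python
from collections import defaultdict
from statistics import mean


def flatten_records(records, key, value):
    """Collapse all records that share `key` into one summary record."""
    grouped = defaultdict(list)
    for record in records:  # records are assumed to be in chronological order
        grouped[record[key]].append(record[value])

    flat = []
    for entity, values in grouped.items():
        flat.append({
            key: entity,
            "mean": mean(values),
            "min": min(values),
            "max": max(values),
            "trend": values[-1] - values[0],  # crude trend: last minus first
        })
    return flat


# Hypothetical relational data: several visits per patient.
visits = [
    {"patient": "A", "score": 2.0},
    {"patient": "A", "score": 4.0},
    {"patient": "B", "score": 5.0},
    {"patient": "A", "score": 6.0},
    {"patient": "B", "score": 3.0},
]

# One flat record per patient, ready for a propositional learner.
flat = flatten_records(visits, key="patient", value="score")
```

The resulting list has one row per entity, so a learner that expects a single flat table can still see a summary of how each entity's related records behave over time.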

2011

Predicting Malignancy from Mammography Findings and Surgical Biopsies

Authors
Ferreira, P; Fonseca, NA; Dutra, I; Woods, R; Burnside, E;

Publication
2011 IEEE INTERNATIONAL CONFERENCE ON BIOINFORMATICS AND BIOMEDICINE (BIBM 2011)

Abstract
Breast screening is the regular examination of a woman's breasts to find breast cancer earlier. The sole exam approved for this purpose is mammography. Usually, findings are annotated through the Breast Imaging Reporting and Data System (BIRADS) created by the American College of Radiology. The BIRADS system determines a standard lexicon to be used by radiologists when studying each finding. Although the lexicon is standard, the annotation accuracy of the findings depends on the experience of the radiologist. Moreover, the accuracy of the classification of a mammography is also highly dependent on the expertise of the radiologist. A correct classification is paramount for economic and humanitarian reasons. The main goal of this work is to produce machine learning models that predict the outcome of a mammography from a reduced set of annotated mammography findings. In the study we used a data set consisting of 348 consecutive breast masses that underwent image-guided or surgical biopsy performed between October 2005 and December 2007 on 328 female subjects. The main conclusions are threefold: (1) automatic classification of a mammography, independent of information about mass density, can reach equal or better results than the classification performed by a physician; (2) mass density seems to be a good indicator of malignancy, as previous studies suggested; (3) a machine learning model can predict mass density with a quality as good as that of a specialist blind to biopsy, which is one of our main contributions. Our model can predict malignancy in the absence of the mass density attribute, since we can fill in this attribute using our mass density predictor.

2011

Integrating machine learning and physician knowledge to improve the accuracy of breast biopsy.

Authors
Dutra, I; Nassif, H; Page, D; Shavlik, J; Strigel, RM; Wu, Y; Elezaby, ME; Burnside, E;

Publication
AMIA ... Annual Symposium proceedings / AMIA Symposium. AMIA Symposium

Abstract
In this work we show that combining physician rules and machine-learned rules may improve the performance of a classifier that predicts whether a breast cancer is missed on percutaneous, image-guided breast core needle biopsy (subsequently referred to as "breast core biopsy"). Specifically, we show how advice in the form of logical rules, derived by sub-specialty (i.e. fellowship-trained) breast radiologists (subsequently referred to as "our physicians"), can guide the search in an inductive logic programming system and improve the performance of a learned classifier. Our dataset of 890 consecutive benign breast core biopsy results, along with corresponding mammographic findings, contains 94 cases that were deemed non-definitive by a multidisciplinary panel of physicians, of which 15 were upgraded to malignant disease at surgery. Our goal is to predict upgrade prospectively and avoid surgery in women who do not have breast cancer. Our results, some of which trended toward significance, show evidence that inductive logic programming may produce better results for this task than traditional propositional algorithms with default parameters. Moreover, we show that adding knowledge from our physicians into the learning process may improve performance over that of a classifier trained only on data.
