Publications

Publications by Pavel Brazdil

2009

Learning cost-sensitive decision trees to support medical diagnosis

Authors
Freitas, A; Costa Pereira, A; Brazdil, P;

Publication
Complex Data Warehousing and Knowledge Discovery for Advanced Retrieval Development: Innovative Methods and Applications

Abstract
Classification plays an important role in medicine, especially in medical diagnosis. Real-world medical applications often require classifiers that minimize the total cost, including the costs of wrong diagnoses (misclassification costs) and of diagnostic tests (attribute costs). There are many reasons for considering costs in medicine: diagnostic tests are not free and health budgets are limited. In this chapter, the authors define strategies for cost-sensitive learning. They develop an algorithm for decision tree induction that considers various types of costs, including test costs, delayed costs and costs associated with risk. They then apply their strategy to train and evaluate cost-sensitive decision trees on medical data. The generated trees can be tested under several strategies, including group costs, common costs and individual costs. Using a "risk" factor, it is possible to penalize invasive or delayed tests and obtain patient-friendly decision trees. © 2010, IGI Global.
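
The abstract does not state the split criterion used by the chapter's algorithm. As a minimal sketch under stated assumptions, the code below scores each candidate diagnostic test by its information gain divided by (test cost times a risk factor), so an expensive or invasive test is only chosen when it is much more informative. The ratio heuristic, the function names (`info_gain`, `choose_test`) and the toy attributes `biopsy` and `blood` are hypothetical, not the authors' implementation.

```python
import math
from collections import Counter
from typing import Dict, List, Sequence


def entropy(labels: Sequence[str]) -> float:
    """Shannon entropy (bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def info_gain(rows: List[Dict[str, str]], labels: Sequence[str], attr: str) -> float:
    """Reduction in entropy obtained by splitting on a categorical attribute."""
    groups: Dict[str, List[str]] = {}
    for row, y in zip(rows, labels):
        groups.setdefault(row[attr], []).append(y)
    n = len(labels)
    remainder = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


def choose_test(rows: List[Dict[str, str]], labels: Sequence[str],
                test_costs: Dict[str, float], risk: Dict[str, float]) -> str:
    """Hypothetical cost-sensitive criterion: gain per unit of (test cost * risk factor).
    Costs must be positive; attributes missing from `risk` get a neutral factor of 1."""
    return max(test_costs,
               key=lambda a: info_gain(rows, labels, a) / (test_costs[a] * risk.get(a, 1.0)))


# Invented toy data: 'biopsy' separates the classes perfectly, 'blood' only partly.
rows = [{"biopsy": "pos", "blood": "high"}, {"biopsy": "pos", "blood": "high"},
        {"biopsy": "neg", "blood": "high"}, {"biopsy": "neg", "blood": "low"}]
labels = ["sick", "sick", "healthy", "healthy"]
print(choose_test(rows, labels, {"biopsy": 100.0, "blood": 5.0}, {"biopsy": 2.0}))  # blood
print(choose_test(rows, labels, {"biopsy": 1.0, "blood": 1.0}, {}))  # biopsy
```

With costs ignored, the perfectly separating `biopsy` wins; once its assumed cost and risk factor are applied, the cheaper `blood` test is preferred. A full tree inducer would apply the same choice recursively to each branch.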

2009

Cost-sensitive learning in medicine

Authors
Freitas, A; Brazdil, P; Costa Pereira, A;

Publication
Data Mining and Medical Knowledge Management: Cases and Applications

Abstract
This chapter introduces cost-sensitive learning and its importance in medicine. Health managers and clinicians often need models that try to minimize several types of costs associated with healthcare, including attribute costs (e.g. the cost of a specific diagnostic test) and misclassification costs (e.g. the cost of a false negative test). In fact, as in other professional areas, both diagnostic tests and their associated misclassification errors can have significant financial or human costs, including the use of unnecessary resources and patient safety issues. This chapter presents some concepts related to cost-sensitive learning and cost-sensitive classification and their application to medicine. Different types of costs are also presented, with an emphasis on diagnostic test and misclassification costs. In addition, an overview of research in the area of cost-sensitive learning is given, including current methodological approaches. Finally, current methods for the cost-sensitive evaluation of classifiers are discussed. © 2009, IGI Global.
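
The distinction between attribute costs and misclassification costs can be made concrete with a cost matrix. The sketch below uses the standard minimum-expected-cost decision rule, predicting the class whose expected misclassification cost is lowest, and adds the test costs actually incurred to obtain a total. The class names, the cost values (a false negative is assumed to cost ten times a false positive) and the helper names are illustrative assumptions, not taken from the chapter.

```python
from typing import Dict, Iterable

# cost[predicted][actual]: cost of predicting `predicted` when the truth is `actual`.
CostMatrix = Dict[str, Dict[str, float]]


def expected_cost(probs: Dict[str, float], cost: CostMatrix, predicted: str) -> float:
    """Expected misclassification cost: sum over actual classes of P(actual) * cost."""
    return sum(p * cost[predicted][actual] for actual, p in probs.items())


def min_cost_decision(probs: Dict[str, float], cost: CostMatrix) -> str:
    """Pick the class with the lowest expected cost rather than the highest probability."""
    return min(cost, key=lambda pred: expected_cost(probs, cost, pred))


def total_cost(test_costs: Dict[str, float], tests_done: Iterable[str],
               misclassification: float) -> float:
    """Attribute (test) costs actually incurred plus the misclassification cost."""
    return sum(test_costs[t] for t in tests_done) + misclassification


# Illustrative numbers only.
cost = {"healthy": {"healthy": 0.0, "disease": 100.0},
        "disease": {"healthy": 10.0, "disease": 0.0}}
probs = {"healthy": 0.8, "disease": 0.2}
print(min_cost_decision(probs, cost))  # disease (expected cost 8.0 vs 20.0 for healthy)
```

Although `healthy` is the more probable class, the assumed high cost of a false negative makes `disease` the cheaper prediction, which is where cost-sensitive and accuracy-driven classifiers part ways.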

2012

Factors influencing hospital high length of stay outliers

Authors
Freitas, A; Silva Costa, T; Lopes, F; Garcia Lema, I; Teixeira Pinto, A; Brazdil, P; Costa Pereira, A;

Publication
BMC HEALTH SERVICES RESEARCH

Abstract
Background: The study of length of stay (LOS) outliers is important for the management and financing of hospitals. Our aim was to study variables associated with high LOS outliers and their evolution over time. Methods: We used hospital administrative data from inpatient episodes in public acute care hospitals in the Portuguese National Health Service (NHS), with discharges between 2000 and 2009, together with some hospital characteristics. The dependent variable, LOS outliers, was calculated for each diagnosis related group (DRG) using a trim point defined for each year by the geometric mean plus two standard deviations. Hospitals were classified on the basis of administrative, economic and teaching characteristics. We also studied the influence of comorbidities and readmissions. Logistic regression models, including a multivariable logistic regression, were used in the analysis. All the logistic regressions were fitted using generalized estimating equations (GEE). Results: Among nearly nine million inpatient episodes analysed, we found that 3.9% were high LOS outliers, accounting for 19.2% of total inpatient days. The number of hospital patient discharges increased between 2000 and 2005 and decreased slightly after that. The proportion of outliers ranged from a low of 3.6% (in 2001 and 2002) to a high of 4.3% in 2009. Teaching hospitals with over 1,000 beds have significantly more outliers than other hospitals, even after adjustment for readmissions and several patient characteristics. Conclusions: In recent years, both average LOS and high LOS outliers have been increasing in Portuguese NHS hospitals. As high LOS outliers represent an important proportion of total inpatient days, this should be seen as an important alert for hospital management and for national health policies. As expected, age, type of admission and hospital type were significantly associated with high LOS outliers.
The proportion of high outliers does not seem to be related to their financial coverage; they should be studied in order to highlight areas for further investigation. The increasing complexity of both hospitals and patients may be the single most important determinant of high LOS outliers and must therefore be taken into account by health managers when considering hospital costs.
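
The outlier definition in the Methods can be sketched as follows. The abstract says only that the trim point is the geometric mean plus two standard deviations, computed per DRG and per year; this sketch assumes the standard deviation is taken on the log scale (so the trim point is exp(mean(ln LOS) + 2 sd(ln LOS))) and that every LOS is strictly positive. The episode tuples, function names and toy data are hypothetical. `outlier_summary` returns the two statistics the Results report: the share of episodes that are outliers and the share of inpatient days they account for.

```python
import math
from collections import defaultdict
from statistics import mean, stdev
from typing import Dict, List, Tuple

Episode = Tuple[int, str, float]  # (discharge year, DRG code, length of stay in days)


def trim_points(episodes: List[Episode]) -> Dict[Tuple[int, str], float]:
    """Trim point per (year, DRG): exp(mean(ln LOS) + 2 * sd(ln LOS)).
    Assumes every LOS > 0 and that the two SDs are taken on the log scale."""
    logs: Dict[Tuple[int, str], List[float]] = defaultdict(list)
    for year, drg, los in episodes:
        logs[(year, drg)].append(math.log(los))
    points = {}
    for key, values in logs.items():
        sd = stdev(values) if len(values) > 1 else 0.0  # single episode: no spread
        points[key] = math.exp(mean(values) + 2 * sd)
    return points


def high_outlier_flags(episodes: List[Episode]) -> List[bool]:
    """True for each episode whose LOS exceeds the trim point of its (year, DRG) group."""
    points = trim_points(episodes)
    return [los > points[(year, drg)] for year, drg, los in episodes]


def outlier_summary(episodes: List[Episode]) -> Tuple[float, float]:
    """(share of episodes that are high outliers, share of inpatient days they account for)."""
    flags = high_outlier_flags(episodes)
    total_days = sum(los for _, _, los in episodes)
    outlier_days = sum(los for (_, _, los), flag in zip(episodes, flags) if flag)
    return sum(flags) / len(episodes), outlier_days / total_days


# Invented toy data: nine 1-day stays and one 50-day stay in the same DRG and year.
episodes = [(2009, "DRG-A", 1.0)] * 9 + [(2009, "DRG-A", 50.0)]
print(high_outlier_flags(episodes))  # nine False values, then True
print(outlier_summary(episodes))     # (0.1, 50/59, about 0.847)
```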

2005

Predicting relative performance of classifiers from samples

Authors
Leite, R; Brazdil, P;

Publication
ICML 2005 - Proceedings of the 22nd International Conference on Machine Learning

Abstract
This paper is concerned with the problem of predicting the relative performance of classification algorithms. It focuses on methods that use results on small samples and discusses the shortcomings of previous approaches. A new variant is proposed that, like some previous approaches, exploits meta-learning. The method requires that experiments be conducted on a few samples. The information gathered is used to identify the nearest learning curve for which the sampling procedure was carried out fully. This in turn makes it possible to predict the relative performance of the algorithms. Experimental evaluation shows that the method competes well with previous approaches and provides a good, practical solution to this problem.
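
The nearest-learning-curve idea can be illustrated with a 1-nearest-neighbour sketch. It assumes that, for each stored dataset, full learning curves of two algorithms (A and B) are available on a shared grid of sample sizes; it compares them with the partial curves measured on the new dataset using Euclidean distance and predicts the winner from the final point of the nearest stored curves. The distance measure, the data layout, the names and the numbers are assumptions; the paper's variant may differ.

```python
import math
from typing import Dict, List, Tuple

Curve = List[float]  # accuracies at increasing sample sizes, on a grid shared by all datasets


def curve_distance(part_a: Curve, part_b: Curve, full_a: Curve, full_b: Curve) -> float:
    """Euclidean distance over the sample sizes already evaluated on the new dataset."""
    return math.sqrt(sum((part_a[i] - full_a[i]) ** 2 + (part_b[i] - full_b[i]) ** 2
                         for i in range(len(part_a))))


def predict_winner(part_a: Curve, part_b: Curve,
                   meta_db: Dict[str, Tuple[Curve, Curve]]) -> Tuple[str, str]:
    """Return (predicted winner on the full data, nearest stored dataset)."""
    nearest = min(meta_db, key=lambda name: curve_distance(part_a, part_b, *meta_db[name]))
    full_a, full_b = meta_db[nearest]
    return ("A" if full_a[-1] >= full_b[-1] else "B"), nearest


# Invented meta-data: full learning curves (4 sample sizes) of A and B on two past datasets.
meta_db = {
    "d1": ([0.60, 0.70, 0.80, 0.90], [0.65, 0.70, 0.75, 0.78]),
    "d2": ([0.50, 0.55, 0.60, 0.62], [0.60, 0.70, 0.80, 0.88]),
}
print(predict_winner([0.60, 0.69], [0.66, 0.71], meta_db))  # ('A', 'd1')
print(predict_winner([0.51, 0.56], [0.59, 0.69], meta_db))  # ('B', 'd2')
```

In the first call B leads at both sampled sizes, yet the nearest stored curves (`d1`) show A overtaking B on the full data, so A is predicted.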

2007

A putative gene located at the MHC class I region around the D6S105 marker contributes to the setting of CD8+T-lymphocyte numbers in humans

Authors
Vieira, J; Cardoso, CS; Pinto, J; Patil, K; Brazdil, P; Cruz, E; Mascarenhas, C; Lacerda, R; Gartner, A; Almeida, S; Alves, H; Porto, G;

Publication
INTERNATIONAL JOURNAL OF IMMUNOGENETICS

Abstract
Significant associations between human leucocyte antigen (HLA)-A and -B alleles and CD8+ T-lymphocyte numbers have been reported in the literature, both in healthy populations and in HFE-haemochromatosis patients. To address whether the HLA alleles themselves or alleles at linked genes are responsible for these associations, several genetic markers in the MHC class I region were typed in a population of 147 apparently healthy, unrelated subjects phenotypically characterized for their CD8+ and CD4+ T-lymphocyte numbers. Using a machine learning approach, a set of rules was generated that predicts CD8+ T-lymphocyte numbers on the basis of D6S105 microsatellite alleles only. We demonstrate that the previously reported associations with HLA-A and -B alleles are due to the presence of common long (up to 4 megabases) haplotypes that recently increased in frequency due to positive selection and that encompass a region, in the neighbourhood of microsatellite locus D6S105 in the 6p21.3 region, where a putative gene contributing to the setting of CD8+ T-lymphocyte numbers is located.

2012

Selecting classification algorithms with active testing

Authors
Leite, R; Brazdil, P; Vanschoren, J;

Publication
Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)

Abstract
Given the large number of data mining algorithms, their combinations (e.g. ensembles) and possible parameter settings, finding the most adequate method to analyze a new dataset becomes an ever more challenging task. This is because in many cases testing all possibly useful alternatives quickly becomes prohibitively expensive. In this paper we propose a novel technique, called active testing, that intelligently selects the most useful cross-validation tests. It proceeds in a tournament-style fashion, in each round selecting and testing the algorithm that is most likely to outperform the best algorithm of the previous round on the new dataset. This 'most promising' competitor is chosen based on a history of prior duels between both algorithms on similar datasets. Each new cross-validation test contributes information to a better estimate of dataset similarity, and thus helps to better predict which algorithms are most promising on the new dataset. We evaluated this approach using a set of 292 algorithm-parameter combinations on 76 UCI classification datasets. The results show that active testing quickly yields an algorithm whose performance is very close to the optimum, after relatively few tests. It also provides a better solution than previously proposed methods. © 2012 Springer-Verlag.
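
A simplified version of the tournament can be sketched as below. Two ingredients are assumptions made for illustration: dataset similarity is measured as the share of already-tested algorithm pairs that are ranked the same way on a prior dataset, and a challenger's promise is the similarity-weighted sum of the margins by which it beat the current best in prior duels. The starting algorithm (best mean accuracy over prior datasets), the `evaluate` callback standing in for cross-validation, and all names and numbers are hypothetical.

```python
from itertools import combinations
from statistics import mean
from typing import Callable, Dict, List, Tuple

History = Dict[str, Dict[str, float]]  # prior dataset -> algorithm -> CV accuracy


def similarity(prior: Dict[str, float], tested: Dict[str, float]) -> float:
    """Share of tested algorithm pairs ranked the same way on a prior dataset (assumed measure)."""
    pairs = list(combinations(sorted(tested), 2))
    if not pairs:
        return 1.0  # no evidence yet: all prior datasets count equally
    agree = sum((prior[a] > prior[b]) == (tested[a] > tested[b]) for a, b in pairs)
    return agree / len(pairs)


def active_testing(algorithms: List[str], history: History,
                   evaluate: Callable[[str], float],
                   budget: int) -> Tuple[str, Dict[str, float]]:
    """Each round, test the challenger most likely to beat the current best, judged by
    similarity-weighted prior duels; stop after `budget` evaluations."""
    # Assumed starting point: best mean accuracy over the prior datasets.
    best = max(algorithms, key=lambda a: mean(accs[a] for accs in history.values()))
    tested = {best: evaluate(best)}
    while len(tested) < min(budget, len(algorithms)):
        sims = {d: similarity(accs, tested) for d, accs in history.items()}

        def promise(c: str) -> float:
            # Similarity-weighted margins by which c beat the current best on prior datasets.
            return sum(sims[d] * max(0.0, accs[c] - accs[best]) for d, accs in history.items())

        challenger = max((a for a in algorithms if a not in tested), key=promise)
        tested[challenger] = evaluate(challenger)  # stands in for a cross-validation run
        if tested[challenger] > tested[best]:
            best = challenger
    return best, tested


# Invented example: prior accuracies on two datasets and "true" accuracies on the new one.
history = {"d1": {"a": 0.80, "b": 0.70, "c": 0.90},
           "d2": {"a": 0.80, "b": 0.85, "c": 0.60}}
new_accuracy = {"a": 0.70, "b": 0.75, "c": 0.90}
print(active_testing(["a", "b", "c"], history, new_accuracy.__getitem__, budget=2))
# ('c', {'a': 0.7, 'c': 0.9})
```

With a budget of two tests, the sketch reaches `c`, the best of the three algorithms on the new dataset, without cross-validating `b`.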
