Cookies
O website necessita de alguns cookies e outros recursos semelhantes para funcionar. Caso o permita, o INESC TEC irá utilizar cookies para recolher dados sobre as suas visitas, contribuindo, assim, para estatísticas agregadas que permitem melhorar o nosso serviço. Ver mais
Aceitar Rejeitar
  • Menu
Publicações

Publicações por LIAAD

2011

Assemblathon 1: A competitive assessment of de novo short read assembly methods

Autores
Earl, D; Bradnam, K; St John, J; Darling, A; Lin, DW; Fass, J; Hung, OKY; Buffalo, V; Zerbino, DR; Diekhans, M; Nguyen, N; Ariyaratne, PN; Sung, WK; Ning, ZM; Haimel, M; Simpson, JT; Fonseca, NA; Birol, I; Docking, TR; Ho, IY; Rokhsar, DS; Chikhi, R; Lavenier, D; Chapuis, G; Naquin, D; Maillet, N; Schatz, MC; Kelley, DR; Phillippy, AM; Koren, S; Yang, SP; Wu, W; Chou, WC; Srivastava, A; Shaw, TI; Ruby, JG; Skewes Cox, P; Betegon, M; Dimon, MT; Solovyev, V; Seledtsov, I; Kosarev, P; Vorobyev, D; Ramirez Gonzalez, R; Leggett, R; MacLean, D; Xia, FF; Luo, RB; Li, ZY; Xie, YL; Liu, BH; Gnerre, S; MacCallum, I; Przybylski, D; Ribeiro, FJ; Yin, SY; Sharpe, T; Hall, G; Kersey, PJ; Durbin, R; Jackman, SD; Chapman, JA; Huang, XQ; DeRisi, JL; Caccamo, M; Li, YR; Jaffe, DB; Green, RE; Haussler, D; Korf, I; Paten, B;

Publicação
GENOME RESEARCH

Abstract
Low-cost short read sequencing technology has revolutionized genomics, though it is only just becoming practical for the high-quality de novo assembly of a novel large genome. We describe the Assemblathon 1 competition, which aimed to comprehensively assess the state of the art in de novo assembly methods when applied to current sequencing technologies. In a collaborative effort, teams were asked to assemble a simulated Illumina HiSeq data set of an unknown, simulated diploid genome. A total of 41 assemblies from 17 different groups were received. Novel haplotype aware assessments of coverage, contiguity, structure, base calling, and copy number were made. We establish that within this benchmark: ( 1) It is possible to assemble the genome to a high level of coverage and accuracy, and that ( 2) large differences exist between the assemblies, suggesting room for further improvements in current methods. The simulated benchmark, including the correct answer, the assemblies, and the code that was used to evaluate the assemblies is now public and freely available from http://www.assemblathon.org/.

2011

Amino acid pair- and triplet-wise groupings in the interior of alpha-helical segments in proteins

Autores
de Sousa, MM; Munteanu, CR; Pazos, A; Fonseca, NA; Camacho, R; Magalhaes, AL;

Publicação
JOURNAL OF THEORETICAL BIOLOGY

Abstract
A statistical approach has been applied to analyse primary structure patterns at inner positions of alpha-helices in proteins. A systematic survey was carried out in a recent sample of non-redundant proteins selected from the Protein Data Bank, which were used to analyse alpha-helix structures for amino acid pairing patterns. Only residues more than three positions apart from both termini of the alpha-helix were considered as inner. Amino acid pairings i, i+k(k = 1, 2, 3,4, 5), were analysed and the corresponding 20 x 20 matrices of relative global propensities were constructed. An analysis of (i, i+4, i+8) and (i, i+3, i+4) triplet patterns was also performed. These analysis yielded information on a series of amino acid patterns (pairings and triplets) showing either high or low preference for alpha-helical motifs and suggested a novel approach to protein alphabet reduction. In addition, it has been shown that the individual amino acid propensities are not enough to define the statistical distribution of these patterns. Global pair propensities also depend on the type of pattern, its composition and orientation in the protein sequence. The data presented should prove useful to obtain and refine useful predictive rules which can further the development and fine-tuning of protein structure prediction algorithms and tools.

2011

PopAffiliator: online calculator for individual affiliation to a major population group based on 17 autosomal short tandem repeat genotype profile

Autores
Pereira, L; Alshamali, F; Andreassen, R; Ballard, R; Chantratita, W; Cho, NS; Coudray, C; Dugoujon, JM; Espinoza, M; Gonzalez Andrade, F; Hadi, S; Immel, UD; Marian, C; Gonzalez Martin, A; Mertens, G; Parson, W; Perone, C; Prieto, L; Takeshita, H; Rangel Villalobos, HR; Zeng, ZS; Zhivotovsky, L; Camacho, R; Fonseca, NA;

Publicação
INTERNATIONAL JOURNAL OF LEGAL MEDICINE

Abstract
Because of their sensitivity and high level of discrimination, short tandem repeat (STR) maker systems are currently the method of choice in routine forensic casework and data banking, usually in multiplexes up to 15-17 loci. Constraints related to sample amount and quality, frequently encountered in forensic casework, will not allow to change this picture in the near future, notwithstanding the technological developments. In this study, we present a free online calculator named PopAffiliator (http://cracs.fc.up.pt/popaffiliator) for individual population affiliation in the three main population groups, Eurasian, East Asian and sub-Saharan African, based on genotype profiles for the common set of STRs used in forensics. This calculator performs affiliation based on a model constructed using machine learning techniques. The model was constructed using a data set of approximately fifteen thousand individuals collected for this work. The accuracy of individual population affiliation is approximately 86%, showing that the common set of STRs routinely used in forensics provide a considerable amount of information for population assignment, in addition to being excellent for individual identification.

2011

Predicting Malignancy from Mammography Findings and Surgical Biopsies

Autores
Ferreira, P; Fonseca, NA; Dutra, I; Woods, R; Burnside, E;

Publicação
2011 IEEE INTERNATIONAL CONFERENCE ON BIOINFORMATICS AND BIOMEDICINE (BIBM 2011)

Abstract
Breast screening is the regular examination of a woman's breasts to find breast cancer earlier. The sole exam approved for this purpose is mammography. Usually, findings are annotated through the Breast Imaging Reporting and Data System (BIRADS) created by the American College of Radiology. The BIRADS system determines a standard lexicon to be used by radiologists when studying each finding. Although the lexicon is standard, the annotation accuracy of the findings depends on the experience of the radiologist. Moreover, the accuracy of the classification of a mammography is also highly dependent on the expertise of the radiologist. A correct classification is paramount due to economical and humanitarian reasons. The main goal of this work is to produce machine learning models that predict the outcome of a mammography from a reduced set of annotated mammography findings. In the study we used a data set consisting of 348 consecutive breast masses that underwent image guided or surgical biopsy performed between October 2005 and December 2007 on 328 female subjects. The main conclusions are threefold: (1) automatic classification of a mammography, independent on information about mass density, can reach equal or better results than the classification performed by a physician; (2) mass density seems to be a good indicator of malignancy, as previous studies suggested; (3) a machine learning model can predict mass density with a quality as good as the specialist blind to biopsy, which is one of our main contributions. Our model can predict malignancy in the absence of the mass density attribute, since we can fill up this attribute using our mass density predictor.

2011

STUDYING THE RELEVANCE OF BREAST IMAGING FEATURES

Autores
Ferreira, P; Dutra, I; Fonseca, NA; Woods, R; Burnside, E;

Publicação
HEALTHINF 2011: PROCEEDINGS OF THE INTERNATIONAL CONFERENCE ON HEALTH INFORMATICS

Abstract
Breast screening is the regular examination of a woman's breasts to find breast cancer in an initial stage. The sole exam approved for this purpose is mammography that, despite the existence of more advanced technologies, is considered the cheapest and most efficient method to detect cancer in a preclinical stage. We investigate, using machine learning techniques, how attributes obtained from mammographies can relate to malignancy. In particular, this study focus is on how mass density can influence malignancy from a data set of 348 patients containing, among other information, results of biopsies. To this end, we applied different learning algorithms on the data set using the WEKA tools, and performed significance tests on the results. The conclusions are threefold: (1) automatic classification of a mammography can reach equal or better results than the ones annotated by specialists, which can help doctors to quickly concentrate on some specific mammogram for a more thorough study; (2) mass density seems to be a good indicator of malignancy, as previous studies suggested; (3) we can obtain classifiers that can predict mass density with a quality as good as the specialist blind to biopsy.

2011

A Relational Learning Approach to Structure-Activity Relationships in Drug Design Toxicity Studies

Autores
Camacho, Rui; Pereira, Max; Costa, VitorSantos; Fonseca, NunoA.; Gonçalves, CarlosAdriano; Simões, CarlosJ.V.; Brito, RuiM.M.;

Publicação
J. Integrative Bioinformatics

Abstract
It has been recognized that the development of new therapeutic drugs is a complex and expensive process. A large number of factors affect the activity in vivo of putative candidate molecules and the propensity for causing adverse and toxic effects is recognized as one of the major hurdles behind the current "target-rich, lead-poor" scenario. Structure-Activity Relationship (SAR) studies, using relational Machine Learning (ML) algorithms, have already been shown to be very useful in the complex process of rational drug design. Despite the ML successes, human expertise is still of the utmost importance in the drug development process. An iterative process and tight integration between the models developed by ML algorithms and the know-how of medicinal chemistry experts would be a very useful symbiotic approach. In this paper we describe a software tool that achieves that goal--iLogCHEM. The tool allows the use of Relational Learners in the task of identifying molecules or molecular fragments with potential to produce toxic effects, and thus help in stream-lining drug design in silico. It also allows the expert to guide the search for useful molecules without the need to know the details of the algorithms used. The models produced by the algorithms may be visualized using a graphical interface, that is of common use amongst researchers in structural biology and medicinal chemistry. The graphical interface enables the expert to provide feedback to the learning system. The developed tool has also facilities to handle the similarity bias typical of large chemical databases. For that purpose the user can filter out similar compounds when assembling a data set. Additionally, we propose ways of providing background knowledge for Relational Learners using the results of Graph Mining algorithms. Copyright 2011 The Author(s). Published by Journal of Integrative Bioinformatics.

  • 396
  • 506