Publications

Publications by CRACS

2015

DOTS: Drift Oriented Tool System

Authors
Costa, J; Silva, C; Antunes, M; Ribeiro, B;

Publication
NEURAL INFORMATION PROCESSING, ICONIP 2015, PT IV

Abstract
Drift is a given in most machine learning applications. The idea that models must accommodate changes, and thus be dynamic, is ubiquitous. Current challenges include temporal data streams, drift and non-stationary scenarios, often with text data, whether in social networks or in business systems. There are multiple types of drift patterns: concepts that appear and disappear suddenly, recurrently, or even gradually or incrementally. Researchers strive to propose and test algorithms and techniques to deal with drift in text classification, but it is difficult to find adequate benchmarks in such dynamic environments. In this paper we present DOTS, Drift Oriented Tool System, a framework that allows for the definition and generation of text-based datasets where drift characteristics can be thoroughly defined, implemented and tested. The usefulness of DOTS is presented using a Twitter stream case study. DOTS is used to define datasets and test the effectiveness of using different document representations in a Twitter scenario. Results show the potential of DOTS in machine learning research.

2015

Performance Evaluation of Statistical Functions

Authors
Rodrigues, A; Silva, C; Borges, P; Silva, S; Dutra, I;

Publication
2015 IEEE INTERNATIONAL CONFERENCE ON SMART CITY/SOCIALCOM/SUSTAINCOM (SMARTCITY)

Abstract
Statistical data analysis methods are well known for their difficulty in handling large numbers of instances or parameters. This is most noticeable in the presence of "big data", i.e., data that are heterogeneous and come from several sources, which makes their volume increase very rapidly. In this paper, we study popular and well-known statistical functions generally applied to data analysis, and assess their performance using our own implementation (DataIP), MatLab and R. We show that DataIP outperforms MatLab and R by several orders of magnitude and that the design and implementation of these functions need to be rethought to adapt to today's data challenges.

2015

Predicting malignancy from mammography findings and image-guided core biopsies

Authors
Ferreira, P; Fonseca, NA; Dutra, I; Woods, R; Burnside, E;

Publication
INTERNATIONAL JOURNAL OF DATA MINING AND BIOINFORMATICS

Abstract
The main goal of this work is to produce machine learning models that predict the outcome of a mammography from a reduced set of annotated mammography findings. In the study we used a dataset consisting of 348 consecutive breast masses that underwent image-guided core biopsy performed between October 2005 and December 2007 on 328 female subjects. We applied various algorithms with parameter variation to learn from the data. The tasks were to predict mass density and to predict malignancy. The best classifier that predicts mass density is based on a support vector machine and has an accuracy of 81.3%. The expert correctly annotated 70% of the mass densities. The best classifier that predicts malignancy is also based on a support vector machine and has an accuracy of 85.6%, with a positive predictive value of 85%. One important contribution of this work is that our model can predict malignancy in the absence of the mass density attribute, since we can fill in this attribute using our mass density predictor.

2015

Accelerating Recommender Systems using GPUs

Authors
Rodrigues, AV; Jorge, A; Dutra, I;

Publication
30TH ANNUAL ACM SYMPOSIUM ON APPLIED COMPUTING, VOLS I AND II

Abstract
We describe GPU implementations of the matrix recommender algorithms CCD++ and ALS. We compare the processing time and predictive ability of the GPU implementations with existing multi-core versions of the same algorithms. Results on the GPU are better than the results of the multi-core versions (maximum speedup of 14.8).

2015

Accelerating Recommender Systems using GPUs

Authors
Rodrigues, AV; Jorge, A; Dutra, I;

Publication
CoRR

Abstract

2015

Grid computing: Techniques and future prospects

Authors
Barbosa, JG; Dutra, I;

Publication
Grid Computing: Techniques and Future Prospects

Abstract
In the past two decades, grid computing has fostered advances in several scientific domains by making resources available to a wide community and bridging scientific gaps. Grid infrastructures have been harnessing computational resources all around the world, allowing all kinds of parallelism to be explored. Other approaches to parallel and distributed computing exist, such as dedicated high-performance computing (HPC) infrastructures and clouds for computing and storage, but grid computing continues to be the predominant technology used for scientific computing in Europe, through the European Grid Infrastructure (EGI) and the European Middleware Initiative (EMI). Currently, there is a trend towards the use of cloud technologies for computing and storage. In Europe, this trend is being followed by taking advantage of the experience gained from building grid infrastructures and the technologies developed around them (resource management orchestration, unified job description languages, security, user interfaces, programming models, and scheduling policies, among others). As a result, the European Grid Infrastructure Federated Cloud is being built on top of the grid infrastructure already available. After almost two decades of the development of grid software and components and the emergence of competing technologies, now is the time to discuss current trends and to assess future prospects. When organizing this book, the authors considered contributions that would review the current grid computing scenario as well as contributions that would summarize the main tools and technologies used so far. The chapters in this book provide reviews for the following topics: a) performance prediction for parallel and distributed computing systems, b) resource sharing on computational grids, c) economic models for resource management, and d) programming frameworks.
The chapters address grid issues such as a) the challenges of designing efficient job schedulers for production grids, b) scalability analysis of bag-of-tasks applications, c) the energy efficiency of resource reservation-based scheduling, and d) the development of parallel applications using the grid environment. Additionally, the following tools are presented: a) a programming framework based on the concept of a pluggable grid service that avoids explicit calls to grid services in scientific code and b) a desktop grid framework that runs on top of a cloud and can be deployed on the fly. The authors were each invited to contribute a chapter to this book; the chapters were carefully revised and selected based on their originality and the value of their contribution to the overall discussion on grid computing and its future prospects.
