
Publications by Mohammad Nozari

2016

Collaborative Data Analysis in Hyperconnected Transportation Systems

Authors
Zarmehri, MN; Soares, C;

Publication
COLLABORATION IN A HYPERCONNECTED WORLD

Abstract
Taxi trip duration affects operational efficiency, driver satisfaction and, above all, customer satisfaction, so it is an important metric for taxi companies. In particular, knowing the predicted trip duration beforehand is very useful for allocating taxis to taxi stands and for finding the best route for different trips. A hyperconnected network makes it possible to collect data from connected taxis in the city environment and to use it collaboratively across taxis for better prediction. In fact, given the high volume of data available, several models can be generated for each individual taxi. Moreover, taking into account the differences between the data collected by different taxis, the data can be organized into different levels of a hierarchy. However, finding the level of granularity that leads to the best model for an individual taxi can be computationally expensive. In this paper, we propose the use of metalearning to select, for each taxi, the right level of the hierarchy and the right algorithm, i.e. those that generate the model with the best performance. The approach is evaluated on data collected in the Drive-In project. The results show that metalearning helps select the algorithm with the best performance.

2015

Metalearning to Choose the Level of Analysis in Nested Data: A Case Study on Error Detection in Foreign Trade Statistics

Authors
Zarmehri, MN; Soares, C;

Publication
2015 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN)

Abstract
Traditionally, a single model is developed for a data mining task. As more data is collected at a more detailed level, organizations are becoming more interested in having specific models for distinct parts of the data (e.g. customer segments). From the business perspective, data can be divided naturally into different dimensions. Each of these dimensions is usually hierarchically organized (e.g. country, city, zip code), which means that, when developing a model for a given part of the problem (e.g. a zip code), the training data may be collected at different levels of this nested hierarchy (e.g. the same zip code, or the city or country it is located in). Selecting different levels of granularity may change the performance of the whole process, so the question is which level to use for a given part. We propose a metalearning model which recommends the level of granularity for the training data that is expected to yield the best-performing model. We apply decision tree and random forest algorithms for metalearning. At the base level, our experiments use results obtained by outlier detection methods on the problem of detecting errors in foreign trade transactions. The results show that metalearning helps find the best level of granularity.
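To give a flavour of the idea (a minimal sketch, not the paper's implementation, which uses decision trees and random forests): a metamodel maps simple metafeatures of each data partition to the hierarchy level whose model performed best on similar past partitions. The metafeature values and level labels below are hypothetical.

```python
import math

# Hypothetical meta-dataset: for past partitions we recorded simple
# metafeatures (n_examples, mean, std) and which hierarchy level
# ("zip", "city", "country") produced the best base-level model.
meta_examples = [
    ((120, 5.0, 1.2), "zip"),
    ((90, 4.8, 3.5), "city"),
    ((30, 5.1, 0.4), "country"),
    ((200, 4.9, 1.0), "zip"),
]

def recommend_level(metafeatures):
    """1-NN metamodel: recommend the level that worked best for the
    most similar past partition (Euclidean distance in metafeature space)."""
    def dist(a, b):
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
    _, level = min(meta_examples, key=lambda ex: dist(ex[0], metafeatures))
    return level

print(recommend_level((110, 5.0, 1.1)))  # closest past partition favoured "zip"
```

The payoff is the one described in the abstract: instead of training and evaluating a model at every level of the hierarchy for every partition, only the recommended level needs to be used.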

2013

POPSTAR at RepLab 2013: Name ambiguity resolution on Twitter

Authors
Saleiro, P; Rei, L; Pasquali, A; Soares, C; Teixeira, J; Pinto, F; Nozari, M; Felix, C; Strecht, P;

Publication
CEUR Workshop Proceedings

Abstract
Filtering tweets relevant to a given entity is an important task for online reputation management systems, as it contributes to a reliable analysis of opinions and trends regarding that entity. In this paper we describe our participation in the Filtering Task of RepLab 2013. The goal of the competition is to classify a tweet as relevant or not relevant to a given entity. To address this task we studied a large set of features that can be generated to describe the relationship between an entity and a tweet. We explored different learning algorithms as well as different types of features: text, keyword similarity scores between entities' metadata and tweets, the Freebase entity graph, and Wikipedia. The test set of the competition comprises more than 90,000 tweets for 61 entities from four distinct categories: automotive, banking, universities and music. Results show that our approach achieves a Reliability of 0.72 and a Sensitivity of 0.45 on the test set, corresponding to an F-measure of 0.48 and an Accuracy of 0.908.

2015

Using Metalearning for Prediction of Taxi Trip Duration Using Different Granularity Levels

Authors
Zarmehri, MN; Soares, C;

Publication
Advances in Intelligent Data Analysis XIV

Abstract
Trip duration is an important metric for the management of taxi companies, as it affects operational efficiency, driver satisfaction and, above all, customer satisfaction. In particular, the ability to predict trip duration in advance can be very useful for allocating taxis to stands and finding the best route for trips. A data mining approach can be used to generate models for trip time prediction. In fact, given the amount of data available, different models can be generated for different taxis. Given the difference between the data collected by different taxis, the best model for each one can be obtained with different algorithms and/or parameter settings. However, finding the configuration that generates the best model for each taxi is computationally very expensive. In this paper, we propose the use of metalearning to address the problem of selecting the algorithm that generates the model with the most accurate predictions for each taxi. The approach is tested on data collected in the Drive-In project. Our results show that metalearning can help to select the algorithm with the best accuracy.

2017

Entropy and Compression Capture Different Complexity Features: The Case of Fetal Heart Rate

Authors
Monteiro Santos, J; Goncalves, H; Bernardes, J; Antunes, L; Nozari, M; Costa Santos, C;

Publication
ENTROPY

Abstract
Entropy and compression have been used to distinguish fetuses at risk of hypoxia from their healthy counterparts through the analysis of the Fetal Heart Rate (FHR). The low correlation observed between these two approaches suggests that they capture different complexity features. This study aims at characterizing the complexity of FHR features captured by entropy and compression, using international guidelines as reference. Single- and multi-scale approaches were considered in the computation of entropy and compression. The following physiologic-based features were considered: FHR baseline; percentage of abnormal long-term (%abLTV) and short-term (%abSTV) variability; average short-term variability; and number of accelerations and decelerations. All of the features were computed on a set of 68 intrapartum FHR tracings, divided into normal, mildly, and moderately-severely acidemic born fetuses. The correlation between entropy/compression features and the physiologic-based features was assessed. Compression was correlated with accelerations and decelerations, but neither accelerations nor decelerations were significantly correlated with entropies. The %abSTV was significantly correlated with entropies (correlations ranging between 0.54 and 0.62) and, to a higher extent, with compression (between 0.80 and 0.94). The distinction between groups was clearer at the lower scales using entropy and at the higher scales using compression. Entropy and compression are thus complementary complexity measures.
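The complementarity of the two measure families can be illustrated with a toy example (a simplified sketch, not the paper's estimators): the Shannon entropy of the symbol distribution ignores ordering, while a compressor (here zlib, as a stand-in for whatever compressor one chooses) exploits repeated patterns, so two signals with identical histograms can look equally complex to one measure and very different to the other.

```python
import math
import random
import zlib
from collections import Counter

def shannon_entropy(seq):
    """Shannon entropy (bits/symbol) of the empirical symbol distribution."""
    counts = Counter(seq)
    n = len(seq)
    return -sum(c / n * math.log2(c / n) for c in counts.values())

def compression_ratio(seq):
    """Compressed size / original size (zlib); lower means more regularity."""
    data = bytes(seq)
    return len(zlib.compress(data, 9)) / len(data)

periodic = [10, 20, 30, 40] * 250   # strongly ordered, repetitive "signal"
shuffled = list(periodic)
random.seed(0)
random.shuffle(shuffled)            # same histogram, order destroyed

# Entropy of the distribution is identical (2.0 bits/symbol for both),
# but compression distinguishes order from disorder.
print(shannon_entropy(periodic), shannon_entropy(shuffled))
print(compression_ratio(periodic) < compression_ratio(shuffled))  # True
```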

2013

Numerical Limits for Data Gathering in Wireless Networks

Authors
Zarmehri, MN; Aguiar, A;

Publication
2013 IEEE 24TH INTERNATIONAL SYMPOSIUM ON PERSONAL, INDOOR, AND MOBILE RADIO COMMUNICATIONS (PIMRC)

Abstract
In our previous work, we proposed using a vehicular network for data gathering, i.e. as an urban sensor. In this paper, we aim at understanding the theoretical limits of data gathering in a time-slotted wireless network in terms of the maximum service rate per node and the end-to-end packet delivery ratio (PDR). The capacity of wireless networks has been widely studied, and bounds for that capacity have been expressed in Bachmann-Landau notation [1]. But these asymptotic limits do not clarify the numeric limits on the data packets that a wireless network can carry. In this paper, we calculate the maximum data that each node can generate before saturating the network. The expected number of collisions and its effect on the PDR and service rate are investigated. The results quantify the trade-off between packet delivery ratio and service rate. Finally, we verify our analytical results by simulating the same scenario.
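The kind of numeric (rather than asymptotic) limit discussed above can be made concrete with a textbook slotted-random-access model (a simplified sketch with assumed parameters, not the paper's analysis): if each of n nodes transmits in a slot independently with probability p, a node's transmission succeeds only when the other n-1 stay silent, which directly exposes the PDR/service-rate trade-off.

```python
def per_node_success(n, p):
    """Per-node service rate (successful packets per slot): the node
    transmits (prob. p) and the other n-1 nodes stay silent."""
    return p * (1 - p) ** (n - 1)

def pdr(n, p):
    """Packet delivery ratio: fraction of attempted transmissions
    that avoid a collision."""
    return (1 - p) ** (n - 1)

n = 20
for p in (0.01, 1 / n, 0.1, 0.2):
    print(f"p={p:.3f}  service rate={per_node_success(n, p):.4f}  PDR={pdr(n, p):.3f}")
# Raising p increases attempts, but collisions erode the PDR;
# the per-node service rate peaks near p = 1/n.
```

Differentiating p(1-p)^(n-1) shows the maximum is exactly at p = 1/n, i.e. a concrete number of packets per slot per node, which is the flavour of numeric limit the paper is after.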
