Cookies Policy
The website need some cookies and similar means to function. If you permit us, we will use those means to collect data on your visits for aggregated statistics to improve our service. Find out More
Accept Reject
  • Menu
About

About

I am a researcher at LIAAD, INESC TEC Porto. I have successfully concluded a Ph.D. in Informatics Engineering at FEUP, University of Porto under the supervision of Prof. João Gama. I also hold the degrees of Bachelor’s and Master’s in Computer Applications from Kakatiya University (KU) in India. My research focus is on Machine Learning, Big Data Science, Networked Data Streams, Evolving Graphs, and Social Network Analysis. I was an Organizing Committee member of Discovery Challenge at EPIA’2017, High Velocity Mobile Data Mining Workshop at IEEE MDM’2016 and DSIE’2015.

Interest
Topics
Details

Details

  • Name

    Shazia Tabassum
  • Role

    Assistant Researcher
  • Since

    01st April 2015
002
Publications

2023

Social network analytics and visualization: Dynamic topic-based influence analysis in evolving micro-blogs

Authors
Tabassum, S; Gama, J; Azevedo, PJ; Cordeiro, M; Martins, C; Martins, A;

Publication
EXPERT SYSTEMS

Abstract
Influence Analysis is one of the well-known areas of Social Network Analysis. However, discovering influencers from micro-blog networks based on topics has gained recent popularity due to its specificity. Besides, these data networks are massive, continuous and evolving. Therefore, to address the above challenges we propose a dynamic framework for topic modelling and identifying influencers in the same process. It incorporates dynamic sampling, community detection and network statistics over graph data stream from a social media activity management application. Further, we compare the graph measures against each other empirically and observe that there is no evidence of correlation between the sets of users having large number of friends and the users whose posts achieve high acceptance (i.e., highly liked, commented and shared posts). Therefore, we propose a novel approach that incorporates a user's reachability and also acceptability by other users. Consequently, we improve on graph metrics by including a dynamic acceptance score (integrating content quality with network structure) for ranking influencers in micro-blogs. Additionally, we analysed the topic clusters' structure and quality with empirical experiments and visualization.

2021

Dynamic Topic Modeling Using Social Network Analytics

Authors
Tabassum, S; Gama, J; Azevedo, P; Teixeira, L; Martins, C; Martins, A;

Publication
PROGRESS IN ARTIFICIAL INTELLIGENCE (EPIA 2021)

Abstract
Topic modeling or inference has been one of the well-known problems in the area of text mining. It deals with the automatic categorisation of words or documents into similarity groups also known as topics. In most of the social media platforms such as Twitter, Instagram, and Facebook, hashtags are used to define the content of posts. Therefore, modelling of hashtags helps in categorising posts as well as analysing user preferences. In this work, we tried to address this problem involving hashtags that stream in real-time. Our approach encompasses graph of hashtags, dynamic sampling and modularity based community detection over the data from a popular social media engagement application. Further, we analysed the topic clusters' structure and quality using empirical experiments. The results unveil latent semantic relations between hashtags and also show frequent hashtags in a cluster. Moreover, in this approach, the words in different languages are treated synonymously. Besides, we also observed top trending topics and correlated clusters.

2020

privy: Privacy Preserving Collaboration Across Multiple Service Providers to Combat Telecom Spams

Authors
Azad, MA; Bag, S; Tabassum, S; Hao, F;

Publication
IEEE TRANSACTIONS ON EMERGING TOPICS IN COMPUTING

Abstract
Nuisance or unsolicited calls and instant messages come at any time in a variety of different ways. These calls would not only exasperate recipients with the unwanted ringing, impacting their productivity, but also lead to a direct financial loss to users and service providers. Telecommunication Service Providers (TSPs) often employ standalone detection systems to classify call originators as spammers or non-spammers using their behavioral patterns. These approaches perform well when spammers target a large number of recipients of one service provider. However, professional spammers try to evade the standalone systems by intelligently reducing the number of spam calls sent to one service provider, and instead distribute calls to the recipients of many service providers. Naturally, collaboration among service providers could provide an effective defense, but it brings the challenge of privacy protection and system resources required for the collaboration process. In this paper, we propose a novel decentralized collaborative system named privy for the effective blocking of spammers who target multiple TSPs. More specifically, we develop a system that aggregates the feedback scores reported by the collaborating TSPs without employing any trusted third party system, while preserving the privacy of users and collaborators. We evaluate the system performance of privy using both the synthetic and real call detail records. We find that privy can correctly block spammers in a quicker time, as compared to standalone systems. Further, we also analyze the security and privacy properties of the privy system under different adversarial models.

2020

On fast and scalable recurring link's prediction in evolving multi-graph streams

Authors
Tabassum, S; Veloso, B; Gama, J;

Publication
NETWORK SCIENCE

Abstract
The link prediction task has found numerous applications in real-world scenarios. However, in most of the cases like interactions, purchases, mobility, etc., links can re-occur again and again across time. As a result, the data being generated is excessively large to handle, associated with the complexity and sparsity of networks. Therefore, we propose a very fast, memory-less, and dynamic sampling-based method for predicting recurring links for a successive future point in time. This method works by biasing the links exponentially based on their time of occurrence, frequency, and stability. To evaluate the efficiency of our method, we carried out rigorous experiments with massive real-world graph streams. Our empirical results show that the proposed method outperforms the state-of-the-art method for recurring links prediction. Additionally, we also empirically analyzed the evolution of links with the perspective of multi-graph topology and their recurrence probability over time.

2020

Interconnect bypass fraud detection: a case study

Authors
Veloso, B; Tabassum, S; Martins, C; Espanha, R; Azevedo, R; Gama, J;

Publication
ANNALS OF TELECOMMUNICATIONS

Abstract
The high asymmetry of international termination rates is fertile ground for the appearance of fraud in telecom companies. International calls have higher values when compared with national ones, which raises the attention of fraudsters. In this paper, we present a solution for a real problem called interconnect bypass fraud, more specifically, a newly identified distributed pattern that crosses different countries and keeps fraudsters from being tracked by almost all fraud detection techniques. This problem is one of the most expressive in the telecommunication domain, and it has some abnormal behaviours like the occurrence of a burst of calls from specific numbers. Based on this assumption, we propose the adoption of a new fast forgetting technique that works together with the Lossy Counting algorithm. We apply frequent set mining to capture distributed patterns from different countries. Our goal is to detect as soon as possible items with abnormal behaviours, e.g., bursts of calls, repetitions, mirrors, distributed behaviours and a small number of calls spread by a vast set of destination numbers. The results show that the application of different techniques improves the detection ratio and not only complements the techniques used by the telecom company but also improves the performance of the Lossy Counting algorithm in terms of run-time, memory used and sensibility to detect the abnormal behaviours. Additionally, the application of frequent set mining allows us to capture distributed fraud patterns.