Polarized User and Topic Tracking in Twitter
Short paper accepted at SIGIR '16: ACM Conference on Research and Development in Information Retrieval.
Abstract. Digital traces of conversations in micro-blogging platforms and in OSNs provide information about user opinion with a high degree of resolution. These information sources can be exploited to understand and monitor collective behaviors. In this work, we focus on polarization classes, i.e., those topics that require the user to side exclusively with one position. The proposed method provides an iterative classification of users and keywords: first, polarized users are identified, then polarized keywords are discovered by monitoring the activities of the previously classified users. This method thus makes it possible to track users and topics over time. We report several experiments conducted on two Twitter datasets during political election time-frames. We measure the user classification accuracy on a golden set of users, and we provide an analysis of the relevance of the extracted keywords for the ongoing political discussion.
Our method requires some initial seed topics that identify the classes of interest. We propose to identify them with a single textual keyword for each class. Although each keyword identifies a topic, e.g., a political party, it alone is not sufficient to correctly classify users.
The Polarization TRacker (PTR) algorithm iterates two classification steps, namely UserClass and HashtagsClass, that progressively refine the assignment of users, and of the hashtags they use, to the polarization classes. The goal of the first step is to identify polarized users on the basis of the given hashtags. First, we identify polarized tweets, i.e., those mentioning seed hashtags, and we discard any tweet containing hashtags that belong to more than one polarization class. We then measure the user polarization: if, for one polarization class, the number of tweets by a user is significantly larger than for any other class, the user is labeled with that class. The goal of the second step is to process all the hashtags adopted by classified users in order to discover a new set of discriminating hashtags. Here we take into consideration all the hashtags used, and not only those occurring in the polarized tweets as in the previous step. This allows us to extend the analysis to the full set of topics discussed by the users, even those not captured in the early iterations of the algorithm.
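The two steps above can be sketched as follows. This is an illustrative Python rendering, not the authors' implementation: the data layout (tweets as user/hashtag-set pairs) and the thresholds `margin` and `min_support` are assumptions introduced here, since the paper does not specify how "significantly larger" is quantified.

```python
from collections import Counter, defaultdict

# Illustrative sketch of the PTR iteration. Tweets are (user, hashtag-set)
# pairs; `seeds` maps each polarization class to its seed hashtags.
# The thresholds `margin` and `min_support` are assumed, not from the paper.

def user_class(tweets, class_hashtags, margin=2.0):
    """UserClass step: label a user with a class when their polarized-tweet
    count for that class dominates every other class by `margin`."""
    counts = defaultdict(Counter)  # user -> class -> polarized-tweet count
    for user, tags in tweets:
        hits = {c for c, hs in class_hashtags.items() if hs & tags}
        if len(hits) == 1:  # discard tweets mixing several classes
            counts[user][hits.pop()] += 1
    labels = {}
    for user, cnt in counts.items():
        (top, n1), *rest = cnt.most_common()
        n2 = rest[0][1] if rest else 0
        if n1 >= margin * max(n2, 1):
            labels[user] = top
    return labels

def hashtags_class(tweets, labels, min_support=2):
    """HashtagsClass step: assign a hashtag to a class when it is used,
    in any tweet of a labeled user, predominantly by that class."""
    usage = defaultdict(Counter)  # hashtag -> class -> usage count
    for user, tags in tweets:
        if user in labels:
            for tag in tags:
                usage[tag][labels[user]] += 1
    new_tags = defaultdict(set)
    for tag, cnt in usage.items():
        (top, n1), *rest = cnt.most_common()
        if n1 >= min_support and all(n1 > 2 * n for _, n in rest):
            new_tags[top].add(tag)
    return new_tags

def ptr(tweets, seeds, iterations=5):
    """Iterate the two steps, growing each class's hashtag set."""
    class_hashtags = {c: set(hs) for c, hs in seeds.items()}
    labels = {}
    for _ in range(iterations):
        labels = user_class(tweets, class_hashtags)
        for c, tags in hashtags_class(tweets, labels).items():
            class_hashtags[c] |= tags
    return labels, class_hashtags
```

Note how the iteration lets the algorithm reach users who never used a seed hashtag: once a co-occurring hashtag is absorbed into a class, users of that hashtag can be labeled in the next round.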
We built an evaluation dataset by identifying those users whose opinion can be inferred with high confidence. During elections, as for other events, very specific hashtags are used on Twitter to express a strong intention of vote or an explicit membership in a group. We assume that users who frequently use one of such hashtags are strongly sided with one of the competing parties and will not change their minds in the short term. Such hashtags, named golden hashtags, were handpicked among the 500 most frequent in the data. The golden hashtags used are of the kind #i-vote-party.
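Selecting the golden users can be sketched as below. This is a hypothetical helper under the same assumed data layout as above; the `min_uses` frequency threshold and the exclusivity requirement are assumptions introduced for illustration.

```python
from collections import Counter, defaultdict

def golden_users(tweets, golden, min_uses=3):
    """Keep users who frequently use the golden hashtags of exactly one
    party. `tweets` is a list of (user, hashtag-set) pairs; `golden`
    maps each party to its hand-picked golden hashtags.
    `min_uses` is an assumed threshold, not specified in the paper."""
    counts = defaultdict(Counter)  # user -> party -> golden-hashtag uses
    for user, tags in tweets:
        for party, hs in golden.items():
            counts[user][party] += len(hs & tags)
    gold = {}
    for user, cnt in counts.items():
        (top, n1), *rest = cnt.most_common()
        if n1 >= min_uses and all(n == 0 for _, n in rest):
            gold[user] = top  # exclusively sided with one party
    return gold
```

Users who mix golden hashtags of different parties, or who use them only occasionally, are excluded, so the resulting labels can serve as high-confidence ground truth.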
This evaluation dataset is used to evaluate the user classification accuracy of the proposed algorithm. Experimental results show that the F-measure achieved by the proposed PTR provides an overall improvement over the k-means baseline of +71% and +7% on datasets IT13 and EU14, respectively.