Multilabel user classification using the community structure of online networks

PLoS One. 2017 Mar 9;12(3):e0173347. doi: 10.1371/journal.pone.0173347. eCollection 2017.

Abstract

We study the problem of semi-supervised, multi-label user classification of networked data in the online social platform setting. We propose a framework that combines unsupervised community extraction and supervised, community-based feature weighting before training a classifier. We introduce Approximate Regularized Commute-Time Embedding (ARCTE), an algorithm that projects the users of a social graph onto a latent space. Instead of packing the global structure into a matrix of predefined rank, as many spectral and neural representation learning methods do, it extracts local communities for all users in the graph in order to learn a sparse embedding. To this end, we employ an improvement of personalized PageRank algorithms for searching locally in each user's graph structure. Then, we perform supervised community feature weighting in order to boost the importance of highly predictive communities. We assess our method's performance on user classification through an extensive comparative study against various recent methods based on graph embeddings. The comparison shows that ARCTE significantly outperforms the competition in almost all cases, achieving up to 35% relative improvement over the second-best competing method in terms of F1-score.
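
The idea of building a sparse embedding from per-user local communities can be illustrated with a minimal sketch. It is not the ARCTE algorithm itself: it assumes plain power-iteration personalized PageRank in place of the improved variant the abstract mentions, a fixed-size cut on degree-normalized scores in place of whatever stopping rule the paper uses, and it leaves out the supervised community weighting step. The names `personalized_pagerank`, `local_community`, `community_embedding`, and the toy graph `adj` are all hypothetical.

```python
from collections import defaultdict


def personalized_pagerank(adj, seed, alpha=0.15, iters=100):
    """Power iteration for p = alpha * e_seed + (1 - alpha) * W p.

    Assumes an undirected graph as an adjacency dict with no isolated nodes.
    Plain power iteration is a stand-in, not the paper's improved variant.
    """
    p = {v: 0.0 for v in adj}
    p[seed] = 1.0
    for _ in range(iters):
        nxt = {v: 0.0 for v in adj}
        nxt[seed] = alpha  # restart mass returns to the seed user
        for u, nbrs in adj.items():
            share = (1.0 - alpha) * p[u] / len(nbrs)
            for v in nbrs:
                nxt[v] += share  # spread the rest evenly over neighbours
        p = nxt
    return p


def local_community(adj, seed, size=3, alpha=0.15):
    """Top-`size` nodes by degree-normalized score (a simplifying assumption)."""
    scores = personalized_pagerank(adj, seed, alpha)
    ranked = sorted(adj, key=lambda v: scores[v] / len(adj[v]), reverse=True)
    return set(ranked[:size])


def community_embedding(adj, size=3):
    """Sparse embedding: user -> set of community ids (one per seed user)."""
    rows = defaultdict(set)
    for seed in adj:
        for member in local_community(adj, seed, size):
            rows[member].add(seed)  # column `seed` is nonzero for `member`
    return dict(rows)


# Toy graph: two triangles (0-1-2 and 3-4-5) joined by the edge 2-3.
adj = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2, 4, 5], 4: [3, 5], 5: [3, 4]}
embedding = community_embedding(adj)
```

Each user's row holds only the communities that user belongs to, so the number of nonzero features depends on local structure rather than on a rank fixed in advance. A classifier would then be trained on these rows after reweighting the columns.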

MeSH terms

  • Algorithms
  • Computer Graphics
  • Internet*
  • Models, Theoretical
  • Social Networking*

Grants and funding

GR, SP and YK received support from the Community Research and Development Information Service of the European Commission under contract number 610928. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.