|Title||Think Before Your Click: Data and Models for Adult Content in Arabic Twitter|
|Publication Type||Conference Paper|
|Year of Publication||2017|
|Authors||Alshehri A, Nagoudi EMoatez Bil, Alhuzali H, Abdul-Mageed M|
|Conference Name||Second Workshop on Text Analytics for Cybersecurity and Online Safety (TA-COS 2018)|
|Publisher||European Language Resources Association (ELRA)|
|Conference Location||Miyazaki, Japan|
Given the widespread use of social media and their growing role in our lives today, there is a pressing need to ensure the safety of these online spaces. In particular, the spread of adult content in social networks is considered undesirable by various social groups and may even pose a threat to others (e.g., children). In this work, we develop a unique, large-scale dataset of adult content in Arabic Twitter and provide in-depth analyses of the data. The dataset enables us to study the scope and distribution of adult content in the Arabic Twitter sphere, thus possibly uncovering target geographic locales. We also exploit the data to learn a large lexicon specific to the topic of adult content. We further utilize the data to detect spreaders of adult content on the microblogging platform. Our models achieve promising results, reaching 79% accuracy on the task (24% higher than a competitive baseline, p < 0.3).