Octopus: University of The Aegean Research & Project Outputs System

Journal

Authors:	Stamatatos E.
Title:	On the Robustness of Authorship Attribution Based on Character n-gram Features
Journal:	Journal of Law and Policy
Volume:	21
Number:	2
Pages:	421-439
Year:	2013
Publisher:	Brooklyn Law School
To appear:	No
Link:	http://practicum.brooklaw.edu/journals/journal-law-and-policy/volume-21/issue-2/robustness-authorship-attribution-based-character
ISI:	No
Abstract:	A number of independent authorship attribution studies have demonstrated the effectiveness of character n-gram features for representing the stylistic properties of text. However, the vast majority of these studies examined the simple case where the training and test corpora are similar in terms of genre, topic, and distribution of the texts. Hence, there are doubts whether such a simple, low-level representation is equally effective in realistic conditions where some of these factors cannot be kept stable. In this study, the robustness of authorship attribution based on character n-gram features is tested under cross-genre and cross-topic conditions. In addition, the distribution of texts over the candidate authors is varied between the training and test corpora to imitate real cases. Comparative results with another competitive text representation approach based on very frequent words show that character n-grams are better able to capture stylistic properties of text when there are significant differences between the training and test corpora. Moreover, a set of guidelines for tuning an authorship attribution model according to the properties of the training and test corpora is provided.
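
For readers unfamiliar with the two text representations the abstract compares, the following is a minimal sketch of how each might be computed. It is not the paper's implementation: the function names `char_ngram_profile` and `frequent_word_profile`, the default n-gram length of 3, the `top_k` cutoff, and the whitespace tokenization are all assumptions made for illustration.

```python
from collections import Counter


def char_ngram_profile(text, n=3, top_k=None):
    """Relative frequencies of overlapping character n-grams.

    Hypothetical helper: n and top_k are illustrative parameters,
    not values taken from the paper.
    """
    # Slide a window of width n over the raw text, one character at a time.
    grams = [text[i:i + n] for i in range(len(text) - n + 1)]
    counts = Counter(grams)
    total = sum(counts.values())
    if total == 0:
        return {}
    # Keep only the top_k most frequent n-grams (all of them if top_k is None).
    return {g: c / total for g, c in counts.most_common(top_k)}


def frequent_word_profile(text, top_k=None):
    """Relative frequencies of the most frequent words.

    Assumes simple lowercasing and whitespace tokenization.
    """
    words = text.lower().split()
    counts = Counter(words)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {w: c / total for w, c in counts.most_common(top_k)}


if __name__ == "__main__":
    sample = "The cat sat on the mat because the mat was warm."
    print(char_ngram_profile(sample, n=3, top_k=5))
    print(frequent_word_profile(sample, top_k=3))
```

Either profile could then be used as a feature vector for a classifier over candidate authors. Under this sketch, the tuning the abstract alludes to would correspond to choices such as the n-gram length and how many of the most frequent features to keep.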