Conference

Authors: Kourtis I., Stamatatos E.
Title: Author Identification Using Semi-supervised Learning
Conference: 5th Int. Workshop on Uncovering Plagiarism, Authorship, and Social Software Misuse (PAN-11)
Editors:
Ed: No
Eds: No
Pages:
To appear: No
Month:
Year: 2011
Place:
Publisher:
Link:
File name:
Abstract: Author identification models fall into two major categories according to the way they handle the training texts: profile-based models produce one representation per author, while instance-based models produce one representation per text. In this paper, we propose an approach that combines two well-known representatives of these categories, namely the Common n-Grams method and a Support Vector Machine classifier based on character n-grams. The outputs of these classifiers are combined to enrich the training set with additional documents in an iterative semi-supervised procedure inspired by the co-training algorithm. The evaluation results on closed-set author identification are encouraging, especially when the set of candidate authors is large.