Authors: | Kourtis I., Stamatatos E. |
---|
Title: | Author Identification Using Semi-supervised Learning |
---|
Conference: | 5th Int. Workshop on Uncovering Plagiarism, Authorship, and Social Software Misuse (PAN-11) |
---|
Editors: | |
---|
Ed: | No |
---|
Eds: | No |
---|
Pages: | |
---|
To appear: | No |
---|
Month: | |
---|
Year: | 2011 |
---|
Place: | |
---|
Publisher: | |
---|
Link: | |
---|
File name: | |
---|
Abstract: | Author identification models fall into two major categories according to the way they handle the training texts: profile-based models produce one representation per author while instance-based models produce one representation per text. In this paper, we propose an approach that combines two well-known representatives of these categories, namely the Common n-Grams method and a Support Vector Machine classifier based on character n-grams. The outputs of these classifiers are combined to enrich the training set with additional documents in a repetitive semi-supervised procedure inspired by the co-training algorithm. The evaluation results on closed-set author identification are encouraging, especially when the set of candidate authors is large. |
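
The procedure outlined in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation and rests on several assumptions: the instance-based model is a 1-nearest-neighbour classifier over character n-gram counts standing in for the SVM, an unlabeled document is added to the training set only when both classifiers agree on its author (the abstract does not state the selection rule), and all function names and parameters (`enrich`, `rounds`, `top`, n = 3) are hypothetical.

```python
from collections import Counter
import math

def ngrams(text, n=3):
    """Character n-gram counts of a text."""
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))

def rel_freq(counts, top=None):
    """Relative frequencies of the `top` most frequent n-grams."""
    total = sum(counts.values()) or 1
    return {g: c / total for g, c in counts.most_common(top)}

def cng_distance(p1, p2):
    """CNG-style dissimilarity between two n-gram profiles."""
    d = 0.0
    for g in set(p1) | set(p2):
        f1, f2 = p1.get(g, 0.0), p2.get(g, 0.0)
        d += (2 * (f1 - f2) / (f1 + f2)) ** 2
    return d

def profile_predict(train, text, top=500):
    """Profile-based model: one n-gram profile per author."""
    by_author = {}
    for doc, author in train:
        by_author.setdefault(author, Counter()).update(ngrams(doc))
    target = rel_freq(ngrams(text), top)
    return min(sorted(by_author),
               key=lambda a: cng_distance(rel_freq(by_author[a], top), target))

def cosine(c1, c2):
    """Cosine similarity between two n-gram count vectors."""
    dot = sum(v * c2.get(g, 0) for g, v in c1.items())
    n1 = math.sqrt(sum(v * v for v in c1.values()))
    n2 = math.sqrt(sum(v * v for v in c2.values()))
    return dot / (n1 * n2) if n1 and n2 else 0.0

def instance_predict(train, text):
    """Instance-based model: one representation per training text.
    Assumption: 1-NN on cosine similarity, a stand-in for the SVM."""
    target = ngrams(text)
    return max(train, key=lambda pair: cosine(ngrams(pair[0]), target))[1]

def enrich(train, unlabeled, rounds=3):
    """Repeatedly move unlabeled texts into the training set.
    Assumption: a text is added only when both models agree on its author."""
    train, pool = list(train), list(unlabeled)
    for _ in range(rounds):
        newly_labeled, still_unlabeled = [], []
        for text in pool:
            p = profile_predict(train, text)
            q = instance_predict(train, text)
            if p == q:
                newly_labeled.append((text, p))
            else:
                still_unlabeled.append(text)
        if not newly_labeled:
            break  # nothing more can be added
        train.extend(newly_labeled)
        pool = still_unlabeled
    return train, pool
```

Calling `enrich(train, unlabeled)` with `train` as a list of `(text, author)` pairs and `unlabeled` as a list of texts returns the enlarged training set and the texts that remained unlabeled. Requiring agreement between the two models is one conservative way to limit label noise; other combination rules are equally compatible with the abstract.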