Authors: | Kourtis I., Stamatatos E. |
---|
Title: | Author Identification Using Semi-supervised Learning |
---|
Conference: | 5th Int. Workshop on Uncovering Plagiarism, Authorship, and Social Software Misuse (PAN-11) |
---|
Editors: | |
---|
Ed: | No |
---|
Eds: | No |
---|
Pages: | |
---|
To be displayed: | No |
---|
Month: | |
---|
Year: | 2011 |
---|
Location: | |
---|
Publisher: | |
---|
Link: | |
---|
File name: | |
---|
Abstract: | Author identification models fall into two major categories according to the way they handle the training texts: profile-based models produce one representation per author while instance-based models produce one representation per text. In this paper, we propose an approach that combines two well-known representatives of these categories, namely the Common n-Grams method and a Support Vector Machine classifier based on character n-grams. The outputs of these classifiers are combined to enrich the training set with additional documents in a repetitive semi-supervised procedure inspired by the co-training algorithm. The evaluation results on closed-set author identification are encouraging, especially when the set of candidate authors is large. |
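
The idea summarized in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The profile-based side follows the standard Common n-Grams dissimilarity (one character n-gram profile per author). For the instance-based side, a nearest-neighbour rule over per-text n-gram profiles is swapped in for the Support Vector Machine so that the example needs only the standard library. The agreement rule in `augment`, the function names, and the parameters `n`, `L` and `rounds` are assumptions made for illustration; the abstract does not specify how the two outputs are combined.

```python
import math
from collections import Counter


def profile(text, n=3, L=500):
    """Normalized frequencies of the L most frequent character n-grams."""
    grams = Counter(text[i:i + n] for i in range(len(text) - n + 1))
    total = sum(grams.values())
    return {g: c / total for g, c in grams.most_common(L)}


def cng_distance(p1, p2):
    """Common n-Grams dissimilarity between two profiles."""
    dist = 0.0
    for g in set(p1) | set(p2):
        f1, f2 = p1.get(g, 0.0), p2.get(g, 0.0)
        dist += (2 * (f1 - f2) / (f1 + f2)) ** 2
    return dist


def cosine(p1, p2):
    """Cosine similarity between two sparse n-gram vectors."""
    dot = sum(v * p2.get(g, 0.0) for g, v in p1.items())
    n1 = math.sqrt(sum(v * v for v in p1.values()))
    n2 = math.sqrt(sum(v * v for v in p2.values()))
    return dot / (n1 * n2) if n1 and n2 else 0.0


def cng_predict(train, text):
    """Profile-based: one profile per author, built from all their texts."""
    p = profile(text)
    return min(train, key=lambda a: cng_distance(profile(" ".join(train[a])), p))


def instance_predict(train, text):
    """Instance-based stand-in (assumption): nearest training text, not an SVM."""
    p = profile(text)
    pairs = [(a, doc) for a, docs in train.items() for doc in docs]
    return max(pairs, key=lambda ad: cosine(profile(ad[1]), p))[0]


def augment(train, unlabeled, rounds=3):
    """Add unlabeled texts to the training set when both classifiers agree."""
    train = {a: list(docs) for a, docs in train.items()}  # copy, keep input intact
    pool = list(unlabeled)
    for _ in range(rounds):
        remaining = []
        for text in pool:
            a1, a2 = cng_predict(train, text), instance_predict(train, text)
            if a1 == a2:
                train[a1].append(text)  # hypothetical agreement rule
            else:
                remaining.append(text)
        if len(remaining) == len(pool):  # nothing was added, so stop early
            break
        pool = remaining
    return train, pool


train = {"A": ["aaaa bbbb aaaa bbbb"], "B": ["xyxy zzzz xyxy zzzz"]}
unlabeled = ["aaaa bbbb aaaa", "zzzz xyxy zzzz"]
enriched, leftover = augment(train, unlabeled)
```

In this toy run the two candidate authors share no character trigrams, so both classifiers agree on each unlabeled text and both texts are added to the training set. On realistic data one would tune the profile size and n-gram length, and use an actual SVM for the instance-based classifier.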