Conference

Authors: Kourtis I., Stamatatos E.
Title: Author Identification Using Semi-supervised Learning
Conference: 5th Int. Workshop on Uncovering Plagiarism, Authorship, and Social Software Misuse (PAN-11)
Editors:
Ed: No
Eds: No
Pages:
To appear: No
Month:
Year: 2011
Location:
Publisher:
Link:
File name:
Abstract: Author identification models fall into two major categories according to the way they handle the training texts: profile-based models produce one representation per author while instance-based models produce one representation per text. In this paper, we propose an approach that combines two well-known representatives of these categories, namely the Common n-Grams method and a Support Vector Machine classifier based on character n-grams. The outputs of these classifiers are combined to enrich the training set with additional documents in a repetitive semi-supervised procedure inspired by the co-training algorithm. The evaluation results on closed-set author identification are encouraging, especially when the set of candidate authors is large.
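
The general idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. The profile-based side uses a Common n-Grams style dissimilarity over character n-gram frequencies. The instance-based side uses a nearest-neighbour rule as a stand-in for the Support Vector Machine, an assumption made only to keep the example dependency-free. The rule of adding an unlabeled document when both classifiers agree is likewise an assumed reading of "combined", and all function names and parameter values (`enrich`, `size=500`, `rounds=3`) are hypothetical.

```python
from collections import Counter


def char_ngrams(text, n=3):
    """Return the overlapping character n-grams of text."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def profile(text, n=3, size=500):
    """Map the `size` most frequent character n-grams to relative frequencies."""
    counts = Counter(char_ngrams(text, n))
    total = sum(counts.values()) or 1
    return {g: c / total for g, c in counts.most_common(size)}


def cng_dissimilarity(p1, p2):
    """Sum of squared relative frequency differences over the union of n-grams."""
    total = 0.0
    for g in set(p1) | set(p2):
        f1, f2 = p1.get(g, 0.0), p2.get(g, 0.0)
        total += ((f1 - f2) / ((f1 + f2) / 2)) ** 2
    return total


def profile_based_predict(train, text):
    """Profile-based: one representation per author (all texts concatenated)."""
    by_author = {}
    for doc, author in train:
        by_author.setdefault(author, []).append(doc)
    target = profile(text)
    return min(
        by_author,
        key=lambda a: cng_dissimilarity(profile(" ".join(by_author[a])), target),
    )


def instance_based_predict(train, text):
    """Instance-based: one representation per text.

    Assumption: nearest neighbour replaces the SVM used in the paper.
    """
    target = profile(text)
    _, author = min(train, key=lambda pair: cng_dissimilarity(profile(pair[0]), target))
    return author


def enrich(train, unlabeled, rounds=3):
    """Iteratively move unlabeled texts into the training set.

    Assumed rule: a text is added only when both classifiers agree on its author.
    """
    train, pool = list(train), list(unlabeled)
    for _ in range(rounds):
        agreed = []
        for text in pool:
            guess = profile_based_predict(train, text)
            if guess == instance_based_predict(train, text):
                agreed.append((text, guess))
        if not agreed:
            break  # nothing new to learn from
        train.extend(agreed)
        moved = {t for t, _ in agreed}
        pool = [t for t in pool if t not in moved]
    return train, pool
```

Calling `enrich(labeled_pairs, unlabeled_texts)` returns the enlarged training set together with the texts that remained unlabeled. In a closer reproduction, the nearest-neighbour stand-in would be replaced by an SVM trained on character n-gram features.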