Abstract: | This paper deals with the problem of author
identification. The Common N-Grams (CNG) method
[6] is a language-independent profile-based approach
with good results in many author identification
experiments so far. A variation of this approach is
presented based on new distance measures that are
quite stable for large profile length values. Special
emphasis is given to the degree to which the
effectiveness of the method is affected by the available
training text samples per author. Experiments based on
text samples on the same topic from the Reuters
Corpus Volume 1 are presented using both balanced
and imbalanced training corpora. The results show
that CNG with the proposed distance measures is more
accurate when only limited training text samples are
available, at least for some of the candidate authors, which is
a realistic condition in author identification problems. |