Abstract: Author identification is a text categorization task with
applications in intelligence, criminal law, computer forensics, etc.
In such cases, training texts are usually scarce. In this
paper, we propose using second-order tensors, rather than the
traditional vector space model, to represent texts for this problem.
Based on a generalization of the SVM algorithm that can
handle tensors, we explore various methods for filling the feature
matrix, taking into account that similar features should be placed in
the same neighborhood. To this end, we propose a frequency-based
metric. Experiments on a corpus controlled for genre and topic, with a
variable amount of training texts, show that the proposed approach
is more effective than the traditional vector-based SVM when only a
limited amount of training texts is used.
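The abstract does not spell out the frequency-based metric, so the following is only a minimal sketch of one plausible reading, not the paper's method. It assumes that "similar" features are those with similar corpus frequency. Under that assumption, the most frequent features are sorted by descending frequency and laid out row by row in a rows x cols grid, so features of similar frequency become neighbors. Each text then becomes a matrix of relative feature frequencies. The helper names `frequency_layout` and `text_to_matrix` are hypothetical, and the tensor-capable SVM that would consume these matrices is not shown.

```python
from collections import Counter

def frequency_layout(corpus, rows, cols):
    """Hypothetical helper (assumption, not the paper's metric): place the
    rows*cols most frequent features into a rows x cols grid, row by row,
    in descending corpus frequency, so similar-frequency features are neighbors."""
    counts = Counter(tok for doc in corpus for tok in doc)
    # Sort by descending frequency; break ties alphabetically for determinism.
    ranked = sorted(counts, key=lambda f: (-counts[f], f))[: rows * cols]
    if len(ranked) < rows * cols:
        raise ValueError("not enough distinct features to fill the matrix")
    # Map each feature to its (row, column) cell in row-major order.
    return {feat: divmod(i, cols) for i, feat in enumerate(ranked)}

def text_to_matrix(doc, layout, rows, cols):
    """Represent one tokenized text as a rows x cols matrix of relative
    frequencies of the features in `layout`; other tokens are ignored."""
    matrix = [[0.0] * cols for _ in range(rows)]
    total = len(doc) or 1  # avoid division by zero for empty texts
    for tok, n in Counter(doc).items():
        if tok in layout:
            r, c = layout[tok]
            matrix[r][c] = n / total
    return matrix

if __name__ == "__main__":
    corpus = [["the", "of", "the", "and"], ["the", "a", "of", "to"]]
    layout = frequency_layout(corpus, 2, 2)
    print(layout)  # {'the': (0, 0), 'of': (0, 1), 'a': (1, 0), 'and': (1, 1)}
    print(text_to_matrix(corpus[0], layout, 2, 2))  # [[0.5, 0.25], [0.0, 0.25]]
```

Sorting by frequency is only one way to make neighboring cells hold related features. Any other similarity-driven ordering could replace the `sorted` key without changing how texts are turned into matrices.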