Abstract: | The most important approaches to computer-assisted authorship attribution are exclusively
based on lexical measures that either represent the vocabulary richness of the author or simply
comprise frequencies of occurrence of common words. In this paper, we present a fully automated
approach to the identification of the authorship of unrestricted text that excludes any lexical measure.
Instead, we adapt a set of style markers to the analysis of the text performed by an existing
natural language processing tool using three stylometric levels, i.e., token-level, phrase-level, and
analysis-level measures. The latter represent the way in which the text has been analyzed. The
presented experiments on a Modern Greek newspaper corpus show that the proposed set of style
markers can reliably distinguish the authors of a randomly chosen group and performs better
than a lexically based approach. However, the combination of these two approaches provides the
most accurate solution (i.e., 87% accuracy). Moreover, we describe experiments with training data of
various sizes as well as tests of the significance of the proposed set of style markers. |