Octopus: University of The Aegean Research & Project Outputs System

Journal

Authors:	Stamatatos E., Fakotakis N., Kokkinakis G.
Title:	Computer-Based Authorship Attribution without Lexical Measures
Journal:	Computers and the Humanities
Volume:	35
Issue:	2
Pages:	193-214
Year:	2001
Publisher:	Kluwer
To appear:	No
Link:	http://dx.doi.org/10.1023/A:1002681919510
ISI:	No
Impact Factor:
File name:
Abstract:	The most important approaches to computer-assisted authorship attribution are exclusively based on lexical measures that either represent the vocabulary richness of the author or simply comprise frequencies of occurrence of common words. In this paper we present a fully-automated approach to the identification of the authorship of unrestricted text that excludes any lexical measure. Instead we adapt a set of style markers to the analysis of the text performed by an already existing natural language processing tool using three stylometric levels, i.e., token-level, phrase-level, and analysis-level measures. The latter represent the way in which the text has been analyzed. The presented experiments on a Modern Greek newspaper corpus show that the proposed set of style markers is able to distinguish reliably the authors of a randomly-chosen group and performs better than a lexically-based approach. However, the combination of these two approaches provides the most accurate solution (i.e., 87% accuracy). Moreover, we describe experiments on various sizes of the training data as well as tests dealing with the significance of the proposed set of style markers.