Abstract: | Authorship attribution supported by statistical or computational
methods has a long history starting from the 19th
century and is marked by the seminal study of Mosteller
and Wallace (1964) on the authorship of the disputed
“Federalist Papers.” During the last decade, this scientific
field has been developed substantially, taking advantage
of research advances in areas such as machine learning,
information retrieval, and natural language processing.
The plethora of available electronic texts (e.g., e-mail messages,
online forum messages, blogs, and source code)
indicates a wide variety of applications of this technology,
provided it is able to handle short and noisy text
from multiple candidate authors. In this article, a survey
of recent advances in automated approaches
to attributing authorship is presented, examining their
characteristics for both text representation and text classification.
The focus of this survey is on computational
requirements and settings rather than on linguistic or
literary issues. We also discuss evaluation methodologies
and criteria for authorship attribution studies and list
open questions that will attract future work in this area. |