Abstract: | The two main factors that characterize a text are its content and its style, and both can be used
as a means of categorization. In this paper we present an approach to text categorization in
terms of genre and author for Modern Greek. In contrast to previous stylometric approaches,
we attempt to take full advantage of existing natural language processing (NLP) tools. To this
end, we propose a set of style markers including analysis-level measures that represent the way in
which the input text has been analyzed and capture useful stylistic information without additional
cost. We present a set of small-scale but reasonable experiments in text genre detection, author
identification, and author verification tasks and show that the proposed method performs better
than the most popular distributional lexical measures, i.e., functions of vocabulary richness and
frequencies of occurrence of the most frequent words. All the presented experiments are based on
unrestricted text downloaded from the World Wide Web without any manual text preprocessing
or text sampling. Various performance issues regarding the training set size and the significance of
the proposed style markers are discussed. Our system can be used in any application that requires
fast and easily adaptable text categorization in terms of stylistically homogeneous categories.
Moreover, the procedure of defining analysis-level markers can be followed in order to extract
useful stylistic information using existing text processing tools. |