Conference

Authors: Pritsos D., Stamatatos E.
Title: Open-Set Classification for Automated Genre Identification
Conference: Advances in Information Retrieval - 35th European Conference on IR Research (ECIR 2013)
Editors:
Ed: No
Eds: No
Pages: 207-217
To appear: No
Month:
Year: 2013
Location:
Publisher: Springer LNCS
Link:
File name:
Abstract: Automated Genre Identification (AGI) of web pages is a problem of increasing importance, since web genre information (e.g. blogs, news, e-shops) can enhance modern Information Retrieval (IR) systems. The state of the art in this field treats AGI as a closed-set classification problem, for which a variety of web page representations and machine learning models have been intensively studied. In this paper, we study AGI as an open-set classification problem, which better reflects the real-world conditions of applying AGI in practice. Focusing on the use of content information, different text representation methods (words and character n-grams) are tested. Moreover, two classification methods are examined: one-class SVM learners, used as a baseline, and an ensemble of classifiers based on random feature subspacing, originally proposed for author identification. It is demonstrated that very high precision can be achieved in open-set AGI while recall remains relatively high.