Octopus: University of The Aegean Research & Project Outputs System

Journal

Authors:	Sidorov G., Velasquez F., Stamatatos E., Gelbukh A., Chanona-Hernández L.
Title:	Syntactic N-grams as Machine Learning Features for Natural Language Processing
Journal:	Expert Systems with Applications
Volume:
Issue:
Pages:
Year:	2013
Publisher:
Display:	Yes
Link:	http://dx.doi.org/10.1016/j.eswa.2013.08.015
ISI:	No
Impact Factor:
File name:
Abstract:	In this paper we introduce and discuss the concept of syntactic n-grams (sn-grams). Sn-grams differ from traditional n-grams in how they are constructed, i.e., in which elements are considered neighbors. In the case of sn-grams, neighbors are determined by following syntactic relations in syntactic trees rather than by taking words in the order they appear in a text; that is, sn-grams are constructed by following paths in syntactic trees. In this manner, sn-grams bring syntactic knowledge into machine learning methods; however, the text must be parsed before they can be constructed. Sn-grams can be applied in any natural language processing (NLP) task where traditional n-grams are used. We describe how sn-grams were applied to authorship attribution. As a baseline we used traditional n-grams of words, part-of-speech (POS) tags, and characters; three classifiers were applied: support vector machines (SVM), naive Bayes (NB), and the tree classifier J48. Sn-grams give better results with the SVM classifier.
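The difference between the two kinds of n-grams described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names (syntactic_ngrams, linear_ngrams), the example sentence, and its hand-written dependency heads are all assumptions made for illustration, and in practice the heads would come from a syntactic parser. The sketch only enumerates head-to-dependent paths of a fixed length.

```python
from collections import defaultdict


def syntactic_ngrams(tokens, heads, n):
    """Return every head-to-dependent path of n tokens in a dependency tree.

    heads[i] is the index of the head of tokens[i], or -1 for the root.
    (Hypothetical helper; the input format is an assumption.)
    """
    # Build a head -> list of dependents map from the head indices.
    children = defaultdict(list)
    for i, h in enumerate(heads):
        if h >= 0:
            children[h].append(i)

    def paths_from(node, length):
        # A path of one token is the node itself; longer paths descend
        # through each dependent of the current node.
        if length == 1:
            return [[node]]
        return [[node] + rest
                for child in children[node]
                for rest in paths_from(child, length - 1)]

    return [tuple(tokens[i] for i in path)
            for start in range(len(tokens))
            for path in paths_from(start, n)]


def linear_ngrams(tokens, n):
    """Return traditional n-grams: n tokens adjacent in surface order."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


# Assumed example: "reads" is the root; "She", "books", and "quickly"
# depend on "reads"; "old" depends on "books".
tokens = ["She", "reads", "old", "books", "quickly"]
heads = [1, -1, 3, 1, 1]

sn_bigrams = syntactic_ngrams(tokens, heads, 2)
sn_trigrams = syntactic_ngrams(tokens, heads, 3)
word_bigrams = linear_ngrams(tokens, 2)
```

In this example the traditional bigrams include ("reads", "old"), which pairs two words that are adjacent in the text but not syntactically related, whereas the syntactic bigrams include ("reads", "books"), which links the verb to its object even though the two words are not adjacent.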