Octopus: University of The Aegean Research & Project Outputs System

Conference

Authors:	Stamatatos E.
Title:	Plagiarism Detection Based on Structural Information
Conference:	20th ACM Conference on Information and Knowledge Management (CIKM-11)
Editors:
Ed:	No
Eds:	No
Pages:	1221-1230
To appear:	No
Month:
Year:	2011
Place:
Publisher:	ACM
Link:
File name:
Abstract:	In this paper, a novel method for detecting plagiarized passages in document collections is presented. In contrast to previous work in this field, which mainly uses content terms to represent documents, the proposed method is based on structural information provided by occurrences of a small list of stopwords (i.e., very frequent words). We show that stopword n-grams are able to capture local syntactic similarities between suspicious and original documents. Moreover, an algorithm for detecting the exact boundaries of plagiarized and source passages is proposed. Experimental results on a publicly available corpus demonstrate that the performance of the proposed approach is competitive with the best reported results. More importantly, it achieves significantly better results on difficult plagiarism cases, where the plagiarized passages are heavily modified by replacing most of the words or phrases with synonyms to hide the similarity with the source documents.
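
The idea of stopword n-grams described in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation: the stopword list, the tokenizer, the value n=3, and the function names are all assumptions made for illustration, and the passage boundary detection algorithm is not covered. The sketch keeps only the stopwords of each text, in order, and compares the resulting n-grams, so two passages that differ in their content words can still share the same structure.

```python
import re

# Hypothetical, illustrative stopword list (an assumption; the record
# does not give the list used in the paper).
STOPWORDS = {
    "the", "of", "and", "a", "in", "to", "is", "was",
    "it", "for", "with", "that", "on", "by", "this",
}


def stopword_sequence(text):
    """Return the stopwords of a text in order, dropping all other tokens."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return [t for t in tokens if t in STOPWORDS]


def stopword_ngrams(text, n=3):
    """Return the set of n-grams over the stopword-only sequence."""
    seq = stopword_sequence(text)
    return {tuple(seq[i:i + n]) for i in range(len(seq) - n + 1)}


def shared_ngrams(text_a, text_b, n=3):
    """Return the stopword n-grams that both texts have in common."""
    return stopword_ngrams(text_a, n) & stopword_ngrams(text_b, n)


# The suspicious sentence replaces the content words with synonyms
# but keeps the sentence structure of the source.
source = "The results of the experiment were reported in the journal by the team."
suspicious = "The outcomes of the trial were published in the periodical by the group."
unrelated = "It was a cold day, and this is a story for children."

print(len(shared_ngrams(source, suspicious)))
print(len(shared_ngrams(source, unrelated)))
```

In this toy example the source and the synonym-substituted sentence produce identical stopword n-gram sets, while the unrelated sentence shares none. A content-term comparison would find very little overlap between the first two sentences.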