Journal

Authors: Frantzeskou G., MacDonell S., Stamatatos E., Gritzalis S.
Title: Examining the Significance of High-Level Programming Features in Source Code Author Classification
Journal: Journal of Systems and Software
Volume: 81
Number: 3
Pages: 447-460
Year: 2008
Publisher: Elsevier
To appear: No
Link: http://www.sciencedirect.com/science/article/pii/S0164121207000829/pdfft?md5=90040a5360af7c35c3b8fd0a92d404fb&pid=1-s2.0-S0164121207000829-main.pdf
ISI: Yes
Impact Factor: 1.241
File name:
Abstract: The use of Source Code Author Profiles (SCAP) represents a new, highly accurate approach to source code authorship identification that is, unlike previous methods, language independent. While accuracy is clearly a crucial requirement of any author identification method, in cases of litigation regarding authorship, plagiarism, and so on, there is also a need to know why it is claimed that a piece of code was written by a particular author. What is it about that piece of code that suggests a particular author? What features in the code make one author more likely than another? In this study, we describe a means of identifying the high-level features that contribute to source code authorship identification, using the SCAP method as a tool. A variety of features are considered for Java and Common Lisp, and the importance of each feature in determining authorship is measured through a sequence of experiments in which we remove one feature at a time. The results show that, for these programs, comments, layout features and package-related naming influence classification accuracy, whereas user-defined naming, an obvious programmer-related feature, does not appear to influence accuracy. A comparison is also made between the relative feature contributions in programs written in the two languages.