Scuppering the Program Pirates

Professors the world over are worried about plagiarism: students simply lifting huge chunks from web pages and passing the thoughts and arguments off as their own. Then there are the Professors who steal from each other and publish their work in supposedly novel research papers and books and present it at conferences as original. This kind of plagiarism seems to be on the increase. No one knows the true extent to which it is being undertaken, but a few high-profile cases have increased awareness in the academic community of the paper pirates who could scupper your research career plans with a few well-stolen words.

It could be that a whole generation of students and unscrupulous Professors are creating an information black market. In the long term, it is the students’ education, the research community, and the future of progress that will suffer. After all, student assessment is based on the assumption that their work is original. Similarly, the advancement of any particular area of endeavour relies on originality and on credit where it is due; otherwise the whole system collapses into nothing more than noise.

For instance, in the world of computer science, students’ programming submissions play an important part in the whole computing education process. “It is of a great importance to evaluate the programming skills of each student,” explain Ameera Jadalla and Ashraf Elnagar of the Department of Computer Science at the University of Sharjah, in the United Arab Emirates, “but the evaluation results become misleading and unreal due to the problem of plagiarism.”

The researchers point out that since the late 1970s, concerns about source code plagiarism have risen significantly. Various surveys have shown that up to 85 percent of a representative sample of students had engaged in some sort of academic dishonesty, and almost 40 percent in one survey confessed to engaging in at least one instance of cut-and-paste plagiarism using the internet in the preceding year. Companies that offer to do the plagiarism for you, for a fee, are rife. Studies have shown that male students are commonly more dishonest than their female peers in this regard, and science students more so than health or education students. Mature students are less likely to engage in such practices.

The researchers suggest that there are a few fundamental non-technical steps that can be taken to reduce plagiarism.

Increasing the number of in-class assignments.
Doing more group work, which makes it harder to cheat if just one student is honest.
Explaining from an early age that plagiarism is unethical and that citation is important.
Expecting an oral presentation to show understanding.
Giving students different specifications for the same assignment.
Improving coursework in terms of time, pressure and difficulty to preclude the need to plagiarise.
Having flexible deadlines where the only alternative to finishing on time is plagiarism.
Using honesty policies and punishment systems.
Recognising different plagiarism techniques.

It is easy to see why some students might plagiarise the efforts of others: getting a better grade, laziness or poor time management, easy access to the internet and not understanding the rules. Students are encouraged to use the internet, but there is often no emphasis on the importance of citation or acknowledgement.

Indeed, say Jadalla and Elnagar, the focus of society on end results, the “final certificate”, means that students are under immense pressure to perform while the opportunities for cheating have gone far beyond the simple sharing of notes among themselves and the copying out of textbook paragraphs that were well known in the previous generation. However, no amount of top tips for persuading students not to plagiarise will solve the problem.

There are various programs available that a hard-pressed Professor might employ to spot plagiarism in the work of their academic offspring, but these are usually tailored towards essays and papers. Now, Jadalla and Elnagar have developed PDE4Java, a new Plagiarism Detection Engine built on the platform-independent language Java, that can detect plagiarism in computer science code.

Plagiarism in software was defined as “a program which has been produced from another program with a small number of routine transformations.”
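To make “routine transformations” concrete, consider renaming variables and reflowing whitespace, two of the simplest disguises. The Python sketch below is an illustration only and is not taken from PDE4Java: the function name `normalise`, the partial keyword set `JAVA_KEYWORDS` and both code fragments are hypothetical. It maps every identifier to `ID` and every integer literal to `NUM`, so two fragments that differ only in names and layout reduce to the same token stream.

```python
import re

# A small, deliberately incomplete subset of Java keywords (assumption for illustration).
JAVA_KEYWORDS = {"int", "for", "while", "if", "else", "return",
                 "class", "public", "static", "void", "new"}

# Identifiers/keywords, integer literals, or any single punctuation character.
TOKEN_RE = re.compile(r"[A-Za-z_]\w*|\d+|[^\s\w]")

def normalise(source):
    """Return a token list that ignores identifier names, literal values and layout."""
    tokens = []
    for tok in TOKEN_RE.findall(source):
        if tok in JAVA_KEYWORDS:
            tokens.append(tok)      # keywords carry program structure, keep them
        elif tok[0].isalpha() or tok[0] == "_":
            tokens.append("ID")     # every identifier collapses to ID
        elif tok[0].isdigit():
            tokens.append("NUM")    # every integer literal collapses to NUM
        else:
            tokens.append(tok)      # operators and punctuation are kept as-is
    return tokens

# Hypothetical fragments: the second renames variables and changes the layout.
original = "int total = 0; for (int i = 0; i < n; i++) { total += i; }"
disguised = "int sum = 0;\nfor (int k = 0; k < limit; k++) {\n    sum += k;\n}"

print(normalise(original) == normalise(disguised))
```

A real detector would need to cope with further transformations, such as reordering statements or swapping loop types, which this sketch does not attempt.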

PDE4Java uses data mining techniques to spot content that has been copied from other sources in a given set of programs, usually without attribution. The system “tokenises” the suspect program and then uses data mining, akin to a search engine algorithm, to carry out fast similarity searching of the tokenised index. It can then display side-by-side views of similar programming code and so reveal clusters of code that look suspiciously similar. These clusters allow the instructors or graders to quickly spot programming routines that the students lifted from each other.
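The general shape of such a pipeline (tokenise, compare, cluster) can be sketched in a few lines of Python. This is a rough illustration under stated assumptions, not the PDE4Java algorithm: the article does not give its similarity measure or clustering method, so the sketch substitutes Jaccard similarity over token 3-grams and a simple threshold-based grouping with union-find. All function names, the 0.5 threshold and the three submissions are hypothetical.

```python
import re
from itertools import combinations

def tokenise(source):
    """Split source text into words and single punctuation characters."""
    return re.findall(r"\w+|[^\w\s]", source)

def kgrams(tokens, k=3):
    """Set of all runs of k consecutive tokens."""
    return {tuple(tokens[i:i + k]) for i in range(len(tokens) - k + 1)}

def jaccard(a, b):
    """Shared k-grams divided by all k-grams; 0.0 when both sets are empty."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)

def cluster(submissions, threshold=0.5, k=3):
    """Group submissions whose pairwise similarity reaches the threshold."""
    grams = {name: kgrams(tokenise(src), k) for name, src in submissions.items()}
    parent = {name: name for name in submissions}

    def find(x):
        # Follow parent links to the representative of x's group.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in combinations(submissions, 2):
        if jaccard(grams[a], grams[b]) >= threshold:
            parent[find(a)] = find(b)   # merge the two groups

    groups = {}
    for name in submissions:
        groups.setdefault(find(name), []).append(name)
    return sorted(sorted(g) for g in groups.values())

# Hypothetical submissions: bob's is alice's with one extra statement.
submissions = {
    "alice": "int s = 0; for (int i = 0; i < n; i++) { s += i; } return s;",
    "bob": "int s = 0; for (int i = 0; i < n; i++) { s += i; } "
           "System.out.println(s); return s;",
    "carol": "if (n < 2) { return n; } return fib(n - 1) + fib(n - 2);",
}

print(cluster(submissions))
```

With these three submissions, alice and bob land in one cluster and carol in a cluster of her own. The threshold is the main design choice: set too low it flags honest work that merely shares boilerplate, set too high it misses lightly edited copies, so a grader would still want to inspect each cluster by eye.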

The researchers point out that although modern technology makes it easier for students to plagiarise the work of others, programs such as theirs are allowing Professors to catch up with the cheats and plagiarism sinners.

Search Engine Journal has a nice side-by-side comparison of currently available anti-plagiarism systems, including Copyscape, DocCop, Plagiarism Detect, Reprint Writer’s Tool and Copyright Spot. Plagiarism Today also has an interesting post on how to find plagiarism.

Ameera Jadalla, Ashraf Elnagar (2008). PDE4Java: Plagiarism Detection Engine for Java source code: a clustering approach. International Journal of Business Intelligence and Data Mining, 3 (2). DOI: 10.1504/IJBIDM.2008.020514