Scuppering the Program Pirates

Professors the world over are worried about plagiarism: students simply lifting huge chunks from web pages and passing off the thoughts and arguments as their own. Then there are the Professors who steal from each other and present the stolen work as original in supposedly novel research papers, books and conference talks. This kind of plagiarism seems to be on the increase. No one knows its true extent, but a few high-profile cases have made the academic community more aware of the paper pirates who could scupper your research career plans with a few well-stolen words.

It could be that a whole generation of students, along with some unscrupulous Professors, is creating an information black market. In the long term, it is the students’ education, the research community and the future of progress that will suffer. After all, student assessment is based on the assumption that a student’s work is original. Similarly, the advancement of any area of endeavour relies on originality and on credit being given where it is due; otherwise the whole system collapses into nothing more than noise.

For instance, in computer science, students’ programming submissions play an important part in the whole of computing education. “It is of a great importance to evaluate the programming skills of each student,” explain Ameera Jadalla and Ashraf Elnagar of the Department of Computer Science at the University of Sharjah, in the United Arab Emirates, “but the evaluation results become misleading and unreal due to the problem of plagiarism.”

The researchers point out that since the late 1970s, concerns about source code plagiarism have risen significantly. Various surveys have shown that up to 85 percent of a representative sample of students had engaged in some sort of academic dishonesty, and in one survey almost 40 percent confessed to at least one instance of cut-and-paste plagiarism from the internet in the preceding year. Companies offering to do students’ work for them, for a fee, are rife. Studies have shown that male students are commonly more dishonest than their female peers in this regard, and science students more so than health or education students. Mature students are less likely to engage in such practices.

The researchers suggest that there are a few fundamental non-technical steps that can be taken to reduce plagiarism.

  • Increasing the number of in-class assignments.
  • Doing more group work, which makes it harder to cheat if even one student in the group is honest.
  • Explaining from an early age that plagiarism is unethical and that citation is important.
  • Expecting an oral presentation to demonstrate understanding.
  • Giving students different specifications for the same assignment.
  • Adjusting coursework time pressure and difficulty so that students do not feel the need to plagiarise.
  • Allowing flexible deadlines where plagiarism would otherwise be the only way to finish on time.
  • Using honesty policies and punishment systems.
  • Recognising different plagiarism techniques.

It is easy to see why some students might plagiarise the efforts of others: getting a better grade, laziness or poor time management, easy access to the internet and not understanding the rules. Students are encouraged to use the internet, but there is often no emphasis on the importance of citation or acknowledgement.

Indeed, say Jadalla and Elnagar, the focus of society on end results, the “final certificate”, means that students are under immense pressure to perform while the opportunities for cheating have gone far beyond the simple sharing of notes among themselves and the copying out of textbook paragraphs that were well known in the previous generation. However, no amount of top tips for persuading students not to plagiarise will solve the problem.

There are various programs that a hard-pressed Professor might use to spot plagiarism in the work of their academic offspring, but these are usually tailored towards essays and papers. Now, Jadalla and Elnagar have developed PDE4Java, a new Java-based Plagiarism Detection Engine that can detect plagiarism in Java source code.

Plagiarism in software has been defined as “a program which has been produced from another program with a small number of routine transformations.”
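To make that definition concrete: renaming variables, reformatting and rewriting comments are exactly the kind of routine transformations that a token-based comparison is meant to see through. The Python sketch below is illustrative only and is not PDE4Java’s code; the abbreviated keyword list, the token classes (`ID`, `NUM`, `STR`) and the `normalise` function are hypothetical choices, and it handles only a small subset of Java syntax. Two versions of the same method that differ only in names, whitespace and comments produce identical token streams.

```python
import re

# Hypothetical, abbreviated keyword list; a real Java tokeniser would use the full set.
JAVA_KEYWORDS = {
    "int", "long", "double", "boolean", "char", "void", "class", "public",
    "private", "static", "return", "if", "else", "for", "while", "new",
}

# Alternatives are tried left to right, so comments win over the "/" operator.
TOKEN_SPEC = [
    ("comment", r"//[^\n]*|/\*.*?\*/"),
    ("string", r'"(?:\\.|[^"\\])*"'),
    ("number", r"\d+(?:\.\d+)?"),
    ("word", r"[A-Za-z_$][A-Za-z0-9_$]*"),
    ("op", r"[-+*/%=<>!&|]+|[{}()\[\];,.:?]"),
    ("space", r"\s+"),
]
TOKEN_RE = re.compile(
    "|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC), re.DOTALL
)


def normalise(source: str) -> list[str]:
    """Turn source text into a token stream that ignores names, literals and layout."""
    tokens = []
    for match in TOKEN_RE.finditer(source):
        kind, text = match.lastgroup, match.group()
        if kind in ("comment", "space"):
            continue  # comments and whitespace are trivial to change, so drop them
        if kind == "word":
            tokens.append(text if text in JAVA_KEYWORDS else "ID")  # rename-proof
        elif kind == "number":
            tokens.append("NUM")
        elif kind == "string":
            tokens.append("STR")
        else:
            tokens.append(text)  # operators and punctuation carry the structure
    return tokens


original = """
int total(int[] marks) {
    int sum = 0;  // running total
    for (int i = 0; i < marks.length; i++) sum += marks[i];
    return sum;
}
"""

copied = """
int addUp(int[] scores) {
    /* same loop, new names */
    int acc = 0;
    for (int k = 0; k < scores.length; k++) acc += scores[k];
    return acc;
}
"""

print(normalise(original) == normalise(copied))  # True
```

Because every non-keyword name collapses to `ID`, the renamed copy is indistinguishable from the original at the token level, which is what makes the later similarity search robust to cosmetic edits.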

PDE4Java uses data mining techniques to spot code in a given set of programs that has been copied from other sources, usually without attribution. The system “tokenises” each suspect program and then uses data mining, akin to a search engine algorithm, to carry out fast similarity searches of the tokenised index. It can then display similar code side by side and show clusters of programs that look suspiciously similar. These clusters allow instructors or graders to quickly spot programming routines that students have lifted from each other.
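The paper’s exact similarity measure and clustering algorithm are not reproduced here. As a stand-in, the following Python sketch shows the general idea under simple assumptions: each submission is reduced to a token stream, overlapping runs of `k` tokens (k-grams) are compared using the Jaccard similarity, and any pair scoring at or above a threshold is placed in the same cluster (single-linkage grouping via union-find). The function names, `k = 4` and the 0.6 threshold are hypothetical, illustrative choices rather than values taken from PDE4Java.

```python
import re
from itertools import combinations

# Stand-in illustration, not PDE4Java's algorithm.
# Hypothetical keyword subset; everything else that looks like a name becomes "ID".
KEYWORDS = {"int", "double", "for", "while", "if", "else", "return", "new", "void"}


def tokens(source: str) -> list[str]:
    """Crude tokeniser: names become ID and integers NUM, so renaming has no effect."""
    out = []
    for tok in re.findall(r"[A-Za-z_]\w*|\d+|\S", source):
        if tok[0].isdigit():
            out.append("NUM")
        elif tok[0].isalpha() or tok[0] == "_":
            out.append(tok if tok in KEYWORDS else "ID")
        else:
            out.append(tok)  # single-character operators and punctuation
    return out


def kgrams(toks: list[str], k: int) -> set[tuple[str, ...]]:
    """All runs of k consecutive tokens; shared runs indicate shared code."""
    return {tuple(toks[i:i + k]) for i in range(len(toks) - k + 1)}


def jaccard(a: set, b: set) -> float:
    """Size of the overlap divided by the size of the union (1.0 = identical)."""
    return len(a & b) / len(a | b) if (a or b) else 1.0


def cluster(programs: dict[str, str], threshold: float = 0.6, k: int = 4) -> list[set[str]]:
    """Single-linkage grouping: any pair at or above the threshold joins one cluster."""
    grams = {name: kgrams(tokens(src), k) for name, src in programs.items()}
    parent = {name: name for name in programs}  # union-find forest

    def find(x: str) -> str:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    for a, b in combinations(programs, 2):
        if jaccard(grams[a], grams[b]) >= threshold:
            parent[find(a)] = find(b)

    groups: dict[str, set[str]] = {}
    for name in programs:
        groups.setdefault(find(name), set()).add(name)
    return [g for g in groups.values() if len(g) > 1]  # singletons are not suspicious


submissions = {
    "alice": "int total(int[] m) { int s = 0; for (int i = 0; i < m.length; i++) s += m[i]; return s; }",
    "bob": "int addUp(int[] x) { int acc = 0; for (int j = 0; j < x.length; j++) acc += x[j]; return acc; }",
    "carol": "int max(int[] m) { int best = m[0]; for (int v : m) if (v > best) best = v; return best; }",
}

print([sorted(g) for g in cluster(submissions)])  # [['alice', 'bob']]
```

Here the alice and bob submissions differ only in names, so they score 1.0 and form a cluster, while carol’s different algorithm shares little beyond the method signature and a few common fragments and stays out. A real tool would also need to cope with reordered statements and inserted dummy code, which simple k-gram overlap only partly tolerates.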

The researchers point out that although modern technology makes it easier for students to plagiarise the work of others, programs such as theirs are allowing Professors to catch up with the cheats and plagiarism sinners.

Search Engine Journal has a nice side-by-side comparison of currently available anti-plagiarism systems, including Copyscape, DocCop, Plagiarism Detect, Reprint Writer’s Tool and Copyright Spot. Plagiarism Today also has an interesting post on how to find plagiarism.

Ameera Jadalla, Ashraf Elnagar (2008). PDE4Java: Plagiarism Detection Engine for Java source code: a clustering approach. International Journal of Business Intelligence and Data Mining, 3(2). DOI: 10.1504/IJBIDM.2008.020514


5 thoughts on “Scuppering the Program Pirates”

  1. Yes, a reward/penalty system might work, but it would be better if we could entrench the idea earlier on that it’s wrong to plagiarise. I saw that news item just after I published this post and was going to reference it.

  2. In this age of “nudge” economics, why not simply offer extra credit for the fastest code (rewarding individuals and small collaborative teams), and demerits for code which runs no faster than the average (if you copy and don’t know how many others copied, you’ll most likely lose out)? Or the code which stands up longest to pseudorandom input? Etc….

    Incidentally, a recent survey at Cambridge found 49% of students admitted to plagiarism, with lawyers being the most likely to copy!

  3. @Sundeep – interesting approach

    @Bob – thanks for the input. Yes, I am sure there are more stringent assessment methods and definitions in industry than in academia; it’s a shame there isn’t more collaboration between the two camps.

  4. My company, Software Analysis & Forensic Engineering Corp., has created CodeSuite for determining source code correlation, the first step in detecting software plagiarism. Although our papers on the topic have been largely ignored by the academic community, the software is commercially very successful because it is many times more accurate than anything out of academia. You can even download a copy and use it for free on small code sets, something that hundreds of users are doing, including many university professors.

    Also, if you’re looking for a definition of software plagiarism that is much more rigorous and testable than anything out of academia, check out the papers referenced on the site. We define source code correlation and then define the steps necessary to determine whether correlation is due to copying or to one of 5 other reasons.

  5. The solution adopted by my school was to have final exams which required coding a small program by hand on paper (it’s harder than it sounds). Another idea is to mangle their source code and require them to debug it in class. That would require a working knowledge of the programming language and debugging techniques.
