A new idea for detecting Plagiarism

Cases of plagiarism-related cheating are increasing day by day on CodeChef. Some smart cheaters do not just copy and paste each other’s code; they make minor changes to the copied code (renaming variables, adding or removing unnecessary comments, adding unused variables, adding extra tabs or spaces, etc.) and then claim that their code is not exactly the same, so they have not cheated. I think that if the structure of the code and the sequence of function calls in two people’s submissions are the same, there is a strong chance that they have cheated.

After thinking for a while, I came up with an idea to detect this type of plagiarism (currently I have an idea only for C and C++).

To check whether two C/C++ programs are similar, go through these steps:

Step 1: convert each user’s code into assembly code by compiling it with the command:

gcc -S filename.c -o filename.s (for C)
g++ -S filename.cpp -o filename.s (for C++)

Step 2: compare the assembly code generated in step 1 for the two users. If the assembly code is the same, the users’ programs are similar; otherwise they are not.
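The two steps above can be sketched in Python roughly as follows. This is only a minimal sketch, assuming gcc and g++ are on the PATH; the function names are made up for illustration. It also assumes that the `.file` and `.ident` lines should be ignored, since GCC writes the source file name and the compiler version there, and those would differ even for identical code.

```python
import subprocess
import tempfile
from pathlib import Path


def compile_to_assembly(source_path, asm_path):
    """Step 1: emit assembly with `gcc -S` (or `g++ -S` for C++ sources)."""
    is_cpp = str(source_path).endswith((".cpp", ".cc", ".cxx"))
    compiler = "g++" if is_cpp else "gcc"
    subprocess.run([compiler, "-S", str(source_path), "-o", str(asm_path)],
                   check=True)


def normalize_assembly(text):
    """Drop blank lines and lines that differ even for identical code."""
    kept = []
    for line in text.splitlines():
        stripped = line.strip()
        # Assumption: .file carries the source file name and .ident the
        # compiler version, so neither says anything about the code itself.
        if not stripped or stripped.startswith((".file", ".ident")):
            continue
        kept.append(stripped)
    return kept


def same_assembly(asm_a, asm_b):
    """Step 2: exact comparison of the normalized assembly text."""
    return normalize_assembly(asm_a) == normalize_assembly(asm_b)


def similar_sources(source_a, source_b):
    """Compile both submissions and compare the generated assembly."""
    with tempfile.TemporaryDirectory() as tmp:
        asm_paths = [Path(tmp) / "a.s", Path(tmp) / "b.s"]
        for src, asm in zip((source_a, source_b), asm_paths):
            compile_to_assembly(src, asm)
        return same_assembly(asm_paths[0].read_text(),
                             asm_paths[1].read_text())
```

With two hypothetical submissions, `similar_sources("user1.c", "user2.c")` returns `True` only when the normalized assembly matches line for line, so any change that alters the generated instructions makes the check fail.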

This technique marks two structurally similar programs as similar even if they differ with respect to:

  1. variable names
  2. comments
  3. blank spaces, newlines & tabs
  4. unused variables

It is obviously much faster than manually checking for plagiarism, so I would like to suggest that the CodeChef team use this technique for plagiarism detection.


@utsav_deep: I think this is a nice idea, but it won’t always work. You can convert the C/C++ code over here. I made a slight change by just assigning a value to the unused variable, and the whole assembly code changes. We can check the difference here. I have found many more loopholes as well. I will tell you about them later, as I don’t want to make the loopholes public :smiley:

Take a look at this post. From the links given there, we can see that people have NOT just taken help or asked someone about their doubts; they have simply pressed Ctrl+C and Ctrl+V. We can even see profiles that got every answer AC on the first attempt :smiley: Anyway, this is a good idea, and let us hope that this post will shed some light for the geeks out there, including you, so that they can come up with something better. :slight_smile: I really appreciate this idea.

ALL the Best


@bipin2: The code converter http://assembly.ynh.io is not working anymore.

I just saw this discussion. Just so you know, well-performing software already exists for measuring the similarity of source code in most languages. MOSS is the one I have some experience with (https://theory.stanford.edu/~aiken/moss/), and I know there are online judges using it.

@utsav_deep: as @pkacprzak rightly pointed out, MOSS is already smart enough to detect the changes that you specified.

I suggest you read a published paper on MOSS to know more about how MOSS works.
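For context, the paper behind MOSS (Schleimer, Wilkerson and Aiken, 2003) describes winnowing: hash every k-character substring, slide a window over the hashes, and keep the minimum hash of each window as a fingerprint. Below is a rough Python sketch of that idea only, not MOSS’s actual implementation. The values of `k` and `w`, the CRC32 hash, and the Jaccard score are assumptions chosen for illustration, and this version only strips whitespace and case, so it would not by itself ignore renamed variables.

```python
import zlib


def kgram_hashes(text, k=5):
    """Hash every k-character substring after removing whitespace and case."""
    cleaned = "".join(text.split()).lower()
    return [zlib.crc32(cleaned[i:i + k].encode("utf-8"))
            for i in range(len(cleaned) - k + 1)]


def winnow(hashes, w=4):
    """Keep the minimum hash of each window of w consecutive hashes."""
    if len(hashes) < w:
        # Too short for a full window: fall back to the single smallest hash.
        return {min(hashes)} if hashes else set()
    return {min(hashes[i:i + w]) for i in range(len(hashes) - w + 1)}


def fingerprint_similarity(code_a, code_b, k=5, w=4):
    """Jaccard similarity of the two fingerprint sets (0.0 to 1.0)."""
    fp_a = winnow(kgram_hashes(code_a, k), w)
    fp_b = winnow(kgram_hashes(code_b, k), w)
    if not fp_a or not fp_b:
        return 0.0
    return len(fp_a & fp_b) / len(fp_a | fp_b)
```

Two submissions that differ only in spacing and line breaks get the same fingerprints here, while unrelated code shares few or none.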

And for the record, CodeChef uses MOSS for its plagiarism detection.

It also depends on the threshold percentage (the similarity percentage above which code is treated as plagiarized), which is set for each problem. Suppose a problem has a standard approach using the Sieve of Eratosthenes: you can expect similarity among all the submissions, so the threshold for each problem needs to be set depending on the common approach that will be used.
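To make the per-problem threshold idea concrete, here is a small Python sketch. It is not how CodeChef or MOSS computes anything: the similarity score comes from Python’s `difflib.SequenceMatcher` over whitespace-separated tokens, and the problem codes and threshold values are hypothetical numbers picked for illustration.

```python
from difflib import SequenceMatcher

# Hypothetical thresholds: a problem with one standard approach (e.g. a
# sieve) gets a stricter cutoff, because honest solutions look alike.
THRESHOLDS = {"PRIMES1": 0.95, "ADHOC7": 0.80}
DEFAULT_THRESHOLD = 0.85


def similarity(code_a, code_b):
    """Similarity ratio in [0, 1] over whitespace-separated tokens."""
    return SequenceMatcher(None, code_a.split(), code_b.split()).ratio()


def is_flagged(problem, code_a, code_b):
    """Flag a pair when its similarity reaches the problem's threshold."""
    threshold = THRESHOLDS.get(problem, DEFAULT_THRESHOLD)
    return similarity(code_a, code_b) >= threshold
```

The same pair of submissions can be flagged for one problem and not for another: a pair scoring 0.9 is flagged under the 0.80 cutoff but passes under the 0.95 one.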

Please read the end of this page.
