Code Similarity is a premium feature that compares the structure of code content to other solutions in the system. With this feature, we are constantly comparing every solution in the system to determine the similarity between newly submitted solutions and older solutions. This comparison reports back a similarity score, as well as a complexity score.
What data is tracked?
Let's quickly go over the data that is made available by this feature.
The similarity score is used to find solutions that are similar in approach, and also to identify possibly plagiarized solutions. Our implementation is designed to scale to millions of compared solutions. It is not fooled by formatting, naming differences, or order of operations.
A score of 100 indicates that the code is structurally identical (even though the characters within the different sets of code may actually be different). We consider a solution to be similar to another when the score is above 70; similarity scores at or below that value are not actively tracked.
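To make the idea of a structural comparison concrete, here is a minimal sketch in Python. It is not Qualified's implementation, which this article does not describe. It assumes solutions are Python source, reduces each one to a sequence of syntax-tree node types (so names, whitespace and comments disappear), and compares the sequences with the standard library's `difflib`. The function names `structure_tokens`, `similarity_score` and `is_similar` are hypothetical; only the threshold of 70 comes from the text above. This simple version does not handle reordered operations.

```python
import ast
import difflib

SIMILARITY_THRESHOLD = 70  # from the article: scores above 70 count as similar


def structure_tokens(source: str) -> list:
    """Return the AST node type names of the code.

    Identifier names, formatting and comments are not part of this sequence,
    so renaming variables or reformatting does not change it.
    """
    tree = ast.parse(source)
    return [type(node).__name__ for node in ast.walk(tree)]


def similarity_score(a: str, b: str) -> int:
    """Hypothetical 0-100 score based on how well the node sequences match."""
    matcher = difflib.SequenceMatcher(
        None, structure_tokens(a), structure_tokens(b), autojunk=False
    )
    return round(matcher.ratio() * 100)


def is_similar(a: str, b: str) -> bool:
    """Apply the 'above 70' rule described in the article."""
    return similarity_score(a, b) > SIMILARITY_THRESHOLD


original = "def add(a, b):\n    return a + b\n"
# Same structure, different names, spacing and an added comment.
renamed = "def total(first,second):\n\n    # sum\n    return first+second\n"
# A structurally different solution.
different = "def greet(name):\n    for ch in name:\n        print(ch)\n"

if __name__ == "__main__":
    print(similarity_score(original, renamed))
    print(similarity_score(original, different))
```

In this sketch the renamed and reformatted copy scores the same as an exact copy, because the characters differ but the structure does not.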
The complexity score is used to determine how complex a candidate's solution is compared to others for the same challenge. This score takes into account things like operators, functions, regular expressions, etc., and calculates a form of complexity that we believe to be more useful than simply using lines of code (LoC).
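As an illustration of why a structure-based measure can be more useful than LoC, the sketch below counts weighted syntax-tree nodes in Python source. The weights and the `complexity_score` function are assumptions made for this example; the actual formula is not given in this article, and this version does not treat regular expressions specially.

```python
import ast

# Hypothetical weights chosen for illustration only.
WEIGHTS = {
    ast.BinOp: 1,
    ast.BoolOp: 1,
    ast.Compare: 1,
    ast.UnaryOp: 1,
    ast.Call: 2,
    ast.If: 2,
    ast.For: 2,
    ast.While: 2,
    ast.FunctionDef: 3,
    ast.Lambda: 3,
}


def complexity_score(source: str) -> int:
    """Sum the weights of operators, calls, branches, loops and functions."""
    tree = ast.parse(source)
    return sum(WEIGHTS.get(type(node), 0) for node in ast.walk(tree))


simple = "def add(a, b):\n    return a + b\n"
# Same logic padded with blank lines and comments: more lines, same structure.
padded = "def add(a, b):\n\n    # add the numbers\n\n    return a + b\n"
loop = (
    "def total(xs):\n"
    "    result = 0\n"
    "    for x in xs:\n"
    "        if x > 0:\n"
    "            result += abs(x)\n"
    "    return result\n"
)

if __name__ == "__main__":
    for label, code in (("simple", simple), ("padded", padded), ("loop", loop)):
        print(label, complexity_score(code))
```

Padding a solution with blank lines or comments raises its line count but leaves this kind of score unchanged, while adding loops, branches and calls raises it.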
While there are other use cases for code similarity, such as surfacing similar but not identical solutions when comparing a candidate's code to others, the main application of this feature is to detect possibly plagiarized solutions. Qualified cannot automatically say for sure that a solution is plagiarized, because the system only determines that code is similar, not why it is similar. However, our system takes certain factors into account, such as complexity, to determine a range in which similar code starts to be considered at risk of being plagiarized. Our system will identify solutions that may be at risk as either "low", "medium" or "high" risk. From there, your team can decide whether you believe the solutions are in fact plagiarized.
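The way similarity and complexity might combine into a risk level can be sketched as follows. Only the three labels and the "above 70" similarity rule come from this article; every other cutoff in the code is a made-up value for illustration, and `plagiarism_risk` is a hypothetical function, not part of any Qualified API.

```python
from typing import Optional


def plagiarism_risk(similarity: int, complexity: int) -> Optional[str]:
    """Return "low", "medium", "high", or None when the match is not tracked.

    All numeric cutoffs other than 70 are assumptions for this sketch.
    """
    if similarity <= 70:
        # Scores at or below 70 are not actively tracked as similar.
        return None
    if complexity < 10:
        # Trivial solutions tend to look alike, so a match says little.
        return "low"
    if similarity >= 95 and complexity >= 30:
        # A near-identical match on a complex solution is the strongest signal.
        return "high"
    if similarity >= 85:
        return "medium"
    return "low"


if __name__ == "__main__":
    for sim, cx in ((60, 50), (100, 5), (98, 40), (90, 20), (75, 20)):
        print(sim, cx, plagiarism_risk(sim, cx))
```

The design point is that the same similarity score can mean different things: two identical one-line solutions are unremarkable, while two identical long solutions deserve a closer look.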
If a candidate is identified as having any level of plagiarism risk, Qualified will set the candidate into the "Plagiarism Risk" state, which is easily viewable from the candidates sidebar. Once you review the candidate, they will be moved to the reviewed state. If you do not actively review candidate solutions as part of your workflow, you may still want to manually review any candidates who are in this state, to ensure that their score should not be discarded.
Where are similar solutions sourced from?
When comparing a candidate's solution, Qualified uses a few solution pools as its source for finding possible matches.
- Your own candidate solutions: Any previous solutions submitted by other candidates on your team.
- Solutions from other teams: If you are using a challenge taken from our shared library, then we also pull in solutions from all candidates who took that challenge, regardless of whether they are on your team. The solutions are anonymized, so you won't actually know where they came from.
Qualified does not source solutions from the web, as this isn't really necessary and is highly prone to false positives and false negatives, due to the unstructured nature of online sources. Once a candidate submits a solution found online, our system will know about it and all other candidates for that challenge will be compared to it. If you are using one of Qualified's library challenges, then chances are that any solution that may have been shared online is already available within our system to be compared to, and the system will flag any candidate who submits it.
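The two solution pools described above can be modeled with a short sketch. The `Solution` record, its fields, and the `comparison_pool` function are all hypothetical names used for illustration; they do not reflect Qualified's actual data model.

```python
from dataclasses import dataclass, replace
from typing import List, Optional


@dataclass(frozen=True)
class Solution:
    challenge_id: str
    team_id: Optional[str]       # None once anonymized
    candidate_id: Optional[str]  # None once anonymized
    code: str


def comparison_pool(
    all_solutions: List[Solution], target: Solution, from_shared_library: bool
) -> List[Solution]:
    """Collect the solutions a target would be compared against."""
    pool = []
    for solution in all_solutions:
        if solution is target or solution.challenge_id != target.challenge_id:
            continue
        if solution.team_id == target.team_id:
            # Pool 1: earlier solutions from your own team's candidates.
            pool.append(solution)
        elif from_shared_library:
            # Pool 2: other teams' solutions, with identifying fields removed.
            pool.append(replace(solution, team_id=None, candidate_id=None))
    return pool


target = Solution("fizzbuzz", "team-a", "cand-1", "code 1")
same_team = Solution("fizzbuzz", "team-a", "cand-2", "code 2")
other_team = Solution("fizzbuzz", "team-b", "cand-3", "code 3")
other_challenge = Solution("sorting", "team-a", "cand-4", "code 4")
everything = [target, same_team, other_team, other_challenge]

if __name__ == "__main__":
    print(len(comparison_pool(everything, target, from_shared_library=False)))
    print(len(comparison_pool(everything, target, from_shared_library=True)))
```

For a custom challenge only your own team's solutions are in the pool; for a shared library challenge, other teams' solutions are included but carry no team or candidate identity.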
Comparing Similar Solutions
When combined with our compare solutions feature, you can easily view similar code challenge solutions side-by-side within our solution details screen. This allows you to review other similar solutions, not just for cheat detection purposes, but also for getting a sense for what other types of closely similar solutions are being submitted in general.
To compare solutions, open up your solution details screen and activate the "Compare" tab. If Code Similarity is enabled on your subscription and there are similar solutions to compare, "Similar" will be the default active set of solutions you are comparing against.