This is me self-plagiarising from my chapter in the Official CHFI book. Oh wait, it is not plagiarising, as I am giving credit (even if to myself).
Copyright Infringement: Plagiarism
Webster's New World Dictionary describes plagiarism as taking the ideas of another and passing them off as "one's own". This section details the tools and detection factors involved when investigating plagiarism. A common misconception is that plagiarism hurts nobody. The reality is that it is a fraud and thus a criminal offense (see 18 U.S.C. § 1341, Frauds and swindles). Plagiarism takes away from the effort of the author, and society suffers as a consequence.
The various plagiarism detection factors
Konstantinos Tripolitis, in his 2002 dissertation “Automated Detection of Plagiarism” (University of Sheffield, UK; note the referencing), addresses the plagiarism detection factors that commonly occur. He includes the following as possible detection factors:
- Changes of vocabulary: When the vocabulary an author uses varies significantly in the text under consideration, then there is a great possibility that the author has committed plagiarism.
- Incoherent text: Inconsistency in the style of a text, such that parts of the text seem to be written by different people, can imply plagiarism.
- Punctuation: When two texts exhibit extremely similar punctuation, plagiarism can be implied, as it is not possible that two authors could use punctuation in the same way.
- Dependence on certain words and phrases: If particular words or phrases used by a certain author in a custom way are used consistently by another author, then plagiarism is possible, as authors tend to have different word preferences.
- Amount of similarity between texts: Texts written by different people and sharing a great amount of similar text should be checked more thoroughly for plagiarism. This is a fundamental idea for a detection tool and is the one mainly used in the present work. A reliable detection tool should be able to provide a fairly accurate similarity measure in order to be useful.
- Long sequences of common text: Long sequences of consecutive common characters or words found in the texts under test exhibit a fair possibility that plagiarism may have been committed. This is another fundamental idea that is used in the implementation of HERMES. It is also known as the ‘sequence comparison’ approach.
- Order of similarity between texts: If two texts have the same order of matching words or phrases then plagiarism is a possibility.
- Frequency of words: Finally, words used with the same frequency in two texts written by different authors suggest potential plagiarism.
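Two of the factors above, the amount of similarity between texts and long sequences of common text, lend themselves to a quick illustration. The Python below is only a rough sketch under my own assumptions and is not how HERMES or any other real detection tool is implemented. The function names (tokenize, shingles, jaccard_similarity, longest_common_run), the choice of three-word shingles, and the use of a Jaccard score as the similarity measure are all hypothetical choices made for the example.

```python
import re


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def shingles(words, n=3):
    """Return the set of n-word sequences (shingles) in a word list."""
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def jaccard_similarity(text_a, text_b, n=3):
    """Share of distinct n-word shingles common to both texts (0.0 to 1.0).

    Assumption: Jaccard overlap stands in for a 'similarity measure';
    a real tool may well use something else.
    """
    a = shingles(tokenize(text_a), n)
    b = shingles(tokenize(text_b), n)
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


def longest_common_run(text_a, text_b):
    """Return the longest run of consecutive words found in both texts."""
    a, b = tokenize(text_a), tokenize(text_b)
    best_len, best_end = 0, 0
    # prev[j] holds the length of the common run ending at a[i-2], b[j-1].
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                curr[j] = prev[j - 1] + 1
                if curr[j] > best_len:
                    best_len, best_end = curr[j], i
        prev = curr
    return a[best_end - best_len:best_end]
```

For example, comparing "the quick brown fox jumps over the lazy dog" with "a quick brown fox jumps over a sleeping dog", longest_common_run returns the five words "quick brown fox jumps over", and jaccard_similarity gives 3/11 (about 0.27), since three of the eleven distinct three-word shingles appear in both sentences. A high score or a long run is a reason to look more closely, not proof of plagiarism.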
Today I read a published article by an ISACA branch president that was significantly copied. It was taken largely from “Harnessing IT for Secure, Profitable Use” by Erik Guldentops, CISA, in the Information Systems Control Journal, Volume 5, 2001. Over 25% of the article was copied directly from that source, and the rest was slightly paraphrased.
I can understand occasional misses. I have slipped up myself at times and left sentences out of quotes. The difference is between correctly referencing 99%+ of the quotes and providing not a single reference.
In the words of an Australian, Mr Hinch:
Shame, shame, shame...