genetics
An Author’s DNA
Physicists in Sweden have come up with a mathematical fingerprint that could identify an author.
Hartosh Singh Bal
15 Dec, 2009
Is there a unique fingerprint to the work of an author? Could we find a way of saying which of Shakespeare’s plays were indeed written by him? A paper published in the New Journal of Physics by researchers from Umeå University suggests this may be possible, based on an analysis of the works of Thomas Hardy, Herman Melville and DH Lawrence.
They begin by considering an author’s vocabulary. However learned an author, his vocabulary is finite. Let M be the total number of words in a text, and let N be the number of different words in it. Counting different words means ignoring repetitions: the word ‘the’, for example, is counted only once however often it appears. When M is 1, N is 1. As the text gets longer, N grows more and more slowly, and it must stop growing once the author exhausts his vocabulary. Now consider N/M, the proportion of new words introduced by the author. It starts off at 1, but as the text grows, M becomes larger and larger while N levels off, so the ratio nears zero. The authors suggest that the rate of this decrease from 1 to 0 is unique to an author, i.e. the rate at which an author introduces new words as he writes a manuscript is his fingerprint.
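The counting described above can be sketched in a few lines of Python. This is only an illustration, not the researchers' code: the function name vocabulary_growth and the rule for splitting a text into words (lowercase runs of letters and apostrophes) are assumptions made here for the example.

```python
import re


def vocabulary_growth(text):
    """Return a list of (M, N) pairs, one for each word of the text.

    M is the number of words read so far; N is the number of different
    words among them. The tokenisation rule is an assumption of this
    sketch, not something taken from the paper.
    """
    words = re.findall(r"[a-z']+", text.lower())
    seen = set()
    curve = []
    for m, word in enumerate(words, start=1):
        seen.add(word)  # a repeated word leaves the set unchanged
        curve.append((m, len(seen)))
    return curve


sample = "the cat sat on the mat and the dog sat on the cat"
for m, n in vocabulary_growth(sample):
    # N/M starts at 1 and falls as repeated words pile up
    print(m, n, round(n / m, 2))
```

In this toy sentence the ratio begins at 1 and drops every time a word such as ‘the’ comes round again, which is the decline the article describes, only on a much smaller scale than a novel.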
In the case of Hardy, Melville and Lawrence, consider the graph of N plotted against M (scaled by some factors, but ignore that). The graph, the physicists suggest, is the unique signature of an author, a curve common to all of a particular writer’s output, because it does not matter whether the author writes a short story of 1,000 words or a novel of 80,000 words. If you plot N versus M for that short story or novel, it will match the author’s graph determined from his other works. This certainly holds true for these three authors, and the researchers suggest it may hold universally. Consider, then, those works of Shakespeare we know to be genuine and plot this graph. The graph for any of the other plays must match it; if it doesn’t, we would know it is a fake.
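One rough way to picture what "matching" two graphs might involve is to read off N at the same values of M in two texts and see how far apart the counts are. The sketch below does just that. It is an assumption made for illustration, not the method of the paper: it leaves out the scaling factors mentioned above, the names distinct_counts and curve_gap are hypothetical, and it says nothing about how small the gap must be before two texts count as the work of one author.

```python
import re


def distinct_counts(text, checkpoints):
    """Map each checkpoint M to N, the number of different words
    among the first M words. Checkpoints beyond the end of the
    text are skipped."""
    words = re.findall(r"[a-z']+", text.lower())
    return {m: len(set(words[:m])) for m in checkpoints if m <= len(words)}


def curve_gap(text_a, text_b, checkpoints):
    """Mean absolute difference in N between two texts, taken at the
    checkpoints both texts are long enough to reach.

    An illustrative distance only (an assumption of this sketch)."""
    a = distinct_counts(text_a, checkpoints)
    b = distinct_counts(text_b, checkpoints)
    shared = a.keys() & b.keys()
    if not shared:
        raise ValueError("both texts are shorter than every checkpoint")
    return sum(abs(a[m] - b[m]) for m in shared) / len(shared)


# A text compared with itself has a gap of zero
print(curve_gap("the cat sat on the mat", "the cat sat on the mat", [2, 4, 6]))
```

With real works one would use checkpoints in the thousands of words rather than single digits; the tiny strings here only show the mechanics.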
About The Author
Hartosh Singh Bal turned from the difficulty of doing mathematics to the ease of writing on politics. Unlike mathematics, all this requires is being less wrong than most others who dwell on the subject.