Analysing Austen’s novels computationally in the late 1980s, John Burrows discovered strong patterns in her prose and also in the dialogue of her major characters. Yet it was not unusual or complex vocabulary that distinguished, say, Emma’s language from Mr Darcy’s; it was the most mundane of words, such as “we” and “the”.
In fact, the thirty most common words in any text, and the frequency with which they are used, Burrows established, are rich in stylistic information. Simple articles, prepositions and conjunctions, long disregarded by researchers as devoid of significance, hold the key to all manner of literary enigmas.
A distinguished literary scholar, Burrows is internationally recognised as the creator of “stylometry” – the use of data analysis methods to quantitatively interpret written works, also known as computational stylistics or literary computing. He also invented, in 2002, a new statistical procedure, called Delta, for perceiving and construing patterns in common words, which remains the most widely employed methodology in the field.
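For readers curious how such a procedure can work in practice, here is a minimal Python sketch of a Delta-style distance, based on the commonly described form of the measure: the relative frequency of each common word is standardised against a reference corpus, and two texts are compared by the average absolute difference between their standardised scores. The function names, the tokenising rule and the use of a population standard deviation are illustrative assumptions, not Burrows's own implementation.

```python
import re
from collections import Counter
from statistics import mean, pstdev


def tokenize(text):
    # Assumption: lower-cased runs of letters and apostrophes count as words.
    return re.findall(r"[a-z']+", text.lower())


def relative_frequencies(text, vocabulary):
    # Share of the text taken up by each word in the vocabulary.
    tokens = tokenize(text)
    counts = Counter(tokens)
    total = len(tokens) or 1  # avoid dividing by zero on an empty text
    return [counts[word] / total for word in vocabulary]


def delta(text_a, text_b, corpus, n_words=30):
    """Delta-style distance between two texts (illustrative sketch only)."""
    # The vocabulary is the n most frequent words across the whole corpus.
    corpus_counts = Counter()
    for text in corpus:
        corpus_counts.update(tokenize(text))
    vocabulary = [word for word, _ in corpus_counts.most_common(n_words)]

    # Mean and spread of each word's relative frequency across the corpus.
    corpus_freqs = [relative_frequencies(text, vocabulary) for text in corpus]
    means = [mean(column) for column in zip(*corpus_freqs)]
    spreads = [pstdev(column) for column in zip(*corpus_freqs)]

    freqs_a = relative_frequencies(text_a, vocabulary)
    freqs_b = relative_frequencies(text_b, vocabulary)

    differences = []
    for a, b, m, s in zip(freqs_a, freqs_b, means, spreads):
        if s == 0:
            continue  # a word used identically everywhere carries no signal
        differences.append(abs((a - m) / s - (b - m) / s))
    return mean(differences) if differences else 0.0
```

A smaller value suggests that two texts use common words in more similar proportions. In an attribution setting, a disputed text would typically be compared against samples from each candidate author, with the lowest score pointing to the likeliest match.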
Authorship attribution – determining who penned anonymously published books such as the 1996 Primary Colors about a US presidential campaign, or whether some parts of the plays we usually regard as Shakespeare’s are actually by another writer – is one of stylometry’s most popular uses. It has been applied not only to literature but also to history and philosophy, as well as to forensic linguistics (the analysis of language in legal settings) and corpus linguistics (analysis of a database of language).
Stylometry – a convergence of literary studies, linguistics, statistics and computer science – is based on the observation that every author has a relatively consistent, idiosyncratic style, habitually using language in mostly unconscious ways that result in discernible similarities between their writings. And the thirty most frequently used common words, such as “and” and “you” – which typically represent one-third of a given text – are the most reliable markers of stylistic difference.
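The idea of a common-word profile can be illustrated with a short Python sketch that lists a text's most frequent words, the share of the text each one accounts for, and the combined share of them all. The function name and the simple tokenising rule are assumptions made for illustration; a real study would prepare its texts far more carefully.

```python
import re
from collections import Counter


def common_word_profile(text, n=30):
    """Return the n most frequent words with their shares of the text,
    plus the combined share those words account for."""
    # Assumption: lower-cased runs of letters and apostrophes count as words.
    tokens = re.findall(r"[a-z']+", text.lower())
    total = len(tokens) or 1  # avoid dividing by zero on an empty text
    counts = Counter(tokens)
    profile = {word: count / total for word, count in counts.most_common(n)}
    coverage = sum(profile.values())
    return profile, coverage
```

Run on the full text of a novel, the coverage figure gives a quick way to check how much of the text its top thirty words account for.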
From Jane Austen to the Beatles
Defying decades of conventional wisdom, Burrows pursued a wholly original line of enquiry that exposed the hidden potential of such words and gave them weight in literary studies for the first time. The revelation prompted an outburst of interest in computational approaches, opening the way for countless studies of literary style, authorship, translation, dating and genre classification. And stylometry is not confined to English; it has been successfully used to analyse works in languages ranging from Classical Greek to Mandarin.
As well as Austen, Burrows applied the methodology to the writings of Henry James, E M Forster and Virginia Woolf. Other scholars have used it to explore questions such as differences in style between female and male authors, the innovativeness of certain authors compared with their peers – and even how the lyrics of Paul McCartney and John Lennon became “less pleasant, less active, and less cheerful” over time.