Each literary character in a book, at least if the author is
good, should speak somewhat differently from all the other characters. In his
pioneering study of Jane Austen (1987), Prof. John Burrows has shown that this
difference can be traced in the frequencies of "non-meaningful" yet very
common words like "I," "can," or "or," as well as in those of "keywords"
like "love," "brother," or "body." Burrows has also shown that similar
characters in different books by Austen (Elizabeth and Elinor, for instance)
may speak in a similar way.
I have tried to apply the same method to the Polish classic of all
time, Henryk Sienkiewicz's Trilogy. A series of novels that share some of their
characters is an especially interesting field for a statistical comparison of
this kind. And then I went a step further: to compare the speech of
corresponding characters not only in the three parts of the series in Polish but
also in the Trilogy's two English translations. The aim was to find out whether
differences between the characters' "idiolects" travel
across languages.
The computer is used first to find the most frequent words of
each text and then to create relative frequency matrices. These are then
processed in a statistical package using a procedure called
"multidimensional scaling" to produce two-dimensional diagrams, or
"maps," showing the relative "distances" between the
idiolects of the major characters.
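A minimal sketch of this pipeline in Python follows. It assumes that each character's speech has already been extracted into a plain string and that NumPy is available. The function names, the default of 50 words, the z-scoring of each word column, and the use of Euclidean distance with classical (Torgerson) metric MDS are all illustrative assumptions. The text does not say which distance measure or MDS variant the statistical package applied, and it may well have been a different one (non-metric MDS, for instance). The three sample "idiolects" are invented placeholder strings, not text from the Trilogy.

```python
import re
from collections import Counter

import numpy as np


def tokenize(text):
    # Lowercase and keep runs of letters; \w is Unicode-aware in Python 3,
    # so Polish diacritics such as "ą" or "ż" stay inside words.
    return re.findall(r"[^\W\d_]+", text.lower())


def frequency_matrix(idiolects, n_words=50):
    """Build a characters x words matrix of relative frequencies.

    Columns are the n_words most frequent words of all the texts combined;
    each cell is that word's share of one character's total word count.
    """
    counts = {name: Counter(tokenize(text)) for name, text in idiolects.items()}
    overall = Counter()
    for c in counts.values():
        overall.update(c)
    words = [w for w, _ in overall.most_common(n_words)]
    names = list(idiolects)
    rows = []
    for name in names:
        size = max(sum(counts[name].values()), 1)  # guard against empty texts
        rows.append([counts[name][w] / size for w in words])
    return names, words, np.array(rows)


def classical_mds(matrix, dims=2):
    """Project characters onto a 2-D "map" with classical (Torgerson) MDS.

    Distances are Euclidean distances between z-scored frequency profiles
    (an assumption made for this sketch).
    """
    std = matrix.std(axis=0)
    std[std == 0] = 1.0  # a word used equally by everyone carries no signal
    z = (matrix - matrix.mean(axis=0)) / std
    sq_dist = ((z[:, None, :] - z[None, :, :]) ** 2).sum(axis=2)
    n = sq_dist.shape[0]
    centering = np.eye(n) - np.ones((n, n)) / n
    gram = -0.5 * centering @ sq_dist @ centering  # double centring
    vals, vecs = np.linalg.eigh(gram)
    top = np.argsort(vals)[::-1][:dims]  # largest eigenvalues first
    return vecs[:, top] * np.sqrt(np.clip(vals[top], 0, None))


# Invented placeholder "idiolects", not text from the Trilogy.
idiolects = {
    "A": "I can go or I can stay, brother",
    "B": "love and the body, or so I can say",
    "C": "brother, I love my brother",
}
names, words, matrix = frequency_matrix(idiolects, n_words=5)
coords = classical_mds(matrix)
for name, (x, y) in zip(names, coords):
    print(f"{name}: ({x:.2f}, {y:.2f})")
```

Classical MDS is chosen here only because it needs nothing beyond an eigendecomposition. The returned coordinates can be plotted directly as a two-dimensional map on which characters who use the frequent words in similar proportions land close together.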
Still interested?