
Collaborative authorship in the twelfth century: A stylometric study of Hildegard of Bingen and Guibert of Gembloux

Mike Kestemont
Institute for the Study of Literature in the Low Countries & CLiPS
Computational Linguistics Group, University of Antwerp, Belgium
Sara Moens and Jeroen Deploige
History Department, Ghent University, Belgium

Abstract
Hildegard of Bingen (1098-1179) is one of the most influential female authors of the Middle Ages. From the point of view of computational stylistics, the oeuvre attributed to Hildegard is fascinating. Hildegard dictated her texts to secretaries in Latin, a language of which she did not master all grammatical subtleties. She therefore allowed her scribes to correct her spelling and grammar. Especially Hildegard's last collaborator, Guibert of Gembloux, seems to have considerably reworked her works during his secretaryship. Whereas her other scribes were only allowed to make superficial linguistic changes, Hildegard would have permitted Guibert to render her language stylistically more elegant. In this article, we focus on two shorter texts: the Visio ad Guibertum missa and Visio de Sancto Martino, both of which Hildegard allegedly authored during Guibert's secretaryship. We analyze a corpus containing the letter collections of Hildegard, Guibert, and Bernard of Clairvaux using a number of common stylometric techniques. We discuss our results in the light of the Synergy Hypothesis, suggesting that texts resulting from collaboration can display a style markedly different from that of the collaborating authors. Finally, we demonstrate that Guibert must have reworked the disputed visionary texts allegedly authored by Hildegard to such an extent that style-oriented computational procedures attribute the texts to Guibert.

Correspondence: Mike Kestemont, Institute for the Study of Literature in the Low Countries & CLiPS Computational Linguistics Group, University of Antwerp, Belgium. Email: mike.kestemont@gmail.com

Digital Scholarship in the Humanities, Vol. 30, No. 2, 2015. © The Author 2013. Published by Oxford University Press on behalf of ALLC. All rights reserved. For Permissions, please email: journals.permissions@oup.com
doi:10.1093/llc/fqt063 Advance Access published on 26 October 2013

1 Introduction

Since the end of the 1960s, literary studies have seen a clear shift of focus from the analysis of authorial intentions to reader-oriented criticism. The repudiation of the modern idea of autonomous authorship has perhaps gone furthest in medieval studies, with the rise, since the late 1980s, of Material Philology (Nichols, 1997). Medievalists have become increasingly aware of the importance of manuscript culture in their understanding of texts: medieval texts should not primarily be studied, it is argued, as abstract entities resulting from authorial ambitions, but rather as tangible objects, materialized in
specific manuscript contexts. Every material manifestation of a text is unique, because the acts of copying and compiling nearly always resulted in textual changes, from minor changes in orthography to complete rewritings. Our modern post-romantic conception of authorship therefore seems profoundly anachronistic with respect to the Middle Ages (Cerquiglini, 1999, p. 8-10). Yet, even if medieval culture did not share our present-day view on the significance of original authorship, the Middle Ages have known many respected and authoritative individuals who were recognized by their contemporaries and posterior readers as producers of very specific literary works. Some kind of correlation even existed between the degree to which texts were susceptible to alterations and the religious and intellectual authority of their authors (Deploige, 2005).

This did not mean, however, that such recognized authors were necessarily acting individually in the process of conceiving their treatises or narratives, quite the contrary. Writing in the Middle Ages meant entering into a dialogue with a long line of predecessors, whether through citations, paraphrasing, or allusions. In the actual process of literary composition too, medieval authors only seldom worked alone. A new text could be the result of drafts on wax tablets copied by professional scribes, of processes of dictation and subsequent correction, etc. A twelfth-century authority like the Cistercian abbot Bernard of Clairvaux (1090-1153), one of the most prolific and influential medieval authors, is known to have been surrounded by a team of secretaries. For his sermons and letters in particular, he was assisted by a number of collaborators to whom he could dictate his messages or who were asked to produce texts in accordance with his own views. Some of his collaborators were even trained in imitating his writing style, thus facilitating Bernard's work of final editing or correcting (Leclercq, 1962; 1987, pp. 147-52). In the case of the remarkably few medieval female authors known to us, the role of secretaries and collaborators is even more intricate. Women writers like the German nuns Hildegard of Bingen (1098-1179) or Elizabeth of Schönau (1129-1165) were considered unlearned and incapable of independently writing down their visionary experiences, even if these were divinely inspired. These women therefore had to be assisted by male collaborators, often also serving as their spiritual directors. The precise nature and implications of such cross-gender collaborations remain a topic of scholarly debate.

The immediate incentive for the present article is the preparation of a new critical edition of two lesser known texts attributed to Hildegard of Bingen, supposedly dating from the last years of her life: the Visio de Sancto Martino, which is conceived as a letter addressed to the worshippers of Saint Martin, and the Visio ad Guibertum missa, containing spiritual advice to an anonymous monk-priest, generally identified as her last secretary, Guibert of Gembloux (1124-1213) (Deploige and Moens, forthcoming). Among the few scholars who paid attention to these texts, there is still no consensus as to the extent to which they should be attributed to either Hildegard herself or to her collaborator Guibert. As neither traditional stylistic analysis nor contextual historical research has so far been able to resolve the problem, we will approach this issue through a stylometric analysis. We will focus on three research questions.

First, does stylometry allow for an authorial differentiation between the writings of twelfth-century Latin authors, belonging to highly similar intellectual circles? To answer this question, we will investigate the letter collections or epistolaria of Hildegard of Bingen, her secretary Guibert of Gembloux, and their famous contemporary, Bernard of Clairvaux. Our aim is to assess to what extent we can distinguish stylistic profiles for these authors, despite the marked variance within medieval manuscript culture (Cerquiglini, 1999), as well as the fact that these authors, like many of their contemporaries, were often assisted by secretaries.

Next, we wish to analyze in more detail to what extent we can discern in Hildegard's epistolary work, the influence of her last secretary, Guibert of Gembloux. Did her style undergo detectable stylistic changes under the editorial assistance of Guibert, or does the same homogeneous authorial voice appear throughout her epistolary work?

Finally, we will assess the complex question to which author we should attribute, at least on
stylistic grounds, the visiones at stake in this article. In answering these research questions, we do not aim to develop novel stylometric techniques. The originality of this research is to be found in our application of a number of well-established techniques to assess their feasibility when dealing with medieval Latin texts, a textual tradition that until now has only rarely received attention in computational authorship attribution. Before addressing these issues, we will first briefly introduce the state of research with respect to the so-called Mitarbeiter problem in the Hildegard scholarship.

2 Uneducated in the Art of Grammar

The Benedictine nun Hildegard of Bingen was one of the most productive female authors of the Middle Ages (Newman, 1998). After a youth as anchoress at the abbey of the monks of Disibodenberg in the Rhineland near Mainz, she ended up as abbess of her own convent at the nearby Rupertsberg. Her extensive oeuvre includes genres as diverse as visionary books, letters, hagiographical texts, treatises on monastic life, musical compositions, and some works on physics and medical healing. Considered a true prophetess, receiving revelations and admonitions from God, she enjoyed a special status, even in the highest ecclesiastical milieux. Her extensive circle of correspondents, comprising, among others, popes and the emperor, testifies to her prophetic reputation. She was therefore able to gain an authority unprecedented for a woman, enabling her to even criticize the male clergy of her time. Among the first to approve her visionary gift was Bernard of Clairvaux, in a letter answering her request for support. Her female authorship was built on her recognition as a mouthpiece of God, which caused her to present herself during her entire life as a poor and uneducated woman, uneducated precisely because she was a woman (Deploige, 1998). In one of her vitae, her biographer Guibert of Gembloux specifies that she was uneducated as to her schooling in the art of grammar (Derolez, 1988-1989, p. 377). Her status, both as a woman and an allegedly unlearned prophetess who may not have had the same type of schooling as young monks, meant that throughout her life Hildegard had to be assisted by secretaries (Ferrante, 1998). Her first and principal secretary was Volmar of Disibodenberg, who remained her close associate until his death in 1173. He assisted in the redaction of the majority of her works. As we can learn from a famous miniature in the now lost manuscript (henceforth MS) Wiesbaden, Landesbibliothek, 1, dating from the end of her life, Hildegard dictated and wrote drafts on wax tablets, which were subsequently copied on parchment and linguistically polished in accordance with the rules of grammar (Fig. 1). In addition, several Rupertsberg nuns must have aided their abbess as scribes during this period, given the number of known manuscripts produced in Rupertsberg under Hildegard's supervision (Embach, 2003, p. 76, 128-9, 160, 184-5; Herwegen, 1904, p. 302-8). After Volmar's death, Hildegard had to complete her last major visionary cycle, the Liber divinorum operum (Book of the Divine Works), with more occasional assistance by a number of different collaborators from her immediate circle of spiritual acquaintances (Herwegen, 1904, p. 308-15). At the very end of her life, however, she was unexpectedly joined by Guibert, a monk from the abbey of Gembloux in Brabant (nowadays Belgium). Himself a fervent letter writer and hagiographer (Moens, 2010), he served as her secretary from 1177 until her death in 1179 (Delehaye, 1889; Ferrante, 1998, p. 122-30).

Fig. 1 MS Wiesbaden, Landesbibliothek, 1, fol. 1r. (lost since 1945). Photo: Rheinisches Bildarchiv Köln 13321

While even the authenticity of her female authorship had not always gone uncontested, until the seminal work by Schrader and Führkötter (1956), a lot of scholarly efforts have been concerned with the precise role of Hildegard's secretaries. Just as for other female writers working under the direction of father confessors (Coakley, 2006), the question has been raised to what extent Hildegard's secretaries interfered with the final versions of her works, possibly generating male, clerical interpretations rather than original female viewpoints. Following the pioneering research by Herwegen (1904), most specialists now agree that the role of Hildegard's collaborators was restricted to minor grammatical and stylistic alterations. Generally speaking, they had to copy her words verbatim unless they received Hildegard's explicit authorization for corrections (Schrader and Führkötter, 1956, p. 182-3; Ferrante, 1998, p. 104). It is generally assumed, however, that Hildegard must have granted a somewhat greater liberty to Guibert, who only entered into her life when she was already at the very advanced age of 79. Although their involvement was short, Guibert nevertheless had a significant impact on Hildegard's literary
legacy. For example, he may have assisted her as one of the correctors in the final redaction of the Liber divinorum operum, of which MS Ghent, University Library, 241 (Fig. 9), can be considered the autograph copy most true to Hildegard's own words (Derolez and Dronke, 1996, pp. XCIXCIV). He also aided her in both the writing and compilation of portions of her epistolarium. On the basis of manuscript evidence, content, and dating, we can distinguish in Hildegard's letter collection a part that must have been written and compiled with the help of Volmar and another group of letters that must have been written or transmitted under Guibert's supervision.[1] Last but not least, Guibert is also thought to have directed the compilation of the so-called Riesenkodex (MS Wiesbaden, Landesbibliothek, 2), the manuscript in which, by the end of her life, Hildegard had collected all the authorized versions of her works (Van Acker, 1989, pp. 129-34).

3 Two Suspect Visions

The Visio de sancto Martino (Vision of Saint Martin) and Visio ad Guibertum missa (Vision sent to Guibert), which are at stake in this article, cannot be found in the Riesenkodex. They are only preserved in three manuscripts that can be linked to the abbey of Gembloux and Guibert's own oeuvre.[2] Therefore, both texts are traditionally not included in the core of Hildegard's canon (Schrader and Führkötter, 1956, p. 182; Embach, 2003, p. 469). Whereas the titles in the manuscripts (Fig. 2), as well as Guibert's accompanying letters, firmly attribute these visiones to Hildegard, there are good reasons to suspect that Guibert must have been extensively involved in their final redaction. The figure of Saint Martin for instance, the main topic of the Visio de sancto Martino, is entirely absent from Hildegard's oeuvre. Guibert, on the other hand, developed a lifelong fascination for this saint and devoted nearly half of his life to spreading his cult. The Visio ad Guibertum missa discusses the role of the priest as well as the topic of literary collaboration, both issues of direct relevance to Guibert. Moreover, the end of the latter text contains a passage of particular interest in which Hildegard grants Guibert the exceptional right to revise her texts more fundamentally than simply at the level of style and grammar:

    When you correct [the Visio de sancto Martino] and the other works, in the emending of which your love kindly supports my deficiency, you should keep to this rule: that adding, subtracting, and changing nothing, you apply your skill only to make corrections where the order or the rules of correct Latin are violated. Or if you prefer, and this is something I have conceded in this letter beyond my normal practice, you need not hesitate to clothe the whole sequence of the vision in a more becoming garment of speech, preserving the true sense in every part. For even as foods nourishing in themselves do not appeal to the appetite unless they are seasoned somehow, so writings, although full of salutary advice, displease ears accustomed to an urbane style if they are not recommended by some color of eloquence (translated by Newman, 1987, p. 23)

With this statement, Hildegard allegedly granted Guibert editorial privileges that she had not allowed any other previous collaborator. The passage also prompted scholars to have a closer look at the authorship, style, and content of these visionary texts. Already in his 1882 edition, Pitra voiced doubts with respect to Hildegard's alleged authorship. He stated that Guibert, if not their original author altogether, must at least have reworked the texts profoundly. Pitra based his verdict on a number of syntactical features, on metaphors which he considered typical of Guibert, and on the extensive insertion of Biblical quotations (Pitra, 1882, p. 370-1, 375). Herwegen remained more cautious: although he accepted that Guibert had refined the texts stylistically, he still discerned Hildegard's authorial voice shimmering through Guibert's multiple corrections. He recognized Hildegard's genius in the overall structure of the visions and in some typically Hildegardian vocabulary. He also rejected Pitra's assertion that the numerous Biblical quotations could only have been inserted by Guibert (Herwegen, 1904, p. 394-6).
Newman recently stated that the Visio ad Guibertum missa was written by Guibert in Hildegard's persona (Newman, 1987, p. 24), although Van Acker (1989, p. 130) and Coakley (2006, p. 61) continued to consider Hildegard as the texts' author and Guibert as a mere stylistic reviser.

Fig. 2 MS Brussels, Royal Library, 5527-5534, fol. 141v. Epistula domine Hildegardis magistre cenobii sancti Roberti Pinguensis de excellentia beati Martini episcopi ("Letter of lady Hildegard, magistra of the monastery of saint Rupert in Bingen, on the excellence of the blessed bishop Martin")

These assertions concerning the authorship of the visiones seem to have been predominantly based on subjective appreciations of style and content, and the arguments used in this debate remain, at best, intuitive. The appearance of a new critical edition of the visiones once more put the question of their authorship at the forefront: should the texts be regarded as Hildegardian or pseudo-Hildegardian? Stylometric methods may provide a more objective basis for disentangling the issue and re-assessing the nature of Guibert's secretaryship.

4 Corpus Preparation

For the present study, Brepols Publishers generously provided a digital corpus containing the nearly complete works of Hildegard, Guibert, and Bernard of Clairvaux. We obtained these texts in raw format, corresponding to the way they are included in the Brepols electronic Library of Latin Texts, on the basis of modern critical editions.[3] Fortunately, these editions are all based on manuscripts that were compiled under the supervision of the original authors or at least in their close vicinity, so that we do not have to worry about major scribal interventions. The fact that all three authors in our corpus have been productive letter writers rendered their epistolaria an attractive point of departure. Moreover, the two short visionary texts of dubious origin that are at issue in this article are mostly comparable with Hildegard's letters with respect to length, topics, and manuscript tradition. Obviously, we restricted our authors' letter collections to the letters they wrote themselves, leaving aside the letters that were merely addressed to them and that were usually contained in the same manuscripts (Constable, 1976). For Bernard, this resulted in a sub-corpus of 166,063 words and for Guibert of 124,580 words.[4] Hildegard's letter collection contained 109,633 words, 82,154 of which are contained in the part compiled with the help of her first secretary
Volmar, while the remaining 27,479 words constitute the letters that, as discussed earlier, have most probably been edited in some way by Guibert.[5]

Medieval Latin is characterized by unstable orthography. As even a single scribe often used different spellings for the same word, modern editors already tend to silently normalize minor orthographic variants. We have normalized the orthography in our corpus even further via lemmatization, a useful procedure in stylometry for medieval texts (Kestemont et al., 2010). The texts were first tokenized using the Natural Language Toolkit (Bird et al., 2009). The coordinating conjunction que ("and") was not realized as a separate word in medieval Latin, but it was appended to the preceding word (e.g. terra aquaque, "land and water"). To automatically isolate the clitic, we have stripped the suffix (xque) from every word that did not occur in a list of words proposed by Schinke et al. (1996, p. 180-1).[6] We have also split up the medieval contraction of the reflexive pronoun se and the idiomatic reinforcement ipsum in seipsum (or teipsum, teipsam, etc.).

A number of specific character combinations were freely interchangeable in medieval Latin, such as ph for f, v for u, oe or ae for e (or for ę, the so-called e caudata) (Rigg, 1996). We have therefore lifted the difference between v and u, as well as between ae, oe, and e, by replacing every v with u and every ae and oe with e. For the substitution of ae and oe by e, this actually meant that we were sometimes forced to erase the distinction between grammatically important morphemes (e.g. between the masculine vocative singular domine and the feminine nominative plural dominae). Yet, this was unavoidable, as a good deal of the aes and oes in our corpus were already contracted to es, making it nearly impossible to automatically normalize them the other way round.

Subsequently, we checked whether the surface tokens in our corpus were present in a large and representative word list from the Perseus Project (Tufts University). When a token was not, we used a permutation algorithm to generate plausible spelling variants for it. If one of these newly generated forms was contained in the word list, the original form was replaced by its newly generated counterpart. To generate these variants, we constructed an array with all possible variations for the consecutive character groups. Next, we combined these options through the Cartesian product in the matrix by means of a permutation algorithm (Kestemont et al., 2010). Table 1 lists the series of common alternative character combinations we have considered, loosely based on Rigg (1996).[7] An example matrix for a word like chirographum would be: {[c], [h | ∅], [i | y], [r], [o], [g], [r], [a], [ph | f], [u], [m]}. All unique, alternative word spellings that can be generated on the basis of the matrix are: chirographum, cirographum, chyrographum, cyrographum, chirografum, cirografum, chyrografum, and cyrografum.

Table 1 Interchangeable medieval Latin character combinations allowed in our permutation algorithm

ci vs. ti
ch vs. h
ph vs. f
h vs. ∅
w vs. uu vs. vv vs. uv vs. vu
i vs. j vs. y
k vs. c vs. ch
g vs. gu

Finally, we automatically annotated the tokens with lemmas using the medieval Index Thomisticus Treebank (IT-TB: Passarotti and Dell'Orletta, 2010) as training material (ca. 170,000 tokens; ca. 9,000 sentences).[8] For the lemmatization of our corpus we have used Morfette (Chrupala et al., 2008). Unlike other popular lemmatization tools, such as TreeTagger (Schmid, 1994), Morfette also lemmatizes input tokens that the tagger did not already encounter verbatim in the training data. Morfette considers pairs of input tokens and lemmas in the training material. From these pairs it learns shortest edit scripts or ways to transform tokens into their lemmas using character insertions, deletions, and replacements. An annotated sample from the Visio ad Guibertum missa is listed as an example (Table 2), illustrating how this procedure did not manage to identify all lemmas correctly. Especially content words that are not typical of Thomas Aquinas's scholastic vocabulary were not always recognized. For the function words used in our analyses (see below), this problem was fortunately hardly an issue.
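
The normalization and variant-generation steps described in this section can be illustrated with a short Python sketch. It is not the authors' pipeline. The rule set `RULES` is a reduced, hypothetical subset of Table 1, chosen so that the chirographum example comes out as in the text. The list `QUE_EXCEPTIONS` is an illustrative stand-in for the word list of Schinke et al. (1996), and emitting the detached clitic as the token xque is an assumption based on the form listed in Table 3. All function names are hypothetical.

```python
from itertools import product

# Hypothetical, reduced rule set (assumption, not the full Table 1): each key is a
# character group, each value lists the spellings it may alternate with.
# The empty string means "this character may be absent".
RULES = {
    "ph": ("ph", "f"),
    "f": ("ph", "f"),
    "h": ("h", ""),
    "i": ("i", "y"),
    "y": ("i", "y"),
}

# Illustrative exceptions only; NOT the actual list of Schinke et al. (1996).
QUE_EXCEPTIONS = {"atque", "quoque", "usque", "neque"}


def normalize(token):
    """Lowercase, then collapse v/u and ae/oe/e as described in the text."""
    token = token.lower().replace("v", "u")
    return token.replace("ae", "e").replace("oe", "e")


def split_clitic(token, exceptions=QUE_EXCEPTIONS):
    """Detach an enclitic -que unless the word is a listed exception."""
    if token.endswith("que") and len(token) > 3 and token not in exceptions:
        return [token[:-3], "xque"]  # "xque" as clitic token is an assumption
    return [token]


def spelling_matrix(word):
    """Segment a word into consecutive groups, each with its alternatives."""
    matrix, i = [], 0
    while i < len(word):
        for size in (2, 1):  # try two-character groups such as "ph" first
            group = word[i:i + size]
            if len(group) == size and group in RULES:
                matrix.append(RULES[group])
                i += size
                break
        else:  # no rule matched: the character has no alternative
            matrix.append((word[i],))
            i += 1
    return matrix


def variants(word):
    """All unique spellings obtained from the Cartesian product of the matrix."""
    return sorted({"".join(combo) for combo in product(*spelling_matrix(word))})


def repair(token, lexicon):
    """Replace an unknown token by the first generated variant found in the lexicon."""
    if token in lexicon:
        return token
    for candidate in variants(token):
        if candidate in lexicon:
            return candidate
    return token  # nothing plausible found: keep the original form
```

With these rules, `variants("chirographum")` yields the eight spellings listed in the text, and `repair("chyrografum", {"chirographum"})` maps the variant back to the attested form. A token for which no variant occurs in the lexicon is returned unchanged.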

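
The notion of an edit script that turns a token into its lemma can be made concrete with the Python standard library. The sketch below is not Morfette's algorithm. It derives the operations from `difflib.SequenceMatcher`, which does not guarantee a shortest script, and counting positions from the end of the word is an assumption made here because Latin inflection is largely suffixal. The function names `edit_script` and `apply_script` are hypothetical.

```python
from difflib import SequenceMatcher


def edit_script(token, lemma):
    """Describe how to turn token into lemma as (offset_from_end, old, new) triples."""
    ops = []
    matcher = SequenceMatcher(a=token, b=lemma, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            # Store where the changed span starts, counted from the word end,
            # together with the removed and the inserted characters.
            ops.append((len(token) - i1, token[i1:i2], lemma[j1:j2]))
    return tuple(ops)


def apply_script(token, script):
    """Apply a script learned on one token to another; None if it does not fit."""
    pieces, cursor = [], 0
    # Larger offsets lie further to the left, so this walks left to right.
    for offset, old, new in sorted(script, key=lambda op: -op[0]):
        start = len(token) - offset
        if start < cursor or token[start:start + len(old)] != old:
            return None  # the token lacks the material the script expects
        pieces.append(token[cursor:start])
        pieces.append(new)
        cursor = start + len(old)
    pieces.append(token[cursor:])
    return "".join(pieces)
```

A script learned from the pair uisionem/uisio (Table 2) removes a final -nem and therefore also maps an unseen form such as religionem to religio, which is the kind of generalization to unseen tokens that the text attributes to Morfette. Where the script does not fit, as for ignis, the sketch returns `None` rather than guessing.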

Table 2 Example of lemmatization based on Morfette

Original     Lemma      Translation
in           in         in
uisionem     uisio      vision
anime        anima      soul
mee          meus       my
,            /          /
uidi         uideo      I see
ingentem     ingentem   not recognized [ingens, gigantic]
rutilantis   rutilo     glow
ignis        ignis      fire
nubem        nubem      not recognized [nubes, cloud]

Translation: In a vision of my soul, I saw a gigantic cloud of glowing fire.

5 Feature Selection

Today's stylometry has become an umbrella term for a still growing number of techniques for authorship analysis. Each of these has been the subject of both criticism and praise, making it hard to discern a consensus on best practice in this field. For this research too, we had to balance the pros and cons of a number of tried and tested methodologies. Recent studies still tend to agree on the undeniable methodological advantages of using function words in authorship attribution (Binongo, 2003, p. 11). An author's use of function words is said, for instance, to be relatively unaffected by a text's topic or genre. (Dis-)similarities between texts regarding function words are therefore to a certain extent content-independent and can be more easily associated with authorship than e.g. content words or other topic-specific stylistics (Juola, 2006, p. 264-5). Numerous empirical studies have effectively demonstrated that analyses of the high-frequency strata of function words yield reliable indications about a text's authorship (Koppel et al., 2009, p. 11-12; Stamatatos, 2009, p. 540-1). In this research, we have therefore restricted our analyses to function words, using a number of approved methods, many of them implemented in the publicly available script suite Stylometry with R (Eder et al., 2013).

Preliminary analyses showed that the upper tail of the frequency spectrum in our corpus still contained a good deal of content-rich lemmas. Among the ca. 200 most frequent lemmas in our entire corpus, listed in Table 3, we came across multiple topic-specific nouns like deus, dominus, sanctus, ... and verbs like facio, uideo, uiuo, ... The inclusion of such lemmas obviously reflects the corpus's fairly specific, religious semantics. It is also related, however, to the simple fact that a highly inflected language like Latin with its many declensions makes less use of function words than weakly inflected languages like English. A third explanatory factor might be the fact that we worked with the frequencies of lemmas instead of surface forms. It thus seemed advisable to remove these content words from our data tables.

The content-rich words we chose to remove are marked by a hashtag (#) in Table 3.[9] The words followed by an asterisk (*) in the same Table 3 are non-reflexive personal pronouns, which are also often culled in stylometry to avoid the intrusion of genre-related or topic-specific features. Naturally, a collection of letters will contain more instances of the second-person pronouns tu/vos ("you") or tuus/vester ("your") than a saint's life. In our analyses, we have deleted this kind of pronoun. Just as in Table 2, one can still distinguish a certain number of wrongly lemmatized tokens in Table 3. The surface form sui, for example, often seems to have remained unchanged, whereas it should have been transformed into suus. This particular error, however, is neutralized by our elimination of non-reflexive personal pronouns.[10] In sum, our culling of the lemmas in Table 3 resulted in 65 function words with which to form the basis for the actual analyses.

It should be noted, however, that character n-grams might have been an attractive additional feature type for our research, as these have often been shown to be excellent features in authorship attribution (Koppel et al., 2009, p. 12-13; Stamatatos, 2009, p. 541-2). This method, which does not require any kind of normalization or lemmatization, segments texts into consecutive, partially overlapping groups of n characters: the word bigram, for instance, contains the bigrams _b, bi, ig, gr, ra, am, m_. Contrary to a word-level approach, character n-grams are also sensitive to stylistic information below the word level, like case endings or other grammatical morphemes that are

Table 3 Most frequent lemmas in the corpus (# = content words; * = non-reflexive pronouns)

et        e          quoniam   #caritas    #consilium     contra
qui       uel        #uerbum   #uenio      #rex           #pono
in        #possum    aut       quasi       dum            #amicus
#sum      pro        idem      scilicet    #talis         #honor
non       quam       super     #causa      #ceterus       #nomen
#tu*      #uester*   #terra    #manus      #caro          uelut
#is*      autem      #uolo     #iustitia   #fides         ante
#ego*     #multus    nunc      #modus      #res           #ta
#deus     #habeo     iam       #primus     #paruus        #iudicium
ad        ne         #uita     semper      apud           usque
hic       #sanctus   ac        #audio      #pax           quantum
sed       enim       #cor      #mundus     #salus         #lex
ut        etiam      #nam      #debeo      siue           #fidelis
de        #noster*   #do       #uiuo       #eternus       #sol
#suus*    #uerus     #solus    #cado       #inuenio       #celestis
#ille*    #uideo     unde      inter       #frater        #potior
a         sicut      quidem    #o          #uir           uidelicet
cum       #alius     tam       #diligo     magis          tunc
quod      ita        propter   #uoluntas   #fors          #angelus
ipse      tamen      #quidam   #gloria     #us            #diuinus
#tuus*    #filius    #bonus    quoque      #certus        #summus
#omnis    #spiritus  ergo      atque       #loquor        #ideo
si        #christus  #tempus   #aliqui     #uox           #prior
#sui*     #bonum     sine      #malum      #iustus        #populus
per       #ecclesia  nisi      #mens       post           #episcopus
#facio    #opus      #unus     #oculus     #misericordia  #similis
#homo     xque       #dies     #nihil      #celum         #os
#dico     sic        #nullus   #secundum   adhuc          #nouus
quia      #magnus    ubi       #pars       #domus         #tantum
#dominus  #iste*     #corpus   #mors       #uis           #uia
#meus*    #anima     #locus    #peccatum   #beatus        licet
nec       #pater     #uirtus   #scio       #quomodo       #predico
#quis     #gratia    #totus    #hildegars  #ueritas       #fratres
#duo      #quero
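
As a minimal sketch of the culling marked in Table 3, the first helper below drops every entry carrying a # or * marker, and the second turns a lemmatized sample into relative frequencies for the retained features. A third helper reproduces the padded character bigrams of the word bigram given in the text. All function names are hypothetical, and nothing here is taken from the Stylometry with R scripts.

```python
from collections import Counter


def cull(entries):
    """Keep only entries without a '#' (content word) or '*' (pronoun) marker."""
    return [e for e in entries if not e.startswith("#") and not e.endswith("*")]


def relative_frequencies(lemmas, features):
    """Relative frequency of each feature lemma within one sample of lemmas."""
    counts = Counter(lemmas)
    total = len(lemmas)
    return [counts[feature] / total for feature in features]


def char_ngrams(word, n=2, pad="_"):
    """Overlapping character n-grams of a word, padded at both word boundaries."""
    padded = f"{pad}{word}{pad}"
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]
```

Applied to the first entries of Table 3, `cull(["et", "qui", "in", "#sum", "non", "#tu*"])` keeps et, qui, in, and non. Each sample of lemmas would then contribute one row of relative frequencies, with one column per retained function word.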

not realized as separate words (Rybicki and Eder, 2011, p. 320). Latin, for instance, is a heavily inflected language that makes use of affixes to mark the grammatical functions of words, "by iron, not by sword" being for example ferro non gladio (Sapir, 1921, ch. VI). Therefore, it would have made sense to additionally study the character n-grams in the corpus.

However, one runs into the aforementioned problem that historical languages are characterized by unstable orthography (Piotrowski, 2012). Although Latin spelling variation seems to have been less pronounced than in vernacular medieval languages, it does constitute a serious issue. When comparing two texts written by the same author, surviving in manuscripts with a strongly divergent orthography, stylometric methods may detect artificially large differences. Conversely, and likewise due to scribal interference, texts of non-identical authorial provenance may show artificial similarities when they survive in manuscripts with a similar orthographical profile. In medieval manuscripts, we might even find inconsistent word spellings for the same words throughout the same text (Rigg, 1996). This ultimately implies that an approach based on character n-grams is unadvisable for medieval Latin (cf. Kestemont and Van Dalen Oskam, 2008). Unfortunately, this means that our approach based on lemmatization cannot take into account stylistic subtleties below the word level (e.g.
indicative versus subjunctive mood, as expressed in case endings). However, we will demonstrate that our method is still able to harvest sufficient stylistic information from the texts. Indirectly, our results will therefore even serve to emphasize how much grammatical information is in fact still expressed by isolated function words in medieval Latin.

6 Testing Principal Components Analysis

The first stylometric technique we adopt is principal components analysis (PCA), a procedure derived from multivariate statistics and commonly used to reduce the dimensionality of a data set (Binongo, 2003). By combining the original variables of a data table into new, uncorrelated compound variables or principal components, PCA is able to summarize large and complex data sets into insightful lower-dimensional scatterplots. When applied to the frequencies of high-frequency items in texts, this technique often successfully reveals the authorial structure in a data set. PCA's good performance in authorship attribution is due to the fact that it explicitly tries to model correlations between word frequencies. Especially the frequencies of function words show complex correlations that are related to stylistic, arguably authorial choices between small sets of alternative options. A mere visual inspection of the samples' positions in PCA scatterplots often shows that samples written by the same author will cluster, whereas groups of samples written by distinct authors lie further apart.

Because of the considerable size of the epistolaria in the corpus, we could start with a large sample size of 10,000 lemmatized words per sample. Recent research has demonstrated that the accuracy of most authorship attribution techniques is likely to increase when larger samples are taken (Eder, 2010; Luyckx and Daelemans, 2011). Our selection of the epistolaria of exactly three authors (Hildegard of Bingen, Guibert of Gembloux and Bernard of Clairvaux) respects the fact that it is theoretically unadvisable to include more than three authors in a PCA, especially when the discussion of the results is restricted to the two first Principal Components (PCs) (Binongo and Smith, 1999, p. 464). As is customary since Burrows (1987), our PCA is based on the correlation matrix, appropriately scaling the original word frequencies.

Fig. 3 shows the scatterplot that results from our first experiment. Each author's samples are visualized as black letter combinations: the first letter of the author's name is followed by a digit, indicating the sample's indexed position in the respective epistolaria. G_EP-4, for instance, is the fourth sample of 10,000 lemmatized words taken from Guibert's epistolarium.11 At this stage, we are restricting Hildegard's epistolarium to the letters that are not associated in any way with Guibert's secretaryship.

Fig. 3 displays a remarkably clear authorial separation of the samples. Guibert's samples (G_EP) are concentrated in the upper-right quadrant, whereas the samples from Hildegard's epistolarium (H_EPNG) are invariably positioned to the left. Finally, Bernard's samples (B_EP) form a tight cluster of samples in the lower-right half of the plot. The density of this last cluster thus points at a clear stylistic unity, despite the fact that, as noted earlier, Bernard must have been assisted in his epistolary work by a true personal chancellery consisting of at least five different collaborators (Leclercq, 1987, pp. 147–52).

Additionally, the plot in Fig. 3 contains a series of high-frequency items in light grey, the component loadings, visualizing how strongly the 65 lemmas have contributed to the creation of the PCs. If a word can, for instance, be found to the far left of the scatterplot, this demonstrates that it is relatively more frequent in samples with a similar position in the plot. Our first scatterplot thus shows that the use of et ('and') and a ('from') is surprisingly typical of Guibert's writings, whereas the use of the preposition in ('in') is very characteristic of the Hildegard samples. In comparison, the use of the lemmas non or si seems to be relatively more typical of Bernard's writing. The scatterplot does not reveal any anomalies and it is safe to assume that the high-frequency grammatical lemmas argue in favor of a clear stylistic differentiation between our authors.

The remarkable stylistic differences with respect to a number of specific lemmas used by our authors can be highlighted in another way. The boxplots in Fig. 4 visualize information about the absolute



Collaborative authorship in the twelfth century

[Figure: Principal Components Analysis scatterplot of the G_ep, H_epNG, and B_ep samples with the loadings of the 65 lemmas; axes PC1 (37.8%) and PC2 (16.9%); inset bar chart: proportion of variance explained (in %) per principal component; 65 MFW, culled @ 0%, pronouns deleted, correlation matrix]

Fig. 3 PCA of the epistolaria by Hildegard, Guibert, and Bernard (10,000 lemmas/sample)

frequencies (medians, quartiles, etc.) for three interesting function words (in, et, and non) in samples of 2,000 words. In boxplot (a) concerning the use of in, the primary column refers to the counts in Hildegard; in the second boxplot (b) dealing with et, the left column concerns Guibert; and in boxplot (c), with the results for non, Bernard's results are displayed in the first column. The secondary column in all three boxplots refers to the material by the two other authors, e.g. Guibert and Bernard in boxplot (a). These boxplots indeed reveal unmistakable differences between the respective epistolaria with respect to the frequency of these important function words. Interestingly, these differences coincide with stylistic observations that have been made in traditional philological research. Given the visionary discourse developed in much of her writings, even in her letters, it is not surprising to come across an intensive use of the preposition in in Hildegard's letters. She repeatedly sees things in divine visions; she continuously searches the allegorical meanings buried in the multitude of details that she discovers in her visions (Dronke, 1998). Guibert's writings are especially notorious for their all too inflated and artificial style, and Guibert's wearisome tendency to compose extremely long


sentences, full of coordinating conjunctions (see also Derolez, 1988, pp. V and IX). Bernard's frequent use of non can be related to the didactic nature of his epistolary expositions in which he very often relies on an antithetical style to illustrate his thoughts (Mohrmann, 1958; Pranger, 2011, p. 222).

7 Testing Delta

For our PCA displayed in Fig. 3, we have been working with extremely generous sample sizes of 10,000 lemmas each. Because the ultimate goal of this article remains the attribution of the Visio ad Guibertum missa and the Visio de Sancto Martino of which the authorship seems very questionable, the problem of sample size needs to be put forward (Eder, 2010; Luyckx and Daelemans, 2011): while the first disputed visio at stake in this article still contains 7,489 lemmas, the latter only counts 3,301 words. The scatterplots in Fig. 5a and b show the results of the same procedure as in Fig. 3 but using sample sizes of 5,000 and 1,000 lemmas, respectively. This clearly illustrates the decrease in discriminatory performance of our PCA when we reduce the sample size in our experiments. Fig. 5b demonstrates that the authorial discrimination becomes less powerful, in particular between Guibert and Bernard in the vertical component.

To what extent will we be able to rely on PCA for a fairly solid attribution of a text, like the Visio de

[Figure: (a) Boxplot for "in": Primary (41/41) versus Secondary (145/145); (b) Boxplot for "et": Primary (62/62) versus Secondary (124/124); vertical axis in both panels: absolute frequency per slice (2,000 words); Wilcoxon rank sum: p < 0.05 in both panels]

Fig. 4. (a–c) Boxplots of the absolute frequencies of in, et, and non in epistolary samples of 2,000 lemmas


[Figure: (c) Boxplot for "non": Primary (83/83) versus Secondary (103/103); vertical axis: absolute frequency per slice (2,000 words); Wilcoxon rank sum: p < 0.05]

Fig. 4. Continued

Sancto Martino, of only ca. 3,000 words? Although the scatterplots in the previous section demonstrate the general validity of the stylometric approach for our corpus, it makes sense to apply a second attribution technique to our corpus to validate the outcome of the PCA more precisely. Because it is unfeasible to generate new scatterplots for every small change in parameter settings like e.g. sample size in our experiments, we additionally apply Burrows's Delta (2002) to the epistolaria.

In its traditional implementation, Delta offers a similarity metric to determine the authorship of anonymous works. Based on the frequencies of a small set of high-frequency items, Delta computes the stylistic distance between an unknown sample and a set of samples written by a series of candidate authors. It will attribute the anonymous sample to the author of the (single) sample in the data set to which it is closest in style according to the metric. As such, Delta uses a nearest neighbor reasoning (Argamon, 2008). We can apply a leave-one-out validation with Delta as follows. We can temporarily treat each sample in our collection as anonymous. Next, we can have Delta attribute the anonymized sample to one of the candidate authors and check whether the suggested attribution is successful or not. If at the end of this procedure, we divide the number of correct attributions by the total number of samples in the data set, we get a percentage that offers a useful approximation of the general effectiveness of our technique, should it, for instance, be applied to real-world samples of unknown provenance.

Fig. 6 shows the result of this leave-one-out validation for various sample sizes (multiples of 100 lemmas, ranging from 500 to 4,000). It is obvious that larger sample sizes invariably lead to higher accuracies in cross-validation. Yet, whereas the initial accuracies are fairly low (even < 85%), the attribution success quickly rises above the psychological barrier of 95% (sample sizes > 1,500 lemmas) and becomes entirely flawless when dealing with sample sizes of ca. 3,000 lemmas or more. For a text counting 3,301 lemmas, like the Visio de sancto Martino, we might well reach an attribution accuracy of about 99%. Moreover, because these numbers are in line with earlier reports concerning modern languages (Eder, 2010; Luyckx and Daelemans, 2011), Fig. 6 again demonstrates that even a highly inflected language like Latin contains a satisfying amount of useful stylistic information in its grammatical lemmas alone.

By now, we can assume that, when applied cautiously, PCA should offer enough solid ground to make conjectures about the authorship of the visions in the corpus traditionally attributed to Hildegard. Following a nearest neighbor reasoning (Argamon, 2008), we can plot unseen, anonymous texts together with the works of established authorial origin and investigate to which of the authorial clusters the unseen work is most similar in style. However, before moving on to the analysis of the visions, we have first tested this attribution procedure. In the PCA scatterplot in Fig. 7, we have added a new, anonymous sample (amounting to 3,706


lemmas) by author X to equal-sized samples from the aforementioned epistolaria. The new sample turns out to be stylistically much more similar to Bernard's samples than to those by Hildegard or Guibert. Should this sample have been truly anonymous, the analysis would have offered firm grounds for conjecture that the text from which the sample is derived is actually authored by Bernard of Clairvaux. In this specific case, this reasoning would have led to a historically sound attribution, as the anonymous text we have questioned is in reality the Sermo in festo sancti Martini, written by Bernard around 1150. An interesting fact about this example is that even though the topic and genre of this text are perhaps quite different from the epistolary material of our candidate authors (viz. a sermon about the aforementioned Saint Martin), it is clear that our PCA procedure allows for solid conclusions. Although one should perhaps not always expect such clear-cut stylistic, authorial differentiation in historical corpora, this promising example clearly illustrates the benefits of the present methodology for (future) research.
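
By way of illustration only, the Delta attribution and the leave-one-out validation described in this section can be sketched in a few lines of Python. This is not the implementation behind Fig. 6. The function names, the invented toy frequencies, and the choice to compute z-scores once over the whole collection (rather than anew for every held-out sample) are assumptions made for the sake of the example.

```python
from statistics import mean, pstdev

def zscores(freqs):
    """Column-wise z-scores for a list of equal-length frequency vectors."""
    cols = list(zip(*freqs))
    mus = [mean(c) for c in cols]
    sds = [pstdev(c) or 1.0 for c in cols]  # guard against constant columns
    return [[(x - m) / s for x, m, s in zip(row, mus, sds)] for row in freqs]

def delta(a, b):
    """Burrows's Delta: mean absolute difference between two z-score vectors."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)

def leave_one_out_accuracy(freqs, authors):
    """Attribute every sample to the author of its nearest neighbour under Delta."""
    z = zscores(freqs)
    correct = 0
    for i in range(len(z)):
        # The held-out sample itself is excluded from the candidates.
        nearest = min((j for j in range(len(z)) if j != i),
                      key=lambda j: delta(z[i], z[j]))
        correct += authors[nearest] == authors[i]
    return correct / len(z)

# Invented frequencies (per 1,000 lemmas) of in, et, and non; illustration only.
toy_freqs = [
    [42.0, 30.0, 10.0], [40.0, 31.0, 11.0], [41.0, 29.0, 9.0],   # "H"
    [25.0, 60.0, 12.0], [26.0, 58.0, 11.0], [24.0, 61.0, 13.0],  # "G"
    [27.0, 33.0, 25.0], [26.0, 32.0, 27.0], [28.0, 34.0, 26.0],  # "B"
]
toy_authors = ["H"] * 3 + ["G"] * 3 + ["B"] * 3
accuracy = leave_one_out_accuracy(toy_freqs, toy_authors)
print(f"leave-one-out accuracy: {accuracy:.2f}")
```

Repeating such a run for a range of sample sizes would give one accuracy value per size, which is the kind of curve summarized in Fig. 6.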

[Figure: (a) PCA scatterplot of the B, H, and G samples; axes PC1 (29.7%) and PC2 (12.9%); 65 MFW, culled @ 0%, pronouns deleted, correlation matrix]

Fig. 5 (a and b) PCAs with reduced sample sizes (5,000 and 1,000 lemmas/sample) (continued)


[Figure: (b) PCA scatterplot of the B, H, and G samples; axes PC1 (14.3%) and PC2 (6.7%); 65 MFW, culled @ 0%, pronouns deleted, correlation matrix]

Fig. 5 Continued

8 Guibert's Secretaryship: Synergy and Beyond?

As discussed earlier, we have discerned two groups of letters in Hildegard's epistolarium: one that must have originated at the time when Volmar was still Hildegard's secretary and that bears no potential traces of Guibert's interference, and another containing the letters that are likely to have been revised by Guibert. If we confront samples of 5,000 lemmas from both portions, labeled here H_EPNG and H_EPG, respectively, in a PCA, we get the result in Fig. 8.

We notice that the first, horizontal PC captures an impressive 37% of the original variation in our data and primarily relates to the stylistic differentiation between Guibert's own letter collections (G_EP) and the anterior portion of Hildegard's epistolarium (H_EPNG). Interestingly, we see that the second PC in the right half of the plot (still capturing 9.4% of the original variation) discriminates between Hildegard's non-Guibertian letters and her letters that can be associated with Guibert's secretaryship.
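
The percentages quoted for each PC follow directly from a PCA on the correlation matrix: each component's eigenvalue divided by the number of variables. The following Python sketch illustrates this for the first component only. It is an assumption-laden toy, not the software used for the figures: the function names and frequencies are invented, and plain power iteration stands in for a full eigendecomposition.

```python
from statistics import mean, pstdev

def standardize(rows):
    """Z-score every column; PCA on z-scores equals PCA on the correlation matrix.
    Assumes no column is constant."""
    cols = list(zip(*rows))
    mus = [mean(c) for c in cols]
    sds = [pstdev(c) for c in cols]
    return [[(x - m) / s for x, m, s in zip(r, mus, sds)] for r in rows]

def correlation_matrix(z):
    """Correlation matrix of already standardized data."""
    n, p = len(z), len(z[0])
    return [[sum(r[i] * r[j] for r in z) / n for j in range(p)] for i in range(p)]

def mat_vec(matrix, v):
    return [sum(m * x for m, x in zip(row, v)) for row in matrix]

def first_component(matrix, iterations=500):
    """Leading eigenvector (the loadings) and eigenvalue by power iteration."""
    v = [1.0 / (i + 1) for i in range(len(matrix))]  # arbitrary non-zero start
    for _ in range(iterations):
        w = mat_vec(matrix, v)
        norm = sum(x * x for x in w) ** 0.5
        v = [x / norm for x in w]
    eigenvalue = sum(a * b for a, b in zip(v, mat_vec(matrix, v)))
    return v, eigenvalue

# Invented frequencies (per 1,000 lemmas) of in, et, and non; illustration only.
toy_freqs = [
    [42.0, 30.0, 10.0], [40.0, 31.0, 11.0], [41.0, 29.0, 9.0],
    [25.0, 60.0, 12.0], [26.0, 58.0, 11.0], [24.0, 61.0, 13.0],
    [27.0, 33.0, 25.0], [26.0, 32.0, 27.0], [28.0, 34.0, 26.0],
]
z = standardize(toy_freqs)
corr = correlation_matrix(z)
loadings, eigenvalue = first_component(corr)
explained = eigenvalue / len(corr)  # share of total variance captured by PC1
scores = [sum(x * w for x, w in zip(row, loadings)) for row in z]  # PC1 positions
print(f"PC1 explains {explained:.1%} of the variance")
```

The sign of the loadings is arbitrary, so only the relative positions of samples along a component carry meaning, not whether a cluster falls to the left or to the right.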


[Figure: Cross validation; horizontal axis: sample size (500 to 4,000); vertical axis: cross-validation accuracy (0.85 to 1.00)]

Fig. 6 Cross-validation using Delta (dotted lowess line fitted)

These results thus suggest that there do indeed exist stylistic differences between the oldest portion of Hildegard's epistolarium and the letters in which we expected to discern Guibert's editorial fingerprints. They also confirm what can be deduced from the surviving manuscript evidence. The so-called autograph copy of the Liber divinorum operum mentioned earlier offers unique insight into the way in which Hildegard's collaborators must have edited her texts under her supervision (Derolez, 1972). Fig. 9, showing a number of lines from the randomly selected page 370 of MS Ghent, University Library, 241, makes it clear that it was function words in particular that were often altered by Hildegard's correctors; tam being erased, quod being replaced by ut or quia, ad being added, etcetera. A collaborator (especially Guibert, who is known to have had a great deal of freedom in his editorial work) may thus have had a notable impact on Hildegard's stylistic profile.

However, in Fig. 8, we see that the samples from Hildegard's epistolarium that bear the influence of Guibert's interference do not seek the company of Guibert's own writings in the scatterplot. After all, they continue to be somewhat more similar to Hildegard's style. This result is reminiscent of the Synergy Hypothesis, recently discussed by Pennebaker (2011).12 Pennebaker puts forward three hypotheses concerning the stylistic effect of collaborations between different authors. Such projects can produce a language that is (1) similar to the one produced by a single person writing alone, (2) the average of the two writers, or (3) unlike either of the styles that the collaborating authors would produce on their own. Based on exploratory research on the Federalist Papers and the Beatles' songs, Pennebaker ultimately argues in favor of the latter, so-called Synergy view on collaborative authorship, not refuting however the possibility that one of the collaborating authors might have remained more influential with respect to the end product (cf. Petrie et al., 2008). This Synergy Hypothesis thus might be applicable to a certain extent to the Hildegard–Guibert collaboration, where the result of the creative process does not fit in with the other letter samples written by Hildegard or Guibert individually, although the result is somewhat more similar to Hildegard.


[Figure: PCA scatterplot of the B, H, and G samples and the anonymized sample X; axes PC1 (26.9%) and PC2 (11.6%); 65 MFW, culled @ 0%, pronouns deleted, correlation matrix]

Fig. 7 Attribution of an anonymized sermo X to the Bernardian corpus

More can be learned about the stylistic dichotomy in Hildegard's epistolarium by applying a Mann–Whitney test to the lemmas occurring at least twice in samples of 4,000 lemmas. Here, we temporarily leave the realm of high-frequency lemmas and venture into the lower-frequency strata of the lexical spectrum. Hence, this test will not particularly emphasize the discriminatory power of high-frequency lemmas, as was the case with our other tests (Kilgarriff, 2001). Fig. 10 contrasts the words that were predominantly used in Hildegard's letters written under Volmar's secretaryship with those that become typical when Guibert took over the editorial work in the preservation of her letters. The lemmas have been ranked and plotted according to the U test statistic obtained for each lemma. Fig. 10 shows how the use of the relative pronoun qui ('who'), for instance, only becomes prominent in letters edited by Guibert, who is indeed notorious for constructing eloquent but complex sentences with a lot of embedded relative clauses. Moreover, this latter group of letters is also characterized by a

[Figure: Principal Components Analysis scatterplot of the G_ep, H_epNG, and H_epG samples with the loadings of the 65 lemmas; axes PC1 (37%) and PC2 (9.4%); inset bar chart: proportion of variance explained (in %) per principal component; 65 MFW, culled @ 0%, pronouns deleted, correlation matrix]

Fig. 8 PCA of the epistolarium of Guibert, of the letters of Hildegard transmitted without Guibert's editorial assistance, and of the Guibertian letters in Hildegard's epistolarium (5,000 lemmas/sample)

more dry and stereotypical ecclesiastical vocabulary (omnipotens, sanctus, spiritus, verus, ...), whereas the letters not influenced by Guibert betray a more direct and lively narrative style (sed, tunc, nunc, dico, ergo, deinde, ...), possibly more true to Hildegard's own preferred way of expressing herself. We might thus be inclined to agree with Newman (1987, p. 24) when she stated: 'Purists can at least rejoice that the collaboration [between Guibert and Hildegard] began only after the seer's major works were completed.' From the methodological point of view, these results also show that the discriminatory effects in lower-frequency strata correspond with the stylistic dichotomy present in the high-frequency vocabulary, thus corroborating the performance of the latter methodology.

Fig. 9 MS Ghent, University Library, 241, p. 370 (detail). Reproduced with permission
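
To make the ranking by the Mann–Whitney U statistic more concrete, the sketch below computes U for per-sample counts of a lemma in two groups of samples and sorts lemmas by it. It is a hedged illustration, not the procedure behind Fig. 10: the function name and the counts are invented, ties are counted as one half (a common convention), and no p-values are derived.

```python
def mann_whitney_u(x, y):
    """U statistic of sample x against sample y.

    Counts the pairs (a, b) with a from x and b from y in which a exceeds b;
    ties contribute one half. U ranges from 0 to len(x) * len(y).
    """
    u = 0.0
    for a in x:
        for b in y:
            if a > b:
                u += 1.0
            elif a == b:
                u += 0.5
    return u

# Invented counts per sample: (before Guibert, with Guibert); illustration only.
counts = {
    "qui": ([3, 5, 4, 6, 2], [9, 12, 8, 11, 7]),
    "sed": ([10, 9, 12, 11, 8], [4, 3, 5, 2, 6]),
}

# A low U marks a lemma typical of the second group, a high U one of the first.
ranking = sorted(counts, key=lambda lemma: mann_whitney_u(*counts[lemma]))
for lemma in ranking:
    before, with_guibert = counts[lemma]
    u = mann_whitney_u(before, with_guibert)
    print(lemma, u, u / (len(before) * len(with_guibert)))
```

Dividing U by the product of the two group sizes, as in the last line, rescales it to the interval from 0 to 1, which makes lemmas comparable when the groups differ in size.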


[Figure: two ranked lists of lemmas, "Before Guibert" and "With Guibert", each plotted against the Mann–Whitney U statistic]

Fig. 10 Results of Mann–Whitney test (U statistic) comparing the vocabulary in Hildegard's epistolarium before and during Guibert's secretaryship

Let us finally turn to the original incentive for the present article, namely, the authorship discussion concerning two texts of dubious provenance: the relatively short Visio de Sancto Martino about Saint Martin (3,301 lemmas) and the somewhat longer Visio ad Guibertum missa (7,492 lemmas). Fig. 11 offers the result of three PCAs in which we have confronted both dubia (hence D_MART and D_MISSA) with the previously discussed epistolary collections, again using the same 65 lemmas and a sample size of 3,301 lemmas. Fig. 11a considers all texts by all authors; Fig. 11b excludes Bernard's texts; Fig. 11c only considers Guibert's epistolarium and the anonymous visionary texts.

All subplots in Fig. 11 clearly show that both visions tightly cluster with Guibert's epistolarium, instead of with Hildegard's. This effect is perhaps least prominent in Fig. 11a, where D_MART and D_MISSA display modest similarities to some of the epistolary samples from the portion of Hildegard's epistolarium that was revised by Guibert. In all three plots, however, the visions are generally speaking far more similar to Guibert's writings than to Hildegard's. Significantly, most samples resulting from the combined authorial voices of Hildegard and Guibert again do not display any significant rapprochement to the epistolaria of the individual authors. These observations seem to reinforce the Synergy Hypothesis. Moreover, the visions' quasi-


random position in the final subplot (Fig. 11c) reveals no pronounced stylistic differences with Guibert's letters, regarding the high-frequency lemmas analyzed. They invariably cluster with Guibert's epistolary oeuvre, making him a much more plausible author than Hildegard, at the very least from a stylistic point of view.

An important, yet inconspicuous, last feature of Fig. 11a is that it includes the Sermo in festo sancti Martini, even though it can hardly be spotted among Bernard's other samples. This sermon deals, just like the Visio de Sancto Martino, with Saint Martin. Both texts were even clearly influenced by the same late Antique hagiographical narratives concerning this saint, namely, the works of his first hagiographers Sulpicius Severus (c. 363–425) and Gregory of Tours (538–594). It is interesting to note that despite their interwovenness within the same intertextual tradition, they are still clearly distinguished and therefore demonstrate that topic-related stylistics hardly interferes with the author-related differences. The visionary texts under investigation thus betray Guibert's stylistic influence to such an advanced extent that we could wonder whether we should not entirely attribute these texts to Guibert, instead of arguing for any form of synergetical collaboration, as was still possible for the portion of the epistolarium over which both Hildegard and Guibert labored.
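
The sampling step that underlies these plots (consecutive samples of a fixed number of lemmas, each reduced to the frequencies of a fixed list of lemmas) can be sketched as follows. The sketch rests on assumptions that the text does not spell out: the function names are hypothetical, an incomplete trailing sample is simply discarded, and frequencies are expressed relative to the sample size.

```python
from collections import Counter

def slice_samples(lemmas, size):
    """Cut a lemma sequence into consecutive, non-overlapping samples of `size`.

    Assumption: a shorter remainder at the end of the text is dropped.
    """
    return [lemmas[i:i + size] for i in range(0, len(lemmas) - size + 1, size)]

def frequency_vector(sample, vocabulary):
    """Relative frequency of every vocabulary lemma in one sample."""
    counts = Counter(sample)
    return [counts[lemma] / len(sample) for lemma in vocabulary]

# Invented lemmatized text of 20 tokens; illustration only.
toy_text = ["in", "et", "non", "et"] * 5
vocabulary = ["in", "et", "non"]

samples = slice_samples(toy_text, size=8)
vectors = [frequency_vector(s, vocabulary) for s in samples]
print(len(samples), vectors[0])
```

Under these assumptions, a text of 7,492 lemmas cut at 3,301 lemmas would yield two full samples, and a text of exactly 3,301 lemmas a single one.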

[Figure: Principal Components Analysis scatterplot of the B_ep, B_Mart, H_epNG, H2_epG, G_ep, D_Mart, and D_Missa samples; axes PC1 (24.6%) and PC2 (11.6%); 65 MFW, culled @ 0%, correlation matrix]

Fig. 11 PCAs including the Visio de Sancto Martino and the Visio ad Guibertum missa (continued)


[Figure: Principal Components Analysis scatterplot of the H_epNG, H2_epG, G_ep, D_Mart, and D_Missa samples; axes PC1 (31.1%) and PC2 (8.1%); 65 MFW, culled @ 0%, correlation matrix]

Fig. 11 Continued

9 Conclusions

It is obvious that the experiments reported in this article only touch the tip of the iceberg of the research on Hildegard's complicated authorship, to say nothing of the exciting, broader topic of twelfth-century Latin writing. As stated in our Introduction, individuality and authorship remain complex issues when it comes to medieval literature. Even an authoritative and highly idiosyncratic author like Bernard of Clairvaux is known to have been assisted by a team of collaborators. It is moreover clear that medieval scribes often gradually introduced errors and deviations when successively copying exemplars, thus possibly altering the original author's style in the surviving copies of texts. Nevertheless, we hope to have demonstrated that these issues do not need to imply that stylometry, when applied cautiously, cannot yield valid research results in the field of medieval philology.

First we showed that authorial discrimination was possible in the corpus studied. Although samples had to be big enough to yield correct attributions, stylometric methods were generally able to model the overall differences in writing style. This suggests that superficial interference from scribes (or even later editors) can be by-passed to a certain extent, for instance through lemmatization. Interestingly, we obtained satisfying results with a word-level approach, notwithstanding the fact that Latin is a highly inflected language. Although other strategies might increase attribution accuracies in the future, this shows that even in highly inflected


[Figure: Principal Components Analysis scatterplot of the G_ep, D_Mart, and D_Missa samples; axes PC1 (9.7%) and PC2 (8.3%); 65 MFW, culled @ 0%, correlation matrix]

Fig. 11 Continued

languages, plenty of stylistic information can already be harvested at the word-level.

In the course of our research, we have also touched on collaborative authorship, an issue that recently has raised considerable interest in stylometry (Reynolds et al., 2012). Our methodology enabled us to discover clear stylistic differences in Hildegard of Bingen's epistolary work between those letters for which she had relied on the modest assistance of her first collaborator Volmar and the letters that have been compiled and copy-edited by Guibert of Gembloux. Interestingly, the letter samples influenced by the collaboration between Hildegard and Guibert formed an isolated cluster that did not display advanced stylistic similarities to Hildegard's former epistolary oeuvre, nor to that of Guibert. These results argue in favor of what Pennebaker (2011) has called the Synergy Hypothesis: when two authors are involved in the same texts, the end result need not resemble the writing style of one of the two individually; the result might rather resemble that of a new, third author. The evidence offered in this particular case study is valuable in this light, but at the same time still too scant to come to a final verdict on this fascinating topic.

Finally, with respect to our initial research question, we hope to have convincingly disputed the authorship of two texts attributed to Hildegard: the Visio de Sancto Martino and the Visio ad Guibertum missa. We argued that these visions are stylistically speaking completely in line with the writing style of Guibert of Gembloux, Hildegard's last secretary. These results offer


quantitative support to suspicions voiced in earlier, traditional philological research: if Guibert is not to be considered their original author altogether, it is clear that he reworked these texts so profoundly that hardly anything of Hildegard's writing style is still discernible in them. In fact, it is noteworthy that our analyses could not offer any stylistic evidence at all that Hildegard once authored (even a preliminary or simply oral version of) these texts, although this remains of course an interesting historical possibility.

Acknowledgements
We thank the Corpus Christianorum Library & Knowledge Centre of Brepols (Turnhout) and in particular Luc Joque for generously putting at our disposal the corpora analyzed in this article. Marco Passarotti (Università Cattolica del Sacro Cuore, Milan) generously provided us with the IT-TB, while Helma Dik (University of Chicago) provided the word list from the Perseus Project (Tufts University). We are moreover very grateful for the valuable feedback from Albert Derolez, Wim Verbaal, Antoon Bronselaer, and Guy De Tré. In addition, we thank the anonymous reviewers of the Digital Humanities Conference 2013 for their helpful comments on this research project, as well as the anonymous reviewers of this journal, in particular, for their extensive feedback on the normalization procedures described. Mike Kestemont developed the stylometric methodology for this article. Sara Moens brought in her domain expertise concerning Guibert of Gembloux and medieval epistolography. Jeroen Deploige, who took the initiative for this collaborative research, contributed from his involvement with Hildegard scholarship. All three authors contributed equally to the end result.

Funding
This work was supported by the Research Foundation Flanders, of which both Sara Moens and Mike Kestemont are fellows, and by the Flemish Hercules Foundation, which finances the project Sources from the Medieval Low Countries (SMLC), directed by Jeroen Deploige.

References
Argamon, S. (2008). Interpreting Burrows's Delta: geometric and probabilistic foundations. Literary and Linguistic Computing, 23(2): 131–47.
Binongo, J. (2003). Who wrote the 15th book of Oz? An application of multivariate analysis to authorship attribution. Chance, 16(2): 9–17.
Binongo, J. and Smith, W. (1999). The application of principal components analysis to stylometry. Literary and Linguistic Computing, 14(4): 446–66.
Bird, S., Klein, E., and Loper, E. (2009). Natural Language Processing with Python. Analyzing Text with the Natural Language Toolkit. Sebastopol: O'Reilly.
Burrows, J. (1987). Computation into Criticism. A Study of Jane Austen's Novels and an Experiment in Method. Oxford: Clarendon Press.
Burrows, J. (2002). Delta: a measure of stylistic difference and a guide to likely authorship. Literary and Linguistic Computing, 17(3): 267–87.
Cerquiglini, B. (1999). In Praise of the Variant: A Critical History of Philology. Baltimore: JHU Press.
Chrupala, G., Dinu, G., and van Genabith, J. (2008). Learning morphology with Morfette. Proceedings of the International Conference on Language Resources and Evaluation, LREC 2010, 17-23 May 2010. Marrakech, Morocco: European Language Resources Association, pp. 2362–7.
Coakley, J. (2006). Women, Men and Spiritual Power: Female Saints and Their Male Collaborators. New York: Columbia University Press.
Constable, G. (1976). Letters and Letter-collections. Turnhout: Brepols.
Delehaye, H. (1889). Guibert, abbé de Florennes et de Gembloux, XIIe et XIIIe siècles. Revue des Questions Historiques, 46: 5–90.
Deploige, J. (1998). In Nomine Femineo Indocta. Kennisprofiel en Ideologie van Hildegard van Bingen (1098-1179). Hilversum: Verloren.
Deploige, J. (2005). Anonymat et paternité littéraire dans l'hagiographie des Pays-Bas Méridionaux (ca. 920 - ca. 1320). Autour du discours sur l'original et la copie hagiographique au Moyen Âge. In Renard, E., Trigalet, M., Hermand, S., and Bertrand, P. (eds),


Scribere Sanctorum Gesta. Recueil d'études d'hagiographie médiévale offert à Guy Philippart. Turnhout: Brepols, pp. 77–107.
Deploige, J. and Moens, S. (eds), Visio de Sancto Martino et Visio ad Guibertum missa. In Deploige, J., Embach, M., Evans, C., Gärtner, K., and Moens, S., Hildegardis Bingensis opera minora. Pars secunda. Turnhout: Brepols, forthcoming.
Derolez, A. (1972). The genesis of Hildegard of Bingen's Liber divinorum operum. The codicological evidence. In Gumbert, J.P. and De Haan, J.M. (eds), Litterae Textuales. Essays Presented to Gerard I. Lieftinck. II: Texts & Manuscripts. Amsterdam: Van Ghent, pp. 23–33.
Derolez, A. (ed.) (1988–1989). Guiberti Gemblacensis epistolae: quae in codice B.R. BRUX. 5527-5534 inveniuntur. Turnhout: Brepols.
Derolez, A. and Dronke, P. (eds), (1996). Hildegardis Bingensis Liber Divinorum Operum. Turnhout: Brepols.
Dronke, P. (1998). The allegorical world-picture of Hildegard of Bingen: revaluations and new problems. In Burnett, C. and Dronke, P. (eds), Hildegard of Bingen: The Context of Her Thought and Art. London: The Warburg Institute.
Eder, M. (2010). Does size matter? Authorship attribution, small samples, big problem. Digital Humanities 2010. Conference Abstracts. King's College London, pp. 132–5.
Eder, M., Kestemont, M., and Rybicki, J. (2013). Stylometry with R: a suite of tools. Digital Humanities 2013. Conference Abstracts. University of Nebraska-Lincoln, pp. 487–89.
Embach, M. (2003). Die Schriften Hildegards von Bingen. Berlin: Akademie Verlag.
Ferrante, J. (1998). Scribe quae vides et audis. Hildegard, Her Language, and Her Secretaries. In Townsend, D. and Taylor, A. (eds), The Tongue of the Fathers. Gender and Ideology in Twelfth-Century Latin. Philadelphia: University of Pennsylvania Press, pp. 102–35.
Herwegen, I. (1904). Les collaborateurs de Ste. Hildegarde. Revue Bénédictine, 21: 192–204; 302–15; 381–403.
Juola, P. (2006). Authorship attribution. Foundations and Trends in Information Retrieval, 1(3): 233–334.
Kestemont, M. and Van Dalen-Oskam, K. (2009). Predicting the past: memory-based copyist and author discrimination in medieval epics. In Calders, T., Tuyls, K., and Pechenizkiy, M. (eds), Proceedings of BNAIC 2009. Eindhoven: Benelux Association for Artificial Intelligence, pp. 121–8.
Kestemont, M., Daelemans, W., and De Pauw, G. (2010). Weigh your words - memory-based lemmatization for Middle Dutch. Literary and Linguistic Computing, 25(3): 287–301.
Kilgarriff, A. (2001). Comparing corpora. International Journal of Corpus Linguistics, 6(1): 49–66.
Klaes, M. (ed.) (2001). Hildegardis Bingensis Epistolarium. Pars III. Turnhout: Brepols.
Köhler, R. (2005). Synergetic linguistics. In Köhler, R., Altmann, G., and Piotrowski, R. G. (eds), Quantitative Linguistik/Quantitative Linguistics. Ein Internationales Handbuch/An International Handbook. Berlin, New York: Walter de Gruyter, pp. 760–75.
Koppel, M., Schler, J., and Argamon, S. (2009). Computational methods in authorship attribution. Journal of the American Society for Information Science and Technology, 60(1): 9–26.
Leclercq, J. (1962). Saint Bernard et ses secrétaires. In Recueil d'études sur Saint Bernard et ses écrits, Vol. 1. Rome: Edizioni di storia e letteratura, pp. 3–25.
Leclercq, J. (1987). Lettres de S. Bernard: histoire ou littérature? In Recueil d'études sur Saint Bernard et ses écrits, Vol. 4. Rome: Edizioni di storia e letteratura, pp. 125–225.
Leclercq, J. and Rochais, H. (eds), (1974–1977). Epistolae. In Sancti Bernardi opera, Vols 7–8. Rome: Editiones cistercienses.
Leclercq, J., Talbot, C. H., and Rochais, H. (eds), (1957–1977). In Sancti Bernardi opera. Rome: Editiones cistercienses.
Luyckx, K. and Daelemans, W. (2011). The effect of author set size and data size in authorship attribution. Literary and Linguistic Computing, 26(1): 35–55.
Moens, S. (2010). Twelfth-century epistolary language of friendship reconsidered. The case of Guibert of Gembloux. Revue belge de Philologie et d'Histoire, 88(4): 983–1017.
Mohrmann, C. (1958). Observations sur la langue et le style de saint Bernard. In S. Bernardi opera, Vol. 2. Rome: Editiones cistercienses, pp. IX–XXXIII.
Newman, B. (1987). Sister of Wisdom. St. Hildegard's Theology of the Feminine. LA: University of California Press.
Newman, B. (ed.) (1998). Voice of the Living Light: Hildegard of Bingen and Her World. LA: University of California Press.
Nichols, S. (1997). Why Material Philology? Some Thoughts. Zeitschrift für deutsche Philologie, 116: 10–30.


Passarotti, M. and Dell'Orletta, F. (2010). Improvements in Parsing the Index Thomisticus Treebank. Revision, Combination and a Feature Model for Medieval Latin. In Calzolari, N., Choukri, K., Maegaard, B., Mariani, J., Odijk, J., Piperidis, S., Rosner, M., and Tapias, D. (eds), Proceedings of the International Conference on Language Resources and Evaluation, LREC 2010, 17–23 May 2010. Valletta: European Language Resources Association, pp. 169471.
Pennebaker, J. (2011). The Secret Life of Pronouns. What our Words Say About Us. NY: Bloomsbury.
Petrie, K., Pennebaker, J., and Sivertsen, B. (2008). Things we said today: a linguistic analysis of the Beatles. Psychology of Aesthetics, Creativity, and the Arts, 2(4): 197–202.
Pitra, J. B. (1882). Analecta Sacra et Classica Spicilegio Solesmensi Parata, Vol. 8. Paris: A. Jouby et Roge.
Piotrowski, M. (2012). Natural Language Processing for Historical Texts. California: Morgan & Claypool Publishers.
Pranger, B. (2011). Bernard the Writer. In McGuire, B.P. (ed.), A Companion to Bernard of Clairvaux. Leiden: Brill, pp. 220–48.
Reynolds, N., Schaalje, G., and Hilton, J. (2012). Who wrote Bacon? Assessing the respective roles of Francis Bacon and his secretaries in the production of his English Works. Literary and Linguistic Computing, 27(4): 409–25.
Rigg, A. (1996). Orthography and pronunciation. In Mantello, F. and Rigg, A. (eds), Medieval Latin: An Introduction and Bibliographical Guide. Washington: The Catholic University of America Press, pp. 79–82.
Rybicki, J. and Eder, M. (2011). Deeper delta across genres and languages: Do we really need the most frequent words? Literary and Linguistic Computing, 26(3): 315–21.
Sapir, E. (1921). Language: An Introduction to the Study of Speech. New York: Harcourt, Brace & Co.
Schinke, R., Greengrass, M., Robertson, A. M., and Willett, P. (1996). A stemming algorithm for Latin text databases. Journal of Documentation, 52(2): 172–87.
Schmid, H. (1994). Probabilistic part-of-speech tagging using decision trees. Proceedings of the International Conference on New Methods in Language Processing. Manchester, UK.
Schrader, M. and Führkötter, A. (1956). Die Echtheit des Schrifttums der heiligen Hildegard von Bingen. Quellenkritische Untersuchungen. Keulen–Graz: Böhlau Verlag.
Stamatatos, E. (2009). A survey of modern authorship attribution methods. Journal of the American Society for Information Science and Technology, 60(3): 538–56.
Van Acker, L. (1989). Der Briefwechsel der heiligen Hildegard von Bingen. Vorbemerkungen zu einer kritischen Edition. Revue Bénédictine, 99: 118–54.
Van Acker, L. (ed.) (1991–1993). Hildegardis Bingensis Epistolarium. Turnhout: Brepols.

Notes
1. Among the letters written with the help of Volmar, we count those in MS Wien, Österreichische Nationalbibliothek, 963 (theol. 348), which offers a copy of a collection compiled by Volmar before 1173 (Van Acker, 1991, p. XXVI), and the limited number of letters that can be found distributed over MS Stuttgart, Württembergische Landesbibliothek, Cod. theol. phil. 4° 253; MS Wien, Österreichische Nationalbibliothek, 881; MS Berlin, Staatsbibliothek Preussischer Kulturbesitz, Cod. theol. lat. fol. 699; MS London, British Library, Cod. Add. 17292; MS Paris, Bibliothèque Nationale, Nouv. Acquis. Lat. 760; MS Trier, Stadtbibliothek, Cod. 771/1350 and MS Kynzvart, Cod. 40. Among the letters compiled and edited under Guibert's supervision, we count those in the Riesenkodex Wiesbaden, Landesbibliothek, 2 (dating from 1177-1179/1180), that are not also found in MS Wien, Österreichische Nationalbibliothek, 963 (theol. 348) (Van Acker, 1991, p. XXVII), as well as those copied in MS Berlin, Staatsbibliothek Preussischer Kulturbesitz, Cod. lat. 4° 674, which bear traces of Guibert's editorial assistance (Klaes, 2001, p. XVII). Among the letters contained in the latter group, compiled under Guibert's supervision, we obviously encounter all Hildegard's letters addressed to Guibert and the ones that have been written in the years in which he stayed in Rupertsberg.
2. MSS Brussels, Royal Library, 5397–5407 and 5527–5534 (both originating from Gembloux, early thirteenth century) and MS Brussels, Royal Library, 1510–1519 (originating from Sint-Maartensdal near Louvain, fifteenth century).
3. See www.brepolis.net. The critical editions of the works of both Hildegard of Bingen and Guibert of Gembloux are published in several volumes in Brepols's own Corpus Christianorum series. For the works of Bernardus, the Brepols Library of Latin Texts relies on Leclercq et al. (1957–1977).
4. Bernard's letters, edited by Leclercq and Rochais (1974–1977), contain the official epistolarium,


compiled shortly after Bernard's death, as well as letters transmitted elsewhere. Guibert's letters were edited by Derolez (1988–1989) on the basis of MS Brussels, Royal Library, 5527–5534.
5. See note 1. Hildegard's letters are edited by Van Acker (1991–1993) and by Klaes (2001).
6. We supplemented this list with three words (plerumque, utrumque, and quicumque), yet did not allow any of these items into the restrictive set of function words we list below. We did not consider other, much less frequent clitics (e.g. ne ('if') or ve/ue ('or')), because it is difficult to automatically detect these using a simple rule-based approach and to distinguish them from e.g. the ne in deuotione or the ue in serue.
7. We have described our approach in a generic way for future reference. It should be noted, however, that there still remains a small number of possible spelling variants in medieval Latin that are hard to deal with but that were not relevant for the present research because we worked with critical editions that have already normalized orthography to a large extent. One can think here of the interchangeability of mqu and nqu in some words and the problem of single/double consonants (as e.g. in litera and littera). A less frequent, yet still important, orthographical variant that we leave unaddressed is ()exs versus ()ex, because it is difficult to automatically detect it using a rule-based approach. Nevertheless, this variant hardly affects any of the function words to which we have restricted our analyses.
8. In these training data too, we have replaced every v with u and every ae/oe with e.
9. Note that licet, which strictly speaking derives from the impersonal verb licere, is considered a function word because it is primarily used as a subordinating concessive conjunction.
10. Other errors in the lemmatization displayed in Table 3 are hildegars, us, and ta.
11. Note that from this point onwards, we will express the size of textual samples in terms of the number of consecutive lemmatized words they contain (a number which, after tokenization, need not be identical to the original number of surface forms in the original texts).
12. For the sake of conceptual clarity we shall keep Pennebaker's original terminology, although it should be stressed that our present use of the term Synergy Hypothesis is completely unrelated to the concept of Synergetic Linguistics in the field of quantitative linguistics (Köhler, 2005).
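
The rule-based normalization touched on in notes 6 to 8 can be illustrated with a minimal Python sketch. It is not the code used for this study: the function names and the short exception list are hypothetical, and the exception list holds only the three words named in note 6, not the full word list that those words supplement. The sketch applies the blanket substitutions of note 8 (v to u, ae/oe to e) and detaches the enclitic -que unless the token is a listed exception.

```python
import re

# Hypothetical stand-in for the list of words ending in -que that must NOT be split.
# Only the three additions named in note 6 are included here.
QUE_EXCEPTIONS = {"plerumque", "utrumque", "quicumque"}


def normalize_orthography(text: str) -> str:
    """Apply the blanket substitutions of note 8 to a lowercased text."""
    text = text.lower()
    text = text.replace("v", "u")       # every v becomes u
    text = re.sub(r"ae|oe", "e", text)  # every ae/oe becomes e
    return text


def split_que(token: str) -> list[str]:
    """Detach the enclitic -que unless the token is a listed exception."""
    if len(token) > 3 and token.endswith("que") and token not in QUE_EXCEPTIONS:
        return [token[:-3], "que"]
    return [token]


def preprocess(text: str) -> list[str]:
    """Normalize spelling, tokenize on whitespace, and split enclitic -que."""
    tokens = []
    for token in normalize_orthography(text).split():
        tokens.extend(split_que(token))
    return tokens
```

With so short an exception list, a form such as atque would be split incorrectly, which is why a fuller list of non-enclitic -que words is needed in practice. The blanket ae/oe rule likewise rewrites every such letter pair, whatever its origin.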
